date:20210422

[PATCH] [i386] Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]

2021-04-22 Thread Hongtao Liu via Gcc-patches

Hi:
  If the second operand of __builtin_shuffle is const vector 0, and with
specific mask, it can be optimized to movq/vmovps.

.i.e.
foo128:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vmovlhps%xmm1, %xmm0, %xmm0
+   vmovq   %xmm0, %xmm0

 foo256:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vshuff32x4  $0, %ymm1, %ymm0, %ymm0
+   vmovaps %xmm0, %xmm0

 foo512:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vshuff32x4  $68, %zmm1, %zmm0, %zmm0
+   vmovaps %ymm0, %ymm0

  Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

PR target/94680
* config/i386/sse.md (ssedoublevecmode): Add attribute for
V64QI/V32HI/V16SI/V4DI.
(ssehalfvecmode): Add attribute for V2DI/V2DF.
(*vec_concatv4si_0): Extend to VI124_128.
(*vec_concat_0): New pre-reload splitter.
* config/i386/predicates.md (movq_parallel): New predicate.

gcc/testsuite/ChangeLog:

PR target/94680
* gcc.target/i386/avx-pr94680.c: New test.
* gcc.target/i386/avx512f-pr94680.c: New test.
* gcc.target/i386/sse2-pr94680.c: New test.


-- 
BR,
Hongtao
From eec5469cdeecf0e6650e9d2963dea4117919c5d2 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 22 Apr 2021 15:33:16 +0800
Subject: [PATCH] [i386] Optimize __builtin_shuffle when it's used to zero the
 upper bits of the dest. [PR target/94680]

If the second operand of __builtin_shuffle is const vector 0, and with
specific mask, it can be optimized to movq/vmovps.

.i.e.
foo128:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vmovlhps%xmm1, %xmm0, %xmm0
+   vmovq   %xmm0, %xmm0

 foo256:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vshuff32x4  $0, %ymm1, %ymm0, %ymm0
+   vmovaps %xmm0, %xmm0

 foo512:
-   vxorps  %xmm1, %xmm1, %xmm1
-   vshuff32x4  $68, %zmm1, %zmm0, %zmm0
+   vmovaps %ymm0, %ymm0

gcc/ChangeLog:

	PR target/94680
	* config/i386/sse.md (ssedoublevecmode): Add attribute for
	V64QI/V32HI/V16SI/V4DI.
	(ssehalfvecmode): Add attribute for V2DI/V2DF.
	(*vec_concatv4si_0): Extend to VI124_128.
	(*vec_concat_0): New pre-reload splitter.
	* config/i386/predicates.md (movq_parallel): New predicate.

gcc/testsuite/ChangeLog:

	PR target/94680
	* gcc.target/i386/avx-pr94680.c: New test.
	* gcc.target/i386/avx512f-pr94680.c: New test.
	* gcc.target/i386/sse2-pr94680.c: New test.
---
 gcc/config/i386/predicates.md | 33 
 gcc/config/i386/sse.md| 37 +++--
 gcc/testsuite/gcc.target/i386/avx-pr94680.c   | 59 ++
 .../gcc.target/i386/avx512f-pr94680.c | 78 +++
 gcc/testsuite/gcc.target/i386/sse2-pr94680.c  | 51 
 5 files changed, 250 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-pr94680.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr94680.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr94680.c

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index b1df8548af6..4b706003ed8 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1524,6 +1524,39 @@ (define_predicate "misaligned_operand"
   (and (match_code "mem")
(match_test "MEM_ALIGN (op) < GET_MODE_BITSIZE (mode)")))
 
+;; Return true if OP is a parallel for an mov{d,q,dqa,ps,pd} vec_select,
+;; where one of the two operands of the vec_concat is const0_operand.
+(define_predicate "movq_parallel"
+  (match_code "parallel")
+{
+  unsigned nelt = XVECLEN (op, 0);
+  unsigned nelt2 = nelt >> 1;
+  unsigned i;
+
+  if (nelt < 2)
+return false;
+
+  /* Validate that all of the elements are constants,
+ lower halves of permute are lower halves of the first operand,
+ upper halves of permute come from any of the second operand.  */
+  for (i = 0; i < nelt; ++i)
+{
+  rtx er = XVECEXP (op, 0, i);
+  unsigned HOST_WIDE_INT ei;
+
+  if (!CONST_INT_P (er))
+	return 0;
+  ei = INTVAL (er);
+  if (i < nelt2 && ei != i)
+	return 0;
+  if (i >= nelt2
+  	 && (ei < nelt || ei >= nelt<<1))
+	return 0;
+}
+
+  return 1;
+})
+
 ;; Return true if OP is a vzeroall operation, known to be a PARALLEL.
 (define_predicate "vzeroall_operation"
   (match_code "parallel")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9d3728d1cb0..b55636a3e12 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -812,19 +812,22 @@ (define_mode_attr sseintvecmodelower
 
 ;; Mapping of vector modes to a vector mode of double size
 (define_mode_attr ssedoublevecmode
-  [(V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI")
+  [(V64QI "V128QI") (V32HI "V64HI") (V16SI "V32SI") (V8DI "V16DI")
+   (V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI")
(V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI")
+   (V16SF "V32SF") (V8DF "V16DF")
(V8SF "V16SF") (V4DF "V8DF")
(V4SF "V8SF") (V2DF "V4DF")])
 
 ;;

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Jason Merrill via Gcc-patches


On 4/22/21 4:55 AM, Martin Liška wrote:

There's an updated version of the patch, Jonathan noticed correctly
the comment related to assert was not correct.


Subject: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.


Subject line needs "c++:"

Please also include the rationale from your first message before the 
ChangeLog entries.


OK with those adjustments.

Jason

[PATCH] Switch AIX configuration to DWARF2 debugging

2021-04-22 Thread David Edelsohn via Gcc-patches

As requested at the end of Stage 4, this patch changes the debugging
format for AIX configuration of GCC to "DWARF2".  This is in
preparation for removing stabs debugging support from GCC.

The rs6000 configuration files remain somewhat intertwined with the
stabs debugging support, but the configuration no longer generates
stabs debugging information.

I have been bootstrapping and testing GCC with this configuration for years.

This patch means that earlier releases (Technology Levels) of AIX 7.1
and 7.2, prior to DWARF support and fixes, cannot build GCC or support
GCC.

Thanks, David

* config/rs6000/aix71.h (PREFERRED_DEBUGGING_TYPE): Change to DWARF2_DEBUG.
* config/rs6000/aix72.h (PREFERRED_DEBUGGING_TYPE): Same.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 3612ed2593b..807e260a175 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -272,9 +272,9 @@ extern long long intatoll(const char *);

 #define TARGET_AIX_VERSION 71

-/* AIX 7.1 supports DWARF3 debugging, but XCOFF remains the default.  */
+/* AIX 7.1 supports DWARF3+ debugging.  */
 #define DWARF2_DEBUGGING_INFO 1
-#define PREFERRED_DEBUGGING_TYPE XCOFF_DEBUG
+#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 #define DEBUG_INFO_SECTION "0x1"
 #define DEBUG_LINE_SECTION "0x2"
 #define DEBUG_PUBNAMES_SECTION "0x3"

diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index d34909283cc..36c5d994439 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -273,9 +273,9 @@ extern long long intatoll(const char *);

 #define TARGET_AIX_VERSION 72

-/* AIX 7.2 supports DWARF3 debugging, but XCOFF remains the default.  */
+/* AIX 7.2 supports DWARF3+ debugging.  */
 #define DWARF2_DEBUGGING_INFO 1
-#define PREFERRED_DEBUGGING_TYPE XCOFF_DEBUG
+#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 #define DEBUG_INFO_SECTION "0x1"
 #define DEBUG_LINE_SECTION "0x2"
 #define DEBUG_PUBNAMES_SECTION "0x3"

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Bernhard Reutner-Fischer via Gcc-patches

On Thu, 22 Apr 2021 14:28:24 -0600
Martin Sebor via Gcc-patches  wrote:

> > enum E { e = 5 };
> > struct A { E e: 3; };
> > 
> > constexpr int number_of_bits ()
> > {
> > A a = { };
> > a.e = (E)-1;
> > return 32 - __builtin_clz(a.e);
> > }
> >   
> 
> I had the same thought about using clz.  It works in this case but
> not in if one of the enumerators is negative, or if the underlying
> type is signed.

or for -fshort-enums in this case as is.

Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952

2021-04-22 Thread Segher Boessenkool

On Fri, Apr 09, 2021 at 05:09:07PM -0400, Michael Meissner wrote:
> Fix logic error in 32-bit trampolines, PR target/98952.
> 
> The test in the PowerPC 32-bit trampoline support is backwards.  It aborts
> if the trampoline size is greater than the expected size.  It should abort
> when the trampoline size is less than the expected size.

>   PR target/98952
>   * config/rs6000/tramp.S (__trampoline_setup): Fix trampoline size
>   comparison in 32-bit.

> --- a/libgcc/config/rs6000/tramp.S
> +++ b/libgcc/config/rs6000/tramp.S
> @@ -64,8 +64,7 @@ FUNC_START(__trampoline_setup)
>  mflr r11
>  addi r7,r11,trampoline_initial-4-.LCF0 /* trampoline address -4 */
>  
> - li  r8,trampoline_size  /* verify that the trampoline is big 
> enough */
> - cmpwcr1,r8,r4
> + cmpwi   cr1,r4,trampoline_size  /* verify that the trampoline is big 
> enough */
>   srwir4,r4,2 /* # words to move */
>   addir9,r3,-4/* adjust pointer for lwzu */
>   mtctr   r4

As Will says, it looks like the ELFv2 version has the same bug.  Please
fix that the same way.

In the commit message and the changelog, point out that you folded the
cmp with the li while you were at it.  It is easier to read code like
this so the change is fine, but do point it out.

Can you test this in a testcase somehow?  That would have found the
ELFv2 case, for example.

Okay for trunk.  Okay for backport to 11 when that branch opens again.
Does this need more backports?  (Those should follow after 11 of
course).

Thanks,

Segher

Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952

2021-04-22 Thread Segher Boessenkool

On Mon, Apr 12, 2021 at 05:02:38PM -0500, will schmidt wrote:
> On Fri, 2021-04-09 at 17:09 -0400, Michael Meissner wrote:
> > -   li  r8,trampoline_size  /* verify that the trampoline is big 
> > enough */
> > -   cmpwcr1,r8,r4
> > +   cmpwi   cr1,r4,trampoline_size  /* verify that the trampoline is big 
> > enough */
> 
> Hmm, I spent several minutes trying to determine how cmpw behaves
> differently than cmpwi before noticing you also swapped the
> order of the r4,r8 operands. 
> 
> That seems OK.
> 
> A statement in the description indicating that you used a cmpwi instead
> of a cmpw since you were in the neighborhood would help call that out. 

In general, don't do two unrelated things, esp. when not pointing it out
explicitly.

> The #elif  _CALL_ELF == 2  portion of tramp.S (line 159 or so) has a
> similar compare stanza with respect to the order of operands on the
> compare.  Will this also have a backwards greater-than less-than issue?
> 
>   li  r8,trampoline_size  /* verify that the trampoline is big 
> enough */
>   cmpwcr1,r8,r4
>   srwir4,r4,3 /* # doublewords to move */
>   addir9,r3,-8/* adjust pointer for stdu */
>   mtctr   r4
>   blt cr1,.Labort

Not sure...  It should use cmpwi as well though, and then it is easier
to see.


Segher

[wwwdocs] IPA/LTO/profile-feedback changes

2021-04-22 Thread Jan Hubicka

Hi,
this patch adds changesentry for IPA/LTO and FDO.

Honza

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 6f58cfe8..bba16ead 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -170,6 +170,37 @@ a work-in-progress.
   use -g together with -gdwarf-2,
   -gdwarf-3 or -gdwarf-4.
   
+  
+Inter-procedural optimization improvements:
+
+  New IPA-modref pass was added to track side-effects of function calls
+  and improve precision of points-to-analysis. Pass can be controlled
+ by -fipa-modref attribute.
+  
+  Identical code folding pass was significantly improved to increase 
number of
+ unified functions and to reduce compile-time memory use.
+  IPA-CP heuristics improved its estimation of potential usefulness of
+  known loop bounds and strides by taking into account the estimated
+  frequency of these loops. 
+  
+  
+Link-time optimization improvements:
+
+  LTO bytecode file format was optimized for smaller object files and
+ faster streaming.
+  Memory allocation of the linking stage was improved to reduce peak
+ memory use.
+
+  
+  
+Profile driven optimization improvements:
+
+  
+Using https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Optimize-Options.html#index-fprofile-values;>-fprofile-values,
+   was improved by tracking more target values for e.g. indirect calls.
+  
+
+

Re: [PATCH] c++: Hard error with tentative parse and CTAD [PR87709]

2021-04-22 Thread Jason Merrill via Gcc-patches


On 4/22/21 11:34 AM, Patrick Palka wrote:

As described in detail in comment #4 of this PR, when tentatively
parsing a construct that can either be a type or an expression, if
during the type parse we encounter an unexpected template placeholder,
we need to simulate an error rather than issue a real error because the
subsequent expression parse can still succeed.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

PR c++/87709
* parser.c (cp_parser_type_id_1): If we see a template
placeholder, first try simulating an error before issuing a real
error.

gcc/testsuite/ChangeLog:

PR c++/87709
* g++.dg/cpp1z/class-deduction86.C: New test.
---
  gcc/cp/parser.c| 11 +++
  gcc/testsuite/g++.dg/cpp1z/class-deduction86.C | 16 
  2 files changed, 23 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction86.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba516efa23..e1b1617da68 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -23270,10 +23270,13 @@ cp_parser_type_id_1 (cp_parser *parser, 
cp_parser_flags flags,
location_t loc = type_specifier_seq.locations[ds_type_spec];
if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
  {
-   error_at (loc, "missing template arguments after %qT",
- auto_node);
-   inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
-   tmpl);
+   if (!cp_parser_simulate_error (parser))
+ {
+   error_at (loc, "missing template arguments after %qT",
+ auto_node);
+   inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
+   tmpl);
+ }
  }
else if (parser->in_template_argument_list_p)
  error_at (loc, "%qT not permitted in template argument",
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C
new file mode 100644
index 000..a198ed24ec6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C
@@ -0,0 +1,16 @@
+// PR c++/87709
+// { dg-do compile { target c++17 } }
+
+template 
+struct lit {
+  lit(T) { }
+};
+
+template 
+int operator+(lit, lit) {
+  return 0;
+}
+
+auto r2 = (lit(0)) + lit(0);
+
+static_assert(sizeof(lit(0)));

Re: [PATCH] c++: Refine enum direct-list-initialization [CWG2374]

2021-04-22 Thread Jason Merrill via Gcc-patches


On 4/22/21 10:49 AM, Patrick Palka wrote:

This implements the wording changes of CWG2374, which clarifies the
wording of P0138 to forbid e.g. direct-list-initialization of a scoped
enumeration from a different scoped enumeration.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

DR 2374
* decl.c (is_direct_enum_init): Check the implicit
convertibility requirement added by CWG 2374.

gcc/testsuite/ChangeLog:

DR 2374
* g++.dg/cpp1z/direct-enum-init2.C: New test.
---
  gcc/cp/decl.c  | 8 +++-
  gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C | 8 
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b81de8ef934..60dc2bf182d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6191,7 +6191,13 @@ is_direct_enum_init (tree type, tree init)
&& ENUM_FIXED_UNDERLYING_TYPE_P (type)
&& TREE_CODE (init) == CONSTRUCTOR
&& CONSTRUCTOR_IS_DIRECT_INIT (init)
-  && CONSTRUCTOR_NELTS (init) == 1)
+  && CONSTRUCTOR_NELTS (init) == 1
+  /* DR 2374: The single element needs to be implicitly
+convertible to the underlying type of the enum.  */
+  && can_convert_arg (ENUM_UNDERLYING_TYPE (type),
+ TREE_TYPE (CONSTRUCTOR_ELT (init, 0)->value),
+ CONSTRUCTOR_ELT (init, 0)->value,
+ LOOKUP_IMPLICIT, tf_none))
  return true;
return false;
  }
diff --git a/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C 
b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C
new file mode 100644
index 000..5b94a8d00fe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C
@@ -0,0 +1,8 @@
+// DR 2374
+// { dg-do compile { target c++17 } }
+
+enum class Orange;
+enum class Apple;
+
+extern Orange o;
+Apple a{o}; // { dg-error "cannot convert" }

[PATCH 1/2] Generate overlapping operations between two areas of memory

2021-04-22 Thread H.J. Lu via Gcc-patches

For op_by_pieces operations between two areas of memory on non-strict
alignment target, add -foverlap-op-by-pieces=[off|on|max-memset] to
generate overlapping operations to minimize number of operations if it
is not a stack push which must not overlap.

When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
and finishes with the smallest usable integer size, MIN_SIZE, for the
remaining bytes where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE > the remaining
bytes, the last operation is performed on MIN_SIZE bytes of overlapping
memory from the previous operation.

For memset with non-zero byte, -foverlap-op-by-pieces=max-memset generates
an overlapping fill with MAX_SIZE if the number of the remaining bytes is
greater than one.

Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
disabled by default.

gcc/

PR middl-end/90773
* common.opt (-foverlap-op-by-pieces): New.
* expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled,
round up size and alignment to the widest integer mode for maximum
size
(op_by_pieces_d): Add get_usable_mode, m_push and
m_non_zero_memset.
(op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to
initialize m_push and m_non_zero_memset.
(op_by_pieces_d::get_usable_mode): New.
(op_by_pieces_d::run): Use get_usable_mode to get the largest
usable integer mode and generate overlapping operations for
-foverlap-op-by-pieces.
(PUSHG_P): New.
(move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
change.
(store_by_pieces_d::store_by_pieces_d): Likewise.
(clear_by_pieces): Likewsie.
* toplev.c (process_options): Issue an error when
-foverlap-op-by-pieces is used for strict alignment target.
* doc/invoke.texi: Document -foverlap-op-by-pieces.

gcc/testsuite/

PR middl-end/90773
* g++.dg/pr90773-1.h: New test.
* g++.dg/pr90773-1a.C: Likewise.
* g++.dg/pr90773-1b.C: Likewise.
* g++.dg/pr90773-1c.C: Likewise.
* g++.dg/pr90773-1d.C: Likewise.
* gcc.target/i386/pr90773-1.c: Likewise.
* gcc.target/i386/pr90773-2.c: Likewise.
* gcc.target/i386/pr90773-3.c: Likewise.
* gcc.target/i386/pr90773-4.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-6.c: Likewise.
* gcc.target/i386/pr90773-7.c: Likewise.
* gcc.target/i386/pr90773-8.c: Likewise.
* gcc.target/i386/pr90773-9.c: Likewise.
* gcc.target/i386/pr90773-10.c: Likewise.
* gcc.target/i386/pr90773-11.c: Likewise.
---
 gcc/common.opt |  19 +++
 gcc/doc/invoke.texi|  14 ++
 gcc/expr.c | 159 -
 gcc/testsuite/g++.dg/pr90773-1.h   |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C  |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C  |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C  |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C  |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c   |   8 ++
 20 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a75b44ee47e..7f5b38c7810 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2123,6 +2123,25 @@

[PATCH 2/2] x86: Enable -foverlap-op-by-pieces by default

2021-04-22 Thread H.J. Lu via Gcc-patches

Add TARGET_PREFER_OVERLAP_OP_BY_PIECES for Alder Lake and Intel core
processoes with AVX2 to enable -foverlap-op-by-pieces by default.

gcc/

PR middl-end/90773
* config/i386/i386-options.c (ix86_option_override_internal):
Enable -foverlap-op-by-pieces by default for
TARGET_PREFER_OVERLAP_OP_BY_PIECES.
* config/i386/i386.h (TARGET_PREFER_OVERLAP_OP_BY_PIECES): New.
* config/i386/x86-tune.def (X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES):
New.

gcc/testsuite/

PR middl-end/90773
* gcc.target/i386/pr90773-12.c: New test.
* gcc.target/i386/pr90773-13.c: Likewise.
---
 gcc/config/i386/i386-options.c |  3 +++
 gcc/config/i386/i386.h |  2 ++
 gcc/config/i386/x86-tune.def   |  6 ++
 gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 +++
 5 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2a12228d195..5949d2d5597 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2821,6 +2821,9 @@ ix86_option_override_internal (bool main_args_p,
   if (ix86_indirect_branch != indirect_branch_keep)
 SET_OPTION_IF_UNSET (opts, opts_set, flag_jump_tables, 0);
 
+  if (TARGET_PREFER_OVERLAP_OP_BY_PIECES)
+SET_OPTION_IF_UNSET (opts, opts_set, flag_overlap_op_by_pieces, 1);
+
   return true;
 }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 96b46bac238..cf24fecaddc 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -304,6 +304,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 #define TARGET_SINGLE_STRINGOP ix86_tune_features[X86_TUNE_SINGLE_STRINGOP]
 #define TARGET_PREFER_KNOWN_REP_MOVSB_STOSB \
   ix86_tune_features[X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB]
+#define TARGET_PREFER_OVERLAP_OP_BY_PIECES \
+  ix86_tune_features[X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES]
 #define TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES \
ix86_tune_features[X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES]
 #define TARGET_QIMODE_MATH ix86_tune_features[X86_TUNE_QIMODE_MATH]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index eb057a67750..848c1b53ad4 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -275,6 +275,12 @@ DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB,
  "prefer_known_rep_movsb_stosb",
  m_SKYLAKE | m_ALDERLAKE | m_CORE_AVX512)
 
+/* X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES: Enable -foverlap-op-by-pieces-run
+   by default.  */
+DEF_TUNE (X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES,
+ "prefer_overlap_op_by_pieces",
+ m_CORE_AVX2)
+
 /* X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES: Enable generation of
compact prologues and epilogues by issuing a misaligned moves.  This
requires target to handle misaligned moves and partial memory stalls
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-12.c 
b/gcc/testsuite/gcc.target/i386/pr90773-12.c
new file mode 100644
index 000..e45840a5b8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-12.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
+
+void
+foo (char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, 255);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 
16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-13.c 
b/gcc/testsuite/gcc.target/i386/pr90773-13.c
new file mode 100644
index 000..4d5ae8d1086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-13.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */
+
+void
+foo (char *dst)
+{
+  __builtin_memset (dst, 0, 255);
+}
+
+/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, 
\[0-9\]*\\(%\[\^,\]+\\)" 16 } } */
+/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */
-- 
2.30.2

[PATCH 0/2] Generate overlapping operations between two areas of memory

2021-04-22 Thread H.J. Lu via Gcc-patches

For op_by_pieces operations between two areas of memory on non-strict
alignment target, add -foverlap-op-by-pieces=[off|on|max-memset] to
generate overlapping operations to minimize number of operations if it
is not a stack push which must not overlap.

When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
and finishes with the smallest usable integer size, MIN_SIZE, for the
remaining bytes where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE > the remaining
bytes, the last operation is performed on MIN_SIZE bytes of overlapping
memory from the previous operation.

For memset with non-zero byte, -foverlap-op-by-pieces=max-memset generates
an overlapping fill with MAX_SIZE if the number of the remaining bytes is
greater than one.

Code sizes are reduced slightly on glibc and GCC.  Performance impact on
SPEC CPU 2017 on Intel Xeon are within noise range.

H.J. Lu (2):
  Generate overlapping operations between two areas of memory
  x86: Enable -foverlap-op-by-pieces by default

 gcc/common.opt |  19 +++
 gcc/config/i386/i386-options.c |   3 +
 gcc/config/i386/i386.h |   2 +
 gcc/config/i386/x86-tune.def   |   6 +
 gcc/doc/invoke.texi|  14 ++
 gcc/expr.c | 159 -
 gcc/testsuite/g++.dg/pr90773-1.h   |  14 ++
 gcc/testsuite/g++.dg/pr90773-1a.C  |  13 ++
 gcc/testsuite/g++.dg/pr90773-1b.C  |   5 +
 gcc/testsuite/g++.dg/pr90773-1c.C  |   5 +
 gcc/testsuite/g++.dg/pr90773-1d.C  |  19 +++
 gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
 gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-12.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-13.c |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
 gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-7.c  |  11 ++
 gcc/testsuite/gcc.target/i386/pr90773-8.c  |  13 ++
 gcc/testsuite/gcc.target/i386/pr90773-9.c  |  13 ++
 gcc/toplev.c   |   8 ++
 25 files changed, 416 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C
 create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c

-- 
2.30.2

[Patch] PR fortran/100154 - [9/10/11/12 Regression] ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131

2021-04-22 Thread Harald Anlauf via Gcc-patches

Now with the correct patch attached ...

Sorry for the confusion!

---

Dear Fortranners,

we need to check the arguments to the affected GNU intrinsic extensions
properly, and - as pointed out in the PR by Tobias - we need to allow
function references that have a data pointer result.  Also the argument
names of the character arguments of the subroutine versions needed a
fix ("c" instead of "count").

Regtested on x86_64-pc-linux-gnu.  OK for mainline (12)?
OK for backports after 11.1 release?

Thanks,
Harald


PR fortran/100154 - ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131

Add appropriate static checks for the character and status arguments to
the GNU Fortran intrinsic extensions fget[c], fput[c].  Extend variable
check to allow a function reference having a data pointer result.

gcc/fortran/ChangeLog:

PR fortran/100154
* check.c (variable_check): Allow function reference having a data
pointer result.
(arg_strlen_is_zero): New function.
(gfc_check_fgetputc_sub): Add static check of character and status
arguments.
(gfc_check_fgetput_sub): Likewise.
* intrinsic.c (add_subroutines): Fix argument name for the
character argument to intrinsic subroutines fget[c], fput[c].

gcc/testsuite/ChangeLog:

PR fortran/100154
* gfortran.dg/pr100154.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 82db8e4e1b2..1d30c93df82 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1055,6 +1055,13 @@ variable_check (gfc_expr *e, int n, bool allow_proc)
  return true;
 }

+  /* F2018:R902: function reference having a data pointer result.  */
+  if (e->expr_type == EXPR_FUNCTION
+  && e->symtree->n.sym->attr.flavor == FL_PROCEDURE
+  && e->symtree->n.sym->attr.function
+  && e->symtree->n.sym->attr.pointer)
+return true;
+
   gfc_error ("%qs argument of %qs intrinsic at %L must be a variable",
 gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic, >where);

@@ -5689,6 +5691,19 @@ gfc_check_spread (gfc_expr *source, gfc_expr *dim, gfc_expr *ncopies)
 /* Functions for checking FGETC, FPUTC, FGET and FPUT (subroutines and
functions).  */

+bool
+arg_strlen_is_zero (gfc_expr *c, int n)
+{
+  if (gfc_var_strlen (c) == 0)
+{
+  gfc_error ("%qs argument of %qs intrinsic at %L must have "
+		 "length at least 1", gfc_current_intrinsic_arg[n]->name,
+		 gfc_current_intrinsic, >where);
+  return true;
+}
+  return false;
+}
+
 bool
 gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status)
 {
@@ -5702,13 +5717,19 @@ gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status)
 return false;
   if (!kind_value_check (c, 1, gfc_default_character_kind))
 return false;
+  if (strcmp (gfc_current_intrinsic, "fgetc") == 0
+  && !variable_check (c, 1, false))
+return false;
+  if (arg_strlen_is_zero (c, 1))
+return false;

   if (status == NULL)
 return true;

   if (!type_check (status, 2, BT_INTEGER)
   || !kind_value_check (status, 2, gfc_default_integer_kind)
-  || !scalar_check (status, 2))
+  || !scalar_check (status, 2)
+  || !variable_check (status, 2, false))
 return false;

   return true;
@@ -5729,13 +5750,19 @@ gfc_check_fgetput_sub (gfc_expr *c, gfc_expr *status)
 return false;
   if (!kind_value_check (c, 0, gfc_default_character_kind))
 return false;
+  if (strcmp (gfc_current_intrinsic, "fget") == 0
+  && !variable_check (c, 0, false))
+return false;
+  if (arg_strlen_is_zero (c, 0))
+return false;

   if (status == NULL)
 return true;

   if (!type_check (status, 1, BT_INTEGER)
   || !kind_value_check (status, 1, gfc_default_integer_kind)
-  || !scalar_check (status, 1))
+  || !scalar_check (status, 1)
+  || !variable_check (status, 1, false))
 return false;

   return true;
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 17fd92eb462..219f04f2317 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -3460,7 +3460,7 @@ add_subroutines (void)
   /* Argument names.  These are used as argument keywords and so need to
  match the documentation.  Please keep this list in sorted order.  */
   static const char
-*a = "a", *c = "count", *cm = "count_max", *com = "command",
+*a = "a", *c_ = "c", *c = "count", *cm = "count_max", *com = "command",
 *cr = "count_rate", *dt = "date", *errmsg = "errmsg", *f = "from",
 *fp = "frompos", *gt = "get", *h = "harvest", *han = "handler",
 *length = "length", *ln = "len", *md = "mode", *msk = "mask",
@@ -3840,12 +3840,12 @@ add_subroutines (void)
   add_sym_3s ("fgetc", GFC_ISYM_FGETC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU,
 	  gfc_check_fgetputc_sub, NULL, gfc_resolve_fgetc_sub,
 	  ut, BT_INTEGER, di, REQUIRED, INTENT_IN,
-	  c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT,
+	  c_, BT_CHARACTER, dc, REQUIRED,

[Patch] PR fortran/100154 - [9/10/11/12 Regression] ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131

2021-04-22 Thread Harald Anlauf via Gcc-patches

Dear Fortranners,

we need to check the arguments to the affected GNU intrinsic extensions
properly, and - as pointed out in the PR by Tobias - we need to allow
function references that have a data pointer result.  Also the argument
names of the character arguments of the subroutine versions needed a
fix ("c" instead of "count").

Regtested on x86_64-pc-linux-gnu.  OK for mainline (12)?
OK for backports after 11.1 release?

Thanks,
Harald


PR fortran/100154 - ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131

Add appropriate static checks for the character and status arguments to
the GNU Fortran intrinsic extensions fget[c], fput[c].  Extend variable
check to allow a function reference having a data pointer result.

gcc/fortran/ChangeLog:

PR fortran/100154
* check.c (variable_check): Allow function reference having a data
pointer result.
(arg_strlen_is_zero): New function.
(gfc_check_fgetputc_sub): Add static check of character and status
arguments.
(gfc_check_fgetput_sub): Likewise.
* intrinsic.c (add_subroutines): Fix argument name for the
character argument to intrinsic subroutines fget[c], fput[c].

gcc/testsuite/ChangeLog:

PR fortran/100154
* gfortran.dg/pr100154.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 82db8e4e1b2..1d30c93df82 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -5689,6 +5691,19 @@ gfc_check_spread (gfc_expr *source, gfc_expr *dim, gfc_expr *ncopies)
 /* Functions for checking FGETC, FPUTC, FGET and FPUT (subroutines and
functions).  */

+bool
+arg_strlen_is_zero (gfc_expr *c, int n)
+{
+  if (gfc_var_strlen (c) == 0)
+{
+  gfc_error ("%qs argument of %qs intrinsic at %L must have "
+		 "length at least 1", gfc_current_intrinsic_arg[n]->name,
+		 gfc_current_intrinsic, >where);
+  return true;
+}
+  return false;
+}
+
 bool
 gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status)
 {
@@ -5702,13 +5717,19 @@ gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status)
 return false;
   if (!kind_value_check (c, 1, gfc_default_character_kind))
 return false;
+  if (strcmp (gfc_current_intrinsic, "fgetc") == 0
+  && !variable_check (c, 1, false))
+return false;
+  if (arg_strlen_is_zero (c, 1))
+return false;

   if (status == NULL)
 return true;

   if (!type_check (status, 2, BT_INTEGER)
   || !kind_value_check (status, 2, gfc_default_integer_kind)
-  || !scalar_check (status, 2))
+  || !scalar_check (status, 2)
+  || !variable_check (status, 2, false))
 return false;

   return true;
@@ -5729,13 +5750,19 @@ gfc_check_fgetput_sub (gfc_expr *c, gfc_expr *status)
 return false;
   if (!kind_value_check (c, 0, gfc_default_character_kind))
 return false;
+  if (strcmp (gfc_current_intrinsic, "fget") == 0
+  && !variable_check (c, 0, false))
+return false;
+  if (arg_strlen_is_zero (c, 0))
+return false;

   if (status == NULL)
 return true;

   if (!type_check (status, 1, BT_INTEGER)
   || !kind_value_check (status, 1, gfc_default_integer_kind)
-  || !scalar_check (status, 1))
+  || !scalar_check (status, 1)
+  || !variable_check (status, 1, false))
 return false;

   return true;
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 17fd92eb462..219f04f2317 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -3460,7 +3460,7 @@ add_subroutines (void)
   /* Argument names.  These are used as argument keywords and so need to
  match the documentation.  Please keep this list in sorted order.  */
   static const char
-*a = "a", *c = "count", *cm = "count_max", *com = "command",
+*a = "a", *c_ = "c", *c = "count", *cm = "count_max", *com = "command",
 *cr = "count_rate", *dt = "date", *errmsg = "errmsg", *f = "from",
 *fp = "frompos", *gt = "get", *h = "harvest", *han = "handler",
 *length = "length", *ln = "len", *md = "mode", *msk = "mask",
@@ -3840,12 +3840,12 @@ add_subroutines (void)
   add_sym_3s ("fgetc", GFC_ISYM_FGETC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU,
 	  gfc_check_fgetputc_sub, NULL, gfc_resolve_fgetc_sub,
 	  ut, BT_INTEGER, di, REQUIRED, INTENT_IN,
-	  c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT,
+	  c_, BT_CHARACTER, dc, REQUIRED, INTENT_OUT,
 	  st, BT_INTEGER, di, OPTIONAL, INTENT_OUT);

   add_sym_2s ("fget", GFC_ISYM_FGET, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU,
 	  gfc_check_fgetput_sub, NULL, gfc_resolve_fget_sub,
-	  c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT,
+	  c_, BT_CHARACTER, dc, REQUIRED, INTENT_OUT,
 	  st, BT_INTEGER, di, OPTIONAL, INTENT_OUT);

   add_sym_1s ("flush", GFC_ISYM_FLUSH, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU,
@@ -3855,12 +3855,12 @@ add_subroutines (void)
   add_sym_3s ("fputc", GFC_ISYM_FPUTC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU,
 	  gfc_check_fgetputc_sub, NULL,

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Martin Sebor via Gcc-patches

On 4/22/21 9:41 AM, Jonathan Wakely wrote:

On Thu, 22 Apr 2021 at 15:59, Martin Sebor  wrote:

On 4/22/21 2:52 AM, Jonathan Wakely wrote:

On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote:

 On 4/21/21 6:11 PM, Martin Sebor wrote:
  > On 4/21/21 2:15 AM, Martin Liška wrote:
  >> Hello.
  >>
  >> It's addressing the following Clang warning:
  >> cp/lex.c:170:45: warning: result of comparison of constant 64
 with expression of type 'enum ovl_op_code' is always true
 [-Wtautological-constant-out-of-range-compare]
  >>
  >> Patch can bootstrap on x86_64-linux-gnu and survives regression
 tests.
  >>
  >> Ready to be installed?
  >> Thanks,
  >> Martin
  >>
  >> gcc/cp/ChangeLog:
  >>
  >> * cp-tree.h (STATIC_ASSERT): Prefer static assert.
  >> * lex.c (init_operators): Remove run-time check.
  >> ---
  >>   gcc/cp/cp-tree.h | 3 +++
  >>   gcc/cp/lex.c | 2 --
  >>   2 files changed, 3 insertions(+), 2 deletions(-)
  >>
  >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
  >> index 81ff375f8a5..a8f72448ea9 100644
  >> --- a/gcc/cp/cp-tree.h
  >> +++ b/gcc/cp/cp-tree.h
  >> @@ -5916,6 +5916,9 @@ enum ovl_op_code {
  >> OVL_OP_MAX
  >>   };
  >>   +/* Make sure it fits in lang_decl_fn::operator_code. */
  >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
  >> +
  >
  > I wonder if there's a way to test this directly by something like
  >
  >   static_assert (number-of-bits (ovl_op_info_t::ovl_op_code)
  >  <= number-of-bits (lang_decl_fn::operator_code));

 Good point, but I'm not aware of it. Maybe C++ people can chime in?

ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class")
with no fixed underlying type (i.e. no enum-base like ": int" or ":
long" is specified) which means that the number of bits in is value
representation is the number of bits needed to represent the minimum and
maximum enumerators:

"the values of the enumeration are the values representable by a
hypothetical integer type with minimal width M such that all enumerators
can be represented."

There is no function/utility like number-of-bits that can tell you that
from the type though.You could use
std::underlying_type::type to get the integral type that
the compiler used to represent it, but that will probably be 'int' in
this case and so all it tells you is an upper bound of no more than 32
bits, which is not useful for this purpose.

I suspected there wasn't a function like that.  Thanks for confirming
it.  I wrote the one below just to see if it could be done.  It works
for one bit-field but I can't think of a way to generalize it.  We'd
probably need a built-in for that.  Perhaps one might be useful.

enum E { e = 5 };
struct A { E e: 3; };

constexpr int number_of_bits ()
{
A a = { };
a.e = (E)-1;
int n = 0;
for (; a.e; ++n)
  a.e = (E)((unsigned)a.e ^ (1 << n));
return n;
}

Martin

Or:

enum E { e = 5 };
struct A { E e: 3; };

constexpr int number_of_bits ()
{
A a = { };
a.e = (E)-1;
return 32 - __builtin_clz(a.e);
}

I had the same thought about using clz.  It works in this case but
not in if one of the enumerators is negative, or if the underlying
type is signed.

But you can't get the number-of-bits needed for all the values of the
enum E, which is what I was referring to.

If you know the enumerators go from 0 to MAX (as is the case for
ovl_op_code) you can use (32 - __builtin_clz(MAX)) there too, but in
the general case you don't always know the maximum enumerator without
checking, and it depends whether the enumeration has a fixed
underlying type.

That might be another useful query to add a built-in or trait
for to improve introspection: get the min and max enumerator (or
even all of them, e.g., as an initializer_list or something like
that).

Martin

[Patch] PR fortran/100218 - target of pointer from evaluation of function-reference

2021-04-22 Thread Harald Anlauf via Gcc-patches

Dear Fortranners,

while analyzing a different PR (PR100154), Tobias pointed out that the
target of a pointer from the evaluation of function-reference is allowed
to be used in a variable definition context and thus as an actual
argument to a function or subroutine.

This seems to be a more general issue that seems to have been overlooked.
The attached simple patch allows to compile and run the attached example,
which is by the way already yet rejected with -std=f2003.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?
Shall we backport this to (at least) 11?

Thanks,
Harald


Fortran - allow target of pointer from evaluation of function-reference

Fortran allows the target of a pointer from the evaluation of a
function-reference in a variable definition context (e.g. F2018:R902).

gcc/fortran/ChangeLog:

PR fortran/100218
* expr.c (gfc_check_vardef_context): Extend check to allow pointer
from a function reference.

gcc/testsuite/ChangeLog:

PR fortran/100218
* gfortran.dg/pr100218.f90: New test.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 92a6700568d..696b9f1daac 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -6121,7 +6132,9 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
 }
   if (!pointer && sym->attr.flavor != FL_VARIABLE
   && !(sym->attr.flavor == FL_PROCEDURE && sym == sym->result)
-  && !(sym->attr.flavor == FL_PROCEDURE && sym->attr.proc_pointer))
+  && !(sym->attr.flavor == FL_PROCEDURE && sym->attr.proc_pointer)
+  && !(sym->attr.flavor == FL_PROCEDURE
+	   && sym->attr.function && sym->attr.pointer))
 {
   if (context)
 	gfc_error ("%qs in variable definition context (%s) at %L is not"
diff --git a/gcc/testsuite/gfortran.dg/pr100218.f90 b/gcc/testsuite/gfortran.dg/pr100218.f90
new file mode 100644
index 000..62b18f6a935
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr100218.f90
@@ -0,0 +1,19 @@
+! { dg-do run }
+! { dg-options "-O2 -std=f2008" }
+! PR fortran/100218 - target of pointer from evaluation of function-reference
+
+program p
+  implicit none
+  integer, target :: z = 0
+  call g (f ())
+  if (z /= 1) stop 1
+contains
+  function f () result (r)
+integer, pointer :: r
+r => z
+  end function f
+  subroutine g (x)
+integer, intent(out) :: x
+x = 1
+  end subroutine g
+end program p

[PATCH] config/i386: Commentary typo fix

2021-04-22 Thread Bernhard Reutner-Fischer via Gcc-patches

From: Bernhard Reutner-Fischer 

gcc/ChangeLog:

* config/i386/x86-tune-sched-bd.c (dispatch_group): Commentary
typo fix.
---
 gcc/config/i386/x86-tune-sched-bd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune-sched-bd.c 
b/gcc/config/i386/x86-tune-sched-bd.c
index ad0edf713f5..be38e48b271 100644
--- a/gcc/config/i386/x86-tune-sched-bd.c
+++ b/gcc/config/i386/x86-tune-sched-bd.c
@@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #define BIG 100
 
 
-/* Dispatch groups.  Istructions that affect the mix in a dispatch window.  */
+/* Dispatch groups.  Instructions that affect the mix in a dispatch window.  */
 enum dispatch_group {
   disp_no_group = 0,
   disp_load,
-- 
2.31.1

Re: Fix Fortran rounding issues, PR fortran/96983.

2021-04-22 Thread Richard Biener via Gcc-patches

On April 22, 2021 9:09:28 PM GMT+02:00, Michael Meissner 
 wrote:
>On Wed, Apr 21, 2021 at 10:10:07AM +0200, Tobias Burnus wrote:
>> On 20.04.21 08:58, Richard Biener via Fortran wrote:
>> >On Mon, Apr 19, 2021 at 9:40 PM Michael Meissner via Fortran
>> >  wrote:
>> Is there any reason to not only send the email to fortran@ _and_
>> gcc-patches@ but sending it to 13 Fortran maintainers explicitly?
>(Now
>> removed)
>
>Sorry about that.  With PowerPC backend changes, I generally do
>explicitly add
>the maintainers so things don't get lost.
>
>
>> >>Fix Fortran rounding issues, PR fortran/96983.
>> >>
>> >>Can I check this change into the GCC trunk?
>> >The patch looks fine technically and is definitely an improvement
>since the
>> >intermediate conversion looks odd.  But it might be that the
>standard
>> >requires such dance though the preceeding cases handled don't seem
>to
>> >care.  I'm thinking of a FP format where round(1.6) == 3 because of
>lack
>> >of precision but using an intermediate FP format with higher
>precision
>> >would "correctly" compute 2.
>> 
>> The patched build_round_expr is only called by ANINT / NINT;
>> NINT is real → integer; ANINT is real → real
>> [And the modified code is only called for NINT, reason: see comment
>far below.]
>> 
>> NINT (A[, KIND]) is described (F2018) as "Nearest integer":
>> * Result Characteristics. Integer. If KIND is present, the kind type
>parameter
>>   is that specified by the value of KIND;
>>   otherwise, the kind type parameter is that of default integer type.
>> * The result is the integer nearest A, or if there are two
>>   integers equally near A, the result is whichever such integer has
>the greater
>>   magnitude.
>> * Example. NINT (2.783) has the value 3.
>> 
>> ANINT (A[, KIND]) as "Nearest whole number":
>> * The result is of type real. If KIND is present, the kind type
>parameter is that
>>   specified by the value of KIND; otherwise, the kind type parameter
>is that of A.
>> * The result is the integer nearest A, or if there are two integers
>equally near A,
>>   the result is whichever such integer has the greater magnitude.
>> * Examples. ANINT (2.783) has the value 3.0. ANINT (−2.783) has the
>value −3.0.
>> 
>> >Of course the current code doesn't handle this correctly for the
>> >case if llroundf either.
>> >>I've not contributed to the Fortran front end before.  If the
>maintainers like
>> >>the patch, can somebody point out if I need to do additional things
>to commit
>> >>the patch?
>> Nothing special: a testcase already exists, committing is done as
>usual
>> and a PR to update you have as well.
>
>Given GCC 11 has branched, is it ok to backport the patch to the GCC 11
>branch
>as well?  I assume it is, since it fixes a regression in the compiler.

Please wait until after 11.1 is released. 

Thanks, 
Richard.

Re: Fix Fortran rounding issues, PR fortran/96983.

2021-04-22 Thread Michael Meissner via Gcc-patches

On Wed, Apr 21, 2021 at 10:10:07AM +0200, Tobias Burnus wrote:
> On 20.04.21 08:58, Richard Biener via Fortran wrote:
> >On Mon, Apr 19, 2021 at 9:40 PM Michael Meissner via Fortran
> >  wrote:
> Is there any reason to not only send the email to fortran@ _and_
> gcc-patches@ but sending it to 13 Fortran maintainers explicitly? (Now
> removed)

Sorry about that.  With PowerPC backend changes, I generally do explicitly add
the maintainers so things don't get lost.


> >>Fix Fortran rounding issues, PR fortran/96983.
> >>
> >>Can I check this change into the GCC trunk?
> >The patch looks fine technically and is definitely an improvement since the
> >intermediate conversion looks odd.  But it might be that the standard
> >requires such dance though the preceeding cases handled don't seem to
> >care.  I'm thinking of a FP format where round(1.6) == 3 because of lack
> >of precision but using an intermediate FP format with higher precision
> >would "correctly" compute 2.
> 
> The patched build_round_expr is only called by ANINT / NINT;
> NINT is real → integer; ANINT is real → real
> [And the modified code is only called for NINT, reason: see comment far 
> below.]
> 
> NINT (A[, KIND]) is described (F2018) as "Nearest integer":
> * Result Characteristics. Integer. If KIND is present, the kind type parameter
>   is that specified by the value of KIND;
>   otherwise, the kind type parameter is that of default integer type.
> * The result is the integer nearest A, or if there are two
>   integers equally near A, the result is whichever such integer has the 
> greater
>   magnitude.
> * Example. NINT (2.783) has the value 3.
> 
> ANINT (A[, KIND]) as "Nearest whole number":
> * The result is of type real. If KIND is present, the kind type parameter is 
> that
>   specified by the value of KIND; otherwise, the kind type parameter is that 
> of A.
> * The result is the integer nearest A, or if there are two integers equally 
> near A,
>   the result is whichever such integer has the greater magnitude.
> * Examples. ANINT (2.783) has the value 3.0. ANINT (−2.783) has the value 
> −3.0.
> 
> >Of course the current code doesn't handle this correctly for the
> >case if llroundf either.
> >>I've not contributed to the Fortran front end before.  If the maintainers 
> >>like
> >>the patch, can somebody point out if I need to do additional things to 
> >>commit
> >>the patch?
> Nothing special: a testcase already exists, committing is done as usual
> and a PR to update you have as well.

Given GCC 11 has branched, is it ok to backport the patch to the GCC 11 branch
as well?  I assume it is, since it fixes a regression in the compiler.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[committed] c++: Add testcase for already fixed PR [PR16617]

2021-04-22 Thread Patrick Palka via Gcc-patches

We correctly diagnose the invalid access since r11-1350.

gcc/testsuite/ChangeLog:

PR c++/16617
* g++.dg/template/access36.C: New test.
---
 gcc/testsuite/g++.dg/template/access36.C | 25 
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/access36.C

diff --git a/gcc/testsuite/g++.dg/template/access36.C 
b/gcc/testsuite/g++.dg/template/access36.C
new file mode 100644
index 000..72ca23c7017
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access36.C
@@ -0,0 +1,25 @@
+// PR c++/16617
+
+class B
+{
+  protected:
+  int i;
+};
+
+template  void fr ();
+
+class D2 : public B
+{
+  friend void fr ();
+};
+
+template struct X
+{};
+
+template  void fr ()
+{
+  X<::i> x1;  // { dg-error "protected" }
+  X<::i> x2; // { dg-error "protected" }
+}
+
+template void fr();
-- 
2.31.1.362.g311531c9de

[committed] c++: Add testcase for already fixed PR [PR84689]

2021-04-22 Thread Patrick Palka via Gcc-patches

We correctly accept this testcase since r11-1638.

gcc/testsuite/ChangeLog:

PR c++/84689
* g++.dg/cpp0x/sfinae67.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/sfinae67.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/sfinae67.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/sfinae67.C 
b/gcc/testsuite/g++.dg/cpp0x/sfinae67.C
new file mode 100644
index 000..cfed92ad472
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/sfinae67.C
@@ -0,0 +1,20 @@
+// PR c++/84689
+// { dg-do compile { target c++11 } }
+
+struct base {
+  void operator()();
+};
+
+struct a : base { };
+struct b : base { };
+
+struct f : a, b {
+  using a::operator();
+  using b::operator();
+};
+
+template  auto g(int) -> decltype(T()());
+template  auto g(...) -> int;
+
+using type = decltype(g(0));
+using type = int;
-- 
2.31.1.362.g311531c9de

[Patch, fortran] PR fortran/82376 - Duplicate function call using -fcheck=pointer

2021-04-22 Thread José Rui Faustino de Sousa via Gcc-patches


Hi All!

Proposed patch to:

PR82376 - Duplicate function call using -fcheck=pointer

Patch tested only on x86_64-pc-linux-gnu.

Evaluate function result and then pass a pointer, instead of a reference 
to the function itself, thus avoiding multiple evaluations of the function.


Thank you very much.

Best regards,
José Rui

Fortran: Fix double function call with -fcheck=pointer [PR]

gcc/fortran/ChangeLog:

PR fortran/82376
* trans-expr.c (gfc_conv_procedure_call): Evaluate function result
and then pass a pointer.

gcc/testsuite/ChangeLog:

PR fortran/82376
* gfortran.dg/PR82376.f90: New test.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 213f32b0a67..b83b021755d 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6014,11 +6014,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			|| (!e->value.function.esym
 && e->symtree->n.sym->attr.pointer))
 			&& fsym && fsym->attr.target)
-		{
-		  gfc_conv_expr (, e);
-		  parmse.expr = gfc_build_addr_expr (NULL_TREE, parmse.expr);
-		}
-
+		/* Make sure the function only gets called once.  */
+		gfc_conv_expr_reference (, e, false);
 	  else if (e->expr_type == EXPR_FUNCTION
 		   && e->symtree->n.sym->result
 		   && e->symtree->n.sym->result != e->symtree->n.sym
diff --git a/gcc/testsuite/gfortran.dg/PR82376.f90 b/gcc/testsuite/gfortran.dg/PR82376.f90
new file mode 100644
index 000..cea1c2ae211
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR82376.f90
@@ -0,0 +1,55 @@
+! { dg-do run }
+!
+! Test the fix for PR82376
+!
+
+program main_p
+
+  integer, parameter :: n = 10
+
+  type :: foo_t
+integer, pointer :: v =>null()
+  end type foo_t
+
+  integer, save :: pcnt = 0
+  
+  type(foo_t) :: int
+  integer :: i
+
+  do i = 1, n
+call init(int, i)
+if(.not.associated(int%v)) stop 1
+if(int%v/=i) stop 2
+if(pcnt/=i) stop 3
+  end do
+
+contains
+
+  function new(data) result(this)
+integer, target, intent(in) :: data
+
+integer, pointer :: this
+
+nullify(this)
+this => data
+pcnt = pcnt + 1
+return
+  end function new
+
+  subroutine init(this, data)
+type(foo_t), intent(out) :: this
+integer, intent(in)  :: data
+
+call set(this, new(data))
+return
+  end subroutine init
+
+  subroutine set(this, that)
+type(foo_t), intent(inout) :: this
+integer, target, intent(in):: that
+
+this%v => that
+return
+  end subroutine set
+  
+end program main_p

testsuite/substr_{9,10}.f90: Move to gfortran.dg/

2021-04-22 Thread Tobias Burnus


The test was added in r11-6687-gbdd1b1f55529da00b867ef05a53a08fbfc3d1c2e 
(PR93340)
but placed into the wrong directly. Fixed by moving to gfortran.dg/
in commit r12-68-gac456fd981db6b0c2f7ee1ab0d17d36087a74dc2 after
confirming that the tests work.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
commit ac456fd981db6b0c2f7ee1ab0d17d36087a74dc2
Author: Tobias Burnus 
Date:   Thu Apr 22 19:14:58 2021 +0200

testsuite/substr_{9,10}.f90: Move to gfortran.dg/

gcc/testsuite/
* substr_9.f90: Move to ...
* gfortran.dg/substr_9.f90: ... here.
* substr_10.f90: Move to ...
* gfortran.dg/substr_10.f90: ... here.
---
 gcc/testsuite/{ => gfortran.dg}/substr_10.f90 | 0
 gcc/testsuite/{ => gfortran.dg}/substr_9.f90  | 0
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/testsuite/substr_10.f90 b/gcc/testsuite/gfortran.dg/substr_10.f90
similarity index 100%
rename from gcc/testsuite/substr_10.f90
rename to gcc/testsuite/gfortran.dg/substr_10.f90
diff --git a/gcc/testsuite/substr_9.f90 b/gcc/testsuite/gfortran.dg/substr_9.f90
similarity index 100%
rename from gcc/testsuite/substr_9.f90
rename to gcc/testsuite/gfortran.dg/substr_9.f90

Re: [PATCH] gcov: Use system IO buffering

2021-04-22 Thread Andi Kleen via Gcc-patches

Martin Liška  writes:

> Hey.
>
> I/O buffering in gcov seems duplicite to what modern C library can provide.
> The patch is a simplification and can provide easier interface for system
> that don't have a filesystem and would like using GCOV.
>
> I'm going to install the patch after 11.1 if there are no objections.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

What happens if someone compiles the C library with gcov?

Being as self contained as possible (only system calls) would seem
safer.

-Andi

[PATCH] libstdc++: Fix semaphore to work with system_clock timeouts

2021-04-22 Thread Jonathan Wakely via Gcc-patches

The __cond_wait_until_impl function takes a steady_clock timeout, but
then sometimes tries to compare it to a time from the system_clock,
which won't compile.  Additionally, that function gets called with
system_clock timeouts, which also won't compile. This makes the function
accept timeouts for either clock, and compare to the time from the right
clock.

This fixes the compilation error that was causing two tests to fail on
non-futex targets, so we can revert the r12-11 change to disable them.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__cond_wait_until_impl):
Handle system_clock as well as steady_clock.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Re-enable.
* testsuite/30_threads/semaphore/try_acquire_until.cc:
Re-enable.

I'm testing this now on x86_64-linux, powerpc64le-linux, sparc-linux,
power-aix and sparc-solaris. It looks good so far, so I'll push to
trunk when the tests finish.

This should also go to the gcc-11 branch, or the timed waits for
semaphores can't be used with system_clock times on non-futed targets.



commit 6924588774a02dec63fb4b0ad19aed303accc2d2
Author: Jonathan Wakely 
Date:   Thu Apr 22 17:26:50 2021

libstdc++: Fix semaphore to work with system_clock timeouts

The __cond_wait_until_impl function takes a steady_clock timeout, but
then sometimes tries to compare it to a time from the system_clock,
which won't compile.  Additionally, that function gets called with
system_clock timeouts, which also won't compile. This makes the function
accept timeouts for either clock, and compare to the time from the right
clock.

This fixes the compilation error that was causing two tests to fail on
non-futex targets, so we can revert the r12-11 change to disable them.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__cond_wait_until_impl):
Handle system_clock as well as steady_clock.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Re-enable.
* testsuite/30_threads/semaphore/try_acquire_until.cc:
Re-enable.

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 70e5335cfd7..ec7ff51cdbc 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -139,12 +139,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // (e.g. __ulock_wait())which is better than pthread_cond_clockwait
 #endif // ! PLATFORM_TIMED_WAIT
 
-// returns true if wait ended before timeout
-template
+// Returns true if wait ended before timeout.
+// _Clock must be either steady_clock or system_clock.
+template
   bool
   __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
- const chrono::time_point& __atime)
+const chrono::time_point<_Clock, _Dur>& __atime)
   {
+   static_assert(std::__is_one_of<_Clock, chrono::steady_clock,
+  chrono::system_clock>::value);
+
auto __s = chrono::time_point_cast(__atime);
auto __ns = chrono::duration_cast(__atime - __s);
 
@@ -155,12 +159,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  };
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-   __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
-   return chrono::steady_clock::now() < __atime;
-#else
-   __cv.wait_until(__mx, __ts);
-   return chrono::system_clock::now() < __atime;
-#endif // ! _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
+   if constexpr (is_same_v)
+ __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
+   else
+#endif
+ __cv.wait_until(__mx, __ts);
+   return _Clock::now() < __atime;
   }
 
 // returns true if wait ended before timeout
diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc 
b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc
index 248ecb07e56..e7edc9eeef1 100644
--- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc
+++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc
@@ -21,8 +21,6 @@
 // { dg-require-gthreads "" }
 // { dg-add-options libatomic }
 
-// { dg-skip-if "FIXME: fails" { ! futex } }
-
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc 
b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc
index eb1351cd2bf..49ba33b4999 100644
--- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc
+++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc
@@ -21,8 +21,6 @@
 // { dg-additional-options "-pthread" { target pthread } }
 // { dg-add-options libatomic }
 
-// { dg-skip-if "FIXME: fails" { ! futex } }
-
 #include 
 #include 
 #include

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Jonathan Wakely via Gcc-patches

On Thu, 22 Apr 2021 at 15:59, Martin Sebor  wrote:
>
> On 4/22/21 2:52 AM, Jonathan Wakely wrote:
> > On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote:
> >
> > On 4/21/21 6:11 PM, Martin Sebor wrote:
> >  > On 4/21/21 2:15 AM, Martin Liška wrote:
> >  >> Hello.
> >  >>
> >  >> It's addressing the following Clang warning:
> >  >> cp/lex.c:170:45: warning: result of comparison of constant 64
> > with expression of type 'enum ovl_op_code' is always true
> > [-Wtautological-constant-out-of-range-compare]
> >  >>
> >  >> Patch can bootstrap on x86_64-linux-gnu and survives regression
> > tests.
> >  >>
> >  >> Ready to be installed?
> >  >> Thanks,
> >  >> Martin
> >  >>
> >  >> gcc/cp/ChangeLog:
> >  >>
> >  >> * cp-tree.h (STATIC_ASSERT): Prefer static assert.
> >  >> * lex.c (init_operators): Remove run-time check.
> >  >> ---
> >  >>   gcc/cp/cp-tree.h | 3 +++
> >  >>   gcc/cp/lex.c | 2 --
> >  >>   2 files changed, 3 insertions(+), 2 deletions(-)
> >  >>
> >  >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> >  >> index 81ff375f8a5..a8f72448ea9 100644
> >  >> --- a/gcc/cp/cp-tree.h
> >  >> +++ b/gcc/cp/cp-tree.h
> >  >> @@ -5916,6 +5916,9 @@ enum ovl_op_code {
> >  >> OVL_OP_MAX
> >  >>   };
> >  >>   +/* Make sure it fits in lang_decl_fn::operator_code. */
> >  >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
> >  >> +
> >  >
> >  > I wonder if there's a way to test this directly by something like
> >  >
> >  >   static_assert (number-of-bits (ovl_op_info_t::ovl_op_code)
> >  >  <= number-of-bits (lang_decl_fn::operator_code));
> >
> > Good point, but I'm not aware of it. Maybe C++ people can chime in?
> >
> >
> > ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class")
> > with no fixed underlying type (i.e. no enum-base like ": int" or ":
> > long" is specified) which means that the number of bits in is value
> > representation is the number of bits needed to represent the minimum and
> > maximum enumerators:
> >
> > "the values of the enumeration are the values representable by a
> > hypothetical integer type with minimal width M such that all enumerators
> > can be represented."
> >
> > There is no function/utility like number-of-bits that can tell you that
> > from the type though.You could use
> > std::underlying_type::type to get the integral type that
> > the compiler used to represent it, but that will probably be 'int' in
> > this case and so all it tells you is an upper bound of no more than 32
> > bits, which is not useful for this purpose.
>
> I suspected there wasn't a function like that.  Thanks for confirming
> it.  I wrote the one below just to see if it could be done.  It works
> for one bit-field but I can't think of a way to generalize it.  We'd
> probably need a built-in for that.  Perhaps one might be useful.
>
> enum E { e = 5 };
> struct A { E e: 3; };
>
> constexpr int number_of_bits ()
> {
>A a = { };
>a.e = (E)-1;
>int n = 0;
>for (; a.e; ++n)
>  a.e = (E)((unsigned)a.e ^ (1 << n));
>return n;
> }
>
> Martin

Or:

enum E { e = 5 };
struct A { E e: 3; };

constexpr int number_of_bits ()
{
   A a = { };
   a.e = (E)-1;
   return 32 - __builtin_clz(a.e);
}


But you can't get the number-of-bits needed for all the values of the
enum E, which is what I was referring to.

If you know the enumerators go from 0 to MAX (as is the case for
ovl_op_code) you can use (32 - __builtin_clz(MAX)) there too, but in
the general case you don't always know the maximum enumerator without
checking, and it depends whether the enumeration has a fixed
underlying type.

Re: [PATCH] Fix various typos.

2021-04-22 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 22, 2021 at 09:26:47AM -0600, Jeff Law via Gcc-patches wrote:
> > * g++.dg/template/nontype29.C: Fix typos and missing comments.
> > * gcc.dg/Warray-bounds-64.c: Likewise.
> > * gcc.dg/Warray-parameter.c: Likewise.
> > * gcc.dg/Wstring-compare.c: Likewise.
> > * gcc.dg/format/gcc_diag-11.c: Likewise.
> > * gfortran.dg/array_constructor_3.f90: Likewise.
> > * gfortran.dg/matmul_bounds_9.f90: Likewise.
> > * gfortran.dg/pr78033.f90: Likewise.
> > * gfortran.dg/pr96325.f90: Likewise.
> 
> OK and I think we should consider these kinds of typo fixes obvious and thus
> not needing an explicit ack.

Comment typos sure.  The typos in dg-bogus that Richi mentioned is something
that actually was a good idea to review, while in that case it was correct,
the corresponding commit that added those typos into dg-bogus didn't change
any diagnostics from expreession to expression, generally it could be
a test that verifies the diagnostics doesn't contain the typo.

Jakub

[PATCH] c++: Hard error with tentative parse and CTAD [PR87709]

2021-04-22 Thread Patrick Palka via Gcc-patches

As described in detail in comment #4 of this PR, when tentatively
parsing a construct that can either be a type or an expression, if
during the type parse we encounter an unexpected template placeholder,
we need to simulate an error rather than issue a real error because the
subsequent expression parse can still succeed.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/87709
* parser.c (cp_parser_type_id_1): If we see a template
placeholder, first try simulating an error before issuing a real
error.

gcc/testsuite/ChangeLog:

PR c++/87709
* g++.dg/cpp1z/class-deduction86.C: New test.
---
 gcc/cp/parser.c| 11 +++
 gcc/testsuite/g++.dg/cpp1z/class-deduction86.C | 16 
 2 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction86.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba516efa23..e1b1617da68 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -23270,10 +23270,13 @@ cp_parser_type_id_1 (cp_parser *parser, 
cp_parser_flags flags,
location_t loc = type_specifier_seq.locations[ds_type_spec];
if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
  {
-   error_at (loc, "missing template arguments after %qT",
- auto_node);
-   inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
-   tmpl);
+   if (!cp_parser_simulate_error (parser))
+ {
+   error_at (loc, "missing template arguments after %qT",
+ auto_node);
+   inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here",
+   tmpl);
+ }
  }
else if (parser->in_template_argument_list_p)
  error_at (loc, "%qT not permitted in template argument",
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C
new file mode 100644
index 000..a198ed24ec6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C
@@ -0,0 +1,16 @@
+// PR c++/87709
+// { dg-do compile { target c++17 } }
+
+template 
+struct lit {
+  lit(T) { }
+};
+
+template 
+int operator+(lit, lit) {
+  return 0;
+}
+
+auto r2 = (lit(0)) + lit(0);
+
+static_assert(sizeof(lit(0)));
-- 
2.31.1.362.g311531c9de

Re: [PATCH] Fix various typos.

2021-04-22 Thread Jeff Law via Gcc-patches




On 4/22/2021 6:23 AM, Martin Liška wrote:

Hello.

The patch fixes various typos.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR testsuite/100159
PR testsuite/100192

gcc/ChangeLog:

* builtins.c (expand_builtin): Fix typos and missing comments.
* dwarf2out.c (gen_subprogram_die): Likewise.
(gen_struct_or_union_type_die): Likewise.

gcc/fortran/ChangeLog:

* frontend-passes.c (optimize_expr): Fix typos and missing comments.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype29.C: Fix typos and missing comments.
* gcc.dg/Warray-bounds-64.c: Likewise.
* gcc.dg/Warray-parameter.c: Likewise.
* gcc.dg/Wstring-compare.c: Likewise.
* gcc.dg/format/gcc_diag-11.c: Likewise.
* gfortran.dg/array_constructor_3.f90: Likewise.
* gfortran.dg/matmul_bounds_9.f90: Likewise.
* gfortran.dg/pr78033.f90: Likewise.
* gfortran.dg/pr96325.f90: Likewise.


OK and I think we should consider these kinds of typo fixes obvious and 
thus not needing an explicit ack.


jeff

[committed] libstdc++: Add options for libatomic to test

2021-04-22 Thread Jonathan Wakely via Gcc-patches

This fixes a linker error on AIX:

FAIL: 30_threads/semaphore/try_acquire_posix.cc (test for excess errors)
Excess errors:
ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_add_8
ld: 0711-317 ERROR: Undefined symbol: .__atomic_load_8
ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_sub_8
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status

libstdc++-v3/ChangeLog:

* testsuite/30_threads/semaphore/try_acquire_posix.cc: Add
options for libatomic.

Tested powerpc64le-linux and powerpc-aix. Committed to trunk.

I'd like to backport this to gcc-11 too.

commit 58871c03318e080962f022f5d77db3c4fde3e351
Author: Jonathan Wakely 
Date:   Thu Apr 22 15:51:08 2021

libstdc++: Add options for libatomic to test

This fixes a linker error on AIX:

FAIL: 30_threads/semaphore/try_acquire_posix.cc (test for excess errors)
Excess errors:
ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_add_8
ld: 0711-317 ERROR: Undefined symbol: .__atomic_load_8
ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_sub_8
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more 
information.
collect2: error: ld returned 8 exit status

libstdc++-v3/ChangeLog:

* testsuite/30_threads/semaphore/try_acquire_posix.cc: Add
options for libatomic.

diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc 
b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc
index 97e386f7b76..2d9ae40db72 100644
--- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc
+++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc
@@ -19,6 +19,7 @@
 // { dg-do run { target c++2a } }
 // { dg-require-effective-target pthread }
 // { dg-require-gthreads "" }
+// { dg-add-options libatomic }
 
 #include 
 #ifdef _GLIBCXX_HAVE_POSIX_SEMAPHORE

[committed] libstdc++: Reject std::make_shared [PR 99006]

2021-04-22 Thread Jonathan Wakely via Gcc-patches

Prior to C++20 it should be ill-formed to use std::make_shared with an
array type (and we don't support the C++20 feature to make it valid yet
anyway).

libstdc++-v3/ChangeLog:

PR libstdc++/99006
* include/bits/shared_ptr.h (allocate_shared): Assert that _Tp
is not an array type.
* include/bits/shared_ptr_base.h (__allocate_shared): Likewise.
* testsuite/20_util/shared_ptr/creation/99006.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 55650236cd97d81f42f9fdb4f6bcb12babafe51f
Author: Jonathan Wakely 
Date:   Thu Apr 22 15:46:51 2021

libstdc++: Reject std::make_shared [PR 99006]

Prior to C++20 it should be ill-formed to use std::make_shared with an
array type (and we don't support the C++20 feature to make it valid yet
anyway).

libstdc++-v3/ChangeLog:

PR libstdc++/99006
* include/bits/shared_ptr.h (allocate_shared): Assert that _Tp
is not an array type.
* include/bits/shared_ptr_base.h (__allocate_shared): Likewise.
* testsuite/20_util/shared_ptr/creation/99006.cc: New test.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h 
b/libstdc++-v3/include/bits/shared_ptr.h
index ac4aa20b794..d5386ad535f 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -857,6 +857,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline shared_ptr<_Tp>
 allocate_shared(const _Alloc& __a, _Args&&... __args)
 {
+  static_assert(!is_array<_Tp>::value, "make_shared not supported");
+
   return shared_ptr<_Tp>(_Sp_alloc_shared_tag<_Alloc>{__a},
 std::forward<_Args>(__args)...);
 }
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index 8766b61fd81..71099afbf7a 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1831,6 +1831,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline __shared_ptr<_Tp, _Lp>
 __allocate_shared(const _Alloc& __a, _Args&&... __args)
 {
+  static_assert(!is_array<_Tp>::value, "make_shared not supported");
+
   return __shared_ptr<_Tp, _Lp>(_Sp_alloc_shared_tag<_Alloc>{__a},
std::forward<_Args>(__args)...);
 }
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc 
b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc
new file mode 100644
index 000..d5f7a5da5e9
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc
@@ -0,0 +1,9 @@
+// FIXME: This should use { target { ! c++20 } }
+// { dg-do compile }
+
+#include 
+
+auto p = std::make_shared(2); // { dg-error "here" }
+auto q = std::make_shared(1, 2); // { dg-error "here" }
+
+// { dg-prune-output "static assertion failed" }

[committed] libstdc++: Fix typo in comment

2021-04-22 Thread Jonathan Wakely via Gcc-patches

libstdc++-v3/ChangeLog:

* config/os/gnu-linux/os_defines.h: Fix type in comment.

Tested powerpc64le-linux. Committed to trunk.

commit 19aa9bc9897955817622574e62b53b24ae0837e9
Author: Jonathan Wakely 
Date:   Thu Apr 22 15:48:29 2021

libstdc++: Fix typo in comment

libstdc++-v3/ChangeLog:

* config/os/gnu-linux/os_defines.h: Fix type in comment.

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h 
b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index 64f78df8eef..d5bb2a1886e 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -33,7 +33,7 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
-// This keeps isanum, et al from being propagated as macros.
+// This keeps isalnum, et al from being propagated as macros.
 #define __NO_CTYPE 1
 
 #include

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Martin Sebor via Gcc-patches

On 4/22/21 2:52 AM, Jonathan Wakely wrote:

On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote:

On 4/21/21 6:11 PM, Martin Sebor wrote:
 > On 4/21/21 2:15 AM, Martin Liška wrote:
 >> Hello.
 >>
 >> It's addressing the following Clang warning:
 >> cp/lex.c:170:45: warning: result of comparison of constant 64
with expression of type 'enum ovl_op_code' is always true
[-Wtautological-constant-out-of-range-compare]
 >>
 >> Patch can bootstrap on x86_64-linux-gnu and survives regression
tests.
 >>
 >> Ready to be installed?
 >> Thanks,
 >> Martin
 >>
 >> gcc/cp/ChangeLog:
 >>
 >> * cp-tree.h (STATIC_ASSERT): Prefer static assert.
 >> * lex.c (init_operators): Remove run-time check.
 >> ---
 >>   gcc/cp/cp-tree.h | 3 +++
 >>   gcc/cp/lex.c | 2 --
 >>   2 files changed, 3 insertions(+), 2 deletions(-)
 >>
 >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
 >> index 81ff375f8a5..a8f72448ea9 100644
 >> --- a/gcc/cp/cp-tree.h
 >> +++ b/gcc/cp/cp-tree.h
 >> @@ -5916,6 +5916,9 @@ enum ovl_op_code {
 >>     OVL_OP_MAX
 >>   };
 >>   +/* Make sure it fits in lang_decl_fn::operator_code. */
 >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
 >> +
 >
 > I wonder if there's a way to test this directly by something like
 >
 >   static_assert (number-of-bits (ovl_op_info_t::ovl_op_code)
 >  <= number-of-bits (lang_decl_fn::operator_code));

Good point, but I'm not aware of it. Maybe C++ people can chime in?

ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class") 
with no fixed underlying type (i.e. no enum-base like ": int" or ": 
long" is specified) which means that the number of bits in is value 
representation is the number of bits needed to represent the minimum and 
maximum enumerators:

"the values of the enumeration are the values representable by a 
hypothetical integer type with minimal width M such that all enumerators 
can be represented."

There is no function/utility like number-of-bits that can tell you that 
from the type though.You could use 
std::underlying_type::type to get the integral type that 
the compiler used to represent it, but that will probably be 'int' in 
this case and so all it tells you is an upper bound of no more than 32 
bits, which is not useful for this purpose.

I suspected there wasn't a function like that.  Thanks for confirming
it.  I wrote the one below just to see if it could be done.  It works
for one bit-field but I can't think of a way to generalize it.  We'd
probably need a built-in for that.  Perhaps one might be useful.

enum E { e = 5 };
struct A { E e: 3; };

constexpr int number_of_bits ()
{
  A a = { };
  a.e = (E)-1;
  int n = 0;
  for (; a.e; ++n)
a.e = (E)((unsigned)a.e ^ (1 << n));
  return n;
}

Martin

[PATCH] c++: Refine enum direct-list-initialization [CWG2374]

2021-04-22 Thread Patrick Palka via Gcc-patches

This implements the wording changes of CWG2374, which clarifies the
wording of P0138 to forbid e.g. direct-list-initialization of a scoped
enumeration from a different scoped enumeration.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

DR 2374
* decl.c (is_direct_enum_init): Check the implicit
convertibility requirement added by CWG 2374.

gcc/testsuite/ChangeLog:

DR 2374
* g++.dg/cpp1z/direct-enum-init2.C: New test.
---
 gcc/cp/decl.c  | 8 +++-
 gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C | 8 
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b81de8ef934..60dc2bf182d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6191,7 +6191,13 @@ is_direct_enum_init (tree type, tree init)
   && ENUM_FIXED_UNDERLYING_TYPE_P (type)
   && TREE_CODE (init) == CONSTRUCTOR
   && CONSTRUCTOR_IS_DIRECT_INIT (init)
-  && CONSTRUCTOR_NELTS (init) == 1)
+  && CONSTRUCTOR_NELTS (init) == 1
+  /* DR 2374: The single element needs to be implicitly
+convertible to the underlying type of the enum.  */
+  && can_convert_arg (ENUM_UNDERLYING_TYPE (type),
+ TREE_TYPE (CONSTRUCTOR_ELT (init, 0)->value),
+ CONSTRUCTOR_ELT (init, 0)->value,
+ LOOKUP_IMPLICIT, tf_none))
 return true;
   return false;
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C 
b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C
new file mode 100644
index 000..5b94a8d00fe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C
@@ -0,0 +1,8 @@
+// DR 2374
+// { dg-do compile { target c++17 } }
+
+enum class Orange;
+enum class Apple;
+
+extern Orange o;
+Apple a{o}; // { dg-error "cannot convert" }
-- 
2.31.1.362.g311531c9de

Re: [PATCH] Remove __cplusplus >= 201103

2021-04-22 Thread Martin Liška

On 4/22/21 3:59 PM, Richard Biener wrote:
> For system.h please keep it working for C code (not sure if we use
> a C compiler anywhere).

All right. I've just did --enable-languages=all and all is fine.
So there's likely no C usage of it.

Martin

Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check

2021-04-22 Thread Thomas Rodgers


On 2021-04-22 02:23, Jonathan Wakely wrote:


On 21/04/21 18:29 -0700, Thomas Rodgers wrote:


From: Thomas Rodgers 

NOTE - This patch also needs to be backported to gcc-11 in order for
semaphore release() to work correctly on non-futex platforms.

Tested sparc-sun-solaris2.11

For types that track whether or not there extant waiters (e.g.
semaphore) internally, the __atomic_notify_address_bare() call was
introduced to avoid the overhead of loading the atomic count of
waiters. For platforms that don't have Futex, however, there was
still a check for waiters, and seeing that there are none (because
in the bare case, the count is not incremented), the notification
is dropped. This commit addresses that case.

libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h: Always notify waiters in the
in the case of 'bare' address notification.


Repeated text: "in the in the"


---
libstdc++-v3/include/bits/atomic_wait.h | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h

index 0ac5575190c..984ed70f16c 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}

void
-  _M_notify(const __platform_wait_t* __addr, bool __all) noexcept
+  _M_notify(const __platform_wait_t* __addr, bool __all, bool 
__bare) noexcept

{
-if (!_M_waiting())
+if (!(__bare || _M_waiting()))


Maybe it's just me, but I would find (!__base && !__waiting) to be a
clearer expression of the logic here.

i.e. "return if the wait is not bare and there are no waiters"
rather than "return if the wait is not either bare or has waiters".
The latter makes me take a second to grok.



As discussed on IRC, I went back and forth on this a couple of times and 
neither option seemed particularly great to me. I started down the path 
of propagating the std::true_type/std::false_type bits that the RAII 
wrapper type knows about and making this a compile time choice, but felt 
the change was too big to make this late in the release cycle. I will 
revisit this for stage1 (though ideally all of this will be moved into 
the .so for GCC12 and partially rewritten as part of that process 
anyway).



The patch is OK either way though, with the ChangeLog typo fix.


return;

#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
@@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}

void
-_M_notify(bool __all)
+_M_notify(bool __all, bool __bare = false)
{
if (_M_addr == &_M_w._M_ver)
__atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
-  _M_w._M_notify(_M_addr, __all);
+  _M_w._M_notify(_M_addr, __all, __bare);
}

template// This call is to be used by atomic types which track contention 
externally

@@ -464,7 +464,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__detail::__platform_notify(__addr, __all);
#else
__detail::__bare_wait __w(__addr);
-__w._M_notify(__all);
+__w._M_notify(__all, true);
#endif
}
_GLIBCXX_END_NAMESPACE_VERSION
-- 2.30.2


Tested sparc-sun-solaris2.11.
Committed to master, backported to release/gcc-11.

Re: [PATCH] Fix various typos.

2021-04-22 Thread Martin Liška

On 4/22/21 4:00 PM, Richard Biener wrote:
> For the dg-bogus did you check the emitted diagnostic not tested for
> doesn't have the same misspelling?

I did. I'm going to push it.

Martin

i386: Fix unsigned int -> double conversion on i386 w/ -mfpmath=sse [PR100119]

2021-04-22 Thread Uros Bizjak via Gcc-patches

2021-04-22  Uroš Bizjak  

gcc/
PR target/100119
* config/i386/i386-expand.c (ix86_expand_convert_uns_sidf_sse):
Remove the sign with FE_DOWNWARD, where x - x = -0.0.

gcc/testsuite/

PR target/100119
* gcc.target/i386/pr100119.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 166c23ddb4d..516440eb5c1 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -1550,6 +1550,8 @@ ix86_expand_convert_uns_sixf_sse (rtx, rtx)
   gcc_unreachable ();
 }
 
+static rtx ix86_expand_sse_fabs (rtx op0, rtx *smask);
+
 /* Convert an unsigned SImode value into a DFmode.  Only currently used
for SSE, but applicable anywhere.  */
 
@@ -1569,6 +1571,11 @@ ix86_expand_convert_uns_sidf_sse (rtx target, rtx input)
   x = const_double_from_real_value (TWO31r, DFmode);
 
   x = expand_simple_binop (DFmode, PLUS, fp, x, target, 0, OPTAB_DIRECT);
+
+  /* Remove the sign with FE_DOWNWARD, where x - x = -0.0.  */
+  if (HONOR_SIGNED_ZEROS (DFmode) && flag_rounding_math)
+x = ix86_expand_sse_fabs (x, NULL);
+
   if (x != target)
 emit_move_insn (target, x);
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr100119.c 
b/gcc/testsuite/gcc.target/i386/pr100119.c
new file mode 100644
index 000..be9472eb939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100119.c
@@ -0,0 +1,28 @@
+/* PR target/100119 */
+/* { dg-do run { target sse2_runtime } } */
+/* { dg-require-effective-target fenv } */
+/* { dg-options "-O2 -frounding-math -msse2 -mno-avx512f -mfpmath=sse" } */
+
+#include 
+
+double
+__attribute__((noinline))
+test (unsigned int x)
+{
+  return x;
+}
+
+int
+main ()
+{
+  double result;
+
+  fesetround (FE_DOWNWARD);
+
+  result = test (0);
+
+  if (__builtin_signbit (result) != 0)
+__builtin_abort ();
+
+  return 0;
+}

Re: [PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]

2021-04-22 Thread H.J. Lu via Gcc-patches

Wrong PR # in subject.

On Thu, Apr 22, 2021 at 7:11 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> gcc on ia32 miscompiles various atomics involving floating point,
> unfortunately I'm afraid it is too late to fix that for 11.1 and
> as I'm quite lost on it, it might take a while for 12 too
> (disabling all the 8 peephole2s would be easiest, but then we'd
> run into optimization regressions).
>
> While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the
> make check even after the timeout fires.  The following patch therefore
> xfails the former and skips the latter.
>
> Tested on x86_64-linux where
> make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc'
> is still
> === libstdc++ Summary ===
>
> # of expected passes8
> and on i686-linux, where it is now
> === libstdc++ Summary ===
>
> # of expected passes5
> # of expected failures  1
> # of unsupported tests  1
>
> Ok for trunk/11.1 ?
>
> 2021-04-22  Jakub Jelinek  
>
> PR target/100182
> * testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for
> ia32.
> * testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for
> ia32.
>
> --- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj  2021-01-05 
> 00:13:58.552294301 +0100
> +++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 
> 14:29:19.778374653 +0200
> @@ -18,6 +18,7 @@
>  // { dg-add-options ieee }
>  // { dg-options "-std=gnu++2a" }
>  // { dg-do run { target c++2a } }
> +// { dg-xfail-run-if "PR100182" { ia32 } }
>
>  #include 
>  #include 
> --- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj
> 2021-04-20 23:46:09.214189796 +0200
> +++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc   
> 2021-04-22 14:28:19.793044944 +0200
> @@ -2,6 +2,7 @@
>  // { dg-do run { target c++2a } }
>  // { dg-require-gthreads "" }
>  // { dg-additional-options "-pthread" { target pthread } }
> +// { dg-skip-if "PR100182" { ia32 } }
>  // { dg-add-options libatomic }
>
>  // Copyright (C) 2020-2021 Free Software Foundation, Inc.
>
> Jakub
>


-- 
H.J.

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Martin Liška

On 4/22/21 2:47 PM, Richard Biener wrote:
> On Thu, Apr 22, 2021 at 2:21 PM Martin Liška  wrote:
>>
>> On 4/22/21 1:19 PM, Richard Biener wrote:
>>> On Thu, Apr 22, 2021 at 11:02 AM Martin Liška  wrote:

 On 4/22/21 10:04 AM, Richard Biener wrote:
> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:
>>
>> When -flto=jobserver is used and we cannot detect job server, then we can
>> still fallbackto -flto=N mode.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>
> I think this behavior needs to be documented - it falls back to a less
> conservative (possibly system overloading) mode - which IMHO is
> non-obvious and IMHO we shouldn't do.

 Sure, I'm sending corresponding patch. Note that it's quite common mistake
 that '+' is missing in Makefile rule. That was motivation for my change.
>>>
>>> Sure, but that change won't get this fixed.
>>
>> It will as linker command line will contain (-flto=jobserver) and LTO will 
>> fallback to -flto=N.
>>
>>> IMHO we should eventually
>>> emit diagnostic like
>>>
>>> warning: could not find jobserver, compiling N jobs serially
>>>
>>> once N > 1 (or 2?).
>>
>> We do that now (for all N):
>> lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment 
>> variable is unset
>>
>>
>>> Likewise if people just use -flto and auto-detection
>>> finds nothing:
>>
>> -flto != -flto=auto
>>
>> Yes, -flto is a serial linking and we can emit a warning.
> 
> I'd avoid warning if there's just a single ltrans unit.

That's doable and I've just done that in the attached patch.
Two disadvantages:
- one needs waiting for the warning after WPA
- source change (>1 LTRANS) can trigger the warning

> 
>>> warning: using serial compilation of N LTRANS jobs
>>> note: refer to http:// for how to use parallel compile
>>>
>>> using the URL diagnostics to point to -flto=... documentation.
>>
>> What about making that a proper warning (-Wlto)? We have diagnostics 
>> infrastructure
>> that prints URL links.
> 
> Note that drivers like lto-wrapper do not have fully initialized diagnostic
> machinery and use a "different" set of overloads (likewise gen* programs).

I managed printing the warning:

lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment 
variable is unset

lto-wrapper: note: see the ‘-flto’ option documentation for more information


and

lto-wrapper: warning: using serial compilation of 128 LTRANS jobs

lto-wrapper: note: see the ‘-flto’ option documentation for more information


> 
>>>
>>> That is, teach users rather than second-guessing and eventually
>>> blowing things up.  IMHO only the jobserver mode is safe to
>>> automatically use.
>>
>> Well, -flto=auto is also fine and document. I think there is no possibility
>> auto CPU deduction can fail. So -flto=jobserver (with missing make job 
>> server)
>> and -flto (equal to -flto=1) worth emitting a warning.
>>
>> What do you think?
> 
> Yes, that sounds reasonable.  I suspect that people might want to see
> -flto default to -flto=auto but then I don't think that's good because there's
> no system wide job scheduler limiting things (systemd-jobserver anyone?)

Done that.

Thoughts?
Martin

> 
> Richard.
> 
>> Martin
>>
>>>
>>> Richard.
>>>
 Martin

>
> Richard.
>
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
>> makeserver cannot be detected, then use -flto=N fallback.
>> ---
>>  gcc/lto-wrapper.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
>> index 03a5922f8ea..0b626d7c811 100644
>> --- a/gcc/lto-wrapper.c
>> +++ b/gcc/lto-wrapper.c
>> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
>>if (jobserver && jobserver_error != NULL)
>> {
>>   warning (0, jobserver_error);
>> - parallel = 0;
>> + /* Fall back to auto parallelism.  */
>>   jobserver = 0;
>> + auto_parallel = 1;
>> }
>>else if (!jobserver && jobserver_error == NULL)
>> {
>> --
>> 2.31.1
>>

>>

>From 7cd89e17c49c027027e2aed1722c0758331ac7ac Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 22 Apr 2021 16:27:19 +0200
Subject: [PATCH] Print warning diagnostics for -flto issues.

gcc/ChangeLog:

	* lto-wrapper.c (print_lto_docs_link): New function.
	(run_gcc): Print warning about missing job server detection
	after we know NR of partitions. Do the same for -flto{,=1}.
	* opts.c (get_option_html_page): Support -flto option.
---
 gcc/lto-wrapper.c | 39 +--
 gcc/opts.c|  6 +-
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index

Re: [PATCH] Fix various typos.

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 3:50 PM Martin Liška  wrote:
>
> Hello.
>
> The patch fixes various typos.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

For the dg-bogus did you check the emitted diagnostic not tested for
doesn't have the same misspelling?

Otherwise OK.

Richard.

> Thanks,
> Martin
>
> PR testsuite/100159
> PR testsuite/100192
>
> gcc/ChangeLog:
>
> * builtins.c (expand_builtin): Fix typos and missing comments.
> * dwarf2out.c (gen_subprogram_die): Likewise.
> (gen_struct_or_union_type_die): Likewise.
>
> gcc/fortran/ChangeLog:
>
> * frontend-passes.c (optimize_expr): Fix typos and missing comments.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/template/nontype29.C: Fix typos and missing comments.
> * gcc.dg/Warray-bounds-64.c: Likewise.
> * gcc.dg/Warray-parameter.c: Likewise.
> * gcc.dg/Wstring-compare.c: Likewise.
> * gcc.dg/format/gcc_diag-11.c: Likewise.
> * gfortran.dg/array_constructor_3.f90: Likewise.
> * gfortran.dg/matmul_bounds_9.f90: Likewise.
> * gfortran.dg/pr78033.f90: Likewise.
> * gfortran.dg/pr96325.f90: Likewise.
> ---
>  gcc/builtins.c|  2 +-
>  gcc/dwarf2out.c   |  4 ++--
>  gcc/fortran/frontend-passes.c |  2 +-
>  gcc/testsuite/g++.dg/template/nontype29.C |  4 ++--
>  gcc/testsuite/gcc.dg/Warray-bounds-64.c   |  2 +-
>  gcc/testsuite/gcc.dg/Warray-parameter.c   |  2 +-
>  gcc/testsuite/gcc.dg/Wstring-compare.c| 10 +-
>  gcc/testsuite/gcc.dg/format/gcc_diag-11.c |  2 +-
>  gcc/testsuite/gfortran.dg/array_constructor_3.f90 |  2 +-
>  gcc/testsuite/gfortran.dg/matmul_bounds_9.f90 |  2 +-
>  gcc/testsuite/gfortran.dg/pr78033.f90 |  2 +-
>  gcc/testsuite/gfortran.dg/pr96325.f90 |  2 +-
>  12 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index d30c4eb62fc..8c5324bf7de 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -9986,7 +9986,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
>break;
>
>  /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it
> -   back to a BUILT_IN_STRCMP. Remember to delete the 3rd paramater
> +   back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter
> when changing it to a strcmp call.  */
>  case BUILT_IN_STRCMP_EQ:
>target = expand_builtin_memcmp (exp, target, true);
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index aba16848295..c36fd5a7f6a 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -23542,7 +23542,7 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
>resolve_variable_values ();
>  }
>
> -  /* Generate child dies for template paramaters.  */
> +  /* Generate child dies for template parameters.  */
>if (early_dwarf && debug_info_level > DINFO_LEVEL_TERSE)
>  gen_generic_params_dies (decl);
>
> @@ -25471,7 +25471,7 @@ gen_struct_or_union_type_die (tree type, dw_die_ref 
> context_die,
>
>scope_die = scope_die_for (type, context_die);
>
> -  /* Generate child dies for template paramaters.  */
> +  /* Generate child dies for template parameters.  */
>if (!type_die && debug_info_level > DINFO_LEVEL_TERSE)
>  schedule_generic_params_dies_gen (type);
>
> diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
> index 7d3eae67632..93ac4b4a17c 100644
> --- a/gcc/fortran/frontend-passes.c
> +++ b/gcc/fortran/frontend-passes.c
> @@ -373,7 +373,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees 
> ATTRIBUTE_UNUSED,
>return 0;
>  }
>
> -/* Auxiliary function to handle the arguments to reduction intrnisics.  If 
> the
> +/* Auxiliary function to handle the arguments to reduction intrinsics.  If 
> the
> function is a scalar, just copy it; otherwise returns the new element, the
> old one can be freed.  */
>
> diff --git a/gcc/testsuite/g++.dg/template/nontype29.C 
> b/gcc/testsuite/g++.dg/template/nontype29.C
> index 18a3058e7e3..dd4e20f7e62 100644
> --- a/gcc/testsuite/g++.dg/template/nontype29.C
> +++ b/gcc/testsuite/g++.dg/template/nontype29.C
> @@ -3,7 +3,7 @@
>  // { dg-do compile }
>  // { dg-options "-Wall" }
>
> -#if __cpluspls >= 201103L
> +#if __cplusplus >= 201103L
>
>  // C++ 11 test case from comment #0.
>  namespace comment_0 {
> @@ -60,7 +60,7 @@ void h ()
>  }   // comment_2
>
>
> -#if __cpluspls >= 201103L
> +#if __cplusplus >= 201103L
>
>  // C++ 11 test case from comment #5.
>  namespace comment_5 {
> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-64.c 
> b/gcc/testsuite/gcc.dg/Warray-bounds-64.c
> index 88b88debff4..f5ebc3dd4b2 100644
> --- a/gcc/testsuite/gcc.dg/Warray-bounds-64.c
> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-64.c
> @@ -7,7 +7,7 @@
> asks for.

Re: [PATCH] Remove __cplusplus >= 201103

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 3:52 PM Martin Liška  wrote:
>
> Right now, we require a C++11 compiler, so the check is not needed any
> longer.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed once GCC 11.1 is released?

For system.h please keep it working for C code (not sure if we use
a C compiler anywhere).

Richard.

> Thanks,
> Martin
>
> gcc/analyzer/ChangeLog:
>
> * program-state.cc (program_state::operator=): Remove
> __cplusplus >= 201103.
> (program_state::program_state): Likewise.
> * program-state.h: Likewise.
> * region-model.h (class region_model): Remove dead code.
>
> gcc/ChangeLog:
>
> * bitmap.h (class auto_bitmap): Remove
> __cplusplus >= 201103.
> * config/aarch64/aarch64.c: Likewise.
> * gimple-ssa-store-merging.c 
> (store_immediate_info::store_immediate_info):
> Likewise.
> * sbitmap.h: Likewise.
> * system.h (STATIC_ASSERT): Likewise.
> ---
>  gcc/analyzer/program-state.cc  |  2 --
>  gcc/analyzer/program-state.h   |  4 
>  gcc/analyzer/region-model.h|  5 -
>  gcc/bitmap.h   |  2 --
>  gcc/config/aarch64/aarch64.c   |  2 --
>  gcc/gimple-ssa-store-merging.c | 10 +-
>  gcc/sbitmap.h  |  2 --
>  gcc/system.h   |  9 +
>  8 files changed, 2 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
> index f8094621309..5c690b08fd3 100644
> --- a/gcc/analyzer/program-state.cc
> +++ b/gcc/analyzer/program-state.cc
> @@ -731,7 +731,6 @@ program_state::operator= (const program_state )
>return *this;
>  }
>
> -#if __cplusplus >= 201103
>  /* Move constructor for program_state (when building with C++11).  */
>  program_state::program_state (program_state &)
>  : m_region_model (other.m_region_model),
> @@ -747,7 +746,6 @@ program_state::program_state (program_state &)
>
>m_valid = other.m_valid;
>  }
> -#endif
>
>  /* program_state's dtor.  */
>
> diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
> index 898c57ff833..f16fe6ba984 100644
> --- a/gcc/analyzer/program-state.h
> +++ b/gcc/analyzer/program-state.h
> @@ -192,11 +192,7 @@ public:
>program_state (const extrinsic_state _state);
>program_state (const program_state );
>program_state& operator= (const program_state );
> -
> -#if __cplusplus >= 201103
>program_state (program_state &);
> -#endif
> -
>~program_state ();
>
>hashval_t hash () const;
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 4123500f3bc..a169396dcb5 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -414,11 +414,6 @@ class region_model
>region_model (region_model_manager *mgr);
>region_model (const region_model );
>~region_model ();
> -
> -#if 0//__cplusplus >= 201103
> -  region_model (region_model &);
> -#endif
> -
>region_model = (const region_model );
>
>bool operator== (const region_model ) const;
> diff --git a/gcc/bitmap.h b/gcc/bitmap.h
> index 84632af7009..26138556b2a 100644
> --- a/gcc/bitmap.h
> +++ b/gcc/bitmap.h
> @@ -951,10 +951,8 @@ class auto_bitmap
>// Prevent making a copy that references our bitmap.
>auto_bitmap (const auto_bitmap &);
>auto_bitmap  = (const auto_bitmap &);
> -#if __cplusplus >= 201103L
>auto_bitmap (auto_bitmap &&);
>auto_bitmap  = (auto_bitmap &&);
> -#endif
>
>bitmap_head m_bits;
>  };
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 12625a4bee3..62fcad88215 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -221,9 +221,7 @@ public:
>   predicate in each predicate argument register.  This means that
>   we need at least 12 pieces.  */
>static const unsigned int MAX_PIECES = NUM_FP_ARG_REGS + NUM_PR_ARG_REGS;
> -#if __cplusplus >= 201103L
>static_assert (MAX_PIECES >= 8, "Need to store at least 8 predicates");
> -#endif
>
>/* Describes one piece of a PST.  Each piece is one of:
>
> diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
> index 7eb50d65a20..123c92d9b44 100644
> --- a/gcc/gimple-ssa-store-merging.c
> +++ b/gcc/gimple-ssa-store-merging.c
> @@ -1595,17 +1595,9 @@ store_immediate_info::store_immediate_info (unsigned 
> HOST_WIDE_INT bs,
>: bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
>  stmt (st), order (ord), rhs_code (rhscode), n (nr),
>  ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false),
> -lp_nr (nr2)
> -#if __cplusplus >= 201103L
> -, ops { op0r, op1r }
> +lp_nr (nr2), ops { op0r, op1r }
>  {
>  }
> -#else
> -{
> -  ops[0] = op0r;
> -  ops[1] = op1r;
> -}
> -#endif
>
>  /* Struct representing a group of stores to contiguous memory locations.
> These are produced by the second phase (coalescing) and consumed in the
>

Re: [PATCH 0/3] VAX backend preparatory updates for switching to LRA

2021-04-22 Thread Koning, Paul via Gcc-patches

> On Apr 21, 2021, at 5:32 PM, Maciej W. Rozycki  wrote:
> 
> ...
> OTOH switching to LRA regresses code generation seriously, by making the 
> indexed and indirect VAX address modes severely underutilised, so while 
> with these changes in place the backend can be switched to LRA with just a 
> trivial to remove the redefinition of TARGET_LRA_P, I think it is not yet 
> the right time to do it.

I noticed similar issues with pdp11, which at the moment allows LRA via a -mlra 
switch but doesn't make it the default.  Another mode that isn't handled well 
(or at all) by LRA is autoincrement/autodecrement.  It would be great if all 
these things could be done better, that would help several targets.  (I wonder 
if m68k would be another; doesn't it have similar addressing modes at least on 
the 68040?)

paul

Re: [PATCH] c++: do_class_deduction and dependent init [PR93383]

2021-04-22 Thread Patrick Palka via Gcc-patches

On Wed, 21 Apr 2021, Patrick Palka wrote:

> On Wed, 21 Apr 2021, Jason Merrill wrote:
> 
> > On 4/12/21 1:20 PM, Patrick Palka wrote:
> > > Here we're crashing during deduction for a template placeholder from a
> > > dependent initializer because one of the initializer's elements has an
> > > empty TREE_TYPE, something which resolve_args and later unify_one_argument
> > > don't expect.  And if the deduction from a dependent initializer
> > > otherwise fails, we prematurely issue an error rather than reattempting
> > > the deduction at instantiation time.
> > > 
> > > This patch makes do_class_deduction more tolerant about dependent
> > > initializers, in a manner similar to what do_auto_deduction does: if
> > > deduction from a dependent initializer fails, just return the original
> > > placeholder unchanged.
> > 
> > Why doesn't the type_dependent_expression_p check in do_auto_deduction catch
> > this already?
> 
> That check applies only when context != adc_unify, but here we have
> context == adc_unify since we're being called from
> convert_template_argument.
> 
> And currently, when 'auto' deduction fails for a dependent initializer,
> do_auto_deduction will just silently return the original placeholder:
> 
>   int val = type_unification_real (tparms, targs, parms, , 1, 0,
>DEDUCE_CALL,
>NULL, /*explain_p=*/false);
>   if (val > 0)
> {
>   if (processing_template_decl)
> /* Try again at instantiation time.  */
> return type;
> 
> so I suppose this patch just makes do_class_deduction behave more
> similarly to do_auto_deduction in this situation.

On second thought, I think attempting CTAD a dependent initializer as the patch
does might sometimes give us the wrong answer.  If e.g. the class template in
question has the deduction guides

  template  A(T) -> A;
  A(int) -> A;

then ahead-of-time CTAD for A{v} where v is type-dependent will succeed and
resolve to A, but at instantiation time the type of v might be int.  So
perhaps we should just have do_class_deduction punt on all type-dependent
expressions, e.g.

-- >8 --

gcc/cp/ChangeLog:

PR c++/89565
PR c++/93383
PR c++/99200
* pt.c (do_class_deduction): Give up if the initializer is
type-dependent.

gcc/testsuite/ChangeLog:

PR c++/89565
PR c++/93383
PR c++/99200
* g++.dg/cpp2a/nontype-class39.C: Remove dg-ice.
* g++.dg/cpp2a/nontype-class45.C: New test.
* g++.dg/cpp2a/nontype-class46.C: New test.
---
 gcc/cp/pt.c  |  5 +++
 gcc/testsuite/g++.dg/cpp2a/nontype-class39.C |  2 --
 gcc/testsuite/g++.dg/cpp2a/nontype-class45.C | 32 
 gcc/testsuite/g++.dg/cpp2a/nontype-class46.C | 11 +++
 4 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class46.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7bcbe6dc3ce..6673f935ab6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29362,6 +29362,11 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   return error_mark_node;
 }
 
+  /* If the initializer is dependent, we can't resolve the class template
+ placeholder ahead of time.  */
+  if (type_dependent_expression_p (init))
+return ptype;
+
   tree type = TREE_TYPE (tmpl);
 
   bool try_list_ctor = false;
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
index 512afad8e4f..9b4da4f02ea 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
@@ -1,7 +1,5 @@
 // PR c++/89565
 // { dg-do compile { target c++20 } }
-// { dg-additional-options "-fchecking" }
-// { dg-ice "resolve_args" }
 
 template 
 struct N{};
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
new file mode 100644
index 000..e7addf5f291
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
@@ -0,0 +1,32 @@
+// PR c++/99200
+// { dg-do compile { target c++20 } }
+
+template 
+struct A
+{
+  constexpr A (const char ()[N]) { for (int i = 0; i < N; i++) v[i] = s[i]; 
v[N] = 0; }
+  char v[N + 1];
+};
+
+template 
+struct B
+{
+  constexpr operator const char *() { return s.v; }
+};
+
+template 
+const char *
+foo ()
+{ 
+  return B<__PRETTY_FUNCTION__>{};
+}
+
+template 
+const char *
+bar ()
+{ 
+  return B<__FUNCTION__>{};
+}
+
+auto a = foo  ();
+auto b = bar  ();
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C
new file mode 100644
index 000..d91e800424f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C
@@ -0,0 +1,11 @@
+// PR c++/93383
+// { dg-do compile { target c++20 } }
+
+template  struct A {};
+
+template  struct B {
+  void

[PATCH] testsuite/arm: Improve unsigned-float.c

2021-04-22 Thread Christophe Lyon via Gcc-patches

The test requires an FPU, so use -march=armv7-a+fp -mfpu=auto instead
of -march=armv7-a.

We also remove dg-require-effective-target arm_fp_ok, but keep
dg-add-options arm_fp: this enables the test to pass on arm-eabi
configured with default cpu/fpu/mode.

dg-require-effective-target arm_fp_ok fails on such a configuration
for lack of FPU, since dg-options are not taken into account by
dg-require-effective-target.

Add -march=armv7-a+fp -mfpu=auto is sufficient for arm_fp options to
be acceptable.

This enables the test to pass on all the arm-eabi configurations I'm
testing, as well as arm-linux-gnueabi when forcing -march=armv5t.

2021-04-22  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/unsigned-float.c: Remove arm_fp_ok, adjust
dg-options.
---
 gcc/testsuite/gcc.target/arm/unsigned-float.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-float.c 
b/gcc/testsuite/gcc.target/arm/unsigned-float.c
index ad589d9..ea3abc7 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-float.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-float.c
@@ -1,8 +1,8 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_fp_ok } */
-/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } 
} */
 /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { 
"-mpure-code" } } */
-/* { dg-options "-march=armv7-a -O1" } */
+/* { dg-options "-march=armv7-a+fp -mfpu=auto -O1" } */
+/* Do not require arm_ok effective target to avoid skipping on arm-eabi with
+   default configure options.  */
 /* { dg-add-options arm_fp } */
 
 
-- 
2.7.4

Re: [PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]

2021-04-22 Thread Jonathan Wakely via Gcc-patches


On 22/04/21 14:42 +0200, Jakub Jelinek wrote:

Hi!

gcc on ia32 miscompiles various atomics involving floating point,
unfortunately I'm afraid it is too late to fix that for 11.1 and
as I'm quite lost on it, it might take a while for 12 too
(disabling all the 8 peephole2s would be easiest, but then we'd
run into optimization regressions).

While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the
make check even after the timeout fires.  The following patch therefore
xfails the former and skips the latter.

Tested on x86_64-linux where
make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc'
is still
=== libstdc++ Summary ===

# of expected passes8
and on i686-linux, where it is now
=== libstdc++ Summary ===

# of expected passes5
# of expected failures  1
# of unsupported tests  1

Ok for trunk/11.1 ?


OK, thanks.


2021-04-22  Jakub Jelinek  

PR target/100182
* testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for
ia32.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for
ia32.

--- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj  2021-01-05 
00:13:58.552294301 +0100
+++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 
14:29:19.778374653 +0200
@@ -18,6 +18,7 @@
// { dg-add-options ieee }
// { dg-options "-std=gnu++2a" }
// { dg-do run { target c++2a } }
+// { dg-xfail-run-if "PR100182" { ia32 } }

#include 
#include 
--- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj
2021-04-20 23:46:09.214189796 +0200
+++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc   
2021-04-22 14:28:19.793044944 +0200
@@ -2,6 +2,7 @@
// { dg-do run { target c++2a } }
// { dg-require-gthreads "" }
// { dg-additional-options "-pthread" { target pthread } }
+// { dg-skip-if "PR100182" { ia32 } }
// { dg-add-options libatomic }

// Copyright (C) 2020-2021 Free Software Foundation, Inc.

Jakub

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 2:52 PM Richard Biener
 wrote:
>
> On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek  wrote:
> >
> > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > > The question is if the pragma GCC target right now behaves incrementally
> > > > or not, whether
> > > > #pragma GCC target("avx2")
> > > > adds -mavx2 to options if it was missing before and nothing otherwise, 
> > > > or if
> > > > it switches other options off.  If it is incremental, we could e.g. try 
> > > > to
> > > > use the second least significant bit of global_options_set.x_* to mean
> > > > this option has been set explicitly by some surrounding #pragma GCC 
> > > > target.
> > > > The normal tests - global_options_set.x_flag_whatever could still work
> > > > fine because they wouldn't care if the option was explicit from anywhere
> > > > (command line or GCC target or target attribute) and just & 2 would mean
> > > > it was explicit from pragma GCC target; though there is the case of
> > > > bitfields... And then the inlining decision could check the & 2 flags to
> > > > see what is required and what is just from command line.
> > > > Or we can have some other pragma GCC that would be like target but would
> > > > have flags that are explicit (and could e.g. be more restricted, to ISA
> > > > options only, and let those use in addition to #pragma GCC target.
> > >
> > > I'm still curious as to what you think will break if always-inline does 
> > > what
> > > it is documented to do.
> >
> > We will silently accept calling intrinsics that must be used only in certain
> > ISA contexts, which will lead to people writing non-portable code.
> >
> > So -O2 -mno-avx
> > #include 
> >
> > void
> > foo (__m256 *x)
> > {
> >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > }
> > etc. will now be accepted when it shouldn't be.
> > clang rejects it like gcc with:
> > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > feature 'avx', but would be inlined into function 'foo' that is compiled 
> > without support for 'avx'
> >   x[0] = _mm256_sub_ps (x[1], x[2]);
> >  ^
> >
> > Note, if I do:
> > #include 
> >
> > __attribute__((target ("no-sse3"))) void
> > foo (__m256 *x)
> > {
> >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > }
> > and compile
> > clang -S -O2 -mavx2 1.c
> > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > feature 'avx', but would be inlined into function 'foo' that is compiled 
> > without support for 'avx'
> >   x[0] = _mm256_sub_ps (x[1], x[2]);
> >  ^
> > then from the error message it seems that unlike GCC, clang remembers
> > the exact target features that are needed for the intrinsics and checks just
> > those.
> > Though, looking at the preprocessed source, seems it uses
> > static __inline __m256 __attribute__((__always_inline__, __nodebug__, 
> > __target__("avx"), __min_vector_width__(256)))
> > _mm256_sub_ps(__m256 __a, __m256 __b)
> > {
> >   return (__m256)((__v8sf)__a-(__v8sf)__b);
> > }
> > and not target pragmas.
> >
> > Anyway, if we tweak our intrinsic headers so that
> > -#ifndef __AVX__
> >  #pragma GCC push_options
> >  #pragma GCC target("avx")
> > -#define __DISABLE_AVX__
> > -#endif /* __AVX__ */
> >
> > ...
> > -#ifdef __DISABLE_AVX__
> > -#undef __DISABLE_AVX__
> >  #pragma GCC pop_options
> > -#endif /* __DISABLE_AVX__ */
> > and do the opts_set->x_* & 2 stuff on explicit options coming out of
> > target/optimize pragmas and attributes, perhaps we don't even need
> > to introduce a new attribute and can handle everything magically:

Oh, and any such changes will likely interact with Martins ideas to rework
how optimize and target attributes work (aka adding ontop of the
commandline options).  That is, attribute target will then not be enough
to remember the exact set of needed ISA features (as opposed to what
likely clang implements?)

> > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> > disallow them for always_inline functions
>
> There are a lot of intrinsics using extern inline __gnu_inline though...
>
> > 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2
> > stuff
> > This will keep both intrinsics and glibc fortify macros working fine
> > in all the needed use cases.
>
> Yes, see my example in the other mail.
>
> I think before we add any new attributes we should sort out the
> current mess, eventually adding some testcases for desired
> diagnostic.
>
> Richard.
>
> > Jakub
> >

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek  wrote:
>
> On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > The question is if the pragma GCC target right now behaves incrementally
> > > or not, whether
> > > #pragma GCC target("avx2")
> > > adds -mavx2 to options if it was missing before and nothing otherwise, or 
> > > if
> > > it switches other options off.  If it is incremental, we could e.g. try to
> > > use the second least significant bit of global_options_set.x_* to mean
> > > this option has been set explicitly by some surrounding #pragma GCC 
> > > target.
> > > The normal tests - global_options_set.x_flag_whatever could still work
> > > fine because they wouldn't care if the option was explicit from anywhere
> > > (command line or GCC target or target attribute) and just & 2 would mean
> > > it was explicit from pragma GCC target; though there is the case of
> > > bitfields... And then the inlining decision could check the & 2 flags to
> > > see what is required and what is just from command line.
> > > Or we can have some other pragma GCC that would be like target but would
> > > have flags that are explicit (and could e.g. be more restricted, to ISA
> > > options only, and let those use in addition to #pragma GCC target.
> >
> > I'm still curious as to what you think will break if always-inline does what
> > it is documented to do.
>
> We will silently accept calling intrinsics that must be used only in certain
> ISA contexts, which will lead to people writing non-portable code.
>
> So -O2 -mno-avx
> #include 
>
> void
> foo (__m256 *x)
> {
>   x[0] = _mm256_sub_ps (x[1], x[2]);
> }
> etc. will now be accepted when it shouldn't be.
> clang rejects it like gcc with:
> 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> feature 'avx', but would be inlined into function 'foo' that is compiled 
> without support for 'avx'
>   x[0] = _mm256_sub_ps (x[1], x[2]);
>  ^
>
> Note, if I do:
> #include 
>
> __attribute__((target ("no-sse3"))) void
> foo (__m256 *x)
> {
>   x[0] = _mm256_sub_ps (x[1], x[2]);
> }
> and compile
> clang -S -O2 -mavx2 1.c
> 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> feature 'avx', but would be inlined into function 'foo' that is compiled 
> without support for 'avx'
>   x[0] = _mm256_sub_ps (x[1], x[2]);
>  ^
> then from the error message it seems that unlike GCC, clang remembers
> the exact target features that are needed for the intrinsics and checks just
> those.
> Though, looking at the preprocessed source, seems it uses
> static __inline __m256 __attribute__((__always_inline__, __nodebug__, 
> __target__("avx"), __min_vector_width__(256)))
> _mm256_sub_ps(__m256 __a, __m256 __b)
> {
>   return (__m256)((__v8sf)__a-(__v8sf)__b);
> }
> and not target pragmas.
>
> Anyway, if we tweak our intrinsic headers so that
> -#ifndef __AVX__
>  #pragma GCC push_options
>  #pragma GCC target("avx")
> -#define __DISABLE_AVX__
> -#endif /* __AVX__ */
>
> ...
> -#ifdef __DISABLE_AVX__
> -#undef __DISABLE_AVX__
>  #pragma GCC pop_options
> -#endif /* __DISABLE_AVX__ */
> and do the opts_set->x_* & 2 stuff on explicit options coming out of
> target/optimize pragmas and attributes, perhaps we don't even need
> to introduce a new attribute and can handle everything magically:
>
> 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> disallow them for always_inline functions

There are a lot of intrinsics using extern inline __gnu_inline though...

> 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2
> stuff
> This will keep both intrinsics and glibc fortify macros working fine
> in all the needed use cases.

Yes, see my example in the other mail.

I think before we add any new attributes we should sort out the
current mess, eventually adding some testcases for desired
diagnostic.

Richard.

> Jakub
>

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 2:21 PM Martin Liška  wrote:
>
> On 4/22/21 1:19 PM, Richard Biener wrote:
> > On Thu, Apr 22, 2021 at 11:02 AM Martin Liška  wrote:
> >>
> >> On 4/22/21 10:04 AM, Richard Biener wrote:
> >>> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:
> 
>  When -flto=jobserver is used and we cannot detect job server, then we can
>  still fallbackto -flto=N mode.
> 
>  Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  Ready to be installed?
> >>>
> >>> I think this behavior needs to be documented - it falls back to a less
> >>> conservative (possibly system overloading) mode - which IMHO is
> >>> non-obvious and IMHO we shouldn't do.
> >>
> >> Sure, I'm sending corresponding patch. Note that it's quite common mistake
> >> that '+' is missing in Makefile rule. That was motivation for my change.
> >
> > Sure, but that change won't get this fixed.
>
> It will as linker command line will contain (-flto=jobserver) and LTO will 
> fallback to -flto=N.
>
> > IMHO we should eventually
> > emit diagnostic like
> >
> > warning: could not find jobserver, compiling N jobs serially
> >
> > once N > 1 (or 2?).
>
> We do that now (for all N):
> lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment 
> variable is unset
>
>
> > Likewise if people just use -flto and auto-detection
> > finds nothing:
>
> -flto != -flto=auto
>
> Yes, -flto is a serial linking and we can emit a warning.

I'd avoid warning if there's just a single ltrans unit.

> > warning: using serial compilation of N LTRANS jobs
> > note: refer to http:// for how to use parallel compile
> >
> > using the URL diagnostics to point to -flto=... documentation.
>
> What about making that a proper warning (-Wlto)? We have diagnostics 
> infrastructure
> that prints URL links.

Note that drivers like lto-wrapper do not have fully initialized diagnostic
machinery and use a "different" set of overloads (likewise gen* programs).

> >
> > That is, teach users rather than second-guessing and eventually
> > blowing things up.  IMHO only the jobserver mode is safe to
> > automatically use.
>
> Well, -flto=auto is also fine and document. I think there is no possibility
> auto CPU deduction can fail. So -flto=jobserver (with missing make job server)
> and -flto (equal to -flto=1) worth emitting a warning.
>
> What do you think?

Yes, that sounds reasonable.  I suspect that people might want to see
-flto default to -flto=auto but then I don't think that's good because there's
no system wide job scheduler limiting things (systemd-jobserver anyone?)

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Martin
> >>
> >>>
> >>> Richard.
> >>>
>  Thanks,
>  Martin
> 
>  gcc/ChangeLog:
> 
>  * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
>  makeserver cannot be detected, then use -flto=N fallback.
>  ---
>   gcc/lto-wrapper.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
>  diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
>  index 03a5922f8ea..0b626d7c811 100644
>  --- a/gcc/lto-wrapper.c
>  +++ b/gcc/lto-wrapper.c
>  @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
> if (jobserver && jobserver_error != NULL)
>  {
>    warning (0, jobserver_error);
>  - parallel = 0;
>  + /* Fall back to auto parallelism.  */
>    jobserver = 0;
>  + auto_parallel = 1;
>  }
> else if (!jobserver && jobserver_error == NULL)
>  {
>  --
>  2.31.1
> 
> >>
>

[PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]

2021-04-22 Thread Jakub Jelinek via Gcc-patches

Hi!

gcc on ia32 miscompiles various atomics involving floating point,
unfortunately I'm afraid it is too late to fix that for 11.1 and
as I'm quite lost on it, it might take a while for 12 too
(disabling all the 8 peephole2s would be easiest, but then we'd
run into optimization regressions).

While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the
make check even after the timeout fires.  The following patch therefore
xfails the former and skips the latter.

Tested on x86_64-linux where
make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc'
is still
=== libstdc++ Summary ===

# of expected passes8
and on i686-linux, where it is now
=== libstdc++ Summary ===

# of expected passes5
# of expected failures  1
# of unsupported tests  1

Ok for trunk/11.1 ?

2021-04-22  Jakub Jelinek  

PR target/100182
* testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for
ia32.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for
ia32.

--- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj  2021-01-05 
00:13:58.552294301 +0100
+++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 
14:29:19.778374653 +0200
@@ -18,6 +18,7 @@
 // { dg-add-options ieee }
 // { dg-options "-std=gnu++2a" }
 // { dg-do run { target c++2a } }
+// { dg-xfail-run-if "PR100182" { ia32 } }
 
 #include 
 #include 
--- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj
2021-04-20 23:46:09.214189796 +0200
+++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc   
2021-04-22 14:28:19.793044944 +0200
@@ -2,6 +2,7 @@
 // { dg-do run { target c++2a } }
 // { dg-require-gthreads "" }
 // { dg-additional-options "-pthread" { target pthread } }
+// { dg-skip-if "PR100182" { ia32 } }
 // { dg-add-options libatomic }
 
 // Copyright (C) 2020-2021 Free Software Foundation, Inc.

Jakub

Re: GCC 11.1 Release Candidate available from gcc.gnu.org

2021-04-22 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 22, 2021 at 01:27:40PM +0100, Jonathan Wakely wrote:
> I'm just finishing testing this on various targets and will push to
> trunk. I'd like to backport it to gcc-11 too, to fix PR100179.

Ok for 11.1.

> commit 50070d8602a07160cece5890899929e9f210244d
> Author: Jonathan Wakely 
> Date:   Thu Apr 22 11:10:06 2021
> 
> libstdc++: Remove #error from  implementation [PR 100179]
> 
> This removes the #error from  for the case where
> neither __atomic_semaphore nor __platform_semaphore is defined.
> 
> Also rename the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro to
> _GLIBCXX_USE_POSIX_SEMAPHORE for consistency with the similar
> _GLIBCXX_USE_CXX11_ABI macro that can be used to request an alternative
> (ABI-changing) implementation.
> 
> libstdc++-v3/ChangeLog:
> 
> PR libstdc++/100179
> * include/bits/semaphore_base.h: Remove #error.
> * include/std/semaphore: Do not define anything unless one of
> the implementations is available.

Jakub

Re: GCC 11.1 Release Candidate available from gcc.gnu.org

2021-04-22 Thread Jonathan Wakely via Gcc-patches


On 21/04/21 15:30 +0100, Jonathan Wakely wrote:

On 21/04/21 13:12 +0100, Jonathan Wakely wrote:

On 21/04/21 12:38 +0100, Jonathan Wakely wrote:

On 20/04/21 22:12 -0700, Thomas Rodgers wrote:

@@ -86,6 +88,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }

+_GLIBCXX_ALWAYS_INLINE bool
+_M_try_acquire() noexcept
+{
+  for (;;)
+   {
+ auto __err = sem_trywait(&_M_semaphore);
+ if (__err && (errno == EINTR))
+   continue;
+ else if (__err && (errno == EAGAIN))
+   return false;
+ else if (__err)
+   std::terminate();
+ else
+   break;
+   }
+  return true;
+}
+
 _GLIBCXX_ALWAYS_INLINE void
 _M_release(std::ptrdiff_t __update) noexcept
 {


Please just commit this part to trunk and gcc-11, not the macro
renaming (as that's been fixed by Jakub already).


I think on trunk I'd prefer to do the attached. WDYT?


In fact I think something like this is neded even on gcc-11 branch,
otherwise anything that tries to include  without atomics
or sem_t gets hard errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100179

And  includes  which includes , meaning
 is unusable on those targets.

So I think removing this #error is essential:

-// Note: the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro can be used to force the
-// use of Posix semaphores (sem_t). Doing so however, alters the ABI.
-#if defined __cpp_lib_atomic_wait && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE
 using __semaphore_impl = __atomic_semaphore;
-#elif _GLIBCXX_HAVE_POSIX_SEMAPHORE
-  using __semaphore_impl = __platform_semaphore;
-#else
-#  error "No suitable semaphore implementation available"
-#endif
+#endif // __cpp_lib_atomic_wait


Here's a simpler patch which just removes the #error and renames the
REQUIRE macro to USE. This still dumps the whole of  and
 in the global namespace when  is included, but
we'll have to live with that for the 11.1 release.

I'm just finishing testing this on various targets and will push to
trunk. I'd like to backport it to gcc-11 too, to fix PR100179.

commit 50070d8602a07160cece5890899929e9f210244d
Author: Jonathan Wakely 
Date:   Thu Apr 22 11:10:06 2021

libstdc++: Remove #error from  implementation [PR 100179]

This removes the #error from  for the case where
neither __atomic_semaphore nor __platform_semaphore is defined.

Also rename the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro to
_GLIBCXX_USE_POSIX_SEMAPHORE for consistency with the similar
_GLIBCXX_USE_CXX11_ABI macro that can be used to request an alternative
(ABI-changing) implementation.

libstdc++-v3/ChangeLog:

PR libstdc++/100179
* include/bits/semaphore_base.h: Remove #error.
* include/std/semaphore: Do not define anything unless one of
the implementations is available.

diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index 84b33423fff..4948f0fd0bc 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -263,14 +263,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 #endif // __cpp_lib_atomic_wait
 
-// Note: the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro can be used to force the
+// Note: the _GLIBCXX_USE_POSIX_SEMAPHORE macro can be used to force the
 // use of Posix semaphores (sem_t). Doing so however, alters the ABI.
-#if defined __cpp_lib_atomic_wait && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE
+#if defined __cpp_lib_atomic_wait && !_GLIBCXX_USE_POSIX_SEMAPHORE
   using __semaphore_impl = __atomic_semaphore;
 #elif _GLIBCXX_HAVE_POSIX_SEMAPHORE
   using __semaphore_impl = __platform_semaphore;
-#else
-#  error "No suitable semaphore implementation available"
 #endif
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/semaphore b/libstdc++-v3/include/std/semaphore
index a1560915d83..52addca742c 100644
--- a/libstdc++-v3/include/std/semaphore
+++ b/libstdc++-v3/include/std/semaphore
@@ -34,6 +34,7 @@
 #if __cplusplus > 201703L
 #include 
 
+#if __cpp_lib_atomic_wait || _GLIBCXX_HAVE_POSIX_SEMAPHORE
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -89,5 +90,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
+#endif // cpp_lib_atomic_wait || _GLIBCXX_HAVE_POSIX_SEMAPHORE
 #endif // C++20
 #endif // _GLIBCXX_SEMAPHORE

[PATCH] Remove __cplusplus >= 201103

2021-04-22 Thread Martin Liška

Right now, we require a C++11 compiler, so the check is not needed any
longer.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed once GCC 11.1 is released?
Thanks,
Martin

gcc/analyzer/ChangeLog:

* program-state.cc (program_state::operator=): Remove
__cplusplus >= 201103.
(program_state::program_state): Likewise.
* program-state.h: Likewise.
* region-model.h (class region_model): Remove dead code.

gcc/ChangeLog:

* bitmap.h (class auto_bitmap): Remove
__cplusplus >= 201103.
* config/aarch64/aarch64.c: Likewise.
* gimple-ssa-store-merging.c 
(store_immediate_info::store_immediate_info):
Likewise.
* sbitmap.h: Likewise.
* system.h (STATIC_ASSERT): Likewise.
---
 gcc/analyzer/program-state.cc  |  2 --
 gcc/analyzer/program-state.h   |  4 
 gcc/analyzer/region-model.h|  5 -
 gcc/bitmap.h   |  2 --
 gcc/config/aarch64/aarch64.c   |  2 --
 gcc/gimple-ssa-store-merging.c | 10 +-
 gcc/sbitmap.h  |  2 --
 gcc/system.h   |  9 +
 8 files changed, 2 insertions(+), 34 deletions(-)

diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index f8094621309..5c690b08fd3 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -731,7 +731,6 @@ program_state::operator= (const program_state )
   return *this;
 }
 
-#if __cplusplus >= 201103
 /* Move constructor for program_state (when building with C++11).  */
 program_state::program_state (program_state &)
 : m_region_model (other.m_region_model),
@@ -747,7 +746,6 @@ program_state::program_state (program_state &)
 
   m_valid = other.m_valid;
 }
-#endif
 
 /* program_state's dtor.  */
 
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index 898c57ff833..f16fe6ba984 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -192,11 +192,7 @@ public:
   program_state (const extrinsic_state _state);
   program_state (const program_state );
   program_state& operator= (const program_state );
-
-#if __cplusplus >= 201103
   program_state (program_state &);
-#endif
-
   ~program_state ();
 
   hashval_t hash () const;
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4123500f3bc..a169396dcb5 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -414,11 +414,6 @@ class region_model
   region_model (region_model_manager *mgr);
   region_model (const region_model );
   ~region_model ();
-
-#if 0//__cplusplus >= 201103
-  region_model (region_model &);
-#endif
-
   region_model = (const region_model );
 
   bool operator== (const region_model ) const;
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 84632af7009..26138556b2a 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -951,10 +951,8 @@ class auto_bitmap
   // Prevent making a copy that references our bitmap.
   auto_bitmap (const auto_bitmap &);
   auto_bitmap  = (const auto_bitmap &);
-#if __cplusplus >= 201103L
   auto_bitmap (auto_bitmap &&);
   auto_bitmap  = (auto_bitmap &&);
-#endif
 
   bitmap_head m_bits;
 };
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12625a4bee3..62fcad88215 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -221,9 +221,7 @@ public:
  predicate in each predicate argument register.  This means that
  we need at least 12 pieces.  */
   static const unsigned int MAX_PIECES = NUM_FP_ARG_REGS + NUM_PR_ARG_REGS;
-#if __cplusplus >= 201103L
   static_assert (MAX_PIECES >= 8, "Need to store at least 8 predicates");
-#endif
 
   /* Describes one piece of a PST.  Each piece is one of:
 
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 7eb50d65a20..123c92d9b44 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -1595,17 +1595,9 @@ store_immediate_info::store_immediate_info (unsigned 
HOST_WIDE_INT bs,
   : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
 stmt (st), order (ord), rhs_code (rhscode), n (nr),
 ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false),
-lp_nr (nr2)
-#if __cplusplus >= 201103L
-, ops { op0r, op1r }
+lp_nr (nr2), ops { op0r, op1r }
 {
 }
-#else
-{
-  ops[0] = op0r;
-  ops[1] = op1r;
-}
-#endif
 
 /* Struct representing a group of stores to contiguous memory locations.
These are produced by the second phase (coalescing) and consumed in the
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index 89a86e49f60..17660e5a65e 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -301,10 +301,8 @@ private:
   /* Prevent making a copy that refers to our sbitmap.  */
   auto_sbitmap (const auto_sbitmap &);
   auto_sbitmap  = (const auto_sbitmap &);
-#if __cplusplus >= 201103L
   auto_sbitmap (auto_sbitmap &&);
   auto_sbitmap  = (auto_sbitmap &&);
-#endif
 
   /* The

[PATCH] Fix various typos.

2021-04-22 Thread Martin Liška

Hello.

The patch fixes various typos.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR testsuite/100159
PR testsuite/100192

gcc/ChangeLog:

* builtins.c (expand_builtin): Fix typos and missing comments.
* dwarf2out.c (gen_subprogram_die): Likewise.
(gen_struct_or_union_type_die): Likewise.

gcc/fortran/ChangeLog:

* frontend-passes.c (optimize_expr): Fix typos and missing comments.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype29.C: Fix typos and missing comments.
* gcc.dg/Warray-bounds-64.c: Likewise.
* gcc.dg/Warray-parameter.c: Likewise.
* gcc.dg/Wstring-compare.c: Likewise.
* gcc.dg/format/gcc_diag-11.c: Likewise.
* gfortran.dg/array_constructor_3.f90: Likewise.
* gfortran.dg/matmul_bounds_9.f90: Likewise.
* gfortran.dg/pr78033.f90: Likewise.
* gfortran.dg/pr96325.f90: Likewise.
---
 gcc/builtins.c|  2 +-
 gcc/dwarf2out.c   |  4 ++--
 gcc/fortran/frontend-passes.c |  2 +-
 gcc/testsuite/g++.dg/template/nontype29.C |  4 ++--
 gcc/testsuite/gcc.dg/Warray-bounds-64.c   |  2 +-
 gcc/testsuite/gcc.dg/Warray-parameter.c   |  2 +-
 gcc/testsuite/gcc.dg/Wstring-compare.c| 10 +-
 gcc/testsuite/gcc.dg/format/gcc_diag-11.c |  2 +-
 gcc/testsuite/gfortran.dg/array_constructor_3.f90 |  2 +-
 gcc/testsuite/gfortran.dg/matmul_bounds_9.f90 |  2 +-
 gcc/testsuite/gfortran.dg/pr78033.f90 |  2 +-
 gcc/testsuite/gfortran.dg/pr96325.f90 |  2 +-
 12 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index d30c4eb62fc..8c5324bf7de 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -9986,7 +9986,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
   break;
 
 /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it
-   back to a BUILT_IN_STRCMP. Remember to delete the 3rd paramater
+   back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter
when changing it to a strcmp call.  */
 case BUILT_IN_STRCMP_EQ:
   target = expand_builtin_memcmp (exp, target, true);
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index aba16848295..c36fd5a7f6a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23542,7 +23542,7 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
   resolve_variable_values ();
 }
 
-  /* Generate child dies for template paramaters.  */
+  /* Generate child dies for template parameters.  */
   if (early_dwarf && debug_info_level > DINFO_LEVEL_TERSE)
 gen_generic_params_dies (decl);
 
@@ -25471,7 +25471,7 @@ gen_struct_or_union_type_die (tree type, dw_die_ref 
context_die,
 
   scope_die = scope_die_for (type, context_die);
 
-  /* Generate child dies for template paramaters.  */
+  /* Generate child dies for template parameters.  */
   if (!type_die && debug_info_level > DINFO_LEVEL_TERSE)
 schedule_generic_params_dies_gen (type);
 
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 7d3eae67632..93ac4b4a17c 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -373,7 +373,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees 
ATTRIBUTE_UNUSED,
   return 0;
 }
 
-/* Auxiliary function to handle the arguments to reduction intrnisics.  If the
+/* Auxiliary function to handle the arguments to reduction intrinsics.  If the
function is a scalar, just copy it; otherwise returns the new element, the
old one can be freed.  */
 
diff --git a/gcc/testsuite/g++.dg/template/nontype29.C 
b/gcc/testsuite/g++.dg/template/nontype29.C
index 18a3058e7e3..dd4e20f7e62 100644
--- a/gcc/testsuite/g++.dg/template/nontype29.C
+++ b/gcc/testsuite/g++.dg/template/nontype29.C
@@ -3,7 +3,7 @@
 // { dg-do compile }
 // { dg-options "-Wall" }
 
-#if __cpluspls >= 201103L
+#if __cplusplus >= 201103L
 
 // C++ 11 test case from comment #0.
 namespace comment_0 {
@@ -60,7 +60,7 @@ void h ()
 }   // comment_2
 
 
-#if __cpluspls >= 201103L
+#if __cplusplus >= 201103L
 
 // C++ 11 test case from comment #5.
 namespace comment_5 {
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-64.c 
b/gcc/testsuite/gcc.dg/Warray-bounds-64.c
index 88b88debff4..f5ebc3dd4b2 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-64.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-64.c
@@ -7,7 +7,7 @@
asks for.
 
{ dg-do compile }
-   { dg-options "-O2 -Wall -Warray-parameter -Wno-vla-paramater" } */
+   { dg-options "-O2 -Wall -Warray-parameter -Wno-vla-parameter" } */
 
 #define NOIPA __attribute__  ((noipa))
 
diff --git a/gcc/testsuite/gcc.dg/Warray-parameter.c 
b/gcc/testsuite/gcc.dg/Warray-parameter.c
index 42be3100e45..6c5195a31be 100644
--- a/gcc/testsuite/gcc.dg/Warray-parameter.c
+++

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches wrote:
> > The question is if the pragma GCC target right now behaves incrementally
> > or not, whether
> > #pragma GCC target("avx2")
> > adds -mavx2 to options if it was missing before and nothing otherwise, or if
> > it switches other options off.  If it is incremental, we could e.g. try to
> > use the second least significant bit of global_options_set.x_* to mean
> > this option has been set explicitly by some surrounding #pragma GCC target.
> > The normal tests - global_options_set.x_flag_whatever could still work
> > fine because they wouldn't care if the option was explicit from anywhere
> > (command line or GCC target or target attribute) and just & 2 would mean
> > it was explicit from pragma GCC target; though there is the case of
> > bitfields... And then the inlining decision could check the & 2 flags to
> > see what is required and what is just from command line.
> > Or we can have some other pragma GCC that would be like target but would
> > have flags that are explicit (and could e.g. be more restricted, to ISA
> > options only, and let those use in addition to #pragma GCC target.
> 
> I'm still curious as to what you think will break if always-inline does what
> it is documented to do.

We will silently accept calling intrinsics that must be used only in certain
ISA contexts, which will lead to people writing non-portable code.

So -O2 -mno-avx
#include 

void
foo (__m256 *x)
{
  x[0] = _mm256_sub_ps (x[1], x[2]);
}
etc. will now be accepted when it shouldn't be.
clang rejects it like gcc with:
1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 
'avx', but would be inlined into function 'foo' that is compiled without 
support for 'avx'
  x[0] = _mm256_sub_ps (x[1], x[2]);
 ^

Note, if I do:
#include 

__attribute__((target ("no-sse3"))) void
foo (__m256 *x)
{
  x[0] = _mm256_sub_ps (x[1], x[2]);
}
and compile
clang -S -O2 -mavx2 1.c
1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 
'avx', but would be inlined into function 'foo' that is compiled without 
support for 'avx'
  x[0] = _mm256_sub_ps (x[1], x[2]);
 ^
then from the error message it seems that unlike GCC, clang remembers
the exact target features that are needed for the intrinsics and checks just
those.
Though, looking at the preprocessed source, seems it uses
static __inline __m256 __attribute__((__always_inline__, __nodebug__, 
__target__("avx"), __min_vector_width__(256)))
_mm256_sub_ps(__m256 __a, __m256 __b)
{
  return (__m256)((__v8sf)__a-(__v8sf)__b);
}
and not target pragmas.

Anyway, if we tweak our intrinsic headers so that
-#ifndef __AVX__
 #pragma GCC push_options
 #pragma GCC target("avx")
-#define __DISABLE_AVX__
-#endif /* __AVX__ */

...
-#ifdef __DISABLE_AVX__
-#undef __DISABLE_AVX__
 #pragma GCC pop_options
-#endif /* __DISABLE_AVX__ */
and do the opts_set->x_* & 2 stuff on explicit options coming out of
target/optimize pragmas and attributes, perhaps we don't even need
to introduce a new attribute and can handle everything magically:

1) if it is gnu_inline extern inline, allow indirect calls, otherwise
disallow them for always_inline functions
2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2
stuff
This will keep both intrinsics and glibc fortify macros working fine
in all the needed use cases.

Jakub

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Martin Liška

On 4/22/21 1:19 PM, Richard Biener wrote:
> On Thu, Apr 22, 2021 at 11:02 AM Martin Liška  wrote:
>>
>> On 4/22/21 10:04 AM, Richard Biener wrote:
>>> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:

 When -flto=jobserver is used and we cannot detect job server, then we can
 still fallbackto -flto=N mode.

 Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

 Ready to be installed?
>>>
>>> I think this behavior needs to be documented - it falls back to a less
>>> conservative (possibly system overloading) mode - which IMHO is
>>> non-obvious and IMHO we shouldn't do.
>>
>> Sure, I'm sending corresponding patch. Note that it's quite common mistake
>> that '+' is missing in Makefile rule. That was motivation for my change.
> 
> Sure, but that change won't get this fixed.

It will as linker command line will contain (-flto=jobserver) and LTO will 
fallback to -flto=N.

> IMHO we should eventually
> emit diagnostic like
> 
> warning: could not find jobserver, compiling N jobs serially
> 
> once N > 1 (or 2?).

We do that now (for all N):
lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment 
variable is unset


> Likewise if people just use -flto and auto-detection
> finds nothing:

-flto != -flto=auto

Yes, -flto is a serial linking and we can emit a warning.

> 
> warning: using serial compilation of N LTRANS jobs
> note: refer to http:// for how to use parallel compile
> 
> using the URL diagnostics to point to -flto=... documentation.

What about making that a proper warning (-Wlto)? We have diagnostics 
infrastructure
that prints URL links.

> 
> That is, teach users rather than second-guessing and eventually
> blowing things up.  IMHO only the jobserver mode is safe to
> automatically use.

Well, -flto=auto is also fine and document. I think there is no possibility
auto CPU deduction can fail. So -flto=jobserver (with missing make job server)
and -flto (equal to -flto=1) worth emitting a warning.

What do you think?
Martin

> 
> Richard.
> 
>> Martin
>>
>>>
>>> Richard.
>>>
 Thanks,
 Martin

 gcc/ChangeLog:

 * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
 makeserver cannot be detected, then use -flto=N fallback.
 ---
  gcc/lto-wrapper.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
 index 03a5922f8ea..0b626d7c811 100644
 --- a/gcc/lto-wrapper.c
 +++ b/gcc/lto-wrapper.c
 @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
if (jobserver && jobserver_error != NULL)
 {
   warning (0, jobserver_error);
 - parallel = 0;
 + /* Fall back to auto parallelism.  */
   jobserver = 0;
 + auto_parallel = 1;
 }
else if (!jobserver && jobserver_error == NULL)
 {
 --
 2.31.1

>>

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 1:58 PM H.J. Lu  wrote:
>
> On Thu, Apr 22, 2021 at 4:23 AM Richard Biener
>  wrote:
> >
> > On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches
> >  wrote:
> > >
> > > On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote:
> > > > How about this?
> > > >
> > > > @item general_regs_only
> > > > @cindex @code{general_regs_only} function attribute, x86
> > > > The @code{general_regs_only} function attribute informs the compiler
> > > > that the function uses only general purpose registers.  When the
> > > > compiler inlines a function with the @code{always_inline} attribute,
> > > > target-specific compilation options may lead to inline failures.
> > > > The @code{general_regs_only} attribute, if applicable, can be used
> > > > together with the @code{always_inline} attribute to reduce inlining
> > > > failure.
> > >
> > > I don't really like this attribute.
> > > It is very specific to what you want to solve and doesn't address the
> > > general problem, that always_inline means different things in different
> > > code, and that is a problem for many targets, not just one.
> > >
> > > As has been written, in some cases it means inline always, error
> > > whenever it is called indirectly which can't be optimized into a direct
> > > call that can be inlined and error whenever the inlining fails for other
> > > reasons.
> > >
> > > Another case, e.g. in the glibc fortify wrappers, is inline always
> > > when the call is direct or an indirect call can be optimized into a direct
> > > call and error when the inlining fails, but support indirect calls without
> > > errors.
> > >
> > > Another case, which is most of the x86/aarch64/arm etc. intrinsics, is
> > > inline always unless there is a target mismatch (roughly what is
> > > actually implemented).
> > >
> > > Because from the always_inline attribute it is impossible to determine 
> > > which
> > > one of those it is (for the indirect calls the rule could be
> > > gnu_inline extern inline means indirect calls are ok, anything else means
> > > indirect calls are bad), we need new syntax to distinguish those cases.
> > >
> > > general_regs_only attribute doesn't seem to be it, e.g. for the glibc
> > > fortify wrappers cases I don't see why we should forbid using floating 
> > > point
> > > in such inlines.
> > >
> > > So IMHO we need some new attribute for one of those, or optional parameter
> > > to always_inline.
> > >
> > > For the intrinsic case, ideal would be if we could record which ISA flags
> > > (or more generally which options) are required and which are not.  Either
> > > have some syntax where those would be explicitly specified in attribute 
> > > (but
> > > frankly that would be a maintainance nightmare), or derive those from
> > > surrounding pragmas.  Right now we have those wrapped in
> > > #ifndef __AVX2__
> > > #pragma GCC push_options
> > > #pragma GCC target("avx2")
> > > #define __DISABLE_AVX2__
> > > #endif /* __AVX2__ */
> > >
> > > ...
> > >
> > > #ifdef __DISABLE_AVX2__
> > > #undef __DISABLE_AVX2__
> > > #pragma GCC pop_options
> > > #endif /* __DISABLE_AVX2__ */
> > >
> > > The question is if the pragma GCC target right now behaves incrementally
> > > or not, whether
> > > #pragma GCC target("avx2")
> > > adds -mavx2 to options if it was missing before and nothing otherwise, or 
> > > if
> > > it switches other options off.  If it is incremental, we could e.g. try to
> > > use the second least significant bit of global_options_set.x_* to mean
> > > this option has been set explicitly by some surrounding #pragma GCC 
> > > target.
> > > The normal tests - global_options_set.x_flag_whatever could still work
> > > fine because they wouldn't care if the option was explicit from anywhere
> > > (command line or GCC target or target attribute) and just & 2 would mean
> > > it was explicit from pragma GCC target; though there is the case of
> > > bitfields... And then the inlining decision could check the & 2 flags to
> > > see what is required and what is just from command line.
> > > Or we can have some other pragma GCC that would be like target but would
> > > have flags that are explicit (and could e.g. be more restricted, to ISA
> > > options only, and let those use in addition to #pragma GCC target.
> >
> > I'm still curious as to what you think will break if always-inline does what
> > it is documented to do.
>
> No wrong code.  But the compiler will generate a different error message
> at the later stage if the ISA for the intrinsic isn't enabled.

But all issues at hand we're trying to fix are rejections of _valid_ code.

Improving diagnostics for invalid code should be secondary to
properly accepting valid code.

There's nothing perfect here - for diagnostics we might want to consider
an __attribute__((requires_isa(a,b,c))) which says that calling a function
with such attribute requires the ISAs in the list to be active.  Target
code can then iterate over calls _before_

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread H.J. Lu via Gcc-patches

On Thu, Apr 22, 2021 at 4:23 AM Richard Biener
 wrote:
>
> On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote:
> > > How about this?
> > >
> > > @item general_regs_only
> > > @cindex @code{general_regs_only} function attribute, x86
> > > The @code{general_regs_only} function attribute informs the compiler
> > > that the function uses only general purpose registers.  When the
> > > compiler inlines a function with the @code{always_inline} attribute,
> > > target-specific compilation options may lead to inline failures.
> > > The @code{general_regs_only} attribute, if applicable, can be used
> > > together with the @code{always_inline} attribute to reduce inlining
> > > failure.
> >
> > I don't really like this attribute.
> > It is very specific to what you want to solve and doesn't address the
> > general problem, that always_inline means different things in different
> > code, and that is a problem for many targets, not just one.
> >
> > As has been written, in some cases it means inline always, error
> > whenever it is called indirectly which can't be optimized into a direct
> > call that can be inlined and error whenever the inlining fails for other
> > reasons.
> >
> > Another case, e.g. in the glibc fortify wrappers, is inline always
> > when the call is direct or an indirect call can be optimized into a direct
> > call and error when the inlining fails, but support indirect calls without
> > errors.
> >
> > Another case, which is most of the x86/aarch64/arm etc. intrinsics, is
> > inline always unless there is a target mismatch (roughly what is
> > actually implemented).
> >
> > Because from the always_inline attribute it is impossible to determine which
> > one of those it is (for the indirect calls the rule could be
> > gnu_inline extern inline means indirect calls are ok, anything else means
> > indirect calls are bad), we need new syntax to distinguish those cases.
> >
> > general_regs_only attribute doesn't seem to be it, e.g. for the glibc
> > fortify wrappers cases I don't see why we should forbid using floating point
> > in such inlines.
> >
> > So IMHO we need some new attribute for one of those, or optional parameter
> > to always_inline.
> >
> > For the intrinsic case, ideal would be if we could record which ISA flags
> > (or more generally which options) are required and which are not.  Either
> > have some syntax where those would be explicitly specified in attribute (but
> > frankly that would be a maintainance nightmare), or derive those from
> > surrounding pragmas.  Right now we have those wrapped in
> > #ifndef __AVX2__
> > #pragma GCC push_options
> > #pragma GCC target("avx2")
> > #define __DISABLE_AVX2__
> > #endif /* __AVX2__ */
> >
> > ...
> >
> > #ifdef __DISABLE_AVX2__
> > #undef __DISABLE_AVX2__
> > #pragma GCC pop_options
> > #endif /* __DISABLE_AVX2__ */
> >
> > The question is if the pragma GCC target right now behaves incrementally
> > or not, whether
> > #pragma GCC target("avx2")
> > adds -mavx2 to options if it was missing before and nothing otherwise, or if
> > it switches other options off.  If it is incremental, we could e.g. try to
> > use the second least significant bit of global_options_set.x_* to mean
> > this option has been set explicitly by some surrounding #pragma GCC target.
> > The normal tests - global_options_set.x_flag_whatever could still work
> > fine because they wouldn't care if the option was explicit from anywhere
> > (command line or GCC target or target attribute) and just & 2 would mean
> > it was explicit from pragma GCC target; though there is the case of
> > bitfields... And then the inlining decision could check the & 2 flags to
> > see what is required and what is just from command line.
> > Or we can have some other pragma GCC that would be like target but would
> > have flags that are explicit (and could e.g. be more restricted, to ISA
> > options only, and let those use in addition to #pragma GCC target.
>
> I'm still curious as to what you think will break if always-inline does what
> it is documented to do.

No wrong code.  But the compiler will generate a different error message
at the later stage if the ISA for the intrinsic isn't enabled.

-- 
H.J.

[PING^3][PATCH][omp, simt] Handle alternative IV

2021-04-22 Thread Tom de Vries

On 12/17/20 5:46 PM, Tom de Vries wrote:
> On 10/15/20 5:05 PM, Tom de Vries wrote:
>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>> Hi,
>>>
>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>> ...
>>>   int s = 1;
>>>   #pragma omp target simd
>>>   for (int i = N - 1; i > -1; i -= s)
>>> a[i] = 1;
>>> ...
>>> which, given that N is 32, sets a[0..31] to 1.
>>>
>>> After omp-expand, this looks like:
>>> ...
>>>:
>>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>   D.3193 = -s;
>>>   s.9 = s;
>>>   D.3204 = .GOMP_SIMT_LANE ();
>>>   D.3205 = -s.9;
>>>   D.3206 = (int) D.3204;
>>>   D.3207 = D.3205 * D.3206;
>>>   i = D.3207 + 31;
>>>   D.3209 = 0;
>>>   D.3210 = -s.9;
>>>   D.3211 = D.3210 - i;
>>>   D.3210 = -s.9;
>>>   D.3212 = D.3211 / D.3210;
>>>   D.3213 = (unsigned int) D.3212;
>>>   D.3213 = i >= 0 ? D.3213 : 0;
>>>
>>>:
>>>   if (D.3209 < D.3213)
>>> goto ; [87.50%]
>>>   else
>>> goto ; [12.50%]
>>>
>>>:
>>>   a[i] = 1;
>>>   D.3215 = -s.9;
>>>   D.3219 = .GOMP_SIMT_VF ();
>>>   D.3216 = (int) D.3219;
>>>   D.3220 = D.3215 * D.3216;
>>>   i = D.3220 + i;
>>>   D.3209 = D.3209 + 1;
>>>   goto ; [100.00%]
>>> ...
>>>
>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>> on the lane that is executing) at bb entry.
>>>
>>> So we have the following sequence:
>>> - a[0..31] is set to 1
>>> - i is updated to -32..-1
>>> - D.3209 is updated to 1 (being 0 initially)
>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>   to true
>>> - bb6 is once more executed, which should not happen because all the 
>>> elements
>>>   that needed to be handled were already handled.
>>> - consequently, elements that should not be written are written
>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>   ...
>>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>   ...
>>>   and with CANARY_SIZE unmodified, we run into:
>>>   ...
>>>   Expected 0, got 1 at base[-961]
>>>   Aborted (core dumped)
>>>   ...
>>>
>>> The cause of this is as follows:
>>> - because the step s is a variable rather than a constant, an alternative
>>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>   loop condition is tested in terms of the alternative IV rather than
>>>   the original IV (i in our example).
>>> - the SIMT code in expand_omp_simd works by modifying step and initial 
>>> value.
>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>   modified by the SIMT code and then used there-after.
>>> - The step fd->loop.step is loaded into a variable step, which is is 
>>> modified
>>>   by the SIMT code, but afterwards there are uses of both step and
>>>   fd->loop.step.
>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>   which should use step instead.
>>>
>>> Fix this by introducing an additional variable orig_step, which is not
>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>> by either step or orig_step.
>>>
>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>
>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>> with driver 450.66.
>>>
>>> OK for trunk?
>>>
>>

Ping^3.

Thanks,
- Tom

>>> [omp, simt] Handle alternative IV
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-10-02  Tom de Vries  
>>>
>>> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>> fd->loop.step by either step or orig_step.
>>>
>>> libgomp/ChangeLog:
>>>
>>> 2020-10-02  Tom de Vries  
>>>
>>> * testsuite/libgomp.c/pr81778.c: New test.
>>>
>>> ---
>>>  gcc/omp-expand.c  | 11 
>>>  libgomp/testsuite/libgomp.c/pr81778.c | 48 
>>> +++
>>>  2 files changed, 54 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>> index 99cb4f9dda4..80e35ac0294 100644
>>> --- a/gcc/omp-expand.c
>>> +++ b/gcc/omp-expand.c
>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct 
>>> omp_for_data *fd)
>>>n2 = OMP_CLAUSE_DECL (innerc);
>>>  }
>>>tree step = fd->loop.step;
>>> +  tree orig_step = step; /* May be different from step if is_simt.  */
>>>  
>>>bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>>   OMP_CLAUSE__SIMT_);
>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct 
>>> omp_for_data *fd)
>>>tree altv = NULL_TREE, altn2 = NULL_TREE;
>>>if (fd->collapse == 1
>>>&& !broken_loop
>>> -  && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>> +  && TREE_CODE (orig_step) != INTEGER_CST)
>>>  {
>>>/* The vectorizer currently punts on loops with non-constant steps
>>>  for the main IV (can't compute

Re: [PATCH 1/2] bpf: align function entry point to 64 bits

2021-04-22 Thread Jose E. Marchesi via Gcc-patches



Hi YiFei.

This is OK for both trunk and GCC 10.
Thanks for the fix!

[I see you don't have a copyright transfer contract in place.  I believe
 this change, and also the patch in 2/2, are small/trivial enough to not
 require one, but if you plan to do more contributions in the future we
 will need you to do the paperwork.  I just sent you more information
 about this privately.]

> Libbpf does not treat paddings after functions well. If function
> symbols does not cover a whole text section, it will emit error
> similar to:
>
>   libbpf: sec '.text': failed to find program symbol at offset 56
>
> Each instruction in BPF is a multiple of 8 bytes, so align the
> functions to 8 bytes, similar to how clang does it.
>
> 2021-04-22  YiFei Zhu  
>
> gcc/
>
>   * config/bpf/bpf.h (FUNCTION_BOUNDARY): Set to 64.
> ---
>  gcc/config/bpf/bpf.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index ef0123b56a63..7ff2f310d583 100644
> --- a/gcc/config/bpf/bpf.h
> +++ b/gcc/config/bpf/bpf.h
> @@ -57,8 +57,8 @@
> 64-bit at any time.  */
>  #define STACK_BOUNDARY 64
>  
> -/* Function entry points are aligned to 128 bits.  */
> -#define FUNCTION_BOUNDARY 128
> +/* Function entry points are aligned to 64 bits.  */
> +#define FUNCTION_BOUNDARY 64
>  
>  /* Maximum alignment required by data of any type.  */
>  #define BIGGEST_ALIGNMENT 64

Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]

2021-04-22 Thread Richard Sandiford via Gcc-patches

Christophe Lyon via Gcc-patches  writes:
> On Wed, 21 Apr 2021 at 14:05, Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Alex Coplan  writes:
>> > Hi Richard,
>> >
>> > On 15/04/2021 18:45, Richard Sandiford wrote:
>> >> Looks good in general, but like you say, it's GCC 12 material.
>> >
>> > Thanks for the review. The attached patch addresses these comments and
>> > bootstraps/regtests OK on aarch64-linux-gnu. OK for trunk?
>>
>> OK, thanks.
>>
>
> The new test fails with -mabi=ilp32:
> sorry, unimplemented: return address signing is only supported for 
> '-mabi=lp64'
>
> Is that OK:
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c
> b/gcc/testsuite/gcc.target/aarch64/pr99988.c
> index 2d87f41..7cca496 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr99988.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target lp64 } } */
>  /* { dg-options "-O2 -mbranch-protection=standard" } */
>  /* { dg-final { scan-assembler-times {bti j} 13 } } */
>  int a;

OK, thanks.

Richard

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote:
> > How about this?
> >
> > @item general_regs_only
> > @cindex @code{general_regs_only} function attribute, x86
> > The @code{general_regs_only} function attribute informs the compiler
> > that the function uses only general purpose registers.  When the
> > compiler inlines a function with the @code{always_inline} attribute,
> > target-specific compilation options may lead to inline failures.
> > The @code{general_regs_only} attribute, if applicable, can be used
> > together with the @code{always_inline} attribute to reduce inlining
> > failure.
>
> I don't really like this attribute.
> It is very specific to what you want to solve and doesn't address the
> general problem, that always_inline means different things in different
> code, and that is a problem for many targets, not just one.
>
> As has been written, in some cases it means inline always, error
> whenever it is called indirectly which can't be optimized into a direct
> call that can be inlined and error whenever the inlining fails for other
> reasons.
>
> Another case, e.g. in the glibc fortify wrappers, is inline always
> when the call is direct or an indirect call can be optimized into a direct
> call and error when the inlining fails, but support indirect calls without
> errors.
>
> Another case, which is most of the x86/aarch64/arm etc. intrinsics, is
> inline always unless there is a target mismatch (roughly what is
> actually implemented).
>
> Because from the always_inline attribute it is impossible to determine which
> one of those it is (for the indirect calls the rule could be
> gnu_inline extern inline means indirect calls are ok, anything else means
> indirect calls are bad), we need new syntax to distinguish those cases.
>
> general_regs_only attribute doesn't seem to be it, e.g. for the glibc
> fortify wrappers cases I don't see why we should forbid using floating point
> in such inlines.
>
> So IMHO we need some new attribute for one of those, or optional parameter
> to always_inline.
>
> For the intrinsic case, ideal would be if we could record which ISA flags
> (or more generally which options) are required and which are not.  Either
> have some syntax where those would be explicitly specified in attribute (but
> frankly that would be a maintainance nightmare), or derive those from
> surrounding pragmas.  Right now we have those wrapped in
> #ifndef __AVX2__
> #pragma GCC push_options
> #pragma GCC target("avx2")
> #define __DISABLE_AVX2__
> #endif /* __AVX2__ */
>
> ...
>
> #ifdef __DISABLE_AVX2__
> #undef __DISABLE_AVX2__
> #pragma GCC pop_options
> #endif /* __DISABLE_AVX2__ */
>
> The question is if the pragma GCC target right now behaves incrementally
> or not, whether
> #pragma GCC target("avx2")
> adds -mavx2 to options if it was missing before and nothing otherwise, or if
> it switches other options off.  If it is incremental, we could e.g. try to
> use the second least significant bit of global_options_set.x_* to mean
> this option has been set explicitly by some surrounding #pragma GCC target.
> The normal tests - global_options_set.x_flag_whatever could still work
> fine because they wouldn't care if the option was explicit from anywhere
> (command line or GCC target or target attribute) and just & 2 would mean
> it was explicit from pragma GCC target; though there is the case of
> bitfields... And then the inlining decision could check the & 2 flags to
> see what is required and what is just from command line.
> Or we can have some other pragma GCC that would be like target but would
> have flags that are explicit (and could e.g. be more restricted, to ISA
> options only, and let those use in addition to #pragma GCC target.

I'm still curious as to what you think will break if always-inline does what
it is documented to do.

Richard.

> Jakub
>

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 11:02 AM Martin Liška  wrote:
>
> On 4/22/21 10:04 AM, Richard Biener wrote:
> > On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:
> >>
> >> When -flto=jobserver is used and we cannot detect job server, then we can
> >> still fallbackto -flto=N mode.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > I think this behavior needs to be documented - it falls back to a less
> > conservative (possibly system overloading) mode - which IMHO is
> > non-obvious and IMHO we shouldn't do.
>
> Sure, I'm sending corresponding patch. Note that it's quite common mistake
> that '+' is missing in Makefile rule. That was motivation for my change.

Sure, but that change won't get this fixed.  IMHO we should eventually
emit diagnostic like

warning: could not find jobserver, compiling N jobs serially

once N > 1 (or 2?).  Likewise if people just use -flto and auto-detection
finds nothing:

warning: using serial compilation of N LTRANS jobs
note: refer to http:// for how to use parallel compile

using the URL diagnostics to point to -flto=... documentation.

That is, teach users rather than second-guessing and eventually
blowing things up.  IMHO only the jobserver mode is safe to
automatically use.

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
> >> makeserver cannot be detected, then use -flto=N fallback.
> >> ---
> >>  gcc/lto-wrapper.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> >> index 03a5922f8ea..0b626d7c811 100644
> >> --- a/gcc/lto-wrapper.c
> >> +++ b/gcc/lto-wrapper.c
> >> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
> >>if (jobserver && jobserver_error != NULL)
> >> {
> >>   warning (0, jobserver_error);
> >> - parallel = 0;
> >> + /* Fall back to auto parallelism.  */
> >>   jobserver = 0;
> >> + auto_parallel = 1;
> >> }
> >>else if (!jobserver && jobserver_error == NULL)
> >> {
> >> --
> >> 2.31.1
> >>
>

Re: [PATCH][GCC 10] aarch64: Fix SVE ACLE builtins with LTO [PR99216]

2021-04-22 Thread Richard Sandiford via Gcc-patches

Alex Coplan  writes:
> Hi,
>
> Here is a backport of my fix for PR99216. The only change w.r.t the
> original patch is a bump of lto-streamer.h:LTO_minor_version.
>
> Bootstrapped and regtested on aarch64-linux-gnu, no issues.
>
> OK for GCC 10 branch?

OK, thanks.

Richard

> As discussed in the PR, we currently have two different numbering
> schemes for SVE builtins: one for C, and one for C++. This is
> problematic for LTO, where we end up getting confused about which
> intrinsic we're talking about. This patch inserts placeholders into the
> registered_functions vector to ensure that there is a consistent
> numbering scheme for both C and C++.
>
> This version uses integer_zero_node as a placeholder node instead of
> building a function decl. This is safe because the node is only returned
> by the TARGET_BUILTIN_DECL hook, which (on AArch64) is only used for
> validation when builtin decls are streamed into lto1.
>
> gcc/ChangeLog:
>
>   PR target/99216
>   * config/aarch64/aarch64-sve-builtins.cc
>   (function_builder::add_function): Add placeholder_p argument, use
>   placeholder decls if this is set.
>   (function_builder::add_unique_function): Instead of conditionally adding
>   direct overloads, unconditionally add either a direct overload or a
>   placeholder.
>   (function_builder::add_overloaded_function): Set placeholder_p if we're
>   using C++ overloads. Use the obstack for string storage instead
>   of relying on the tree nodes.
>   (function_builder::add_overloaded_functions): Don't return early for
>   m_direct_overloads: we need to add placeholders.
>   * config/aarch64/aarch64-sve-builtins.h
>   (function_builder::add_function): Add placeholder_p argument.
>   * lto-streamer.h (LTO_minor_version): Bump.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/99216
>   * g++.target/aarch64/sve/pr99216.C: New test.
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index d534ca923d9..336a1db662b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -995,12 +995,29 @@ registered_function &
>  function_builder::add_function (const function_instance ,
>   const char *name, tree fntype, tree attrs,
>   uint64_t required_extensions,
> - bool overloaded_p)
> + bool overloaded_p,
> + bool placeholder_p)
>  {
>unsigned int code = vec_safe_length (registered_functions);
>code = (code << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_SVE;
> -  tree decl = simulate_builtin_function_decl (input_location, name, fntype,
> -   code, NULL, attrs);
> +
> +  /* We need to be able to generate placeholders to enusre that we have a
> + consistent numbering scheme for function codes between the C and C++
> + frontends, so that everything ties up in LTO.
> +
> + Currently, tree-streamer-in.c:unpack_ts_function_decl_value_fields
> + validates that tree nodes returned by TARGET_BUILTIN_DECL are non-NULL 
> and
> + some node other than error_mark_node. This is a holdover from when 
> builtin
> + decls were streamed by code rather than by value.
> +
> + Ultimately, we should be able to remove this validation of BUILT_IN_MD
> + nodes and remove the target hook. For now, however, we need to appease 
> the
> + validation and return a non-NULL, non-error_mark_node node, so we
> + arbitrarily choose integer_zero_node.  */
> +  tree decl = placeholder_p
> +? integer_zero_node
> +: simulate_builtin_function_decl (input_location, name, fntype,
> +   code, NULL, attrs);
>  
>registered_function  = *ggc_alloc  ();
>rfn.instance = instance;
> @@ -1032,7 +1049,7 @@ function_builder::add_unique_function (const 
> function_instance ,
>  argument_types.address ());
>tree attrs = get_attributes (instance);
>registered_function  = add_function (instance, name, fntype, attrs,
> -required_extensions, false);
> +required_extensions, false, false);
>  
>/* Enter the function into the hash table.  */
>hashval_t hash = instance.hash ();
> @@ -1043,16 +1060,14 @@ function_builder::add_unique_function (const 
> function_instance ,
>  
>/* Also add the function under its overloaded alias, if we want
>   a separate decl for each instance of an overloaded function.  */
> -  if (m_direct_overloads || force_direct_overloads)
> +  char *overload_name = get_name (instance, true);
> +  if (strcmp (name, overload_name) != 0)
>  {
> -  char *overload_name = get_name (instance, true);
> -  if (strcmp (name, overload_name) != 0)
> - {
> -   /*

Re: [PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2021-04-22 Thread Tom de Vries

On 4/21/21 7:02 PM, Alexander Monakov wrote:
> On Wed, 21 Apr 2021, Tom de Vries wrote:
> 
>>> I don't think implementing futex_wait is possible on nvptx.
>>>
>>
>> Well, I gave it a try, attached below.  Can you explain why you think
>> it's not possible, or pinpoint a problem in the implementation?
> 
> Responding only to this for now. When I said futex_wait I really meant
> Linux futex wait, where the API is tied to a 32-bit futex control word
> and nothing else. Your implementation works with a gomp_barrier_t that
> includes more than one field. It would be confusing to call it a
> "futex wait", it is not a 1:1 replacement.
> 
> (i.e. unlike a proper futex, it can work only for gomp_barrier_t objects)

Ah, I see, agreed, that makes sense.  I was afraid there was some
fundamental problem that I overlooked.

Here's an updated version.  I've tried to make it clear that the
futex_wait/wake are locally used versions, not generic functionality.

The main change in structure is that I'm now using the
generation_to_barrier trick from the rtems port, allowing linux/bar.c to
be included rather than copied (because the barrier argument is now
implicit).

Furthermore, I've reviewed the MEMMODELs used for the atomic accesses,
and updated a few.

Also now the cpu_relax from doacross.h is used.

Thanks,
- Tom
[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

This is a WIP version that does not yet take performance into consideration,
but instead focuses on copying a working version as completely as possible,
and isolating the machine-specific changes to as few functions as
possible.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator, both as-is and with
do_spin hardcoded to 1.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.

---

Re: [PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be cleared in opts->x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute.

2021-04-22 Thread Uros Bizjak via Gcc-patches

On Thu, Apr 22, 2021 at 12:05 PM Hongtao Liu  wrote:
>
> Hi:
>   Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/100093
> * config/i386/i386-options.c (ix86_option_override_internal):
> Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags
> when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled
> by target attribute.
>
> gcc/testsuite/ChangeLog:
>
> PR target/100093
> * gcc.target/i386/pr100093.c: New test.

OK.

Thanks,
Uros.

>
>
> --
> BR,
> Hongtao

[PATCH 2/2] bpf: allow BSS symbols to be global symbols

2021-04-22 Thread YiFei Zhu via Gcc-patches

Prior to this, a BSS declaration such as:

  int foo;
  static int bar;

Generates:

  .global foo
  .local  foo
  .comm   foo,4,4
  .local  bar
  .comm   bar,4,4

Creating symbols:

   b foo
  0004 b bar

Both symbols are local. However, libbpf bpf_object__variable_offset
rquires symbols to be STB_GLOBAL & STT_OBJECT for data section lookup.
This patch makes the same declaration generate:

  .global foo
  .type   foo, @object
  .lcomm  foo,4,4
  .local  bar
  .comm   bar,4,4

Creating symbols:

   B foo
  0004 b bar

And libbpf will be okay with looking up the global symbol "foo".

2021-04-22  YiFei Zhu  

gcc/

* config/bpf/bpf.h (ASM_OUTPUT_ALIGNED_BSS): Use .type and .lcomm.
---
 gcc/config/bpf/bpf.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 7ff2f310d583..55beecbcb364 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -414,9 +414,15 @@ enum reg_class
Try to use asm_output_aligned_bss to implement this macro.  */
 
 #define ASM_OUTPUT_ALIGNED_BSS(FILE, DECL, NAME, SIZE, ALIGN)  \
-  do { \
-ASM_OUTPUT_ALIGNED_LOCAL (FILE, NAME, SIZE, ALIGN);\
-  } while (0)
+  do   \
+{  \
+  ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "object");
\
+  fprintf ((FILE), "%s", "\t.lcomm\t");\
+  assemble_name ((FILE), (NAME));  \
+  fprintf ((FILE), "," HOST_WIDE_INT_PRINT_UNSIGNED ",%u\n",   \
+  (SIZE), (ALIGN) / BITS_PER_UNIT);\
+}  \
+  while (0)
 
 /*** Output and Generation of Labels.  */
 
-- 
2.31.1

[PATCH 1/2] bpf: align function entry point to 64 bits

2021-04-22 Thread YiFei Zhu via Gcc-patches

Libbpf does not treat paddings after functions well. If function
symbols does not cover a whole text section, it will emit error
similar to:

  libbpf: sec '.text': failed to find program symbol at offset 56

Each instruction in BPF is a multiple of 8 bytes, so align the
functions to 8 bytes, similar to how clang does it.

2021-04-22  YiFei Zhu  

gcc/

* config/bpf/bpf.h (FUNCTION_BOUNDARY): Set to 64.
---
 gcc/config/bpf/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index ef0123b56a63..7ff2f310d583 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -57,8 +57,8 @@
64-bit at any time.  */
 #define STACK_BOUNDARY 64
 
-/* Function entry points are aligned to 128 bits.  */
-#define FUNCTION_BOUNDARY 128
+/* Function entry points are aligned to 64 bits.  */
+#define FUNCTION_BOUNDARY 64
 
 /* Maximum alignment required by data of any type.  */
 #define BIGGEST_ALIGNMENT 64
-- 
2.31.1

[PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be cleared in opts->x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute.

2021-04-22 Thread Hongtao Liu via Gcc-patches

Hi:
  Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
  Ok for trunk?

gcc/ChangeLog:

PR target/100093
* config/i386/i386-options.c (ix86_option_override_internal):
Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags
when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled
by target attribute.

gcc/testsuite/ChangeLog:

PR target/100093
* gcc.target/i386/pr100093.c: New test.


-- 
BR,
Hongtao
From 6944526e088df6c4ef73ab797ee90f99c0ad566a Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Fri, 16 Apr 2021 11:29:10 +0800
Subject: [PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be
 cleared in opts->x_target_flags when
 X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute.

gcc/ChangeLog:

	PR target/100093
	* config/i386/i386-options.c (ix86_option_override_internal):
	Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags
	when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled
	by target attribute.

gcc/testsuite/ChangeLog:

	PR target/100093
	* gcc.target/i386/pr100093.c: New test.
---
 gcc/config/i386/i386-options.c   |  7 +++
 gcc/testsuite/gcc.target/i386/pr100093.c | 12 
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100093.c

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 91da2849c49..ba5f275e1e3 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2853,9 +2853,16 @@ ix86_option_override_internal (bool main_args_p,
   if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]
   && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
 opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
+  else if (!main_args_p
+	   && ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL])
+opts->x_target_flags &= ~MASK_AVX256_SPLIT_UNALIGNED_LOAD;
+
   if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL]
   && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
+  else if (!main_args_p
+	   && ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL])
+opts->x_target_flags &= ~MASK_AVX256_SPLIT_UNALIGNED_STORE;
 
   /* Enable 128-bit AVX instruction generation
  for the auto-vectorizer.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr100093.c b/gcc/testsuite/gcc.target/i386/pr100093.c
new file mode 100644
index 000..f32a4bc8f2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100093.c
@@ -0,0 +1,12 @@
+/* PR target/100093  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=znver1" } */
+/* { dg-final { scan-assembler-not "vextractf128" } } */
+
+__attribute__((target("tune=skylake-avx512")))
+void fill_avx2(double *__restrict__ data, int n, double value)
+{
+for (int i = 0; i < n * 16; i++) {
+data[i] = value;
+}
+}
-- 
2.18.1

Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check

2021-04-22 Thread Jonathan Wakely via Gcc-patches


On 21/04/21 18:29 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

NOTE - This patch also needs to be backported to gcc-11 in order for
semaphore release() to work correctly on non-futex platforms.

Tested sparc-sun-solaris2.11

For types that track whether or not there extant waiters (e.g.
semaphore) internally, the __atomic_notify_address_bare() call was
introduced to avoid the overhead of loading the atomic count of
waiters. For platforms that don't have Futex, however, there was
still a check for waiters, and seeing that there are none (because
in the bare case, the count is not incremented), the notification
is dropped. This commit addresses that case.

libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h: Always notify waiters in the
in the case of 'bare' address notification.


Repeated text: "in the in the"



---
libstdc++-v3/include/bits/atomic_wait.h | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 0ac5575190c..984ed70f16c 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }

  void
-  _M_notify(const __platform_wait_t* __addr, bool __all) noexcept
+  _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) 
noexcept
  {
-   if (!_M_waiting())
+   if (!(__bare || _M_waiting()))


Maybe it's just me, but I would find (!__base && !__waiting) to be a
clearer expression of the logic here.

i.e. "return if the wait is not bare and there are no waiters"
rather than "return if the wait is not either bare or has waiters".
The latter makes me take a second to grok.

The patch is OK either way though, with the ChangeLog typo fix.



  return;

#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
@@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }

void
-   _M_notify(bool __all)
+   _M_notify(bool __all, bool __bare = false)
{
  if (_M_addr == &_M_w._M_ver)
__atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
- _M_w._M_notify(_M_addr, __all);
+ _M_w._M_notify(_M_addr, __all, __bare);
}

template

[Patch, committed] gfortran.dg/pr68078.f90: Avoid increasing RLIMIT_AS

2021-04-22 Thread Tobias Burnus


The test did fail when the virtual memory already had a hard limit !=
unlimited.

Committed as r12-57-gfaf7d413a3f3337be1a3ac5cdf33e0e3b87b426e

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
commit faf7d413a3f3337be1a3ac5cdf33e0e3b87b426e
Author: Tobias Burnus 
Date:   Thu Apr 22 11:05:17 2021 +0200

gfortran.dg/pr68078.f90: Avoid increasing RLIMIT_AS

pr68078.f90 tests out-of-memory handling and calls set_vm_limit to set the
soft limit.  However, setrlimit was then called with hard limit RLIM_INFINITY,
which failed when the current hard limit was lower.

gcc/testsuite/
* gfortran.dg/set_vm_limit.c (set_vm_limit): Call getrlimit, use
obtained hard limit, and only call setrlimit if new softlimit is lower.

diff --git a/gcc/testsuite/gfortran.dg/set_vm_limit.c b/gcc/testsuite/gfortran.dg/set_vm_limit.c
index 30c4b43e0ed..8344f6fd079 100644
--- a/gcc/testsuite/gfortran.dg/set_vm_limit.c
+++ b/gcc/testsuite/gfortran.dg/set_vm_limit.c
@@ -8,9 +8,20 @@
 void
 set_vm_limit (int vm_limit)
 {
-  struct rlimit rl = { vm_limit, RLIM_INFINITY };
+  struct rlimit rl;
   int r;
 
+  r = getrlimit (RLIMIT_AS, );
+  if (r)
+{
+  perror ("get_vm_limit");
+  exit (1);
+}
+
+  if (vm_limit >= rl.rlim_cur)
+return;
+
+  rl.rlim_cur = vm_limit;
   r = setrlimit (RLIMIT_AS, );
   if (r)
 {

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Martin Liška

On 4/22/21 10:04 AM, Richard Biener wrote:
> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:
>>
>> When -flto=jobserver is used and we cannot detect job server, then we can
>> still fallbackto -flto=N mode.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> I think this behavior needs to be documented - it falls back to a less
> conservative (possibly system overloading) mode - which IMHO is
> non-obvious and IMHO we shouldn't do.

Sure, I'm sending corresponding patch. Note that it's quite common mistake
that '+' is missing in Makefile rule. That was motivation for my change.

Martin

> 
> Richard.
> 
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
>> makeserver cannot be detected, then use -flto=N fallback.
>> ---
>>  gcc/lto-wrapper.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
>> index 03a5922f8ea..0b626d7c811 100644
>> --- a/gcc/lto-wrapper.c
>> +++ b/gcc/lto-wrapper.c
>> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
>>if (jobserver && jobserver_error != NULL)
>> {
>>   warning (0, jobserver_error);
>> - parallel = 0;
>> + /* Fall back to auto parallelism.  */
>>   jobserver = 0;
>> + auto_parallel = 1;
>> }
>>else if (!jobserver && jobserver_error == NULL)
>> {
>> --
>> 2.31.1
>>

>From b81f5288fc81256e63d44e00793e2732eb2a2d88 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 22 Apr 2021 10:59:51 +0200
Subject: [PATCH] Document -flto=jobserver fallback.

gcc/ChangeLog:

	* doc/invoke.texi: Document what happens when GCC can't detect
	make's job server with -flto=jobserver.
---
 gcc/doc/invoke.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e98b0962b9f..1cfff4c14a9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12340,6 +12340,9 @@ Use @option{-flto=auto} to use GNU make's job server, if available,
 or otherwise fall back to autodetection of the number of CPU threads
 present in your system.
 
+If you specify @option{-flto=jobserver} and make's job server can't be detected,
+then well fall back to the number of CPU threads present in your system.
+
 @item -flto-partition=@var{alg}
 @opindex flto-partition
 Specify the partitioning algorithm used by the link-time optimizer.
-- 
2.31.1

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Jakub Jelinek via Gcc-patches

On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote:
> How about this?
> 
> @item general_regs_only
> @cindex @code{general_regs_only} function attribute, x86
> The @code{general_regs_only} function attribute informs the compiler
> that the function uses only general purpose registers.  When the
> compiler inlines a function with the @code{always_inline} attribute,
> target-specific compilation options may lead to inline failures.
> The @code{general_regs_only} attribute, if applicable, can be used
> together with the @code{always_inline} attribute to reduce inlining
> failure.

I don't really like this attribute.
It is very specific to what you want to solve and doesn't address the
general problem, that always_inline means different things in different
code, and that is a problem for many targets, not just one.

As has been written, in some cases it means inline always, error
whenever it is called indirectly which can't be optimized into a direct
call that can be inlined and error whenever the inlining fails for other
reasons.

Another case, e.g. in the glibc fortify wrappers, is inline always
when the call is direct or an indirect call can be optimized into a direct
call and error when the inlining fails, but support indirect calls without
errors.

Another case, which is most of the x86/aarch64/arm etc. intrinsics, is
inline always unless there is a target mismatch (roughly what is
actually implemented).

Because from the always_inline attribute it is impossible to determine which
one of those it is (for the indirect calls the rule could be
gnu_inline extern inline means indirect calls are ok, anything else means
indirect calls are bad), we need new syntax to distinguish those cases.

general_regs_only attribute doesn't seem to be it, e.g. for the glibc
fortify wrappers cases I don't see why we should forbid using floating point
in such inlines.

So IMHO we need some new attribute for one of those, or optional parameter
to always_inline.

For the intrinsic case, ideal would be if we could record which ISA flags
(or more generally which options) are required and which are not.  Either
have some syntax where those would be explicitly specified in attribute (but
frankly that would be a maintainance nightmare), or derive those from
surrounding pragmas.  Right now we have those wrapped in
#ifndef __AVX2__
#pragma GCC push_options
#pragma GCC target("avx2")
#define __DISABLE_AVX2__
#endif /* __AVX2__ */

...

#ifdef __DISABLE_AVX2__
#undef __DISABLE_AVX2__
#pragma GCC pop_options
#endif /* __DISABLE_AVX2__ */

The question is if the pragma GCC target right now behaves incrementally
or not, whether
#pragma GCC target("avx2")
adds -mavx2 to options if it was missing before and nothing otherwise, or if
it switches other options off.  If it is incremental, we could e.g. try to
use the second least significant bit of global_options_set.x_* to mean
this option has been set explicitly by some surrounding #pragma GCC target.
The normal tests - global_options_set.x_flag_whatever could still work
fine because they wouldn't care if the option was explicit from anywhere
(command line or GCC target or target attribute) and just & 2 would mean
it was explicit from pragma GCC target; though there is the case of
bitfields... And then the inlining decision could check the & 2 flags to
see what is required and what is just from command line.
Or we can have some other pragma GCC that would be like target but would
have flags that are explicit (and could e.g. be more restricted, to ISA
options only, and let those use in addition to #pragma GCC target.

Jakub

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Martin Liška

There's an updated version of the patch, Jonathan noticed correctly
the comment related to assert was not correct.

Martin
>From e035fd0549ea17ab4f8d71488f577fd1e4077fd9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 12 Mar 2021 14:32:07 +0100
Subject: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

gcc/cp/ChangeLog:

	* cp-tree.h (STATIC_ASSERT): Prefer static assert.
	* lex.c (init_operators): Remove run-time check.
---
 gcc/cp/cp-tree.h | 3 +++
 gcc/cp/lex.c | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 23a77a2b2e0..cb254e0adea 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5922,6 +5922,9 @@ enum ovl_op_code {
   OVL_OP_MAX
 };
 
+/* Make sure it fits in lang_decl_fn::ovl_op_code. */
+STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
+
 struct GTY(()) ovl_op_info_t {
   /* The IDENTIFIER_NODE for the operator.  */
   tree identifier;
diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
index 73e14b8394c..43abd019e6e 100644
--- a/gcc/cp/lex.c
+++ b/gcc/cp/lex.c
@@ -166,8 +166,6 @@ init_operators (void)
 
   if (op_ptr->name)
 	{
-	  /* Make sure it fits in lang_decl_fn::operator_code. */
-	  gcc_checking_assert (op_ptr->ovl_op_code < (1 << 6));
 	  tree ident = set_operator_ident (op_ptr);
 	  if (unsigned index = IDENTIFIER_CP_INDEX (ident))
 	{
-- 
2.31.1

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Jonathan Wakely via Gcc-patches

On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote:

> On 4/21/21 6:11 PM, Martin Sebor wrote:
> > On 4/21/21 2:15 AM, Martin Liška wrote:
> >> Hello.
> >>
> >> It's addressing the following Clang warning:
> >> cp/lex.c:170:45: warning: result of comparison of constant 64 with
> expression of type 'enum ovl_op_code' is always true
> [-Wtautological-constant-out-of-range-compare]
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >> Thanks,
> >> Martin
> >>
> >> gcc/cp/ChangeLog:
> >>
> >> * cp-tree.h (STATIC_ASSERT): Prefer static assert.
> >> * lex.c (init_operators): Remove run-time check.
> >> ---
> >>   gcc/cp/cp-tree.h | 3 +++
> >>   gcc/cp/lex.c | 2 --
> >>   2 files changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> >> index 81ff375f8a5..a8f72448ea9 100644
> >> --- a/gcc/cp/cp-tree.h
> >> +++ b/gcc/cp/cp-tree.h
> >> @@ -5916,6 +5916,9 @@ enum ovl_op_code {
> >> OVL_OP_MAX
> >>   };
> >>   +/* Make sure it fits in lang_decl_fn::operator_code. */
> >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
> >> +
> >
> > I wonder if there's a way to test this directly by something like
> >
> >   static_assert (number-of-bits (ovl_op_info_t::ovl_op_code)
> >  <= number-of-bits (lang_decl_fn::operator_code));
>
> Good point, but I'm not aware of it. Maybe C++ people can chime in?
>

ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class")
with no fixed underlying type (i.e. no enum-base like ": int" or ": long"
is specified) which means that the number of bits in is value
representation is the number of bits needed to represent the minimum and
maximum enumerators:

"the values of the enumeration are the values representable by a
hypothetical integer type with minimal width M such that all enumerators
can be represented."

There is no function/utility like number-of-bits that can tell you that
from the type though.You could use std::underlying_type::type
to get the integral type that the compiler used to represent it, but that
will probably be 'int' in this case and so all it tells you is an upper
bound of no more than 32 bits, which is not useful for this purpose.

Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 4:29 AM Thomas Rodgers
 wrote:
>
> From: Thomas Rodgers 
>
> NOTE - This patch also needs to be backported to gcc-11 in order for
> semaphore release() to work correctly on non-futex platforms.
>
> Tested sparc-sun-solaris2.11
>
> For types that track whether or not there extant waiters (e.g.
> semaphore) internally, the __atomic_notify_address_bare() call was
> introduced to avoid the overhead of loading the atomic count of
> waiters. For platforms that don't have Futex, however, there was
> still a check for waiters, and seeing that there are none (because
> in the bare case, the count is not incremented), the notification
> is dropped. This commit addresses that case.

This is OK for the GCC 11 branch if approved for trunk.

Richard.

> libstdc++-v3/ChangeLog:
> * include/bits/atomic_wait.h: Always notify waiters in the
> in the case of 'bare' address notification.
> ---
>  libstdc++-v3/include/bits/atomic_wait.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
> b/libstdc++-v3/include/bits/atomic_wait.h
> index 0ac5575190c..984ed70f16c 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>}
>
>void
> -  _M_notify(const __platform_wait_t* __addr, bool __all) noexcept
> +  _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) 
> noexcept
>{
> -   if (!_M_waiting())
> +   if (!(__bare || _M_waiting()))
>   return;
>
>  #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> @@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   }
>
> void
> -   _M_notify(bool __all)
> +   _M_notify(bool __all, bool __bare = false)
> {
>   if (_M_addr == &_M_w._M_ver)
> __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
> - _M_w._M_notify(_M_addr, __all);
> + _M_w._M_notify(_M_addr, __all, __bare);
> }
>
> template @@ -452,7 +452,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
>  {
>__detail::__bare_wait __w(__addr);
> -  __w._M_notify(__all);
> +  __w._M_notify(__all, true);
>  }
>
>// This call is to be used by atomic types which track contention 
> externally
> @@ -464,7 +464,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  __detail::__platform_notify(__addr, __all);
>  #else
>  __detail::__bare_wait __w(__addr);
> -__w._M_notify(__all);
> +__w._M_notify(__all, true);
>  #endif
>}
>  _GLIBCXX_END_NAMESPACE_VERSION
> --
> 2.30.2
>

Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 3:01 AM H.J. Lu  wrote:
>
> On Wed, Apr 21, 2021 at 4:24 PM Martin Sebor  wrote:
> >
> > On 4/21/21 2:58 PM, H.J. Lu wrote:
> > > On Wed, Apr 21, 2021 at 10:09 AM Martin Sebor  wrote:
> > >>
> > >> On 4/14/21 4:39 PM, H.J. Lu wrote:
> > >>> commit 87c753ac241f25d222d46ba1ac66ceba89d6a200
> > >>> Author: H.J. Lu 
> > >>> Date:   Fri Aug 21 09:42:49 2020 -0700
> > >>>
> > >>>   x86: Add target("general-regs-only") function attribute
> > >>>
> > >>> is incomplete since it is impossible to call integer intrinsics from
> > >>> a function with general-regs-only target attribute.
> > >>>
> > >>> 1. Add general_regs_only function attribute to inform the compiler that
> > >>> functions use only general purpose registers.  When making inlining
> > >>> decisions on such functions, non-GPR compiler options are excluded.
> > >>> 2. Add general_regs_only attribute to x86 intrinsics which use only
> > >>> general purpose registers.
> > >>>
> > >> ...
> > >>> --- a/gcc/doc/extend.texi
> > >>> +++ b/gcc/doc/extend.texi
> > >>> @@ -7066,6 +7066,11 @@ On x86 targets, the @code{fentry_section} 
> > >>> attribute sets the name
> > >>>of the section to record function entry instrumentation calls in when
> > >>>enabled with @option{-pg -mrecord-mcount}
> > >>>
> > >>> +@item general_regs_only
> > >>> +@cindex @code{general_regs_only} function attribute, x86
> > >>> +The @code{general_regs_only} attribute on functions is used to
> > >>> +inform the compiler that functions use only general purpose registers.
> > >>
> > >> I'll just reiterate basically the same comment as before: it's not
> > >> clear from the very brief description above what the requirements
> > >> are for using the attribute.  I'm guessing it can be applied to
> > >> any function (inline or otherwise) but only has any effect when
> > >> the function is actually inlined and otherwise doesn't constrain
> > >> what the function can do.  (Whatever the constraints are, I think
> > >> the manual should spell them out, and likewise for its effects.)
> > >
> > > That is correct.
> > >
> > >> Similarly it's not clear what should be expected when the function
> > >> does use some other register.  Ideally, I think GCC would check and
> > >> issue a nice error message whether or not the function is inlined
> > >> or called.  I suspect that might only be possible for inline
> > >> functions that are actually called and for which the back end must
> > >> emit code.
> > >
> > > This is what GCC does today:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744
> >
> > Yes, that's the rather obscure error I think I commented on before
> > and suggested should be improved.  Based on r99744-3.c I don't think
> > this has changed in the improved patch.
>
> My goal is to fix the inline failures, not to improve the compiler error
> message.
>
> > >
> > >> Other than that, I'd suggest to improve the phrasing a bit:
> > >>
> > >> The @code{general_regs_only} function attribute indicates that
> > >> the function uses only general purpose registers... [text
> > >> explaining constraints and errors follows].
> > >>
> > >> Martin
> > >
> > > How about this
> > >
> > > @item general_regs_only
> > > @cindex @code{general_regs_only} function attribute, x86
> > > The @code{general_regs_only} attribute on functions is used to inform
> > > the compiler that functions use only general purpose registers.  It
> > > can be used together with the @code{always_inline} attribute to avoid
> > > inlining failure when there is a mismatch in compiler vector options.
> >
> > Without an article the part "that functions use only general purpose
> > registers" is unclear and/or grammatically incorrect.  What functions?
> > If the function the attribute is applied to, it needs an article, e.g.,
> > "the function" or "a function", and singular.  (Otherwise it could be
> > read as talking about the functions called from the one with
> > the attribute, or some other functions altogether).
> >
> > I tried to correct that above but, if you prefer, the following would
> > be closer to your phrasing but more correct/accurate:
> >
> >The @code{general_regs_only} function attribute informs
> >the compiler that the function uses only general purpose
> >registers.
> >
> > I don't understand what the second sentence is trying to say, and
> > without a better error message for the problem in r99744, I suspect
> > few users will either.  I am suggesting to explain in the text you
> > are adding, under what conditions inlining might fail without
> > the attribute, and what effect the attribute has on the function
> > that prevents the inlining failure.
>
> How about this?
>
> @item general_regs_only
> @cindex @code{general_regs_only} function attribute, x86
> The @code{general_regs_only} function attribute informs the compiler
> that the function uses only general purpose registers.  When the
> compiler inlines a function with the @code{always_inline}

Re: [PATCH 0/3] VAX backend preparatory updates for switching to LRA

2021-04-22 Thread Richard Biener via Gcc-patches

On Thu, Apr 22, 2021 at 1:22 AM Maciej W. Rozycki  wrote:
>
> Hi,
>
>  According to the plan discussed in the context of the recent switch to
> MODE_CC of the VAX backend I have been looking into switching the backend
> to LRA as well.
>
>  It has turned out quite straightforward itself, with just a couple of
> minor issues triggered with a flip to LRA, one causing a build failure
> with target libatomic and another causing a C testsuite regression.  Also
> I have come across a piece of dead code which has never ever been used for
> anything and it is unclear to me what its intended purpose was.
>
>  I have come up with this small patch series then, bundled together for
> easier reference although the individual changes are independent from each
> other.
>
>  I think 3/3 is worth backporting to GCC 11 at one point, perhaps 11.2, so
> that it can be easily picked downstream, as it improves code generation
> with old reload and we may not have another major release still using it.
>
>  OTOH switching to LRA regresses code generation seriously, by making the
> indexed and indirect VAX address modes severely underutilised, so while
> with these changes in place the backend can be switched to LRA with just a
> trivial to remove the redefinition of TARGET_LRA_P, I think it is not yet
> the right time to do it.
>
>  It is not a hard show-stopper though, so while I plan to look into LRA
> now to figure out what is missing there that the old reload has to satisfy
> the VAX backend, the switch to LRA can now be made anytime if so required
> and I am preempted for whatever reason (and nobody else gets to it).
>
>  Questions, comments, OK to apply?

Sounds like a reasonable stance to me.  The patches look all good, thus
they are OK to apply.

Thanks,
Richard.

>
>   Maciej

Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.

2021-04-22 Thread Richard Biener via Gcc-patches

On Wed, Apr 21, 2021 at 3:08 PM Martin Liška  wrote:
>
> When -flto=jobserver is used and we cannot detect job server, then we can
> still fallbackto -flto=N mode.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

I think this behavior needs to be documented - it falls back to a less
conservative (possibly system overloading) mode - which IMHO is
non-obvious and IMHO we shouldn't do.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the
> makeserver cannot be detected, then use -flto=N fallback.
> ---
>  gcc/lto-wrapper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 03a5922f8ea..0b626d7c811 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[])
>if (jobserver && jobserver_error != NULL)
> {
>   warning (0, jobserver_error);
> - parallel = 0;
> + /* Fall back to auto parallelism.  */
>   jobserver = 0;
> + auto_parallel = 1;
> }
>else if (!jobserver && jobserver_error == NULL)
> {
> --
> 2.31.1
>

Re: [Patch] libgomp/testsuite: Fix checks for dg-excess-errors

2021-04-22 Thread Richard Biener

On Wed, 21 Apr 2021, Tobias Burnus wrote:

> This was brought up by Richard when testing libgomp with the GCC 11
> distribution
> compiler, which has both nvptx and gcn enabled – but no offloading device was
> available.
> 
> This lead to fails for:
> * testsuite/libgomp.c-c++-common/function-not-offloaded.c:
>   → function (on purpose) not marked for offline
>   → issue: dg-excess-errors by lto1/mkoffload not ignored
> * testsuite/libgomp.c-c++-common/variable-not-offloaded.c
>   → likewise for variables, except that this would succeed
>   with unified-shared memory
>   → issue: dg-excess-errors by lto1/mkoffload not ignored
> * libgomp/testsuite/libgomp.c/pr86416-1.c + *-2.c
>   → expected fail when offloading compiler does not support
>   long double or float128
>   → issue: error output but dg-error/dg-excess-error not active
> 
> The reason for those fails is that used effective target checks
> do not cover those cases correctly. Namely:
> 
> * offload_device - check whether an offloading device is
>   available & used at run time.
>   → problem: compilation was done for the device but here
>   no device is available
> * offload_device_nonshared_as
>   → Likewise and same issue, except that also 0 is returned
>   if the device has unified shared memory.
> 
> Solution: Explicitly check for nvptx/gcn offload compilation
> (but ignore device availability).
> [Uses 'gcc -v |grep ^OFFLOAD_TARGET_NAMES=', which honors
> -foffload=, including -foffload=disable.]
> 
> OK for mainline and GCC 11?
> 
> 
> Tested without offloading support & with nvptx &
> offload support (with an offloading device available).
> 
> @Richard: I would be happy if you could confirm that this fixes
> your issue.

So with this I'm down to

=== libgomp tests ===


Running target unix
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess 
errors)
FAIL: libgomp.c/target-28.c (test for excess errors)
UNRESOLVED: libgomp.c/target-28.c compilation failed to produce executable
XPASS: libgomp.c++/../libgomp.c-c++-common/pr96390.c (test for excess 
errors)
XPASS: libgomp.c++/pr96390.C (test for excess errors)
FAIL: libgomp.c++/target-13.C (test for excess errors)
UNRESOLVED: libgomp.c++/target-13.C compilation failed to produce 
executable

=== libgomp Summary ===

# of expected passes8411
# of unexpected failures2
# of unexpected successes   3

which is nice.  The target-28.c FAIL is

spawn -ignore SIGHUP 
/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/./gcc/xgcc 
-B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/./gcc/ 
-B/usr/x86_64-suse-linux/bin/ -B/usr/x86_64-suse-linux/lib/ -isystem 
/usr/x86_64-suse-linux/include -isystem /usr/x86_64-suse-linux/sys-include 
-fchecking=1 ../../../../libgomp/testsuite/libgomp.c/target-28.c 
-B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/
 
-B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/.libs
 
-I/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp
 
-I../../../../libgomp/testsuite/../../include 
-I../../../../libgomp/testsuite/.. -Lno -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never 
-B/usr/lib64/gcc/x86_64-suse-linux/11 -B/usr/bin 
-B/usr/lib64/gcc/x86_64-suse-linux/11 -B/usr/bin -fopenmp -O2 
-L/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/.libs
 
-lm -o ./target-28.exe^M
^[[1m/tmp/ccCLBCvn.mkoffload.2.s:233:2: ^[[0m^[[0;1;31merror: 
^[[0m^[[1ms.3 changed binding to STB_GLOBAL^M
^[[0m.global s.3^M
^[[0;1;32m^^M
^[[0m^[[1m/tmp/ccCLBCvn.mkoffload.2.s:234:2: ^[[0m^[[0;1;31merror: 
^[[0m^[[1ms.2 changed binding to STB_GLOBAL^M
^[[0m.global s.2^M
^[[0;1;32m^^M
^[[0mmkoffload: fatal error: 
/usr/bin//x86_64-suse-linux-accel-amdgcn-amdhsa-gcc-11 returned 1 exit 
status^M
compilation terminated.^M
lto-wrapper: fatal error: 
/usr/lib64/gcc/x86_64-suse-linux/11//accel/amdgcn-amdhsa/mkoffload 
returned 1 exit status^M

and the target-13.C one is similar.

Richard.

Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.

2021-04-22 Thread Martin Liška

On 4/21/21 6:11 PM, Martin Sebor wrote:
> On 4/21/21 2:15 AM, Martin Liška wrote:
>> Hello.
>>
>> It's addressing the following Clang warning:
>> cp/lex.c:170:45: warning: result of comparison of constant 64 with 
>> expression of type 'enum ovl_op_code' is always true 
>> [-Wtautological-constant-out-of-range-compare]
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
>>
>> gcc/cp/ChangeLog:
>>
>> * cp-tree.h (STATIC_ASSERT): Prefer static assert.
>> * lex.c (init_operators): Remove run-time check.
>> ---
>>   gcc/cp/cp-tree.h | 3 +++
>>   gcc/cp/lex.c | 2 --
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>> index 81ff375f8a5..a8f72448ea9 100644
>> --- a/gcc/cp/cp-tree.h
>> +++ b/gcc/cp/cp-tree.h
>> @@ -5916,6 +5916,9 @@ enum ovl_op_code {
>>     OVL_OP_MAX
>>   };
>>   +/* Make sure it fits in lang_decl_fn::operator_code. */
>> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6));
>> +
> 
> I wonder if there's a way to test this directly by something like
> 
>   static_assert (number-of-bits (ovl_op_info_t::ovl_op_code)
>  <= number-of-bits (lang_decl_fn::operator_code));

Good point, but I'm not aware of it. Maybe C++ people can chime in?

> 
> Also, since we are now compiling in C++ 11 mode, would using
> static_assert be appropriate?

We do:

#if __cplusplus >= 201103L
#define STATIC_ASSERT(X) \
  static_assert ((X), #X)
#else
#define STATIC_ASSERT(X) \
  typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
#endif

Btw. I'm going to send a patch with remove the #else branch.

Martin

> 
> Martin
> 
> 
>>   struct GTY(()) ovl_op_info_t {
>>     /* The IDENTIFIER_NODE for the operator.  */
>>     tree identifier;
>> diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
>> index 73e14b8394c..43abd019e6e 100644
>> --- a/gcc/cp/lex.c
>> +++ b/gcc/cp/lex.c
>> @@ -166,8 +166,6 @@ init_operators (void)
>>       if (op_ptr->name)
>>   {
>> -  /* Make sure it fits in lang_decl_fn::operator_code. */
>> -  gcc_checking_assert (op_ptr->ovl_op_code < (1 << 6));
>>     tree ident = set_operator_ident (op_ptr);
>>     if (unsigned index = IDENTIFIER_CP_INDEX (ident))
>>   {
>>
>

Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]

2021-04-22 Thread Christophe Lyon via Gcc-patches

On Wed, 21 Apr 2021 at 14:05, Richard Sandiford via Gcc-patches
 wrote:
>
> Alex Coplan  writes:
> > Hi Richard,
> >
> > On 15/04/2021 18:45, Richard Sandiford wrote:
> >> Looks good in general, but like you say, it's GCC 12 material.
> >
> > Thanks for the review. The attached patch addresses these comments and
> > bootstraps/regtests OK on aarch64-linux-gnu. OK for trunk?
>
> OK, thanks.
>

The new test fails with -mabi=ilp32:
sorry, unimplemented: return address signing is only supported for '-mabi=lp64'

Is that OK:
diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c
b/gcc/testsuite/gcc.target/aarch64/pr99988.c
index 2d87f41..7cca496 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr99988.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target lp64 } } */
 /* { dg-options "-O2 -mbranch-protection=standard" } */
 /* { dg-final { scan-assembler-times {bti j} 13 } } */
 int a;

Thanks,

Christophe

Re: [PATCH] Use hardware_concurrency only if _GLIBCXX_HAS_GTHREADS

2021-04-22 Thread Martin Liška

On 4/21/21 10:16 PM, Jakub Jelinek wrote:
> On Wed, Apr 21, 2021 at 08:53:54PM +0100, Jonathan Wakely wrote:
 What would be IMHO a good idea would be to use configure test for
 #include 
 int t = std::thread::hardware_concurrency ();
 and in that case use that as a fallback to the previous implementation,
 that will be strictly an improvement.
>>>
>>> That does not make sense to me. Original motivation was removal of the 
>>> complicated
>>> implementation with a C++ standard function.
> 
> The reason it is larger and more complicated is that it answers a different
> question than std::thread::hardware_concurrency().
> hardware_concurrency() says how many threads the hw has available, if we
> wanted just that, you could just keep the sysconf call.
> The current implementation answers the question how many threads can this
> process and its children be scheduled on.
> Consider some NUMA box with hundreds of threads, but where it might
> be desirable to keep specific compilations limited to some cores and do
> other compilations or whatever else on other cores.
> 
> Sure, hardware_concurrency() answer is better than no information about the
> concurrency at all.  But I don't really see the costs of the current
> implementation, it is < 100 lines of code that have been written once and
> it is unlikely they will need changing - at this point the interfaces are
> fairly stable, haven't been changed on the glibc side for years.  It is true
> that when it was introduced in glibc, there have been 2 big changes to it
> soon.

All right, I understand that hardware_concurrency doesn't match exactly our 
needs.
Thus I'm going to install the following patch that removes the FIXME.

Thanks for help with that,
Martin

> 
>   Jakub
> 

>From dbe87a4229126829e0a912e9928c82b8dde7b51c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 22 Apr 2021 09:14:28 +0200
Subject: [PATCH] Remove not feasible FIXME

gcc/ChangeLog:

	* lto-wrapper.c: Remove FIXME about usage of
	hardware_concurrency. The function is not on par with
	what we have now.
---
 gcc/lto-wrapper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 0b626d7c811..49894e4fc61 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1304,8 +1304,6 @@ init_num_threads (void)
 #endif
 }
 
-/* FIXME: once using -std=c++11, we can use std::thread::hardware_concurrency.  */
-
 /* Test and return reason why a jobserver cannot be detected.  */
 
 static const char *
-- 
2.31.1

Re: [PATCH] Use std::thread::hardware_concurrency in lto-wrapper.c.

2021-04-22 Thread Eric Botcazou

> Anyway, that's why we have a C++ ISO standard and so
> std::thread::hardware_concurrency function implementation can (and likely
> should) handle all this. Or?

This is a compiler though, i.e. quite low level in the software stack, so the 
implementation must be conservative and thus fancy C++ features avoided.

-- 
Eric Botcazou

Re: [PATCH] Use hardware_concurrency only if _GLIBCXX_HAS_GTHREADS

2021-04-22 Thread Martin Liška

On 4/21/21 9:53 PM, Jonathan Wakely wrote:
> On Wed, 21 Apr 2021 at 20:41, Martin Liška  wrote:
>>
>> On 4/21/21 9:15 PM, Jakub Jelinek wrote:
>>> On Wed, Apr 21, 2021 at 08:28:55PM +0200, Jakub Jelinek via Gcc-patches 
>>> wrote:
> There's a patch attempt for the problem with 
> std::thread::hardware_concurrency where
> it's used only if _GLIBCXX_HAS_GTHREADS is set.
>
> Does it help?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> PR bootstrap/100186
> * lto-wrapper.c: Use hardware_concurrency only if
> _GLIBCXX_HAS_GTHREADS.
> ---
>  gcc/lto-wrapper.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 6ba401007f6..8a85b3e93a8 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1285,7 +1285,11 @@ run_gcc (unsigned argc, char *argv[])
>static char current_dir[] = { '.', DIR_SEPARATOR, '\0' };
>
>/* Number of CPUs that can be used for parallel LTRANS phase.  */
> -  unsigned long nthreads_var = std::thread::hardware_concurrency ();
> +  unsigned long nthreads_var = 0;
> +
> +#ifdef _GLIBCXX_HAS_GTHREADS
> +  nthreads_var = std::thread::hardware_concurrency ();
> +#endif

 _GLIBCXX_HAS_GTHREADS is a libstdc++ internal macro, it shouldn't be used
 outside of libstdc++.
 And, when using some other compiler or standard C++ library, nthreads_var
 will be always 0.  That isn't an improvement.
>>>
>>> What would be IMHO a good idea would be to use configure test for
>>> #include 
>>> int t = std::thread::hardware_concurrency ();
>>> and in that case use that as a fallback to the previous implementation,
>>> that will be strictly an improvement.
>>
>> That does not make sense to me. Original motivation was removal of the 
>> complicated
>> implementation with a C++ standard function.
>>
>> @Jonathan:
>>
>> What standard says about usage of std::thread::hardware_concurrency() [1]?
>> The function is defined in C++11, how can one detect if threading is enabled
>> in the C++ library and use it?
> 
> A configure test.
> 
> The standard says it's required to exist, but libstdc++ has
> traditionally not defined things that are not supportable on the
> target.

Which seems logical to me.

> So for targets with no multithreading, we didn't define
> anything in . Even though that has changed in current releases
> (so that std::thread exists even if it's useless), that won't help you
> with previous releases. You need a configure test.

All right.

> 
> As I said in the PR, opinions differ on whether it's better to define
> everything (even if it's useless) or not define them. Arguably, users
> on small embedded targets do not want the library to waste space
> defining a useless std::thread class that just throws an exception if
> you ever try to create a new thread (but that's what the standard
> requires).
> 

I think defining only things that work is the right way.

Martin

Re: [PATCH] Use std::thread::hardware_concurrency in lto-wrapper.c.

2021-04-22 Thread Eric Botcazou

> This patch broke bootstrap on AIX.
> 
> std::thread is not provided in all instances.  GCC is not compiled
> multi-threaded by default.

Right, this will very likely break on Windows too.

-- 
Eric Botcazou

[PATCH] arm: remove error in CPP_SPEC when float-abi soft and hard are used together

2021-04-22 Thread Christophe Lyon via Gcc-patches

arm.h has had this error message since 1997, and was never updated to
take softfp into account. Anyway, it seems it was useful long ago, but
it is no longer needed since option parsing has been improved:
-mfloat-abi is handled via arm.opt and updates the var_float_abi
variable. So, the last instance of -mfloat-abi= on the command line
wins.

This patch just removes this error message, thus enabling many more
tests to pass on arm-eabi:

* with -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd (2 more passes)
gcc.target/arm/pr52375.c
g++.target/arm/pr99593.C (test for excess errors)

* with -mthumb/-mfloat-abi=soft/-march=armv6s-m (115 more passes in C, 90 more 
in C++)
gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
g++.target/arm/no_unique_address_1.C
g++.target/arm/no_unique_address_2.C

* with -mthumb/-mfloat-abi=soft/-march=armv7-m (153 more passes in C, 90 more 
in C++)
gcc.dg/pr59418.c (test for excess errors)
gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors)
gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors)
gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors)
gcc.target/arm/bfloat16_scalar_2_1.c (test for excess errors)
gcc.target/arm/bfloat16_scalar_3_1.c (test for excess errors)
gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors)
gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors)
gcc.target/arm/pr52375.c (test for excess errors)
gcc.target/arm/simd/vld1_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vldn_lane_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vst1_bf16_1.c (test for excess errors)
gcc.target/arm/simd/vstn_lane_bf16_1.c (test for excess errors)
g++.target/arm/no_unique_address_1.C
g++.target/arm/no_unique_address_2.C

* with -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp (65 more passes)
gcc.target/arm/atomic-comp-swap-release-acquire-3.c (test for excess errors)
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-not dmb
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times ldaex 4
gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times stlex 4
gcc.target/arm/atomic-op-acq_rel-3.c (test for excess errors)
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times stlex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acquire-3.c (test for excess errors)
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-char-3.c (test for excess errors)
gcc.target/arm/atomic-op-char-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-char-3.c scan-assembler-times ldrexb\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-char-3.c scan-assembler-times strexb\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-consume-3.c (test for excess errors)
gcc.target/arm/atomic-op-consume-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-consume-3.c scan-assembler-times ldaex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-consume-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-int-3.c (test for excess errors)
gcc.target/arm/atomic-op-int-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-int-3.c scan-assembler-times ldrex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-int-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-relaxed-3.c (test for excess errors)
gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-times ldrex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-times strex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-release-3.c (test for excess errors)
gcc.target/arm/atomic-op-release-3.c scan-assembler-not dmb
gcc.target/arm/atomic-op-release-3.c scan-assembler-times ldrex\tr[0-9]+, 
\\[r[0-9]+\\] 6
gcc.target/arm/atomic-op-release-3.c scan-assembler-times stlex\t...?, r[0-9]+, 
\\[r[0-9]+\\] 6

Re: [PATCH] Fix target/100106 ICE in gen_movdi

2021-04-22 Thread Richard Biener

On Thu, 22 Apr 2021, Bernd Edlinger wrote:

> Aehm,
> 
> forgot to mention,
> 
> tested on arm-none-eabi (arm-sim target)
> is it OK for trunk?

OK next week (when RC2 looks good).

Thanks,
Richard.

> 
> Thanks
> Bernd.
> 
> On 4/22/21 8:37 AM, Bernd Edlinger wrote:
> > As the test case shows, the outer mode may have a higher alignment
> > requirement than the inner mode here.
> > 
> > 2021-04-22  Bernd Edlinger  
> > 
> > PR target/100106
> > * gimplify-rtx.c (simplify_context::simplify_subreg): Check the
> > memory alignment for the outer mode.
> > 
> > * gcc.c-torture/compile/pr100106.c: New testcase.
> > ---
> >  gcc/simplify-rtx.c |  1 +
> >  gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++
> >  2 files changed, 12 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c
> > 
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index d13c390..ad3b7b2 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode 
> > outermode, rtx op,
> >   have instruction to move the whole thing.  */
> >&& (! MEM_VOLATILE_P (op)
> >   || ! have_insn_for (SET, innermode))
> > +  && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT 
> > (outermode))
> >&& known_le (outersize, innersize))
> >  return adjust_address_nv (op, outermode, byte);
> >  
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c 
> > b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
> > new file mode 100644
> > index 000..7f98b4f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
> > @@ -0,0 +1,11 @@
> > +union a {
> > +  float _Complex b;
> > +  long long c;
> > +};
> > +
> > +void g(union a);
> > +
> > +void e() {
> > +  union a f = {1.0f};
> > +  g(f);
> > +}
> > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] Fix target/100106 ICE in gen_movdi

2021-04-22 Thread Bernd Edlinger

Aehm,

forgot to mention,

tested on arm-none-eabi (arm-sim target)
is it OK for trunk?


Thanks
Bernd.

On 4/22/21 8:37 AM, Bernd Edlinger wrote:
> As the test case shows, the outer mode may have a higher alignment
> requirement than the inner mode here.
> 
> 2021-04-22  Bernd Edlinger  
> 
>   PR target/100106
>   * gimplify-rtx.c (simplify_context::simplify_subreg): Check the
>   memory alignment for the outer mode.
> 
>   * gcc.c-torture/compile/pr100106.c: New testcase.
> ---
>  gcc/simplify-rtx.c |  1 +
>  gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++
>  2 files changed, 12 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c
> 
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index d13c390..ad3b7b2 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode 
> outermode, rtx op,
>   have instruction to move the whole thing.  */
>&& (! MEM_VOLATILE_P (op)
> || ! have_insn_for (SET, innermode))
> +  && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT 
> (outermode))
>&& known_le (outersize, innersize))
>  return adjust_address_nv (op, outermode, byte);
>  
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
> new file mode 100644
> index 000..7f98b4f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
> @@ -0,0 +1,11 @@
> +union a {
> +  float _Complex b;
> +  long long c;
> +};
> +
> +void g(union a);
> +
> +void e() {
> +  union a f = {1.0f};
> +  g(f);
> +}
>

[PATCH] Fix target/100106 ICE in gen_movdi

2021-04-22 Thread Bernd Edlinger

As the test case shows, the outer mode may have a higher alignment
requirement than the inner mode here.

2021-04-22  Bernd Edlinger  

PR target/100106
* gimplify-rtx.c (simplify_context::simplify_subreg): Check the
memory alignment for the outer mode.

* gcc.c-torture/compile/pr100106.c: New testcase.
---
 gcc/simplify-rtx.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index d13c390..ad3b7b2 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode 
outermode, rtx op,
  have instruction to move the whole thing.  */
   && (! MEM_VOLATILE_P (op)
  || ! have_insn_for (SET, innermode))
+  && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT (outermode))
   && known_le (outersize, innersize))
 return adjust_address_nv (op, outermode, byte);
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c 
b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
new file mode 100644
index 000..7f98b4f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c
@@ -0,0 +1,11 @@
+union a {
+  float _Complex b;
+  long long c;
+};
+
+void g(union a);
+
+void e() {
+  union a f = {1.0f};
+  g(f);
+}
-- 
1.9.1

90 matches

Mail list logo