Re: [PATCH] Fix target/101934: aarch64 memset code creates unaligned stores for -mstrict-align

2021-09-16 Thread Andrew Pinski via Gcc-patches
On Wed, Sep 1, 2021 at 1:52 AM Richard Sandiford via Gcc-patches
 wrote:
>
> apinski--- via Gcc-patches  writes:
> > From: Andrew Pinski 
> >
> > The problem is that aarch64_expand_setmem did not check
> > STRICT_ALIGNMENT before creating an overlapping store.
> > This patch adds that check, and the testcase now passes.
> >
> > gcc/ChangeLog:
> >
> >   PR target/101934
> >   * config/aarch64/aarch64.c (aarch64_expand_setmem):
> >   Check STRICT_ALIGNMENT before creating an overlapping
> >   store.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/101934
> >   * gcc.target/aarch64/memset-strict-align-1.c: New test.
>
> OK, thanks.

Now also applied to the GCC 11 branch.

Thanks,
Andrew

>
> Richard
>
> > ---
> >  gcc/config/aarch64/aarch64.c  |  4 +--
> >  .../aarch64/memset-strict-align-1.c   | 28 +++
> >  2 files changed, 30 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 3213585a588..26d59ba1e13 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -23566,8 +23566,8 @@ aarch64_expand_setmem (rtx *operands)
> >/* Do certain trailing copies as overlapping if it's going to be
> >cheaper.  i.e. less instructions to do so.  For instance doing a 15
> >byte copy it's more efficient to do two overlapping 8 byte copies than
> > -  8 + 4 + 2 + 1.  */
> > -  if (n > 0 && n < copy_limit / 2)
> > +  8 + 4 + 2 + 1.  Only do this when -mstrict-align is not supplied.  */
> > +  if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
> >   {
> > next_mode = smallest_mode_for_size (n, MODE_INT);
> > int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> > diff --git a/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> > new file mode 100644
> > index 000..5cdc8a44968
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -mstrict-align" } */
> > +
> > +struct s { char x[95]; };
> > +void foo (struct s *);
> > +void bar (void) { struct s s1 = {}; foo (&s1); }
> > +
> > +/* memset (s1 = {}, sizeof = 95) should be expanded out
> > +   such that there are no overlap stores when -mstrict-align
> > +   is in use.
> > +   so 2 pair 16 bytes stores (64 bytes).
> > +   1 16 byte stores
> > +   1 8 byte store
> > +   1 4 byte store
> > +   1 2 byte store
> > +   1 1 byte store
> > +   */
> > +
> > +/* { dg-final { scan-assembler-times "stp\tq" 2 } } */
> > +/* { dg-final { scan-assembler-times "str\tq" 1 } } */
> > +/* { dg-final { scan-assembler-times "str\txzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "str\twzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "strh\twzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "strb\twzr" 1 } } */
> > +
> > +/* Also one store pair for the frame-pointer and the LR. */
> > +/* { dg-final { scan-assembler-times "stp\tx" 1 } } */
> > +
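The trailing-store logic changed by this hunk can be modelled outside GCC. The sketch below is a simplified, hypothetical model, not the real aarch64_expand_setmem: it works in bytes instead of bits, assumes a 32-byte copy limit (one STP of two Q registers), and represents each store only by its size and offset. The `strict_align` parameter plays the role of STRICT_ALIGNMENT, and `plan_clear` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the trailing-store logic in the hunk above,
   in bytes.  Records the size and offset of each store in SIZES and
   OFFSETS and returns the number of stores.  */
static size_t
plan_clear (size_t n, size_t copy_limit, int strict_align,
            size_t *sizes, size_t *offsets)
{
  size_t count = 0;
  size_t dst = 0;

  while (n > 0)
    {
      /* Largest power of two not exceeding N or COPY_LIMIT.  */
      size_t limit = n < copy_limit ? n : copy_limit;
      size_t size = 1;
      while (size * 2 <= limit)
        size *= 2;

      sizes[count] = size;
      offsets[count] = dst;
      count++;
      dst += size;
      n -= size;

      /* Overlapping tail: round the remainder up to a power of two and
         move DST back so the final store ends on the last byte.  The
         !strict_align test corresponds to the fix for PR 101934.  */
      if (n > 0 && n < copy_limit / 2 && !strict_align)
        {
          size_t next = 1;
          while (next < n)
            next *= 2;
          assert (next <= size);
          dst -= next - n;
          n = next;
        }
    }
  return count;
}
```

Under this model, a 95-byte clear with `strict_align` set gives seven stores of 32, 32, 16, 8, 4, 2 and 1 bytes, which matches the instruction counts the new test scans for. With `strict_align` clear, the 15-byte tail is instead rounded up to a second 16-byte store at offset 79, overlapping the previous one; if the object is 16-byte aligned, that store cannot be, which is exactly what -mstrict-align has to rule out.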


Re: GNU Tools @ LPC 2021: Program is published

2021-09-16 Thread Thomas Schwinge
Hi!

On 2021-09-16T05:58:22+0200, Gerald Pfeifer  wrote:
> On Wed, 15 Sep 2021, Thomas Schwinge wrote:
>>> The program for the GNU Tools Track at Linux Plumbers Conference is
>>> published:
>>>
>>>   https://linuxplumbersconf.org/event/11/sessions/109/
>> This may qualify "as obvious", but I better get reviewed what I change on
>> our front page to the Internet ;-) -- OK to push to wwwdocs master branch
>> the attached "GNU Tools @ Linux Plumbers Conference 2021"?
>
> Yes, and thank you for thinking of this!

;-) Better late than never.  Pushed to wwwdocs master branch in
commit 51e2e792d8a66436df126a28e870ac9f38767600, see attached.

> (Maybe just say "held online" or "through videoconference".)

I had copied that wording from the 2020 entry -- but yes, I agree, and
pushed "Simplify 'held through online videoconference' to 'held online'"
to wwwdocs master branch in
commit f492e1d651aba79760a384c04941b699eb9d811e, see attached.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 51e2e792d8a66436df126a28e870ac9f38767600 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Sep 2021 17:08:12 +0200
Subject: [PATCH 1/2] GNU Tools @ Linux Plumbers Conference 2021

---
 htdocs/index.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/htdocs/index.html b/htdocs/index.html
index d6b0d959..c7368e26 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -54,6 +54,11 @@ mission statement.
 
 News
 
+
+https://gcc.gnu.org/wiki/linuxplumbers2021";>GNU Tools @ Linux Plumbers Conference 2021
+[2021-09-15]
+Will be held through online videoconference, September 20-24 2021
+
 GCC 11.2 released
 [2021-07-28]
 
-- 
2.25.1

From f492e1d651aba79760a384c04941b699eb9d811e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 16 Sep 2021 09:25:06 +0200
Subject: [PATCH 2/2] Simplify 'held through online videoconference' to 'held
 online'

---
 htdocs/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/index.html b/htdocs/index.html
index c7368e26..00df1d46 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -57,7 +57,7 @@ mission statement.
 
 https://gcc.gnu.org/wiki/linuxplumbers2021";>GNU Tools @ Linux Plumbers Conference 2021
 [2021-09-15]
-Will be held through online videoconference, September 20-24 2021
+Will be held online, September 20-24 2021
 
 GCC 11.2 released
 [2021-07-28]
@@ -85,7 +85,7 @@ mission statement.
 
 https://gcc.gnu.org/wiki/linuxplumbers2020";>GNU Tools @ Linux Plumbers Conference 2020
 [2020-07-17]
-Will be held through online videoconference, August 24-28 2020
+Will be held online, August 24-28 2020
 
 GCC 10.1 released
 [2020-05-07]
-- 
2.25.1



Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-16 Thread Richard Biener via Gcc-patches
On Wed, 15 Sep 2021, Koning, Paul wrote:

> 
> 
> > On Sep 13, 2021, at 3:31 AM, Richard Biener  wrote:
> > 
> > This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
> > is not specified by the target and NO_DEBUG if DWARF is not supported.
> 
> As I'm looking at questions about old debug formats, it brings up the 
> question of old object formats.  I don't remember what the status of 
> a.out is.  Is that considered deprecated?  Still current?  Of course 
> most targets use elf, but is there an expectation to move away from 
> a.out the way there is an expectation to move away from STABS?
> 
> Is this actually a binutils rather than a gcc question?

I guess it's a question for both - I do still see a.out targets
in the configs supported by gas for example.

Note that languages like C++ may have difficulties with object
formats that do not support separate sections, for example for
instantiated templates or for global initializers.  We might have
kludges for that in collect2, and removing those could be a
motivation to deprecate object formats that do not support some
set of features (named sections, for example).

As for "old", the problem with legacy systems, be it
pdp11 or hppa-hpux, is of course that they tend to be kept alive
with minimal resources, and major modernization doesn't
really make sense if all that is wanted is to preserve them
rather than turn them into something modern.

That said - yes, I'd consider a.out purely legacy and not fit
for the future.  But it has never shown up on the radar as standing
in the way of modernizing GCC in any area.

Richard.
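The defaults.h behaviour quoted at the top of this message amounts to a preprocessor fallback chain. The following is only a sketch under stated assumptions: the enumerator names mirror those used in the discussion, but the enum is declared locally for illustration, `default_debug_type` is a hypothetical helper, and the target is assumed to support DWARF without setting PREFERRED_DEBUGGING_TYPE itself.

```c
#include <assert.h>

/* Local, illustrative copy of the debug-format enumerators named in the
   discussion; GCC's real definitions live in its own headers.  */
enum debug_info_type { NO_DEBUG, DBX_DEBUG, DWARF2_DEBUG };

/* Assumption: the target supports DWARF and does not define
   PREFERRED_DEBUGGING_TYPE itself.  */
#define DWARF2_DEBUGGING_INFO 1

/* Fallback chain: keep a target's explicit choice, otherwise prefer
   DWARF when it is supported, otherwise emit no debug info.  */
#ifndef PREFERRED_DEBUGGING_TYPE
# if defined (DWARF2_DEBUGGING_INFO)
#  define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
# else
#  define PREFERRED_DEBUGGING_TYPE NO_DEBUG
# endif
#endif

/* Hypothetical helper exposing the selected default.  */
static enum debug_info_type
default_debug_type (void)
{
  return PREFERRED_DEBUGGING_TYPE;
}
```

Removing the `DWARF2_DEBUGGING_INFO` definition in this sketch makes the default fall through to `NO_DEBUG`, which is the "no DWARF support" case from the quoted description.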


[PATCH] [AVX512FP16] Support embedded broadcast for AVX512FP16 instructions.

2021-09-16 Thread liuhongt via Gcc-patches
  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Runtime tests passed under sde{-m32,}.

gcc/ChangeLog:

PR target/87767
* config/i386/i386.c (ix86_print_operand): Handle
V8HF/V16HF/V32HFmode.
* config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
* config/i386/sse.md (avx512bcst): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-broadcast-1.c: New test.
* gcc.target/i386/avx512fp16-broadcast-2.c: New test.
---
 gcc/config/i386/i386.c|  6 +++
 gcc/config/i386/i386.h|  3 +-
 gcc/config/i386/sse.md|  8 ---
 .../gcc.target/i386/avx512fp16-broadcast-1.c  | 33 
 .../gcc.target/i386/avx512fp16-broadcast-2.c  | 53 +++
 5 files changed, 94 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-2.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d7abff0f396..4dec27845fe 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13610,12 +13610,18 @@ ix86_print_operand (FILE *file, rtx x, int code)
case E_V8SFmode:
case E_V8DFmode:
case E_V8DImode:
+   case E_V8HFmode:
  fputs ("{1to8}", file);
  break;
case E_V16SFmode:
case E_V16SImode:
+   case E_V16HFmode:
  fputs ("{1to16}", file);
  break;
+   case E_V32HFmode:
+ fputs ("{1to32}", file);
+ break;
+
default:
  gcc_unreachable ();
}
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e76bb55c080..285aef9ce5e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1101,7 +1101,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_BCST_MODE_P(MODE)\
   ((MODE) == SFmode || (MODE) == DFmode\
-   || (MODE) == SImode || (MODE) == DImode)
+   || (MODE) == SImode || (MODE) == DImode \
+   || (MODE) == HFmode)
 
 /* It is possible to write patterns to move flags; but until someone
does it,  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a1d419292d1..ba3e5009852 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -821,14 +821,6 @@ (define_mode_iterator V8_128 [V8HI V8HF])
 (define_mode_iterator V16_256 [V16HI V16HF])
 (define_mode_iterator V32_512 [V32HI V32HF])
 
-(define_mode_attr avx512bcst
-  [(V4SI "%{1to4%}") (V2DI "%{1to2%}")
-   (V8SI "%{1to8%}") (V4DI "%{1to4%}")
-   (V16SI "%{1to16%}") (V8DI "%{1to8%}")
-   (V4SF "%{1to4%}") (V2DF "%{1to2%}")
-   (V8SF "%{1to8%}") (V4DF "%{1to4%}")
-   (V16SF "%{1to16%}") (V8DF "%{1to8%}")])
-
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
   [(SF "sse") (DF "sse2") (HF "avx512fp16")
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-1.c
new file mode 100644
index 000..1da73493f3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-1.c
@@ -0,0 +1,33 @@
+/* PR target/87767 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl -mavx512dq" } */
+/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 4 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to32\\\}" 4 } }  */
+
+typedef _Float16 v8hf  __attribute__ ((vector_size (16)));
+typedef _Float16 v16hf  __attribute__ ((vector_size (32)));
+typedef _Float16 v32hf  __attribute__ ((vector_size (64)));
+
+#define CONSTANT 101;
+#define FOO(VTYPE, OP_NAME, OP)\
+VTYPE  \
+ __attribute__ ((noipa))   \
+foo_##OP_NAME##_##VTYPE (VTYPE a)  \
+{  \
+  return a OP CONSTANT;\
+}  \
+
+FOO (v8hf, add, +);
+FOO (v16hf, add, +);
+FOO (v32hf, add, +);
+FOO (v8hf, sub, -);
+FOO (v16hf, sub, -);
+FOO (v32hf, sub, -);
+FOO (v8hf, mul, *);
+FOO (v16hf, mul, *);
+FOO (v32hf, mul, *);
+FOO (v8hf, div, /);
+FOO (v16hf, div, /);
+FOO (v32hf, div, /);
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-2.c
new file mode 100644
index 000..839bb562d3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-broadcast-2.c
@@ -0,0 +1,53 @@
+/* PR target/87767 */
+/* { dg-do run } */
+/* { dg-options "-O1 -mavx512fp16 -mavx512dq -mavx512vl" } */
+/* { dg-require-effective-target avx512dq } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-require-effective-target avx512fp16 } */
+
+#define AVX512
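The ix86_print_operand hunk above boils down to printing the element count of the broadcast vector as a {1toN} suffix; the new HF cases add N = 8, 16 and 32 for 16-bit elements. The sketch below models that mapping from vector and element widths in bytes. `bcst_suffix` and `bcst_suffix_is` are hypothetical helpers, not GCC code, and the model does not check which machine modes the real switch accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the {1toN} suffix printed for an AVX-512
   embedded-broadcast operand: N is the number of elements in the vector.
   Widths are in bytes.  Returns NULL for element counts this model does
   not handle.  */
static const char *
bcst_suffix (unsigned vec_bytes, unsigned elem_bytes)
{
  assert (elem_bytes != 0);
  if (vec_bytes % elem_bytes != 0)
    return NULL;

  switch (vec_bytes / elem_bytes)
    {
    case 2:  return "{1to2}";
    case 4:  return "{1to4}";
    case 8:  return "{1to8}";   /* e.g. V8SF, V8DF, and now V8HF */
    case 16: return "{1to16}";  /* e.g. V16SF, and now V16HF */
    case 32: return "{1to32}";  /* new with this patch: V32HF */
    default: return NULL;
    }
}

/* Convenience check, e.g. against an expected scan-assembler pattern.  */
static int
bcst_suffix_is (unsigned vec_bytes, unsigned elem_bytes, const char *want)
{
  const char *got = bcst_suffix (vec_bytes, elem_bytes);
  return got != NULL && strcmp (got, want) == 0;
}
```

For example, a 64-byte vector of 2-byte _Float16 elements maps to {1to32}, which is the pattern the new broadcast-1 test counts.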

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, liuhongt wrote:

> Ping
> rebased on latest trunk.
> 
> gcc/ChangeLog:
> 
>   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
>   * doc/invoke.texi (Options That Control Optimization): Update
>   documents.
>   * opts.c (default_options_table): Enable auto-vectorization at
>   O2 with very-cheap cost model.
>   (finish_options): Use cheap cost model for
>   explicit -ftree{,-loop}-vectorize.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
>   * g++.dg/tree-ssa/pr81408.C: Ditto.
>   * g++.dg/warn/Wuninitialized-13.C: Ditto.
>   * gcc.dg/Warray-bounds-51.c: Ditto.
>   * gcc.dg/Warray-parameter-3.c: Ditto.
>   * gcc.dg/Wstringop-overflow-13.c: Ditto.
>   * gcc.dg/Wstringop-overflow-14.c: Ditto.
>   * gcc.dg/Wstringop-overflow-21.c: Ditto.
>   * gcc.dg/Wstringop-overflow-68.c: Ditto.
>   * gcc.dg/gomp/pr46032-2.c: Ditto.
>   * gcc.dg/gomp/pr46032-3.c: Ditto.
>   * gcc.dg/gomp/simd-2.c: Ditto.
>   * gcc.dg/gomp/simd-3.c: Ditto.
>   * gcc.dg/graphite/fuse-1.c: Ditto.
>   * gcc.dg/pr67089-6.c: Ditto.
>   * gcc.dg/pr82929-2.c: Ditto.
>   * gcc.dg/pr82929.c: Ditto.
>   * gcc.dg/store_merging_1.c: Ditto.
>   * gcc.dg/store_merging_11.c: Ditto.
>   * gcc.dg/store_merging_15.c: Ditto.
>   * gcc.dg/store_merging_16.c: Ditto.
>   * gcc.dg/store_merging_19.c: Ditto.
>   * gcc.dg/store_merging_24.c: Ditto.
>   * gcc.dg/store_merging_25.c: Ditto.
>   * gcc.dg/store_merging_28.c: Ditto.
>   * gcc.dg/store_merging_30.c: Ditto.
>   * gcc.dg/store_merging_5.c: Ditto.
>   * gcc.dg/store_merging_7.c: Ditto.
>   * gcc.dg/store_merging_8.c: Ditto.
>   * gcc.dg/strlenopt-85.c: Ditto.
>   * gcc.dg/tree-ssa/dump-6.c: Ditto.
>   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
>   * gcc.dg/tree-ssa/pr47059.c: Ditto.
>   * gcc.dg/tree-ssa/pr86017.c: Ditto.
>   * gcc.dg/tree-ssa/pr91482.c: Ditto.
>   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
>   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
>   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
>   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
>   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
>   * gcc.dg/uninit-40.c: Ditto.
>   * gcc.dg/unroll-7.c: Ditto.
>   * gcc.misc-tests/help.exp: Ditto.
>   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
>   * gcc.target/i386/pr22141.c: Ditto.
>   * gcc.target/i386/pr34012.c: Ditto.
>   * gcc.target/i386/pr49781-1.c: Ditto.
>   * gcc.target/i386/pr95798-1.c: Ditto.
>   * gcc.target/i386/pr95798-2.c: Ditto.
>   * gfortran.dg/pr77498.f: Ditto.
> ---
>  gcc/common.opt |  2 +-
>  gcc/doc/invoke.texi|  8 +---
>  gcc/opts.c | 18 +++---
>  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
>  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
>  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
>  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
>  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
>  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
>  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
>  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
>  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
>  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
>  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
>  gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
>  gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
>  gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
>  gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
>  gcc/testsuite/gcc.dg/pr82929.c |  2 +-
>  gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
>  gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_16.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_19.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_24.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_25.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_28.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_30.c|  2 +-
>  gcc/testsuite/gcc.dg/store_merging_5.c |  2 +-
>  gcc/testsuite/gcc.dg/store_merging_7.c |  2 +-
>  gcc/testsuite/gcc.dg/store_merging_8.c |  2 +-
>  gcc/testsuite/gcc.dg/strlenopt-85.c|  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/dump-6.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr19210-1.c  |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr47059.c|  2 +-
>  gcc/testsuite/gcc.dg/t
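For readers unfamiliar with the very-cheap cost model: GCC's documentation describes it as allowing vectorization only when the vector code would entirely replace the scalar code, for example when the scalar iteration count is known to be a multiple of the vectorization factor. The two loops below are an illustrative sketch of that distinction. The function names are hypothetical, and whether either loop is actually vectorized at -O2 depends on the target and compiler version; -fopt-info-vec reports what happened.

```c
#include <assert.h>
#include <stddef.h>

/* Trip count fixed at compile time; 1024 is a multiple of any plausible
   vectorization factor for float.  */
#define N 1024

/* Likely candidate under the very-cheap model: no scalar epilogue is
   needed, and restrict removes the need for a runtime alias check.  */
void
scale_add_fixed (float *restrict y, const float *restrict x, float a)
{
  for (size_t i = 0; i < N; i++)
    y[i] = a * x[i] + y[i];
}

/* Likely rejected under the very-cheap model on targets without
   fully-masked loops: an unknown trip count leaves scalar iterations
   over, which would need an epilogue.  */
void
scale_add_any (float *restrict y, const float *restrict x, float a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

Both functions compute the same thing; only the information available to the vectorizer about the trip count differs, which is what the very-cheap model is sensitive to.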

Re: [PATCH] Check mask type when doing cond_op related gimple simplification.

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, Sep 16, 2021 at 8:27 AM liuhongt  wrote:
>
> Ping.
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}, aarch64-unknown-linux-gnu{-m32,}
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> PR middle-end/102080
> * match.pd: Check mask type when doing cond_op related gimple
> simplification.
> * tree.c (is_truth_type_for): New function.
> * tree.h (is_truth_type_for): New declaration.
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/102080
> * gcc.target/i386/pr102080.c: New test.
> ---
>  gcc/match.pd |  8 +++
>  gcc/testsuite/gcc.target/i386/pr102080.c | 19 
>  gcc/tree.c   | 29 
>  gcc/tree.h   |  1 +
>  4 files changed, 53 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 008f7758c96..41f9e6d97f0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7020,13 +7020,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
>(with { tree op_type = TREE_TYPE (@4); }
> (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> -   && element_precision (type) == element_precision (op_type))
> +   && is_truth_type_for (op_type, TREE_TYPE (@0)))
>  (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))
>   (simplify
>(vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
>(with { tree op_type = TREE_TYPE (@4); }
> (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> -   && element_precision (type) == element_precision (op_type))
> +   && is_truth_type_for (op_type, TREE_TYPE (@0)))
>  (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))
>
>  /* Same for ternary operations.  */
> @@ -7036,13 +7036,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)
>(with { tree op_type = TREE_TYPE (@5); }
> (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> -   && element_precision (type) == element_precision (op_type))
> +   && is_truth_type_for (op_type, TREE_TYPE (@0)))
>  (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))
>   (simplify
>(vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))
>(with { tree op_type = TREE_TYPE (@5); }
> (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> -   && element_precision (type) == element_precision (op_type))
> +   && is_truth_type_for (op_type, TREE_TYPE (@0)))
>  (view_convert (cond_op (bit_not @0) @2 @3 @4
>   (view_convert:op_type @1)))
>  #endif
> diff --git a/gcc/testsuite/gcc.target/i386/pr102080.c b/gcc/testsuite/gcc.target/i386/pr102080.c
> new file mode 100644
> index 000..4c5ee32ee63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102080.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include
> +typedef float __m256 __attribute__((__vector_size__(32)));
> +__m256 _mm256_blendv_ps___Y, _mm256_blendv_ps___M, _mm256_mul_ps___A,
> +  _mm256_mul_ps___B, IfThenElse___trans_tmp_9;
> +
> +void
> +__attribute__ ((target("avx")))
> +IfThenElse (__m256 no) {
> +  IfThenElse___trans_tmp_9 = _mm256_blendv_ps (no, _mm256_blendv_ps___Y, _mm256_blendv_ps___M);
> +}
> +void
> +__attribute__ ((target("avx512vl")))
> +EncodedFromDisplay() {
> +  __m256 __trans_tmp_11 = _mm256_mul_ps___A * _mm256_mul_ps___B;
> +  IfThenElse(__trans_tmp_11);
> +}
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 3d15948fd1a..994775d7314 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -10737,6 +10737,35 @@ signed_type_for (tree type)
>return signed_or_unsigned_type_for (0, type);
>  }
>
> +/* - For VECTOR_TYPEs:
> +- The truth type must be a VECTOR_BOOLEAN_TYPE.
> +- The number of elements must match (known_eq).
> +- targetm.vectorize.get_mask_mode exists, and exactly
> +  the same mode as the truth type.
> +   - Otherwise, the truth type must be a BOOLEAN_TYPE
> + or useless_type_conversion_p to BOOLEAN_TYPE.  */
> +bool
> +is_truth_type_for (tree type, tree truth_type)
> +{
> +  machine_mode mask_mode = TYPE_MODE (truth_type);
> +  machine_mode vmode = TYPE_MODE (type);
> +  machine_mode tmask_mode;
> +
> +  if (TREE_CODE (type) == VECTOR_TYPE)
> +{
> +  if (VECTOR_BOOLEAN_TYPE_P (truth_type)
> + && known_eq (TYPE_VECTOR_SUBPARTS (type),
> +  TYPE_VECTOR_SUBPARTS (truth_type))
> + && targetm.vectorize.get_mask_mode (vmode).exists (&tmask_mode)
> + && tmask_mode == mask_mode)
> +   return true;
> +
> +  return false;
> +}
> +
> +  return useless_type_conversion_p (boolean_type_node, truth_type);
> +}
> +
>  /* If TYPE is a vector type, retur
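The new is_truth_type_for replaces an element-precision comparison with a check that the vec_cond's mask has exactly the mask mode the target would use for the operation. The sketch below is a heavily simplified, hypothetical model of the vector case of that check: GCC trees, machine modes and targetm.vectorize.get_mask_mode are replaced by a small struct, an enum and a stand-in function. The mode choices (an 8-bit k-register mask with AVX-512VL, a full-width vector mask otherwise) are illustrative assumptions about x86, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC machine modes.  */
enum model_mode { MODE_NONE, MODE_QI, MODE_V8SI, MODE_V8SF };

/* Hypothetical stand-in for a GCC type node.  */
struct model_type
{
  bool is_vector;          /* VECTOR_TYPE */
  bool is_vector_boolean;  /* VECTOR_BOOLEAN_TYPE_P */
  unsigned nunits;         /* TYPE_VECTOR_SUBPARTS */
  enum model_mode mode;    /* TYPE_MODE */
};

/* Stand-in for targetm.vectorize.get_mask_mode: assumed to return an
   8-bit k-register mask mode with AVX-512VL and a full-width vector
   mask mode otherwise.  MODE_NONE means "no mask mode".  */
static enum model_mode
model_get_mask_mode (enum model_mode vmode, bool avx512vl)
{
  if (vmode == MODE_V8SF)
    return avx512vl ? MODE_QI : MODE_V8SI;
  return MODE_NONE;
}

/* Vector case of the check: the mask must be a vector boolean with the
   same number of elements, and its mode must be exactly the mask mode
   the target would use for TYPE.  The scalar BOOLEAN_TYPE path is
   omitted from this model.  */
static bool
model_is_truth_type_for (const struct model_type *type,
                         const struct model_type *truth, bool avx512vl)
{
  assert (type->is_vector);
  enum model_mode mask = model_get_mask_mode (type->mode, avx512vl);
  return truth->is_vector_boolean
         && type->nunits == truth->nunits
         && mask != MODE_NONE
         && mask == truth->mode;
}
```

In this model a V8SF operation paired with a vector-shaped mask is accepted when AVX-512VL is off but rejected when it is on. The old element-precision comparison only looked at the data types, not at the mask, so it could not tell these two cases apart.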

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-16 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, 16 Sep 2021, liuhongt wrote:
>
> > Ping
> > rebased on latest trunk.
> >
> > gcc/ChangeLog:
> >
> >   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> >   * doc/invoke.texi (Options That Control Optimization): Update
> >   documents.
> >   * opts.c (default_options_table): Enable auto-vectorization at
> >   O2 with very-cheap cost model.
> >   (finish_options): Use cheap cost model for
> >   explicit -ftree{,-loop}-vectorize.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> >   * g++.dg/tree-ssa/pr81408.C: Ditto.
> >   * g++.dg/warn/Wuninitialized-13.C: Ditto.
> >   * gcc.dg/Warray-bounds-51.c: Ditto.
> >   * gcc.dg/Warray-parameter-3.c: Ditto.
> >   * gcc.dg/Wstringop-overflow-13.c: Ditto.
> >   * gcc.dg/Wstringop-overflow-14.c: Ditto.
> >   * gcc.dg/Wstringop-overflow-21.c: Ditto.
> >   * gcc.dg/Wstringop-overflow-68.c: Ditto.
> >   * gcc.dg/gomp/pr46032-2.c: Ditto.
> >   * gcc.dg/gomp/pr46032-3.c: Ditto.
> >   * gcc.dg/gomp/simd-2.c: Ditto.
> >   * gcc.dg/gomp/simd-3.c: Ditto.
> >   * gcc.dg/graphite/fuse-1.c: Ditto.
> >   * gcc.dg/pr67089-6.c: Ditto.
> >   * gcc.dg/pr82929-2.c: Ditto.
> >   * gcc.dg/pr82929.c: Ditto.
> >   * gcc.dg/store_merging_1.c: Ditto.
> >   * gcc.dg/store_merging_11.c: Ditto.
> >   * gcc.dg/store_merging_15.c: Ditto.
> >   * gcc.dg/store_merging_16.c: Ditto.
> >   * gcc.dg/store_merging_19.c: Ditto.
> >   * gcc.dg/store_merging_24.c: Ditto.
> >   * gcc.dg/store_merging_25.c: Ditto.
> >   * gcc.dg/store_merging_28.c: Ditto.
> >   * gcc.dg/store_merging_30.c: Ditto.
> >   * gcc.dg/store_merging_5.c: Ditto.
> >   * gcc.dg/store_merging_7.c: Ditto.
> >   * gcc.dg/store_merging_8.c: Ditto.
> >   * gcc.dg/strlenopt-85.c: Ditto.
> >   * gcc.dg/tree-ssa/dump-6.c: Ditto.
> >   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> >   * gcc.dg/tree-ssa/pr47059.c: Ditto.
> >   * gcc.dg/tree-ssa/pr86017.c: Ditto.
> >   * gcc.dg/tree-ssa/pr91482.c: Ditto.
> >   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
> >   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> >   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> >   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> >   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> >   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> >   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> >   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> >   * gcc.dg/uninit-40.c: Ditto.
> >   * gcc.dg/unroll-7.c: Ditto.
> >   * gcc.misc-tests/help.exp: Ditto.
> >   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> >   * gcc.target/i386/pr22141.c: Ditto.
> >   * gcc.target/i386/pr34012.c: Ditto.
> >   * gcc.target/i386/pr49781-1.c: Ditto.
> >   * gcc.target/i386/pr95798-1.c: Ditto.
> >   * gcc.target/i386/pr95798-2.c: Ditto.
> >   * gfortran.dg/pr77498.f: Ditto.
> > ---
> >  gcc/common.opt |  2 +-
> >  gcc/doc/invoke.texi|  8 +---
> >  gcc/opts.c | 18 +++---
> >  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> >  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> >  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> >  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> >  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> >  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> >  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> >  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> >  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> >  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> >  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> >  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> >  gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
> >  gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
> >  gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
> >  gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
> >  gcc/testsuite/gcc.dg/pr82929.c |  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_16.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_19.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_24.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_25.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_28.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_30.c|  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_5.c |  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_7.c |  2 +-
> >  gcc/testsuite/gcc.dg/store_merging_8.c

[PATCH] C++: add type checking for static local vector variable in template

2021-09-16 Thread wangpc via Gcc-patches
This patch adds type checking for static local vector variables in
C++ templates.  The AArch64 SVE and RISC-V RVV vector types are both
sizeless, so both have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

* decl.c (cp_finish_decl): Add type checking.

gcc/testsuite/ChangeLog

* g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..e3a06ea0858 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7520,6 +7520,12 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && DECL_INITIALIZED_IN_CLASS_P (decl))
 check_static_variable_definition (decl, type);
 
+  if (VAR_P (decl)
+  && DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);
+
   if (init && TREE_CODE (decl) == FUNCTION_DECL)
 {
   tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
-- 
2.33.0.windows.1



[patch] Fix PR rtl-optimization/102306

2021-09-16 Thread Eric Botcazou
Hi,

this PR is about volatile loads being duplicated by the 2->2 combination
mechanism added to the RTL combiner during GCC 9 development.  can_combine_p
already checks extensively for volatile references, but it implicitly
assumes that the combination reduces the number of instructions, which is
of course not the case for a 2->2 combination.  So the fix teaches
try_combine to abort the combination when it would have to copy volatile
references in order to preserve them.

Bootstrapped/regtested on x86-64/Linux, OK for mainline and release branches?


2021-09-16  Eric Botcazou  

PR rtl-optimization/102306
* combine.c (try_combine): Abort the combination if we are about
to duplicate volatile references.


2021-09-16  Eric Botcazou  

* gcc.target/sparc/20210916-1.c: New test.

-- 
Eric Botcazou

diff --git a/gcc/combine.c b/gcc/combine.c
index 290a3667c65..892c834a160 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3063,6 +3063,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   return 0;
 }
 
+  /* We cannot safely duplicate volatile references in any case.  */
+
+  if ((added_sets_2 && volatile_refs_p (PATTERN (i2)))
+  || (added_sets_1 && volatile_refs_p (PATTERN (i1)))
+  || (added_sets_0 && volatile_refs_p (PATTERN (i0))))
+{
+  undo_all ();
+  return 0;
+}
+
   /* Count how many auto_inc expressions there were in the original insns;
  we need to have the same number in the resulting patterns.  */
 
/* { dg-do compile } */
/* { dg-require-effective-target ilp32 } */
/* { dg-options "-O -mcpu=v8" } */

extern void foo (void);

void test (volatile unsigned char *a) 
{ 
  char b = *a;
  if (!b)
return;
  if (b & 2)
foo ();
}

/* { dg-final { scan-assembler-times "ldub" 1 } } */


Re: [PATCH] C++: add type checking for static local vector variable in template

2021-09-16 Thread pc.wang via Gcc-patches
I moved the verify_type_context code to cp_finish_decl and it works.
I will send the new patch later.
--
Sender: Jason Merrill
Sent At: 2021 Sep. 16 (Thu.) 05:04
Recipient: wangpc ; gcc-patches
Subject: Re: [PATCH] C++: add type checking for static local vector variable in template

On 9/6/21 08:10, wangpc via Gcc-patches wrote:
> This patch adds type checking for static local vector variable in
> C++ template, both AArch64 SVE and RISCV RVV are of sizeless type
> and they all have this issue.
> 
> 2021-08-06  wangpc  
> 
> gcc/cp/ChangeLog
> 
>  * pt.c (tsubst_decl): Add type checking.
> 
> gcc/testsuite/ChangeLog
> 
>  * g++.target/aarch64/sve/static-var-in-template.C: New test.
> ---
>   gcc/cp/pt.c|  8 +++-
>   .../aarch64/sve/static-var-in-template.C   | 18 ++
>   2 files changed, 25 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index f0aa626ab723..988f4cb1e73f 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -14731,7 +14731,13 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>  even if its underlying type is not.  */
>  TYPE_DEPENDENT_P_VALID (TREE_TYPE (r)) = false;
>  }
> -
> +/* We should verify static local variable's type
> +since vector type does not have a fixed size.  */
> +if (TREE_STATIC (t)
> +  &&!verify_type_context (input_location, TCTX_STATIC_STORAGE, type))

It seems that the reason this was missed before was because we checked 
for this in start_decl, which isn't called for template instantiation. 
Would it work to move the verify_type_context code from start_decl to 
cp_finish_decl, near the other call to verify_type_context, instead of 
doing anything here?

> +{
> +  RETURN (error_mark_node);
> +}
>layout_decl (r, 0);
> }
> break;
> diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
> new file mode 100644
> index ..26d397ca565d
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +
> +#include 
> +
> +template 
> +void f()
> +{
> +int i = 0;
> +static svbool_t pg = svwhilelt_b64(0, N);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +f<2>();
> +return 0;
> +}
> +
> +/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
> 

[Patch] PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

2021-09-16 Thread Tobias Burnus

As mentioned in https://gcc.gnu.org/PR102353 and in the patch,
rs6000-gen-builtins was compiled to run on "host" but was then linked
and run on "build".

That caused bootstrap failures at link time.

The patch now does the same as Makefile.in does for 'gen*': it compiles
under build/ (using the Makefile.in rule); the linking was already the
same as for 'build/gen%'; and it runs the generator with valgrind if
configured (as the gen* programs do).  Additionally, I added the exe
extension variable, in case it is needed, following the gen* rules.

Tested with an x86_64-linux-gnu (build) → powerpc64le-linux-gnu (host,
target) build.

OK?

Tobias

PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

This mimics what the main Makefile.in does: compile the generator
files under build (with Makefile.in's 'build/%.o' rule for compilation).
It also adds $(RUN_GEN), to optionally run it under valgrind, and
the $(build_exeext) suffix.

Before, the .o files were compiled with $(COMPILE), causing a link
error with $(LINKER_FOR_BUILD) for build != host.

gcc/
	PR target/102353
	* config/rs6000/t-rs6000 (build/rs6000-gen-builtins.o,
	build/rbtree.o): Add 'build/' to target, use build/%.o rule.
	(build/rs6000-gen-builtins$(build_exeext)): Add 'build/' and
	'$(build_exeext)' to target and 'build/' for the *.o files.
	(rs6000-builtins.c): Update for those changes; run
	rs6000-gen-builtins with $(RUN_GEN).

diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 92766d8..7752e16 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -44,15 +44,10 @@ rs6000-logue.o: $(srcdir)/config/rs6000/rs6000-logue.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
-rs6000-gen-builtins.o: $(srcdir)/config/rs6000/rs6000-gen-builtins.c
-	$(COMPILE) $<
-	$(POSTCOMPILE)
-
-rbtree.o: $(srcdir)/config/rs6000/rbtree.c
-	$(COMPILE) $<
-	$(POSTCOMPILE)
+build/rs6000-gen-builtins.o: $(srcdir)/config/rs6000/rs6000-gen-builtins.c
+build/rbtree.o: $(srcdir)/config/rs6000/rbtree.c
 
-rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
+build/rs6000-gen-builtins$(build_exeext): build/rs6000-gen-builtins.o build/rbtree.o $(BUILD_LIBDEPS)
 	$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
 	$(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)
 
@@ -62,10 +57,11 @@ rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
 #   
 # For now, the header files depend on rs6000-builtins.c, which avoids
 # races because the .c file is closed last in rs6000-gen-builtins.c.
-rs6000-builtins.c: rs6000-gen-builtins \
+rs6000-builtins.c: build/rs6000-gen-builtins$(build_exeext) \
 		   $(srcdir)/config/rs6000/rs6000-builtin-new.def \
 		   $(srcdir)/config/rs6000/rs6000-overload.def
-	./rs6000-gen-builtins $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+	$(RUN_GEN) ./build/rs6000-gen-builtins$(build_exeext) \
+		$(srcdir)/config/rs6000/rs6000-builtin-new.def \
 		$(srcdir)/config/rs6000/rs6000-overload.def rs6000-builtins.h \
 		rs6000-builtins.c rs6000-vecdefines.h
 


Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-16 Thread Christophe LYON via Gcc-patches



On 15/09/2021 18:43, Richard Earnshaw via Gcc-patches wrote:



On 15/09/2021 17:13, Christophe Lyon via Gcc-patches wrote:

On Wed, Sep 15, 2021 at 2:49 PM Richard Earnshaw via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:




On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:


On 15/09/2021 13:02, Richard Earnshaw wrote:



On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:

g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
double-precision FPU support, but does not make sure it is actually
supported by the target.
Check (__ARM_FP & 8) to ensure this.

2021-08-26  Christophe Lyon 

 gcc/testsuite/
 * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
---
   gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
index 62263c0c3b0..90d20081d78 100644
--- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
+++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
@@ -3,7 +3,7 @@
 /* Test to catch off-by-one errors in arm/pr-support.c.  */
-#if defined (__VFP_FP__) && !defined (__SOFTFP__)
+#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
 #include 
   #include 



Wouldn't it be better to have an alternate to the asm for the case
where we only have single-precision float?  Something like 
(untested):


static void donkey ()
{
#if __ARM_FP & 8
   asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
#else
   asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
#endif
   throw 1;
}



I tried similar things but they failed on some testing configurations.

Let me try your version, I'll let you know if there is any fallout.


Of course, the asm syntax should be converted to the new 'unified
syntax' form ie vmov.f{32,64}.



The problem is that %P expects a double-precision register.
It seems there's nothing to print a single-precision register, or
rather %p (small p) rejects s18 too.



I said it was untested :)


In fact, I now remember I tried similar things and everything failed, 
hence my proposal at the start of this thread :-)





You want something like

#if __ARM_FP & 8
    asm volatile ("vmov.f64 d9, %P0" : : "w" (1.2345) : "d9");
#else
    asm volatile ("vmov.f32 s18, %0" : : "t" (1.2345f) : "s18");
#endif

(there's no need for a modifier on the single-precision register name).
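For reference, the __ARM_FP bit test used throughout this thread can be sketched in C. ACLE assigns __ARM_FP bit 0x2 to half precision, 0x4 to single precision and 0x8 to double precision. The helper names below are hypothetical; the asm simply combines the two variants above behind guards similar to the test's, so on non-Arm or soft-float builds the function compiles to an empty body.

```c
#include <stdbool.h>

/* ACLE encodes the available hardware FP precisions as bits of __ARM_FP.  */
enum { ARM_FP_HALF = 0x2, ARM_FP_SINGLE = 0x4, ARM_FP_DOUBLE = 0x8 };

/* Decode a __ARM_FP value; taking it as a parameter keeps this testable
   on any host.  */
bool arm_fp_has_single (int arm_fp) { return (arm_fp & ARM_FP_SINGLE) != 0; }
bool arm_fp_has_double (int arm_fp) { return (arm_fp & ARM_FP_DOUBLE) != 0; }

/* Clobber a callee-saved VFP register, using the double-precision form
   only when the FPU provides D registers.  */
void clobber_callee_saved_vfp (void)
{
#if defined (__arm__) && defined (__VFP_FP__) && !defined (__SOFTFP__) \
    && defined (__ARM_FP)
# if __ARM_FP & 8
  /* "w" selects a VFP register; %P0 prints it as a D register.  */
  __asm__ volatile ("vmov.f64 d9, %P0" : : "w" (1.2345) : "d9");
# else
  /* "t" selects a single-precision S register; no modifier needed.  */
  __asm__ volatile ("vmov.f32 s18, %0" : : "t" (1.2345f) : "s18");
# endif
#endif
}
```

Checking the bit at preprocessing time, rather than at run time, matters here because the double-precision asm would not even assemble for a single-precision-only FPU.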


Ha! I missed the magic "t".

I confirm this fixes the issues that motivated my original patch.

Do you want me to commit it?


Thanks

Christophe







R.



Christophe




R.




[PATCH] C++: add type checking for static local vector variable in template

2021-09-16 Thread wangpc via Gcc-patches
This patch adds type checking for static local vector variables in
C++ templates.  Both AArch64 SVE and RISC-V RVV vector types are sizeless,
and they all have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

* decl.c (cp_finish_decl): Add type checking.

gcc/testsuite/ChangeLog

* g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..e3a06ea0858 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7520,6 +7520,12 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && DECL_INITIALIZED_IN_CLASS_P (decl))
 check_static_variable_definition (decl, type);
 
+  if (VAR_P (decl)
+  && DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);
+
   if (init && TREE_CODE (decl) == FUNCTION_DECL)
 {
   tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
-- 
2.33.0.windows.1



Re: [PATCH v2 1/2] MIPS: use mips_isa enum instead hardcoded numbers

2021-09-16 Thread Martin Liška

On 9/15/21 15:08, Martin Liška wrote:

Hello.

I noticed the change likely caused the following failure when building a
cross compiler on x86_64-linux-gnu:

g++  -fno-PIE -c  -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -Ic-family 
-I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/c-family 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o c-family/c-cppbuiltin.o 
-MT c-family/c-cppbuiltin.o -MMD -MP -MF c-family/.deps/c-cppbuiltin.TPo 
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c

In file included from ./tm.h:26,
                 from /home/marxin/Programming/gcc/gcc/target.h:52,
                 from /home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c:23:
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c: In function ‘void c_cpp_builtins(cpp_reader*)’:
/home/marxin/Programming/gcc/gcc/config/mips/netbsd.h:90:28: error: ‘MIPS_ISA_64’ was not declared in this scope; did you mean ‘MIPS_ISA_MIPS64’?
    90 |   else if (mips_isa >= MIPS_ISA_64) \
   |    ^~~
/home/marxin/Programming/gcc/gcc/c-family/c-cppbuiltin.c:1551:3: note: in expansion of macro ‘TARGET_CPU_CPP_BUILTINS’
  1551 |   TARGET_CPU_CPP_BUILTINS ();
   |   ^~~


It's configured with:
--host=x86_64-pc-linux-gnu --target=mips-netbsd

Thanks,
Martin


Hi.

I'm going to push the following patch. It's an obvious typo introduced in
g:4ecfc7e3debac53a30558d7ae33e8cdfdf351466.

Cheers,
Martin

From 287cc5af0eb3049e584a9f7cf5a57f8375eee64f Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 16 Sep 2021 11:17:28 +0200
Subject: [PATCH] mips: Fix macro typo

gcc/ChangeLog:

	* config/mips/netbsd.h: Fix typo in name of a macro.
---
 gcc/config/mips/netbsd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/netbsd.h b/gcc/config/mips/netbsd.h
index 85c27793d4e..1c6a59d3c34 100644
--- a/gcc/config/mips/netbsd.h
+++ b/gcc/config/mips/netbsd.h
@@ -87,7 +87,7 @@ along with GCC; see the file COPYING3.  If not see
   else if (mips_isa >= MIPS_ISA_MIPS32			\
 	   && mips_isa < MIPS_ISA_MIPS64)			\
 	builtin_define ("__mips=32");\
-  else if (mips_isa >= MIPS_ISA_64)\
+  else if (mips_isa >= MIPS_ISA_MIPS64)			\
 	builtin_define ("__mips=64");\
   if (mips_isa_rev > 0)	\
 builtin_define_with_int_value ("__mips_isa_rev",	\
-- 
2.33.0
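The failing code compares the current ISA against enumerators, so it only compiles if every enumerator it names exists, and it only works because the enum is ordered by ISA level. A minimal sketch of the two branches visible in the diff, using a hypothetical enum (the names, values and order are illustrative, not copied from GCC's MIPS headers):

```c
/* Hypothetical stand-in for GCC's mips_isa enum; only the relative
   order of the enumerators matters for the comparisons below.  */
enum example_mips_isa
{
  EXAMPLE_ISA_MIPS1,
  EXAMPLE_ISA_MIPS2,
  EXAMPLE_ISA_MIPS3,
  EXAMPLE_ISA_MIPS4,
  EXAMPLE_ISA_MIPS32,
  EXAMPLE_ISA_MIPS32R2,
  EXAMPLE_ISA_MIPS64,
  EXAMPLE_ISA_MIPS64R2
};

/* Value the two visible netbsd.h branches would give __mips; 0 stands
   for "handled by an earlier branch not shown in the diff".  */
int example_mips_macro (enum example_mips_isa isa)
{
  if (isa >= EXAMPLE_ISA_MIPS32 && isa < EXAMPLE_ISA_MIPS64)
    return 32;
  else if (isa >= EXAMPLE_ISA_MIPS64)  /* must name a real enumerator */
    return 64;
  return 0;
}
```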



Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 15, 2021 at 05:59:08PM +, Qing Zhao wrote:
> > Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
> > deal with that right now, just try
> > make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*'
> 
> It’s strange that the above testing on my local x86 machine with the latest 
> gcc had less failure than the following:
> 
> [opc@qinzhao-ol8u3-x86 build-boot]$ make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log &
> [1] 3885164
> [opc@qinzhao-ol8u3-x86 build-boot]$ 
> [1]+  Done                    make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log
> [opc@qinzhao-ol8u3-x86 build-boot]$ egrep FAIL gcc/testsuite/gcc/gcc.sum
> FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefe" 2
> FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
> FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\t\\%xmm0, \\%xmm0 3
> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfefefefe" 1
> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\[0xfefefefefefefefe\\]" 1
> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffe\\]\\) repeated x16" 1
> FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\.long\t0 14
> FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\t\\$16,
> FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
> FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\t\\$0, 2
> FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\t\\$0, 2
> FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq

Testing for many instructions is always very fragile and dependent on exact
compiler flags etc.  So, either the test should use particular
-march=/-mtune= options, and ideally also -fno-stack-protector,
-fno-stack-clash-protection etc. if those could change the expected
matches, or it should test at runtime instead.  I know that is playing with
fire, because you are testing the behavior of UB, but making the functions
that use the uninitialized vars __attribute__((noipa)) and checking whether
the vars contain the expected values might be OK.
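The runtime alternative might look roughly like this sketch. The function names are hypothetical, the dg directives are shown only in a comment, and reading the uninitialized array is still UB in ISO C, so such a test would depend on -ftrivial-auto-var-init being in effect.

```c
/* Hypothetical directives if this were a DejaGnu test:
     { dg-do run }
     { dg-options "-O2 -ftrivial-auto-var-init=pattern" }  */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if every byte in BUF[0..LEN) equals BYTE.  */
bool all_bytes_equal (const unsigned char *buf, size_t len, unsigned char byte)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] != byte)
      return false;
  return true;
}

/* noipa keeps callers from inlining or analyzing this function, so the
   bytes copied to OUT reflect how LOCAL was initialized at run time
   rather than what interprocedural optimization could prove.  */
__attribute__ ((noipa)) void
copy_uninit_bytes (unsigned char out[16])
{
  unsigned char local[16];  /* deliberately left uninitialized */
  memcpy (out, local, sizeof local);
}
```

A test's main would then call copy_uninit_bytes and check all_bytes_equal (out, 16, 0xFE). The 0xFE fill byte is an assumption based on the "0xfefefefe" patterns that the failing scans look for; with =zero the expected byte would be 0.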

Jakub



Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-16 Thread Richard Earnshaw via Gcc-patches




On 16/09/2021 10:12, Christophe LYON via Gcc-patches wrote:


On 15/09/2021 18:43, Richard Earnshaw via Gcc-patches wrote:



On 15/09/2021 17:13, Christophe Lyon via Gcc-patches wrote:

On Wed, Sep 15, 2021 at 2:49 PM Richard Earnshaw via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:




On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:


On 15/09/2021 13:02, Richard Earnshaw wrote:



On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:

g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
double-precision FPU support, but does not make sure it is actually
supported by the target.
Check (__ARM_FP & 8) to ensure this.

2021-08-26  Christophe Lyon 

 gcc/testsuite/
 * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
---
   gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
index 62263c0c3b0..90d20081d78 100644
--- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
+++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
@@ -3,7 +3,7 @@
 /* Test to catch off-by-one errors in arm/pr-support.c.  */
-#if defined (__VFP_FP__) && !defined (__SOFTFP__)
+#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
 #include 
   #include 



Wouldn't it be better to have an alternate to the asm for the case
where we only have single-precision float?  Something like 
(untested):


static void donkey ()
{
#if __ARM_FP & 8
   asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
#else
   asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
#endif
   throw 1;
}



I tried similar things but they failed on some testing configurations.

Let me try your version, I'll let you know if there is any fallout.


Of course, the asm syntax should be converted to the new 'unified
syntax' form ie vmov.f{32,64}.



The problem is that %P expects a double-precision register.
It seems there's nothing to print a single-precision register, or
rather %p (small p) rejects s18 too.



I said it was untested :)


In fact, I now remember I tried similar things and everything failed, 
hence my proposal at the start of this thread :-)





You want something like

#if __ARM_FP & 8
    asm volatile ("vmov.f64 d9, %P0" : : "w" (1.2345) : "d9");
#else
    asm volatile ("vmov.f32 s18, %0" : : "t" (1.2345f) : "s18");
#endif

(there's no need for a modifier on the single-precision register name).


Ha! I missed the magic "t".

I confirm this fixes the issues that motivated my original patch.

Do you want me to commit it?


Yes, please.

R.




Thanks

Christophe







R.



Christophe




R.




[PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-16 Thread Feng Xue OS via Gcc-patches
This and the following patch enable full devirtualization under the
whole-program assumption (hence also called whole-program
devirtualization, WPD for short), which is an enhancement to the current
speculative devirtualization. The basis of the optimization is
identifying class types that are local in whole-program scope; at least
the class types in libstdc++ must be excluded somehow. Our approach is to
use the typeinfo symbol as the identity marker of a class, since it is
unique and is always generated once the class or a type derived from it
is instantiated somewhere, and to rely on symbol resolution by the LTO
linker plugin to detect whether a typeinfo is referenced by a regular
object/library, which indirectly tells whether class types escape.
The RFC at https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html
gives more details on that.
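As a rough illustration of that escape test, here is a toy model in C. The enum, struct and function names are hypothetical and only loosely modeled on linker-plugin symbol resolutions; this is not the ipa-devirt.c code from the patch. It relies on the Itanium C++ ABI property that a derived class's typeinfo object points at its bases' typeinfo, so a subclass defined in a library makes the base's typeinfo visible to a regular object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the resolution of a class's typeinfo symbol.  */
enum example_typeinfo_resolution
{
  EXAMPLE_TI_IR_ONLY,      /* defined and referenced only from LTO IR */
  EXAMPLE_TI_REGULAR_REF,  /* also referenced by a regular object/library */
  EXAMPLE_TI_EXTERNAL      /* defined outside the IR, e.g. in libstdc++ */
};

struct example_class
{
  const char *name;
  enum example_typeinfo_resolution typeinfo;
};

/* Treat a class as whole-program local only if its typeinfo never
   escapes the IR; otherwise its hierarchy may extend outside the
   program and devirtualizing its calls fully would be unsafe.  */
bool example_whole_program_local_p (const struct example_class *c)
{
  return c->typeinfo == EXAMPLE_TI_IR_ONLY;
}

/* Count the classes in CLASSES[0..N) that qualify for full WPD.  */
size_t example_count_local (const struct example_class *classes, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (example_whole_program_local_p (&classes[i]))
      count++;
  return count;
}
```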

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Thanks,
Feng


2021-09-07  Feng Xue  

gcc/
* common.opt (-fdevirtualize-fully): New option.
* class.c (build_rtti_vtbl_entries): Force generation of typeinfo
even if -fno-rtti is specified under full devirtualization.
* cgraph.c (cgraph_update_edges_for_call_stmt): Add an assertion
to check node to be traversed.
* cgraphclones.c (cgraph_node::find_replacement): Record
former_clone_of on replacement node.
* cgraphunit.c (symtab_node::needed_p): Always output vtable for
full devirtualization.
(analyze_functions): Force generation of primary vtables for all
base classes.
* ipa-devirt.c (odr_type_d::whole_program_local): New field.
(odr_type_d::has_virtual_base): Likewise.
(odr_type_d::all_derivations_known): Removed.
(odr_type_d::whole_program_local_p): New member function.
(odr_type_d::all_derivations_known_p): Likewise.
(odr_type_d::possibly_instantiated_p): Likewise.
(odr_type_d::set_has_virtual_base): Likewise.
(get_odr_type): Set "whole_program_local" and "has_virtual_base"
when adding a type.
(type_all_derivations_known_p): Replace implementation by a call
to odr_type_d::all_derivations_known_p.
(type_possibly_instantiated_p): Replace implementation by a call
to odr_type_d::possibly_instantiated_p.
(type_known_to_have_no_derivations_p): Replace call to
type_possibly_instantiated_p with call to
odr_type_d::possibly_instantiated_p.
(type_all_ctors_visible_p): Removed.
(type_whole_program_local_p): New function.
(get_type_vtable): Likewise.
(extract_typeinfo_in_vtable): Likewise.
(identify_whole_program_local_types): Likewise.
(dump_odr_type): Dump has_virtual_base and whole_program_local_p()
of type.
(maybe_record_node): Resort to type_whole_program_local_p to
check whether a class has been optimized away.
(record_target_from_binfo): Remove parameter "anonymous", add
a new parameter "possibly_instantiated", and adjust code
accordingly.
(devirt_variable_node_removal_hook): Replace call to
"type_in_anonymous_namespace_p" with "type_whole_program_local_p".
(possible_polymorphic_call_targets): Replace call to
"type_possibly_instantiated_p" with "possibly_instantiated_p",
replace flag check on "all_derivations_known" with call to
 "all_derivations_known_p".
* ipa-icf.c (filter_removed_items): Disable folding on vtable
under full devirtualization.
* ipa-polymorphic-call.c (restrict_to_inner_class): Move odr
type check to type_known_to_have_no_derivations_p.
* ipa-utils.h (identify_whole_program_local_types): New
declaration.
(type_all_derivations_known_p): Parameter type adjustment.
* ipa.c (walk_polymorphic_call_targets): Do not mark vcall
targets as reachable for full devirtualization.
(can_remove_vtable_if_no_refs_p): New function.
(symbol_table::remove_unreachable_nodes): Add defined vtables
to reachable list under full devirtualization.
* lto-symtab.c (lto_symtab_merge_symbols): Identify whole
program local types after symbol table merge.
---
From 2632d8e7ea8f96cb545e57dedd9e4148b5a2cae4 Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Mon, 6 Sep 2021 15:03:31 +0800
Subject: [PATCH 1/2] WPD: Enable whole program devirtualization

Enable full devirtualization under the whole-program assumption (hence
also called whole-program devirtualization, WPD for short). The basis of
the optimization is identifying class types that are local in
whole-program scope. But "whole program" does not ensure that the class
hierarchy of a type never spans into dependent C++ libraries (libstdc++
is one), which would result in incorrect devirtualization. An example is
given below to demonstrate the problem.

// Has been pre-compiled to a library
class Base {
vi

[PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

2021-09-16 Thread Feng Xue OS via Gcc-patches
This patch extends the applicability of full devirtualization to the LTRANS stage.
Normally, the whole-program assumption does not hold when WPA splits the
whole compilation into more than one LTRANS partition. To avoid losing
information needed for WPD at LTRANS, we record all vtable nodes and related
member function references in each partition.

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Thanks,
Feng


2021-09-07  Feng Xue  

gcc/
* tree.h (TYPE_CXX_LOCAL): New macro for type using
base.nothrow_flag.
* tree-core.h (tree_base): Update comment on using
base.nothrow_flag to represent TYPE_CXX_LOCAL.
* ipa-devirt.c (odr_type_d::whole_program_local): Removed.
(odr_type_d::whole_program_local_p): Check TYPE_CXX_LOCAL flag
on type, and enable WPD at LTRANS when flag_devirtualize_fully
is true.
(get_odr_type): Remove setting whole_program_local flag on type.
(identify_whole_program_local_types): Replace whole_program_local
in odr_type_d by TYPE_CXX_LOCAL on type.
(maybe_record_node): Enable WPD at LTRANS when
flag_devirtualize_fully is true.
* ipa.c (can_remove_vtable_if_no_refs_p): Retain vtables at LTRANS
stage under full devirtualization.
* lto-cgraph.c (compute_ltrans_boundary): Add all defined vtables
to boundary of each LTRANS partition.
* lto-streamer-out.c (get_symbol_initial_value): Stream out
initial value of vtable even if its class is optimized away.
* lto-lang.c (lto_post_options): Disable full devirtualization
if flag_ltrans_devirtualize is false.
* tree-streamer-in.c (unpack_ts_base_value_fields): Unpack value
of TYPE_CXX_LOCAL for a type from streaming data.
* tree-streamer-out.c (pack_ts_base_value_fields): Pack value
of TYPE_CXX_LOCAL for a type into streaming data.
---
From 3af32b9aadff23d339750ada4541386b3d358edc Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Mon, 6 Sep 2021 20:34:50 +0800
Subject: [PATCH 2/2] WPD: Enable whole program devirtualization at LTRANS

The whole-program assumption does not hold when WPA splits the whole
compilation into more than one LTRANS partition. To avoid losing
information needed for WPD at LTRANS, we record all vtable nodes and
related member function references in each partition.

2021-09-07  Feng Xue  

gcc/
	* tree.h (TYPE_CXX_LOCAL): New macro for type using
	base.nothrow_flag.
	* tree-core.h (tree_base): Update comment on using
	base.nothrow_flag to represent TYPE_CXX_LOCAL.
	* ipa-devirt.c (odr_type_d::whole_program_local): Removed.
	(odr_type_d::whole_program_local_p): Check TYPE_CXX_LOCAL flag
	on type, and enable WPD at LTRANS when flag_devirtualize_fully
	is true.
	(get_odr_type): Remove setting whole_program_local flag on type.
	(identify_whole_program_local_types): Replace whole_program_local
	in odr_type_d by TYPE_CXX_LOCAL on type.
	(maybe_record_node): Enable WPD at LTRANS when
	flag_devirtualize_fully is true.
	* ipa.c (can_remove_vtable_if_no_refs_p): Retain vtables at LTRANS
	stage under full devirtualization.
	* lto-cgraph.c (compute_ltrans_boundary): Add all defined vtables
	to boundary of each LTRANS partition.
	* lto-streamer-out.c (get_symbol_initial_value): Stream out
	initial value of vtable even if its class is optimized away.
	* lto-lang.c (lto_post_options): Disable full devirtualization
	if flag_ltrans_devirtualize is false.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Unpack value
	of TYPE_CXX_LOCAL for a type from streaming data.
	* tree-streamer-out.c (pack_ts_base_value_fields): Pack value
	of TYPE_CXX_LOCAL for a type into streaming data.
---
 gcc/ipa-devirt.c| 29 ++---
 gcc/ipa.c   |  7 ++-
 gcc/lto-cgraph.c| 18 ++
 gcc/lto-streamer-out.c  | 12 +++-
 gcc/lto/lto-lang.c  |  6 ++
 gcc/tree-core.h |  3 +++
 gcc/tree-streamer-in.c  | 11 ---
 gcc/tree-streamer-out.c | 11 ---
 gcc/tree.h  |  5 +
 9 files changed, 83 insertions(+), 19 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index fcb097d7156..65e9ebbfb59 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -216,8 +216,6 @@ struct GTY(()) odr_type_d
   int id;
   /* Is it in anonymous namespace? */
   bool anonymous_namespace;
-  /* Set when type is not used outside of program.  */
-  bool whole_program_local;
   /* Did we report ODR violation here?  */
   bool odr_violated;
   /* Set when virtual table without RTTI prevailed table with.  */
@@ -290,10 +288,18 @@ get_type_vtable (tree type)
 bool
 odr_type_d::whole_program_local_p ()
 {
-  if (flag_ltrans)
+  if (flag_ltrans && !flag_devirtualize_fully)
 return false;
 
-  return whole_program_local;
+  if (in_lto_p)
+return TYPE_CXX_LOCAL (type);
+
+  /* Although a local class is always considered as whole program loca

Re: [PATCH] testsuite: Make sure double-precision is supported in g++.dg/eh/arm-vfp-unwind.C

2021-09-16 Thread Christophe Lyon via Gcc-patches
On Thu, Sep 16, 2021 at 11:21 AM Richard Earnshaw via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 16/09/2021 10:12, Christophe LYON via Gcc-patches wrote:
> >
> > On 15/09/2021 18:43, Richard Earnshaw via Gcc-patches wrote:
> >>
> >>
> >> On 15/09/2021 17:13, Christophe Lyon via Gcc-patches wrote:
> >>> On Wed, Sep 15, 2021 at 2:49 PM Richard Earnshaw via Gcc-patches <
> >>> gcc-patches@gcc.gnu.org> wrote:
> >>>
> 
> 
>  On 15/09/2021 13:26, Christophe LYON via Gcc-patches wrote:
> >
> > On 15/09/2021 13:02, Richard Earnshaw wrote:
> >>
> >>
> >> On 26/08/2021 16:53, Christophe Lyon via Gcc-patches wrote:
> >>> g++.dg/eh/arm-vfp-unwind.C uses an asm statement relying on
> >>> double-precision FPU support, but does not make sure it is actually
> >>> supported by the target.
> >>> Check (__ARM_FP & 8) to ensure this.
> >>>
> >>> 2021-08-26  Christophe Lyon 
> >>>
> >>>  gcc/testsuite/
> >>>  * g++.dg/eh/arm-vfp-unwind.C: Check __ARM_FP.
> >>> ---
> >>>gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> index 62263c0c3b0..90d20081d78 100644
> >>> --- a/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> +++ b/gcc/testsuite/g++.dg/eh/arm-vfp-unwind.C
> >>> @@ -3,7 +3,7 @@
> >>>  /* Test to catch off-by-one errors in arm/pr-support.c.  */
> >>> -#if defined (__VFP_FP__) && !defined (__SOFTFP__)
> >>> +#if defined (__VFP_FP__) && !defined (__SOFTFP__) && (__ARM_FP & 8)
> >>>  #include 
> >>>#include 
> >>>
> >>
> >> Wouldn't it be better to have an alternate to the asm for the case
> >> where we only have single-precision float?  Something like
> >> (untested):
> >>
> >> static void donkey ()
> >> {
> >> #if __ARM_FP & 8
> >>asm volatile ("fcpyd d9, %P0" : : "w" (1.2345) : "d9");
> >> #else
> >>asm volatile ("fcpys s18, %P0" : : "w" (1.2345f) : "s18");
> >> #endif
> >>throw 1;
> >> }
> >
> >
> > I tried similar things but they failed on some testing configurations.
> >
> > Let me try your version, I'll let you know if there is any fallout.
> 
>  Of course, the asm syntax should be converted to the new 'unified
>  syntax' form ie vmov.f{32,64}.
> 
> 
> >>> The problem is that %P expects a double-precision register.
> >>> It seems there's nothing to print a single-precision register, or
> >>> rather %p (small p) rejects s18 too.
> >>>
> >>>
> >> I said it was untested :)
> >
> > In fact, I now remember I tried similar things and everything failed,
> > hence my proposal at the start of this thread :-)
> >
> >
> >>
> >> You want something like
> >>
> >> #if __ARM_FP & 8
> >> asm volatile ("vmov.f64 d9, %P0" : : "w" (1.2345) : "d9");
> >> #else
> >> asm volatile ("vmov.f32 s18, %0" : : "t" (1.2345f) : "s18");
> >> #endif
> >>
> >> (there's no need for a modifier on the single-precision register name).
> >
> > Ha! I missed the magic "t".
> >
> > I confirm this fixes the issues that motivated my original patch.
> >
> > Do you want me to commit it?
>
> Yes, please.
>
Ack, done as r12-3571-g8e2c293f02745d47948fff19615064e4b34c1776

> R.
>
> >
> >
> > Thanks
> >
> > Christophe
> >
> >
> >>
> >>>
>  R.
> 
> >
> > Christophe
> >
> >
> >>
> >> R.
> 
>


[PATCH 0/2 v3] New target hook TARGET_COMPUTE_MULTILIB and implementation for RISC-V

2021-09-16 Thread Kito Cheng
This patch set allows targets to use a customized multi-lib mechanism
rather than the built-in multi-lib mechanism.

The motivation for this patch is that RISC-V might have very complicated
multi-lib reuse rules*, which are hard to maintain with the current
multi-lib scripts; we even hit the "argument list too long" error when
we tried to add more multi-lib reuse rules.

* Here is an example for RISC-V multi-lib rules:
https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b

V3 Changes:
- Doc fix for the first patch.
- Fix lots of typos.
- Rewrite multi-lib option parsing in riscv_compute_multilib.
- Rewrite riscv_check_conds (was riscv_check_other_cond).

V2 Changes:
- No changes to the first patch (TARGET_COMPUTE_MULTILIB part) since the
  first version.
- Handle options other than -march and -mabi in riscv_compute_multilib.




[PATCH v3 1/2] Add TARGET_COMPUTE_MULTILIB hook to override multi-lib result.

2021-09-16 Thread Kito Cheng
Create a new hook to let targets override the multi-lib result.  The
motivation is that RISC-V might have very complicated multi-lib reuse
rules*, which are hard to maintain with the current multi-lib scripts;
we even hit the "argument list too long" error when we tried to add more
multi-lib reuse rules.

So I think it would be great to have a target-specific way to determine
the multi-lib reuse rules; then we could write those rules in C instead
of expanding every possible case in MULTILIB_REUSE.

* Here is an example for RISC-V multi-lib rules:
https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b
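To illustrate what writing reuse rules in C could look like, here is a sketch of a target implementation with the hook's parameter list. The table contents and the example_* names are hypothetical, struct switchstr is left opaque, and only the directory chosen by the built-in mechanism is remapped; when no rule matches it returns multilib_dir unchanged, as default_compute_multilib does.

```c
#include <stddef.h>
#include <string.h>

/* Opaque here: the sketch never looks inside the option switches.  */
struct switchstr;

/* Hypothetical reuse rule: a requested multilib directory and the
   compatible directory whose libraries should be used instead.  */
struct example_reuse_rule
{
  const char *requested;
  const char *reused;
};

static const struct example_reuse_rule example_rules[] = {
  { "rv32imac/ilp32", "rv32imc/ilp32" },   /* illustrative entries only */
  { "rv32imafc/ilp32", "rv32imc/ilp32" },
};

const char *
example_compute_multilib (const struct switchstr *switches, int n_switches,
                          const char *multilib_dir,
                          const char *multilib_defaults,
                          const char *multilib_select,
                          const char *multilib_matches,
                          const char *multilib_exclusions,
                          const char *multilib_reuse)
{
  /* A real implementation could parse these; the sketch ignores them.  */
  (void) switches; (void) n_switches; (void) multilib_defaults;
  (void) multilib_select; (void) multilib_matches;
  (void) multilib_exclusions; (void) multilib_reuse;

  if (multilib_dir == NULL)
    return NULL;

  for (size_t i = 0; i < sizeof example_rules / sizeof example_rules[0]; i++)
    if (strcmp (multilib_dir, example_rules[i].requested) == 0)
      return example_rules[i].reused;

  /* No rule matched: keep the built-in result.  */
  return multilib_dir;
}
```

A table lookup like this grows linearly with the number of rules, whereas expanding every combination into MULTILIB_REUSE is what ran into the "argument list too long" limit.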

gcc/ChangeLog:

* common/common-target.def (compute_multilib): New.
* common/common-targhooks.c (default_compute_multilib): New.
* doc/tm.texi.in (TARGET_COMPUTE_MULTILIB): New.
* doc/tm.texi: Regen.
* gcc.c: Include common/common-target.h.
(set_multilib_dir): Call targetm_common.compute_multilib.
(SWITCH_LIVE): Move to opts.h.
(SWITCH_FALSE): Ditto.
(SWITCH_IGNORE): Ditto.
(SWITCH_IGNORE_PERMANENTLY): Ditto.
(SWITCH_KEEP_FOR_GCC): Ditto.
(struct switchstr): Ditto.
* opts.h (SWITCH_LIVE): Move from gcc.c.
(SWITCH_FALSE): Ditto.
(SWITCH_IGNORE): Ditto.
(SWITCH_IGNORE_PERMANENTLY): Ditto.
(SWITCH_KEEP_FOR_GCC): Ditto.
(struct switchstr): Ditto.
---
 gcc/common/common-target.def  | 25 ++
 gcc/common/common-targhooks.c | 15 +++
 gcc/doc/tm.texi   | 17 +
 gcc/doc/tm.texi.in|  3 +++
 gcc/gcc.c | 48 +--
 gcc/opts.h| 36 ++
 6 files changed, 108 insertions(+), 36 deletions(-)

diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index f54590a2a54..b7cf713770c 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -84,6 +84,31 @@ The result will be pruned to cases with PREFIX if not NULL.",
  vec, (int option_code, const char *prefix),
  default_get_valid_option_values)
 
+DEFHOOK
+(compute_multilib,
+ "Some targets like RISC-V might have complicated multilib reuse rules which\n\
+are hard to implement with the current multilib scheme.  This hook allows\n\
+targets to override the result from the built-in multilib mechanism.\n\
+@var{switches} is the raw option list with @var{n_switches} items;\n\
+@var{multilib_dir} is the multi-lib result which is computed by the built-in\n\
+multi-lib mechanism;\n\
+@var{multilib_defaults} is the default options list for multi-lib;\n\
+@var{multilib_select} is the string containing the list of supported\n\
+multi-libs, and the option checking list.\n\
+@var{multilib_matches}, @var{multilib_exclusions}, and @var{multilib_reuse}\n\
+are corresponding to @var{MULTILIB_MATCHES}, @var{MULTILIB_EXCLUSIONS},\n\
+and @var{MULTILIB_REUSE}.\n\
+The default definition does nothing but return @var{multilib_dir} directly.",
+ const char *, (const struct switchstr *switches,
+   int n_switches,
+   const char *multilib_dir,
+   const char *multilib_defaults,
+   const char *multilib_select,
+   const char *multilib_matches,
+   const char *multilib_exclusions,
+   const char *multilib_reuse),
+ default_compute_multilib)
+
 /* Leave the boolean fields at the end.  */
 
 /* True if unwinding tables should be generated by default.  */
diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
index 325f199bff3..1477aeeb536 100644
--- a/gcc/common/common-targhooks.c
+++ b/gcc/common/common-targhooks.c
@@ -90,3 +90,18 @@ const struct default_options empty_optimization_table[] =
   {
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
+
+/* Default version of TARGET_COMPUTE_MULTILIB.  */
+const char *
+default_compute_multilib(
+  const struct switchstr *,
+  int,
+  const char *multilib,
+  const char *,
+  const char *,
+  const char *,
+  const char *,
+  const char *)
+{
+  return multilib;
+}
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index be8148583d8..6f1e0293b6c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -778,6 +778,23 @@ options are changed via @code{#pragma GCC optimize} or by using the
 Set target-dependent initial values of fields in @var{opts}.
 @end deftypefn
 
+@deftypefn {Common Target Hook} {const char *} TARGET_COMPUTE_MULTILIB (const struct switchstr *@var{switches}, int @var{n_switches}, const char *@var{multilib_dir}, const char *@var{multilib_defaults}, const char *@var{multilib_select}, const char *@var{multilib_matches}, const char *@var{multilib_exclusions}, const char *@var{multilib_reuse})
+Some targets like RISC-V might have complicated multilib reuse rules which
+are hard to implement with the current multilib scheme.  This hook allows
+targets to override the result from the built-in multilib mechanism.
+@var{switches} i

[PATCH v3 2/2] RISC-V: Implement TARGET_COMPUTE_MULTILIB

2021-09-16 Thread Kito Cheng
Use TARGET_COMPUTE_MULTILIB to search for a reusable multi-lib for riscv*-*-elf*,
according to the following rules:

 1. Check that the ABI is the same.
 2. Check that both have the atomic extension or neither does.
    - Mixing soft and hard atomic operations doesn't make sense and
      won't work as expected.
 3. Check that the current arch is a superset of the target multi-lib arch.
    - It might result in slower performance or larger code size, but it
      is safe to run.
 4. Pick the best-matching multi-lib set if more than one multi-lib passes
    the above checks.

Example of how a multi-lib is selected:
  We build code with -march=rv32imaf and -mabi=ilp32, and we have the
  following 5 multi-lib sets:

1. rv32ia/ilp32
2. rv32ima/ilp32
3. rv32imf/ilp32
4. rv32imaf/ilp32f
5. rv32imafd/ilp32

  The first and second multi-libs are safe to link; the 3rd multi-lib can't
  be reused because it doesn't have the atomic extension, which is a mismatch
  according to rule 2; the 4th multi-lib can't be reused either, due to the
  ABI mismatch; and the last multi-lib can't be used since the current arch
  is not a superset of the multi-lib's arch.

An error is emitted if no suitable multi-lib set is found; the error is
only emitted when linking with the standard libraries.

Example of when the error is emitted:

  $ riscv64-unknown-elf-gcc -print-multi-lib
  .;
  rv32i/ilp32;@march=rv32i@mabi=ilp32
  rv32im/ilp32;@march=rv32im@mabi=ilp32
  rv32iac/ilp32;@march=rv32iac@mabi=ilp32
  rv32imac/ilp32;@march=rv32imac@mabi=ilp32
  rv32imafc/ilp32f;@march=rv32imafc@mabi=ilp32f
  rv64imac/lp64;@march=rv64imac@mabi=lp64

  // No actual linking, so no error emitted.
  $ riscv64-unknown-elf-gcc -print-multi-directory -march=rv32ia -mabi=ilp32
  .

  // Links against the default libc and libgcc, so the multi-lib is checked,
  // and an error is emitted because no suitable multi-lib was found.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c
  riscv64-unknown-elf-gcc: fatal error: can't found suitable multilib set for '-march=rv32ia'/'-mabi=ilp32'
  compilation terminated.

  // No error emitted, because we don't link against the standard libraries.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c -nostdlib

  // No error emitted, because we only compile.
  $ riscv64-unknown-elf-gcc -march=rv32ia -mabi=ilp32 ~/hello.c -c

gcc/ChangeLog:

* common/config/riscv/riscv-common.c: Include .
(struct riscv_multi_lib_info_t): New.
(riscv_subset_list::match_score): Ditto.
(find_last_appear_switch): Ditto.
(prefixed_with): Ditto.
(struct multi_lib_info_t): Ditto.
(riscv_current_arch_str): Ditto.
(riscv_current_abi_str): Ditto.
(riscv_multi_lib_info_t::parse): Ditto.
(riscv_check_cond): Ditto.
(riscv_check_conds): Ditto.
(riscv_compute_multilib): Ditto.
(TARGET_COMPUTE_MULTILIB): Defined.
* config/riscv/elf.h (LIB_SPEC): Call riscv_multi_lib_check if
doing link.
(RISCV_USE_CUSTOMISED_MULTI_LIB): New.
* config/riscv/riscv.h (riscv_multi_lib_check): New.
(EXTRA_SPEC_FUNCTIONS): Add riscv_multi_lib_check.
---
 gcc/common/config/riscv/riscv-common.c | 377 +
 gcc/config/riscv/elf.h |   6 +-
 gcc/config/riscv/riscv-subset.h|   2 +
 gcc/config/riscv/riscv.h   |   4 +-
 4 files changed, 387 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 10868fd417d..d87418c02a6 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include 
+#include 
 
 #define INCLUDE_STRING
 #include "config.h"
@@ -122,6 +123,26 @@ const riscv_subset_list *riscv_current_subset_list ()
   return current_subset_list;
 }
 
+/* struct for recording multi-lib info.  */
+struct riscv_multi_lib_info_t {
+  std::string path;
+  std::string arch_str;
+  std::string abi_str;
+  std::vector conds;
+  riscv_subset_list *subset_list;
+
+  static bool parse (struct riscv_multi_lib_info_t *,
+const std::string &,
+const std::vector &);
+};
+
+/* Flag for checking if there is no suitable multi-lib found.  */
+static bool riscv_no_matched_multi_lib;
+
+/* Used for record value of -march and -mabi.  */
+static std::string riscv_current_arch_str;
+static std::string riscv_current_abi_str;
+
 riscv_subset_t::riscv_subset_t ()
   : name (), major_version (0), minor_version (0), next (NULL),
 explicit_version_p (false), implied_p (false)
@@ -147,6 +168,42 @@ riscv_subset_list::~riscv_subset_list ()
 }
 }
 
+/* Compute the match score of two arch string, return 0 if incompatible.  */
+int
+riscv_subset_list::match_score (riscv_subset_list *list) const
+{
+  riscv_subset_t *s;
+  int score = 0;
+  bool has_a_ext, list_has_a_ext;
+
+  /* Impossible to match if XLEN is different.  */
+  if (list->m_xlen != this

[PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Richard Biener via Gcc-patches
This avoids using native_interpret_type when we cannot do so with
the original type of the variable; instead, it uses an integer type
for the initialization and side-steps the size limitation of
native_interpret_int.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress
(note that the reported ICE happens on aarch64 only).

Richard.

2021-09-16  Richard Biener  

PR middle-end/102360
* internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
of non-memory more robust.

* g++.dg/pr102360.C: New testcase.
---
 gcc/internal-fn.c   | 24 ++-
 gcc/testsuite/g++.dg/pr102360.C | 54 +
 2 files changed, 63 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr102360.C

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index b1283690080..842e320c31d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3045,23 +3045,17 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
 
   if (init_type == AUTO_INIT_PATTERN)
{
- tree alt_type = NULL_TREE;
- if (!can_native_interpret_type_p (var_type))
-   {
- alt_type
-   = lang_hooks.types.type_for_mode (TYPE_MODE (var_type),
- TYPE_UNSIGNED (var_type));
- gcc_assert (can_native_interpret_type_p (alt_type));
-   }
-
  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
  memset (buf, INIT_PATTERN_VALUE, total_bytes);
- init = native_interpret_expr (alt_type ? alt_type : var_type,
-   buf, total_bytes);
- gcc_assert (init);
-
- if (alt_type)
-   init = build1 (VIEW_CONVERT_EXPR, var_type, init);
+ if (can_native_interpret_type_p (var_type))
+   init = native_interpret_expr (var_type, buf, total_bytes);
+ else
+   {
+ tree itype = build_nonstandard_integer_type (total_bytes * 8, 1);
+ wide_int w = wi::from_buffer (buf, total_bytes);
+ init = build1 (VIEW_CONVERT_EXPR, var_type,
+wide_int_to_tree (itype, w));
+   }
}
 
   expand_assignment (lhs, init, false);
diff --git a/gcc/testsuite/g++.dg/pr102360.C b/gcc/testsuite/g++.dg/pr102360.C
new file mode 100644
index 000..fdf9e08b283
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr102360.C
@@ -0,0 +1,54 @@
+// { dg-do compile }
+// { dg-options "-fno-tree-dse -O1 -ftrivial-auto-var-init=pattern" }
+
+class A;
+template  class B {
+public:
+  _Tp val[m * n];
+};
+class C {
+public:
+  C(A);
+};
+struct D {
+  D();
+  unsigned long &operator[](int);
+  unsigned long *p;
+};
+class A {
+public:
+  template  A(const B<_Tp, m, n> &, bool);
+  int rows, cols;
+  unsigned char *data;
+  unsigned char *datastart;
+  unsigned char *dataend;
+  unsigned char *datalimit;
+  D step;
+};
+template 
+A::A(const B<_Tp, m, n> &p1, bool)
+: rows(m), cols(n) {
+  step[0] = cols * sizeof(_Tp);
+  datastart = data = (unsigned char *)p1.val;
+  datalimit = dataend = datastart + rows * step[0];
+}
+class F {
+public:
+  static void compute(C);
+  template 
+  static void compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
+  B<_Tp, n, nm> &);
+};
+D::D() {}
+unsigned long &D::operator[](int p1) { return p[p1]; }
+template 
+void F::compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
+B<_Tp, n, nm> &p4) {
+  A a(p4, false);
+  compute(a);
+}
+void fn1() {
+  B b, c, e;
+  B d;
+  F::compute(b, d, c, e);
+}
-- 
2.31.1


[PATCH 1/N] Rename asm_out_file function arguments.

2021-09-16 Thread Martin Liška

In preparation for a new global object that will encapsulate
asm_out_file, we will need a macro that defines asm_out_file
as casm->out_file, and thus the name can no longer be used for
function arguments.

I've built all cross compilers with the change; it bootstraps
on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From 58c7c7f5ecf45f2f227f0792c9fdd24d4a7b59a6 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 15 Sep 2021 15:49:02 +0200
Subject: [PATCH 1/3] Rename asm_out_file function arguments.

In preparation for a new global object that will encapsulate
asm_out_file, we will need a macro that defines asm_out_file
as casm->out_file, and thus the name can no longer be used for
function arguments.

gcc/ChangeLog:

	* config/arm/arm.c (arm_unwind_emit_sequence): Do not declare
	already declared global variable.
	(arm_unwind_emit_set): Use out_file as function argument.
	(arm_unwind_emit): Likewise.
	* config/darwin.c (machopic_output_data_section_indirection): Likewise.
	(machopic_output_stub_indirection): Likewise.
	(machopic_output_indirection): Likewise.
	(machopic_finish): Likewise.
	* config/i386/i386.c (ix86_asm_output_function_label): Likewise.
	* config/i386/winnt.c (i386_pe_seh_unwind_emit): Likewise.
	* config/ia64/ia64.c (process_epilogue): Likewise.
	(process_cfa_adjust_cfa): Likewise.
	(process_cfa_register): Likewise.
	(process_cfa_offset): Likewise.
	(ia64_asm_unwind_emit): Likewise.
	* config/s390/s390.c (s390_asm_output_function_label): Likewise.
---
 gcc/config/arm/arm.c| 46 ++---
 gcc/config/darwin.c | 34 +++---
 gcc/config/i386/i386.c  | 12 
 gcc/config/i386/winnt.c | 12 
 gcc/config/ia64/ia64.c  | 64 -
 gcc/config/s390/s390.c  | 46 ++---
 6 files changed, 106 insertions(+), 108 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c6e77fab66..1a7b47d236e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -844,8 +844,6 @@ static char * minipool_startobj;
will be conditionalised if possible.  */
 static int max_insns_skipped = 5;
 
-extern FILE * asm_out_file;
-
 /* True if we are currently building a constant table.  */
 int making_const_table;
 
@@ -29452,7 +29450,7 @@ arm_dwarf_register_span (rtx rtl)
epilogue.  */
 
 static void
-arm_unwind_emit_sequence (FILE * asm_out_file, rtx p)
+arm_unwind_emit_sequence (FILE * out_file, rtx p)
 {
   int i;
   HOST_WIDE_INT offset;
@@ -29496,14 +29494,14 @@ arm_unwind_emit_sequence (FILE * asm_out_file, rtx p)
 	padlast = offset - 4;
   gcc_assert (padlast == 0 || padlast == 4);
   if (padlast == 4)
-	fprintf (asm_out_file, "\t.pad #4\n");
+	fprintf (out_file, "\t.pad #4\n");
   reg_size = 4;
-  fprintf (asm_out_file, "\t.save {");
+  fprintf (out_file, "\t.save {");
 }
   else if (IS_VFP_REGNUM (reg))
 {
   reg_size = 8;
-  fprintf (asm_out_file, "\t.vsave {");
+  fprintf (out_file, "\t.vsave {");
 }
   else
 /* Unknown register type.  */
@@ -29529,13 +29527,13 @@ arm_unwind_emit_sequence (FILE * asm_out_file, rtx p)
   gcc_assert (reg >= lastreg);
 
   if (i != 1)
-	fprintf (asm_out_file, ", ");
+	fprintf (out_file, ", ");
   /* We can't use %r for vfp because we need to use the
 	 double precision register names.  */
   if (IS_VFP_REGNUM (reg))
-	asm_fprintf (asm_out_file, "d%d", (reg - FIRST_VFP_REGNUM) / 2);
+	asm_fprintf (out_file, "d%d", (reg - FIRST_VFP_REGNUM) / 2);
   else
-	asm_fprintf (asm_out_file, "%r", reg);
+	asm_fprintf (out_file, "%r", reg);
 
   if (flag_checking)
 	{
@@ -29553,15 +29551,15 @@ arm_unwind_emit_sequence (FILE * asm_out_file, rtx p)
 	  offset += reg_size;
 	}
 }
-  fprintf (asm_out_file, "}\n");
+  fprintf (out_file, "}\n");
   if (padfirst)
-fprintf (asm_out_file, "\t.pad #%d\n", padfirst);
+fprintf (out_file, "\t.pad #%d\n", padfirst);
 }
 
 /*  Emit unwind directives for a SET.  */
 
 static void
-arm_unwind_emit_set (FILE * asm_out_file, rtx p)
+arm_unwind_emit_set (FILE * out_file, rtx p)
 {
   rtx e0;
   rtx e1;
@@ -29578,12 +29576,12 @@ arm_unwind_emit_set (FILE * asm_out_file, rtx p)
 	  || REGNO (XEXP (XEXP (e0, 0), 0)) != SP_REGNUM)
 	abort ();
 
-  asm_fprintf (asm_out_file, "\t.save ");
+  asm_fprintf (out_file, "\t.save ");
   if (IS_VFP_REGNUM (REGNO (e1)))
-	asm_fprintf(asm_out_file, "{d%d}\n",
+	asm_fprintf(out_file, "{d%d}\n",
 		(REGNO (e1) - FIRST_VFP_REGNUM) / 2);
   else
-	asm_fprintf(asm_out_file, "{%r}\n", REGNO (e1));
+	asm_fprintf(out_file, "{%r}\n", REGNO (e1));
   break;
 
 case REG:
@@ -29596,7 +29594,7 @@ arm_unwind_emit_set (FILE * asm_out_file, rtx p)
 	  || !CONST_INT_P (XEXP (e1, 1)))
 	abort ();
 
-	  asm_fprintf (asm_out_file, "\t.pad #%wd\n",
+	  asm_fprintf (out_file, "\t.pad #%wd\n",
 		   -INTVAL (XEXP (e1, 1))

[PATCH 2/N] Do not hide asm_out_file in ASM_OUTPUT_ASCII.

2021-09-16 Thread Martin Liška

Again, a preparation patch, tested on all cross compilers.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From 0e5095274bb4e16ad28a5a52f30bd3887df25fde Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 15 Sep 2021 13:52:35 +0200
Subject: [PATCH 2/3] Do not hide asm_out_file in ASM_OUTPUT_ASCII.

gcc/ChangeLog:

	* defaults.h (ASM_OUTPUT_ASCII): Do not hide global variable
	asm_out_file and stream directly to MYFILE.
---
 gcc/defaults.h | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index ba79a8e48ed..9370fa12f96 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -61,36 +61,34 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifndef ASM_OUTPUT_ASCII
 #define ASM_OUTPUT_ASCII(MYFILE, MYSTRING, MYLENGTH) \
   do {	  \
-FILE *_hide_asm_out_file = (MYFILE);  \
 const unsigned char *_hide_p = (const unsigned char *) (MYSTRING);	  \
 int _hide_thissize = (MYLENGTH);	  \
 {	  \
-  FILE *asm_out_file = _hide_asm_out_file;  \
   const unsigned char *p = _hide_p;	  \
   int thissize = _hide_thissize;	  \
   int i;  \
-  fprintf (asm_out_file, "\t.ascii \"");  \
+  fprintf (MYFILE, "\t.ascii \"");	  \
 	  \
   for (i = 0; i < thissize; i++)	  \
 	{  \
 	  int c = p[i];			     \
 	  if (c == '\"' || c == '\\')	  \
-	putc ('\\', asm_out_file);	  \
+	putc ('\\', MYFILE);	  \
 	  if (ISPRINT (c))		  \
-	putc (c, asm_out_file);	  \
+	putc (c, MYFILE);		  \
 	  else  \
 	{  \
-	  fprintf (asm_out_file, "\\%o", c);			  \
+	  fprintf (MYFILE, "\\%o", c);  \
 	  /* After an octal-escape, if a digit follows,		  \
 		 terminate one string constant and start another.	  \
 		 The VAX assembler fails to stop reading the escape	  \
 		 after three digits, so this is the only way we		  \
 		 can get it to parse the data properly.  */		  \
 	  if (i < thissize - 1 && ISDIGIT (p[i + 1]))		  \
-		fprintf (asm_out_file, "\"\n\t.ascii \"");		  \
+		fprintf (MYFILE, "\"\n\t.ascii \"");			  \
 	  }  \
 	}  \
-  fprintf (asm_out_file, "\"\n");	  \
+  fprintf (MYFILE, "\"\n");		  \
 }	  \
   }	  \
   while (0)
-- 
2.33.0



Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 16, 2021 at 11:48:49AM +0200, Richard Biener via Gcc-patches wrote:
> 2021-09-16  Richard Biener  
> 
>   PR middle-end/102360
>   * internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
>   of non-memory more robust.
> 
>   * g++.dg/pr102360.C: New testcase.
> +   if (can_native_interpret_type_p (var_type))
> + init = native_interpret_expr (var_type, buf, total_bytes);
> +   else
> + {
> +   tree itype = build_nonstandard_integer_type (total_bytes * 8, 1);

Shouldn't that 8 be BITS_PER_UNIT ?
I know we have tons of problems with BITS_PER_UNIT is not 8, but adding
further ones is unnecessary.

Jakub



Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, Jakub Jelinek wrote:

> On Thu, Sep 16, 2021 at 11:48:49AM +0200, Richard Biener via Gcc-patches wrote:
> > 2021-09-16  Richard Biener  
> > 
> > PR middle-end/102360
> > * internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
> > of non-memory more robust.
> > 
> > * g++.dg/pr102360.C: New testcase.
> > + if (can_native_interpret_type_p (var_type))
> > +   init = native_interpret_expr (var_type, buf, total_bytes);
> > + else
> > +   {
> > + tree itype = build_nonstandard_integer_type (total_bytes * 8, 1);
> 
> Shouldn't that 8 be BITS_PER_UNIT ?
> I know we have tons of problems with BITS_PER_UNIT is not 8, but adding
> further ones is unnecessary.

Well, a byte is 8 bits and we do

  unsigned HOST_WIDE_INT total_bytes
= tree_to_uhwi (TYPE_SIZE_UNIT (var_type));

  if (init_type == AUTO_INIT_PATTERN)
{
  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
  memset (buf, INIT_PATTERN_VALUE, total_bytes);

and thus mix host and target here.  I suppose it should be instead

   unsigned HOST_WIDE_INT total_bytes
 = tree_to_uhwi (TYPE_SIZE (var_type)) / (BITS_PER_UNIT / 8);

or so...  in this light * 8 for the build_nonstandard_integer_type
use is correct, no?  If total_bytes is really _bytes_.

Richard.


Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 16, 2021 at 12:41:20PM +0200, Richard Biener wrote:
> On Thu, 16 Sep 2021, Jakub Jelinek wrote:
> 
> > On Thu, Sep 16, 2021 at 11:48:49AM +0200, Richard Biener via Gcc-patches wrote:
> > > 2021-09-16  Richard Biener  
> > > 
> > >   PR middle-end/102360
> > >   * internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
> > >   of non-memory more robust.
> > > 
> > >   * g++.dg/pr102360.C: New testcase.
> > > +   if (can_native_interpret_type_p (var_type))
> > > + init = native_interpret_expr (var_type, buf, total_bytes);
> > > +   else
> > > + {
> > > +   tree itype = build_nonstandard_integer_type (total_bytes * 8, 1);
> > 
> > Shouldn't that 8 be BITS_PER_UNIT ?
> > I know we have tons of problems with BITS_PER_UNIT is not 8, but adding
> > further ones is unnecessary.
> 
> Well, a byte is 8 bits and we do
> 
>   unsigned HOST_WIDE_INT total_bytes
> = tree_to_uhwi (TYPE_SIZE_UNIT (var_type));
> 
>   if (init_type == AUTO_INIT_PATTERN)
> {
>   unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
>   memset (buf, INIT_PATTERN_VALUE, total_bytes);
> 
> and thus mix host and target here.  I suppose it should be instead
> 
>unsigned HOST_WIDE_INT total_bytes
>  = tree_to_uhwi (TYPE_SIZE (var_type)) / (BITS_PER_UNIT / 8);
> 
> or so...  in this light * 8 for the build_nonstandard_integer_type
> use is correct, no?  If total_bytes is really _bytes_.

Typically for the native_interpret/native_encode we punt if
BITS_PER_UNIT != 8 || CHAR_BIT != 8 because nobody had the energy
to deal with the weird platforms (especially if we have currently
none, I believe dsp16xx that had 16-bit bytes had been removed in 4.0
and c4x that had 32-bit bytes had been removed in 4.3)
- dunno if the DEFERRED_INIT etc. code has those guards or not
and it clearly documents that this code is not ready for other
configurations.
A byte is not necessarily 8 bits, that is just the most common
size for it, and TYPE_SIZE_UNIT is number of BITS_PER_UNIT bit units.

Jakub



Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, Jakub Jelinek wrote:

> On Thu, Sep 16, 2021 at 12:41:20PM +0200, Richard Biener wrote:
> > On Thu, 16 Sep 2021, Jakub Jelinek wrote:
> > 
> > > On Thu, Sep 16, 2021 at 11:48:49AM +0200, Richard Biener via Gcc-patches wrote:
> > > > 2021-09-16  Richard Biener  
> > > > 
> > > > PR middle-end/102360
> > > > * internal-fn.c (expand_DEFERRED_INIT): Make pattern-init
> > > > of non-memory more robust.
> > > > 
> > > > * g++.dg/pr102360.C: New testcase.
> > > > + if (can_native_interpret_type_p (var_type))
> > > > +   init = native_interpret_expr (var_type, buf, total_bytes);
> > > > + else
> > > > +   {
> > > > + tree itype = build_nonstandard_integer_type (total_bytes * 8, 1);
> > > 
> > > Shouldn't that 8 be BITS_PER_UNIT ?
> > > I know we have tons of problems with BITS_PER_UNIT is not 8, but adding
> > > further ones is unnecessary.
> > 
> > Well, a byte is 8 bits and we do
> > 
> >   unsigned HOST_WIDE_INT total_bytes
> > = tree_to_uhwi (TYPE_SIZE_UNIT (var_type));
> > 
> >   if (init_type == AUTO_INIT_PATTERN)
> > {
> >   unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> >   memset (buf, INIT_PATTERN_VALUE, total_bytes);
> > 
> > and thus mix host and target here.  I suppose it should be instead
> > 
> >unsigned HOST_WIDE_INT total_bytes
> >  = tree_to_uhwi (TYPE_SIZE (var_type)) / (BITS_PER_UNIT / 8);
> > 
> > or so...  in this light * 8 for the build_nonstandard_integer_type
> > use is correct, no?  If total_bytes is really _bytes_.
> 
> Typically for the native_interpret/native_encode we punt if
> BITS_PER_UNIT != 8 || CHAR_BIT != 8 because nobody had the energy
> to deal with the weird platforms (especially if we have currently
> none, I believe dsp16xx that had 16-bit bytes had been removed in 4.0
> and c4x that had 32-bit bytes had been removed in 4.3)
> - dunno if the DEFERRED_INIT etc. code has those guards or not
> and it clearly documents that this code is not ready for other
> configurations.
> A byte is not necessarily 8 bits, that is just the most common
> size for it, and TYPE_SIZE_UNIT is number of BITS_PER_UNIT bit units.

OK, I'll do s/8/BITS_PER_UNIT/ - I also see that we have
int_size_in_bytes returning TYPE_SIZE_UNIT and that TYPE_SIZE_UNIT
is documented to yield the type size in 'bytes'.

I do believe that we should officially declare hosts with CHAR_BIT != 8
as unsupported and as you say support for targets with BITS_PER_UNIT != 8
is likely bit-rotten.

Richard.


Re: [PATCH] coroutines: Small cleanups to await_statement_walker [NFC].

2021-09-16 Thread Iain Sandoe

> On 15 Sep 2021, at 20:50, Jason Merrill  wrote:
> On 9/15/21 14:32, Iain Sandoe wrote:
>> Hi Jason,
>>> On 15 Sep 2021, at 18:32, Jason Merrill  wrote:
>>> 
>>> On 9/14/21 11:36, Iain Sandoe wrote:
>>>> Hi
>>>> Some small code cleanups that allow us to have just one place that
>>>> we handle a statement with await expression(s) embedded.  Also we
>>>> can reduce the work done to figure out whether a statement contains
>>>> any such expressions.
>>>> tested on x86_64,powerpc64le-linux x86_64-darwin

>>> What's the rationale for this assert?  [expr.await] seems to say explicitly 
>>> that an await can appear in the initializer of an init-statement.
>> Indeed (and we would not expect otherwise)
>>  - but currently GCC appears to generate code for:
>> for (loop_ind_var = init; … ; …) {}
>>   that looks like:
>>   loop_ind_var = init;
>>   for (; … ; …) {}
>> If that changes (and the init contains an await expr) then we’d need to 
>> apply that transform manually, so the assert is in place to check that the 
>> assumption about existing behaviour is met.
> 
> Then the patch is OK with that rationale in a comment.

thanks.
this is what was pushed:


0001-coroutines-Small-cleanups-to-await_statement_walker-.patch


[PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-16 Thread Richard Biener via Gcc-patches
This adds the capability to analyze the dependence of mixed
pointer/array accesses.  The example comes from a case where using a
masked load/store creates a pointer-based access while the otherwise
unconditional access is array-based.  Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.

The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based.  The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference.  That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.

This unleashes quite a few vectorization and loop distribution opportunities
in SPEC CPU 2017: notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r show changes in what they report with -fopt-info-loop, while
the rest of the specrate set sees no changes there.  Measuring runtime
for the set where changes were reported reveals nothing beyond noise
except 511.povray_r, which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).

Changes from the [RFC] version are proper handling of bitfields,
whose address we cannot take, and optimization of refs that
already are MEM_REFs and thus won't see any change.  I've
also elided changing the set of vect_masked_stores targets in
favor of explicitly listing avx (but I did not verify whether the
testcase passes on aarch64-sve or amdgcn).

This improves cases like the following from Povray:

   for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
 {
VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
 }

where a plain array access is mixed with abstraction
using T[] or T* arguments.  That should not be an uncommon
situation in the wild.  The loop above is now vectorized; it was
not without the change.

Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
built and run SPEC CPU 2017 successfully.

OK?

Thanks,
Richard.

2021-09-08  Richard Biener  

PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (dr_analyze_indices): Work on
struct indices and get DR_REF as tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail.  When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.

* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
 gcc/testsuite/gcc.dg/vect/pr65206.c |  22 +++
 gcc/tree-data-ref.c | 173 
 gcc/tree-data-ref.h |   9 +-
 gcc/tree-vectorizer.c   |   3 +-
 5 files changed, 167 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c

diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
new file mode 100644
index 000..0ea6d45e463
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20210916.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+typedef union tree_node *tree;
+struct tree_base {
+  unsigned : 1;
+  unsigned lang_flag_2 : 1;
+};
+struct tree_type {
+  tree main_variant;
+};
+union tree_node {
+  struct tree_base base;
+  struct tree_type type;
+};
+tree finish_struct_t, finish_struct_x;
+void finish_struct()
+{
+  for (; finish_struct_t->type.main_variant;)
+finish_struct_x->base.lang_flag_2 = 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
new file mode 100644
index 000..3b6262622c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
+/* { dg-additional-options "-mavx" { target avx } } */
+
+#define N 1024
+
+double a[N], b[N];
+
+void foo ()
+{
+  for (int i = 0; i < N; ++i)
+if (b[i] < 3.)
+  a[i] += b[i];
+}
+
+/* We get a .MASK_STORE because while the load of a[i] does not trap
+   the store would introduce st

PING – Re: [Patch] Fortran: Handle allocated() with coindexed scalars [PR93834] (was: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469)

2021-09-16 Thread Tobias Burnus

Patch PING: see the comment in the follow-up to the patch email, and
in the earlier email(s) in that thread.

Tobias

On 07.09.21 16:33, Tobias Burnus wrote:

Now I actually tested the patch – and fixed some issues.

OK? – It does add support for 'allocated(a[i])' by treating
it as 'allocated(a)', as 'a' must be collectively allocated
("established") on all images of the team.*

'a[i]' is (probably) an allocatable, following Malcolm's answer
to my question on the J3 list, linked below.

Tobias

* Ignoring issues related to failed images. It could
also be handled by fetching 'a' from the remote
image, but I am not sure that's better in terms of
handling failed images.

PS:
On 07.09.21 10:02, Tobias Burnus wrote:

Hi Harald,

I spent about two hours on this yesterday. I am still tired,
but now I understand more. I think the confusion between the
two of us is due to wording and the directions in which our
thoughts then went:


Talking about coindexed, all of a[i], b[i]%c and c%d[i] are
coindexed and there are many constraints like "shall not be
a coindexed variable" – which then rejects all of those.
That's what I was thinking of.

I think your starting point is that while ('a' = allocatable)
  a, b%a, c[5]%d(1)%a
are ALLOCATABLE, adding a subobject reference such as
  a(:), b%a(:,:), c[5]%d(1)%a(:,:,:)
makes the variable no longer allocatable.
I think that's what you were thinking of.

We then both argued along those different lines, which caused
the confusion, as we both thought we were talking about the same thing.


While those cases are clear, the question is whether
  a[i] or b%a[i]
is allocatable or not – assuming that 'a' is a scalar.
(For an array, '(:)' has to appear before the image-selector,
which in turn makes it nonallocatable.)


I tried to pinpoint the words for this in the standard – and
failed. I think I need a "how to read the Fortran standard" 101
and some long time actually reading it :-(

Malcolm has answered me – and he believes (but only offhand) that
  a[i]  and  b%a[i]
_are_ allocatable. See (6) at
https://mailman.j3-fortran.org/pipermail/j3/2021-September/013322.html


This implies that
  if ( allocated (a[i]) .and. allocated (b%a[i]) ) stop 1
is valid.

However, I do note that coarray allocatables have to be collectively
(de)allocated, therefore
  allocated (a[i]) .and. allocated (b%a[i])
is equivalent to
  allocated (a) .and. allocated (b%a)
at least assuming that no image has failed.


First: Does this answer all the questions you had and resolve the
confusion?
Secondly, do you agree about the last bits of the analysis?
Thirdly, what do you think of the attached patch?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, Hongtao Liu wrote:

> On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, 16 Sep 2021, liuhongt wrote:
> >
> > > Ping
> > > rebased on latest trunk.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> > >   * doc/invoke.texi (Options That Control Optimization): Update
> > >   documents.
> > >   * opts.c (default_options_table): Enable auto-vectorization at
> > >   O2 with very-cheap cost model.
> > >   (finish_options): Use cheap cost model for
> > >   explicit -ftree{,-loop}-vectorize.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> > >   * g++.dg/tree-ssa/pr81408.C: Ditto.
> > >   * g++.dg/warn/Wuninitialized-13.C: Ditto.
> > >   * gcc.dg/Warray-bounds-51.c: Ditto.
> > >   * gcc.dg/Warray-parameter-3.c: Ditto.
> > >   * gcc.dg/Wstringop-overflow-13.c: Ditto.
> > >   * gcc.dg/Wstringop-overflow-14.c: Ditto.
> > >   * gcc.dg/Wstringop-overflow-21.c: Ditto.
> > >   * gcc.dg/Wstringop-overflow-68.c: Ditto.
> > >   * gcc.dg/gomp/pr46032-2.c: Ditto.
> > >   * gcc.dg/gomp/pr46032-3.c: Ditto.
> > >   * gcc.dg/gomp/simd-2.c: Ditto.
> > >   * gcc.dg/gomp/simd-3.c: Ditto.
> > >   * gcc.dg/graphite/fuse-1.c: Ditto.
> > >   * gcc.dg/pr67089-6.c: Ditto.
> > >   * gcc.dg/pr82929-2.c: Ditto.
> > >   * gcc.dg/pr82929.c: Ditto.
> > >   * gcc.dg/store_merging_1.c: Ditto.
> > >   * gcc.dg/store_merging_11.c: Ditto.
> > >   * gcc.dg/store_merging_15.c: Ditto.
> > >   * gcc.dg/store_merging_16.c: Ditto.
> > >   * gcc.dg/store_merging_19.c: Ditto.
> > >   * gcc.dg/store_merging_24.c: Ditto.
> > >   * gcc.dg/store_merging_25.c: Ditto.
> > >   * gcc.dg/store_merging_28.c: Ditto.
> > >   * gcc.dg/store_merging_30.c: Ditto.
> > >   * gcc.dg/store_merging_5.c: Ditto.
> > >   * gcc.dg/store_merging_7.c: Ditto.
> > >   * gcc.dg/store_merging_8.c: Ditto.
> > >   * gcc.dg/strlenopt-85.c: Ditto.
> > >   * gcc.dg/tree-ssa/dump-6.c: Ditto.
> > >   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> > >   * gcc.dg/tree-ssa/pr47059.c: Ditto.
> > >   * gcc.dg/tree-ssa/pr86017.c: Ditto.
> > >   * gcc.dg/tree-ssa/pr91482.c: Ditto.
> > >   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
> > >   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> > >   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> > >   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> > >   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> > >   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> > >   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> > >   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> > >   * gcc.dg/uninit-40.c: Ditto.
> > >   * gcc.dg/unroll-7.c: Ditto.
> > >   * gcc.misc-tests/help.exp: Ditto.
> > >   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> > >   * gcc.target/i386/pr22141.c: Ditto.
> > >   * gcc.target/i386/pr34012.c: Ditto.
> > >   * gcc.target/i386/pr49781-1.c: Ditto.
> > >   * gcc.target/i386/pr95798-1.c: Ditto.
> > >   * gcc.target/i386/pr95798-2.c: Ditto.
> > >   * gfortran.dg/pr77498.f: Ditto.
> > > ---
> > >  gcc/common.opt |  2 +-
> > >  gcc/doc/invoke.texi|  8 +---
> > >  gcc/opts.c | 18 +++---
> > >  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> > >  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> > >  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> > >  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> > >  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> > >  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> > >  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> > >  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> > >  gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
> > >  gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
> > >  gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
> > >  gcc/testsuite/gcc.dg/pr82929.c |  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_16.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_19.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_24.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_merging_25.c|  2 +-
> > >  gcc/testsuite/gcc.dg/store_mergin

[Patch] Fortran: Add gfc_simple_for_loop aux function

2021-09-16 Thread Tobias Burnus

This patch adds a useful auxiliary function to generate a loop.
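The GENERIC that gfc_simple_for_loop builds is a goto-based loop: assign begin, jump straight to the condition label, run the body, add step, and branch back while `var <cond> end` holds. As a plain-C sketch of that control flow (a hypothetical illustration, not front-end code), the invented cond_code enum below stands in for the comparison tree codes a caller might pass, such as LT_EXPR or GE_EXPR:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for comparison tree codes such as LT_EXPR.  */
typedef enum { COND_LT, COND_LE, COND_GT, COND_GE } cond_code;

static bool
eval_cond (cond_code cond, long var, long end)
{
  switch (cond)
    {
    case COND_LT: return var < end;
    case COND_LE: return var <= end;
    case COND_GT: return var > end;
    case COND_GE: return var >= end;
    }
  return false;
}

/* Same shape as the emitted code:
       var = begin;
       goto label_cond;
     label_loop:
       body;
       var = var + step;
     label_cond:
       if (var <cond> end) goto label_loop;
   Here the "body" records each value of VAR (up to MAX entries) and the
   function returns the number of iterations executed.  */
static int
simple_for_loop (long begin, long end, cond_code cond, long step,
                 long *visited, int max)
{
  long var;
  int n = 0;

  var = begin;
  goto label_cond;
label_loop:
  if (n < max)
    visited[n] = var;
  n++;
  var = var + step;
label_cond:
  if (eval_cond (cond, var, end))
    goto label_loop;
  return n;
}
```

Because control enters at the condition label, the body runs zero times when the condition is already false on entry, matching an ordinary C `for` loop; a negative step with COND_GE gives a count-down loop.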

I intend to use it for:
(A) An updated/cleaned-up version of
"[Patch] Fortran: Fix Bind(C) Array-Descriptor Conversion (Move to Front-End Code)"
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578904.html
→ new function used four times

(B) The SIZE handling part of
  PR94070 - Assumed-rank arrays – bounds mishandled, SIZE/SHAPE/UBOUND/LBOUND
which I am currently writing (→ used once)

The main reason for splitting this patch off is to be able to submit/commit
the two patches separately without any code overlap.

However, in principle there is no reason why this patch cannot be reviewed
and/or committed separately, rather than in the same review and/or commit
as one of the other patches.

Tobias

Fortran: Add gfc_simple_for_loop aux function

Function to generate a simple loop (to be used internally).
Callers will be added in follow-up commits.

gcc/fortran/
	* trans-expr.c (gfc_simple_for_loop): New.
	* trans.h (gfc_simple_for_loop): New prototype.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 18d665192f0..761f8c65234 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -11717,3 +11717,37 @@ gfc_trans_assign (gfc_code * code)
 {
   return gfc_trans_assignment (code->expr1, code->expr2, false, true);
 }
+
+/* Generate a simple loop for internal use of the form
+   for (var = begin; var <cond> end; var += step)
+  body;  */
+void
+gfc_simple_for_loop (stmtblock_t *block, tree var, tree begin, tree end,
+		 enum tree_code cond, tree step, tree body)
+{
+  tree tmp;
+
+  /* var = begin. */
+  gfc_add_modify (block, var, begin);
+
+  /* Loop: for (var = begin; var <cond> end; var += step).  */
+  tree label_loop = gfc_build_label_decl (NULL_TREE);
+  tree label_cond = gfc_build_label_decl (NULL_TREE);
+  TREE_USED (label_loop) = 1;
+  TREE_USED (label_cond) = 1;
+
+  gfc_add_expr_to_block (block, build1_v (GOTO_EXPR, label_cond));
+  gfc_add_expr_to_block (block, build1_v (LABEL_EXPR, label_loop));
+
+  /* Loop body.  */
+  gfc_add_expr_to_block (block, body);
+
+  /* End of loop body.  */
+  tmp = fold_build2_loc (input_location, PLUS_EXPR, TREE_TYPE (var), var, step);
+  gfc_add_modify (block, var, tmp);
+  gfc_add_expr_to_block (block, build1_v (LABEL_EXPR, label_cond));
+  tmp = fold_build2_loc (input_location, cond, boolean_type_node, var, end);
+  tmp = build3_v (COND_EXPR, tmp, build1_v (GOTO_EXPR, label_loop),
+		  build_empty_stmt (input_location));
+  gfc_add_expr_to_block (block, tmp);
+}
diff --git a/gcc/fortran/trans.h b/gcc/fortran/trans.h
index 78578cfd732..1b622fc1f2e 100644
--- a/gcc/fortran/trans.h
+++ b/gcc/fortran/trans.h
@@ -518,6 +518,8 @@ tree gfc_string_to_single_character (tree len, tree str, int kind);
 tree gfc_get_tree_for_caf_expr (gfc_expr *);
 void gfc_get_caf_token_offset (gfc_se*, tree *, tree *, tree, tree, gfc_expr *);
 tree gfc_caf_get_image_index (stmtblock_t *, gfc_expr *, tree);
+void gfc_simple_for_loop (stmtblock_t *, tree, tree, tree, enum tree_code, tree,
+			  tree);
 
 /* Find the decl containing the auxiliary variables for assigned variables.  */


[PATCH] c++: constrained variable template issues [PR98486]

2021-09-16 Thread Patrick Palka via Gcc-patches
This fixes some issues with constrained variable templates:

  * Constraints aren't checked when explicitly specializing a variable template
  * Constraints aren't attached to a static data member template at parse time
  * Constraints aren't propagated when (partially) instantiating a static data member template

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and range-v3, does this look OK for trunk and perhaps 11?

PR c++/98486

gcc/cp/ChangeLog:

* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.
(tsubst_decl) : Propagate constraints on a
static data member template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-var-templ1.C: New test.
* g++.dg/cpp2a/concepts-var-templ1a.C: New test.
* g++.dg/cpp2a/concepts-var-templ1b.C: New test.
---
 gcc/cp/decl.c | 11 +++
 gcc/cp/pt.c   |  8 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C  |  9 +
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C | 14 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C | 15 +++
 5 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0f1496636f..7beac79ec25 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13980,6 +13980,17 @@ grokdeclarator (const cp_declarator *declarator,
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
+
+   /* Set the constraints on declaration.  */
+   bool memtmpl = (processing_template_decl
+   > template_class_depth (current_class_type));
+   if (memtmpl)
+ {
+   tree tmpl_reqs
+ = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
+   tree ci = build_constraints (tmpl_reqs, NULL_TREE);
+   set_constraints (decl, ci);
+ }
  }
else
  {
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 224dd9ebd2b..613d87f2637 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2218,7 +2218,8 @@ determine_specialization (tree template_id,
   targs = coerce_template_parms (parms, explicit_targs, fns,
 tf_warning_or_error,
 /*req_all*/true, /*use_defarg*/true);
-  if (targs != error_mark_node)
+  if (targs != error_mark_node
+ && constraints_satisfied_p (fns, targs))
 templates = tree_cons (targs, fns, templates);
 }
   else for (lkp_iterator iter (fns); iter; ++iter)
@@ -14920,6 +14921,11 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
if (DECL_NAMESPACE_SCOPE_P (t))
  DECL_NOT_REALLY_EXTERN (r) = 1;
 
+   /* Propagate the declaration's constraints.  */
+   if (VAR_P (r) && DECL_CLASS_SCOPE_P (r))
+ if (tree ci = get_constraints (t))
+   set_constraints (r, ci);
+
DECL_TEMPLATE_INFO (r) = build_template_info (tmpl, argvec);
SET_DECL_IMPLICIT_INSTANTIATION (r);
if (!error_operand_p (r) || (complain & tf_error))
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
new file mode 100644
index 000..80b48ba3a3d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
@@ -0,0 +1,9 @@
+// PR c++/98486
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+template> int v;
+
+template<> int v;
+template<> int v; // { dg-error "match" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
new file mode 100644
index 000..b12d37d8b7e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
@@ -0,0 +1,14 @@
+// PR c++/98486
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+struct A {
+  template> static int v;
+};
+
+template<> int A::v;
+template<> int A::v; // { dg-error "match" }
+
+int x = A::v;
+int y = A::v; // { dg-error "invalid" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C
new file mode 100644
index 000..37d7f0fc654
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C
@@ -0,0 +1,15 @@
+// PR c++/98486
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+template
+struct A {
+  template> static int v;
+};
+
+

Re: [Patch] PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

2021-09-16 Thread Bill Schmidt via Gcc-patches
Thank you, Tobias!  This looks good to me and doesn't break 
host=target=build bootstrap.  I appreciate the patch very much.  (I 
can't approve it, so please wait for Segher/David to weigh in.)


Bill

On 9/16/21 4:07 AM, Tobias Burnus wrote:

As mentioned in https://gcc.gnu.org/PR102353 and in the patch,
rs6000-gen-builtins was built to be run on "host", but was then linked
and run on "build".

That caused bootstrap failures at link time.

The patch now does the same as Makefile.in does for 'gen*': it builds
under build/ (using the Makefile.in rule); the linking is already the
same as for 'build/gen%'; and it runs the program under valgrind if
configured (as the gen* programs do). Additionally, I added the exe
extension variable, in case it is needed, following the gen* rules.

Tested with an x86_64-gnu-linux (build) → powerpc64le-linux-gnu (host,
target) build.

OK?

Tobias





Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-16 Thread Martin Jambor
Hi,

On Thu, Sep 16 2021, Kewen.Lin wrote:
> Hi Martin,
>
> Thanks for the review comments!
>
> on 2021/9/15 下午8:51, Martin Jambor wrote:
>> Hi,
>> 
>> since this is inlining-related, I would somewhat prefer Honza to have a
>> look too, but I have the following comments:
>> 
>> On Wed, Sep 08 2021, Kewen.Lin wrote:
>>>
>> 
>> [...]
>> 
>>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>>> index 78399b0b9bb..300b8da4507 100644
>>> --- a/gcc/ipa-fnsummary.h
>>> +++ b/gcc/ipa-fnsummary.h
>>> @@ -193,6 +194,9 @@ public:
>>>vec *loop_strides;
>>>/* Parameters tested by builtin_constant_p.  */
>>>vec GTY((skip)) builtin_constant_p_parms;
>>> +  /* Like fp_expressions, but it's to hold some target specific information,
>>> + such as some target specific isa flags.  */
>>> +  auto_vec GTY((skip)) target_info;
>>>/* Estimated growth for inlining all copies of the function before start
>>>   of small functions inlining.
>>>   This value will get out of date as the callers are duplicated, but
>> 
>> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
>> is overkill and I agree.  So at least make the new field just a
>> HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
>> further and make target_info only a 16-bit bit-field, place it after the
>> other bit-fields in class ipa_fn_summary and pass it to the hooks as
>> uint16_t.  Unless you have plans which require more space, I think we
>> should be conservative here.
>> 
>
> OK, yeah, the consideration was mainly for the scenario where the target
> has a few bits to care about.  I just realized that, to avoid inefficient
> bitwise operations when mapping target info bits to isa_flag bits, the
> target can rearrange the sparse bits in isa_flag, so it's not a big deal.
> Thanks for re-raising this!  I'll use the 16-bit bit-field in v3 as you
> suggested; if you don't mind, I will put it before the existing bit-fields
> to get good alignment.

All right.
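Martin's suggestion (a 16-bit bit-field next to the other bit-fields, handed to the hooks as uint16_t) might look roughly like the following. This is a minimal sketch with hypothetical names: fn_summary_sketch is not the real ipa_fn_summary, and update_target_info is not the real target hook. Its boolean return mirrors the quoted analyze_function_body code, which stops calling the hook once it returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced summary; only the bit-field layout is
   illustrated.  */
struct fn_summary_sketch
{
  unsigned inlinable : 1;
  unsigned fp_expressions : 1;
  /* Target-specific bits, e.g. a compacted subset of ISA flags.
     Not streamed to offloading compilers.  */
  unsigned target_info : 16;
};

/* Hypothetical hook: record the bits STMT_BITS requires and return true
   while it is still worth scanning further statements, i.e. until every
   bit in INTERESTING has been seen.  */
static bool
update_target_info (uint16_t *info, uint16_t stmt_bits, uint16_t interesting)
{
  *info |= stmt_bits;
  return (*info & interesting) != interesting;
}
```

Keeping the field at 16 bits means it packs alongside the existing flags instead of adding a separately allocated vector per function.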

>
>> I am also not sure if I agree that the field should not be streamed for
>> offloading, but since we do not have an offloading compiler needing them
>> I guess for now that is OK. But it should be documented in the comment
>> describing the field that it is not streamed to offloading compilers.
>> 
>
> Good point, will add it in v3.
>
>> [...]
>> 
>> 
>>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>>> index 2470937460f..72091b6193f 100644
>>> --- a/gcc/ipa-fnsummary.c
>>> +++ b/gcc/ipa-fnsummary.c
>>> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool early)
>>>info->conds = NULL;
>>>info->size_time_table.release ();
>>>info->call_size_time_table.release ();
>>> +  info->target_info.release();
>>>  
>>>/* When optimizing and analyzing for IPA inliner, initialize loop optimizer
>>>   so we can produce proper inline hints.
>>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, bool early)
>>>bb_predicate,
>>>bb_predicate);
>>>  
>>> +  /* Only look for target information for inlinable functions.  */
>>> +  bool scan_for_target_info =
>>> +info->inlinable
>>> +&& targetm.target_option.need_ipa_fn_target_info (node->decl,
>>> + info->target_info);
>>> +
>>>if (fbi.info)
>>>  compute_bb_predicates (&fbi, node, info, params_summary);
>>>const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, bool early)
>>>   if (dump_file)
>>> fprintf (dump_file, "   fp_expression set\n");
>>> }
>>> + if (scan_for_target_info)
>>> +   scan_for_target_info =
>>> + targetm.target_option.update_ipa_fn_target_info
>>> + (info->target_info, stmt);
>>> }
>> 
>> Practically it probably does not matter, but why is this in the "if
>> (this_time || this_size)" block?  Although I can see that setting
>> fp_expression is also done that way... but it seems like copying a
>> mistake to me.
>
> Yeah, I felt target info scanning is similar to fp_expression scanning,
> so I just followed the same approach.  If I read it right, the case
> !(this_time || this_size) means the STMT won't be weighted to any RTL
> insn from either the time or the size perspective, so guarding it seems
> to avoid unnecessary scanning.  I assumed that target bifs and inline asm
> would not be evaluated as zero cost; that seems safe so far for HTM usage.
>
> Are you worried about some special STMT which is weighted as zero but
> needs to be checked for target info in the long term?
> If so, I'll move it out in v3.

It seems that gimple_call_internal_p statements are always costed to
zero and I am wondering whether those are something that targets would
want to look out for in the future.

But hopefully anyone implementing that in th

[PATCH v2] C++: add type checking for static local vector variable in template

2021-09-16 Thread wangpc via Gcc-patches
This patch adds type checking for static local vector variables in
C++ templates.  Both AArch64 SVE and RISC-V RVV vector types are
sizeless, and both have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

* decl.c (cp_finish_decl): Add type checking.

gcc/testsuite/ChangeLog

* g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..21a6be12719 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7520,6 +7520,13 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && DECL_INITIALIZED_IN_CLASS_P (decl))
 check_static_variable_definition (decl, type);
 
+  if (!processing_template_decl
+  && VAR_P (decl)
+  && DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);
+
   if (init && TREE_CODE (decl) == FUNCTION_DECL)
 {
   tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
-- 
2.33.0.windows.1



[Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-16 Thread Jirui Wu via Gcc-patches
Hi all,

This patch lowers the vld1 and vst1 variants of the
store and load neon builtin functions to gimple.

The changes in this patch cover:
* Replacing calls to the vld1 and vst1 variants of the builtins
* Using MEM_REF gimple assignments to generate better code
* Updating test cases to prevent over-optimization

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

OK for master? If so, can it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c 
(aarch64_general_gimple_fold_builtin):
lower vld1 and vst1 variants of the neon builtins

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmla_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/fmls_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/fmul_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mla_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mls_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mul_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/simd/vmul_elem_1.c:
prevent over optimization
* gcc.target/aarch64/vclz.c:
replace macro with function to prevent over optimization
* gcc.target/aarch64/vneg_s.c:
replace macro with function to prevent over optimization
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index eef9fc0f4440d7db359e53a7b4e21e48cf2a65f4..027491414da16b66a7fe922a1b979d97f553b724 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2382,6 +2382,31 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  /*Lower store and load neon builtins to gimple.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
+   if (!BYTES_BIG_ENDIAN)
+ {
+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   fold_build2 (MEM_REF,
+   TREE_TYPE
+   (gimple_call_lhs (stmt)),
+   args[0], build_int_cst
+   (TREE_TYPE (args[0]), 0)));
+ }
+   break;
+  BUILTIN_VALL_F16 (STORE1, st1, 0, STORE)
+   if (!BYTES_BIG_ENDIAN)
+ {
+ new_stmt = gimple_build_assign (fold_build2 (MEM_REF,
+  TREE_TYPE (gimple_call_arg
+(stmt, 1)),
+  gimple_call_arg (stmt, 0),
+  build_int_cst
+  (TREE_TYPE (gimple_call_arg
+ (stmt, 0)), 0)),
+  gimple_call_arg (stmt, 1));
+ }
+   break;
   BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10, ALL)
   BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10, ALL)
new_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
diff --git a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
index 59ad41ed0471b17418c395f31fbe666b60ec3623..bef31c45650dcd088b38a755083e6bd9fe530c52 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLA(q1, q2, size, in1_lanes, in2_lanes)  \
 static void\
+__attribute__((noipa,noinline))  \
 test_vfma##q1##_lane##q2##_f##size (float##size##_t * res, \
   const float##size##_t *in1,  \
   const float##size##_t *in2)  \
@@ -104,12 +105,12 @@ main (int argc, char **argv)
vfmaq_laneq_f32.  */
 /* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 2 } } */
 
-/* vfma_lane_f64.  */
-/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 1 } } */
+/* vfma_lane_f64.
+   vfma_laneq_f64.  */
+/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 2 } } */
 
 /* vfmaq_lane_f64.
-   vfma_laneq_f64.
vfmaq_laneq_f64.  */
-/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c 
b/gcc/t

Re: [PATCH 1/N] Rename asm_out_file function arguments.

2021-09-16 Thread Iain Sandoe


Hi Martin,

> On 16 Sep 2021, at 11:00, Martin Liška  wrote:
> 
> As preparation for a new global object that will encapsulate
> asm_out_file, we would need to live with a macro that will
> define asm_out_file as casm->out_file and thus the name
> can't be used in function arguments.

So, if I understand correctly, the motivation is to be able to switch
between output file streams for different categories of content?

Darwin actually already does this (manually) with a separate
lto_asm_out_name for lto data (so a general solution would
be great).
 
What is the reason for associating the section pointers with the
casm object?

* I can understand that each instance of a casm object would have
 potentially a different current section (“in_section”), but it seems that
 as things stand the section pointers would be duplicates.

* If there is a reason the sections could differ between casm
  instances, would it make sense to have a target hook so that
  target-specific sections can be added to the local list
  (via some indirection, I’d assume)?

—

(of course, it would be great if one day we could abstract the asm out
 such that we could switch to a direct-to-object implementation)

> I've built all cross compilers with the change and
> can bootstrap on x86_64-linux-gnu and survives regression tests.

A native bootstrap fails early in stage1 for x86_64-darwin (I’ll take a look
at fixing the issues once the patch series settles down)

---

/src-local/gcc-master/gcc/dwarf2asm.c: In function ‘void dw2_asm_output_nstring(const char*, size_t, const char*, ...)’:
/src-local/gcc-master/gcc/output.h:387:26: error: expected initializer before ‘->’ token
 #define asm_out_file casm->out_file
  ^
/src-local/gcc-master/gcc/defaults.h:68:13: note: in expansion of macro ‘asm_out_file’
   FILE *asm_out_file = _hide_asm_out_file;  \
 ^~~~
/src-local/gcc-master/gcc/dwarf2asm.c:414:7: note: in expansion of macro ‘ASM_OUTPUT_ASCII’
   ASM_OUTPUT_ASCII (asm_out_file, str, len);
   ^~~~
In file included from ./tm.h:42:0,

--

/src-local/gcc-master/gcc/config/i386/darwin.h:219:6: error: ‘in_section’ was not declared in this scope
  if (in_section == text_section)   \
  ^
/src-local/gcc-master/gcc/dwarf2out.c:677:3: note: in expansion of macro ‘ASM_OUTPUT_ALIGN’
   ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (PTR_SIZE));
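The first error above is the conflict the series works around: once asm_out_file is a macro for casm->out_file, any declaration that reuses the name (here the local FILE * in defaults.h's ASM_OUTPUT_ASCII) expands to invalid code. A minimal sketch of the pattern, assuming a hypothetical struct asm_out_state as the encapsulating object (the real type and member names may differ):

```c
#include <stdio.h>

/* Hypothetical stand-in for the planned global object.  */
struct asm_out_state
{
  FILE *out_file;
};

static struct asm_out_state casm_storage;
static struct asm_out_state *casm = &casm_storage;

/* Existing uses of asm_out_file keep working through this macro...  */
#define asm_out_file casm->out_file

/* ...but the name can no longer be declared.  Writing the parameter as
   "FILE *asm_out_file" would expand to "FILE *casm->out_file", a syntax
   error, so the parameter needs a different name.  */
static void
output_ascii (FILE *stream, const char *s)
{
  fputs (s, stream);
}
```

Uses such as `output_ascii (asm_out_file, "...")` still compile, because the macro only has to expand to an expression there, not to a declarator.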




Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-16 Thread Richard Biener via Gcc-patches
On Thu, 16 Sep 2021, Jirui Wu wrote:

> Hi all,
> 
> This patch lowers the vld1 and vst1 variants of the
> store and load neon builtins functions to gimple.
> 
> The changes in this patch covers:
> * Replaces calls to the vld1 and vst1 variants of the builtins
> * Uses MEM_REF gimple assignments to generate better code
> * Updates test cases to prevent over optimization
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? If OK can it be committed for me, I have no commit rights.

+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   fold_build2 (MEM_REF,
+   TREE_TYPE
+   (gimple_call_lhs (stmt)),
+   args[0], build_int_cst
+   (TREE_TYPE (args[0]), 0)));

you are using TBAA info based on the formal argument type that might
have pointer conversions stripped.  Instead you should use a type
based on the specification of the intrinsics (or the builtins).

Likewise for the type of the access (mind alignment info there!).

Richard.
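Richard's point can be illustrated in plain C with GCC vector extensions (a sketch of the principle, not the aarch64 patch): when a vld1q_f32/vst1q_f32-style access becomes an ordinary memory reference, the access type should reflect only what the intrinsic guarantees, namely element (not vector) alignment and an access that may alias the underlying float data, rather than whatever the pointer argument's type happens to imply. The names v4sf, v4sf_u, load_v4sf and store_v4sf are hypothetical.

```c
/* GCC vector extension: 4 x float, 16 bytes, naturally 16-byte aligned.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Same vector, but only float (4-byte) alignment is assumed, and accesses
   through it may alias other types.  */
typedef float v4sf_u __attribute__ ((vector_size (16), may_alias, aligned (4)));

/* Load four floats from P, which need only be float-aligned.  */
static v4sf
load_v4sf (const float *p)
{
  return *(const v4sf_u *) p;
}

/* Store V to the four floats at P.  */
static void
store_v4sf (float *p, v4sf v)
{
  *(v4sf_u *) p = v;
}
```

Dereferencing through a plain `v4sf *` instead would tell the compiler the address is 16-byte aligned and that the access has vector type for alias analysis, neither of which the intrinsic promises. Note that the quoted patch only performs the lowering when !BYTES_BIG_ENDIAN.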

> Thanks,
> Jirui
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-builtins.c 
> (aarch64_general_gimple_fold_builtin):
> lower vld1 and vst1 variants of the neon builtins
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/fmla_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/fmls_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/fmul_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mla_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mls_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/mul_intrinsic_1.c:
> prevent over optimization
> * gcc.target/aarch64/simd/vmul_elem_1.c:
> prevent over optimization
> * gcc.target/aarch64/vclz.c:
> replace macro with function to prevent over optimization
> * gcc.target/aarch64/vneg_s.c:
> replace macro with function to prevent over optimization
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] wwwdocs: Move inactive branches to the right section

2021-09-16 Thread Jonathan Wakely via Gcc-patches
The https://gcc.gnu.org/git.html page is a total mess, listing dozens
and dozens of branches which haven't seen updates in a decade and
which are under the refs/dead/heads/* area of the Git repo.

This moves them all to the "Inactive" or "Merged" section, as
appropriate.

OK for wwwdocs?


commit cbb29df7295bb49f957100e50939393b31e22433
Author: Jonathan Wakely 
Date:   Thu Sep 16 12:20:15 2021 +0100

Move merged and inactive branches to the right section

diff --git a/htdocs/git.html b/htdocs/git.html
index 53267b09..ac1f2eb9 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -301,132 +301,19 @@ in Git.
 
 Architecture-specific
 
+No active branches
+
 
 Target-specific
 
+No active branches
+
 
 Language-specific
 
@@ -437,20 +324,6 @@ in Git.
   Lock3 Software.  It is currently maintained
   by Jason Merrill.
 
-  fortran-dev
-  This branch is for disruptive changes to the Fortran front end,
-especially for OOP development and 
-the https://gcc.gnu.org/wiki/ArrayDescriptorUpdate";>
-array descriptor update.  It is maintained by Jerry DeLisle
-.
-
-  gcc-4_4-plugins
-  This branch is for backporting the plugin functionality into
-  a 4.4-based release.  There will be no new code or functionality
-  added to this branch.  It is maintained by Diego Novillo.
-  Only patches backported from mainline are accepted.  They should
-  be marked with the tag [4_4-plugins] in the Subject line.
-
   gccgo
   This branch is for the Go front end to gcc.  For more information
 about the Go programming language,
@@ -459,14 +332,6 @@ in Git.
 marked with the tag [gccgo] in the Subject line.
   
 
-  gupc
-  This branch implements support for UPC (Unified Parallel C).
-  UPC extends the C programming language to provide support for
-  high-performance, parallel systems with access to a single
-  potentially large, global shared address space.
-  Further information can be found on the
-  https://github.com/Intrepid/GUPC";>GNU UPC page.
-
   modula-2
   This branch is for the
 http://nongnu.org/gm2/homepage.html";>GNU Modula-2
@@ -477,38 +342,6 @@ in Git.
 Patches should be
 prefixed with [modula-2] in the subject line.
 
-  pph
-  This branch implements https://gcc.gnu.org/wiki/pph";> Pre-Parsed
-  Headers for C++.  It is maintained by Diego Novillo and Lawrence Crowl.  Patches should be
-  prefixed with [pph] in the subject line.
-
-  pth-icm
-  This is a sub-branch of the pph branch.  It
-  implements
-  https://gcc.gnu.org/wiki/pph#Pre-Tokenized_Headers_.28PTH.29";>
-  Pre-Tokenized Headers for C++.  Additionally, it contains
-  instrumentation code in the C++ parser that was used in an
-  incremental compiler model (icm) to study the effects of an
-  incremental compiler cache for a compiler server. The branch is
-  maintained by Diego Novillo
-  and Lawrence Crowl.  Patches
-  should be prefixed with [pph] in the subject line.
-
-  tr29124
-  This branch is for development of TR29124 Special math Functions,
-for the C++ runtime library
-See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf";>
-.  It is maintained by Ed Smith-Rowland
-<3dw...@verizon.net>.
-
-  var-template
-  This branch is for implementation work on 
-  variable template for C++.  It was originally
-   created by Gabriel Dos Reis.  It is maintained by
-  Jason Merrill.
-
   coarray_native
   This branch is for implementation of a shared memory
 implementation of Fortran coarrays.  It is maintained by
@@ -628,6 +461,15 @@ inactive.  Inactive branches are under refs/dead/heads/
 in Git (except for ones under refs/vendors/).
 
 
+  https://gcc.gnu.org/git/?p=gcc-old.git;a=shortlog;h=refs/heads/aarch64/sve-acle-branch";>aarch64/sve-acle-branch
+  This https://gcc.gnu.org/wiki/GitMirror";>Git-only branch was
+  used for collaborative development of the AArch64 SVE ACLE implementation.
+  The branch is based off and merged with trunk.  Please send patches to
+  gcc-patches with an [SVE ACLE] tag in the subject line.
+  There's no need to use ChangeLogs; the ChangeLogs will instead be
+  written when the work is ready to be merged into trunk.  The branch is
+  maintained by Richard Sandiford.
+
   ARM/aarch64-branch
   This branch added support for the AArch64 architecture and tracked
 trunk until the port was merged into mainline.
@@ -741,6 +583,13 @@ in Git (except for ones under refs/vendors/).
   This branch was maintained by Ian Lance Taylor.  All changes were
   merged into mainline.
 
+  tr29124
+  This branch is for development of TR29124 Special math Functions,
+for the C++ runtime library
+See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf";>
+.  It is maintained by Ed Smith-Rowland
+<3dw...@verizon.net>.
+
   tree-cleanup-branch
   This branch contained improvements and reorganization to the
   tree optimizers that were not ready in time for GCC 4.0.  The
@@ -756,6 +605,10 @@ in Git (except

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Qing Zhao via Gcc-patches
Hi, Jakub,

> On Sep 16, 2021, at 4:19 AM, Jakub Jelinek  wrote:
> 
> On Wed, Sep 15, 2021 at 05:59:08PM +, Qing Zhao wrote:
>>> Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
>>> deal with that right now, just try
>>> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*'
>> 
>> It’s strange that the above testing on my local x86 machine with the latest
>> gcc had fewer failures than the following:
>> 
>> [opc@qinzhao-ol8u3-x86 build-boot]$ make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log &
>> [1] 3885164
>> [opc@qinzhao-ol8u3-x86 build-boot]$ 
>> [1]+  Done                    make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=auto-init*' &> log
>> [opc@qinzhao-ol8u3-x86 build-boot]$ egrep FAIL gcc/testsuite/gcc/gcc.sum
>> FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefe" 2
>> FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand "0xfefefefefefefefe" 3
>> FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\t\\%xmm0, \\%xmm0 3
>> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfefefefe" 1
>> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "\\[0xfefefefefefefefe\\]" 1
>> FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand "0xfffe\\]\\) repeated x16" 1
>> FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times \\.long\t0 14
>> FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\t\\$16,
>> FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
>> FAIL: gcc.target/i386/auto-init-padding-7.c scan-assembler-times movq\t\\$0, 2
>> FAIL: gcc.target/i386/auto-init-padding-8.c scan-assembler-times movq\t\\$0, 2
>> FAIL: gcc.target/i386/auto-init-padding-9.c scan-assembler rep stosq
> 
> Testing for many instructions is always very fragile and dependent on exact
> compiler flags etc.

Yes, it’s indeed very fragile.
>  So, either the test should have a particular
> -march=/-mtune= options

I might add a specific -march option to the test cases.

> and ideally also -fno-stack-protector
> -fno-stack-clash-protection etc.

Could you explain a little why that is needed?

> if they could change the expected matching,
> or test it at runtime instead (I know, it is playing with fire, because you
> are testing the behavior of UB, but perhaps making the functions that use
> the uninitialized vars __attribute__((noipa)) and checking whether the vars
> contain the expected values might be ok).

I thought of doing the testing at runtime too in the beginning; however, I was
worried about how we can be sure that the correct values in the variable come
from the compiler's initialization.
I will try one more time to see whether I can come up with a runtime test
case.
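Jakub's runtime suggestion could look roughly like the sketch below. It assumes the test is built with -ftrivial-auto-var-init=pattern and that the pattern fills a `long long` with repeated 0xfe bytes, as the 0xfefefefefefefefe constants in the RTL dumps above suggest. To address the worry about where the values come from, the caller's own variable is explicitly zeroed, and the stack is first scribbled with a different byte through another noipa function; that makes an accidental pass less likely but cannot rule it out, since nothing guarantees the frames overlap. All function names are hypothetical, and reading the uninitialized bytes is still the UB Jakub mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if every one of the N bytes at P equals PAT.
bool
all_bytes_are (const void *p, std::size_t n, unsigned char pat)
{
  const unsigned char *b = static_cast<const unsigned char *> (p);
  for (std::size_t i = 0; i < n; i++)
    if (b[i] != pat)
      return false;
  return true;
}

// Fill some stack with a byte unlike the assumed 0xfe pattern, so a later
// 0xfe is less likely to be stale data (frames may not overlap, though).
__attribute__((noipa)) void
scribble_stack (void)
{
  volatile unsigned char junk[256];
  for (std::size_t i = 0; i < sizeof junk; i++)
    junk[i] = 0x5a;
}

// noipa hides this body from the caller (and vice versa).  X is never
// written, so with the option its bytes can only come from the auto-init.
__attribute__((noipa)) void
copy_out_uninit (long long *out)
{
  long long x;                       // deliberately uninitialized
  std::memcpy (out, &x, sizeof x);
}

// Meaningful only when compiled with -ftrivial-auto-var-init=pattern;
// without it the copied bytes are indeterminate and the assert may fail.
void
check_pattern_init (void)
{
  scribble_stack ();
  long long v = 0;                   // zeroed so the option cannot fill it
  copy_out_uninit (&v);
  assert (all_bytes_are (&v, sizeof v, 0xfe));
}
```

Zeroing `v` in the caller matters: left uninitialized, it would itself be pattern-filled by the option and the check would pass even if `copy_out_uninit` wrote nothing.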

Thanks a lot.

Qing
> 
>   Jakub
> 



Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 16, 2021 at 02:49:23PM +, Qing Zhao wrote:
> > Testing for many instructions is always very fragile and dependent on exact
> > compiler flags etc.
> 
> Yes, It’s indeed very fragile. 
> >  So, either the test should have a particular
> > -march=/-mtune= options
> 
> I might add specific -march to the testing cases. 

Even -mtune= is needed if you want to stay safe, otherwise people testing
with --target_board=unix/-mtune=cascadelake (or whatever else) might get
failures.

> > and ideally also -fno-stack-protector
> > -fno-stack-clash-protection etc.
> 
> Could you explain a little bit on this why?

In case people test e.g. with --target_board=unix/\{,-fstack-protector-all\}
etc. (e.g. in Fedora/RHEL we do).
For the RTL scanning checks, if they are done fairly early, those options
might not change anything, but with the ones scanning the assembly,
one needs to check whether those options add further copies of the
instructions you scan for, e.g. in the prologue or epilogue.

Jakub



Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-09-16 Thread Paul A. Clarke via Gcc-patches
Ping.

On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke via Gcc-patches wrote:
> v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
> and users will expect to be able to include "nmmintrin.h",
> even though "nmmintrin.h" just includes "smmintrin.h"
> where all of the SSE4.2 implementations actually appear.
> 
> Only patch 5/6 changed from v2.
> 
> Tested ppc64le (POWER9) and ppc64/32 (POWER7).
> 
> OK for trunk?
> 
> Paul A. Clarke (6):
>   rs6000: Support SSE4.1 "round" intrinsics
>   rs6000: Support SSE4.1 "min" and "max" intrinsics
>   rs6000: Simplify some SSE4.1 "test" intrinsics
>   rs6000: Support SSE4.1 "cvt" intrinsics
>   rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics
>   rs6000: Guard some x86 intrinsics implementations
> 
>  gcc/config/rs6000/emmintrin.h |  12 +-
>  gcc/config/rs6000/nmmintrin.h |  40 ++
>  gcc/config/rs6000/pmmintrin.h |   4 +
>  gcc/config/rs6000/smmintrin.h | 427 --
>  gcc/config/rs6000/tmmintrin.h |  12 +
>  gcc/testsuite/gcc.target/powerpc/pr78102.c|  23 +
>  .../gcc.target/powerpc/sse4_1-packusdw.c  |  73 +++
>  .../gcc.target/powerpc/sse4_1-pcmpeqq.c   |  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxsb.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxsd.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxud.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pmaxuw.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pminsb.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pminsd.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pminud.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pminuw.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmuldq.c|  51 +++
>  .../gcc.target/powerpc/sse4_1-pmulld.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-round3.h|  81 
>  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
>  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 
>  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
>  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
>  .../gcc.target/powerpc/sse4_2-check.h |  18 +
>  .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |  46 ++
>  37 files changed, 2407 insertions(+), 59 deletions(-)
>  create mode 100644 gcc/config/rs6000/nmmintrin.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.
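For reviewers of the new `_mm_cmpgt_epi64` support, a scalar model of what the x86 intrinsic computes (PCMPGTQ) may help: a signed greater-than on each 64-bit lane, yielding all ones or all zeros per lane. This is a sketch of the x86 semantics only, not code from the series; `v2di` and `cmpgt_epi64_ref` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Two signed 64-bit lanes, standing in for __m128i (hypothetical type).
struct v2di
{
  std::int64_t e[2];
};

// Scalar model of SSE4.2 _mm_cmpgt_epi64: lane i of the result has every
// bit set when a.e[i] > b.e[i] (signed compare), and is zero otherwise.
v2di
cmpgt_epi64_ref (v2di a, v2di b)
{
  v2di r;
  for (int i = 0; i < 2; i++)
    r.e[i] = a.e[i] > b.e[i] ? -1 : 0;   // -1 has all 64 bits set
  return r;
}
```

The comparison must be signed: treated as unsigned, INT64_MIN would compare greater than 1.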

Re: [PATCH] c++: constrained variable template issues [PR98486]

2021-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/21 09:11, Patrick Palka wrote:

This fixes some issues with constrained variable templates:

   * Constraints aren't checked when explicitly specializing a variable
 template
   * Constraints aren't attached to a static data member template at
 parse time
   * Constraints aren't propagated when (partially) instantiating a
 static data member template

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and range-v3, does this look OK for trunk and perhaps 11?

PR c++/98486

gcc/cp/ChangeLog:

* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.


These hunks are OK.


(tsubst_decl) : Propagate constraints on a
static data member template.


Hmm, why is this necessary?  I know we already do this for functions, 
but I don't remember why.  Don't we check satisfaction for the 
most-general template?



gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-var-templ1.C: New test.
* g++.dg/cpp2a/concepts-var-templ1a.C: New test.
* g++.dg/cpp2a/concepts-var-templ1b.C: New test.
---
  gcc/cp/decl.c | 11 +++
  gcc/cp/pt.c   |  8 +++-
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C  |  9 +
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C | 14 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C | 15 +++
  5 files changed, 56 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0f1496636f..7beac79ec25 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13980,6 +13980,17 @@ grokdeclarator (const cp_declarator *declarator,
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
+
+   /* Set the constraints on declaration.  */
+   bool memtmpl = (processing_template_decl
+   > template_class_depth (current_class_type));
+   if (memtmpl)
+ {
+   tree tmpl_reqs
+ = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
+   tree ci = build_constraints (tmpl_reqs, NULL_TREE);
+   set_constraints (decl, ci);
+ }
  }
else
  {
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 224dd9ebd2b..613d87f2637 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2218,7 +2218,8 @@ determine_specialization (tree template_id,
targs = coerce_template_parms (parms, explicit_targs, fns,
 tf_warning_or_error,
 /*req_all*/true, /*use_defarg*/true);
-  if (targs != error_mark_node)
+  if (targs != error_mark_node
+ && constraints_satisfied_p (fns, targs))
  templates = tree_cons (targs, fns, templates);
  }
else for (lkp_iterator iter (fns); iter; ++iter)
@@ -14920,6 +14921,11 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
complain)
if (DECL_NAMESPACE_SCOPE_P (t))
  DECL_NOT_REALLY_EXTERN (r) = 1;
  
+	/* Propagate the declaration's constraints.  */

+   if (VAR_P (r) && DECL_CLASS_SCOPE_P (r))
+ if (tree ci = get_constraints (t))
+   set_constraints (r, ci);
+
DECL_TEMPLATE_INFO (r) = build_template_info (tmpl, argvec);
SET_DECL_IMPLICIT_INSTANTIATION (r);
if (!error_operand_p (r) || (complain & tf_error))
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
new file mode 100644
index 000..80b48ba3a3d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
@@ -0,0 +1,9 @@
+// PR c++/98486
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+template> int v;
+
+template<> int v;
+template<> int v; // { dg-error "match" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
new file mode 100644
index 000..b12d37d8b7e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
@@ -0,0 +1,14 @@
+// PR c++/98486
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, U);
+
+struct A {
+  template> static int v;
+};
+
+template<> int A::v;
+template<> int A::v; // { dg-error "match" }
+
+int x = A::v;
+int y = A::v; // { dg-error "invalid" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C
new file mode 100644
index 000..37d7f0fc654

Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-16 Thread Jeff Law via Gcc-patches




On 9/16/2021 1:41 AM, Richard Biener wrote:

On Wed, 15 Sep 2021, Koning, Paul wrote:




On Sep 13, 2021, at 3:31 AM, Richard Biener  wrote:

This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
is not specified by the target and NO_DEBUG if DWARF is not supported.

As I'm looking at questions about old debug formats, it brings up the
question of old object formats.  I don't remember what the status of
a.out is.  Is that considered deprecated?  Still current?  Of course
most targets use elf, but is there an expectation to move away from
a.out the way there is an expectation to move away from STABS?

Is this actually a binutils rather than a gcc question?

I guess it's a question for both - I do still see a.out targets
in the configs supported by gas for example.

Note that languages like C++ might have difficulties with object
formats that do not support separate sections for instantiated
templates for example, or for global initializers.  We might have
kludges for that in collect2 where removing those might be a
motivation to deprecate object formats not supporting some
set of features (named sections for example).

As for "old", the problem with legacy systems, be it
pdp11 or hppa-hpux, is of course that they tend to be kept alive
with minimal resources, and doing major modernization doesn't
really make sense if all that is wanted is to preserve them
rather than turn them into something modern.

That said - yes, I'd consider a.out purely legacy and not fit
for the future.  But it has never shown up on the radar as standing
in the way of modernizing GCC in any area.

I'd definitely consider a.out & SOM as purely legacy.  As long as they 
continue to work, great, but I wouldn't make any significant investment 
in either.  And yes, there are mechanisms in collect2 to support things 
like global initializers/finalizers on a.out systems.


FWIW, SOM (the 32-bit native hpux format) is a COFF derivative and can 
support most of the stuff ELF can.  Even so, I'd consider it pure legacy.


Re: [PATCH 1/4] cgraph: Do not warn about caller count mismatches of removed functions

2021-09-16 Thread Martin Jambor
Hi,

On Fri, Aug 20 2021, Martin Jambor wrote:
> To verify other changes in the patch series, I have been searching for
> the "Invalid sum of caller counts" string in symtab dumps but found that
> there are false warnings about functions which have their body removed
> because they are now unreachable.  Those are of course invalid and so
> this patch avoids checking such cgraph_nodes.
>
> gcc/ChangeLog:
>
> 2021-08-20  Martin Jambor  
>
>   * cgraph.c (cgraph_node::dump): Do not check caller count sums if
>   the body has been removed.  Remove trailing whitespace.

I have pushed this patch as obvious but would like to ping the rest of the
series.

Thanks,

Martin


> ---
>  gcc/cgraph.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 8f3af003f2a..de078653781 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2236,7 +2236,7 @@ cgraph_node::dump (FILE *f)
>  }
>fprintf (f, "\n");
>  
> -  if (count.ipa ().initialized_p ())
> +  if (!body_removed && count.ipa ().initialized_p ())
>  {
>bool ok = true;
>bool min = false;
> @@ -2245,7 +2245,7 @@ cgraph_node::dump (FILE *f)
>FOR_EACH_ALIAS (this, ref)
>   if (dyn_cast  (ref->referring)->count.initialized_p ())
> sum += dyn_cast  (ref->referring)->count.ipa ();
> -  
> +
>if (inlined_to
> || (symtab->state < EXPANSION
> && ultimate_alias_target () == this && only_called_directly_p ()))
> -- 
> 2.32.0


Re: [PATCH] C++: add type checking for static local vector variable in template

2021-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/21 05:11, wangpc via Gcc-patches wrote:

This patch adds type checking for static local vector variables in
C++ templates.  Both AArch64 SVE and RISC-V RVV vector types are sizeless,
and they all have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

 * decl.c (cp_finish_decl): Add type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..e3a06ea0858 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7520,6 +7520,12 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
&& DECL_INITIALIZED_IN_CLASS_P (decl))
  check_static_variable_definition (decl, type);
  
+  if (VAR_P (decl)

+  && DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);


I was thinking of moving the verify_type_context code from start_decl, 
which handles more cases:



  if (is_global_var (decl))
{
  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
   ? TCTX_THREAD_STORAGE
   : TCTX_STATIC_STORAGE);
  verify_type_context (input_location, context, TREE_TYPE (decl));
}


Jason



Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 16 Sep 2021, Richard Biener via Gcc-patches wrote:

> > Typically for the native_interpret/native_encode we punt if 
> > BITS_PER_UNIT != 8 || CHAR_BIT != 8 because nobody had the energy to 
> > deal with the weird platforms (especially if we have currently none, I 
> > believe dsp16xx that had 16-bit bytes had been removed in 4.0 and c4x 
> > that had 32-bit bytes had been removed in 4.3) - dunno if the 
> > DEFERRED_INIT etc. code has those guards or not and it clearly 
> > documents that this code is not ready for other configurations. A byte 
> > is not necessarily 8 bits, that is just the most common size for it, 
> > and TYPE_SIZE_UNIT is number of BITS_PER_UNIT bit units.
> 
> OK, I'll do s/8/BITS_PER_UNIT/ - I also see that we have 
> int_size_in_bytes returning TYPE_SIZE_UNIT and that TYPE_SIZE_UNIT is 
> documented to yield the type size in 'bytes'.

For better or worse, GCC's meaning of 'byte' is really 'unit'; I guess most 
introductions of the term 'byte' in comments and function names really 
came from either carelessness or from people who knew this fact and 
thought it obvious that 'byte' of course is the same as 'unit', but not 
the same as octet.

FWIW: (for GCC) both mean the smallest naturally addressable piece of 
memory (i.e. what you get when you increase an address by 'one'), and that 
is not necessarily 8 bit (but anything else is bit-rotten of course).

As modern use of 'byte' tends to actually mean octet, but old use of byte 
(and use in GCC) means unit, we probably should avoid the term byte 
altogether in GCC.  But that ship has sailed :-/

> I do believe that we should officially declare hosts with CHAR_BIT != 8 
> as unsupported and as you say support for targets with BITS_PER_UNIT != 
> 8 is likely bit-rotten.

(And characters are still something else entirely, except on those couple 
platforms where chars, units and octets happen to be the same :) )  
But yes.
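To make the unit/byte distinction concrete: a size counted in units converts to bits by multiplying by the unit width (BITS_PER_UNIT for target sizes, CHAR_BIT for host `sizeof`), not by a literal 8. A minimal sketch; `units_to_bits` and `host_bits_of` are hypothetical helpers, and the 16-bit case mirrors the removed dsp16xx port mentioned above.

```cpp
#include <cassert>
#include <climits>

// Convert a size counted in addressable units (what TYPE_SIZE_UNIT holds,
// or what sizeof reports) into bits.  The unit width is a parameter
// because it is a target property; hard-coding 8 is the bug discussed.
unsigned long long
units_to_bits (unsigned long long units, unsigned bits_per_unit)
{
  return units * bits_per_unit;
}

// On the host, sizeof counts chars, so CHAR_BIT is the matching width.
template <typename T>
unsigned long long
host_bits_of ()
{
  return units_to_bits (sizeof (T), CHAR_BIT);
}
```

With 8-bit units a 4-unit type is 32 bits; with 16-bit units the same TYPE_SIZE_UNIT means 64 bits.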


Ciao,
Michael.


Re: [patch] Fix PR rtl-optimization/102306

2021-09-16 Thread Jeff Law via Gcc-patches




On 9/16/2021 3:02 AM, Eric Botcazou wrote:

Hi,

this is a duplication of volatile loads introduced during GCC 9 development by
the new 2->2 mechanism of the RTL combiner.  There is already substantial
checking for volatile references in can_combine_p, but it implicitly assumes
that the combination reduces the number of instructions, which is of course
not the case here.  So the fix teaches try_combine to abort the combination
when it is about to make a copy of volatile references, to preserve them.

Bootstrapped/regtested on x86-64/Linux, OK for mainline and release branches?


2021-09-16  Eric Botcazou  

PR rtl-optimization/102306
* combine.c (try_combine): Abort the combination if we are about
to duplicate volatile references.


2021-09-16  Eric Botcazou  

* gcc.target/sparc/20210916-1.c: New test.

OK
jeff



Re: [PATCH] Fix PR 67102: Add libstdc++ dependancy to libffi

2021-09-16 Thread Jeff Law via Gcc-patches




On 9/15/2021 2:56 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

The error message is obvious: -funconfigured-libstdc++-v3 is used
on the g++ command line.  So we just add the dependency.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

ChangeLog:

* Makefile.def: Have configure-target-libffi depend on
all-target-libstdc++-v3.
* Makefile.in: Regenerate.

OK
jeff




Re: [Patch] PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

2021-09-16 Thread Segher Boessenkool
Hi!

On Thu, Sep 16, 2021 at 11:07:25AM +0200, Tobias Burnus wrote:
> As mentioned in https://gcc.gnu.org/PR102353 and in the patch,
> rs6000-gen-builtins was built to be run on "host" – and then linked and
> run on "build".
> 
> That caused bootstrap failures at link time.

> Tested with an x86_64-linux-gnu (build) → powerpc64le-linux-gnu (host,
> target) build.

Needs a native build, too, but Bill did that.  Okay.

> PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]

> 
> This mimics what the main Makefile.in does: compile the generator
> files under build (with Makefile.in's 'build/%.o' rule for compilation).
> It also adds $(RUN_GEN) to optionally run it with valgrind and
> the $(build_exeext) suffix.
> 
> Before, the .o files were compiled with $(COMPILE), causing link
> error with $(LINKER_FOR_BUILD) for build != host.
> 
> gcc/
>   PR target/102353
>   * config/rs6000/t-rs6000 (build/rs6000-gen-builtins.o,
>   build/rbtree.o): Added 'build/' to target, use build/%.o rule.

You can break the line after the ":" fwiw.

>   (build/rs6000-gen-builtins$(build_exeext)): Add 'build/' and
>   '$(build_exeext)' to target and 'build/' for the *.o files.
>   (rs6000-builtins.c): Update for those changes; run
>   rs6000-gen-builtins with $(RUN_GEN).

The margin is at 80 chars for changelogs.

> -rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
> +build/rs6000-gen-builtins$(build_exeext): build/rs6000-gen-builtins.o build/rbtree.o $(BUILD_LIBDEPS)

Maybe break the prerequisites here (with "\"), the line is very long now?

Okay for trunk.  Thanks!


Segher


Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Qing Zhao via Gcc-patches


> On Sep 16, 2021, at 9:56 AM, Jakub Jelinek  wrote:
> 
> On Thu, Sep 16, 2021 at 02:49:23PM +, Qing Zhao wrote:
>>> Testing for many instructions is always very fragile and dependent on exact
>>> compiler flags etc.
>> 
>> Yes, It’s indeed very fragile. 
>>> So, either the test should have a particular
>>> -march=/-mtune= options
>> 
>> I might add specific -march to the testing cases. 
> 
> Even -mtune= is needed if you want to stay safe, otherwise people testing
> with --target_board=unix/-mtune=cascadelake (or whatever else) might get
> failures.

Okay. Will try this.
> 
>>> and ideally also -fno-stack-protector
>>> -fno-stack-clash-protection etc.
>> 
>> Could you explain a little bit on this why?
> 
> In case people test e.g. with --target_board=unix/\{,-fstack-protector-all\}
> etc. (e.g. in Fedora/RHEL we do).
> For the RTL scanning checks if they are done fairly early, those options
> might not change anything, but with the ones scanning in the assembly,
> one needs to watch if those options don't add e.g. in the prologue or
> epilogue further copies of the instructions you scan for.

I see. 

Thank you.

Qing
> 
>   Jakub
> 



Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 16, 2021 at 03:39:46PM +, Qing Zhao wrote:
> > Even -mtune= is needed if you want to stay safe, otherwise people testing
> > with --target_board=unix/-mtune=cascadelake (or whatever else) might get
> > failures.
> 
> Okay. Will try this.
> > 
> >>> and ideally also -fno-stack-protector
> >>> -fno-stack-clash-protection etc.
> >> 
> >> Could you explain a little bit on this why?
> > 
> > In case people test e.g. with --target_board=unix/\{,-fstack-protector-all\}
> > etc. (e.g. in Fedora/RHEL we do).
> > For the RTL scanning checks if they are done fairly early, those options
> > might not change anything, but with the ones scanning in the assembly,
> > one needs to watch if those options don't add e.g. in the prologue or
> > epilogue further copies of the instructions you scan for.
> 
> I see. 
> 
> Thank you.

Basically, try testing with a bunch of semi-randomly chosen option sets and
see what breaks and what works.  Then, for the cases you think are common
enough to be worth adjusting the testcases for, adjust them; otherwise add
dg-options to make sure the expected arch/tune/etc. are in effect.
make check-gcc \
  RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\} i386.exp=auto-init*'
etc.
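Following that advice, an assembler-scanning auto-init test could pin its own options so that the expected arch/tune settings are in effect and stack protection cannot add extra prologue/epilogue instructions. A minimal sketch of such a test header: the option values (`-march=x86-64 -mtune=generic`, `-ftrivial-auto-var-init=zero`), `struct pair`, `consume` and `test_uninit` are all illustrative assumptions, and the real tests' dg-final scan lines would be kept as they are (none are invented here).

```cpp
/* Hypothetical auto-init test header; the option values are assumptions.  */
/* { dg-do compile } */
/* { dg-options "-O2 -ftrivial-auto-var-init=zero -march=x86-64 -mtune=generic -fno-stack-protector -fno-stack-clash-protection" } */

struct pair { long a, b; };

/* Defined here (and hidden from the optimizer with noipa) so the file is
   self-contained; a compile-only test could just as well leave it extern.  */
__attribute__((noipa)) void
consume (struct pair *p)
{
  (void) p;
}

void
test_uninit (void)
{
  struct pair s;   /* intentionally uninitialized: the option zero-fills it */
  consume (&s);
}

/* The existing dg-final scan-assembler / scan-rtl-dump lines go here.  */
```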

Jakub



Re: [PATCH] c++: fix wrong fixit hints for misspelled typedef [PR77565]

2021-09-16 Thread Michel Morin via Gcc-patches
On Thu, Sep 16, 2021 at 5:44 AM Jason Merrill  wrote:
>
> On 9/14/21 04:29, Michel Morin via Gcc-patches wrote:
> > On Tue, Sep 14, 2021 at 7:14 AM David Malcolm  wrote:
> >>
> >> On Tue, 2021-09-14 at 03:35 +0900, Michel Morin via Gcc-patches wrote:
> >>> Hi,
> >>>
> >>> PR77565 reports that, with the code `typdef int Int;`, GCC emits
> >>> "did you mean 'typeof'?" instead of "did you mean 'typedef'?".
> >>>
> >>> This happens because the typo corrector determines that `typeof` is a
> >>> candidate for suggestion (through `cp_keyword_starts_decl_specifier_p`),
> >>> but `typedef` is not.
> >>>
> >>> This patch fixes the issue by adding `typedef` as a candidate.  The patch
> >>> additionally adds the `inline` specifier and cv-qualifiers as candidates.
> >>> Here is a patch (tests `make check-gcc` pass on darwin):
> >>
> >> Thanks for this patch (and for reporting the bug in the first place).
> >>
> >> I notice that, as well as being used for fix-it hints by
> >> lookup_name_fuzzy (indirectly via suggest_rid_p),
> >> cp_keyword_starts_decl_specifier_p is also used by
> >> cp_lexer_next_token_is_decl_specifier_keyword, which is used by
> >> cp_parser_lambda_declarator_opt and cp_parser_constructor_declarator_p.
> >
> > Ah, you're right! Thank you for pointing this out.
> > I failed to grep those functions somehow.
> >
> > One thing that confuses me is that cp_keyword_starts_decl_specifier_p
> > misses many keywords that can start decl-specifiers (e.g.
> > typedef/inline/cv-qual and friend/explicit/virtual).
> > So let's wait for the C++ front-end maintainers ;)
>
> That is strange.  Let's add all the rest of them as well.

Done. Thanks for your help!

One more thing: cp_keyword_starts_decl_specifier_p has included RID_ATTRIBUTE
from the beginning (see https://gcc.gnu.org/PR28261), but attributes are
not decl-specifiers.  Would it be reasonable to remove it?

Both patches (with and without removal of RID_ATTRIBUTE) attached.
No regressions on x86_64-apple-darwin.

Regards,
Michel
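Why a missing keyword produces "did you mean 'typeof'?" can be seen with a plain edit-distance model: "typdef" is one edit from "typedef" but two from "typeof", so the nearest suggestion depends on whether typedef is in the candidate set at all. This is a simplified sketch using classic Levenshtein distance with no cutoff; GCC's spellcheck code uses its own distance function and thresholds, and `edit_distance`/`best_match` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Classic Levenshtein distance: insertions, deletions and substitutions
// each cost 1.  Keeps two rolling rows of the DP table.
std::size_t
edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev (b.size () + 1), cur (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); j++)
    prev[j] = j;                     // distance from "" to b[0..j)
  for (std::size_t i = 1; i <= a.size (); i++)
    {
      cur[0] = i;                    // distance from a[0..i) to ""
      for (std::size_t j = 1; j <= b.size (); j++)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
          cur[j] = std::min ({ prev[j] + 1, cur[j - 1] + 1, subst });
        }
      prev.swap (cur);
    }
  return prev[b.size ()];
}

// Return the candidate closest to WORD; the first one wins ties.
std::string
best_match (const std::string &word,
            const std::vector<std::string> &candidates)
{
  std::string best;
  std::size_t best_d = std::numeric_limits<std::size_t>::max ();
  for (const std::string &c : candidates)
    {
      std::size_t d = edit_distance (word, c);
      if (d < best_d)
        {
          best_d = d;
          best = c;
        }
    }
  return best;
}
```

With only typeof among the candidates, "typdef" maps to typeof (distance 2); once typedef is a candidate it wins at distance 1, which is the behavior the patch aims for.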



> >> So I'm not sure if this fix is exactly correct - hopefully one of the
> >> C++ frontend maintainers can chime in.  If
> >> cp_keyword_starts_decl_specifier_p isn't quite the right place for
> >> this, the fix could probably go in suggest_rid_p instead, which *is*
> >> specific to spelling corrections.
> >>
> >> Hope this is constructive; thanks again for the patch
> >> Dave
> >>
> >>
> >>
> >>>
> >>> 
> >>> c++: add typo corrections for typedef/inline/cv-qual [PR77565]
> >>>
> >>> PR c++/77565
> >>>
> >>> gcc/cp/ChangeLog:
> >>>
> >>> * parser.c (cp_keyword_starts_decl_specifier_p): Handle
> >>> typedef/inline specifiers and cv-qualifiers.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> * g++.dg/spellcheck-typenames.C: Add tests for decl-specs.
> >>>
> >>> --- a/gcc/cp/parser.c
> >>> +++ b/gcc/cp/parser.c
> >>> @@ -1051,6 +1051,12 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
> >>>   case RID_FLOAT:
> >>>   case RID_DOUBLE:
> >>>   case RID_VOID:
> >>> +  /* CV qualifiers.  */
> >>> +case RID_CONST:
> >>> +case RID_VOLATILE:
> >>> +  /* typedef/inline specifiers.  */
> >>> +case RID_TYPEDEF:
> >>> +case RID_INLINE:
> >>> /* GNU extensions.  */
> >>>   case RID_ATTRIBUTE:
> >>>   case RID_TYPEOF:
> >>> --- a/gcc/testsuite/g++.dg/spellcheck-typenames.C
> >>> +++ b/gcc/testsuite/g++.dg/spellcheck-typenames.C
> >>> @@ -76,3 +76,38 @@ singed char ch; // { dg-error "1: 'singed' does not name a type; did you mean 's
> >>>^~
> >>>signed
> >>>  { dg-end-multiline-output "" } */
> >>> +
> >>> +typdef int my_int; // { dg-error "1: 'typdef' does not name a type; did you mean 'typedef'?" }
> >>> +/* { dg-begin-multiline-output "" }
> >>> + typdef int my_int;
> >>> + ^~
> >>> + typedef
> >>> +   { dg-end-multiline-output "" } */
> >>> +
> >>> +inlien int inline_func(); // { dg-error "1: 'inlien' does not name a type; did you mean 'inline'?" }
> >>> +/* { dg-begin-multiline-output "" }
> >>> + inlien int inline_func();
> >>> + ^~
> >>> + inline
> >>> +   { dg-end-multiline-output "" } */
> >>> +
> >>> +coonst int ci = 0; // { dg-error "1: 'coonst' does not name a type; did you mean 'const'?" }
> >>> +/* { dg-begin-multiline-output "" }
> >>> + coonst int ci = 0;
> >>> + ^~
> >>> + const
> >>> +   { dg-end-multiline-output "" } */
> >>> +
> >>> +voltil int vi; // { dg-error "1: 'voltil' does not name a type; did you mean 'volatile'?" }
> >>> +/* { dg-begin-multiline-output "" }
> >>> + voltil int vi;
> >>> + ^~
> >>> + volatile
> >>> +   { dg-end-multiline-output "" } */
> >>> +
> >>> +statik int si; // { dg-error "1: 'statik' does not name a type; did you mean 'static'?" }
> >>> +/* { dg-begin-multiline-output "" }
> >>> + statik int si;
> >>> + ^~
> >>> + static
> >>> +   { dg-end-multiline-output "" } */
> >>> ===

RE: [PATCH 1/5]AArch64 sve: combine inverted masks into NOTs

2021-09-16 Thread Tamar Christina via Gcc-patches
Hi esteemed reviewer!

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, August 31, 2021 4:46 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 1/5]AArch64 sve: combine inverted masks into NOTs
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > The following example
> >
> > void f10(double * restrict z, double * restrict w, double * restrict x,
> >  double * restrict y, int n)
> > {
> > for (int i = 0; i < n; i++) {
> > z[i] = (w[i] > 0) ? x[i] + w[i] : y[i] - w[i];
> > }
> > }
> >
> > generates currently:
> >
> > ld1dz1.d, p1/z, [x1, x5, lsl 3]
> > fcmgt   p2.d, p1/z, z1.d, #0.0
> > fcmgt   p0.d, p3/z, z1.d, #0.0
> > ld1dz2.d, p2/z, [x2, x5, lsl 3]
> > bic p0.b, p3/z, p1.b, p0.b
> > ld1dz0.d, p0/z, [x3, x5, lsl 3]
> >
> > where a BIC is generated between p1 and p0 where a NOT would be better here,
> > since we won't require the use of p3, and it opens the pattern up to being CSEd.
> >
> > After this patch using a 2 -> 2 split we generate:
> >
> > ld1dz1.d, p0/z, [x1, x5, lsl 3]
> > fcmgt   p2.d, p0/z, z1.d, #0.0
> > not p1.b, p0/z, p2.b
> >
> > The additional scratch is needed such that we can CSE the two operations. If
> > both statements wrote to the same register then CSE won't be able to CSE the
> > values if there are other statements in between that use the register.
> >
> > Note: This patch series is working incrementally towards generating the most
> >   efficient code for this and other loops in small steps.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve.md (*mask_inv_combine): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/sve/pred-not-gen.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> > index 359fe0e457096cf4042a774789a5c241420703d3..2c23c6b12bafb038d82920e7141a418e078a2c65 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -8126,6 +8126,42 @@ (define_insn_and_split "*fcmuo_and_combine"
> >   UNSPEC_COND_FCMUO))]
> >  )
> >
> > +;; Make sure that inversions of masked comparisons are always on the mask
> > +;; instead of on the operation.
> > +(define_insn_and_split "*mask_inv_combine"
> 
> I think it would be better to explain why this is an optimisation
> in its own right, e.g.:
> 
> ;; Similar to *fcm_and_combine, but for BIC rather than AND.
> ;; In this case, we still need a separate NOT/BIC operation, but predicating
> ;; the comparison on the BIC operand removes the need for a PTRUE.
> 
> For the same reason, calling it *fcm_bic_combine might
> be more consistent with surrounding code.
> 
> It would be good to have a pattern for FCMUO as well, even though that
> isn't a focus of the work.
> 
> > +  [(set (match_operand: 0 "register_operand" "=Upa")
> > +   (and:
> > + (and:
> > +   (not:
> > + (unspec:
> > +   [(match_operand: 1)
> > +(const_int SVE_KNOWN_PTRUE)
> > +(match_operand:SVE_FULL_F 2 "register_operand" "w")
> > +(match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
> > +   SVE_COND_FP_CMP_I0))
> > +   (match_operand: 4 "register_operand" "Upa"))
> > + (match_dup: 1)))
> 
> Indentation looks off here: this is a sibling of the inner “and”.

Fixed.

> 
> > +   (clobber (match_scratch: 5 "=&Upa"))]
> 
> This needs to be Upl, since it's the target of a comparison.

Fixed, and I didn't realise this property of the ISA at all until now.

> 
> > +  "TARGET_SVE"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 5)
> > +   (unspec:
> > + [(match_dup 4)
> > +  (const_int SVE_MAYBE_NOT_PTRUE)
> > +  (match_dup 2)
> > +  (match_dup 3)]
> > + SVE_COND_FP_CMP_I0))
> > +   (set (match_dup 0)
> > +   (and:
> > + (not:
> > +   (match_dup 5))
> > + (match_dup 4)))]
> > +{
> > +  operands[5] = gen_reg_rtx (mode);
> 
> This should be protected by:
> 
>   if (can_create_pseudo_p ())
> 
> since for post-reload splits we should use operand 5 unaltered.

Done.

> 
> It would be good to test the patch with the "&& 1" changed to
> "&& reload_completed", to make sure that things still work for
> post-RA splits.  I *think* the changes above are the only ones
> needed to make that true, but nothing beats trying.

Yup, with && reload_completed it still works. CSE misses it, as expected, but
other than that it still works.

> 
> > +}
> > +)
> > +
> >  ;; 
> > -
> >  ;;  [FP] Absolute comparisons
> >  ;; 
> > -
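To see why the split is safe, here is a small bitmask model in C++. It is an illustration under stated assumptions, not the full RTL semantics: each SVE predicate is treated as an 8-lane bitmask, a zeroing-predicated compare is modelled as `raw & governing`, and operand 1 is taken to be all-true, which is how SVE_KNOWN_PTRUE is read here. The names `pred`, `cmp_under`, `before_split` and `after_split` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model: one bit per lane, 8 lanes.
using pred = std::uint8_t;

// Zeroing-predicated compare: only lanes active in GOVERNING can be set.
pred cmp_under(pred governing, pred raw_cmp) {
  return static_cast<pred>(raw_cmp & governing);
}

// Form matched by the pattern: ((~cmp under p1) & p4) & p1.
pred before_split(pred p1, pred p4, pred raw_cmp) {
  return static_cast<pred>(~cmp_under(p1, raw_cmp) & p4 & p1);
}

// Form produced by the split: p5 = cmp under p4; result = ~p5 & p4,
// i.e. a NOT (or BIC) governed by p4, with no PTRUE needed.
pred after_split(pred p4, pred raw_cmp) {
  pred p5 = cmp_under(p4, raw_cmp);
  return static_cast<pred>(~p5 & p4);
}

// Exhaustively compare both forms with p1 all-true.
bool split_is_equivalent() {
  const pred ptrue = 0xFF;
  for (unsigned p4 = 0; p4 < 256; ++p4)
    for (unsigned c = 0; c < 256; ++c)
      if (before_split(ptrue, static_cast<pred>(p4), static_cast<pred>(c)) !=
          after_split(static_cast<pred>(p4), static_cast<pred>(c)))
        return false;
  return true;
}
```

In this model the exhaustive check holds only because p1 is all-true: with p1 = 0x0F, p4 = 0xFF and no lanes comparing true, `before_split` gives 0x0F while `after_split` gives 0xFF.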

Re: [PATCH] c++: constrained variable template issues [PR98486]

2021-09-16 Thread Patrick Palka via Gcc-patches
On Thu, 16 Sep 2021, Jason Merrill wrote:

> On 9/16/21 09:11, Patrick Palka wrote:
> > This fixes some issues with constrained variable templates:
> > 
> >* Constraints aren't checked when explicitly specializing a variable
> >  template
> >* Constraints aren't attached to a static data member template at
> >  parse time
> >* Constraints aren't propagated when (partially) instantiating a
> >  static data member template
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
> > cmcstl2 and range-v3, does this look OK for trunk and perhaps 11?
> > 
> > PR c++/98486
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.c (grokdeclarator): Set constraints on a static data
> > member template.
> > * pt.c (determine_specialization): Check constraints on a
> > variable template.
> 
> These hunks are OK.
> 
> > (tsubst_decl) : Propagate constraints on a
> > static data member template.
> 
> Hmm, why is this necessary?  I know we already do this for functions, but I
> don't remember why.  Don't we check satisfaction for the most-general
> template?

Ah true, it looks like propagating constraints is not strictly necessary
for satisfaction for that reason.

But propagating them seems necessary for disambiguating constrained
overloads in a class template specialization:

  template<class T>
  struct A
  {
    void f() requires true;  // #1
    void f() requires false; // #2
  };

  template struct A<int>;

Without the propagation in tsubst_function_decl, during instantiation of
A we complain from add_method that #2 cannot be overloaded with #1.

But I don't think this is a problem for static data member templates
since they can't be overloaded, so indeed there's no reason to propagate
constraints on them if we tweak get_normalized_constraints_from_decl.

How does the following look?  Passes all the concepts tests so far, full
testing in progress:

-- >8 --

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): Look up
constraints using the most general template instead of the
specialization.
* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.
(tsubst_decl) : Propagate constraints on a
static data member template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-var-templ1.C: New test.
* g++.dg/cpp2a/concepts-var-templ1a.C: New test.
* g++.dg/cpp2a/concepts-var-templ1b.C: New test.
---
 gcc/cp/constraint.cc  |  8 +---
 gcc/cp/decl.c | 11 +++
 gcc/cp/pt.c   |  3 ++-
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C  |  9 +
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C | 14 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C | 15 +++
 6 files changed, 56 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 1aaf1e27886..2896efdd7f2 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -918,20 +918,22 @@ get_normalized_constraints_from_decl (tree d, bool diag = false)
   tmpl = most_general_template (tmpl);
   }
 
+  d = tmpl ? tmpl : decl;
+
   /* If we're not diagnosing errors, use cached constraints, if any.  */
   if (!diag)
-if (tree *p = hash_map_safe_get (normalized_map, tmpl))
+if (tree *p = hash_map_safe_get (normalized_map, d))
   return *p;
 
   tree norm = NULL_TREE;
-  if (tree ci = get_constraints (decl))
+  if (tree ci = get_constraints (d))
 {
   push_access_scope_guard pas (decl);
   norm = get_normalized_constraints_from_info (ci, tmpl, diag);
 }
 
   if (!diag)
-hash_map_safe_put (normalized_map, tmpl, norm);
+hash_map_safe_put (normalized_map, d, norm);
 
   return norm;
 }
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0f1496636f..7beac79ec25 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13980,6 +13980,17 @@ grokdeclarator (const cp_declarator *declarator,
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
+
+   /* Set the constraints on declaration.  */
+   bool memtmpl = (processing_template_decl
+   > template_class_depth (current_class_type));
+   if (memtmpl)
+ {
+   tree tmpl_reqs
+ = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
+   tree ci = build_constraints (tmpl_reqs, NULL_TREE);
+   set_constraints (decl, ci);
+ }
  }

Re: [PATCH][v2] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-16 Thread Koning, Paul via Gcc-patches



> On Sep 16, 2021, at 11:05 AM, Jeff Law  wrote:
> 
> 
> On 9/16/2021 1:41 AM, Richard Biener wrote:
>> ...
>> That said - yes, I'd consider a.out purely legacy and not fit
>> for the future.  But it never came up on the radar of standing
>> in the way of modernizing GCC in any area.
> I'd definitely consider a.out & SOM as purely legacy.  As long as they 
> continue to work, great, but I wouldn't make any significant investment in 
> either.  And yes, there are mechanisms in collect2 to support things like 
> global initializers/finalizers on a.out systems.

"Legacy" sounds fine.  My main concern was whether it was, or is likely to 
become soon, "deprecated" or "unsupported".  For an old platform to use legacy 
formats is perfectly sensible, for it to use deprecated mechanisms is not.

For this to work, if there are no supported debug formats for the object format 
in question -- which will be the case for a.out with STABS going away -- that 
would mean you'd get output without debug symbols.  There was a suggestion that 
this wouldn't be allowed and that it would be grounds for removing such 
platforms.  I'd rather not see things tied like that.

paul



[PATCH] [i386] Change ix86_decompose_address return type to bool.

2021-09-16 Thread Uros Bizjak via Gcc-patches
After a recent change only a boolean value is returned.
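As an illustration of the bool-returning convention, here is a hypothetical C++ helper modelled on the ASHIFT case of ix86_decompose_address: shifting the index left by 0 to 3 becomes a scale of 1, 2, 4 or 8, and anything else makes the address invalid. `shift_count_to_scale` is not a GCC function, and `long long` stands in for HOST_WIDE_INT.

```cpp
// Hypothetical helper; returns false when the shift cannot be expressed
// as an x86 address scale.
bool shift_count_to_scale(long long shift, int *scale)
{
  // Casting to unsigned folds the negative and the too-large cases into
  // one comparison, as the (unsigned HOST_WIDE_INT) cast does in i386.c.
  if (static_cast<unsigned long long>(shift) > 3)
    return false;
  *scale = 1 << shift;
  return true;
}
```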

2021-09-16  Uroš Bizjak  

gcc/
* config/i386/i386-protos.h (ix86_decompose_address):
Change return type to bool.
* config/i386/i386.c (ix86_decompose_address): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index dcae34b915e..708834ae832 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -320,7 +320,7 @@ struct ix86_address
   addr_space_t seg;
 };
 
-extern int ix86_decompose_address (rtx, struct ix86_address *);
+extern bool ix86_decompose_address (rtx, struct ix86_address *);
 extern int memory_address_length (rtx, bool);
 extern void x86_output_aligned_bss (FILE *, tree, const char *,
unsigned HOST_WIDE_INT, unsigned);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d7abff0f396..337ea291780 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10101,10 +10101,10 @@ ix86_live_on_entry (bitmap regs)
 }
 
 /* Extract the parts of an RTL expression that is a valid memory address
-   for an instruction.  Return 0 if the structure of the address is
+   for an instruction.  Return false if the structure of the address is
grossly off.  */
 
-int
+bool
 ix86_decompose_address (rtx addr, struct ix86_address *out)
 {
   rtx base = NULL_RTX, index = NULL_RTX, disp = NULL_RTX;
@@ -10123,17 +10123,17 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
{
  addr = XEXP (addr, 0);
  if (CONST_INT_P (addr))
-   return 0;
+   return false;
} 
   else if (GET_CODE (addr) == AND
   && const_32bit_mask (XEXP (addr, 1), DImode))
{
  addr = lowpart_subreg (SImode, XEXP (addr, 0), DImode);
  if (addr == NULL_RTX)
-   return 0;
+   return false;
 
  if (CONST_INT_P (addr))
-   return 0;
+   return false;
}
   else if (GET_CODE (addr) == AND)
{
@@ -10167,7 +10167,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
{
  addr = SUBREG_REG (addr);
  if (CONST_INT_P (addr))
-   return 0;
+   return false;
}
 }
 
@@ -10178,7 +10178,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   if (REG_P (SUBREG_REG (addr)))
base = addr;
   else
-   return 0;
+   return false;
 }
   else if (GET_CODE (addr) == PLUS)
 {
@@ -10189,13 +10189,13 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   do
{
  if (n >= 4)
-   return 0;
+   return false;
  addends[n++] = XEXP (op, 1);
  op = XEXP (op, 0);
}
   while (GET_CODE (op) == PLUS);
   if (n >= 4)
-   return 0;
+   return false;
   addends[n] = op;
 
   for (i = n; i >= 0; --i)
@@ -10205,28 +10205,28 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
{
case MULT:
  if (index)
-   return 0;
+   return false;
  index = XEXP (op, 0);
  scale_rtx = XEXP (op, 1);
  break;
 
case ASHIFT:
  if (index)
-   return 0;
+   return false;
  index = XEXP (op, 0);
  tmp = XEXP (op, 1);
  if (!CONST_INT_P (tmp))
-   return 0;
+   return false;
  scale = INTVAL (tmp);
  if ((unsigned HOST_WIDE_INT) scale > 3)
-   return 0;
+   return false;
  scale = 1 << scale;
  break;
 
case ZERO_EXTEND:
  op = XEXP (op, 0);
  if (GET_CODE (op) != UNSPEC)
-   return 0;
+   return false;
  /* FALLTHRU */
 
case UNSPEC:
@@ -10235,12 +10235,12 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
  && seg == ADDR_SPACE_GENERIC)
seg = DEFAULT_TLS_SEG_REG;
  else
-   return 0;
+   return false;
  break;
 
case SUBREG:
  if (!REG_P (SUBREG_REG (op)))
-   return 0;
+   return false;
  /* FALLTHRU */
 
case REG:
@@ -10249,7 +10249,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
  else if (!index)
index = op;
  else
-   return 0;
+   return false;
  break;
 
case CONST:
@@ -10257,12 +10257,12 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
case SYMBOL_REF:
case LABEL_REF:
  if (disp)
-   return 0;
+   return false;
  disp = op;
  break;
 

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Qing Zhao via Gcc-patches



> On Sep 16, 2021, at 10:47 AM, Jakub Jelinek  wrote:
> 
> On Thu, Sep 16, 2021 at 03:39:46PM +, Qing Zhao wrote:
>>> Even -mtune= is needed if you want to stay safe, otherwise people testing
>>> with --target_board=unix/-mtune=cascadelake (or whatever else) might get
>>> failures.
>> 
>> Okay. Will try this.
>>> 
> and ideally also -fno-stack-protector
> -fno-stack-clash-protection etc.
 
 Could you explain a little bit why?
>>> 
>>> In case people test e.g. with --target_board=unix/\{,-fstack-protector-all\}
>>> etc. (e.g. in Fedora/RHEL we do).
>>> For the RTL scanning checks if they are done fairly early, those options
>>> might not change anything, but with the ones scanning in the assembly,
>>> one needs to watch if those options don't add e.g. in the prologue or
>>> epilogue further copies of the instructions you scan for.
>> 
>> I see. 
>> 
>> Thank you.
> 
> Basically, try to test with a bunch of semi-randomly chosen option sets and
> see what breaks and what works and then for the cases you think are common
> enough and worth adjusting testcases adjust them, otherwise add dg-options
> to make sure the expected arch/tune/etc. are in effect.
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\} i386.exp=auto-init*'

Thanks a lot for the suggestions and help, I will try this.

Qing
> etc.
> 
>   Jakub
> 



Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Iain Sandoe via Gcc-patches



> On 16 Sep 2021, at 18:11, Qing Zhao via Gcc-patches  
> wrote:
> 
> 
> 
>> On Sep 16, 2021, at 10:47 AM, Jakub Jelinek  wrote:
>> 
>> On Thu, Sep 16, 2021 at 03:39:46PM +, Qing Zhao wrote:
 Even -mtune= is needed if you want to stay safe, otherwise people testing
 with --target_board=unix/-mtune=cascadelake (or whatever else) might get
 failures.
>>> 
>>> Okay. Will try this.
 
>> and ideally also -fno-stack-protector
>> -fno-stack-clash-protection etc.
> 
> Could you explain a little bit why?
 
 In case people test e.g. with 
 --target_board=unix/\{,-fstack-protector-all\}
 etc. (e.g. in Fedora/RHEL we do).
 For the RTL scanning checks if they are done fairly early, those options
 might not change anything, but with the ones scanning in the assembly,
 one needs to watch if those options don't add e.g. in the prologue or
 epilogue further copies of the instructions you scan for.
>>> 
>>> I see. 
>>> 
>>> Thank you.
>> 
>> Basically, try to test with a bunch of semi-randomly chosen option sets and
>> see what breaks and what works and then for the cases you think are common
>> enough and worth adjusting testcases adjust them, otherwise add dg-options
>> to make sure the expected arch/tune/etc. are in effect.
>> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\} i386.exp=auto-init*'
> 
> Thanks a lot for the suggestions and help, I will try this.

I might suggest adding -fPIC or -fpic to the mix too (if it’s relevant to the 
tests); there are quite a few testcases that fail when run on Darwin (or HJ’s 
pic tester versions) because of the difference in code-gen.
Iain

> 
> Qing
>> etc.
>> 
>>  Jakub



Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:

This and the following patches together enable full devirtualization
under the whole-program assumption (hence the name whole-program
devirtualization, WPD for short), which is an enhancement to the current
speculative devirtualization. The basis of the optimization is
identifying class types that are local in whole-program scope; at the
very least, the class types in libstdc++ must be excluded somehow.
Our approach is to use the typeinfo symbol as the identity marker of a
class, since it is unique and is always generated once the class or one
of its derived types is instantiated somewhere, and to rely on symbol
resolution by the LTO linker plugin to detect whether a typeinfo is
referenced by a regular object or library, which indirectly tells us
whether a class type escapes.
The RFC at https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html
gives more details on that.
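A hypothetical C++ example of the kind of call WPD targets (the types are invented for illustration). If symbol resolution shows that Shape's typeinfo is not referenced by any regular (non-LTO) object or library, and Square is the only class derived from Shape in the whole program, the virtual call in `total_area` has a single possible target and could become a direct call to Square::area. Per the ChangeLog, the patch puts this behind the new `-fdevirtualize-fully` option; the sketch only shows the shape of the code, not that a given build devirtualizes it.

```cpp
// Base class whose typeinfo is assumed never to escape the program.
struct Shape
{
  virtual ~Shape() = default;
  virtual int area() const = 0;
};

// Assumed to be the only class derived from Shape program-wide.
struct Square : Shape
{
  explicit Square(int s) : side(s) {}
  int area() const override { return side * side; }
  int side;
};

// Under the whole-program assumption, shapes[i]->area() can only
// resolve to Square::area.
int total_area(const Shape *const *shapes, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += shapes[i]->area();
  return sum;
}
```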

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Thanks,
Feng


2021-09-07  Feng Xue  

gcc/
* common.opt (-fdevirtualize-fully): New option.
* class.c (build_rtti_vtbl_entries): Force generation of typeinfo
even if -fno-rtti is specified under full devirtualization.


This makes -fno-rtti useless; rather than this, you should warn about 
the combination of flags and force flag_rtti on.  It also sounds like 
you depend on the library not being built with -fno-rtti.



* cgraph.c (cgraph_update_edges_for_call_stmt): Add an assertion
to check node to be traversed.
* cgraphclones.c (cgraph_node::find_replacement): Record
former_clone_of on replacement node.
* cgraphunit.c (symtab_node::needed_p): Always output vtable for
full devirtualization.
(analyze_functions): Force generation of primary vtables for all
base classes.
* ipa-devirt.c (odr_type_d::whole_program_local): New field.
(odr_type_d::has_virtual_base): Likewise.
(odr_type_d::all_derivations_known): Removed.
(odr_type_d::whole_program_local_p): New member function.
(odr_type_d::all_derivations_known_p): Likewise.
(odr_type_d::possibly_instantiated_p): Likewise.
(odr_type_d::set_has_virtual_base): Likewise.
(get_odr_type): Set "whole_program_local" and "has_virtual_base"
when adding a type.
(type_all_derivations_known_p): Replace implementation by a call
to odr_type_d::all_derivations_known_p.
(type_possibly_instantiated_p): Replace implementation by a call
to odr_type_d::possibly_instantiated_p.
(type_known_to_have_no_derivations_p): Replace call to
type_possibly_instantiated_p with call to
odr_type_d::possibly_instantiated_p.
(type_all_ctors_visible_p): Removed.
(type_whole_program_local_p): New function.
(get_type_vtable): Likewise.
(extract_typeinfo_in_vtable): Likewise.
(identify_whole_program_local_types): Likewise.
(dump_odr_type): Dump has_virtual_base and whole_program_local_p()
of type.
(maybe_record_node): Resort to type_whole_program_local_p to
check whether a class has been optimized away.
(record_target_from_binfo): Remove parameter "anonymous", add
a new parameter "possibly_instantiated", and adjust code
accordingly.
(devirt_variable_node_removal_hook): Replace call to
"type_in_anonymous_namespace_p" with "type_whole_program_local_p".
(possible_polymorphic_call_targets): Replace call to
"type_possibly_instantiated_p" with "possibly_instantiated_p",
replace flag check on "all_derivations_known" with call to
 "all_derivations_known_p".
* ipa-icf.c (filter_removed_items): Disable folding on vtable
under full devirtualization.
* ipa-polymorphic-call.c (restrict_to_inner_class): Move odr
type check to type_known_to_have_no_derivations_p.
* ipa-utils.h (identify_whole_program_local_types): New
declaration.
(type_all_derivations_known_p): Parameter type adjustment.
* ipa.c (walk_polymorphic_call_targets): Do not mark vcall
targets as reachable for full devirtualization.
(can_remove_vtable_if_no_refs_p): New function.
(symbol_table::remove_unreachable_nodes): Add defined vtables
to reachable list under full devirtualization.
* lto-symtab.c (lto_symtab_merge_symbols): Identify whole
program local types after symbol table merge.
---





[PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-09-16 Thread Di Zhao OS via Gcc-patches
Sorry for taking so long to update this. It took me a long time to work out a
new plan and get it to pass the tests.

The new idea is to use one variable to represent a set of equal variables at
some basic block. This variable is called an "equivalence head" or "equiv-head"
in the code. (There is no longer an "equivalence map".)

- Initially an SSA_NAME's "equivalence head" is its value number. Temporary
  equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
  map. In addition, when inserting into the vn_nary_op_t map, the new result is
  placed at the front of the vn_pval list, so that when searching for a
  variable's equivalence head, the first result represents the largest
  equivalence set at the current location.
- In vn_ssa_aux_t, maintain a list of references to valid_info->nary entries.
  For recorded equivalences, the reference is result->entry; for normal N-ary
  operations, the reference is operand->entry.
- When recording equivalences, if one side A is constant or has more refs, make
  it the new equivalence head of the other side B. Traverse B's ref list: if a
  variable C's previous equiv-head is B, update it to A. Then re-insert B's
  n-ary operations, replacing B with A.
- When inserting and looking for the results of n-ary operations, insert and
  lookup by the operands' equiv-heads.
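A simplified model of the equiv-head bookkeeping described above, written as a standalone C++ struct rather than against vn_ssa_aux_t/vn_nary_op_t. All names are hypothetical, refs track only variables, and the re-insertion of n-ary operations is not modelled. Relabeling the smaller side eagerly is the classic weighted quick-find idea; when only set sizes decide the winner, each variable is relabeled at most O(log n) times.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct EquivHeads
{
  std::vector<int> head;               // head[v]: current equiv-head of v
  std::vector<std::vector<int>> refs;  // refs[h]: variables whose head is h
  std::vector<bool> is_const;          // constant heads always win

  explicit EquivHeads(std::size_t n) : head(n), refs(n), is_const(n, false)
  {
    // Initially every variable is its own head.
    for (std::size_t v = 0; v < n; ++v)
      {
        head[v] = static_cast<int>(v);
        refs[v].push_back(static_cast<int>(v));
      }
  }

  // Record A == B: the head that is constant or has more refs absorbs
  // the other, and every variable headed by the loser is relabeled.
  void record_equal(int a, int b)
  {
    int ha = head[a], hb = head[b];
    if (ha == hb)
      return;
    bool b_wins = is_const[hb]
                  || (!is_const[ha] && refs[hb].size() > refs[ha].size());
    if (b_wins)
      std::swap(ha, hb);
    for (int v : refs[hb])
      {
        head[v] = ha;
        refs[ha].push_back(v);
      }
    refs[hb].clear();
  }
};
```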

So, except for the refs in vn_ssa_aux_t, this scheme makes the best use of the
original infrastructure. Quadratic search time is avoided at the cost of some
re-insertions. Test results on SPEC2017 intrate (counts and percentages):

                | more bb removed    | more stmts removed | more vn_nary_ops
                | at fre1            | at fre1            | inserted
                |   count |        % |   count |        % |   count |        %
----------------+---------+----------+---------+----------+---------+---------
500.perlbench_r |      64 |    1.98% |     103 |    0.19% |   11260 |   12.16%
502.gcc_r       |     671 |    4.80% |     317 |    0.23% |   13964 |    6.09%
505.mcf_r       |       5 |   35.71% |       9 |    1.40% |      32 |    2.52%
520.omnetpp     |     132 |    5.45% |      39 |    0.11% |    1895 |    3.91%
523.xalancbmk_r |     238 |    3.26% |     313 |    0.36% |    1417 |    1.27%
525.x264_r      |       4 |    1.36% |      27 |    0.11% |    1752 |    6.78%
531.deepsjeng_r |       1 |    3.45% |       2 |    0.14% |     228 |    8.67%
541.leela_r     |       2 |    0.63% |       0 |    0.00% |      92 |    1.14%
548.exchange2_r |       0 |    0.00% |       3 |    0.04% |      49 |    1.03%
557.xz_r        |       0 |    0.00% |       3 |    0.07% |     272 |    7.55%

More basic blocks and statements are removed than with the last
implementation, for two reasons:
1) "CONST op CONST" simplification is included. It was missed in the previous patch.
2) By inserting the RHS of statements on equiv-heads, more N-ary operations can be
   simplified. One example is 'ssa-fre-97.c' in the patch file.

While jump threading and VRP also use temporary equivalences (so some of the
newly removed blocks and statements could also be handled by them), I think this
patch complements them in cases where jump threading cannot take place (the
original example) or where value-number information is needed (the
'ssa-fre-97.c' example).

I also fixed the earlier issue with non-iterate mode.

About recording the temporary equivalences generated by PHIs (i.e. the
'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and
the code size is large, but I haven't found a better way yet. Consider a piece
of CFG like the one below: if we want to record x==x2 on the true edge when
processing bb1, the location (following current practice) will be bb2. But that
is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't
find a place to record x==x1 when processing bb1.
If we could record things on edges rather than blocks, say x==x1 on 1->3 and
x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 could be a valid
equiv-head for x from bb5 onwards. But I think that lacks efficiency and is not
persuasive. It is more efficient to find a valid previous predicate when
processing bb4, because the vn_nary_op_t will be fetched anyway.
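     ---------------
     | if (a != 0) |  bb1
     ---------------
    f |        \ t
      |      -------
      |      | bb2 |
      |      -------
      |        /
     ---------------
     |  x = PHI    |  bb3
     ---------------
            |

            |
     ---------------
     | if (a != 0) |  bb4
     ---------------
    f |        \ t
    --------------   -------
bb7 | where      |   | bb5 |  ==> where "x==x2" is recorded now
    | "x==x1" is |   -------
    | recorded   |      \
    | now        |      ...
    --------------       |
                      -------
                      | bb6 |  ==> where "x==x2" needs to be used
                      -------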
Although I think I could remove the 'dominator_to_phi_map' and generalize this a
little, the major logic would be similar. So I 

Re: [PATCH] c++: fix wrong fixit hints for misspelled typedef [PR77565]

2021-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/21 11:50, Michel Morin wrote:

On Thu, Sep 16, 2021 at 5:44 AM Jason Merrill  wrote:


On 9/14/21 04:29, Michel Morin via Gcc-patches wrote:

On Tue, Sep 14, 2021 at 7:14 AM David Malcolm  wrote:


On Tue, 2021-09-14 at 03:35 +0900, Michel Morin via Gcc-patches wrote:

Hi,

PR77565 reports that, with the code `typdef int Int;`, GCC emits
"did you mean 'typeof'?" instead of "did you mean 'typedef'?".

This happens because the typo corrector determines that `typeof` is a
candidate for suggestion (through `cp_keyword_starts_decl_specifier_p`),
but `typedef` is not.
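To illustrate the effect, here is a small C++ sketch using plain Levenshtein distance. GCC's spellchecker has its own edit-distance routine and cutoffs, so this is only an approximation: `typdef` is one edit from `typedef` but two from `typeof`, so `typeof` wins only when `typedef` is missing from the candidate list. `edit_distance` and `best_suggestion` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance: insert, delete and substitute each cost 1.
std::size_t edit_distance(const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size(); ++j)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
          cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
      std::swap(prev, cur);
    }
  return prev[b.size()];
}

// Closest candidate to TYPO; the earliest candidate wins ties.
std::string best_suggestion(const std::string &typo,
                            const std::vector<std::string> &candidates)
{
  std::string best;
  std::size_t best_dist = static_cast<std::size_t>(-1);
  for (const std::string &c : candidates)
    {
      std::size_t d = edit_distance(typo, c);
      if (d < best_dist)
        {
          best_dist = d;
          best = c;
        }
    }
  return best;
}
```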

This patch fixes the issue by adding `typedef` as a candidate. The patch
additionally adds the `inline` specifier and cv-qualifiers as
candidates.
Here is a patch (tests `make check-gcc` pass on darwin):


Thanks for this patch (and for reporting the bug in the first place).

I notice that, as well as being used for fix-it hints by
lookup_name_fuzzy (indirectly via suggest_rid_p),
cp_keyword_starts_decl_specifier_p is also used by
cp_lexer_next_token_is_decl_specifier_keyword, which is used by
cp_parser_lambda_declarator_opt and cp_parser_constructor_declarator_p.


Ah, you're right! Thank you for pointing this out.
I failed to grep those functions somehow.

One thing that confuses me is that cp_keyword_starts_decl_specifier_p
misses many keywords that can start decl-specifiers (e.g.
typedef/inline/cv-qual and friend/explicit/virtual).
So let's wait for the C++ frontend maintainers ;)


That is strange.  Let's add all the rest of them as well.


Done. Thanks for your help!

One more thing — cp_keyword_starts_decl_specifier_p includes RID_ATTRIBUTE
(from the beginning; see https://gcc.gnu.org/PR28261 ), but attributes are
not decl-specifiers. Would it be reasonable to remove this?


It looks like the place that PR28261 used 
cp_lexer_next_token_is_decl_specifier_keyword specifically exempts 
attributes:



  && (!cp_lexer_next_token_is_decl_specifier_keyword (parser->lexer)
      /* GNU attributes can actually appear both at the start of
         a parameter and parenthesized declarator.
           S (__attribute__((unused)) int);
         is a constructor, but
           S (__attribute__((unused)) foo) (int);
         is a function declaration.  */
      || (cp_parser_allow_gnu_extensions_p (parser)
          && cp_next_tokens_can_be_gnu_attribute_p (parser)))


So yes, let's remove RID_ATTRIBUTE and the || clause there.  I'd keep 
the comment, but move it to go with the test for C++11 attributes below.



Both patches (with and without removal of RID_ATTRIBUTE) attached.
No regressions on x86_64-apple-darwin.

Regards,
Michel




So I'm not sure if this fix is exactly correct - hopefully one of the
C++ frontend maintainers can chime in.  If
cp_keyword_starts_decl_specifier_p isn't quite the right place for
this, the fix could probably go in suggest_rid_p instead, which *is*
specific to spelling corrections.

Hope this is constructive; thanks again for the patch
Dave






c++: add typo corrections for typedef/inline/cv-qual [PR77565]

PR c++/77565

gcc/cp/ChangeLog:

* parser.c (cp_keyword_starts_decl_specifier_p): Handle
typedef/inline specifiers and cv-qualifiers.

gcc/testsuite/ChangeLog:

* g++.dg/spellcheck-typenames.C: Add tests for decl-specs.

--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1051,6 +1051,12 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
   case RID_FLOAT:
   case RID_DOUBLE:
   case RID_VOID:
+  /* CV qualifiers.  */
+case RID_CONST:
+case RID_VOLATILE:
+  /* typedef/inline specifiers.  */
+case RID_TYPEDEF:
+case RID_INLINE:
 /* GNU extensions.  */
   case RID_ATTRIBUTE:
   case RID_TYPEOF:
--- a/gcc/testsuite/g++.dg/spellcheck-typenames.C
+++ b/gcc/testsuite/g++.dg/spellcheck-typenames.C
@@ -76,3 +76,38 @@ singed char ch; // { dg-error "1: 'singed' does not name a type; did you mean 's
^~
signed
  { dg-end-multiline-output "" } */
+
+typdef int my_int; // { dg-error "1: 'typdef' does not name a type; did you mean 'typedef'?" }
+/* { dg-begin-multiline-output "" }
+ typdef int my_int;
+ ^~
+ typedef
+   { dg-end-multiline-output "" } */
+
+inlien int inline_func(); // { dg-error "1: 'inlien' does not name a type; did you mean 'inline'?" }
+/* { dg-begin-multiline-output "" }
+ inlien int inline_func();
+ ^~
+ inline
+   { dg-end-multiline-output "" } */
+
+coonst int ci = 0; // { dg-error "1: 'coonst' does not name a type; did you mean 'const'?" }
+/* { dg-begin-multiline-output "" }
+ coonst int ci = 0;
+ ^~
+ const
+   { dg-end-multiline-out

Re: [PATCH] c++: constrained variable template issues [PR98486]

2021-09-16 Thread Jason Merrill via Gcc-patches

On 9/16/21 12:44, Patrick Palka wrote:

On Thu, 16 Sep 2021, Jason Merrill wrote:


On 9/16/21 09:11, Patrick Palka wrote:

This fixes some issues with constrained variable templates:

* Constraints aren't checked when explicitly specializing a variable
  template
* Constraints aren't attached to a static data member template at
  parse time
* Constraints aren't propagated when (partially) instantiating a
  static data member template

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and range-v3, does this look OK for trunk and perhaps 11?

PR c++/98486

gcc/cp/ChangeLog:

* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.


These hunks are OK.


(tsubst_decl) : Propagate constraints on a
static data member template.


Hmm, why is this necessary?  I know we already do this for functions, but I
don't remember why.  Don't we check satisfaction for the most-general
template?


Ah, true: it looks like propagating constraints is not strictly necessary
for satisfaction, for that reason.

But propagating them seems necessary for disambiguating constrained
overloads in a class template specialization:

   template<typename T>
   struct A
   {
     void f() requires true;  // #1
     void f() requires false; // #2
   };

   template struct A<int>;

Without the propagation in tsubst_function_decl, during instantiation of
A<int> we complain from add_method that #2 cannot be overloaded with #1.

But I don't think this is a problem for static data member templates
since they can't be overloaded, so indeed there's no reason to propagate
constraints on them if we tweak get_normalized_constraints_from_decl.

How does the following look?  Passes all the concepts tests so far, full
testing in progress:


OK.


-- >8 --

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): Look up
constraints using the most general template instead of the
specialization.
* decl.c (grokdeclarator): Set constraints on a static data
member template.
* pt.c (determine_specialization): Check constraints on a
variable template.
(tsubst_decl) : Propagate constraints on a
static data member template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-var-templ1.C: New test.
* g++.dg/cpp2a/concepts-var-templ1a.C: New test.
* g++.dg/cpp2a/concepts-var-templ1b.C: New test.
---
  gcc/cp/constraint.cc  |  8 +---
  gcc/cp/decl.c | 11 +++
  gcc/cp/pt.c   |  3 ++-
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C  |  9 +
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C | 14 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C | 15 +++
  6 files changed, 56 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1a.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-var-templ1b.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 1aaf1e27886..2896efdd7f2 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -918,20 +918,22 @@ get_normalized_constraints_from_decl (tree d, bool diag = false)
tmpl = most_general_template (tmpl);
}
  
+  d = tmpl ? tmpl : decl;

+
/* If we're not diagnosing errors, use cached constraints, if any.  */
if (!diag)
-if (tree *p = hash_map_safe_get (normalized_map, tmpl))
+if (tree *p = hash_map_safe_get (normalized_map, d))
return *p;
  
tree norm = NULL_TREE;

-  if (tree ci = get_constraints (decl))
+  if (tree ci = get_constraints (d))
  {
push_access_scope_guard pas (decl);
norm = get_normalized_constraints_from_info (ci, tmpl, diag);
  }
  
if (!diag)

-hash_map_safe_put (normalized_map, tmpl, norm);
+hash_map_safe_put (normalized_map, d, norm);
  
return norm;

  }
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0f1496636f..7beac79ec25 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13980,6 +13980,17 @@ grokdeclarator (const cp_declarator *declarator,
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
+
+   /* Set the constraints on declaration.  */
+   bool memtmpl = (processing_template_decl
+   > template_class_depth (current_class_type));
+   if (memtmpl)
+ {
+   tree tmpl_reqs
+ = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
+   tree ci = build_constraints (tmpl_reqs, NULL_TREE);
+   set_constraints (decl, ci);
+

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-16 Thread Qing Zhao via Gcc-patches


> On Sep 16, 2021, at 12:39 PM, Iain Sandoe  wrote:
> 
> 
> 
>> On 16 Sep 2021, at 18:11, Qing Zhao via Gcc-patches 
>>  wrote:
>> 
>> 
>> 
>>> On Sep 16, 2021, at 10:47 AM, Jakub Jelinek  wrote:
>>> 
>>> On Thu, Sep 16, 2021 at 03:39:46PM +, Qing Zhao wrote:
> Even -mtune= is needed if you want to stay safe, otherwise people testing
> with --target_board=unix/-mtune=cascadelake (or whatever else) might get
> failures.
 
 Okay. Will try this.
> 
>>> and ideally also -fno-stack-protector
>>> -fno-stack-clash-protection etc.
>> 
>> Could you explain a bit why?
> 
> In case people test e.g. with 
> --target_board=unix/\{,-fstack-protector-all\}
> etc. (e.g. in Fedora/RHEL we do).
> For the RTL scanning checks if they are done fairly early, those options
> might not change anything, but with the ones scanning in the assembly,
> one needs to watch if those options don't add e.g. in the prologue or
> epilogue further copies of the instructions you scan for.
 
 I see. 
 
 Thank you.
>>> 
>>> Basically, try to test with a bunch of semi-randomly chosen option sets and
>>> see what breaks and what works and then for the cases you think are common
>>> enough and worth adjusting testcases adjust them, otherwise add dg-options
>>> to make sure the expected arch/tune/etc. are in effect.
>>> make check-gcc 
>>> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\}
>>>  i386.exp=auto-init*'
>> 
>> Thanks a lot for the suggestions and help, I will try this.
> 
> I might suggest adding -fPIC or -fpic to the mix too (if it’s relevant to the
> tests): quite a few testcases fail when run on Darwin (or with HJ’s
> PIC tester versions) because of the difference in code-gen.

Okay, will add that. Thanks.

Qing
> Iain
> 
>> 
>> Qing
>>> etc.
>>> 
>>> Jakub



[committed] libstdc++: Add noexcept to unique_ptr accessors

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h (__uniq_ptr_impl::_M_ptr)
(__uniq_ptr_impl::_M_deleter): Add noexcept.

Tested powerpc64le-linux. Committed to trunk.
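
A minimal sketch of the observable effect, assuming a hypothetical `uniq_ptr_impl_sketch` that mirrors the tuple-based layout of `__uniq_ptr_impl` (the names here are illustrative, not libstdc++'s). `std::get` on a tuple is already noexcept; once the accessors are marked noexcept as well, the `noexcept` operator reports calls to them as non-throwing.

```cpp
#include <tuple>
#include <utility>

// Hypothetical stand-in for __uniq_ptr_impl: pointer and deleter are
// stored in a std::tuple and exposed through noexcept accessors.
template <typename Ptr, typename Deleter>
class uniq_ptr_impl_sketch
{
public:
  Ptr&           ptr() noexcept           { return std::get<0>(m_t); }
  Ptr            ptr() const noexcept     { return std::get<0>(m_t); }
  Deleter&       deleter() noexcept       { return std::get<1>(m_t); }
  const Deleter& deleter() const noexcept { return std::get<1>(m_t); }

private:
  std::tuple<Ptr, Deleter> m_t;
};

// Hypothetical deleter, standing in for std::default_delete<int>.
struct default_delete_sketch
{
  void operator()(int* p) const noexcept { delete p; }
};

using impl = uniq_ptr_impl_sketch<int*, default_delete_sketch>;

// Without the noexcept specifiers these would evaluate to false.
static_assert(noexcept(std::declval<impl&>().ptr()), "");
static_assert(noexcept(std::declval<const impl&>().deleter()), "");
```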

commit 869107c9c9752c9a53cdb06179c1e6be6d2e5f44
Author: Jonathan Wakely 
Date:   Tue Sep 14 09:34:30 2021

libstdc++: Add noexcept to unique_ptr accessors

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h (__uniq_ptr_impl::_M_ptr)
(__uniq_ptr_impl::_M_deleter): Add noexcept.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 62ec1b52ecd..da582176e84 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -169,10 +169,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this;
   }
 
-  pointer&   _M_ptr() { return std::get<0>(_M_t); }
-  pointer_M_ptr() const { return std::get<0>(_M_t); }
-  _Dp&   _M_deleter() { return std::get<1>(_M_t); }
-  const _Dp& _M_deleter() const { return std::get<1>(_M_t); }
+  pointer&   _M_ptr() noexcept { return std::get<0>(_M_t); }
+  pointer_M_ptr() const noexcept { return std::get<0>(_M_t); }
+  _Dp&   _M_deleter() noexcept { return std::get<1>(_M_t); }
+  const _Dp& _M_deleter() const noexcept { return std::get<1>(_M_t); }
 
   void reset(pointer __p) noexcept
   {


[committed] libstdc++: Add noexcept to std::to_string overloads that don't allocate

2021-09-16 Thread Jonathan Wakely via Gcc-patches
When the value is guaranteed to fit in the SSO buffer we know the
string won't allocate, so the function can be noexcept. For 32-bit
integers, we know they need no more than 10 bytes (or 11 with a minus
sign) and the SSO buffer is 15 bytes.
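
A quick compile-time check of that size arithmetic, as a sketch: the 15-character SSO capacity is taken from the text above rather than queried from the library, and `decimal_digits` is a hypothetical helper. The widest unsigned 32-bit value, 4294967295, has 10 digits, and the most negative signed value, -2147483648, needs 11 characters including the sign; both fit.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Number of decimal digits needed to print V (sign not included).
constexpr std::size_t decimal_digits(std::uint64_t v)
{
  std::size_t n = 1;
  while (v >= 10)
    {
      v /= 10;
      ++n;
    }
  return n;
}

// Assumption: SSO capacity of std::string in the C++11 ABI, as stated above.
constexpr std::size_t sso_capacity = 15;

// Longest unsigned 32-bit value: 4294967295.
constexpr std::size_t max_u32_chars
  = decimal_digits(std::numeric_limits<std::uint32_t>::max());

// Longest signed 32-bit value: -2147483648, i.e. a minus sign plus the
// digits of its magnitude (computed in 64 bits to avoid overflow).
constexpr std::size_t max_i32_chars
  = 1 + decimal_digits(std::uint64_t{1} << 31);

static_assert(max_u32_chars == 10, "ten digits");
static_assert(max_i32_chars == 11, "sign plus ten digits");
static_assert(max_i32_chars <= sso_capacity, "fits in the SSO buffer");
```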

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(to_string): Add noexcept if the type width is 32 bits or less.

Tested x86_64-linux. Committed to trunk.

commit 9d813ddd978aff75001d53fe55ff15e9167bb4d0
Author: Jonathan Wakely 
Date:   Wed Sep 15 21:40:20 2021

libstdc++: Add noexcept to std::to_string overloads that don't allocate

When the value is guaranteed to fit in the SSO buffer we know the
string won't allocate, so the function can be noexcept. For 32-bit
integers, we know they need no more than 10 bytes (or 11 with a minus
sign) and the SSO buffer is 15 bytes.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(to_string): Add noexcept if the type width is 32 bits or less.

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index b61fe05efcf..24c454d863a 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3718,6 +3718,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   inline string
   to_string(int __val)
+#if _GLIBCXX_USE_CXX11_ABI && (__CHAR_BIT__ * __SIZEOF_INT__) <= 32
+  noexcept // any 32-bit value fits in the SSO buffer
+#endif
   {
 const bool __neg = __val < 0;
 const unsigned __uval = __neg ? (unsigned)~__val + 1u : __val;
@@ -3729,6 +3732,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   inline string
   to_string(unsigned __val)
+#if _GLIBCXX_USE_CXX11_ABI && (__CHAR_BIT__ * __SIZEOF_INT__) <= 32
+  noexcept // any 32-bit value fits in the SSO buffer
+#endif
   {
 string __str(__detail::__to_chars_len(__val), '\0');
 __detail::__to_chars_10_impl(&__str[0], __str.size(), __val);
@@ -3737,6 +3743,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   inline string
   to_string(long __val)
+#if _GLIBCXX_USE_CXX11_ABI && (__CHAR_BIT__ * __SIZEOF_LONG__) <= 32
+  noexcept // any 32-bit value fits in the SSO buffer
+#endif
   {
 const bool __neg = __val < 0;
 const unsigned long __uval = __neg ? (unsigned long)~__val + 1ul : __val;
@@ -3748,6 +3757,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   inline string
   to_string(unsigned long __val)
+#if _GLIBCXX_USE_CXX11_ABI && (__CHAR_BIT__ * __SIZEOF_LONG__) <= 32
+  noexcept // any 32-bit value fits in the SSO buffer
+#endif
   {
 string __str(__detail::__to_chars_len(__val), '\0');
 __detail::__to_chars_10_impl(&__str[0], __str.size(), __val);


[committed] libstdc++: Fix recipes for C++11-compiled files in src/c++98

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/c++98/Makefile.am: Use CXXCOMPILE not LTCXXCOMPILE.
* src/c++98/Makefile.in: Regenerate.

Tested x86_64-linux. Committed to trunk.

commit 2c351dafcbc871c088ce09ae69bc08871f7df57b
Author: Jonathan Wakely 
Date:   Wed Sep 15 21:39:27 2021

libstdc++: Fix recipes for C++11-compiled files in src/c++98

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/c++98/Makefile.am: Use CXXCOMPILE not LTCXXCOMPILE.
* src/c++98/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/src/c++98/Makefile.am 
b/libstdc++-v3/src/c++98/Makefile.am
index 0fa6ab95fb4..b48b57a2945 100644
--- a/libstdc++-v3/src/c++98/Makefile.am
+++ b/libstdc++-v3/src/c++98/Makefile.am
@@ -181,11 +181,11 @@ endif
 locale_init.lo: locale_init.cc
$(LTCXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
 locale_init.o: locale_init.cc
-   $(LTCXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
+   $(CXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
 localename.lo: localename.cc
$(LTCXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
 localename.o: localename.cc
-   $(LTCXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
+   $(CXXCOMPILE) -std=gnu++11 -fchar8_t -c $<
 
 # Use special rules for the deprecated source files so that they find
 # deprecated include files.


[committed] libstdc++: Add missing constraint to std::span deduction guide [PR102280]

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102280
* include/std/span (span(Range&&)): Add constraint to deduction
guide.

Tested x86_64-linux. Committed to trunk.

commit e67917f5df9d84f5aed3513b3931a82870d25135
Author: Jonathan Wakely 
Date:   Wed Sep 15 21:49:29 2021

libstdc++: Add missing constraint to std::span deduction guide [PR102280]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102280
* include/std/span (span(Range&&)): Add constraint to deduction
guide.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index be053e8ef38..af0d24b29f2 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -409,7 +409,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 span(_Iter, _End)
   -> span>>;
 
-  template
+  template
 span(_Range &&)
   -> span>>;
 


[committed] libstdc++: Add missing 'constexpr' to std::tuple [PR102270]

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.

Tested x86_64-linux. Committed to trunk.
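
The failure mode can be reproduced without std::tuple. In this sketch (C++14 or later), `empty_tuple_sketch` is a hypothetical stand-in for `tuple<>`: its no-op `swap` is usable inside a constant expression only because it is declared `constexpr`, which is what `_GLIBCXX20_CONSTEXPR` on `tuple<>::swap` provides in C++20 mode.

```cpp
// Hypothetical stand-in for std::tuple<>: nothing to swap, but the
// member still has to be constexpr to be callable during constant
// evaluation.
struct empty_tuple_sketch
{
  constexpr void swap(empty_tuple_sketch&) noexcept { /* no-op */ }
};

// Mirrors the shape of the new test: call swap in a constexpr function.
constexpr bool swap_empty_tuple_sketch()
{
  empty_tuple_sketch t{}, u{};
  t.swap(u);
  return true;
}

// Removing "constexpr" from swap() above makes this a compile error.
static_assert(swap_empty_tuple_sketch(), "");
```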

commit 734b2c2eedca50d966e22540fc136158c3633393
Author: Jonathan Wakely 
Date:   Wed Sep 15 21:53:35 2021

libstdc++: Add missing 'constexpr' to std::tuple [PR102270]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index f082ccb8a3b..6f0dc6346e1 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -95,10 +95,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _M_head_impl() { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
: _M_head_impl(allocator_arg, *__a._M_a) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc2<_Alloc> __a)
: _M_head_impl(*__a._M_a) { }
 
@@ -108,11 +110,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_head_impl(std::forward<_UHead>(__uhead)) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc1<_Alloc> __a, _UHead&& __uhead)
: _M_head_impl(allocator_arg, *__a._M_a, std::forward<_UHead>(__uhead))
{ }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc2<_Alloc> __a, _UHead&& __uhead)
: _M_head_impl(std::forward<_UHead>(__uhead), *__a._M_a) { }
 
@@ -142,26 +146,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr _Head_base(_UHead&& __h)
: _Head(std::forward<_UHead>(__h)) { }
 
+  _GLIBCXX20_CONSTEXPR
   _Head_base(allocator_arg_t, __uses_alloc0)
   : _Head() { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
: _Head(allocator_arg, *__a._M_a) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc2<_Alloc> __a)
: _Head(*__a._M_a) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc0, _UHead&& __uhead)
: _Head(std::forward<_UHead>(__uhead)) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc1<_Alloc> __a, _UHead&& __uhead)
: _Head(allocator_arg, *__a._M_a, std::forward<_UHead>(__uhead)) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc2<_Alloc> __a, _UHead&& __uhead)
: _Head(std::forward<_UHead>(__uhead), *__a._M_a) { }
 
@@ -194,10 +204,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _M_head_impl() { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
: _M_head_impl(allocator_arg, *__a._M_a) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(allocator_arg_t, __uses_alloc2<_Alloc> __a)
: _M_head_impl(*__a._M_a) { }
 
@@ -207,11 +219,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_head_impl(std::forward<_UHead>(__uhead)) { }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc1<_Alloc> __a, _UHead&& __uhead)
: _M_head_impl(allocator_arg, *__a._M_a, std::forward<_UHead>(__uhead))
{ }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Head_base(__uses_alloc2<_Alloc> __a, _UHead&& __uhead)
: _M_head_impl(std::forward<_UHead>(__uhead), *__a._M_a) { }
 
@@ -467,6 +481,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
const _Head& __head)
: _Base(__use_alloc<_Head, _Alloc, const _Head&>(__a), __head)
@@ -955,6 +970,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class tuple<>
 {
 public:
+  _GLIBCXX20_CONSTEXPR
   void swap(tuple&) noexcept { /* no-op */ }
   // We need the default since we're going to define no-op
   // allocator constructors.
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C 
b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
new file mode 100644
index 000..998329817c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
@@ -0,0 +1,61 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+#include 
+
+// PR libstdc++/102270 - std::tuple<>::swap missing constexpr specifier
+
+constexpr bool swap_empty_tuple()
+{
+  std::tuple<> t, u;
+  t.swap(u);
+  ret

[committed] libstdc++: Remove non-deducible parameter for std::advance overload

2021-09-16 Thread Jonathan Wakely via Gcc-patches
This was just a copy and paste error.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (advance): Remove non-deducible
template parameter.

Tested x86_64-linux. Committed to trunk.
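
To see why an extra template parameter is a problem, consider this C++17 sketch with hypothetical `advance_bad` and `advance_good` functions. A template parameter that appears in no function parameter (and has no default) can never be deduced, so an ordinary call finds no viable candidate; an overload written that way is effectively never selected unless the caller spells out template arguments.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical reduction of the copy-and-paste error: Unused appears in
// no function parameter, so it cannot be deduced from a call.
template <typename Distance, typename Unused>
void advance_bad(int& i, Distance n) { i += static_cast<int>(n); }

// The fixed form: every template parameter is deducible.
template <typename Distance>
void advance_good(int& i, Distance n) { i += static_cast<int>(n); }

// Detect whether a plain call (no explicit template arguments) compiles.
template <typename D, typename = void>
struct bad_is_callable : std::false_type {};
template <typename D>
struct bad_is_callable<D, std::void_t<decltype(
  advance_bad(std::declval<int&>(), std::declval<D>()))>>
  : std::true_type {};

template <typename D, typename = void>
struct good_is_callable : std::false_type {};
template <typename D>
struct good_is_callable<D, std::void_t<decltype(
  advance_good(std::declval<int&>(), std::declval<D>()))>>
  : std::true_type {};

static_assert(!bad_is_callable<long>::value, "Unused cannot be deduced");
static_assert(good_is_callable<long>::value, "");
```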

commit 21c760510d31253074577a14021fdc6ad44084b6
Author: Jonathan Wakely 
Date:   Thu Sep 16 13:35:24 2021

libstdc++: Remove non-deducible parameter for std::advance overload

This was just a copy and paste error.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (advance): Remove non-deducible
template parameter.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 3151af1e901..235d1df748f 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -1357,7 +1357,7 @@ inline ptrdiff_t
 distance(filesystem::path::iterator __first, filesystem::path::iterator __last)
 { return __path_iter_distance(__first, __last); }
 
-template
+template
   void
   advance(filesystem::path::iterator& __i, _Distance __n)
   { __path_iter_advance(__i, static_cast(__n)); }


[committed] libstdc++: Add noexcept to std::nullopt_t constructor

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/optional (nullopt_t): Make constructor noexcept.

Tested x86_64-linux. Committed to trunk.
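
A sketch of the same pattern with a hypothetical `nullopt_sketch_t`: a literal tag type whose only constructor takes a dedicated token enumerator, so it is neither default-constructible nor implicitly convertible from the token. With `noexcept` on that constructor, `std::is_nothrow_constructible` reports true.

```cpp
#include <type_traits>

// Hypothetical tag type modelled on the patched std::nullopt_t.
struct nullopt_sketch_t
{
  enum class construct_token { token };

  // Must be constexpr for the type to be literal; now also noexcept.
  explicit constexpr nullopt_sketch_t(construct_token) noexcept { }
};

constexpr nullopt_sketch_t
  nullopt_sketch{ nullopt_sketch_t::construct_token::token };

static_assert(std::is_nothrow_constructible<
                nullopt_sketch_t, nullopt_sketch_t::construct_token>::value,
              "token constructor is noexcept");
static_assert(!std::is_default_constructible<nullopt_sketch_t>::value,
              "cannot be created from {}");
```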

commit cbe705a2f749c98a5f803afeb207e175b4c9a3c3
Author: Jonathan Wakely 
Date:   Thu Sep 16 14:14:38 2021

libstdc++: Add noexcept to std::nullopt_t constructor

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/optional (nullopt_t): Make constructor noexcept.

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index b8ab7510757..b6ebe12b3e1 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -70,7 +70,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 enum class _Construct { _Token };
 
 // Must be constexpr for nullopt_t to be literal.
-explicit constexpr nullopt_t(_Construct) { }
+explicit constexpr nullopt_t(_Construct) noexcept { }
   };
 
   /// Tag to disengage optional objects.


[committed] libstdc++: Update documentation that only refers to c++98 and c++11

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml: Generalize to apply to more than
just -std=c++11.
* doc/html/manual/using_macros.html: Regenerate.

Committed to trunk.

commit bd0df30a7bc7a2e98e643cf84901e5383f83c005
Author: Jonathan Wakely 
Date:   Thu Sep 16 15:36:31 2021

libstdc++: Update documentation that only refers to c++98 and c++11

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml: Generalize to apply to more than
just -std=c++11.
* doc/html/manual/using_macros.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 24543e9526e..65fde4609db 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -1065,7 +1065,7 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 
hello.cc -o test.exe
removes older ARM-style iostreams code, and other anachronisms
from the API.  This macro is dependent on the version of the
standard being tracked, and as a result may give different results for
-   -std=c++98 and -std=c++11. This may
+   different -std options.  This may
be useful in updating old C++ code which no longer meet the
requirements of the language, or for checking current code
against new language standards.


[committed] libstdc++: Increase timeout factor for slow pb_ds tests

2021-09-16 Thread Jonathan Wakely via Gcc-patches
Compiling these tests still times out too often when running the
testsuite with more parallel jobs than there are available cores.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/ext/pb_ds/regression/tree_map_rand.cc: Increase
timeout factor to 3.
* testsuite/ext/pb_ds/regression/tree_set_rand.cc: Likewise.

Tested x86_64-linux. Committed to trunk.

commit 433789330609c571983a4e1f5c3e0caf3d7a6178
Author: Jonathan Wakely 
Date:   Thu Sep 16 20:10:02 2021

libstdc++: Increase timeout factor for slow pb_ds tests

Compiling these tests still times out too often when running the
testsuite with more parallel jobs than there are available cores.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/ext/pb_ds/regression/tree_map_rand.cc: Increase
timeout factor to 3.
* testsuite/ext/pb_ds/regression/tree_set_rand.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_map_rand.cc 
b/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_map_rand.cc
index 8ff6e926003..62904749ee7 100644
--- a/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_map_rand.cc
+++ b/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_map_rand.cc
@@ -4,7 +4,7 @@
 // { dg-require-cstdint "" }
 // This can take long on simulators, timing out the test.
 // { dg-options "-DITERATIONS=5" { target simulator } }
-// { dg-timeout-factor 2.0 }
+// { dg-timeout-factor 3.0 }
 
 // -*- C++ -*-
 
diff --git a/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_set_rand.cc 
b/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_set_rand.cc
index af7f7ffde22..16a864e0e30 100644
--- a/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_set_rand.cc
+++ b/libstdc++-v3/testsuite/ext/pb_ds/regression/tree_set_rand.cc
@@ -4,7 +4,7 @@
 // { dg-require-cstdint "" }
 // This can take long on simulators, timing out the test.
 // { dg-options "-DITERATIONS=5" { target simulator } }
-// { dg-timeout-factor 2.0 }
+// { dg-timeout-factor 3.0 }
 
 // -*- C++ -*-
 


[committed] libstdc++: Regenerate the src/debug Makefiles as needed

2021-09-16 Thread Jonathan Wakely via Gcc-patches
When the build configuration changes and Makefiles are recreated, the
src/debug/Makefile and src/debug/*/Makefile files are not recreated,
because they're not managed in the usual way by automake. This can lead
to build failures or surprising inconsistencies between the main and
debug versions of the library when doing incremental builds.

This causes them to be regenerated if any of the corresponding non-debug
makefiles is newer.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/Makefile.am (stamp-debug): Add all Makefiles as
prerequisites.
* src/Makefile.in: Regenerate.

Tested x86_64-linux. Committed to trunk.

commit fce4e12f8efb3b3db959b807201e08786b001f39
Author: Jonathan Wakely 
Date:   Thu Sep 16 21:21:56 2021

libstdc++: Regenerate the src/debug Makefiles as needed

When the build configuration changes and Makefiles are recreated, the
src/debug/Makefile and src/debug/*/Makefile files are not recreated,
because they're not managed in the usual way by automake. This can lead
to build failures or surprising inconsistencies between the main and
debug versions of the library when doing incremental builds.

This causes them to be regenerated if any of the corresponding non-debug
makefiles is newer.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/Makefile.am (stamp-debug): Add all Makefiles as
prerequisites.
* src/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index 16f4cc6eff4..f27d3f8c87e 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -369,7 +369,7 @@ endif
 # Build a debug variant.
 # Take care to fix all possibly-relative paths.
 debugdir = ${glibcxx_builddir}/src/debug
-stamp-debug:
+stamp-debug: Makefile $(foreach dir,$(SUBDIRS),$(dir)/Makefile)
if test ! -d ${debugdir} || test ! -f ${debugdir}/Makefile ; then \
  mkdir -p ${debugdir}; \
  for d in $(SUBDIRS); do mkdir -p  ${debugdir}/$$d; done; \


[PATCH] libstdc++: Fix UB in atomic_ref/wait_notify.cc [PR101761]

2021-09-16 Thread Thomas Rodgers
From: Thomas Rodgers 

Remove UB in atomic_ref/wait_notify test.

Signed-off-by: Thomas Rodgers 

libstdc++-v3/ChangeLog:

PR libstdc++/101761
* testsuite/29_atomics/atomic_ref/wait_notify.cc (test): Use
va and vb as arguments to wait/notify, remove unused bb local.

Tested x86_64-pc-linux-gnu, committed to master.
Ok to backport to releases/gcc-11?

---
 .../testsuite/29_atomics/atomic_ref/wait_notify.cc | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
index b75e27617f7..b41d1ac0bb7 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
@@ -33,15 +33,14 @@ template
 if constexpr (std::atomic_ref::is_always_lock_free)
 {
   S aa{ va };
-  S bb{ vb };
   std::atomic_ref a{ aa };
-  a.wait(bb);
+  a.wait(vb);
   std::thread t([&]
 {
- a.store(bb);
+ a.store(vb);
  a.notify_one();
 });
-  a.wait(aa);
+  a.wait(va);
   t.join();
 }
   }
-- 
2.31.1



Re: [PATCH v2] analyzer: Define INCLUDE_UNIQUE_PTR

2021-09-16 Thread Gerald Pfeifer
On Tue, 14 Sep 2021, Maxim Blinov wrote:
> Un-break the build for AArch64 Darwin, see PR bootstrap/102242.  Build
> fails with log below:

David already acked this with

  "Does the patch fix the build for you?

  If so, looks good for trunk.  Please reference PR bootstrap/102242 
  in the ChangeLog entry."

Can you please go ahead and commit this to fix the regression and
bootstrap failure (and add PR bootstrap/102242 to the ChangeLog)?

Oh, I don't see your name in the MAINTAINERS file, so I went ahead
and committed this on your behalf.

(One out of four bootstrap failures gone, hopefully.)

Gerald

PS: It might have made sense to Cc: David on your updated patch and
note that you do not have commit access.


[PATCH] nvptx: Add (experimental) support for HFmode with -misa=sm_53

2021-09-16 Thread Roger Sayle

The recent flurry of activity around HFmode on gcc-patches prompted me
to investigate adding HFmode support to the nvptx backend.  NVIDIA GPUs
with SM ISA 5.3 or later support IEEE 16-bit floating-point instructions.
Hence, this patch adds support for -misa=sm_53, and implements some
backend patterns/insns sufficient for a proof-of-concept prototype.

Whilst there, I also added -misa=sm_75 and -misa=sm_80, the ISA levels at
which other useful instructions were introduced.  Adding this
infrastructure now simplifies adding (ISA-conditional) insns to the
nvptx machine description in future follow-up patches.  I'm happy to
defer these changes/hunks until later if reviewers prefer.
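
A small model of the ISA handling, as a sketch: it assumes that each `TARGET_SMxx` macro tests whether the selected ISA is at least the matching `PTX_ISA_SMxx` enumerator (the nvptx.h definitions are not part of this excerpt), and `ptx_isa_sketch` is a hypothetical mirror of the extended `ptx_isa` enumeration. Because the enumerators are ordered, the if/else chain in `nvptx_cpu_cpp_builtins` amounts to picking the highest level, and HFmode availability is a single comparison.

```cpp
// Hypothetical mirror of the ptx_isa enumeration after the patch.
enum class ptx_isa_sketch { sm30, sm35, sm53, sm75, sm80 };

// Value the __PTX_SM__ macro would receive for each ISA level.
constexpr int ptx_sm_macro_value(ptx_isa_sketch isa)
{
  switch (isa)
    {
    case ptx_isa_sketch::sm80: return 800;
    case ptx_isa_sketch::sm75: return 750;
    case ptx_isa_sketch::sm53: return 530;
    case ptx_isa_sketch::sm35: return 350;
    default:                   return 300;
    }
}

// Assumed shape of TARGET_SM53: "at least sm_53", so sm_75 and sm_80
// also get the HFmode patterns.
constexpr bool hf_supported(ptx_isa_sketch isa)
{
  return isa >= ptx_isa_sketch::sm53;
}

static_assert(ptx_sm_macro_value(ptx_isa_sketch::sm53) == 530, "");
static_assert(!hf_supported(ptx_isa_sketch::sm35), "no .f16 before sm_53");
```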

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.  Ok for mainline?


2021-09-16  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx-opts.h (ptx_isa): Add PTX_ISA_SM53,
PTX_ISA_SM75 and PTX_ISA_SM80 ISA levels to enumeration.
* config/nvptx/nvptx.opt: Add sm_53, sm_75 and sm_80 to -misa.
* config/nvptx/nvptx.h (TARGET_SM53, TARGET_SM75, TARGET_SM80):
New helper macros to conditionalize functionality on target ISA.
* config/nvptx/nvptx-c.c (nvptx_cpu_cpp_builtins): Add __PTX_SM__
support for the new ISA levels.

* config/nvptx/nvptx-modes.def: Add support for HFmode.
* config/nvptx/nvptx.c (nvptx_ptx_type_from_mode): Support
new HFmode with the ".f16" suffix/qualifier.
(nvptx_file_start): Add support for TARGET_SM53, TARGET_SM75
and TARGET_SM80.
(nvptx_omp_device_kind_arch_isa): Add support for TARGET_SM53
and tweak TARGET_SM35.
(nvptx_scalar_mode_supported_p): Target hook with conditional
HFmode support on TARGET_SM53 and higher.
(nvptx_libgcc_floating_mode_supported_p): Likewise.
(TARGET_SCALAR_MODE_SUPPORTED_P): Use nvptx_scalar_mode_supported_p.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Likewise, use new hook.

* config/nvptx/nvptx.md (*movhf_insn): New define_insn.
(movhf): New define_expand for HFmode moves.
(addhf3, subhf3, mulhf, extendhf2, trunchf2): New
instructions conditional on TARGET_SM53 (i.e. -misa=sm_53).

gcc/testsuite/ChangeLog
* gcc.target/nvptx/float16-1.c: New test case.

Roger
--

diff --git a/gcc/config/nvptx/nvptx-c.c b/gcc/config/nvptx/nvptx-c.c
index 72594a82e..d51ad00 100644
--- a/gcc/config/nvptx/nvptx-c.c
+++ b/gcc/config/nvptx/nvptx-c.c
@@ -39,7 +39,13 @@ nvptx_cpu_cpp_builtins (void)
 cpp_define (parse_in, "__nvptx_softstack__");
   if (TARGET_UNIFORM_SIMT)
 cpp_define (parse_in,"__nvptx_unisimt__");
-  if (TARGET_SM35)
+  if (TARGET_SM80)
+cpp_define (parse_in, "__PTX_SM__=800");
+  else if (TARGET_SM75)
+cpp_define (parse_in, "__PTX_SM__=750");
+  else if (TARGET_SM53)
+cpp_define (parse_in, "__PTX_SM__=530");
+  else if (TARGET_SM35)
 cpp_define (parse_in, "__PTX_SM__=350");
   else
 cpp_define (parse_in,"__PTX_SM__=300");
diff --git a/gcc/config/nvptx/nvptx-modes.def b/gcc/config/nvptx/nvptx-modes.def
index ff61b36..cc19a26 100644
--- a/gcc/config/nvptx/nvptx-modes.def
+++ b/gcc/config/nvptx/nvptx-modes.def
@@ -1,3 +1,5 @@
+FLOAT_MODE (HF, 2, ieee_half_format);  /* HFmode */
+
 VECTOR_MODE (INT, SI, 2);  /* V2SI */
 
 VECTOR_MODE (INT, DI, 2);  /* V2DI */
diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index bfa926e..2011b51 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -23,7 +23,10 @@
 enum ptx_isa
 {
   PTX_ISA_SM30,
-  PTX_ISA_SM35
+  PTX_ISA_SM35,
+  PTX_ISA_SM53,
+  PTX_ISA_SM75,
+  PTX_ISA_SM80
 };
 
 enum ptx_version
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4e4909e..90d9dc7 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -294,6 +294,8 @@ nvptx_ptx_type_from_mode (machine_mode mode, bool promote)
 case E_DImode:
   return ".u64";
 
+case E_HFmode:
+  return ".f16";
 case E_SFmode:
   return ".f32";
 case E_DFmode:
@@ -5406,7 +5408,13 @@ nvptx_file_start (void)
 fputs ("\t.version\t6.3\n", asm_out_file);
   else
 fputs ("\t.version\t3.1\n", asm_out_file);
-  if (TARGET_SM35)
+  if (TARGET_SM80)
+fputs ("\t.target\tsm_80\n", asm_out_file);
+  else if (TARGET_SM75)
+fputs ("\t.target\tsm_75\n", asm_out_file);
+  else if (TARGET_SM53)
+fputs ("\t.target\tsm_53\n", asm_out_file);
+  else if (TARGET_SM35)
 fputs ("\t.target\tsm_35\n", asm_out_file);
   else
 fputs ("\t.target\tsm_30\n", asm_out_file);
@@ -5717,7 +5725,9 @@ nvptx_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
   if (strcmp (name, "sm_30") == 0)
return !TARGET_SM35;
   if (strcmp (name, "sm_35") == 0)
-   return TARGET_SM35;
+   return TARGET_SM35 && !TARGET_SM53;
+  if (strcmp (name, "sm_53") == 0)
+   

Re: [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza

2021-09-16 Thread Segher Boessenkool
On Wed, Sep 01, 2021 at 11:13:38AM -0500, Bill Schmidt wrote:
> I over-restricted use of __builtin_mffsl, since I was unaware that it
> automatically uses mffs when mffsl is not available.  Paul Clarke pointed
> this out in discussion of his SSE 4.1 compatibility patches.

Right.  Do we need to document this better?  There are more builtins
that can generate code for older archs than you might expect (like,
set_fpscr_rn).

Hrm, it *is* documented, but in a big wall of text.  Not sure we can do
much better though, there simply are this many builtins, but maybe you
have an idea how to arrange things better?

Anyway: okay for trunk.  Thanks!


Segher


Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins

2021-09-16 Thread Segher Boessenkool
On Wed, Sep 01, 2021 at 11:13:39AM -0500, Bill Schmidt wrote:
> This is another patch that looks bigger than it really is.  Because we
> have a new namespace for the builtins, allowing us to have both the old
> and new builtin infrastructure supported at once, we need versions of
> these functions that use the new builtin namespace.  Otherwise the code is
> unchanged.

I'll just blindly approve it, given that Will has slogged through it all
already :-)

>   * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
>   New forward decl.

Changelog margins are at 80 chars, not 72.

Okay for trunk (w/ the fixes from Will's review).  Thanks!


Segher


Re: [PATCH] libstdc++: Fix UB in atomic_ref/wait_notify.cc [PR101761]

2021-09-16 Thread Jonathan Wakely via Gcc-patches
On Thu, 16 Sep 2021, 23:24 Thomas Rodgers, 
wrote:

> From: Thomas Rodgers 
>
> Remove UB in atomic_ref/wait_notify test.
>
> Signed-off-by: Thomas Rodgers 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/101761
> * testsuite/29_atomics/atomic_ref/wait_notify.cc (test): Use
> va and vb as arguments to wait/notify, remove unused bb local.
>
> Tested x86_64-pc-linux-gnu, committed to master.
> Ok to backport to releases/gcc-11?
>


Yes, ok for 11, thanks.



> ---
>  .../testsuite/29_atomics/atomic_ref/wait_notify.cc | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
> index b75e27617f7..b41d1ac0bb7 100644
> --- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
> +++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
> @@ -33,15 +33,14 @@ template
>  if constexpr (std::atomic_ref::is_always_lock_free)
>  {
>S aa{ va };
> -  S bb{ vb };
>std::atomic_ref a{ aa };
> -  a.wait(bb);
> +  a.wait(vb);
>std::thread t([&]
>  {
> - a.store(bb);
> + a.store(vb);
>   a.notify_one();
>  });
> -  a.wait(aa);
> +  a.wait(va);
>t.join();
>  }
>}
> --
> 2.31.1
>
>


Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes

2021-09-16 Thread Segher Boessenkool
Hi!

On Wed, Sep 01, 2021 at 11:13:40AM -0500, Bill Schmidt wrote:
> Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
> __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
> I had been using to automate gimple folding of MMA builtins.  Previously,
> every MMA function that could be folded had an associated internal function
> that it was folded into.  The LXVP/STXVP builtins are just folded directly
> into memory operations.
> 
> Instead of relying on this pattern, this patch adds a new attribute to
> builtins called "mmaint," which is set for all MMA builtins that have an
> associated internal builtin.  The naming convention that adds _INTERNAL to
> the builtin index name remains.
> 
> The rest of the patch is just duplicating Peter's patch, using the new
> builtin infrastructure.

>   * config/rs6000/rs6000-call.c
>   (rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
>   RS6000_BIF_STXVP.

It is fine to end a changelog line in a colon.

> +  else if (fncode == RS6000_BIF_LXVP)
> +{
> +  push_gimplify_context (true);
> +  tree offset = gimple_call_arg (stmt, 0);
> +  tree ptr = gimple_call_arg (stmt, 1);
> +  tree lhs = gimple_call_lhs (stmt);
> +  if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
> + ptr = build1 (VIEW_CONVERT_EXPR,
> +   build_pointer_type (vector_pair_type_node), ptr);
> +  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
> +TREE_TYPE (ptr), ptr, offset));
> +  gimplify_assign (lhs, mem, &new_seq);
> +  pop_gimplify_context (NULL);
> +  gsi_replace_with_seq (gsi, new_seq, true);
> +  return true;
> +}

Fwiw, all those cases return, so those "else" are not needed.  Also it
would be nice if this could be factored a bit better, hrm.

Is that "if" in there useful?  Maybe add a helper function for it, then?

Anyway: okay for trunk.  Thanks!


Segher


libgo patch committed: Update to go1.17.1 release

2021-09-16 Thread Ian Lance Taylor via Gcc-patches
This patch updates libgo to the go1.17.1 release.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
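
One fix pulled in by this update (archive/zip/reader.go, with the new TestCVE202139293) guards an unsigned subtraction. When the directory size recorded in the archive exceeds the archive size, uint64(size)-end.directorySize wraps around to a huge value, so the preallocation check passes. Go's uint64 and C's uint64_t both wrap modulo 2^64, so the arithmetic can be illustrated with a minimal C sketch. The function names are hypothetical; 30 is the minimum per-file header size from the Go comment:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-fix check: if dir_size > size, size - dir_size wraps around and
   the comparison succeeds for an attacker-chosen record count.  */
static bool
prealloc_ok_unguarded (uint64_t size, uint64_t dir_size, uint64_t records)
{
  return (size - dir_size) / 30 >= records;
}

/* Post-fix check: only subtract when the result cannot wrap.  */
static bool
prealloc_ok_guarded (uint64_t size, uint64_t dir_size, uint64_t records)
{
  return dir_size < size && (size - dir_size) / 30 >= records;
}
```

With size = 100 and dir_size = 200, the unguarded version accepts a million records, and the guarded version rejects them.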

Ian
1d62d26192bf7c2f303d993f9a2963a0fd5b475a
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f4816816500..e2abd5fc4b7 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-c11d9f8275f2bbe9b05cdd815c79ac331f78e15c
+850235e4b974b9c5c2d7a1f9860583bd07f2a45c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/MERGE b/libgo/MERGE
index d037d8d06a2..4473f479d5f 100644
--- a/libgo/MERGE
+++ b/libgo/MERGE
@@ -1,4 +1,4 @@
-ec5170397c724a8ae440b2bc529f857c86f0e6b1
+21a4e67ad58e3c4a7c5254f60cda5be5c3c450ff
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
diff --git a/libgo/VERSION b/libgo/VERSION
index efcff2916b0..844393b24b0 100644
--- a/libgo/VERSION
+++ b/libgo/VERSION
@@ -1 +1 @@
-go1.17
+go1.17.1
diff --git a/libgo/go/archive/zip/reader.go b/libgo/go/archive/zip/reader.go
index 2d53f4c7231..c91a8d00e6c 100644
--- a/libgo/go/archive/zip/reader.go
+++ b/libgo/go/archive/zip/reader.go
@@ -102,7 +102,7 @@ func (z *Reader) init(r io.ReaderAt, size int64) error {
// indicate it contains up to 1 << 128 - 1 files. Since each file has a
// header which will be _at least_ 30 bytes we can safely preallocate
// if (data size / 30) >= end.directoryRecords.
-   if (uint64(size)-end.directorySize)/30 >= end.directoryRecords {
+   if end.directorySize < uint64(size) && (uint64(size)-end.directorySize)/30 >= end.directoryRecords {
z.File = make([]*File, 0, end.directoryRecords)
}
z.Comment = end.comment
diff --git a/libgo/go/archive/zip/reader_test.go b/libgo/go/archive/zip/reader_test.go
index 37dafe6c8e7..afb03ace24d 100644
--- a/libgo/go/archive/zip/reader_test.go
+++ b/libgo/go/archive/zip/reader_test.go
@@ -1384,3 +1384,21 @@ func TestCVE202133196(t *testing.T) {
t.Errorf("Archive has unexpected number of files, got %d, want 5", len(r.File))
}
 }
+
+func TestCVE202139293(t *testing.T) {
+   // directory size is so large, that the check in Reader.init
+   // overflows when subtracting from the archive size, causing
+   // the pre-allocation check to be bypassed.
+   data := []byte{
+   0x50, 0x4b, 0x06, 0x06, 0x05, 0x06, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x4b,
+   0x06, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
+   0x00, 0x00, 0x50, 0x4b, 0x05, 0x06, 0x00, 0x1a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50, 0x4b,
+   0x06, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
+   0x00, 0x00, 0x00, 0x50, 0x4b, 0x05, 0x06, 0x00, 0x31, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff,
+   0xff, 0x50, 0xfe, 0x00, 0xff, 0x00, 0x3a, 0x00, 0x00, 0x00, 0xff,
+   }
+   _, err := NewReader(bytes.NewReader(data), int64(len(data)))
+   if err != ErrFormat {
+   t.Fatalf("unexpected error, got: %v, want: %v", err, ErrFormat)
+   }
+}
diff --git a/libgo/go/cmd/go/internal/modload/edit.go b/libgo/go/cmd/go/internal/modload/edit.go
index c350b9d1b5c..47f236ce168 100644
--- a/libgo/go/cmd/go/internal/modload/edit.go
+++ b/libgo/go/cmd/go/internal/modload/edit.go
@@ -190,8 +190,8 @@ func limiterForEdit(ctx context.Context, rs *Requirements, tryUpgrade, mustSelec
 
 // raiseLimitsForUpgrades increases the module versions in maxVersions to the
 // versions that would be needed to allow each of the modules in tryUpgrade
-// (individually) and all of the modules in mustSelect (simultaneously) to be
-// added as roots.
+// (individually or in any combination) and all of the modules in mustSelect
+// (simultaneously) to be added as roots.
 //
 // Versions not present in maxVersion are unrestricted, and it is assumed that
 // they will not be promoted to root requirements (and thus will not contribute
@@ -213,18 +213,42 @@ func raiseLimitsForUpgrades(ctx context.Context, maxVersion map[string]string, d
}
}
 
-   var eagerUpgrades []module.Version
+   var (
+   eagerUpgrades  []module.Version
+   isLazyRootPath map[string]bool
+   )
if depth == eager {
eagerUpgrades = tryUpgrade
} else {
+   isLazyRootPath = make(map[string]bool, len(maxVersion))
+   for p := range maxVersion {
+   isLazyRootPath[p] = true
+   }
for _, m := range tryUpgrade {
+   isLazyRootPath[m.Path] = true
+   }
+   for _, m := range mustSelect {
+   isLazyRootPath[m.Path] = true
+   }
+
+   allowedRoot := map[module.Version]boo

[PATCH] better handle MIN/MAX_EXPR of unrelated objects [PR102200]

2021-09-16 Thread Martin Sebor via Gcc-patches

When computing the size of an object pointed to by the result of
a MIN/MAX_EXPR, the handle_min_max_size() function tries to deal
gracefully with operands that designate distinct objects.  But
the handling fails to consider an edge case where one of
the operands is a PHI, one of whose operands references the same
MIN/MAX_EXPR.  This ultimately results in an attempt to cache two
different object references as the result of the MIN/MAX_EXPR,
which triggers an ICE in the cache consistency checking.

The attached fix avoids the problem by instead caching the SSA_NAME
that's the result of the MIN/MAX_EXPR when its operands might
reference distinct objects, and by enhancing the inform_access()
function to handle this case.  Besides checking for the absence of
the ICE, the two additional tests verify that the right subobject of
the MIN/MAX_EXPR is used under the various combinations
of conditions.
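
The operand choice that inform_access() makes for the note can be illustrated with a minimal C sketch. struct ref_info is a hypothetical, much-simplified stand-in for access_ref, and size_remaining is a plain field here rather than the member function used in the patch. The sketch mirrors the idx expression in the hunk, except that it drops the second test of arg_ref[0].base0, which is already known to be true at that point:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for access_ref, keeping only what the
   operand selection looks at.  */
struct ref_info
{
  bool known;             /* Object identity determined (ref != NULL).  */
  bool base0;             /* Offset known relative to the object start.  */
  size_t size_remaining;  /* Bytes left from the offset to the end.  */
};

/* Return 1 to describe the second operand, 0 for the first: prefer an
   operand that refers to a known object, and between two such operands
   the one with more space remaining.  */
static int
pick_operand (struct ref_info r0, struct ref_info r1)
{
  return (!r0.known || !r0.base0
          || (r1.base0 && r0.size_remaining < r1.size_remaining));
}
```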

Tested on x86_64-linux.

Martin
PR middle-end/102200 - ICE on a min of a decl and pointer in a loop

gcc/ChangeLog:

	PR middle-end/102200
	* pointer-query.cc (access_ref::inform_access): Handle MIN/MAX_EXPR.
	(handle_min_max_size): Change argument.  Store original SSA_NAME for
	operands to potentially distinct (sub)objects.
	(compute_objsize_r): Adjust call to the above.

gcc/testsuite/ChangeLog:

	PR middle-end/102200
	* gcc.dg/Wstringop-overflow-62.c: Adjust text of an expected note.
	* gcc.dg/Warray-bounds-89.c: New test.
	* gcc.dg/Wstringop-overflow-74.c: New test.
	* gcc.dg/Wstringop-overflow-75.c: New test.
	* gcc.dg/Wstringop-overflow-76.c: New test.

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index 4ad28796e57..83b1f0fc866 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -1087,6 +1087,34 @@ access_ref::inform_access (access_mode mode) const
   else if (gimple_nop_p (stmt))
 	/* Handle DECL_PARM below.  */
 	ref = SSA_NAME_VAR (ref);
+  else if (is_gimple_assign (stmt)
+	   && (gimple_assign_rhs_code (stmt) == MIN_EXPR
+		   || gimple_assign_rhs_code (stmt) == MAX_EXPR))
+	{
+	  /* MIN or MAX_EXPR here implies a reference to a known object
+	 and either an unknown or distinct one (the latter being
+	 the result of an invalid relational expression).  Determine
+	 the identity of the former and point to it in the note.
+	 TODO: Consider merging with PHI handling.  */
+	  access_ref arg_ref[2];
+	  tree arg = gimple_assign_rhs1 (stmt);
+	  compute_objsize (arg, /* ostype = */ 1 , &arg_ref[0]);
+	  arg = gimple_assign_rhs2 (stmt);
+	  compute_objsize (arg, /* ostype = */ 1 , &arg_ref[1]);
+
+	  /* Use the argument that references a known object with more
+	 space remaining.  */
+	  const bool idx
+	= (!arg_ref[0].ref || !arg_ref[0].base0
+	   || (arg_ref[0].base0 && arg_ref[1].base0
+		   && (arg_ref[0].size_remaining ()
+		   < arg_ref[1].size_remaining ())));
+
+	  arg_ref[idx].offrng[0] = offrng[0];
+	  arg_ref[idx].offrng[1] = offrng[1];
+	  arg_ref[idx].inform_access (mode);
+	  return;
+	}
 }
 
   if (DECL_P (ref))
@@ -1463,15 +1491,18 @@ pointer_query::dump (FILE *dump_file, bool contents /* = false */)
 }
 
 /* A helper of compute_objsize_r() to determine the size from an assignment
-   statement STMT with the RHS of either MIN_EXPR or MAX_EXPR.  */
+   statement STMT with the RHS of either MIN_EXPR or MAX_EXPR.  On success
+   set PREF->REF to the operand with more or less space remaining,
+   respectively, if both refer to the same (sub)object, or to PTR if they
+   might not, and return true.  Otherwise, if the identity of neither
+   operand can be determined, return false.  */
 
 static bool
-handle_min_max_size (gimple *stmt, int ostype, access_ref *pref,
+handle_min_max_size (tree ptr, int ostype, access_ref *pref,
 		 ssa_name_limit_t &snlim, pointer_query *qry)
 {
-  tree_code code = gimple_assign_rhs_code (stmt);
-
-  tree ptr = gimple_assign_rhs1 (stmt);
+  const gimple *stmt = SSA_NAME_DEF_STMT (ptr);
+  const tree_code code = gimple_assign_rhs_code (stmt);
 
   /* In a valid MAX_/MIN_EXPR both operands must refer to the same array.
  Determine the size/offset of each and use the one with more or less
@@ -1479,7 +1510,8 @@ handle_min_max_size (gimple *stmt, int ostype, access_ref *pref,
  determined from the other instead, adjusted up or down as appropriate
  for the expression.  */
   access_ref aref[2] = { *pref, *pref };
-  if (!compute_objsize_r (ptr, ostype, &aref[0], snlim, qry))
+  tree arg1 = gimple_assign_rhs1 (stmt);
+  if (!compute_objsize_r (arg1, ostype, &aref[0], snlim, qry))
 {
   aref[0].base0 = false;
   aref[0].offrng[0] = aref[0].offrng[1] = 0;
@@ -1487,8 +1519,8 @@ handle_min_max_size (gimple *stmt, int ostype, access_ref *pref,
   aref[0].set_max_size_range ();
 }
 
-  ptr = gimple_assign_rhs2 (stmt);
-  if (!compute_objsize_r (ptr, ostype, &aref[1], snlim, qry))
+  tree arg2 = gimple_assign_rhs2 (stmt);
+  if (!compute_objsize_r (arg2, ostype, &aref[1], snlim, qry))
 {
 

Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-16 Thread Feng Xue OS via Gcc-patches
>On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:
>> This and following patches are composed to enable full devirtualization
>> under whole program assumption (so also called whole-program
>> devirtualization, WPD for short), which is an enhancement to current
>> speculative devirtualization. The base of the optimization is how to
>> identify class types that are local in terms of whole-program scope; at
>> least those class types in libstdc++ must be excluded in some way.
>> Our approach is to use the typeinfo symbol as the identity marker of a
>> class, since it is unique and always generated once the class or a
>> derived type is instantiated somewhere, and to rely on symbol resolution
>> by the lto-linker-plugin to detect whether a typeinfo is referenced by a
>> regular object/library, which indirectly tells whether class types escape.
>> The RFC at https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html
>> gives more details on that.
>>
>> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>>
>> Thanks,
>> Feng
>>
>> 
>> 2021-09-07  Feng Xue  
>>
>> gcc/
>>   * common.opt (-fdevirtualize-fully): New option.
>>   * class.c (build_rtti_vtbl_entries): Force generation of typeinfo
>>   even if -fno-rtti is specified under full devirtualization.
>
>This makes -fno-rtti useless; rather than this, you should warn about
>the combination of flags and force flag_rtti on.  It also sounds like
>you depend on the library not being built with -fno-rtti.

Although RTTI is generated by the front end, we remove it after the LTO
symtab merge, which is meant to keep the same behavior as -fno-rtti.

Yes, a regular library to be linked with should contain RTTI data;
otherwise WPD cannot deduce class type usage safely.  By default, we can
assume this holds for libstdc++, but it is probably a problem for user
libraries, which might be avoided if we properly document this
requirement and advise users to build with RTTI when using WPD.

Thanks
Feng
>
>>   * cgraph.c (cgraph_update_edges_for_call_stmt): Add an assertion
>>   to check node to be traversed.
>>   * cgraphclones.c (cgraph_node::find_replacement): Record
>>   former_clone_of on replacement node.
>>   * cgraphunit.c (symtab_node::needed_p): Always output vtable for
>>   full devirtualization.
>>   (analyze_functions): Force generation of primary vtables for all
>>   base classes.
>>   * ipa-devirt.c (odr_type_d::whole_program_local): New field.
>>   (odr_type_d::has_virtual_base): Likewise.
>>   (odr_type_d::all_derivations_known): Removed.
>>   (odr_type_d::whole_program_local_p): New member function.
>>   (odr_type_d::all_derivations_known_p): Likewise.
>>   (odr_type_d::possibly_instantiated_p): Likewise.
>>   (odr_type_d::set_has_virtual_base): Likewise.
>>   (get_odr_type): Set "whole_program_local" and "has_virtual_base"
>>   when adding a type.
>>   (type_all_derivations_known_p): Replace implementation by a call
>>   to odr_type_d::all_derivations_known_p.
>>   (type_possibly_instantiated_p): Replace implementation by a call
>>   to odr_type_d::possibly_instantiated_p.
>>   (type_known_to_have_no_derivations_p): Replace call to
>>   type_possibly_instantiated_p with call to
>>   odr_type_d::possibly_instantiated_p.
>>   (type_all_ctors_visible_p): Removed.
>>   (type_whole_program_local_p): New function.
>>   (get_type_vtable): Likewise.
>>   (extract_typeinfo_in_vtable): Likewise.
>>   (identify_whole_program_local_types): Likewise.
>>   (dump_odr_type): Dump has_virtual_base and whole_program_local_p()
>>   of type.
>>   (maybe_record_node): Resort to type_whole_program_local_p to
>>   check whether a class has been optimized away.
>>   (record_target_from_binfo): Remove parameter "anonymous", add
>>   a new parameter "possibly_instantiated", and adjust code
>>   accordingly.
>>   (devirt_variable_node_removal_hook): Replace call to
>>   "type_in_anonymous_namespace_p" with "type_whole_program_local_p".
>>   (possible_polymorphic_call_targets): Replace call to
>>   "type_possibly_instantiated_p" with "possibly_instantiated_p",
>>   replace flag check on "all_derivations_known" with call to
>>   "all_derivations_known_p".
>>   * ipa-icf.c (filter_removed_items): Disable folding on vtable
>>   under full devirtualization.
>>   * ipa-polymorphic-call.c (restrict_to_inner_class): Move odr
>>   type check to type_known_to_have_no_derivations_p.
>>   * ipa-utils.h (identify_whole_program_local_types): New
>>   declaration.
>>   (type_all_derivations_known_p): Parameter type adjustment.
>>   * ipa.c (walk_polymorphic_call_targets): Do not mark vcall
>>   targets as reachable for full devirtualization.
>>   (can_remove_vtable_if_no_refs_p): New function.
>>   (symbol_table::remove_unreachable_nodes): Add defined vtables
>>   to reachable list

Re: [PING^2] [PATCH] configure, jit: Allow for 'make check-gcc-jit'.

2021-09-16 Thread Jeff Law via Gcc-patches




On 9/15/2021 1:28 PM, Iain Sandoe wrote:
> Hi folks,
>
>> On 27 Aug 2021, at 14:00, Iain Sandoe  wrote:
>
> +Jeff
>
> (it’s probably borderline obvious - but in the top level Makefile .. so)

OK.  Sorry I didn't look at it before.  I largely ignore things like JIT
these days.


Jeff


RE: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-16 Thread Cui, Lili via Gcc-patches

> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, September 16, 2021 2:28 PM
> To: Cui, Lili 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
> 
> Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
> 
> On Wed, Sep 15, 2021 at 10:10 AM  wrote:
> >
> > From: "H.J. Lu" 
> >
> > Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when
> > handling avx_partial_xmm_update attribute.  Don't convert AVX partial
> > XMM register update if vector packed SSE conversion should be used.
> >
> > gcc/
> >
> > PR target/101900
> > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > Check TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS
> > before generating vxorps.
> >
> > gcc/
> >
> > PR target/101900
> > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > ---
> >  gcc/config/i386/i386-features.c| 21 ++---
> >  gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
> >  gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
> >  gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
> >  4 files changed, 73 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> >
> > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > index 5a99ea7c046..ae5ea02a002 100644
> > --- a/gcc/config/i386/i386-features.c
> > +++ b/gcc/config/i386/i386-features.c
> > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> >   != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > continue;
> >
> > - if (!v4sf_const0)
> > -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > -
> >   /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
> >  SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> >  vec_merge with subreg.  */
> >   rtx src = SET_SRC (set);
> >   rtx dest = SET_DEST (set);
> >   machine_mode dest_mode = GET_MODE (dest);
> > + machine_mode src_mode;
> > +
> > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > +   {
> > + src_mode = GET_MODE (XEXP (src, 0));
> > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > +   continue;
> > +   }
> > +
> > + if (TARGET_USE_VECTOR_CONVERTS)
> > +   {
> > + src_mode = GET_MODE (XEXP (src, 0));
> > + if (src_mode == E_SImode || src_mode == E_DImode)
> > +   continue;
> > +   }
> > +
> > + if (!v4sf_const0)
> > +   v4sf_const0 = gen_reg_rtx (V4SFmode);
> 
> Please better move initialization of src_mode to the top of the new hunk, 
> like:
> 
> machine_mode src_mode = GET_MODE (XEXP (src, 0));
> switch (src_mode)
>   {
>   case E_SFmode:
>   case E_DFmode:
>     if (TARGET_USE_VECTOR_FP_CONVERTS)
>       continue;
>     break;
>   case E_SImode:
>   case E_DImode:
>     if (TARGET_USE_VECTOR_CONVERTS)
>       continue;
>     break;
>   default:
>     break;
>   }
> 
> or something like the above.

Done, thanks for the good advice.  I also rebased patch 4/4, since it is
based on patch 3/4.

Changed it to:

+ machine_mode src_mode = GET_MODE (XEXP (src, 0));
+
+ switch (src_mode)
+   {
+   case E_SFmode:
+   case E_DFmode:
+ if (TARGET_USE_VECTOR_FP_CONVERTS)
+   continue;
+ break;
+   case E_SImode:
+   case E_DImode:
+ if (TARGET_USE_VECTOR_CONVERTS)
+   continue;
+ break;
+   default:
+ break;
+   }
+ if (!v4sf_const0)
+   v4sf_const0 = gen_reg_rtx (V4SFmode);
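
One detail of this restructuring is easy to miss. Inside a switch nested in a loop, continue moves on to the next loop iteration, while break only leaves the switch. The switch form therefore keeps the original "skip this insn" behavior. The following self-contained C sketch illustrates this with hypothetical mode names and two boolean flags standing in for TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the source modes the pass inspects.  */
enum src_kind { KIND_SF, KIND_DF, KIND_SI, KIND_DI, KIND_OTHER };

/* Count how many insns would still be rewritten.  */
static int
count_rewritten (const enum src_kind *kinds, int n,
                 bool vector_fp_converts, bool vector_converts)
{
  int rewritten = 0;
  for (int i = 0; i < n; i++)
    {
      switch (kinds[i])
        {
        case KIND_SF:
        case KIND_DF:
          if (vector_fp_converts)
            continue;   /* Skips the rest of this loop iteration.  */
          break;        /* Only leaves the switch.  */
        case KIND_SI:
        case KIND_DI:
          if (vector_converts)
            continue;
          break;
        default:
          break;
        }
      rewritten++;
    }
  return rewritten;
}
```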

Thanks,
Lili.

> 
> Uros.
> 
> >
> >   rtx zero;
> >   machine_mode dest_vecmode;
> > diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > new file mode 100644
> > index 000..0a45f8e340a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_fp_converts" } */
> > +
> > +extern float f;
> > +extern double d;
> > +extern int i;
> > +
> > +void
> > +foo (void)
> > +{
> > +  d = f;
> > +  f = i;
> > +}
> > +
> > +/* { dg-final { scan-assembler "vcvtps2pd" } } */
> > +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */
> > +/* { dg-final { scan-assembler-not "vcvtss2sd" } } */
> > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> > diff --git

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-16 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 16, 2021 at 8:31 PM Richard Biener  wrote:
>
> On Thu, 16 Sep 2021, Hongtao Liu wrote:
>
> > On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Thu, 16 Sep 2021, liuhongt wrote:
> > >
> > > > Ping
> > > > rebased on latest trunk.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> > > >   * doc/invoke.texi (Options That Control Optimization): Update
> > > >   documents.
> > > >   * opts.c (default_options_table): Enable auto-vectorization at
> > > >   O2 with very-cheap cost model.
> > > >   (finish_options): Use cheap cost model for
> > > >   explicit -ftree{,-loop}-vectorize.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> > > >   * g++.dg/tree-ssa/pr81408.C: Ditto.
> > > >   * g++.dg/warn/Wuninitialized-13.C: Ditto.
> > > >   * gcc.dg/Warray-bounds-51.c: Ditto.
> > > >   * gcc.dg/Warray-parameter-3.c: Ditto.
> > > >   * gcc.dg/Wstringop-overflow-13.c: Ditto.
> > > >   * gcc.dg/Wstringop-overflow-14.c: Ditto.
> > > >   * gcc.dg/Wstringop-overflow-21.c: Ditto.
> > > >   * gcc.dg/Wstringop-overflow-68.c: Ditto.
> > > >   * gcc.dg/gomp/pr46032-2.c: Ditto.
> > > >   * gcc.dg/gomp/pr46032-3.c: Ditto.
> > > >   * gcc.dg/gomp/simd-2.c: Ditto.
> > > >   * gcc.dg/gomp/simd-3.c: Ditto.
> > > >   * gcc.dg/graphite/fuse-1.c: Ditto.
> > > >   * gcc.dg/pr67089-6.c: Ditto.
> > > >   * gcc.dg/pr82929-2.c: Ditto.
> > > >   * gcc.dg/pr82929.c: Ditto.
> > > >   * gcc.dg/store_merging_1.c: Ditto.
> > > >   * gcc.dg/store_merging_11.c: Ditto.
> > > >   * gcc.dg/store_merging_15.c: Ditto.
> > > >   * gcc.dg/store_merging_16.c: Ditto.
> > > >   * gcc.dg/store_merging_19.c: Ditto.
> > > >   * gcc.dg/store_merging_24.c: Ditto.
> > > >   * gcc.dg/store_merging_25.c: Ditto.
> > > >   * gcc.dg/store_merging_28.c: Ditto.
> > > >   * gcc.dg/store_merging_30.c: Ditto.
> > > >   * gcc.dg/store_merging_5.c: Ditto.
> > > >   * gcc.dg/store_merging_7.c: Ditto.
> > > >   * gcc.dg/store_merging_8.c: Ditto.
> > > >   * gcc.dg/strlenopt-85.c: Ditto.
> > > >   * gcc.dg/tree-ssa/dump-6.c: Ditto.
> > > >   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> > > >   * gcc.dg/tree-ssa/pr47059.c: Ditto.
> > > >   * gcc.dg/tree-ssa/pr86017.c: Ditto.
> > > >   * gcc.dg/tree-ssa/pr91482.c: Ditto.
> > > >   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
> > > >   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> > > >   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> > > >   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> > > >   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> > > >   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> > > >   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> > > >   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> > > >   * gcc.dg/uninit-40.c: Ditto.
> > > >   * gcc.dg/unroll-7.c: Ditto.
> > > >   * gcc.misc-tests/help.exp: Ditto.
> > > >   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> > > >   * gcc.target/i386/pr22141.c: Ditto.
> > > >   * gcc.target/i386/pr34012.c: Ditto.
> > > >   * gcc.target/i386/pr49781-1.c: Ditto.
> > > >   * gcc.target/i386/pr95798-1.c: Ditto.
> > > >   * gcc.target/i386/pr95798-2.c: Ditto.
> > > >   * gfortran.dg/pr77498.f: Ditto.
> > > > ---
> > > >  gcc/common.opt |  2 +-
> > > >  gcc/doc/invoke.texi|  8 +---
> > > >  gcc/opts.c | 18 +++---
> > > >  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> > > >  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> > > >  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> > > >  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> > > >  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> > > >  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> > > >  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> > > >  gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
> > > >  gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
> > > >  gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
> > > >  gcc/testsuite/gcc.dg/pr82929.c |  2 +-
> > > >  gcc/testsuite/gcc.dg/store_merging_1.c |  2 +-
> > > >  gcc/testsuite/gcc.dg/store_merging_11.c|  2 +-
> > > >  gcc/testsuite/gcc.dg/store_merging_15.c|  2 +-
> > > >  gcc/testsuite/gcc.dg/stor

[PATCH, Fortran] Use _Float128 rather than __float128 for c_float128 kind

2021-09-16 Thread Sandra Loosemore

On 9/5/21 11:20 PM, Sandra Loosemore wrote:

Unless the aarch64 maintainers think it is a bug that __float128 is not 
supported, I think the right solution here is the one I was thinking of 
previously, to fix the Fortran front end to tie the C_FLOAT128 kind to 
_Float128 rather than __float128, and fix the runtime support and test 
cases accordingly.  Then there should be no need to depend on quadmath.h 
at all.  C_FLOAT128 is a GNU extension and _Float128 is supported on a 
superset of targets that __float128 is, so there should be no issue with 
backward compatibility.


Here's a new patch that does this.  I've tested it on x86_64-linux-gnu, 
powerpc64le-linux-gnu, and aarch64-linux-gnu, and it does fix the 
previously reported failure compiling gfortran.dg/PR100914.c on aarch64.
OK to commit?


-Sandra
commit cc7e47df550485654efa5f523c3be35007125340
Author: Sandra Loosemore 
Date:   Tue Sep 14 19:07:36 2021 -0700

Fortran: Use _Float128 rather than __float128 for c_float128 kind.

The GNU Fortran manual documents that the c_float128 kind corresponds
to __float128, but in fact the implementation uses float128_type_node,
which is _Float128.  Both refer to the 128-bit IEEE/ISO encoding, but
some targets including aarch64 only define _Float128 and not __float128,
and do not provide quadmath.h.  This caused errors in some test cases
referring to __float128.

This patch changes the documentation (including code comments) and
test cases to use _Float128 to match the implementation.
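The change can be illustrated from the C side of an interoperable call. What follows is a minimal sketch, not code from the patch. The function names and the HAVE_FLOAT128 macro are hypothetical, and the sketch assumes that GCC's predefined __FLT128_MANT_DIG__ macro is an adequate test for whether the target provides _Float128. It uses the standard type name directly and needs no quadmath.h.

```c
/* Hypothetical helpers for a Fortran BIND(C) interface; not part of
   libgfortran or of the patch.  HAVE_FLOAT128 is an illustrative macro:
   conservatively, only GCC's predefined __FLT128_MANT_DIG__ is trusted
   here as evidence that _Float128 exists.  */
#if defined (__FLT128_MANT_DIG__) && !defined (__clang__)
#define HAVE_FLOAT128 1
#endif

/* Significand width of _Float128 (113 bits for IEEE binary128), or 0 when
   the target does not provide the type.  No quadmath.h is needed.  */
int f128_mant_dig (void)
{
#ifdef HAVE_FLOAT128
  return __FLT128_MANT_DIG__;
#else
  return 0;
#endif
}

#ifdef HAVE_FLOAT128
/* The matching Fortran dummy arguments would be declared
   REAL(C_FLOAT128), VALUE.  */
_Float128 f128_axpy (_Float128 a, _Float128 x, _Float128 y)
{
  return a * x + y;
}
#endif
```

On a target without the type, only f128_mant_dig is compiled, and it reports 0.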

2021-09-16  Sandra Loosemore  

gcc/fortran/

	* intrinsic.texi (ISO_C_BINDING): Change C_FLOAT128 to correspond
	to _Float128 rather than __float128.
	* iso-c-binding.def (c_float128): Update comments.
	* trans-intrinsic.c (gfc_builtin_decl_for_float_kind): Likewise.
	(build_round_expr): Likewise.
	(gfc_build_intrinsic_lib_fndecls): Likewise.
	* trans-types.h (gfc_real16_is_float128): Likewise.

gcc/testsuite/
	* gfortran.dg/PR100914.c: Do not include quadmath.h.  Use
	_Float128 _Complex instead of __complex128.
	* gfortran.dg/PR100914.f90: Add -Wno-pedantic to suppress error
	about use of _Float128.
	* gfortran.dg/c-interop/typecodes-array-float128-c.c: Use
	_Float128 instead of __float128.
	* gfortran.dg/c-interop/typecodes-sanity-c.c: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-float128-c.c: Likewise.
	* lib/target-supports.exp
	(check_effective_target_fortran_real_c_float128): Update comments.

libgfortran/
	* ISO_Fortran_binding.h: Update comments.
	* runtime/ISO_Fortran_binding.c: Likewise.

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 1aacd33..1b9a89d 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -15193,8 +15193,8 @@ In addition to the integer named constants required by the Fortran 2003
 standard and @code{C_PTRDIFF_T} of TS 29113, GNU Fortran provides as an
 extension named constants for the 128-bit integer types supported by the
 C compiler: @code{C_INT128_T, C_INT_LEAST128_T, C_INT_FAST128_T}.
-Furthermore, if @code{__float128} is supported in C, the named constants
-@code{C_FLOAT128, C_FLOAT128_COMPLEX} are defined.
+Furthermore, if @code{_Float128} is supported in C, the named constants
+@code{C_FLOAT128} and @code{C_FLOAT128_COMPLEX} are defined.
 
 @multitable @columnfractions .15 .35 .35 .35
 @headitem Fortran Type  @tab Named constant @tab C type@tab Extension
@@ -15225,11 +15225,11 @@ Furthermore, if @code{__float128} is supported in C, the named constants
 @item @code{REAL}   @tab @code{C_FLOAT} @tab @code{float}
 @item @code{REAL}   @tab @code{C_DOUBLE}@tab @code{double}
 @item @code{REAL}   @tab @code{C_LONG_DOUBLE}   @tab @code{long double}
-@item @code{REAL}   @tab @code{C_FLOAT128}  @tab @code{__float128}@tab Ext.
+@item @code{REAL}   @tab @code{C_FLOAT128}  @tab @code{_Float128}@tab Ext.
 @item @code{COMPLEX}@tab @code{C_FLOAT_COMPLEX} @tab @code{float _Complex}
 @item @code{COMPLEX}@tab @code{C_DOUBLE_COMPLEX}@tab @code{double _Complex}
 @item @code{COMPLEX}@tab @code{C_LONG_DOUBLE_COMPLEX}@tab @code{long double _Complex}
-@item @code{REAL}   @tab @code{C_FLOAT128_COMPLEX}   @tab @code{__float128 _Complex}  @tab Ext.
+@item @code{COMPLEX}@tab @code{C_FLOAT128_COMPLEX}   @tab @code{_Float128 _Complex}  @tab Ext.
 @item @code{LOGICAL}@tab @code{C_BOOL}  @tab @code{_Bool}
 @item @code{CHARACTER}@tab @code{C_CHAR}@tab @code{char}
 @end multitable
diff --git a/gcc/fortran/iso-c-binding.def b/gcc/fortran/iso-c-binding.def
index e65c750..50256fe 100644
--- a/gcc/fortran/iso-c-binding.def
+++ b/gcc/fortran/iso-c-binding.def
@@ -116,7 +116,7 @@ NAMED_REALCST (ISOCBINDING_LONG_DOUBLE, "c_long_double", \
get_real_kind_from_node (long_double_type_node

Re: [PATCH] better handle MIN/MAX_EXPR of unrelated objects [PR102200]

2021-09-16 Thread Jeff Law via Gcc-patches




On 9/16/2021 6:28 PM, Martin Sebor via Gcc-patches wrote:

When computing the size of an object pointed to by the result of
a MIN/MAX_EXPR, the handle_min_max_size() function tries to deal
gracefully with operands that designate distinct objects.  But
the handling fails to consider an edge case in which one of
the operands is a PHI, one of whose operands references the same
MIN/MAX_EXPR.  This ultimately results in attempting to cache
two different object references as the result of the MIN/MAX_EXPR,
which triggers an ICE in the cache consistency checking.

The attached fix avoids the problem by instead caching the SSA_NAME
that's the result of the MIN/MAX_EXPR when its operands might
reference distinct objects, and by enhancing the inform_access()
function to handle this case.  Besides checking for the absence of the ICE,
the two additional tests verify that the right subobject of
the MIN/MAX_EXPR is used under the various combinations
of conditions.
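For illustration, here is source of the general shape the PR title describes: a MIN of a declaration and a pointer in a loop. This is an assumption-laden sketch, not the PR's testcase. GCC may fold the conditional into a MIN_EXPR, and the loop makes the pointer operand a PHI whose incoming value is the MIN_EXPR's own result. The names lowest and buf are hypothetical.

```c
static char buf[8];

/* Illustrative only: the pointer P is compared against the array BUF on
   every iteration, so P reaches the (possible) MIN_EXPR through a PHI
   that refers back to the MIN's own result.  The relational comparison
   is only defined when P points into BUF.  */
char *lowest (char *p, int n)
{
  for (int i = 0; i < n; i++)
    p = p < buf ? p : buf;
  return p;
}
```

Callers must pass a pointer into buf; for example, lowest (buf + 3, 2) returns buf, while lowest (buf + 3, 0) returns its argument unchanged.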

Tested on x86_64-linux.

Martin

gcc-102200.diff

PR middle-end/102200 - ICE on a min of a decl and pointer in a loop

gcc/ChangeLog:

PR middle-end/102200
* pointer-query.cc (access_ref::inform_access): Handle MIN/MAX_EXPR.
(handle_min_max_size): Change argument.  Store original SSA_NAME for
operands to potentially distinct (sub)objects.
(compute_objsize_r): Adjust call to the above.

gcc/testsuite/ChangeLog:

PR middle-end/102200
* gcc.dg/Wstringop-overflow-62.c: Adjust text of an expected note.
* gcc.dg/Warray-bounds-89.c: New test.
* gcc.dg/Wstringop-overflow-74.c: New test.
* gcc.dg/Wstringop-overflow-75.c: New test.
* gcc.dg/Wstringop-overflow-76.c: New test.

OK.  And just for the record, I was initially concerned that we might be
focused too much on trying to issue an access diagnostic for invalid
code.  But we could have pointers to different subobjects or pointers to
different elements within an array and the like.  So there's value for 
valid code as well.
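Jeff's point about valid code can be made concrete with a hypothetical example; the names pair and clear_first are invented for illustration. Comparing pointers to two members of the same struct is well defined, so the MIN below is valid C. Its operands still designate different subobjects, and an access-size check has to know which one the result refers to.

```c
#include <string.h>

struct pair { char a[4]; char b[8]; };

/* Pointers into two members of the same object may be compared, so this
   MIN is valid C.  Members declared later compare greater, so P is always
   S->A here; the memset is in bounds as long as N <= sizeof S->A.  */
void clear_first (struct pair *s, size_t n)
{
  char *p = s->a < s->b ? s->a : s->b;
  memset (p, 0, n);
}
```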


jeff


[PATCH v2 2/2] rs6000: Fold xxsel to vsel since they have same semantics

2021-09-16 Thread Xionghu Luo via Gcc-patches
Fold xxsel to vsel like xxperm/vperm to avoid duplicate code.

gcc/ChangeLog:

2021-09-17  Xionghu Luo  

* config/rs6000/altivec.md: Add vsx register constraints.
* config/rs6000/vsx.md (vsx_xxsel): Delete.
(vsx_xxsel2): Likewise.
(vsx_xxsel3): Likewise.
(vsx_xxsel4): Likewise.
---
 gcc/config/rs6000/altivec.md  | 60 +++
 gcc/config/rs6000/vsx.md  | 57 --
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 3 files changed, 37 insertions(+), 82 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index a3424e1a458..4b4ca2c5d17 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -684,56 +684,68 @@ (define_insn "*altivec_gev4sf"
   [(set_attr "type" "veccmp")])
 
 (define_insn "altivec_vsel"
-  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
+  [(set (match_operand:VM 0 "register_operand" "=wa,v")
(ior:VM
  (and:VM
-   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
-   (match_operand:VM 1 "altivec_register_operand" "v"))
+   (not:VM (match_operand:VM 3 "register_operand" "wa,v"))
+   (match_operand:VM 1 "register_operand" "wa,v"))
  (and:VM
(match_dup 3)
-   (match_operand:VM 2 "altivec_register_operand" "v"]
+   (match_operand:VM 2 "register_operand" "wa,v"]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "vsel %0,%1,%2,%3"
-  [(set_attr "type" "vecmove")])
+  "@
+   xxsel %x0,%x1,%x2,%x3
+   vsel %0,%1,%2,%3"
+  [(set_attr "type" "vecmove")
+   (set_attr "isa" "")])
 
 (define_insn "altivec_vsel2"
-  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
+  [(set (match_operand:VM 0 "register_operand" "=wa,v")
(ior:VM
  (and:VM
-   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
-   (match_operand:VM 1 "altivec_register_operand" "v"))
+   (not:VM (match_operand:VM 3 "register_operand" "wa,v"))
+   (match_operand:VM 1 "register_operand" "wa,v"))
  (and:VM
-   (match_operand:VM 2 "altivec_register_operand" "v")
+   (match_operand:VM 2 "register_operand" "wa,v")
(match_dup 3]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "vsel %0,%1,%2,%3"
-  [(set_attr "type" "vecmove")])
+  "@
+   xxsel %x0,%x1,%x2,%x3
+   vsel %0,%1,%2,%3"
+  [(set_attr "type" "vecmove")
+   (set_attr "isa" "")])
 
 (define_insn "altivec_vsel3"
-  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
+  [(set (match_operand:VM 0 "register_operand" "=wa,v")
(ior:VM
  (and:VM
-   (match_operand:VM 3 "altivec_register_operand" "v")
-   (match_operand:VM 1 "altivec_register_operand" "v"))
+   (match_operand:VM 3 "register_operand" "wa,v")
+   (match_operand:VM 1 "register_operand" "wa,v"))
  (and:VM
(not:VM (match_dup 3))
-   (match_operand:VM 2 "altivec_register_operand" "v"]
+   (match_operand:VM 2 "register_operand" "wa,v"]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "vsel %0,%2,%1,%3"
-  [(set_attr "type" "vecmove")])
+  "@
+   xxsel %x0,%x2,%x1,%x3
+   vsel %0,%2,%1,%3"
+  [(set_attr "type" "vecmove")
+   (set_attr "isa" "")])
 
 (define_insn "altivec_vsel4"
-  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
+  [(set (match_operand:VM 0 "register_operand" "=wa,v")
(ior:VM
  (and:VM
-   (match_operand:VM 1 "altivec_register_operand" "v")
-   (match_operand:VM 3 "altivec_register_operand" "v"))
+   (match_operand:VM 1 "register_operand" "wa,v")
+   (match_operand:VM 3 "register_operand" "wa,v"))
  (and:VM
(not:VM (match_dup 3))
-   (match_operand:VM 2 "altivec_register_operand" "v"]
+   (match_operand:VM 2 "register_operand" "wa,v"]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "vsel %0,%2,%1,%3"
-  [(set_attr "type" "vecmove")])
+  "@
+   xxsel %x0,%x2,%x1,%x3
+   vsel %0,%2,%1,%3"
+  [(set_attr "type" "vecmove")
+   (set_attr "isa" "")])
 
 ;; Fused multiply add.
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 601eb81e316..1d9a1eaaa54 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2184,63 +2184,6 @@ (define_insn "*vsx_ge__p"
   "xvcmpgep. %x0,%x1,%x2"
   [(set_attr "type" "")])
 
-;; Vector select
-(define_insn "vsx_xxsel"
-  [(set (match_operand:VSX_L 0 "vsx_register_operand" "=,?wa")
-   (ior:VSX_L
- (and:VSX_L
-   (not:VSX_L (match_operand:VSX_L 3 "vsx_register_operand" 
",wa"))
-   (match_operand:VSX_L 1 "vsx_register_operand" ",wa"))
- (and:VSX_L
-   (match_dup 3)
-   (match_operand:VSX_L 2 "vsx_register_operand" ",wa"]
-  "VECTOR_MEM_VSX_P (mode)"
-  "xxsel %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecmove")
-   (set_attr "isa" "")])
-
-(define_insn "vsx_xxs

[PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-09-16 Thread Xionghu Luo via Gcc-patches
These two patches are updated versions of:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html

Changes:
1. Fix alignment error in md files.
2. Replace rtx_equal_p with match_dup.
3. Use register_operand instead of gpc_reg_operand to align with
   vperm/xxperm.
4. Passed regression testing on P8LE.

Xionghu Luo (2):
  rs6000: Fix wrong code generation for vec_sel [PR94613]
  rs6000: Fold xxsel to vsel since they have same semantics

 gcc/config/rs6000/altivec.md  | 84 ++-
 gcc/config/rs6000/rs6000-call.c   | 62 ++
 gcc/config/rs6000/rs6000.c| 19 ++---
 gcc/config/rs6000/vector.md   | 26 +++---
 gcc/config/rs6000/vsx.md  | 25 --
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++
 7 files changed, 193 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c

-- 
2.25.1



[PATCH v2 1/2] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-16 Thread Xionghu Luo via Gcc-patches
The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per-element selection is a
subset of bit-wise selection; with this patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3&op1) | (op3&op2),
(~op3&op1) | (op2&op3),
(op3&op2) | (~op3&op1),
(op2&op3) | (~op3&op1),
(op1&~op3) | (op3&op2),
(op1&~op3) | (op2&op3),
(op3&op2) | (op1&~op3),
(op2&op3) | (op1&~op3),

The latter 4 cases do not follow the canonicalisation rules, and non-canonical
RTL is invalid RTL in the vregs pass.  Secondly, the combine pass will swap
(op1&~op3) to (~op3&op1) as part of commutative canonicalisation, which reduces
them to the first 4 patterns, but it won't swap (op2&op3) | (~op3&op1) to
(~op3&op1) | (op2&op3), so this patch handles it with 4 patterns that differ
in the position of the NOT of op3 and check equality inside them.
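The difference between the two semantics can be modelled in plain C. This sketch treats a vector as an array of four 32-bit elements, which is a simplification, and the function names are hypothetical. sel_bitwise follows the formula op0 := (op1 & ~op3) | (op2 & op3). sel_per_element models the IF_THEN_ELSE (ne mask 0) form that the patch replaces, with its operands renumbered so that op3 is the mask.

```c
#include <stdint.h>

#define NELTS 4

/* Bit-wise select, as vsel computes it: each result bit comes from op2
   where the corresponding mask bit is 1 and from op1 where it is 0.  */
void sel_bitwise (uint32_t *out, const uint32_t *op1, const uint32_t *op2,
                  const uint32_t *op3)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = (op1[i] & ~op3[i]) | (op2[i] & op3[i]);
}

/* Per-element select: a whole element is taken from op2 if its mask
   element is non-zero, otherwise from op1.  */
void sel_per_element (uint32_t *out, const uint32_t *op1, const uint32_t *op2,
                      const uint32_t *op3)
{
  for (int i = 0; i < NELTS; i++)
    out[i] = op3[i] != 0 ? op2[i] : op1[i];
}
```

The two models agree whenever every mask element is all ones or all zeros, as the result of a vector compare is. They differ as soon as a mask element has mixed bits. For example, with an op1 element of 0x11111111, an op2 element of 0xAAAAAAAA and a mask of 0x0000FFFF, the bit-wise result is 0x1111AAAA, while the per-element model yields 0xAAAAAAAA.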

Tested on Power8LE with all tests passing.  Any comments?

gcc/ChangeLog:

2021-09-17  Xionghu Luo  

* config/rs6000/altivec.md (*altivec_vsel): Change to ...
(altivec_vsel): ... this and update define.
(*altivec_vsel_uns): Delete.
(altivec_vsel2): New define_insn.
(altivec_vsel3): Likewise.
(altivec_vsel4): Likewise.
* config/rs6000/rs6000-call.c (altivec_expand_vec_sel_builtin): New.
(altivec_expand_builtin): Call altivec_expand_vec_sel_builtin to expand
vec_sel.
* config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Use bit-wise
selection instead of per element.
* config/rs6000/vector.md:
* config/rs6000/vsx.md (*vsx_xxsel): Change to ...
(vsx_xxsel): ... this and update define.
(*vsx_xxsel_uns): Delete.
(vsx_xxsel2): New define_insn.
(vsx_xxsel3): Likewise.
(vsx_xxsel4): Likewise.

gcc/testsuite/ChangeLog:

2021-09-17  Xionghu Luo  

* gcc.target/powerpc/pr94613.c: New test.
---
 gcc/config/rs6000/altivec.md   | 62 --
 gcc/config/rs6000/rs6000-call.c| 62 ++
 gcc/config/rs6000/rs6000.c | 19 +++
 gcc/config/rs6000/vector.md| 26 +
 gcc/config/rs6000/vsx.md   | 60 -
 gcc/testsuite/gcc.target/powerpc/pr94613.c | 47 
 6 files changed, 221 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 93d237156d5..a3424e1a458 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -683,26 +683,56 @@ (define_insn "*altivec_gev4sf"
   "vcmpgefp %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
   [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-   (if_then_else:VM
-(ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-   (match_operand:VM 4 "zero_constant" ""))
-(match_operand:VM 2 "altivec_register_operand" "v")
-(match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+   (ior:VM
+ (and:VM
+   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+   (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+   (match_dup 3)
+   (match_operand:VM 2 "altivec_register_operand" "v"]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
+  "vsel %0,%1,%2,%3"
   [(set_attr "type" "vecmove")])
 
-(define_insn "*altivec_vsel_uns"
+(define_insn "altivec_vsel2"
   [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-   (if_then_else:VM
-(ne:CCUNS (match_operand:VM 1 "altivec_register_operand" "v")
-  (match_operand:VM 4 "zero_constant" ""))
-(match_operand:VM 2 "altivec_register_operand" "v")
-(match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+   (ior:VM
+ (and:VM
+   (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+   (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+   (match_operand:VM 2 "altivec_register_operand" "v")
+   (match_dup 3]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
+  "vsel %0,%1,%2,%3"
+  [(set_attr "type" "vecmove")])
+
+(define_insn "altivec_vsel3"
+  [(set (match_operand:VM 0 "altivec_register_operand" "=v")
+   (ior:VM
+ (and:VM
+   (match_operand:VM 3 "altivec_register_operand" "v")
+   (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+   (not:VM (match_dup 3))
+   (match_operand:VM 2 "altivec_register_operand" "v"]
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
+  "vsel %0,%2,%1,%3"
+  [(set_attr "type" "vecmove")])
+
+(define_insn "altivec_vsel4"
+ 

Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-16 Thread Xionghu Luo via Gcc-patches




On 2021/9/15 21:11, David Edelsohn wrote:

Hi, Xionghu

Should "altivec_vsel2" .. 3 .. 4 be "*altivec_vsel2", etc.
because they are combiner patterns and never referenced by name?  Only
the first, named pattern is referenced by the builtin code.


Thanks.  I updated the patchset with Segher's review comments; he didn't
mention this, and I'm sorry I forgot to change this part.  I am also not
sure whether "altivec_vsel2" .. 3 .. 4 will be used/generated or
optimized by the expander in the future.  Is there any benefit to adding
"*" to the define_insn patterns?



Other than that question / suggestion, this patch is okay.  Please
coordinate with Bill and his builtin patches.


OK.



Thanks, David

On Wed, Sep 15, 2021 at 3:50 AM Xionghu Luo  wrote:


Ping^3, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote:

Ping^2, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html

On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote:

Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/5/13 18:49, Segher Boessenkool wrote:

Hi!

On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per element selection is a
subset of per bit-wise selection,with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3&op1) | (op3&op2),
(~op3&op1) | (op2&op3),
(op3&op2) | (~op3&op1),
(op2&op3) | (~op3&op1),
(op1&~op3) | (op3&op2),
(op1&~op3) | (op2&op3),
(op3&op2) | (op1&~op3),
(op2&op3) | (op1&~op3),

The combine pass will swap (op1&~op3) to (~op3&op1) as part of commutative
canonicalisation, which could reduce them to the first 4 patterns, but it won't
swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch
handles it with two patterns that differ in the position of the NOT of op3
and check equality inside them.
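The claim that the eight orderings listed above all compute the same value can be checked mechanically. The short C sketch below does this; its function names are hypothetical. Bit-wise operations act on each bit independently, so trying all eight combinations of single-bit inputs covers every input. The orderings differ only in RTL shape, and that shape is what the separate patterns have to match.

```c
#include <stdbool.h>

/* The eight operand orderings of op0 := (op1 & ~op3) | (op2 & op3).  */
static unsigned form (int k, unsigned op1, unsigned op2, unsigned op3)
{
  switch (k)
    {
    case 0: return (~op3 & op1) | (op3 & op2);
    case 1: return (~op3 & op1) | (op2 & op3);
    case 2: return (op3 & op2) | (~op3 & op1);
    case 3: return (op2 & op3) | (~op3 & op1);
    case 4: return (op1 & ~op3) | (op3 & op2);
    case 5: return (op1 & ~op3) | (op2 & op3);
    case 6: return (op3 & op2) | (op1 & ~op3);
    default: return (op2 & op3) | (op1 & ~op3);
    }
}

/* Each bit of the result depends only on the same bit of the inputs, so
   checking every combination of single-bit inputs is exhaustive.  */
bool all_forms_agree (void)
{
  for (unsigned v = 0; v < 8; v++)
    {
      unsigned op1 = v & 1, op2 = (v >> 1) & 1, op3 = (v >> 2) & 1;
      unsigned ref = form (0, op1, op2, op3) & 1;
      for (int k = 1; k < 8; k++)
        if ((form (k, op1, op2, op3) & 1) != ref)
          return false;
    }
  return true;
}
```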


Yup, that latter case does not have canonicalisation rules.  Btw, not
only combine does this canonicalisation: everything should;
non-canonical RTL is invalid RTL (in the instruction stream, you can do
everything in temporary code of course, as long as the RTL isn't
malformed).


-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
 [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-(if_then_else:VM
- (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-(match_operand:VM 4 "zero_constant" ""))
- (match_operand:VM 2 "altivec_register_operand" "v")
- (match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+(ior:VM
+ (and:VM
+  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+  (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+  (match_operand:VM 2 "altivec_register_operand" "v")
+  (match_operand:VM 4 "altivec_register_operand" "v"]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+else
+  return "vsel %0,%1,%2,%3";
+  }
 [(set_attr "type" "vecmove")])


That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
think.  So please write this as two patterns (and keep the expand if
that helps).


I was a bit concerned that writing two patterns for each vsel would mean
a lot of duplicate code: 4 similar patterns in altivec.md and another 4 in
vsx.md would be difficult to maintain.  However, I updated it since you
prefer this way, and, as you pointed out, the xxsel patterns in vsx.md
could be folded into vsel by a later patch.




+(define_insn "altivec_vsel2"


(same here of course).


   ;; Fused multiply add.
diff --git a/gcc/config/rs6000/rs6000-call.c
b/gcc/config/rs6000/rs6000-call.c
index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types
altivec_overloaded_builtins[] = {
   RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
RS6000_BTI_unsigned_V2DI },
 { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
   RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
RS6000_BTI_V2DI },
-  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
+  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,


Are the _uns things still used for anything?  But, let's not change
this until Bill's stuff is in :-)

Why do you want to change this here, btw?  I don't understand.


OK, they are actually the "unsigned type" overloaded builtin functions.
Changing them or not won't cause a functional issue so far, so I will
revert this change in the updated patch.




+  i

Re: [PATCH, Fortran] Revert to non-multilib-specific ISO_Fortran_binding.h

2021-09-16 Thread Gerald Pfeifer
On Tue, 14 Sep 2021, Gerald Pfeifer wrote:
>> And, related, does the following make sense and fixes the issue?
>> 
>> --- a/libgfortran/ISO_Fortran_binding.h
>> +++ b/libgfortran/ISO_Fortran_binding.h
>> @@ -228,5 +228,5 @@ extern int CFI_setpointer (CFI_cdesc_t *, CFI_cdesc_t *,
>> const CFI_index_t []);
>> 
>>  /* This is the 80-bit encoding on x86; Fortran assigns it kind 10.  */
>> -#elif (LDBL_MANT_DIG == 64 \
>> +#elif ((LDBL_MANT_DIG == 64 || LDBL_MANT_DIG == 53) \
>> && LDBL_MIN_EXP == -16381 \
>> && LDBL_MAX_EXP == 16384)
> Yes, with this patch (on top of current trunk) i586-freebsd-* is back
> in bootstrap land. :)
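The proposed condition can be exercised on its own. The sketch below copies the #elif test into a hypothetical helper. It assumes, as the thread implies, that i586-freebsd reports LDBL_MANT_DIG == 53 while keeping the x87 exponent range. If so, the original 64-only test would not recognize its long double.

```c
#include <float.h>

/* Mirrors the condition in the proposed ISO_Fortran_binding.h hunk:
   long double uses the x87 80-bit format (Fortran kind 10), accepting
   either a 64-bit significand or a 53-bit one with the same exponent
   range.  */
int ldbl_is_x87_kind10 (void)
{
#if ((LDBL_MANT_DIG == 64 || LDBL_MANT_DIG == 53) \
     && LDBL_MIN_EXP == -16381 \
     && LDBL_MAX_EXP == 16384)
  return 1;
#else
  return 0;
#endif
}
```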

Neither this (which fixes the bootstrap) nor Sandra's rewrite (which 
does not, but seemed generally liked) has been committed.

Can someone please push the former (and possibly the latter)?

Thank you,
Gerald


Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-16 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 17, 2021 at 5:15 AM Cui, Lili  wrote:
>
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Thursday, September 16, 2021 2:28 PM
> > To: Cui, Lili 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
> > 
> > Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> > USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
> >
> > On Wed, Sep 15, 2021 at 10:10 AM  wrote:
> > >
> > > From: "H.J. Lu" 
> > >
> > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > TARGET_USE_VECTOR_CONVERTS when
> > > handling avx_partial_xmm_update attribute.  Don't convert AVX partial
> > > XMM register update if vector packed SSE conversion should be used.
> > >
> > > gcc/
> > >
> > > PR target/101900
> > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > Check TARGET_USE_VECTOR_FP_CONVERTS and
> > TARGET_USE_VECTOR_CONVERTS
> > > before generating vxorps.
> > >
> > > gcc/
> > >
> > > PR target/101900
> > > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386-features.c| 21 ++---
> > >  gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
> > > gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
> > > gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
> > >  4 files changed, 73 insertions(+), 3 deletions(-)  create mode 100644
> > > gcc/testsuite/gcc.target/i386/pr101900-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> > >
> > > diff --git a/gcc/config/i386/i386-features.c
> > > b/gcc/config/i386/i386-features.c index 5a99ea7c046..ae5ea02a002
> > > 100644
> > > --- a/gcc/config/i386/i386-features.c
> > > +++ b/gcc/config/i386/i386-features.c
> > > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> > >   != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > > continue;
> > >
> > > - if (!v4sf_const0)
> > > -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > -
> > >   /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
> > >  SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> > >  vec_merge with subreg.  */
> > >   rtx src = SET_SRC (set);
> > >   rtx dest = SET_DEST (set);
> > >   machine_mode dest_mode = GET_MODE (dest);
> > > + machine_mode src_mode;
> > > +
> > > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > +   {
> > > + src_mode = GET_MODE (XEXP (src, 0));
> > > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > +   continue;
> > > +   }
> > > +
> > > + if (TARGET_USE_VECTOR_CONVERTS)
> > > +   {
> > > + src_mode = GET_MODE (XEXP (src, 0));
> > > + if (src_mode == E_SImode || src_mode == E_DImode)
> > > +   continue;
> > > +   }
> > > +
> > > + if (!v4sf_const0)
> > > +   v4sf_const0 = gen_reg_rtx (V4SFmode);
> >
> > It would be better to move the initialization of src_mode to the top of
> > the new hunk, like:
> >
> > machine_mode src_mode = GET_MODE (XEXP (src, 0));
> > switch (src_mode)
> >   {
> >   case E_SFmode:
> >   case E_DFmode:
> > if (TARGET_USE_VECTOR_FP_CONVERTS)
> >   continue;
> > break;
> >   case E_SImode:
> >   case E_DImode:
> > if (TARGET_USE_VECTOR_CONVERTS)
> >   continue;
> > break;
> >   default:
> > break;
> > }
> >
> > or something like the above.
>
> Done, thanks for the good advice.  I also rebased patch 4/4, since it
> is based on patch 3/4.

OK.

Thanks,
Uros.
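The control flow of the revised hunk can be sketched as a stand-alone predicate. In this model, the enum and struct are hypothetical stand-ins for GCC's machine modes and for the TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS tuning flags. The sketch only shows which conversions are left alone; it does not show how GCC represents them.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not GCC's real types.  */
enum mode { M_SF, M_DF, M_SI, M_DI, M_OTHER };

struct tune
{
  bool use_vector_fp_converts;   /* models TARGET_USE_VECTOR_FP_CONVERTS */
  bool use_vector_converts;      /* models TARGET_USE_VECTOR_CONVERTS */
};

/* Return true if the partial-XMM-update rewrite should be skipped for a
   conversion whose source operand has mode SRC, following the switch in
   the revised hunk.  */
bool skip_partial_avx_fix (enum mode src, const struct tune *t)
{
  switch (src)
    {
    case M_SF:
    case M_DF:
      return t->use_vector_fp_converts;   /* FP -> FP conversion */
    case M_SI:
    case M_DI:
      return t->use_vector_converts;      /* integer -> FP conversion */
    default:
      return false;
    }
}
```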

>
> Changed it to:
>
> + machine_mode src_mode = GET_MODE (XEXP (src, 0));
> +
> + switch (src_mode)
> +   {
> +   case E_SFmode:
> +   case E_DFmode:
> + if (TARGET_USE_VECTOR_FP_CONVERTS)
> +   continue;
> + break;
> +   case E_SImode:
> +   case E_DImode:
> + if (TARGET_USE_VECTOR_CONVERTS)
> +   continue;
> + break;
> +   default:
> + break;
> +   }
> + if (!v4sf_const0)
> +   v4sf_const0 = gen_reg_rtx (V4SFmode);
>
> Thanks,
> Lili.
>
> >
> > Uros.
> >
> > >
> > >   rtx zero;
> > >   machine_mode dest_vecmode;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > new file mode 100644
> > > index 000..0a45f8e340a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -march=skylake -mfpmath=sse
> > > +-mtune-ctrl=use_vector_fp_converts" } */
> > > +
> > > +extern float f;
> > > +extern double d;
> > > +extern