date:20230818

[PATCH v2] Add clang's invalid-noreturn warning flag

2023-08-18 Thread Julian Waters via Gcc-patches

Please review the second version of a patch to add clang's invalid-noreturn
flag to toggle noreturn warnings. This patch keeps the old behaviour of
always warning on every noreturn violation, but unlike clang also adds an
extra layer of fine tuning by turning invalid-noreturn into a warning with
levels, where level 1 warns about noreturn functions that do return, level
2 warns about noreturn functions that explicitly have return statements,
and level 3, which is the default to match old behaviour, warns for both
instances. Fixed from the first version is a malformed table in invoke.texi.
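
As a rough illustration of the level semantics described above, the following C++ sketch models which kind of violation would be reported at each level. It is not taken from the patch: the names NoreturnViolation and warns_at_level are hypothetical, and treating level 0 as "off" is an assumption. The two violation shapes appear only in comments so that the sketch itself compiles without diagnostics.

```cpp
#include <cassert>

// The two shapes of violation the description distinguishes:
//
//   [[noreturn]] void falls_through(int x) { if (x) abort(); }  // may reach the end
//   [[noreturn]] void explicit_ret()       { return; }          // has a return statement

// Hypothetical names, used only for this illustration.
enum class NoreturnViolation { FallsThrough, ExplicitReturn };

// Models the levels as described in the message: 1 = functions that do
// return, 2 = explicit return statements, 3 = both (the stated default).
inline bool warns_at_level(int level, NoreturnViolation v) {
  assert(level >= 0 && level <= 3);
  switch (level) {
    case 1: return v == NoreturnViolation::FallsThrough;
    case 2: return v == NoreturnViolation::ExplicitReturn;
    case 3: return true;
    default: return false;  // level 0 modelled as "off" (assumption)
  }
}
```

Under this model, a build that only cares about stray return statements would pick level 2 and stay quiet about functions that merely fall off the end.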

Tested on Windows with the MinGW 64 Runtime.

gcc/doc/ChangeLog:

* invoke.texi (-Wno-invalid-noreturn, -Winvalid-noreturn=): Document new
options.

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Use new warning option.

gcc/c-family/ChangeLog:

* c.opt (Winvalid-noreturn, Winvalid-noreturn=): New options.

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_return): Use new warning option.
* gimple-parser.cc (c_finish_gimple_return): Likewise.

gcc/cp/ChangeLog:

* coroutines.cc (finish_co_return_stmt): Use new warning option.
* typeck.cc (check_return_expr): Likewise.

 gcc/c-family/c.opt |  8 
 gcc/c/c-typeck.cc  |  9 ++---
 gcc/c/gimple-parser.cc |  9 ++---
 gcc/cp/coroutines.cc   | 11 +++
 gcc/cp/typeck.cc   |  7 +--
 gcc/doc/invoke.texi| 27 +++
 gcc/tree-cfg.cc|  5 -
 7 files changed, 63 insertions(+), 13 deletions(-)


0001-Add-the-invalid-noreturn-warning-to-match-clang.patch
Description: Binary data

[PATCH] improve error when /usr/include isn't found [PR90835]

2023-08-18 Thread Eric Gallager via Gcc-patches

This is a pretty simple patch that ought to help Darwin users understand
better why their build is failing when they forget to pass the
--with-sysroot= flag to configure.

gcc/ChangeLog:

PR target/90835
* Makefile.in: improve error message when /usr/include is
missing
---
 gcc/Makefile.in | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 97e5450ecb5..535c475dfab 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -55,6 +55,7 @@ MAKEOVERRIDES =
 # ---
 
 build=@build@
+build_os=@build_os@
 host=@host@
 host_noncanonical=@host_noncanonical@
 host_os=@host_os@
@@ -3240,8 +3241,13 @@ stmp-fixinc: gsyslimits.h macro_list fixinc_list \
multi_dir=`echo $${ml} | sed -e 's/^[^;]*;//'`; \
fix_dir=include-fixed$${multi_dir}; \
if ! $(inhibit_libc) && test ! -d ${BUILD_SYSTEM_HEADER_DIR}; then \
- echo The directory that should contain system headers does not 
exist: >&2 ; \
+ echo "The directory (BUILD_SYSTEM_HEADER_DIR) that should contain 
system headers does not exist:" >&2 ; \
  echo "  ${BUILD_SYSTEM_HEADER_DIR}" >&2 ; \
+ case ${build_os} in \
+   darwin*) \
+ echo "(on darwin this usually means you need to pass the 
--with-sysroot flag to configure to point it to where the system headers are 
actually put)" >&2; \
+ ;; \
+ esac; \
  tooldir_sysinc=`echo "${gcc_tooldir}/sys-include" | sed -e :a -e 
"s,[^/]*/\.\.\/,," -e ta`; \
  if test "x${BUILD_SYSTEM_HEADER_DIR}" = "x$${tooldir_sysinc}"; \
  then sleep 1; else exit 1; fi; \
-- 
2.32.0 (Apple Git-132)

Re: [PATCH] Loongarch: Fix plugin header missing install.

2023-08-18 Thread Chenghua Xu

Pushed as r14-3331.

Thanks.
chenglulu writes:

> LGTM!
>
> On 2023/8/16 at 9:48 AM, Guo Jie wrote:
>> gcc/ChangeLog:
>>
>>  * config/loongarch/t-loongarch: Add loongarch-driver.h into
>>  TM_H. Add loongarch-def.h and loongarch-tune.h into
>>  OPTIONS_H_EXTRA.
>>
>> Co-authored-by: Lulu Cheng 
>> ---
>>   gcc/config/loongarch/t-loongarch | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/gcc/config/loongarch/t-loongarch 
>> b/gcc/config/loongarch/t-loongarch
>> index 6d6e3435d59..e73f4f437ef 100644
>> --- a/gcc/config/loongarch/t-loongarch
>> +++ b/gcc/config/loongarch/t-loongarch
>> @@ -16,6 +16,10 @@
>>   # along with GCC; see the file COPYING3.  If not see
>>   # .
>>   +TM_H += $(srcdir)/config/loongarch/loongarch-driver.h
>> +OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h \
>> +   $(srcdir)/config/loongarch/loongarch-tune.h
>> +
>>   # Canonical target triplet from config.gcc
>>   LA_MULTIARCH_TRIPLET = $(patsubst LA_MULTIARCH_TRIPLET=%,%,$\
>>   $(filter LA_MULTIARCH_TRIPLET=%,$(tm_defines)))

Re: [PATCH] RISC-V: Enable pressure-aware scheduling by default.

2023-08-18 Thread Jeff Law via Gcc-patches





On 8/18/23 17:24, Vineet Gupta wrote:
> On 8/18/23 16:08, Jeff Law wrote:
>>> There is some slight regression in code quality for a number of
>>> vector tests where we spill more due to different instructions order.
>>> The ones I looked at were a mix of bad luck and/or brittle tests.
>>> Comparing the size of the generated assembly or the number of vsetvls
>>> for SPECint also didn't show any immediate benefit but that's obviously
>>> not a very fine-grained analysis.
>> Yea.  In fact I wouldn't really expect significant changes other than
>> those key loops in x264.
>
> Care to elaborate a bit more please. I've seen severe reg pressure /
> spills in a bunch of others: cactu, lbm, exchange2. Is there something
> specific to x264 spills ?

The only thing that's particularly interesting about the x264 spills is
they're caused by scheduling.


In simplest terms GCC's scheduler tries to minimize the latency of the 
critical path in a block.  For x264 we've got a loop that we unrolled 8 
times with 8 byte-sized loads per loop iteration.  So 64 byte loads, all
higher priority from a critical path latency standpoint than anything else.


Naturally there's no way we can hold 64 values live as we only have 32 
registers and thus we blow out the register file.


By turning on pressure sensitive scheduling, as register pressure 
approaches the threshold, the scheduler will select a lower priority 
instruction (say computing the difference of two previously loaded 
values) that reduces register pressure.  So it's not critical path 
optimal, but it keeps us from blowing out the register file and
ultimately we get better performance as a result.
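
The selection rule described above can be sketched as a toy ready-list picker in C++. This is only an illustration under simplifying assumptions: Insn, pick_next, and the single integer "live" count are hypothetical, and GCC's actual -fsched-pressure algorithms are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a schedulable instruction.
struct Insn {
  int priority;        // critical-path priority: higher normally issues first
  int pressure_delta;  // change in live registers if issued (+1 load, -1 combine)
};

// Returns the index in `ready` of the instruction to issue next.
// Below the threshold the highest-priority instruction wins; at or above
// it, an instruction that relieves register pressure is preferred.
std::size_t pick_next(const std::vector<Insn>& ready, int live, int threshold) {
  assert(!ready.empty());
  const bool under_pressure = live >= threshold;
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); ++i) {
    const Insn& cand = ready[i];
    const Insn& cur = ready[best];
    bool better;
    if (under_pressure && cand.pressure_delta != cur.pressure_delta)
      better = cand.pressure_delta < cur.pressure_delta;  // prefer relief
    else
      better = cand.priority > cur.priority;              // critical path first
    if (better) best = i;
  }
  return best;
}
```

With a ready list holding a high-priority load and a low-priority subtract of two loaded values, this picker issues the load while registers are plentiful and switches to the subtract once the live count reaches the threshold.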


jeff

Re: [PATCH] RISC-V/testsuite: Add missing conversion tests.

2023-08-18 Thread 钟居哲


I wonder whether this patch fixes the following issues?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108271 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108412 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-19 03:32
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V/testsuite: Add missing conversion tests.
Hi,
 
this patch adds some missing tests for vf[nw]cvt.
 
Regards
Robin
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
---
.../rvv/autovec/conversions/vfncvt-ftoi-run.c | 96 +++
.../autovec/conversions/vfncvt-ftoi-rv32gcv.c |  6 +-
.../autovec/conversions/vfncvt-ftoi-rv64gcv.c |  6 +-
.../conversions/vfncvt-ftoi-template.h|  6 ++
.../autovec/conversions/vfncvt-itof-rv32gcv.c |  1 +
.../autovec/conversions/vfncvt-itof-rv64gcv.c |  4 +-
.../conversions/vfncvt-itof-template.h|  5 +-
.../conversions/vfncvt-itof-zvfh-run.c| 32 +++
.../autovec/conversions/vfwcvt-ftoi-rv32gcv.c |  4 +-
.../autovec/conversions/vfwcvt-ftoi-rv64gcv.c |  4 +-
.../conversions/vfwcvt-ftoi-template.h|  2 +
.../conversions/vfwcvt-ftoi-zvfh-run.c| 32 +++
.../rvv/autovec/conversions/vfwcvt-itof-run.c | 96 +++
.../autovec/conversions/vfwcvt-itof-rv32gcv.c |  4 +-
.../autovec/conversions/vfwcvt-itof-rv64gcv.c |  4 +-
.../conversions/vfwcvt-itof-template.h| 10 +-
.../conversions/vfwcvt-itof-zvfh-run.c| 10 +-
17 files changed, 302 insertions(+), 20 deletions(-)
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
index ce3fcfa9af8..73eda067ba3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
@@ -62,6 +62,38 @@ main ()
   RUN2 (float, uint16_t, 4096)
   RUN2 (float, uint16_t, 5975)
+  RUN (float, int8_t, 3)
+  RUN (float, int8_t, 4)
+  RUN (float, int8_t, 7)
+  RUN (float, int8_t, 99)
+  RUN (float, int8_t, 119)
+  RUN (float, int8_t, 128)
+  RUN (float, int8_t, 256)
+  RUN (float, int8_t, 279)
+  RUN (float, int8_t, 555)
+  RUN (float, int8_t, 1024)
+  RUN (float, int8_t, 1389)
+  RUN (float, int8_t, 2048)
+  RUN (float, int8_t, 3989)
+  RUN (float, int8_t, 4096)
+  RUN (float, int8_t, 5975)
+
+  RUN2 (float, uint8_t, 3)
+  RUN2 (float, uint8_t, 4)
+  RUN2 (float, uint8_t, 7)
+  RUN2 (float, uint8_t, 99)
+  RUN2 (float, uint8_t, 119)
+  RUN2 (float, uint8_t, 128)
+  RUN2 (float, uint8_t, 256)
+  RUN2 (float, uint8_t, 279)
+  RUN2 (float, uint8_t, 555)
+  RUN2 (float, uint8_t, 1024)
+  RUN2 (float, uint8_t, 1389)
+  RUN2 (float, uint8_t, 2048)
+  RUN2 (float, uint8_t, 3989)
+  RUN2 (float, uint8_t, 4096)
+  RUN2 (float, uint8_t, 5975)
+
   RUN (double, int32_t, 3)
   RUN (double, int32_t, 4)
   RUN (double, int32_t, 7)
@@ -93,4 +125,68 @@ main ()
   RUN2 (double, uint32_t, 3989)
   RUN2 (double, uint32_t, 4096)
   RUN2 (double, uint32_t, 5975)
+
+  RUN (double, int16_t, 3)
+  RUN (double, int16_t, 4)
+  RUN (double, int16_t, 7)
+  RUN (double, int16_t, 99)
+  RUN (double, int16_t, 119)
+  RUN (double, int16_t, 128)
+  RUN (double, int16_t, 256)
+  RUN (double, int16_t, 279)
+  RUN (double, int16_t, 555)
+  RUN (double, int16_t, 1024)
+  RUN (double, int16_t, 1389)
+  RUN (double, int16_t, 2048)
+  RUN (double, int16_t, 3989)
+  RUN (double, int16_t, 4096)
+  RUN (double, int16_t, 5975)
+
+  RUN2 (double, uint16_t, 3)
+  RUN2 (double, uint16_t, 4)
+  RUN2 (double, uint16_t, 7)
+  RUN2 (double, uint16_t, 99)
+  RUN2 (double, uint16_t, 119)
+  RUN2 (double, uint16_t, 128)
+  RUN2 (double, uint16_t, 256)
+  RUN2 (double, uint16_t, 279)
+  RUN2 (double,

Re: [PATCH] RISC-V: Enable pressure-aware scheduling by default.

2023-08-18 Thread Vineet Gupta





On 8/18/23 16:08, Jeff Law wrote:
>> There is some slight regression in code quality for a number of
>> vector tests where we spill more due to different instructions order.
>> The ones I looked at were a mix of bad luck and/or brittle tests.
>> Comparing the size of the generated assembly or the number of vsetvls
>> for SPECint also didn't show any immediate benefit but that's obviously
>> not a very fine-grained analysis.
> Yea.  In fact I wouldn't really expect significant changes other than
> those key loops in x264.

Care to elaborate a bit more please. I've seen severe reg pressure /
spills in a bunch of others: cactu, lbm, exchange2. Is there something
specific to x264 spills ?






diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc

index 4737dcd44a1..59848b21162 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -2017,9 +2017,11 @@ static const struct default_options 
riscv_option_optimization_table[] =

    {
  { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },


Nit2: maybe move this 1 line up to keep LEVEL_1 together, at least the 
new ones being added.



  #if TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1
  { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
  { OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
+    /* Enable -fsched-pressure by default when optimizing.  */
  #endif
  { OPT_LEVELS_NONE, 0, NULL, 0 }
    };

Shouldn't the comment move up to before the OPT_fsched_pressure line?


Yep, I had the exact same first thought but then thought it was something
deeper.

Turned out to be Occam's Razor after all :-)

Thx,
-Vineet

Re: [PATCH] RISC-V: Allow immediates 17-31 for vector shift.

2023-08-18 Thread Jeff Law via Gcc-patches





On 8/18/23 13:37, Robin Dapp wrote:

Hi,

this patch adds a missing constraint check in order to be able to
print (and not ICE) vector immediates 17-31 for vector shifts.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.

OK
jeff

Re: [PATCH] RISC-V: Allow immediates 17-31 for vector shift.

2023-08-18 Thread Jeff Law via Gcc-patches





On 8/18/23 13:56, Palmer Dabbelt wrote:
> On Fri, 18 Aug 2023 12:37:06 PDT (-0700), rdapp@gmail.com wrote:
>> Hi,
>>
>> this patch adds a missing constraint check in order to be able to
>> print (and not ICE) vector immediates 17-31 for vector shifts.
>>
>> Regards
>>  Robin
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_print_operand):
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.
>> ---
>>  gcc/config/riscv/riscv.cc                        |  3 ++-
>>  .../riscv/rvv/autovec/binop/shift-immediate.c    | 16
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 49062bef9fc..0f60ffe5f60 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -4954,7 +4954,8 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>
> Looks like the comment at the top of riscv_print_operand() is way out of
> date.  Maybe we should just toss it?

I think it's been out of date on every port I've ever worked on! Folks
start with good intentions, but nobody seems to keep this up-to-date.


So +1 to removing it if someone wants to. I'll even pre-approve such a 
patch.  I'll also pre-approve a patch which brings it up-to-date if 
that's the direction we want to go.


jeff

Re: [PATCH] RISC-V: Enable pressure-aware scheduling by default.

2023-08-18 Thread Jeff Law via Gcc-patches





On 8/18/23 07:57, Robin Dapp wrote:

Hi,

this patch enables pressure-aware scheduling for riscv.  There have been
various requests for it so I figured I'd just go ahead and send
the patch.
Thanks.  Your timing is good, I was pondering nominating a victim to 
pick this up next week ;-)





There is some slight regression in code quality for a number of
vector tests where we spill more due to different instructions order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
Yea.  In fact I wouldn't really expect significant changes other than 
those key loops in x264.




As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.  Any other opinions on that?

Agreed.  I wouldn't be surprised if it largely turns into a set-and-forget.



Regards
  Robin

This patch enables register -fsched-pressure by default and sets
the algorithm to "model".  As with other backends, this helps
reduce unnecessary spills.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add -fsched-pressure.
* config/riscv/riscv.cc (riscv_option_override): Set sched
pressure algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add
-fno-sched-pressure.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.

OK.  Just one nit below.




diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4737dcd44a1..59848b21162 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -2017,9 +2017,11 @@ static const struct default_options 
riscv_option_optimization_table[] =
{
  { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
  #if TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1
  { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
  { OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
+/* Enable -fsched-pressure by default when optimizing.  */
  #endif
  { OPT_LEVELS_NONE, 0, NULL, 0 }
};

Shouldn't the comment move up to before the OPT_fsched_pressure line?


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 49062bef9fc..96c5362d2fd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "cfgloop.h"
  #include "cfgrtl.h"
  #include "sel-sched.h"
+#include "sched-int.h"
  #include "fold-const.h"
  #include "gimple-iterator.h"
  #include "gimple-expr.h"
@@ -7095,6 +7096,10 @@ riscv_option_override (void)
  sorry (
"Current RISC-V GCC cannot support VLEN greater than 4096bit for 'V' 
Extension");
  
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+  param_sched_pressure_algorithm,
+  SCHED_PRESSURE_MODEL);
+
FWIW, I tried both variants of the pressure model.  They seemed roughly 
on-par with each other across specint.


Jeff

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-08-18 Thread Iain Sandoe via Gcc-patches




> On 18 Aug 2023, at 23:59, Joseph Myers  wrote:
> 
> On Fri, 18 Aug 2023, Iain Sandoe via Gcc-patches wrote:
> 
>> There is quite extensive Apple Developer documentation on delivering 
>> packages with co-installed libraries using @rpath (that is the intended 
>> mechanism for delivery since it allows drag-and-drop installation and 
>> moving of built applications).
>> 
>> The revised compiler has libraries already built in a suitable manner 
>> for that distribution model.
>> 
>> I would not propose that we repeated such information - but we could 
>> refer to it?
>> 
>> Generally, I’d prefer we suggested searching for such documentation, 
>> rather than linking to it, since links can expire - does that seem 
>> reasonable?
> 
> I suppose the key thing is to note that, if building a program for 
> distribution, you shouldn't build it to embed 
> /path/to/compiler/install/lib, but instead should do <..., possibly referring to relevant Apple documentation>.

right, exactly - there are special runpath roots like @executable_path and
@loader_path that provide for packages that are fully relocatable (we
actually use some of this to allow GCC runtimes to find their dependent
runtimes without needing an absolute runpath).

OK. We just need to find a suitable place to put this - perhaps in documenting
-nodefaultrpaths (since that’s usually used together with specifying something
different).

thanks,
Iain


> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-08-18 Thread Joseph Myers

On Fri, 18 Aug 2023, Iain Sandoe via Gcc-patches wrote:

> There is quite extensive Apple Developer documentation on delivering 
> packages with co-installed libraries using @rpath (that is the intended 
> mechanism for delivery since it allows drag-and-drop installation and 
> moving of built applications).
> 
> The revised compiler has libraries already built in a suitable manner 
> for that distribution model.
> 
> I would not propose that we repeated such information - but we could 
> refer to it?
> 
> Generally, I’d prefer we suggested searching for such documentation, 
> rather than linking to it, since links can expire - does that seem 
> reasonable?

I suppose the key thing is to note that, if building a program for 
distribution, you shouldn't build it to embed 
/path/to/compiler/install/lib, but instead should do .

-- 
Joseph S. Myers
jos...@codesourcery.com

[committed] libstdc++: Revert pre-C++23 support for 16-bit float types [PR111060]

2023-08-18 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux (--with-arch-32=i686). Pushed to trunk.

-- >8 --

In r14-3304-g1a566fddea212a and r14-3305-g6cf214b4fc97f5 I tried to
enable std::format for 16-bit float types before C++23. This causes
errors for targets where the types are defined but can't actually be
used, e.g. i686 without sse2.

Make the std::numeric_limits and std::formatter specializations for
_Float16 and __bfloat16_t depend on the __STDCPP_FLOAT16_T__ and
__STDCPP_BFLOAT16_T__ macros again, so they're only defined for C++23
when the type is fully supported. This is OK because the main point of
my earlier commits was to add better support for _Float32 and _Float64.
It seems fine for the new 16-bit types to only be supported for C++23,
as they were never present before GCC 13 anyway.
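
As a hedged illustration of what the macro guard means for user code, the sketch below consults std::numeric_limits<_Float16> only when the compiler defines __STDCPP_FLOAT16_T__. The helper name float16_digits is hypothetical and is not part of libstdc++.

```cpp
#include <limits>

// Hypothetical helper: reports the significand width of _Float16 when the
// implementation advertises full C++23 support, and 0 otherwise.  The
// macro is predefined by the compiler and must not be defined by users.
constexpr int float16_digits() {
#ifdef __STDCPP_FLOAT16_T__
  return std::numeric_limits<_Float16>::digits;
#else
  return 0;
#endif
}
```

Guarding on the __STDCPP_FLOAT16_T__ macro rather than on __FLT16_DIG__ mirrors the change in this commit: code written this way never names the specialization on targets where the type is defined but cannot actually be used.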

libstdc++-v3/ChangeLog:

PR target/111060
* include/std/format (formatter): Only define specializations
for 16-bit floating-point types for C++23.
* include/std/limits (numeric_limits): Likewise.
---
 libstdc++-v3/include/std/format | 4 ++--
 libstdc++-v3/include/std/limits | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 648f847ad96..f3d9ae152f9 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2083,7 +2083,7 @@ namespace __format
 };
 #endif
 
-#if defined(__FLT16_DIG__)
+#ifdef __STDCPP_FLOAT16_T__
   // Reuse __formatter_fp::format for _Float16.
   template<__format::__char _CharT>
 struct formatter<_Float16, _CharT>
@@ -2171,7 +2171,7 @@ namespace __format
 };
 #endif
 
-#if defined(__BFLT16_DIG__)
+#ifdef __STDCPP_BFLOAT16_T__
   // Reuse __formatter_fp::format for bfloat16_t.
   template<__format::__char _CharT>
 struct formatter<__gnu_cxx::__bfloat16_t, _CharT>
diff --git a/libstdc++-v3/include/std/limits b/libstdc++-v3/include/std/limits
index 7a59e7520eb..ec0b7a1ca7b 100644
--- a/libstdc++-v3/include/std/limits
+++ b/libstdc++-v3/include/std/limits
@@ -1982,7 +1982,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
= round_to_nearest; \
 }; 
\
 
-#ifdef __FLT16_DIG__
+#ifdef __STDCPP_FLOAT16_T__
 __glibcxx_float_n(16)
 #endif
 #ifdef __FLT32_DIG__
@@ -2002,7 +2002,7 @@ __glibcxx_float_n(128)
 # undef __max_digits10
 #endif
 
-#ifdef __BFLT16_DIG__
+#ifdef __STDCPP_BFLOAT16_T__
   __extension__
   template<>
 struct numeric_limits<__gnu_cxx::__bfloat16_t>
@@ -2079,7 +2079,7 @@ __glibcxx_float_n(128)
   static _GLIBCXX_USE_CONSTEXPR float_round_style round_style
= round_to_nearest;
 };
-#endif
+#endif // __STDCPP_BFLOAT16_T__
 
 #if defined(_GLIBCXX_USE_FLOAT128)
 // We either need Q literal suffixes, or IEEE double.
-- 
2.41.0

[PATCH v7 3/5] OpenMP: Pointers and member mappings

2023-08-18 Thread Julian Brown

This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".

We change the mapping nodes used for a derived-type mapping like this:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

So that the nodes used look like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8))   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data      (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT         tvar (len: 1)
  GOMP_MAP_TO_PSET        tvar%arr (size: 64, etc.)  <--. moved here
  [...]                                                 |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)       ___|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtvar                     (len: 2)
  GOMP_MAP_TO_PSET        dtvar(i)%arrptr
  GOMP_MAP_TO_PSET        dtvar(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtvar(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtvar(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtvar(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtvar(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
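
The resulting order can be modelled with a small C++ sketch. It is not how the patch does it (the patch adjusts the topological sort in gimplify.cc); MapGroup and sink_implicit_groups are hypothetical names, and a stable partition stands in for the real sorting so that only the outcome is shown: explicit array-section groups first, "[implicit]" whole-array groups last, with relative order otherwise preserved.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one group of mapping nodes.
struct MapGroup {
  std::string expr;  // the mapped expression, e.g. "dtarr(j)%arrptr(3:8)"
  bool implicit;     // models the "[implicit]" whole-array-via-pointer flag
};

// Moves implicit groups after all explicit ones while keeping the
// relative order inside each class unchanged.
void sink_implicit_groups(std::vector<MapGroup>& groups) {
  std::stable_partition(groups.begin(), groups.end(),
                        [](const MapGroup& g) { return !g.implicit; });
}
```

Because the explicit section is then processed first, a later implicit mapping of the same array finds part of the data already present, which matches the stated goal of avoiding an unnecessary whole-array copy.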

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  #pragma omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO             dv

  GOMP_MAP_TO             *dv%arr%data
  GOMP_MAP_TO_PSET        dv%arr <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH  dv%arr%data

To accommodate recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.  A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.

This version of the patch is based on the version posted for the og13
branch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/62.html

2023-08-18  Julian Brown  

gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
union.
* trans-openmp.cc (dependency.h): Include.
(gfc_trans_omp_array_section): Adjust mapping node arrangement for
array descriptors.  Use GOMP_MAP_TO_PSET or
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
flag set.
(gfc_symbol_rooted_namelist): New function.
(gfc_trans_omp_clauses): Check subcomponent and subarray/element
accesses elsewhere in the clause list for pointers to derived types or
array descriptors, and adjust or drop mapping nodes appropriately.
Adjust for changes to mapping node arrangement.
(gfc_trans_oacc_executable_directive): Pass code op through.

gcc/
* gimplify.cc (omp_map_clause_descriptor_p): New function.
(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
above function.
(omp_tsort_mapping_groups): Process nodes

[PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-08-18 Thread Julian Brown

This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.
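
As an illustration of the variably-indexed case described above, here is a
minimal C sketch. It is not taken from the patch's testsuite: the struct
layout, the helper name sum_section and the array sizes are assumptions made
for the example, and it has not been checked against the patched compiler.
The two component accesses use run-time indices i and j, so they cannot be
sorted by address at compile time; going by the description above, the
mapping is only expected to be accepted when i and j turn out to be equal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type for illustration: an array of structs, each with a
   pointer component that is mapped together with an array section.  */
struct dt
{
  int *arrptr;
};

/* Hypothetical helper: sums arrptr[4..7] of element J inside a target
   region.  I and J are run-time values, so the compiler cannot order the
   two component accesses by address at compile time.  */
int
sum_section (int i, int j)
{
  struct dt dtarr[4];
  int sum = 0;

  assert (i >= 0 && i < 4 && j >= 0 && j < 4);

  for (int n = 0; n < 4; n++)
    {
      dtarr[n].arrptr = (int *) malloc (8 * sizeof (int));
      assert (dtarr[n].arrptr != NULL);
      for (int k = 0; k < 8; k++)
        dtarr[n].arrptr[k] = n * 10 + k;
    }

  /* Assumption: intended to be accepted only when i == j at run time;
     different indices are the case the new diagnostic is meant to catch.
     Without -fopenmp the pragma is ignored and the loop runs on the host.  */
#pragma omp target map(dtarr[i].arrptr, dtarr[j].arrptr[4:4]) map(tofrom: sum)
  for (int k = 4; k < 8; k++)
    sum += dtarr[j].arrptr[k];

  for (int n = 0; n < 4; n++)
    free (dtarr[n].arrptr);

  return sum;
}
```

Calling sum_section (2, 2) exercises the permitted "same index" variant;
calling it with two different indices is the usage the patch description says
should error out rather than silently produce a wrong result.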

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).

2023-08-18  Julian Brown  

gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
 gcc/gimplify.cc   | 110 ++
 gcc/omp-low.cc|   1 +
 gcc/tree-pretty-print.cc  |   3 +
 include/gomp-constants.h  |   6 +
 libgomp/oacc-mem.c|   6 +-
 libgomp/target.c  |  60 +-
 .../map-arrayofstruct-1.c |  38 ++
 .../map-arrayofstruct-2.c |  58 +
 .../map-arrayofstruct-3.c |  68 +++
 .../libgomp.fortran/map-subarray-5.f90|  54 +
 10 files changed, 377 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index fad4308a0eb4..e682583054b0 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8965,7 +8965,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree 
grp_start, tree grp_end,
 
 static tree
 extract_base_bit_offset (tree base, poly_int64 *bitposp,
-poly_offset_int *poffsetp)
+poly_offset_int *poffsetp,
+bool *variable_offset)
 {
   tree offset;
   poly_int64 bitsize, bitpos;
@@ -8983,10 +8984,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp,
   if (offset && poly_int_tree_p (offset))
 {
   poffset = wi::to_poly_offset (offset);
-  offset = NULL_TREE;
+  *variable_offset = false;
 }
   else
-poffset = 0;
+{
+  poffset = 0;
+  *variable_offset = (offset != NULL_TREE);
+}
 
   if (maybe_ne (bitpos, 0))
 poffset += bits_to_bytes_round_down (bitpos);
@@ -9166,6 +9170,7 @@ omp_get_attachment (omp_mapping_group *grp)
   return error_mark_node;
 
 case GOMP_MAP_STRUCT:
+case GOMP_MAP_STRUCT_UNORD:
 case GOMP_MAP_FORCE_DEVICEPTR:
 case GOMP_MAP_DEVICE_RESIDENT:
 case GOMP_MAP_LINK:
@@ -9271,6 +9276,7 @@ omp_group_last (tree *start_p)
   break;
 
 case GOMP_MAP_STRUCT:
+case

[PATCH v7 5/5] OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc

2023-08-18 Thread Julian Brown

This patch has been separated out from the C++ "declare mapper"
support patch.  It contains just the gimplify.cc rearrangement
work, mostly moving gimplification from gimplify_scan_omp_clauses
to gimplify_adjust_omp_clauses for map clauses.

The motivation for doing this was that we don't know if we need to
instantiate mappers implicitly until the body of an offload region has
been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the
un-gimplified form of clauses to sort by base-pointer dependencies after
mapper instantiation has taken place.

The patch also reimplements the "present" clause sorting code to avoid
another sorting pass on mapping nodes.
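
The four-way split performed by omp_segregate_mapping_groups can be sketched
outside of GCC as a stable partition of a singly-linked list using tail
pointers. The following is a simplified stand-alone C model, not the GCC
code: the enum values, struct group and the function name segregate are
stand-ins invented for this example, and only the list-splicing idea is
carried over from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the GOMP_MAP_* kinds; not the real enum.  */
enum kind
{
  K_TO, K_FROM, K_ALLOC, K_RELEASE, K_DELETE,
  K_PRESENT_ALLOC, K_PRESENT_TO, K_PRESENT_FROM
};

/* Stand-in for omp_mapping_group: just a kind, an id and a link.  */
struct group
{
  enum kind kind;
  int id;
  struct group *next;
};

/* Split INLIST into four sub-lists and concatenate them in the order
   "present" to/from, "present" alloc, other to/from,
   alloc/release/delete.  Each sub-list keeps its original order.  */
struct group *
segregate (struct group *inlist)
{
  struct group *ard = NULL, *tf = NULL, *pa = NULL, *ptf = NULL;
  struct group **ard_tail = &ard, **tf_tail = &tf;
  struct group **pa_tail = &pa, **ptf_tail = &ptf;

  for (struct group *w = inlist; w;)
    {
      struct group *next = w->next;
      struct group ***tail;   /* which tail pointer this node extends */

      switch (w->kind)
        {
        case K_ALLOC:
        case K_RELEASE:
        case K_DELETE:
          tail = &ard_tail;
          break;
        case K_PRESENT_ALLOC:
          tail = &pa_tail;
          break;
        case K_PRESENT_TO:
        case K_PRESENT_FROM:
          tail = &ptf_tail;
          break;
        default:
          tail = &tf_tail;
          break;
        }

      /* Append W to the chosen sub-list and advance that list's tail.  */
      **tail = w;
      w->next = NULL;
      *tail = &w->next;
      w = next;
    }

  /* Every tail still points at a NULL link before splicing.  */
  assert (*ard_tail == NULL && *tf_tail == NULL
          && *pa_tail == NULL && *ptf_tail == NULL);

  /* Splice back to front, so that an empty sub-list simply forwards to
     whatever follows it.  */
  *tf_tail = ard;
  *pa_tail = tf;
  *ptf_tail = pa;

  return ptf;
}
```

Splicing from the last sub-list towards the first means no special case is
needed for empty sub-lists: a tail pointer of an empty list still refers to
the list head variable, so assigning through it just makes that head alias
the next non-empty list.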

This version of the patch is based on the version posted for og13, and
additionally incorporates a follow-on fix for DECL_VALUE_EXPR handling
in gimplify_adjust_omp_clauses:

"OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/63.html

Parts of:
"OpenMP: OpenMP 5.2 semantics for pointers with unmapped target"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623351.html

2023-08-18  Julian Brown  

gcc/
* gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups.
(gimplify_scan_omp_clauses): Use mapping group functionality to
iterate through mapping nodes.  Remove most gimplification of
OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables
splay tree.
(gimplify_adjust_omp_clauses): Move most gimplification of
OMP_CLAUSE_MAP nodes here.

gcc/testsuite/
* gfortran.dg/gomp/map-12.f90: Adjust scan output.
---
 gcc/gimplify.cc   | 667 +-
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |   2 +-
 2 files changed, 386 insertions(+), 283 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index e682583054b0..1e32ad48b844 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -9804,10 +9804,15 @@ omp_tsort_mapping_groups (vec 
*groups,
   return outlist;
 }
 
-/* Split INLIST into two parts, moving groups corresponding to
-   ALLOC/RELEASE/DELETE mappings to one list, and other mappings to another.
-   The former list is then appended to the latter.  Each sub-list retains the
-   order of the original list.
+/* Split INLIST into four parts:
+
+ - "present" to/from groups
+ - "present" alloc groups
+ - other to/from groups
+ - other alloc/release/delete groups
+
+   These sub-lists are then concatenated together to form the final list.
+   Each sub-list retains the order of the original list.
Note that ATTACH nodes are later moved to the end of the list in
gimplify_adjust_omp_clauses, for target regions.  */
 
@@ -9815,7 +9820,9 @@ static omp_mapping_group *
 omp_segregate_mapping_groups (omp_mapping_group *inlist)
 {
   omp_mapping_group *ard_groups = NULL, *tf_groups = NULL;
+  omp_mapping_group *pa_groups = NULL, *ptf_groups = NULL;
   omp_mapping_group **ard_tail = &ard_groups, **tf_tail = &tf_groups;
+  omp_mapping_group **pa_tail = &pa_groups, **ptf_tail = &ptf_groups;
 
   for (omp_mapping_group *w = inlist; w;)
 {
@@ -9834,6 +9841,20 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
  ard_tail = &w->next;
  break;
 
+   case GOMP_MAP_PRESENT_ALLOC:
+ *pa_tail = w;
+ w->next = NULL;
+ pa_tail = &w->next;
+ break;
+
+   case GOMP_MAP_PRESENT_FROM:
+   case GOMP_MAP_PRESENT_TO:
+   case GOMP_MAP_PRESENT_TOFROM:
+ *ptf_tail = w;
+ w->next = NULL;
+ ptf_tail = &w->next;
+ break;
+
default:
  *tf_tail = w;
  w->next = NULL;
@@ -9845,8 +9866,10 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
 
   /* Now splice the lists together...  */
   *tf_tail = ard_groups;
+  *pa_tail = tf_groups;
+  *ptf_tail = pa_groups;
 
-  return tf_groups;
+  return ptf_groups;
 }
 
 /* Given a list LIST_P containing groups of mappings given by GROUPS, reorder
@@ -11698,119 +11721,30 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
break;
   }
 
-  if (code == OMP_TARGET
-  || code == OMP_TARGET_DATA
-  || code == OMP_TARGET_ENTER_DATA
-  || code == OMP_TARGET_EXIT_DATA)
-{
-  vec *groups;
-  groups = omp_gather_mapping_groups (list_p);
-  if (groups)
-   {
- hash_map *grpmap;
- grpmap = omp_index_mapping_groups (groups);
+  vec *groups = omp_gather_mapping_groups (list_p);
+  hash_map *grpmap = NULL;
+  unsigned grpnum = 0;
+  tree *grp_start_p = NULL, grp_end = NULL_TREE;
 
- omp_resolve_clause_dependencies (code, groups, grpmap);
- omp_build_struct_sibling_lists (code, region_type, groups, &grpmap,
- list_p);
-
- omp_mapping_group *outlist = NULL;
- bool enter_exit = (code == OMP_TARGET_ENTER_DATA
-|| code == OMP_TARGET_EXIT_DATA);
-
- /* Topological

[PATCH v7 1/5] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

2023-08-18 Thread Julian Brown

This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clauses and finish_omp_clauses, in preparation for the
following patch (to clarify the diff a little).

2023-08-18  Julian Brown  

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
---
 gcc/c/c-typeck.cc   | 615 +-
 gcc/cp/semantics.cc | 786 ++--
 2 files changed, 706 insertions(+), 695 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6f2fff51683a..f29dbfe6526e 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15346,321 +15346,326 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_TO:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE__CACHE_:
- t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST)
-   {
- grp_start_p = pc;
- grp_sentinel = OMP_CLAUSE_CHAIN (c);
+ {
+   t = OMP_CLAUSE_DECL (c);
+   if (TREE_CODE (t) == TREE_LIST)
+ {
+   grp_start_p = pc;
+   grp_sentinel = OMP_CLAUSE_CHAIN (c);
 
- if (handle_omp_array_sections (c, ort))
-   remove = true;
- else
-   {
- t = OMP_CLAUSE_DECL (c);
- if (!omp_mappable_type (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "array section does not have mappable type "
-   "in %qs clause",
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- else if (TYPE_ATOMIC (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "%<_Atomic%> %qE in %qs clause", t,
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- while (TREE_CODE (t) == ARRAY_REF)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   {
- do
-   {
- t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == MEM_REF
- || INDIRECT_REF_P (t))
-   {
- t = TREE_OPERAND (t, 0);
- STRIP_NOPS (t);
- if (TREE_CODE (t) == POINTER_PLUS_EXPR)
-   t = TREE_OPERAND (t, 0);
-   }
-   }
- while (TREE_CODE (t) == COMPONENT_REF
-|| TREE_CODE (t) == ARRAY_REF);
-
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
- && OMP_CLAUSE_MAP_IMPLICIT (c)
- && (bitmap_bit_p (&map_head, DECL_UID (t))
- || bitmap_bit_p (&map_field_head, DECL_UID (t))
- || bitmap_bit_p (&map_firstprivate_head,
-  DECL_UID (t))))
-   {
- remove = true;
- break;
-   }
- if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
-   break;
- if (bitmap_bit_p (&map_head, DECL_UID (t)))
-   {
- if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in motion "
- "clauses", t);
- else if (ort == C_ORT_ACC)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in data "
- "clauses", t);
- else
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in map "
- "clauses", t);
- remove = true;
-   }
- else
-   {
- bitmap_set_bit (&map_head, DECL_UID (t));
- bitmap_set_bit (&map_field_head, DECL_UID (t));
-   }
-   }
-   }
- if

[PATCH v7 0/5] OpenMP/OpenACC: map clause and OMP gimplify rework

2023-08-18 Thread Julian Brown

This series comprises the first few patches in support of several OpenMP
5.0 features: "lvalue" parsing for map/to/from clauses, "declare mapper"
support and array shaping/strided "target update" support.  These patches
provide the critical infrastructure changes needed to implement those
features.

Though labelled "v7", there are fewer patches here than in the
previously-posted "v6" series for mainline:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

Rather, this is more similar to the series posted for og13:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622213.html

Though rearranged somewhat and without several of the later
OpenACC-specific patches.

Further comments on individual patches.  Tested with offloading to NVPTX
and bootstrapped.  OK for mainline?

Julian Brown (5):
  OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
{c_}finish_omp_clause
  OpenMP/OpenACC: Rework clause expansion and nested struct handling
  OpenMP: Pointers and member mappings
  OpenMP/OpenACC: Unordered/non-constant component offset runtime
diagnostic
  OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc

 gcc/c-family/c-common.h   |   74 +-
 gcc/c-family/c-omp.cc |  834 -
 gcc/c/c-parser.cc |   17 +-
 gcc/c/c-typeck.cc |  745 ++--
 gcc/cp/parser.cc  |   17 +-
 gcc/cp/pt.cc  |4 +-
 gcc/cp/semantics.cc   | 1035 +++---
 gcc/fortran/dependency.cc |  128 +
 gcc/fortran/dependency.h  |1 +
 gcc/fortran/gfortran.h|1 +
 gcc/fortran/trans-openmp.cc   |  306 +-
 gcc/gimplify.cc   | 1936 +++---
 gcc/omp-general.cc|  425 +++
 gcc/omp-general.h |   69 +
 gcc/omp-low.cc|8 +-
 gcc/testsuite/c-c++-common/gomp/clauses-2.c   |2 +-
 gcc/testsuite/c-c++-common/gomp/target-50.c   |2 +-
 .../c-c++-common/gomp/target-enter-data-1.c   |3 +-
 .../c-c++-common/gomp/target-implicit-map-2.c |3 +-
 .../g++.dg/gomp/static-component-1.C  |   23 +
 gcc/testsuite/gcc.dg/gomp/target-3.c  |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-9.f90  |2 +-
 .../gfortran.dg/gomp/map-subarray-2.f90   |   57 +
 .../gfortran.dg/gomp/map-subarray.f90 |   40 +
 gcc/tree-pretty-print.cc  |3 +
 gcc/tree.h|8 +
 include/gomp-constants.h  |6 +
 libgomp/oacc-mem.c|6 +-
 libgomp/target.c  |   98 +-
 libgomp/testsuite/libgomp.c++/baseptrs-3.C|  275 ++
 libgomp/testsuite/libgomp.c++/baseptrs-4.C| 3154 +
 libgomp/testsuite/libgomp.c++/baseptrs-5.C|   62 +
 libgomp/testsuite/libgomp.c++/class-array-1.C |   59 +
 libgomp/testsuite/libgomp.c++/target-48.C |   32 +
 libgomp/testsuite/libgomp.c++/target-49.C |   37 +
 .../libgomp.c++/target-exit-data-reftoptr-1.C |   34 +
 .../testsuite/libgomp.c++/target-lambda-1.C   |5 +-
 libgomp/testsuite/libgomp.c++/target-this-3.C |   11 +-
 libgomp/testsuite/libgomp.c++/target-this-4.C |   11 +-
 .../libgomp.c-c++-common/baseptrs-1.c |   50 +
 .../libgomp.c-c++-common/baseptrs-2.c |   70 +
 .../map-arrayofstruct-1.c |   38 +
 .../map-arrayofstruct-2.c |   58 +
 .../map-arrayofstruct-3.c |   68 +
 .../libgomp.c-c++-common/ptr-attach-2.c   |   60 +
 .../target-implicit-map-2.c   |2 +
 .../target-implicit-map-5.c   |   50 +
 .../libgomp.c-c++-common/target-map-zlas-1.c  |   36 +
 .../libgomp.fortran/map-subarray-2.f90|  108 +
 .../libgomp.fortran/map-subarray-3.f90|   62 +
 .../libgomp.fortran/map-subarray-4.f90|   35 +
 .../libgomp.fortran/map-subarray-5.f90|   54 +
 .../libgomp.fortran/map-subarray-6.f90|   26 +
 .../libgomp.fortran/map-subarray-7.f90|   29 +
 .../libgomp.fortran/map-subarray-8.f90|   47 +
 .../libgomp.fortran/map-subarray.f90  |   33 +
 .../libgomp.fortran/map-subcomponents.f90 |   32 +
 .../libgomp.fortran/struct-elem-map-1.f90 |  180 +
 59 files changed, 9039 insertions(+), 1536 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/static-component-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/map-subarray-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/map-subarray.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/baseptrs-3.C
 create mode 100644 libgomp/testsuite/libgomp.c++/baseptrs-4.C
 create mode 100644 libgomp/testsuite/libgomp.c++/baseptrs-5.C
 create mode 100644

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-08-18 Thread Iain Sandoe via Gcc-patches

Hi Joseph,

> On 18 Aug 2023, at 21:17, Joseph Myers  wrote:
> 
> On Tue, 15 Aug 2023, FX Coudert via Gcc-patches wrote:
> 
>> I am currently retesting the patches on various archs (Linux and Darwin) 
>> after a final rebase, but various previous versions were 
>> regression-tested, and have been shipped for a long time in Homebrew.
>> 
>> OK to commit?
> 
> The driver changes are OK.
> 
> I think the new configure options and the new -nodefaultrpaths compiler 
> option need documenting (I suppose there might be a case for the configure 
> option defined in libtool code being documented somewhere in libtool, if 
> there's somewhere appropriate, but I don't see that in the libtool patch 
> submission).
> 
> The help text for --enable-darwin-at-rpath refers to it as 
> --enable-darwin-at-path.

OK, we need to fix those things.

> Somewhere the documentation ought to discuss the considerations around 
> embedding such paths in binaries, and what's appropriate for building a 
> binary on the system where it's going to be used, versus using the 
> compiler to build redistributed binaries that will be run on a system that 
> doesn't have the compiler installed (and so shouldn't have hardcoded paths 
> that were only applicable on the system with the compiler, but will need 
> to be able to find shared libraries - probably shipped with the binary - 
> somehow).

Actually, the changes are not so dramatic as they seem (for Darwin) since we
already have less flexibility than other unix-like systems.

The status quo is that installed libraries embed their runpath e.g:
/path/to/compiler/install/lib/libgfortran.dylib

When executables are built, they embed the full library names for dependent
libraries (this "two-level" namespacing has some useful and not-so-useful
sides to it).

However, Darwin compiler installations are not relocatable without rewriting
the library names and, because of the vendor’s decision to neuter
DYLD_LIBRARY_PATH (the Darwin equivalent of LD_LIBRARY_PATH), such a move is
quite involved.

===

After the change:
libraries are identified as  @rpath/libgfortran.dylib (for example)

and any executable built by the compiler has a runpath 
/path/to/compiler/install/lib.

Actually, after this change the compiler is initially relocatable (that is you
can choose to install it anywhere) but once installed, if you move it, that
would break executables already built because they embed the path to the
installation.

So, in practice, for the “out of the tin” self-use of GCC, there is no
practical difference between the existing fixed installation and a “placed
once” installation.

[however, the change means that we can correctly configure the compiler, since
 the runpaths at build time can be made to point to the uninstalled libraries,
 which is the underlying problem we are solving].

— on the topic of building applications for distribution:

The expectation for Darwin platforms is that dependent libraries are shipped
along with applications (it is not desirable to require that end users have to
have elevated privs to install them in some Well Known Place, and [other than
OSS distributions like macports and homebrew] there is no common place to
expect to find OSS libraries).

There is quite extensive Apple Developer documentation on delivering packages
with co-installed libraries using @rpath (that is the intended mechanism for
delivery since it allows drag-and-drop installation and moving of built
applications).

The revised compiler has libraries already built in a suitable manner for that
distribution model.

I would not propose that we repeat such information - but we could refer to
it?

Generally, I’d prefer we suggested searching for such documentation, rather
than linking to it, since links can expire - does that seem reasonable?

thanks for the reviews
Iain

> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-08-18 Thread Thiago Jung Bauermann via Gcc-patches



Hello Tobias,

Tobias Burnus  writes:

> Hello Thiago,
>
> the patch looks good to me. Thanks! Can you commit the patch yourself or
> do you need someone to do this for you?

Thank you! I don't have commit access, so I would need someone to do
this for me.

> On 15.08.23 18:17, Thiago Jung Bauermann via Gcc-patches wrote:
>> Thiago Jung Bauermann  writes:
>>
>>> Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
>>> changed the compiler error message in this testcase from
>>>
>>> : In instantiation of 'void foo() [with T = int]':
>>> :14:11:   required from here
>>> :8:22: error: 'int' is not a class, struct, or union type
>>> :8:22: error: 'int' is not a class, struct, or union type
>>> :8:22: error: 'int' is not a class, struct, or union type
>>> :8:3: error: expected iteration declaration or initialization
>>> compiler exited with status 1
>>>
>>> to:
>>>
>>> : In instantiation of 'void foo() [with T = int]':
>>> :14:11:   required from here
>>> :8:22: error: 'int' is not a class, struct, or union type
>>> :8:3: error: invalid type for iteration variable 'i'
>>> compiler exited with status 1
>>> Excess errors:
>>> :8:3: error: invalid type for iteration variable 'i'
>>>
>>> Andrew Pinski analysed the issue in PR 110756 and considered that it was a
>>> testsuite issue in that the error message changed slightly.  Also, it's a
>>> better error message.
>>>
>>> Therefore, we only need to adjust the testcase to expect the new message.
>>>
>>> gcc/testsuite/ChangeLog:
>>>  PR testsuite/110756
>>>  g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
>>> ---
>>>   gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C 
>>> b/gcc/testsuite/g++.dg/gomp/pr58567.C
>>> index 35a5bb027ffe..866d831c65e4 100644
>>> --- a/gcc/testsuite/g++.dg/gomp/pr58567.C
>>> +++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
>>> @@ -5,7 +5,7 @@
>>>   template<typename T> void foo()
>>>   {
>>> #pragma omp parallel for
>>> -  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
>>> class, struct, or union type|expected iteration declaration or 
>>> initialization" } */
>>> +  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
>>> class, struct, or union type|invalid type for iteration variable 'i'" } */
>>>   ;
>>>   }
>>>
>> Ping? I just tested trunk. It still fails this test, and this patch
>> still fixes the failures.
> Tobias


-- 
Thiago

Re: [PATCH] testsuite: Improve test in dg-require-python-h

2023-08-18 Thread Thiago Jung Bauermann via Gcc-patches



Eric Feng  writes:

> Thanks for the patch, Thiago. I've pushed it to trunk:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6785917c9103e18bba0d718ac3b65a386d9a14f7.

Thank you, Eric and Dave.

> On Fri, Aug 18, 2023 at 2:11 PM David Malcolm  wrote:
>>
>> On Thu, 2023-08-17 at 23:30 -0300, Thiago Jung Bauermann wrote:
>> > If GCC is tested with a sysroot which doesn't contain a Python
>> > installation (e.g., with a command such as
>> > "make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but
>> > there's
>> > a python3-config in $PATH, then the testsuite will pick up the host's
>> > Python.h which can't actually be used:
>> >
>> > Executing on host: python3-config --includes(timeout = 300)
>> > spawn -ignore SIGHUP python3-config --includes
>> > -I/usr/include/python3.10 -I/usr/include/python3.10
>> > Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --
>> > sysroot=/some/sysroot/libc -Wl,-dynamic-
>> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
>> > rpath=/some/sysroot/libc/lib
>> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
>> > 2.c-fdiagnostics-plain-output  -
>> > fplugin=./analyzer_cpython_plugin.so -fanalyzer -
>> > I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-
>> > plugin-test-2.s(timeout = 600)
>> > spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc
>> > --sysroot=/some/sysroot/libc -Wl,-dynamic-
>> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
>> > rpath=/some/sysroot/libc/lib
>> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>> > -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -
>> > fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o
>> > cpython-plugin-test-2.s
>> > In file included from /usr/include/python3.10/Python.h:8,
>> >  from
>> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
>> > 2.c:8:
>> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
>> > gnu/python3.10/pyconfig.h: No such file or directory
>> > compilation terminated.
>> > compiler exited with status 1
>> >
>> > This problem causes these testsuite failures:
>> >
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
>> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
>> > fplugin=./analyzer_cpython_plugin.so (test for excess errors)
>> > Excess errors:
>> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
>> > gnu/python3.10/pyconfig.h: No such file or directory
>> > compilation terminated.
>> >
>> > So try to compile a test file so that the testcase can be marked as
>> > unsupported instead.
>> >
>> > gcc/testsuite/ChangeLog:
>> > * gcc/testsuite/lib/target-supports.exp (dg-require-python-
>> > h): Test
>> > whether Python.h can really be used.
>> > ---
>> >  gcc/testsuite/lib/target-supports.exp | 14 --
>> >  1 file changed, 12 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/gcc/testsuite/lib/target-supports.exp
>> > b/gcc/testsuite/lib/target-supports.exp
>> > index 92b6f69730e9..5b5f86551844 100644
>> > --- a/gcc/testsuite/lib/target-supports.exp
>> > +++ b/gcc/testsuite/lib/target-supports.exp
>> > @@ -12570,11 +12570,21 @@ proc dg-require-python-h { args } {
>> >
>> >  verbose "ENTER dg-require-python-h" 2
>> >
>> > +set supported 0
>> >  set result [remote_exec host "python3-config --includes"]
>> >  set status [lindex $result 0]
>> >  if { $status == 0 } {
>> > -set python_flags [lindex $result 1]
>> > -

[PATCH] ipa-sra: Allow IPA-SRA in presence of returns which will be removed

2023-08-18 Thread Martin Jambor

Hi,

Testing on 32-bit Arm revealed that even the simplest case of PR 110378
was still not resolved there, because destructors return the this
pointer on that target.  Needless to say, the return value of those
destructors is often just not used, which IPA-SRA can already detect in
time.  Since such an enhancement seems generally useful, here it is.

The patch adds two flags to the respective summaries.  They record two
situations: a simple direct use of the default-definition SSA_NAME of a
parameter in a return, which means that the parameter may still be split
when the return value is removed; and the return of any derived use of
it, which allows complete removal in that case.  Until now, either
situation discarded the parameter as a candidate for removal or
splitting.  The IPA phase then simply checks that we indeed plan to
remove the return value before allowing any transformation to be
considered in such cases.
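
To make the situation concrete, here is a small C sketch of the shape of code
involved. It is not one of the new testcases; the names pair, consume and run
are made up for illustration. The function returns its own pointer parameter
(much as a destructor returning this would), while no caller uses the
returned value, so the return value can be dropped and the parameter then
becomes a candidate for splitting.

```c
#include <assert.h>

struct pair
{
  int first;
  int second;
};

static int total;

/* Mimics a destructor on an ABI where destructors return "this": the
   pointer parameter is also the return value.  Only P->first is read.  */
static struct pair * __attribute__ ((noinline))
consume (struct pair *p)
{
  total += p->first;
  return p;
}

/* Hypothetical driver: every call site ignores the returned pointer.  */
int
run (void)
{
  struct pair a = { 1, 2 }, b = { 3, 4 };

  total = 0;
  consume (&a);
  consume (&b);

  /* consume never writes through its parameter.  */
  assert (a.second == 2 && b.second == 4);
  return total;
}
```

Whether a given compiler version actually performs the transformation on this
exact input is an assumption here; the sketch only shows the pattern of a
returned-but-unused parameter that the description above refers to.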

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
master?

Thanks,

Martin


gcc/ChangeLog:

2023-08-18  Martin Jambor  

PR ipa/110378
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::mark_dead_statements): Verify that any
return uses of PARAM will be removed.
(ipa_param_body_adjustments::mark_clobbers_dead): Likewise.
* ipa-sra.cc (isra_param_desc): New fields
remove_only_when_retval_removed and split_only_when_retval_removed.
(struct gensum_param_desc): Likewise.  Fix comment long line.
(ipa_sra_function_summaries::duplicate): Copy the new flags.
(dump_gensum_param_descriptor): Dump the new flags.
(dump_isra_param_descriptor): Likewise.
(isra_track_scalar_value_uses): New parameter desc.  Set its flag
remove_only_when_retval_removed when encountering a simple return.
(isra_track_scalar_param_local_uses): Replace parameter call_uses_p
with desc.  Pass it to isra_track_scalar_value_uses and set its
call_uses.
(ptr_parm_has_nonarg_uses): Accept parameter descriptor as a
parameter.  If there is a direct return use, mark any..
(create_parameter_descriptors): Pass the whole parameter descriptor to
isra_track_scalar_param_local_uses and ptr_parm_has_nonarg_uses.
(process_scan_results): Copy the new flags.
(isra_write_node_summary): Stream the new flags.
(isra_read_node_info): Likewise.
(adjust_parameter_descriptions): Check that transformations
requiring return removal only happen when return value is removed.
Restructure main loop.  Adjust dump message.

gcc/testsuite/ChangeLog:

2023-08-18  Martin Jambor  

PR ipa/110378
* gcc.dg/ipa/ipa-sra-32.c: New test.
* gcc.dg/ipa/pr110378-4.c: Likewise.
* gcc.dg/ipa/ipa-sra-4.c: Use a return value.
---
 gcc/ipa-param-manipulation.cc |   7 +-
 gcc/ipa-sra.cc| 247 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c |  30 
 gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c  |   4 +-
 gcc/testsuite/gcc.dg/ipa/pr110378-4.c |  50 ++
 5 files changed, 251 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-32.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110378-4.c

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 4a185ddbdf4..ae52f17b2c9 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -1163,6 +1163,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
stack.safe_push (lhs);
}
}
+ else if (gimple_code (stmt) == GIMPLE_RETURN)
+   gcc_assert (m_adjustments && m_adjustments->m_skip_return);
  else
/* IPA-SRA does not analyze other types of statements.  */
gcc_unreachable ();
@@ -1182,7 +1184,8 @@ ipa_param_body_adjustments::mark_dead_statements (tree 
dead_param,
 }
 
 /* Put all clobbers of of dereference of default definition of PARAM into
-   m_dead_stmts.  */
+   m_dead_stmts.  If there are returns among uses of the default definition of
+   PARAM, verify they will be stripped off the return value.  */
 
 void
 ipa_param_body_adjustments::mark_clobbers_dead (tree param)
@@ -1200,6 +1203,8 @@ ipa_param_body_adjustments::mark_clobbers_dead (tree 
param)
  gimple *stmt = USE_STMT (use_p);
  if (gimple_clobber_p (stmt))
m_dead_stmts.add (stmt);
+ else if (gimple_code (stmt) == GIMPLE_RETURN)
+   gcc_assert (m_adjustments && m_adjustments->m_skip_return);
}
 }
 
diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index edba364f56e..817f29ea62f 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -185,6 +185,13 @@ struct GTY(()) isra_param_desc
   unsigned split_candidate : 1;
   /* Is this a parameter passing stuff by reference?  */
   unsigned by_ref : 1;
+  /* If set, this parameter can only be a candidate for removal if the function
+ is going to

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-08-18 Thread Joseph Myers

On Tue, 15 Aug 2023, FX Coudert via Gcc-patches wrote:

> I am currently retesting the patches on various archs (Linux and Darwin) 
> after a final rebase, but various previous versions were 
> regression-tested, and have been shipped for a long time in Homebrew.
> 
> OK to commit?

The driver changes are OK.

I think the new configure options and the new -nodefaultrpaths compiler 
option need documenting (I suppose there might be a case for the configure 
option defined in libtool code being documented somewhere in libtool, if 
there's somewhere appropriate, but I don't see that in the libtool patch 
submission).

The help text for --enable-darwin-at-rpath refers to it as 
--enable-darwin-at-path.

Somewhere the documentation ought to discuss the considerations around 
embedding such paths in binaries, and what's appropriate for building a 
binary on the system where it's going to be used, versus using the 
compiler to build redistributed binaries that will be run on a system that 
doesn't have the compiler installed (and so shouldn't have hardcoded paths 
that were only applicable on the system with the compiler, but will need 
to be able to find shared libraries - probably shipped with the binary - 
somehow).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] testsuite: Improve test in dg-require-python-h

2023-08-18 Thread Eric Feng via Gcc-patches

Thanks for the patch, Thiago. I've pushed it to trunk:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6785917c9103e18bba0d718ac3b65a386d9a14f7.

Best,
Eric

On Fri, Aug 18, 2023 at 2:11 PM David Malcolm  wrote:
>
> On Thu, 2023-08-17 at 23:30 -0300, Thiago Jung Bauermann wrote:
> > If GCC is tested with a sysroot which doesn't contain a Python
> > installation (e.g., with a command such as
> > "make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but
> > there's
> > a python3-config in $PATH, then the testsuite will pick up the host's
> > Python.h which can't actually be used:
> >
> > Executing on host: python3-config --includes(timeout = 300)
> > spawn -ignore SIGHUP python3-config --includes
> > -I/usr/include/python3.10 -I/usr/include/python3.10
> > Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --
> > sysroot=/some/sysroot/libc -Wl,-dynamic-
> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> > rpath=/some/sysroot/libc/lib
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> > 2.c-fdiagnostics-plain-output  -
> > fplugin=./analyzer_cpython_plugin.so -fanalyzer -
> > I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-
> > plugin-test-2.s(timeout = 600)
> > spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc
> > --sysroot=/some/sysroot/libc -Wl,-dynamic-
> > linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> > rpath=/some/sysroot/libc/lib
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> > -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -
> > fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o
> > cpython-plugin-test-2.s
> > In file included from /usr/include/python3.10/Python.h:8,
> >  from
> > /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> > 2.c:8:
> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> > gnu/python3.10/pyconfig.h: No such file or directory
> > compilation terminated.
> > compiler exited with status 1
> >
> > This problem causes these testsuite failures:
> >
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
> > FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> > fplugin=./analyzer_cpython_plugin.so (test for excess errors)
> > Excess errors:
> > /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> > gnu/python3.10/pyconfig.h: No such file or directory
> > compilation terminated.
> >
> > So try to compile a test file so that the testcase can be marked as
> > unsupported instead.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc/testsuite/lib/target-supports.exp (dg-require-python-h): Test
> > whether Python.h can really be used.
> > ---
> >  gcc/testsuite/lib/target-supports.exp | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/lib/target-supports.exp
> > b/gcc/testsuite/lib/target-supports.exp
> > index 92b6f69730e9..5b5f86551844 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -12570,11 +12570,21 @@ proc dg-require-python-h { args } {
> >
> >  verbose "ENTER dg-require-python-h" 2
> >
> > +set supported 0
> >  set result [remote_exec host "python3-config --includes"]
> >  set status [lindex $result 0]
> >  if { $status == 0 } {
> > -set python_flags [lindex $result 1]
> > -} else {
> > +   # Remove trailing newline from python3-config output.
> > +   set python_flags [string trim [lindex $result 1]]
>

Re: [PATCH] RISC-V: Allow immediates 17-31 for vector shift.

2023-08-18 Thread Palmer Dabbelt


On Fri, 18 Aug 2023 12:37:06 PDT (-0700), rdapp@gmail.com wrote:

> Hi,
>
> this patch adds a missing constraint check in order to be able to
> print (and not ICE) vector immediates 17-31 for vector shifts.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_print_operand):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.
> ---
>  gcc/config/riscv/riscv.cc                     |  3 ++-
>  .../riscv/rvv/autovec/binop/shift-immediate.c | 16
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 49062bef9fc..0f60ffe5f60 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4954,7 +4954,8 @@ riscv_print_operand (FILE *file, rtx op, int letter)

Looks like the comment at the top of riscv_print_operand() is way out of
date.  Maybe we should just toss it?

> else if (satisfies_constraint_Wc0 (op))
>   asm_fprintf (file, "0");
> else if (satisfies_constraint_vi (op)
> -|| satisfies_constraint_vj (op))
> +|| satisfies_constraint_vj (op)
> +|| satisfies_constraint_vk (op))
>   asm_fprintf (file, "%wd", INTVAL (elt));
> else
>   output_operand_lossage ("invalid vector constant");
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
> new file mode 100644
> index 000..a2e1c33f4fa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d -O2 --param=riscv-autovec-preference=scalable" } */
> +
> +#define uint8_t unsigned char
> +
> +void foo1 (uint8_t *a)
> +{
> +uint8_t b = a[0];
> +int val = 0;
> +
> +for (int i = 0; i < 4; i++)
> +{
> +a[i] = (val & 1) ? (-val) >> 17 : val;
> +val += b;
> +}
> +}



Unless I'm missing something it looks like we're missing at least Wc1 as 
well, and maybe a few others?


Either way

Reviewed-by: Palmer Dabbelt 

Thanks!

Re: [PATCH] c: Add support for [[extension ...]]

2023-08-18 Thread Joseph Myers

On Fri, 18 Aug 2023, Richard Sandiford via Gcc-patches wrote:

> [[]] attributes are a recent addition to C, but as a GNU extension,
> GCC allows them to be used in C11 and earlier.  Normally this use
> would trigger a pedwarn (for -pedantic, -Wc11-c2x-compat, etc.).
> 
> This patch allows the pedwarn to be suppressed by starting the
> attribute-list with __extension__.
> 
> Also, :: is not a single lexing token prior to C2X, so it wasn't
> possible to use scoped attributes in C11, even as a GNU extension.
> The patch allows two colons to be used in place of :: when
> __extension__ is used.  No attempt is made to check whether the
> two colons are immediately adjacent.
> 
> gcc/
>   * doc/extend.texi: Document the C [[__extension__ ...]] construct.
> 
> gcc/c/
>   * c-parser.cc (c_parser_std_attribute): Conditionally allow
>   two colons to be used in place of ::.
>   (c_parser_std_attribute_list): New function, split out from...
>   (c_parser_std_attribute_specifier): ...here.  Allow the attribute-list
>   to start with __extension__.  When it does, also allow two colons
>   to be used in place of ::.
> 
> gcc/testsuite/
>   * gcc.dg/c2x-attr-syntax-6.c: New test.
>   * gcc.dg/c2x-attr-syntax-7.c: Likewise.

OK.
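
For readers following the thread, here is a minimal sketch of the construct described in the quoted text. It assumes a GCC that already contains this patch (GCC 14 or later is my assumption, not something stated in the thread); ATTR_UNUSED, spare_helper and scale are hypothetical names used only for illustration. Other compilers take the classic __attribute__ spelling instead.

```c
#include <assert.h>

/* Assumption: the [[__extension__ ...]] form is available in GCC 14 and
   later; anything else falls back to the __attribute__ spelling.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 14
# define ATTR_UNUSED [[__extension__ gnu::unused]]
#else
# define ATTR_UNUSED __attribute__ ((unused))
#endif

/* Hypothetical helper that is deliberately never called; the attribute
   keeps -Wunused-function quiet.  */
ATTR_UNUSED static int
spare_helper (int x)
{
  return x + 1;
}

/* Hypothetical function so the translation unit has something to use.  */
int
scale (int value, int factor)
{
  assert (factor > 0);  /* precondition of this illustrative helper */
  return value * factor;
}
```

Compiled as C11 with -pedantic, the __extension__ form should not trigger the pedwarn that a plain [[gnu::unused]] would, and the scoped name is accepted even though :: is lexed as two colons before C2X.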

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] RISC-V: Allow immediates 17-31 for vector shift.

2023-08-18 Thread Robin Dapp via Gcc-patches

Hi,

this patch adds a missing constraint check in order to be able to
print (and not ICE) vector immediates 17-31 for vector shifts.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.
---
 gcc/config/riscv/riscv.cc                     |  3 ++-
 .../riscv/rvv/autovec/binop/shift-immediate.c | 16
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 49062bef9fc..0f60ffe5f60 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4954,7 +4954,8 @@ riscv_print_operand (FILE *file, rtx op, int letter)
else if (satisfies_constraint_Wc0 (op))
  asm_fprintf (file, "0");
else if (satisfies_constraint_vi (op)
-|| satisfies_constraint_vj (op))
+|| satisfies_constraint_vj (op)
+|| satisfies_constraint_vk (op))
  asm_fprintf (file, "%wd", INTVAL (elt));
else
  output_operand_lossage ("invalid vector constant");
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
new file mode 100644
index 000..a2e1c33f4fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-immediate.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d -O2 --param=riscv-autovec-preference=scalable" } */
+
+#define uint8_t unsigned char
+
+void foo1 (uint8_t *a)
+{
+uint8_t b = a[0];
+int val = 0;
+
+for (int i = 0; i < 4; i++)
+{
+a[i] = (val & 1) ? (-val) >> 17 : val;
+val += b;
+}
+}
-- 
2.41.0
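
The range check this patch extends can be pictured with a small stand-alone sketch. The ranges below are my reading of the RISC-V vector constraints (vi: -16..15, vj: -15..16, vk: 0..31) and are an assumption rather than something stated in the mail; all function names are hypothetical and none of this is the code in riscv.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive range test shared by the models below.  */
static bool
in_range (long v, long lo, long hi)
{
  assert (lo <= hi);
  return v >= lo && v <= hi;
}

/* Assumed model of "vi": 5-bit signed immediate.  */
bool fits_vi (long v) { return in_range (v, -16, 15); }

/* Assumed model of "vj": negated 5-bit signed immediate.  */
bool fits_vj (long v) { return in_range (v, -15, 16); }

/* Assumed model of "vk": 5-bit unsigned immediate.  */
bool fits_vk (long v) { return in_range (v, 0, 31); }

/* Immediates the operand printer accepted before the patch.  */
bool
printable_before (long v)
{
  return fits_vi (v) || fits_vj (v);
}

/* Immediates accepted once the vk check is added.  */
bool
printable_after (long v)
{
  return printable_before (v) || fits_vk (v);
}
```

Under these assumed ranges, 17 through 31 are exactly the values that only the vk check admits, which matches the subject line; the shift by 17 in the new test is the smallest such immediate.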

[PATCH] RISC-V/testsuite: Add missing conversion tests.

2023-08-18 Thread Robin Dapp via Gcc-patches

Hi,

this patch adds some missing tests for vf[nw]cvt.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.
---
 .../rvv/autovec/conversions/vfncvt-ftoi-run.c | 96 +++
 .../autovec/conversions/vfncvt-ftoi-rv32gcv.c |  6 +-
 .../autovec/conversions/vfncvt-ftoi-rv64gcv.c |  6 +-
 .../conversions/vfncvt-ftoi-template.h        |  6 ++
 .../autovec/conversions/vfncvt-itof-rv32gcv.c |  1 +
 .../autovec/conversions/vfncvt-itof-rv64gcv.c |  4 +-
 .../conversions/vfncvt-itof-template.h        |  5 +-
 .../conversions/vfncvt-itof-zvfh-run.c        | 32 +++
 .../autovec/conversions/vfwcvt-ftoi-rv32gcv.c |  4 +-
 .../autovec/conversions/vfwcvt-ftoi-rv64gcv.c |  4 +-
 .../conversions/vfwcvt-ftoi-template.h        |  2 +
 .../conversions/vfwcvt-ftoi-zvfh-run.c        | 32 +++
 .../rvv/autovec/conversions/vfwcvt-itof-run.c | 96 +++
 .../autovec/conversions/vfwcvt-itof-rv32gcv.c |  4 +-
 .../autovec/conversions/vfwcvt-itof-rv64gcv.c |  4 +-
 .../conversions/vfwcvt-itof-template.h        | 10 +-
 .../conversions/vfwcvt-itof-zvfh-run.c        | 10 +-
 17 files changed, 302 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
index ce3fcfa9af8..73eda067ba3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c
@@ -62,6 +62,38 @@ main ()
   RUN2 (float, uint16_t, 4096)
   RUN2 (float, uint16_t, 5975)
 
+  RUN (float, int8_t, 3)
+  RUN (float, int8_t, 4)
+  RUN (float, int8_t, 7)
+  RUN (float, int8_t, 99)
+  RUN (float, int8_t, 119)
+  RUN (float, int8_t, 128)
+  RUN (float, int8_t, 256)
+  RUN (float, int8_t, 279)
+  RUN (float, int8_t, 555)
+  RUN (float, int8_t, 1024)
+  RUN (float, int8_t, 1389)
+  RUN (float, int8_t, 2048)
+  RUN (float, int8_t, 3989)
+  RUN (float, int8_t, 4096)
+  RUN (float, int8_t, 5975)
+
+  RUN2 (float, uint8_t, 3)
+  RUN2 (float, uint8_t, 4)
+  RUN2 (float, uint8_t, 7)
+  RUN2 (float, uint8_t, 99)
+  RUN2 (float, uint8_t, 119)
+  RUN2 (float, uint8_t, 128)
+  RUN2 (float, uint8_t, 256)
+  RUN2 (float, uint8_t, 279)
+  RUN2 (float, uint8_t, 555)
+  RUN2 (float, uint8_t, 1024)
+  RUN2 (float, uint8_t, 1389)
+  RUN2 (float, uint8_t, 2048)
+  RUN2 (float, uint8_t, 3989)
+  RUN2 (float, uint8_t, 4096)
+  RUN2 (float, uint8_t, 5975)
+
   RUN (double, int32_t, 3)
   RUN (double, int32_t, 4)
   RUN (double, int32_t, 7)
@@ -93,4 +125,68 @@ main ()
   RUN2 (double, uint32_t, 3989)
   RUN2 (double, uint32_t, 4096)
   RUN2 (double, uint32_t, 5975)
+
+  RUN (double, int16_t, 3)
+  RUN (double, int16_t, 4)
+  RUN (double, int16_t, 7)
+  RUN (double, int16_t, 99)
+  RUN (double, int16_t, 119)
+  RUN (double, int16_t, 128)
+  RUN (double, int16_t, 256)
+  RUN (double, int16_t, 279)
+  RUN (double, int16_t, 555)
+  RUN (double, int16_t, 1024)
+  RUN (double, int16_t, 1389)
+  RUN (double, int16_t, 2048)
+  RUN (double, int16_t, 3989)
+  RUN (double, int16_t, 4096)
+  RUN (double, int16_t, 5975)
+
+  RUN2 (double, uint16_t, 3)
+  RUN2 (double, uint16_t, 4)
+  RUN2 (double, uint16_t, 7)
+  RUN2 (double, uint16_t, 99)
+  RUN2 (double, uint16_t, 119)
+  RUN2 (double, uint16_t, 128)
+  RUN2 (double, uint16_t, 256)
+  RUN2 (double, uint16_t, 279)
+  RUN2 (double, uint16_t, 555)
+  RUN2 (double, uint16_t, 1024)
+  RUN2 (double, uint16_t, 1389)
+  RUN2

Re: [PATCH] libstdc++ Add cstdarg to freestanding

2023-08-18 Thread Paul M. Bendixen via Gcc-patches

Hi
Jonathan, I just went over the proposal again, as well as [compliance],
which Arsen mentioned ( https://wg21.link/compliance ); neither seems to
mention the two additional headers.

Shouldn't I just stick to the ones we know are in?

(Still working on figuring out how to do the change log thing)

Best regards
Paul

Den ons. 16. aug. 2023 kl. 18.50 skrev Paul M. Bendixen <
paulbendi...@gmail.com>:

> Yes, the other files are in another committee proposal, and I'm working my
> way through the proposals one by one.
> Thank you for the feedback, I'll update and resend
> /Paul
>
> Den ons. 16. aug. 2023 kl. 15.51 skrev Arsen Arsenović :
>
>>
>> Jonathan Wakely  writes:
>>
>> > On Fri, 21 Jul 2023 at 22:23, Paul M. Bendixen via Libstdc++
>> >  wrote:
>> >>
>> >> P1642 includes the header cstdarg to the freestanding implementation.
>> >> This was probably left out by accident, this patch puts it in.
>> >> Since this is one of the headers that go in whole cloth, there should
>> be no
>> >> further actions needed.
>> >
>> > Thanks for the patch. I agree that <cstdarg> should be freestanding,
>> > but I think  and  were also missed from the
>> > change. Arsen?
>>
>> Indeed, we should include all three, and according to [compliance],
>> there's a couple more headers that we should provide (cwchar, cstring,
>> cerrno, and cmath, but these are probably significantly more involved,
>> so we can handle them separately).
>>
>> As guessed, the omission was not intentional.
>>
>> If you could, add those two to the patch as well, edit Makefile.am and
>> regenerate using automake 1.15.1, and see
>> https://gcc.gnu.org/contribute.html wrt. changelogs in commit messages.
>>
>> Thank you!  Have a lovely day :-)
>>
>> [compliance]: https://eel.is/c++draft/compliance
>>
>> > Also, the patch should change include/Makefile.am as well (the .in
>> > file is autogenerated from that one).
>> >
>> >
>> >> This might be related to PR106953, but since that one touches the
>> partial
>> >> headers I'm not sure
>>
>> The headers mentioned in this PR are provided in freestanding,
>> partially, in 13 already, indeed.
>>
>> >> /Paul M. Bendixen
>> >>
>> >> --
>> >> • − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − −
>> •−
>> >> •/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//
>>
>>
>> --
>> Arsen Arsenović
>>
>
>
> --
> • − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − − •−
> •/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//
>


-- 
• − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − − •−
•/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//

Re: [PATCH] testsuite: Improve test in dg-require-python-h

2023-08-18 Thread David Malcolm via Gcc-patches

On Thu, 2023-08-17 at 23:30 -0300, Thiago Jung Bauermann wrote:
> If GCC is tested with a sysroot which doesn't contain a Python
> installation (e.g., with a command such as
> "make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but
> there's
> a python3-config in $PATH, then the testsuite will pick up the host's
> Python.h which can't actually be used:
> 
> Executing on host: python3-config --includes    (timeout = 300)
> spawn -ignore SIGHUP python3-config --includes
> -I/usr/include/python3.10 -I/usr/include/python3.10
> Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --
> sysroot=/some/sysroot/libc -Wl,-dynamic-
> linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> rpath=/some/sysroot/libc/lib 
> /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> 2.c    -fdiagnostics-plain-output  -
> fplugin=./analyzer_cpython_plugin.so -fanalyzer -
> I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-
> plugin-test-2.s    (timeout = 600)
> spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc
> --sysroot=/some/sysroot/libc -Wl,-dynamic-
> linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-
> rpath=/some/sysroot/libc/lib
> /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -
> fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o
> cpython-plugin-test-2.s
> In file included from /usr/include/python3.10/Python.h:8,
>  from
> /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> 2.c:8:
> /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> gnu/python3.10/pyconfig.h: No such file or directory
> compilation terminated.
> compiler exited with status 1
> 
> This problem causes these testsuite failures:
> 
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
> FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -
> fplugin=./analyzer_cpython_plugin.so (test for excess errors)
> Excess errors:
> /usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-
> gnu/python3.10/pyconfig.h: No such file or directory
> compilation terminated.
> 
> So try to compile a test file so that the testcase can be marked as
> unsupported instead.
> 
> gcc/testsuite/ChangeLog:
> * gcc/testsuite/lib/target-supports.exp (dg-require-python-h): Test
>     whether Python.h can really be used.
> ---
>  gcc/testsuite/lib/target-supports.exp | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-supports.exp
> index 92b6f69730e9..5b5f86551844 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -12570,11 +12570,21 @@ proc dg-require-python-h { args } {
>  
>  verbose "ENTER dg-require-python-h" 2
>  
> +    set supported 0
>  set result [remote_exec host "python3-config --includes"]
>  set status [lindex $result 0]
>  if { $status == 0 } {
> -    set python_flags [lindex $result 1]
> -    } else {
> +   # Remove trailing newline from python3-config output.
> +   set python_flags [string trim [lindex $result 1]]
> +   if [check_no_compiler_messages python_h assembly {
> +   #include <Python.h>
> +   int main (void) { return 0; }
> +   } $python_flags] {
> +   set supported 1
> +   }
> +    }
> +
> +    if { $supported == 0 } {
> verbose "Python.h not supported" 2
> upvar dg-do-what dg-do-what
> set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
> 
>

[OG13, committed 2/3] OpenMP: C++ attribute syntax fixes/testcases for "declare mapper"

2023-08-18 Thread Sandra Loosemore

gcc/c-family/ChangeLog
* c-omp.cc (c_omp_directives): Uncomment "declare mapper" entry.

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_declare_mapper): Allow commas between
clauses.

gcc/testsuite/ChangeLog
* g++.dg/gomp/attrs-declare-mapper-3.C: New file.
* g++.dg/gomp/attrs-declare-mapper-4.C: New file.
* g++.dg/gomp/attrs-declare-mapper-5.C: New file.
* g++.dg/gomp/attrs-declare-mapper-6.C: New file.
---
 gcc/c-family/ChangeLog.omp                |  4 +
 gcc/c-family/c-omp.cc                     |  4 +-
 gcc/cp/ChangeLog.omp                      |  5 ++
 gcc/cp/parser.cc                          |  2 +
 gcc/testsuite/ChangeLog.omp               |  7 ++
 .../g++.dg/gomp/attrs-declare-mapper-3.C  | 31
 .../g++.dg/gomp/attrs-declare-mapper-4.C  | 74 +++
 .../g++.dg/gomp/attrs-declare-mapper-5.C  | 26 +++
 .../g++.dg/gomp/attrs-declare-mapper-6.C  | 22 ++
 9 files changed, 173 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-6.C

diff --git a/gcc/c-family/ChangeLog.omp b/gcc/c-family/ChangeLog.omp
index 60f5c29276f..40cb8e811e5 100644
--- a/gcc/c-family/ChangeLog.omp
+++ b/gcc/c-family/ChangeLog.omp
@@ -1,3 +1,7 @@
+2023-08-18  Sandra Loosemore  
+
+   * c-omp.cc (c_omp_directives): Uncomment "declare mapper" entry.
+
 2023-08-10  Julian Brown  
 
* c-common.h (c_omp_region_type): Add C_ORT_UPDATE and C_ORT_OMP_UPDATE
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 9ff09b59bc6..7bc69e9e2da 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -5500,8 +5500,8 @@ const struct c_omp_directive c_omp_directives[] = {
 C_OMP_DIR_STANDALONE, false },
   { "critical", nullptr, nullptr, PRAGMA_OMP_CRITICAL,
 C_OMP_DIR_CONSTRUCT, false },
-  /* { "declare", "mapper", nullptr, PRAGMA_OMP_DECLARE,
-C_OMP_DIR_DECLARATIVE, false },  */
+  { "declare", "mapper", nullptr, PRAGMA_OMP_DECLARE,
+C_OMP_DIR_DECLARATIVE, false },
   { "declare", "reduction", nullptr, PRAGMA_OMP_DECLARE,
 C_OMP_DIR_DECLARATIVE, true },
   { "declare", "simd", nullptr, PRAGMA_OMP_DECLARE,
diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp
index 6e154ea3426..1b2d71422d8 100644
--- a/gcc/cp/ChangeLog.omp
+++ b/gcc/cp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-08-18  Sandra Loosemore  
+
+   * parser.cc (cp_parser_omp_declare_mapper): Allow commas between
+   clauses.
+
 2023-08-18  Sandra Loosemore  
 
* parser.cc (analyze_metadirective_body): Handle CPP_PRAGMA and
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 84ec0de6c69..0ce2a7be608 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -50358,6 +50358,8 @@ cp_parser_omp_declare_mapper (cp_parser *parser, cp_token *pragma_tok,
 
   while (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL))
 {
+  if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+   cp_lexer_consume_token (parser->lexer);
   pragma_omp_clause c_kind = cp_parser_omp_clause_name (parser);
   if (c_kind != PRAGMA_OMP_CLAUSE_MAP)
{
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index a9b4ac3d0a7..a7d11777988 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,10 @@
+2023-08-18  Sandra Loosemore  
+
+   * g++.dg/gomp/attrs-declare-mapper-3.C: New file.
+   * g++.dg/gomp/attrs-declare-mapper-4.C: New file.
+   * g++.dg/gomp/attrs-declare-mapper-5.C: New file.
+   * g++.dg/gomp/attrs-declare-mapper-6.C: New file.
+
 2023-08-18  Sandra Loosemore  
 
* g++.dg/gomp/attrs-metadirective-1.C: New file.
diff --git a/gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-3.C b/gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-3.C
new file mode 100644
index 000..36345fe0dc2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-3.C
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } } 
+// { dg-additional-options "-fdump-tree-gimple" }
+
+#include 
+
+// Test named mapper invocation.
+
+struct S {
+  int *ptr;
+  int size;
+};
+
+int main (int argc, char *argv[])
+{
+  int N = 1024;
+  [[omp::directive (declare mapper (mapN:struct S s)
+   map(to:s.ptr, s.size)
+   map(s.ptr[:N]))]];
+
+  struct S s;
+  s.ptr = (int *) malloc (sizeof (int) * N);
+
+  [[omp::directive (target map(mapper(mapN), tofrom: s))]]
+// { dg-final { scan-tree-dump {map\(struct:s \[len: 2\]\) map\(alloc:s\.ptr \[len: [0-9]+\]\) map\(to:s\.size \[len: [0-9]+\]\) map\(tofrom:\*_[0-9]+ \[len: _[0-9]+\]\) map\(attach:s\.ptr \[bias: 0\]\)} "gimple" } }
+  {
+for (int i = 0; i < N; i++)
+  s.ptr[i]++;
+  }
+
+  return 0;
+}
diff --git

[OG13, committed 3/3] OpenMP: C++ attribute syntax fixes/testcases for loop transformations

2023-08-18 Thread Sandra Loosemore

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_all_clauses): Allow comma before first
clause.
(cp_parser_see_omp_loop_nest): Accept C++ standard attributes
before RID_FOR.
(cp_parser_omp_loop_nest): Process C++ standard attributes like
pragmas.  Improve error handling for bad pragmas/attributes.
Use cp_parser_see_omp_loop_nest instead of duplicating what it
does.
(cp_parser_omp_tile_sizes): Permit comma before the clause.
(cp_parser_omp_tile): Assert that this isn't called for inner
directive.
(cp_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog
* g++.dg/gomp/loop-transforms/attrs-tile-1.C: New file.
* g++.dg/gomp/loop-transforms/attrs-tile-2.C: New file.
* g++.dg/gomp/loop-transforms/attrs-tile-3.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-1.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-2.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-3.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-inner-1.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-inner-2.C: New file.
* g++.dg/gomp/loop-transforms/attrs-unroll-inner-3.C: New file.
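
The separator handling that the cp_parser_omp_all_clauses change introduces can be modelled with a toy scanner. This is only a sketch under stated assumptions: tokens are plain strings, scan_clauses and struct clause_scan are hypothetical names, and the value 2 for nested simply mirrors the mode in which the diff emits the "should be separated" diagnostic. It is not the parser code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Result of scanning a hypothetical token stream in which every token is
   either "," or a clause name.  */
struct clause_scan
{
  size_t clauses;         /* clause names seen */
  size_t missing_commas;  /* stands in for the error_at call */
};

struct clause_scan
scan_clauses (const char *const *tok, size_t ntok, int nested)
{
  struct clause_scan r = { 0, 0 };
  size_t i = 0;
  int first = 1;

  assert (tok != NULL || ntok == 0);
  while (i < ntok)
    {
      /* A comma is consumed wherever it appears, including before the
         first clause; a missing one is diagnosed only between clauses
         when nested == 2.  */
      if (strcmp (tok[i], ",") == 0)
        i++;
      else if (!first && nested == 2)
        r.missing_commas++;

      if (i == ntok)
        break;  /* the stream ended right after a comma */

      r.clauses++;  /* stands in for parsing one clause */
      i++;
      first = 0;
    }
  return r;
}
```

With this model a leading comma, as it appears when a directive is spelled in attribute syntax, is skipped silently, while two adjacent clause names with nested set to 2 still count as one missing separator.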
---
 gcc/cp/ChangeLog.omp                       |  15 ++
 gcc/cp/parser.cc                           |  69 ---
 gcc/testsuite/ChangeLog.omp                |  12 ++
 .../gomp/loop-transforms/attrs-tile-1.C    | 164 +
 .../gomp/loop-transforms/attrs-tile-2.C    | 174 ++
 .../gomp/loop-transforms/attrs-tile-3.C    | 111 +++
 .../gomp/loop-transforms/attrs-unroll-1.C  | 135 ++
 .../gomp/loop-transforms/attrs-unroll-2.C  |  81
 .../gomp/loop-transforms/attrs-unroll-3.C  |  20 ++
 .../loop-transforms/attrs-unroll-inner-1.C |  15 ++
 .../loop-transforms/attrs-unroll-inner-2.C |  29 +++
 .../loop-transforms/attrs-unroll-inner-3.C |  71 +++
 12 files changed, 872 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-3.C

diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp
index 1b2d71422d8..fe5ef67a7ad 100644
--- a/gcc/cp/ChangeLog.omp
+++ b/gcc/cp/ChangeLog.omp
@@ -1,3 +1,18 @@
+2023-08-18  Sandra Loosemore  
+
+   * parser.cc (cp_parser_omp_all_clauses): Allow comma before first
+   clause.
+   (cp_parser_see_omp_loop_nest): Accept C++ standard attributes
+   before RID_FOR.
+   (cp_parser_omp_loop_nest): Process C++ standard attributes like
+   pragmas.  Improve error handling for bad pragmas/attributes.
+   Use cp_parser_see_omp_loop_nest instead of duplicating what it
+   does.
+   (cp_parser_omp_tile_sizes): Permit comma before the clause.
+   (cp_parser_omp_tile): Assert that this isn't called for inner
+   directive.
+   (cp_parser_omp_unroll): Likewise.
+
 2023-08-18  Sandra Loosemore  
 
* parser.cc (cp_parser_omp_declare_mapper): Allow commas between
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 0ce2a7be608..4871f4511a9 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -42240,15 +42240,12 @@ cp_parser_omp_all_clauses (cp_parser *parser, 
omp_clause_mask mask,
   if (nested && cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_PAREN))
break;
 
-  if (!first || nested != 2)
-   {
- if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
-   cp_lexer_consume_token (parser->lexer);
- else if (nested == 2)
-   error_at (cp_lexer_peek_token (parser->lexer)->location,
- "clauses in % trait should be separated "
-  "by %<,%>");
-   }
+  if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+   cp_lexer_consume_token (parser->lexer);
+  else if (!first && nested == 2)
+   error_at (cp_lexer_peek_token (parser->lexer)->location,
+ "clauses in % trait should be separated "
+ "by %<,%>");
 
   token = cp_lexer_peek_token (parser->lexer);
   c_kind = cp_parser_omp_clause_name (parser);
@@ -44803,6 +44800,11 @@ cp_parser_see_omp_loop_nest (cp_parser *parser, enum 
tree_code code,
  || (cp_parser_pragma_kind (cp_lexer_peek_token (parser->lexer))
  ==

[OG13, committed 1/3] OpenMP: C++ attribute syntax fixes/testcases for "metadirective"

2023-08-18 Thread Sandra Loosemore

gcc/cp/ChangeLog:
* parser.cc (analyze_metadirective_body): Handle CPP_PRAGMA and
CPP_PRAGMA_EOL.
(cp_parser_omp_metadirective): Allow comma between clauses.

gcc/testsuite/ChangeLog:
* g++.dg/gomp/attrs-metadirective-1.C: New file.
* g++.dg/gomp/attrs-metadirective-2.C: New file.
* g++.dg/gomp/attrs-metadirective-3.C: New file.
* g++.dg/gomp/attrs-metadirective-4.C: New file.
* g++.dg/gomp/attrs-metadirective-5.C: New file.
* g++.dg/gomp/attrs-metadirective-6.C: New file.
* g++.dg/gomp/attrs-metadirective-7.C: New file.
* g++.dg/gomp/attrs-metadirective-8.C: New file.
---
 gcc/cp/ChangeLog.omp  |  6 ++
 gcc/cp/parser.cc  | 16 
 gcc/testsuite/ChangeLog.omp   | 11 +++
 .../g++.dg/gomp/attrs-metadirective-1.C   | 40 ++
 .../g++.dg/gomp/attrs-metadirective-2.C   | 74 +++
 .../g++.dg/gomp/attrs-metadirective-3.C   | 31 
 .../g++.dg/gomp/attrs-metadirective-4.C   | 41 ++
 .../g++.dg/gomp/attrs-metadirective-5.C   | 24 ++
 .../g++.dg/gomp/attrs-metadirective-6.C   | 31 
 .../g++.dg/gomp/attrs-metadirective-7.C   | 31 
 .../g++.dg/gomp/attrs-metadirective-8.C   | 16 
 11 files changed, 321 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-6.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-7.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-8.C

diff --git a/gcc/cp/ChangeLog.omp b/gcc/cp/ChangeLog.omp
index e146f57d57d..6e154ea3426 100644
--- a/gcc/cp/ChangeLog.omp
+++ b/gcc/cp/ChangeLog.omp
@@ -1,3 +1,9 @@
+2023-08-18  Sandra Loosemore  
+
+   * parser.cc (analyze_metadirective_body): Handle CPP_PRAGMA and
+   CPP_PRAGMA_EOL.
+   (cp_parser_omp_metadirective): Allow comma between clauses.
+
 2023-08-10  Julian Brown  
 
* parser.cc (cp_parser_omp_var_list_no_open): Support array-shaping
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index cbbef6470d5..84ec0de6c69 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -49458,6 +49458,7 @@ analyze_metadirective_body (cp_parser *parser,
   int bracket_depth = 0;
   bool in_case = false;
   bool in_label_decl = false;
+  cp_token *pragma_tok = NULL;
 
   while (1)
 {
@@ -49501,6 +49502,19 @@ analyze_metadirective_body (cp_parser *parser,
  /* Local label declarations are terminated by a semicolon.  */
  in_label_decl = false;
  goto add;
+   case CPP_PRAGMA:
+ parser->lexer->in_pragma = true;
+ pragma_tok = token;
+ goto add;
+   case CPP_PRAGMA_EOL:
+ /* C++ attribute syntax for OMP directives lexes as a pragma,
+but we must reset the associated lexer state when we reach
+the end in order to get the tokens for the statement that
+come after it.  */
+ tokens.safe_push (*token);
+ cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+ pragma_tok = NULL;
+ continue;
default:
add:
  tokens.safe_push (*token);
@@ -49541,6 +49555,8 @@ cp_parser_omp_metadirective (cp_parser *parser, 
cp_token *pragma_tok,
 
   while (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL))
 {
+  if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+   cp_lexer_consume_token (parser->lexer);
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_NAME)
  && cp_lexer_next_token_is_not (parser->lexer, CPP_KEYWORD))
{
diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index b8780d17841..a9b4ac3d0a7 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,14 @@
+2023-08-18  Sandra Loosemore  
+
+   * g++.dg/gomp/attrs-metadirective-1.C: New file.
+   * g++.dg/gomp/attrs-metadirective-2.C: New file.
+   * g++.dg/gomp/attrs-metadirective-3.C: New file.
+   * g++.dg/gomp/attrs-metadirective-4.C: New file.
+   * g++.dg/gomp/attrs-metadirective-5.C: New file.
+   * g++.dg/gomp/attrs-metadirective-6.C: New file.
+   * g++.dg/gomp/attrs-metadirective-7.C: New file.
+   * g++.dg/gomp/attrs-metadirective-8.C: New file.
+
 2023-08-10  Julian Brown  
 
* c-c++-common/gomp/declare-mapper-17.c: New test.
diff --git a/gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C 
b/gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C
new file mode 100644
index 000..22edd257084
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C
@@ -0,0

[OG13, committed 0/3] C++ attribute syntax fixes/testcases

2023-08-18 Thread Sandra Loosemore

I've had a task item to ensure that g++ accepts the standard C++
attribute syntax form for all (currently-implemented) OpenMP 5.1
directives, and that there are tests to verify this.  I used some
scripting to scan for existing testcases given a list of the
directives, which I extracted from the reference card on the OpenMP
web site.  It looked to me that on mainline all the supported
directives had tests already, but on OG13 I found that
"metadirective", "declare mapper", and the loop transforms "tile" and
"unroll" had no tests, and on further investigation all of them had
bugs, too.

I didn't manually examine all the existing tests for other directives,
BTW, but the ones I spot-checked seem to have good coverage.  The new
tests are mostly just adapted from a subset of existing pragma-syntax
tests.

-Sandra


Sandra Loosemore (3):
  OpenMP: C++ attribute syntax fixes/testcases for "metadirective"
  OpenMP: C++ attribute syntax fixes/testcases for "declare mapper"
  OpenMP: C++ attribute syntax fixes/testcases for loop transformations

 gcc/c-family/ChangeLog.omp|   4 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/cp/ChangeLog.omp  |  26 +++
 gcc/cp/parser.cc  |  87 ++---
 gcc/testsuite/ChangeLog.omp   |  30 +++
 .../g++.dg/gomp/attrs-declare-mapper-3.C  |  31 
 .../g++.dg/gomp/attrs-declare-mapper-4.C  |  74 
 .../g++.dg/gomp/attrs-declare-mapper-5.C  |  26 +++
 .../g++.dg/gomp/attrs-declare-mapper-6.C  |  22 +++
 .../g++.dg/gomp/attrs-metadirective-1.C   |  40 
 .../g++.dg/gomp/attrs-metadirective-2.C   |  74 
 .../g++.dg/gomp/attrs-metadirective-3.C   |  31 
 .../g++.dg/gomp/attrs-metadirective-4.C   |  41 +
 .../g++.dg/gomp/attrs-metadirective-5.C   |  24 +++
 .../g++.dg/gomp/attrs-metadirective-6.C   |  31 
 .../g++.dg/gomp/attrs-metadirective-7.C   |  31 
 .../g++.dg/gomp/attrs-metadirective-8.C   |  16 ++
 .../gomp/loop-transforms/attrs-tile-1.C   | 164 +
 .../gomp/loop-transforms/attrs-tile-2.C   | 174 ++
 .../gomp/loop-transforms/attrs-tile-3.C   | 111 +++
 .../gomp/loop-transforms/attrs-unroll-1.C | 135 ++
 .../gomp/loop-transforms/attrs-unroll-2.C |  81 
 .../gomp/loop-transforms/attrs-unroll-3.C |  20 ++
 .../loop-transforms/attrs-unroll-inner-1.C|  15 ++
 .../loop-transforms/attrs-unroll-inner-2.C|  29 +++
 .../loop-transforms/attrs-unroll-inner-3.C|  71 +++
 26 files changed, 1366 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-declare-mapper-6.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-6.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-7.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-8.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-tile-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-3.C
 create mode 100644 
gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-1.C
 create mode 100644 
gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-2.C
 create mode 100644 
gcc/testsuite/g++.dg/gomp/loop-transforms/attrs-unroll-inner-3.C

-- 
2.31.1

Re: [Patch] omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]

2023-08-18 Thread Jakub Jelinek via Gcc-patches

On Fri, Aug 18, 2023 at 07:15:16PM +0200, Tobias Burnus wrote:
> Comments, questions, concerns?
> 
> If not, I intend to commit the attached patch to mainline on Monday
> and after the usual grace time to GCC 13 and then to GCC 12.
> 
>   PR middle-end/111017
> gcc/
>   * omp-expand.cc (expand_omp_for_init_vars): Pass after=true
>   to expand_omp_build_cond for 'factor != 0' condition, resulting
>   in pre-r12-5295-g47de0b56ee455e code for the gimple insert.
> 
> libgomp/
>   * testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.

LGTM, thanks.

Jakub

[Patch] omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]

2023-08-18 Thread Tobias Burnus


This patch fixes a bug with an OpenMP non-rectangular loop nest where the
factor is 0.

Before r12-5295-g47de0b56ee455e, the testcase from the PR (also included
in the attached patch) worked fine. Back then, omp-expand.c contained:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/omp-expand.c;hb=eacdfaf7ca07367ede1a0c50aa997953958dabae#l2560

2560   gcond *cond_stmt
2561 = gimple_build_cond (NE_EXPR, factor,
2562  build_zero_cst (TREE_TYPE (factor)),
2563  NULL_TREE, NULL_TREE);
2564   gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);

In commit https://gcc.gnu.org/r12-5295-g47de0b56ee455e a new function
was introduced:

+/* Prepend or append LHS CODE RHS condition before or after *GSI_P.  */
+
+static gcond *
+expand_omp_build_cond (gimple_stmt_iterator *gsi_p, enum tree_code code,
+  tree lhs, tree rhs, bool after = false)
+{
+  gcond *cond_stmt = gimple_build_cond (code, lhs, rhs, NULL_TREE, NULL_TREE);
+  if (after)
+gsi_insert_after (gsi_p, cond_stmt, GSI_CONTINUE_LINKING);
+  else
+gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);


While it supports both before/GSI_SAME_STMT and after/GSI_CONTINUE_LINKING,
the patch neglected to pass '/* after= */ true' for the 'factor != 0' condition
above. (For all others, after=false was fine.)

This patch reinstates the prior after/GSI_CONTINUE_LINKING behavior by adding
'true' to the call and thus fixes the testcase of the PR, which had been
segfaulting in the meantime:
https://gcc.gnu.org/PR111017
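
To make the before/after distinction concrete, here is a minimal sketch in
plain C. It is only a toy model, not GCC code: struct stmt, struct iter and
build_cond are hypothetical stand-ins for a GIMPLE statement sequence, a
gimple_stmt_iterator and expand_omp_build_cond, and the C++ default argument
'after = false' becomes an explicit int parameter. It assumes that
GSI_SAME_STMT leaves the iterator on its current statement while
GSI_CONTINUE_LINKING moves it to the newly inserted one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement list: a doubly linked list of labelled nodes.  */
struct stmt {
  const char *name;
  struct stmt *prev, *next;
};

/* Toy iterator: a cursor pointing at the current statement.  */
struct iter {
  struct stmt *cur;
};

/* Link S before the cursor and leave the cursor where it was
   (models gsi_insert_before with GSI_SAME_STMT).  */
static void
insert_before_same_stmt (struct iter *it, struct stmt *s)
{
  assert (it->cur != NULL);
  s->next = it->cur;
  s->prev = it->cur->prev;
  if (s->prev)
    s->prev->next = s;
  it->cur->prev = s;
}

/* Link S after the cursor and move the cursor onto S
   (models gsi_insert_after with GSI_CONTINUE_LINKING).  */
static void
insert_after_continue_linking (struct iter *it, struct stmt *s)
{
  assert (it->cur != NULL);
  s->prev = it->cur;
  s->next = it->cur->next;
  if (s->next)
    s->next->prev = s;
  it->cur->next = s;
  it->cur = s;
}

/* Hypothetical wrapper with the same shape as expand_omp_build_cond:
   AFTER selects which of the two insertion variants is used.  */
static void
build_cond (struct iter *it, struct stmt *cond, int after)
{
  if (after)
    insert_after_continue_linking (it, cond);
  else
    insert_before_same_stmt (it, cond);
}
```

With a list a -> b and the cursor on a, build_cond (&it, &c, 0) gives
c -> a -> b with the cursor still on a, while build_cond (&it, &c, 1) gives
a -> c -> b with the cursor on c. Dropping the 'true' therefore changes both
where the condition lands and where subsequent insertions continue.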


Comments, questions, concerns?

If not, I intend to commit the attached patch to mainline on Monday
and after the usual grace time to GCC 13 and then to GCC 12.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]

Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in
expand_omp_for_* were inserted with
  gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor that was
  gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);

That commit for PR103208 fixed the issue of some missing regimplify of
operands of GIMPLE_CONDs by moving the condition handling to the new function
expand_omp_build_cond. That function has a 'bool after = false' argument to
switch between the two variants.

However, all callers omitted this argument. This commit reinstates the
prior behavior by passing 'true' for the factor != 0 condition, fixing
the included testcase.

	PR middle-end/111017
gcc/
	* omp-expand.cc (expand_omp_for_init_vars): Pass after=true
	to expand_omp_build_cond for 'factor != 0' condition, resulting
	in pre-r12-5295-g47de0b56ee455e code for the gimple insert.

libgomp/
	* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.
---
 gcc/omp-expand.cc  |  3 +-
 .../libgomp.c-c++-common/non-rect-loop-1.c | 72 ++
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index db58b3cb49b..1a4d625fea3 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -2562,7 +2562,8 @@ expand_omp_for_init_vars (struct omp_for_data *fd, gimple_stmt_iterator *gsi,
 	  tree factor = fd->factor;
 	  gcond *cond_stmt
 		= expand_omp_build_cond (gsi, NE_EXPR, factor,
-	 build_zero_cst (TREE_TYPE (factor)));
+	 build_zero_cst (TREE_TYPE (factor)),
+	 true);
 	  edge e = split_block (gsi_bb (*gsi), cond_stmt);
 	  basic_block bb0 = e->src;
 	  e->flags = EDGE_TRUE_VALUE;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/non-rect-loop-1.c b/libgomp/testsuite/libgomp.c-c++-common/non-rect-loop-1.c
new file mode 100644
index 000..fbd462b3683
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/non-rect-loop-1.c
@@ -0,0 +1,72 @@
+/* PR middle-end/111017  */
+
+#include 
+
+#define DIM 32
+#define N (DIM*DIM)
+
+int
+main ()
+{
+  int a[N], b[N], c[N];
+  int dim = DIM;
+
+  for (int i = 0; i < N; i++)
+{
+  a[i] = 3*i;
+  b[i] = 7*i;
+  c[i] = 42;
+}
+
+  #pragma omp parallel for collapse(2)
+  for (int i = 0; i < DIM; i++)
+for (int j = (i*DIM); j < (i*DIM + DIM); j++)
+  c[j] = a[j] + b[j];
+
+  for (int i = 0; i < DIM; i++)
+for (int j = (i*DIM); j < (i*DIM + DIM); j++)
+  if (c[j] != a[j] + b[j] || c[j] != 3*j +7*j)
+	__builtin_abort ();
+  for (int i = 0; i < N; i++)
+c[i] = 42;
+
+  #pragma omp parallel for collapse(2)
+  for (int i = 0; i < dim; i++)
+for (int j = (i*dim); j < (i*dim + dim); j++)
+  c[j] = a[j] + b[j];
+
+  for (int i = 0; i < DIM; i++)
+for (int j = (i*DIM); j < (i*DIM + DIM);

Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-18 Thread Bradley Lucier via Gcc-patches


On 8/17/23 3:54 AM, Richard Biener wrote:

I think it needs a new category, 'inline' is probably the "closest" existing one
but that also tends to be noisy.  Maybe 'call' would be a good name?  We could
report things like tail-recursion optimization, tail-calling and sibling calling
optimizations there, possibly also return/argument copy elision.


OK, thanks.

I have two questions:

1.  Is the information dumped by -fopt-info intended for compiler 
developers, to see something of the internal logic of gcc, or for end users?


2.  You say that "'inline' ... tends to be noisy".  Most of the output I 
see from -fopt-info-missed is basically


_io.c:103829:4: missed:   not inlinable: ___H___io/396 -> 
__builtin_expect/2486, function body not available


Is __builtin_expect truly a function whose body is not available, or
should -fopt-info-missed not report these instances?


Brad

[committed]: i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2 [PR111023]

2023-08-18 Thread Uros Bizjak via Gcc-patches

Implement vector extend and zero_extend functionality for TARGET_SSE2 using
the PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI
to V2DImode improves from:

        movd    %xmm0, %edx
        pshufd  $85, %xmm0, %xmm0
        movd    %xmm0, %eax
        movq    %rdx, (%rdi)
        movq    %rax, 8(%rdi)

to:
        pxor    %xmm1, %xmm1
        punpckldq       %xmm1, %xmm0
        movaps  %xmm0, (%rdi)

And the code for sign-extend from V2SI to V2DImode from:

        movd    %xmm0, %edx
        pshufd  $85, %xmm0, %xmm0
        movd    %xmm0, %eax
        movslq  %edx, %rdx
        cltq
        movq    %rdx, (%rdi)
        movq    %rax, 8(%rdi)

to:
        pxor    %xmm1, %xmm1
        pcmpgtd %xmm0, %xmm1
        punpckldq       %xmm1, %xmm0
        movaps  %xmm0, (%rdi)
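
For readers who want to see why interleaving with zero or with a sign mask
performs the extension, here is a rough scalar model in portable C. It is a
sketch only, not what the compiler emits: the helper names are made up, the
registers are modelled as arrays of four 32-bit lanes, and little-endian lane
order is assumed when two 32-bit lanes are read back as one 64-bit element.

```c
#include <stdint.h>
#include <string.h>

/* Model of PUNPCKLDQ: interleave the two low 32-bit lanes of LO and HI,
   giving { lo[0], hi[0], lo[1], hi[1] }.  */
static void
model_punpckldq (const uint32_t lo[4], const uint32_t hi[4], uint32_t out[4])
{
  uint32_t tmp[4] = { lo[0], hi[0], lo[1], hi[1] };
  memcpy (out, tmp, sizeof tmp);
}

/* Zero-extend the two low lanes: the upper half of each 64-bit element
   comes from an all-zero register (pxor + punpckldq).  */
static void
model_zero_extend_v2si (const uint32_t src[4], uint64_t out[2])
{
  uint32_t zero[4] = { 0, 0, 0, 0 };
  uint32_t lanes[4];
  model_punpckldq (src, zero, lanes);
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) lanes[2 * i] | ((uint64_t) lanes[2 * i + 1] << 32);
}

/* Sign-extend the two low lanes: the upper half comes from a mask that is
   all ones where the source lane is negative (pxor + pcmpgtd, i.e. 0 > src),
   then the same interleave.  */
static void
model_sign_extend_v2si (const uint32_t src[4], int64_t out[2])
{
  uint32_t mask[4];
  uint32_t lanes[4];
  for (int i = 0; i < 4; i++)
    mask[i] = (src[i] & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  model_punpckldq (src, mask, lanes);
  for (int i = 0; i < 2; i++)
    out[i] = (int64_t) ((uint64_t) lanes[2 * i]
			| ((uint64_t) lanes[2 * i + 1] << 32));
}
```

Feeding the lanes { 5, 0xFFFFFFFE, 0, 0 } through both helpers yields
{ 5, 0xFFFFFFFE } for the zero-extending variant and { 5, -2 } for the
sign-extending one, which is the behaviour the three- and four-instruction
sequences above rely on.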

PR target/111023

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_split_mmx_punpck):
Also handle V2QImode.
(ix86_expand_sse_extend): New function.
* config/i386/i386-protos.h (ix86_expand_sse_extend): New prototype.
* config/i386/mmx.md (v4qiv4hi2): Enable for
TARGET_SSE2.  Expand through ix86_expand_sse_extend for !TARGET_SSE4_1.
(v2hiv2si2): Ditto.
(v2qiv2hi2): Ditto.
* config/i386/sse.md (v8qiv8hi2): Ditto.
(v4hiv4si2): Ditto.
(v2siv2di2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111023-2.c: New test.
* gcc.target/i386/pr111023-4b.c: New test.
* gcc.target/i386/pr111023-8b.c: New test.
* gcc.target/i386/pr111023.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 85e30552d6f..460d496ef22 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1124,8 +1124,9 @@ ix86_split_mmx_punpck (rtx operands[], bool high_p)
 
   switch (mode)
 {
-case E_V4QImode:
 case E_V8QImode:
+case E_V4QImode:
+case E_V2QImode:
   sse_mode = V16QImode;
   double_sse_mode = V32QImode;
   mask = gen_rtx_PARALLEL (VOIDmode,
@@ -5636,7 +5637,43 @@ ix86_expand_vec_perm (rtx operands[])
 }
 }
 
-/* Unpack OP[1] into the next wider integer vector type.  UNSIGNED_P is
+/* Extend SRC into next wider integer vector type.  UNSIGNED_P is
+   true if we should do zero extension, else sign extension.  */
+
+void
+ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p)
+{
+  machine_mode imode = GET_MODE (src);
+  rtx ops[3];
+
+  switch (imode)
+{
+case E_V8QImode:
+case E_V4QImode:
+case E_V2QImode:
+case E_V4HImode:
+case E_V2HImode:
+case E_V2SImode:
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  ops[0] = gen_reg_rtx (imode);
+
+  ops[1] = force_reg (imode, src);
+
+  if (unsigned_p)
+ops[2] = force_reg (imode, CONST0_RTX (imode));
+  else
+ops[2] = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
+ src, pc_rtx, pc_rtx);
+
+  ix86_split_mmx_punpck (ops, false);
+  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
+}
+
+/* Unpack SRC into the next wider integer vector type.  UNSIGNED_P is
true if we should do zero extension, else sign extension.  HIGH_P is
true if we want the N/2 high elements, else the low elements.  */
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index fc2f1f13b78..9ffb125fc2b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -155,6 +155,7 @@ extern bool ix86_expand_mask_vec_cmp (rtx, enum rtx_code, 
rtx, rtx);
 extern bool ix86_expand_int_vec_cmp (rtx[]);
 extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx);
+extern void ix86_expand_sse_extend (rtx, rtx, bool);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern void ix86_expand_fp_spaceship (rtx, rtx, rtx);
 extern bool ix86_expand_int_addcc (rtx[]);
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 170432a7128..ef578222945 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3744,8 +3744,14 @@ (define_expand "v4qiv4hi2"
   [(set (match_operand:V4HI 0 "register_operand")
(any_extend:V4HI
  (match_operand:V4QI 1 "register_operand")))]
-  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE"
 {
+  if (!TARGET_SSE4_1)
+{
+  ix86_expand_sse_extend (operands[0], operands[1], );
+  DONE;
+}
+
   rtx op1 = force_reg (V4QImode, operands[1]);
   op1 = lowpart_subreg (V8QImode, op1, V4QImode);
   emit_insn (gen_sse4_1_v4qiv4hi2 (operands[0], op1));
@@ -3770,8 +3776,14 @@ (define_expand "v2hiv2si2"
   [(set (match_operand:V2SI 0 "register_operand")
(any_extend:V2SI
  (match_operand:V2HI 1 "register_operand")))]
-  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "TARGET_MMX_WITH_SSE"
 {
+  if (!TARGET_SSE4_1)
+{
+  ix86_expand_sse_extend (operands[0], operands[1], );
+  DONE;

Re: [PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-08-18 Thread Peter Bergner via Gcc-patches

On 8/2/23 8:23 AM, Vladimir Makarov wrote:
>>> gcc/
>>> PR rtl-optimization/PR110254
>>> * ira-color.cc (improve_allocation): Update array
> 
> I guess you missed the next line in the changelog.  I suspect it should be 
> "Update array allocated_hard_reg_p."
> 
> Please, fix it before committing the patch.

Is this a fix we want backported?

Peter

Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-08-18 Thread Tobias Burnus


Hello Thiago,

the patch looks good to me. Thanks! Can you commit the patch yourself or
do you need someone to do this for you?

On 15.08.23 18:17, Thiago Jung Bauermann via Gcc-patches wrote:

Thiago Jung Bauermann  writes:


Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: expected iteration declaration or initialization
compiler exited with status 1

to:

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
:8:3: error: invalid type for iteration variable 'i'

Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly.  Also, it's a
better error message.

Therefore, we only need to adjust the testcase to expect the new message.

gcc/testsuite/ChangeLog:
 PR testsuite/110756
 g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
---
  gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C 
b/gcc/testsuite/g++.dg/gomp/pr58567.C
index 35a5bb027ffe..866d831c65e4 100644
--- a/gcc/testsuite/g++.dg/gomp/pr58567.C
+++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
@@ -5,7 +5,7 @@
  template void foo()
  {
#pragma omp parallel for
-  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a class, 
struct, or union type|expected iteration declaration or initialization" } */
+  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a class, 
struct, or union type|invalid type for iteration variable 'i'" } */
  ;
  }


Ping? I just tested trunk. It still fails this test, and this patch
still fixes the failures.

Tobias

Re: [PATCH] Emit funcall external declarations only if actually used.

2023-08-18 Thread Jakub Jelinek via Gcc-patches

On Fri, Aug 18, 2023 at 06:31:10PM +0200, Jose E. Marchesi wrote:
> > This won't work if target can't use a direct call instruction.
> > Consider
> > __int128 a, b; void foo () { a = a / b; }
> > on x86_64-linux.  With just -O2, the above works fine, with
> > -O2 -mcmodel=large it will not, the call is indirect, but at least one has
> > REG_CALL_DECL note that could be used as fallback to the above.
> > And with -O0 -mcmodel=large because flag_ipa_ra is false REG_CALL_DECL isn't
> > emitted at all.
> > So, perhaps you could emit the REG_CALL_DECL note even if !flag_ipa_ra
> > when SYMBOL_REF_LIBCALL is set?
> 
> Hmm something like this?

Yes.

Jakub

[PATCH] aarch64: fix format specifier

2023-08-18 Thread FX Coudert via Gcc-patches

A rather trivial fix for the fprintf() format specifier of a HOST_WIDE_INT value.
Tested on aarch64-apple-darwin. OK to commit?

FX



0001-aarch64-fix-format-specifier.patch
Description: Binary data

Re: [PATCH] Emit funcall external declarations only if actually used.

2023-08-18 Thread Jose E. Marchesi via Gcc-patches



Hi Jakub.
Thanks for the review.

> On Fri, Aug 18, 2023 at 03:53:51PM +0200, Jose E. Marchesi via Gcc-patches 
> wrote:
>> --- a/gcc/final.cc
>> +++ b/gcc/final.cc
>> @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>> reorg.cc, since the branch splitting exposes new instructions with delay
>> slots.  */
>>  
>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>> +
>
> I'd say the forward declaration should go before the function comment, so
> that it is clear the function comment talks about shorten_branches.

Will do.

>
>>  void
>>  shorten_branches (rtx_insn *first)
>>  {
>> @@ -850,6 +852,20 @@ shorten_branches (rtx_insn *first)
>>for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>  {
>>INSN_SHUID (insn) = i++;
>> +
>> +  /* If this is a `call' instruction implementing a libcall,
>> + and this machine requires an external definition for library
>> + functions, write one out.  */
>> +  if (CALL_P (insn))
>> +{
>> +  rtx x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn));
>> +  x = XEXP (x, 0);
>> +  if (x && MEM_P (x)
>
> When all conditions don't fit on one line, each && condition should be on
> its own line.

Will fix.

>
>> +  && SYMBOL_REF_P (XEXP (x, 0))
>> +  && SYMBOL_REF_LIBCALL (XEXP (x, 0)))
>> +assemble_external_libcall (XEXP (x, 0));
>> +}
>
> This won't work if target can't use a direct call instruction.
> Consider
> __int128 a, b; void foo () { a = a / b; }
> on x86_64-linux.  With just -O2, the above works fine, with
> -O2 -mcmodel=large it will not, the call is indirect, but at least one has
> REG_CALL_DECL note that could be used as fallback to the above.
> And with -O0 -mcmodel=large because flag_ipa_ra is false REG_CALL_DECL isn't
> emitted at all.
> So, perhaps you could emit the REG_CALL_DECL note even if !flag_ipa_ra
> when SYMBOL_REF_LIBCALL is set?

Hmm something like this?

(I am aware that as things stand in emit_library_call_value_1 that
 conditional will always be true, but I think it is good to keep the
 conditional as documentation and in case emit_library_call_value_1
 changes in the future.  Note also that `fun' is known to be `orgfun'
 when the bit is set.  That may change later as per
 prepare_call_address.)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 1f3a6d5c450..219ea599b16 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx 
value,
|| argvec[i].partial != 0)
    update_stack_alignment_for_call (&argvec[i].locate);
 
-  /* If this machine requires an external definition for library
- functions, write one out.  */
-  assemble_external_libcall (fun);
+  /* Mark the emitted target as a libcall.  This will be used by final
+ in order to emit an external symbol declaration if the libcall is
+ ever used.  */
+  SYMBOL_REF_LIBCALL (fun) = 1;
 
   original_args_size = args_size;
   args_size.constant = (aligned_upper_bound (args_size.constant
@@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx 
value,
   valreg,
   old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
 
-  if (flag_ipa_ra)
+  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
 {
   rtx datum = orgfun;
   gcc_assert (GET_CODE (datum) == SYMBOL_REF);
diff --git a/gcc/final.cc b/gcc/final.cc
index dd3e22547ac..53f5d890809 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
 }
 
 
+static rtx call_from_call_insn (rtx_call_insn *insn);
+
 /* Make a pass over all insns and compute their actual lengths by shortening
any branches of variable length if possible.  */
 
@@ -850,6 +852,19 @@ shorten_branches (rtx_insn *first)
   for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
 {
   INSN_SHUID (insn) = i++;
+
+  /* If this is a `call' instruction or implementing a libcall,
+ and this machine requires an external definition for library
+ functions, write one out.  */
+  if (CALL_P (insn))
+{
+  rtx x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn));
+
+  if ((x = XEXP (x, 0)) && MEM_P (x) && SYMBOL_REF_P (XEXP (x, 0))
+  || (x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX)))
+assemble_external_libcall (XEXP (x, 0));
+}
+
   if (INSN_P (insn))
continue;
 
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index e1c51156f90..945e3267a34 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -402,6 +402,8 @@ struct GTY((desc("0"), tag("0"),
>>   1 in a VALUE or DEBUG_EXPR is NO_LOC_P in var-tracking.cc.
>>   Dumped as "/i" in RTL dumps.  */
>>unsigned return_val : 1;
>> +  /* 1 in a SYMBOL_REF if it is the target of a libcall.  */
>> +  unsigned is_libcall : 1;
>
> This is wrong.  struct rtx_def is carefully designed such that
> it has 16 +

Re: Another bug for __builtin_object_size? (Or expected behavior)

2023-08-18 Thread Qing Zhao via Gcc-patches




> On Aug 17, 2023, at 5:32 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-17 17:25, Qing Zhao wrote:
>>> It's not exactly the same issue, the earlier discussion was about choosing 
>>> sizes in the same pass while the current one is about choosing between 
>>> passes, but I agree it "rhymes".  This is what I was alluding to originally 
>>> (for OST_MINIMUM use MIN_EXPR if both passes returned a pass) but I haven't 
>>> thought about it hard enough to be 100% confident that it's the better 
>>> solution, especially for OST_MAXIMUM.
>> We have two different sources to get SIZE information for the subobject:
>> 1. From the TYPESIZE information embedded in the IR;
>> 2. From the initialization information propagated from data flow, this 
>> includes both malloc call and the DECL_INIT.
>> We need to choose between these two when both are available (the two
>> may come from the same pass, as we discussed before, or from different
>> passes, as shown in this discussion).
>> OST_MAXIMUM) -:)
> 
> It's worth a shot I guess.  We could emit something like the following in 
> early_object_sizes_execute_one:
> 
>  sz = (__bos(o->sub, ost) == unknown
>? early_size
>: MIN_EXPR (__bos(o->sub, ost), early_size));
> 
> and see if it sticks.
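
The selection rule in the quoted pseudocode can be modelled outside the compiler. The following is a minimal sketch, not code from tree-object-size.cc: the helper name combine_sizes and the use of std::optional to stand for an "unknown" result are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper.  LATE is the size found by the second pass
// (std::nullopt when that pass cannot determine it); EARLY is the size
// derived from the subobject's type in the early pass.
std::size_t combine_sizes(std::optional<std::size_t> late, std::size_t early) {
  if (!late)
    return early;                 // only the type-based size is available
  return std::min(*late, early);  // both are available: keep the smaller one
}
```

With this rule a type-based size of 16 and a data-flow size of 8 give 8, while an unknown data-flow size falls back to 16.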

I came up with the following change for tree-object-size.cc:

diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index a62af0500563..e1b2008c6dcc 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -2016,10 +2016,22 @@ do_valueize (tree t)
   return t;
 }
 
-/* Process a __builtin_object_size or __builtin_dynamic_object_size call in
-   CALL early for subobjects before any object information is lost due to
-   optimization.  Insert a MIN or MAX expression of the result and
-   __builtin_object_size at I so that it may be processed in the second pass.
+/* Process a __builtin_object_size or __builtin_dynamic_object_size call
+   early for subobjects before any object information is lost due to
+   optimization.
+
+   We have two different sources to get the size information for subobjects:
+   A. The TYPE information of the subobject in the IR;
+   B. The initialization information propagated through data flow.
+   In the early pass, only A is available.
+   B might be available in the second pass.
+
+   If both A and B are available, we should choose the minimum one between
+   these two.
+
+   Insert a MIN expression of the result from the early pass and the original
+   __builtin_object_size call at I so that it may be processed in the second pass.
+
__builtin_dynamic_object_size is treated like __builtin_object_size here
since we're only looking for constant bounds.  */
 
@@ -2036,7 +2048,7 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
   unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi (ost);
   tree ptr = gimple_call_arg (call, 0);
 
-  if (object_size_type != 1 && object_size_type != 3)
+  if ((object_size_type & OST_SUBOBJECT) == 0)
 return;
 
   if (TREE_CODE (ptr) != ADDR_EXPR && TREE_CODE (ptr) != SSA_NAME)
@@ -2050,9 +2062,8 @@ early_object_sizes_execute_one (gimple_stmt_iterator *i, gimple *call)
 
   tree tem = make_ssa_name (type);
   gimple_call_set_lhs (call, tem);
-  enum tree_code code = object_size_type & OST_MINIMUM ? MAX_EXPR : MIN_EXPR;
   tree cst = fold_convert (type, bytes);
-  gimple *g = gimple_build_assign (lhs, code, tem, cst);
+  gimple *g = gimple_build_assign (lhs, MIN_EXPR, tem, cst);
   gsi_insert_after (i, g, GSI_NEW_STMT);
   update_stmt (call);
 }

Let me know if you see any issue with the change.

thanks.

Qing

> 
> Thanks,
> Sid

[COMMITTED] [irange] Return FALSE if updated bitmask is unchanged [PR110753]

2023-08-18 Thread Aldy Hernandez via Gcc-patches

The mask/value pair we track in the irange is a bit fickle in that it
can sometimes contradict the bitmask inherent in the range.  This can
happen when a series of calculations yields a combination such as:

[3, 1000] MASK 0xfffe VALUE 0x0

The mask/value above implies that the lowest bit is a known 0, which
would exclude the 3 in the range.  At one time we tried keeping mask
and ranges 100% consistent, but the performance penalty was too high
(5% in VRP).  Also, it's unclear whether the intersection of two
incompatible known bits should make the whole range undefined, or
just the contradicting bits.  This is all documented in
irange::get_bitmask().  We could revisit both of these assumptions
in the future.

In this testcase IPA ends up with a range where the lower 2 bits are
expected to be 0, but the range is [1,1].

[irange] long int [1, 1] MASK 0xfffc VALUE 0x0

This causes irange::union_bitmask() to think an update occurred, when
no semantic change happened, thus triggering an assert in IPA-cp.  We
could get rid of the assert, but it's cleaner to make
irange::{union,intersect}_bitmask always tell the truth.  Besides, the
ranger's cache also depends on union being truthful.
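
The contradiction described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: the two functions below are hypothetical and are not GCC's irange_bitmask; a 1 bit in the mask is taken to mean "unknown", and the mask literals are used exactly as printed in this mail, as 64-bit quantities.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a mask/value pair: a 0 bit in MASK means the bit
// is known and equals the corresponding bit of VALUE.
bool agrees_with_known_bits(std::uint64_t mask, std::uint64_t value,
                            std::uint64_t n) {
  std::uint64_t known = ~mask;
  return (n & known) == (value & known);
}

// True if at least one number in [lo, hi] agrees with the known bits.
bool range_has_member(std::uint64_t mask, std::uint64_t value,
                      std::uint64_t lo, std::uint64_t hi) {
  if (lo > hi)
    return false;
  for (std::uint64_t n = lo;; ++n) {
    if (agrees_with_known_bits(mask, value, n))
      return true;
    if (n == hi)  // stop before N could wrap around
      return false;
  }
}
```

In this model the endpoint 3 of [3, 1000] disagrees with MASK 0xfffe VALUE 0x0 although other members of the range agree, and no member of [1, 1] agrees with MASK 0xfffc VALUE 0x0.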

PR ipa/110753

gcc/ChangeLog:

* value-range.cc (irange::union_bitmask): Return FALSE if updated
bitmask is semantically equivalent to the original mask.
(irange::intersect_bitmask): Same.
(irange::get_bitmask): Add comment.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr110753.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr110753.c | 15 +++
 gcc/value-range.cc   | 18 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110753.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110753.c b/gcc/testsuite/gcc.dg/tree-ssa/pr110753.c
new file mode 100644
index 000..aa02487e2a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110753.c
@@ -0,0 +1,15 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+
+int a, b, c;
+static int d(long e, long f) { return f == 0 || e && f == 1 ?: f; }
+int g(void) {static int t; return t;}
+static void h(long e) {
+  b = e - 1;
+  a = d(b || d(e, 8), g());
+}
+int tt;
+void i(void) {
+  c = (__SIZE_TYPE__)&tt;
+  h(c);
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 2abf57bcee8..76f88d91046 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1876,6 +1876,8 @@ irange::get_bitmask () const
   //
   // 3 is in the range endpoints, but is excluded per the known 0 bits
   // in the mask.
+  //
+  // See also the note in irange_bitmask::intersect.
   irange_bitmask bm = get_bitmask_from_range ();
   if (!m_bitmask.unknown_p ())
 bm.intersect (m_bitmask);
@@ -1915,10 +1917,18 @@ irange::intersect_bitmask (const irange &r)
 return false;
 
   irange_bitmask bm = get_bitmask ();
+  irange_bitmask save = bm;
   if (!bm.intersect (r.get_bitmask ()))
 return false;
 
   m_bitmask = bm;
+
+  // Updating m_bitmask may still yield a semantic bitmask (as
+  // returned by get_bitmask) which is functionally equivalent to what
+  // we originally had.  In which case, there's still no change.
+  if (save == get_bitmask ())
+return false;
+
   if (!set_range_from_bitmask ())
 normalize_kind ();
   if (flag_checking)
@@ -1938,10 +1948,18 @@ irange::union_bitmask (const irange &r)
 return false;
 
   irange_bitmask bm = get_bitmask ();
+  irange_bitmask save = bm;
   if (!bm.union_ (r.get_bitmask ()))
 return false;
 
   m_bitmask = bm;
+
+  // Updating m_bitmask may still yield a semantic bitmask (as
+  // returned by get_bitmask) which is functionally equivalent to what
+  // we originally had.  In which case, there's still no change.
+  if (save == get_bitmask ())
+return false;
+
   // No need to call set_range_from_mask, because we'll never
   // narrow the range.  Besides, it would cause endless recursion
   // because of the union_ in set_range_from_mask.
-- 
2.41.0

RE: [PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-18 Thread Di Zhao OS via Gcc-patches

Hi,

A few updates to the patch:

1. rank_ops_for_fma: return FMA_STATE_NESTED only for complete
   FMA chain, since the regression is obvious only in this case.

2. Added new testcase.

Thanks,
Di Zhao



PR tree-optimization/110279

gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added
new parameter collect_lhs.
(struct fma_transformation_info): Moved to header.
(class fma_deferring_state): Moved to header.
(convert_mult_to_fma): Added new parameter collect_lhs.
* tree-ssa-math-opts.h (struct fma_transformation_info):
(class fma_deferring_state): Moved from .cc.
(convert_mult_to_fma): Moved from .cc.
* tree-ssa-reassoc.cc (enum fma_state): Defined enum to
describe the state of FMA candidates for a list of
operands.
(rewrite_expr_tree_parallel): Changed boolean parameter
to enum type.
(has_nested_fma_p): New function to check for nested FMA
on given multiplication statement.
(rank_ops_for_fma): Return enum fma_state.
(reassociate_bb): Avoid rewriting to parallel if nested
FMAs are found.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: New test.
* gcc.dg/pr110279-2.c: New test.
* gcc.dg/pr110279-3.c: New test.

> -Original Message-
> From: Di Zhao OS
> Sent: Thursday, August 10, 2023 12:53 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener 
> Subject: [PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc
> 
> Hi,
> 
> The previous version of this patch tried to solve two problems
> at the same time. For better clarity, I'll separate them and
> only deal with the "nested" FMA in this version. I plan to
> propose another patch to avoid badly shaped FMA (deferring FMA).
> 
> Other changes:
> 
> 1. Added new testcases for the "nested" FMA issue. For the
>    following code:
> 
>       tmp1 = a + c * c + d * d + x * y;
>       tmp2 = x * tmp1;
>       result += (a + c + d + tmp2);
> 
>    when "tmp1 = ..." is not rewritten, tmp1 will be the result of
>    an FMA, and there will be a list of consecutive FMAs:
> 
>       _1 = .FMA (c, c, a_39);
>       _2 = .FMA (d, d, _1);
>       tmp1 = .FMA (x, y, _2);
>       _3 = .FMA (tmp1, x, d);
>       ...
> 
>    If "tmp1 = ..." is rewritten to parallel, tmp1 will be the result
>    of a PLUS_EXPR between FMAs:
> 
>       _1 = .FMA (c, c, a_39);
>       _2 = x * y;
>       _3 = .FMA (d, d, _2);
>       tmp1 = _3 + _1;
>       _4 = .FMA (tmp1, x, d);
>       ...
> 
>    It seems the register pressure of the latter is higher than
>    that of the former. On all the test machines we have (including
>    Ampere1, Neoverse-n1 and Intel Xeon), the run time increased
>    significantly when "tmp1 = ..." is rewritten to parallel. In
>    contrast, when "tmp1" is not the 1st or 2nd operand of another
>    FMA (pr110279-1.c), rewriting it results in better performance.
>    (I'll also append the testcases in the bug tracker.)
> 
> 2. Enhanced checking for nested FMA by: 1) Modified
>    convert_mult_to_fma so it can return multiple LHS.  2) Check
>    NEGATE_EXPRs for nested FMA.
> 
> (I think maybe this can be further refined by enabling rewriting
> to parallel for very long op list. )
> 
> Bootstrapped and regression tested on x86_64-linux-gnu.
> 
> Thanks,
> Di Zhao



0001-PATCH-tree-optimization-110279-Check-for-nested-FMA-.patch
Description: 0001-PATCH-tree-optimization-110279-Check-for-nested-FMA-.patch
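
The two shapes of "tmp1 = ..." discussed in the quoted mail can be written out at source level. This is only an illustrative sketch: the function names are hypothetical, std::fma stands in for the internal .FMA calls, and nothing here measures the register pressure or run time mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Serial shape: every FMA consumes the result of the previous one.
double tmp1_chain(double a, double c, double d, double x, double y) {
  double t1 = std::fma(c, c, a);
  double t2 = std::fma(d, d, t1);
  return std::fma(x, y, t2);
}

// "Parallel" shape: two independent partial results joined by an addition,
// at the cost of one more operation and one more live temporary.
double tmp1_parallel(double a, double c, double d, double x, double y) {
  double t1 = std::fma(c, c, a);
  double t2 = x * y;
  double t3 = std::fma(d, d, t2);
  return t3 + t1;
}
```

Both functions compute a + c * c + d * d + x * y; they differ only in how the intermediate results depend on each other.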

Re: [PATCH] Emit funcall external declarations only if actually used.

2023-08-18 Thread Jakub Jelinek via Gcc-patches

On Fri, Aug 18, 2023 at 03:53:51PM +0200, Jose E. Marchesi via Gcc-patches wrote:
> --- a/gcc/final.cc
> +++ b/gcc/final.cc
> @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
> reorg.cc, since the branch splitting exposes new instructions with delay
> slots.  */
>  
> +static rtx call_from_call_insn (rtx_call_insn *insn);
> +

I'd say the forward declaration should go before the function comment, so
that it is clear the function comment talks about shorten_branches.

>  void
>  shorten_branches (rtx_insn *first)
>  {
> @@ -850,6 +852,20 @@ shorten_branches (rtx_insn *first)
>for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>  {
>INSN_SHUID (insn) = i++;
> +
> +  /* If this is a `call' instruction implementing a libcall,
> + and this machine requires an external definition for library
> + functions, write one out.  */
> +  if (CALL_P (insn))
> +{
> +  rtx x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn));
> +  x = XEXP (x, 0);
> +  if (x && MEM_P (x)

When all conditions don't fit on one line, each && condition should be on
its own line.

> +  && SYMBOL_REF_P (XEXP (x, 0))
> +  && SYMBOL_REF_LIBCALL (XEXP (x, 0)))
> +assemble_external_libcall (XEXP (x, 0));
> +}

This won't work if target can't use a direct call instruction.
Consider
__int128 a, b; void foo () { a = a / b; }
on x86_64-linux.  With just -O2, the above works fine, with
-O2 -mcmodel=large it will not, the call is indirect, but at least one has
REG_CALL_DECL note that could be used as fallback to the above.
And with -O0 -mcmodel=large because flag_ipa_ra is false REG_CALL_DECL isn't
emitted at all.
So, perhaps you could emit the REG_CALL_DECL note even if !flag_ipa_ra
when SYMBOL_REF_LIBCALL is set?

> +
>if (INSN_P (insn))
>   continue;
>  
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e1c51156f90..945e3267a34 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -402,6 +402,8 @@ struct GTY((desc("0"), tag("0"),
>   1 in a VALUE or DEBUG_EXPR is NO_LOC_P in var-tracking.cc.
>   Dumped as "/i" in RTL dumps.  */
>unsigned return_val : 1;
> +  /* 1 in a SYMBOL_REF if it is the target of a libcall.  */
> +  unsigned is_libcall : 1;

This is wrong.  struct rtx_def is carefully designed such that
it has 16 + 8 + 8 bits before the union, which is 32-bit, and then a
flexible array member which sometimes needs 64-bit alignment.
So, the above change would grow all RTX by 64 bits.
I think jump, call and in_struct are unused on SYMBOL_REF, so just
document the new meaning of the chosen bit for SYMBOL_REF.
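
The size effect can be illustrated with a stand-alone model. This is a sketch only: Packed and Grown are hypothetical stand-ins and not the real struct rtx_def, bit-field layout is implementation-defined, and the figure of 64 bits is what common LP64 ABIs are expected to produce.

```cpp
#include <cassert>
#include <cstddef>

// 16 + 8 + 8 bits of bit-fields fill exactly one 32-bit unit.
struct Packed {
  unsigned code : 16;
  unsigned mode : 8;
  unsigned f0 : 1, f1 : 1, f2 : 1, f3 : 1, f4 : 1, f5 : 1, f6 : 1, f7 : 1;
  union { unsigned regno; int uid; } u2;  // 32-bit union
  alignas(8) long long payload[1];        // stands in for the trailing array
};

// Same layout with one more flag: the bit-fields no longer fit in 32 bits.
struct Grown {
  unsigned code : 16;
  unsigned mode : 8;
  unsigned f0 : 1, f1 : 1, f2 : 1, f3 : 1, f4 : 1, f5 : 1, f6 : 1, f7 : 1;
  unsigned extra : 1;
  union { unsigned regno; int uid; } u2;
  alignas(8) long long payload[1];
};

// Size difference in bits, assuming 8-bit bytes.
std::size_t growth_in_bits() {
  return (sizeof(Grown) - sizeof(Packed)) * 8;
}
```

The extra flag starts a new 32-bit unit, which pushes the union and the 64-bit aligned tail to the next aligned offsets.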

> @@ -2734,6 +2736,10 @@ do { \
>  #define SYMBOL_REF_USED(RTX) \
>(RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>  
> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
> +#define SYMBOL_REF_LIBCALL(RTX) \
> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->is_libcall)

And change is_libcall to the selected bit-field here.

Jakub

[PATCH] RISC-V: Enable pressure-aware scheduling by default.

2023-08-18 Thread Robin Dapp via Gcc-patches

Hi,

this patch enables pressure-aware scheduling for riscv.  There have been
various requests for it so I figured I'd just go ahead and send
the patch.

There is some slight regression in code quality for a number of
vector tests where we spill more due to a different instruction order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.

As cost and scheduling models mature I expect the situation to improve
and for now I think it's generally favorable to enable pressure-aware
scheduling so we can work with it rather than trying to find every
possible problem in advance.  Any other opinions on that?

Regards
 Robin

This patch enables register-pressure-aware scheduling (-fsched-pressure)
by default and sets the algorithm to "model".  As with other backends,
this helps reduce unnecessary spills.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add -fsched-pressure.
* config/riscv/riscv.cc (riscv_option_override): Set sched
pressure algorithm.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add
-fno-sched-pressure.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto.
* gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
---
 gcc/common/config/riscv/riscv-common.cc  | 2 ++
 gcc/config/riscv/riscv.cc| 5 +
 .../gcc.target/riscv/rvv/base/narrow_constraint-1.c  | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-17.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-18.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-19.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-20.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-21.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-22.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-23.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-24.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-25.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-26.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-27.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-28.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-29.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-30.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-31.c | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-4.c  | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-5.c  | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-8.c  | 2 +-
 .../gcc.target/riscv/rvv/base/narrow_constraint-9.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c  | 2 +-
 27 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 4737dcd44a1..59848b21162 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++

[PATCH] Emit funcall external declarations only if actually used.

2023-08-18 Thread Jose E. Marchesi via Gcc-patches

[Previous thread:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608162.html]

There are many places in GCC where alternative local sequences are
tried in order to determine what is the cheapest or best alternative
to use in the current target.  When any of these sequences involves a
libcall, the current implementation of emit_library_call_value_1
introduces a side effect consisting of emitting an external declaration
for the funcall (such as __divdi3), which is thus emitted even if the
sequence that does the libcall is not retained.

This is problematic in targets such as BPF, because the kernel loader
chokes on the spurious symbol __divdi3 and makes the resulting BPF
object unloadable.  Note that BPF objects are not linked before being
loaded.

This patch changes emit_library_call_value_1 to mark the target
SYMBOL_REF as a libcall.  Then, the emission of the external
declaration is done in the first loop of final.cc:shorten_branches.
This happens only if the corresponding sequence has been kept.
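
The idea of marking first and declaring later can be sketched outside the compiler. The types and the function below are hypothetical and only model the approach described above; they are not the data structures used in calls.cc or final.cc.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical symbol: IS_LIBCALL is set when a libcall sequence is
// generated, whether or not that sequence is kept in the end.
struct Symbol {
  std::string name;
  bool is_libcall;
};

// Hypothetical call instruction that survived into the final stream.
struct CallInsn {
  const Symbol *target;
};

// Collect the external declarations to emit: only libcall targets that are
// still called from the final instruction stream.
std::set<std::string> externs_to_declare(const std::vector<CallInsn> &stream) {
  std::set<std::string> names;
  for (const CallInsn &insn : stream)
    if (insn.target && insn.target->is_libcall)
      names.insert(insn.target->name);
  return names;
}
```

A symbol such as __divdi3 that was marked while a discarded sequence was tried never appears in the final stream, so no declaration is produced for it.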

Regtested in x86_64-linux-gnu.
Tested with host x86_64-linux-gnu with target bpf-unknown-none.

gcc/ChangeLog

* rtl.h: New flag is_libcall.
(SYMBOL_REF_LIBCALL): Define.
* calls.cc (emit_library_call_value_1): Do not emit external
libcall declaration here.
* final.cc (shorten_branches): Do it here.

gcc/testsuite/ChangeLog

* gcc.target/bpf/divmod-libcall-1.c: New test.
* gcc.target/bpf/divmod-libcall-2.c: Likewise.
---
 gcc/calls.cc  |  7 ---
 gcc/final.cc  | 16 
 gcc/rtl.h |  6 ++
 .../gcc.target/bpf/divmod-libcall-1.c | 19 +++
 .../gcc.target/bpf/divmod-libcall-2.c | 16 
 5 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 1f3a6d5c450..e0ddda42442 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
|| argvec[i].partial != 0)
   update_stack_alignment_for_call (&argvec[i].locate);
 
-  /* If this machine requires an external definition for library
- functions, write one out.  */
-  assemble_external_libcall (fun);
+  /* Mark the emitted target as a libcall.  This will be used by final
+ in order to emit an external symbol declaration if the libcall is
+ ever used.  */
+  SYMBOL_REF_LIBCALL (fun) = 1;
 
   original_args_size = args_size;
   args_size.constant = (aligned_upper_bound (args_size.constant
diff --git a/gcc/final.cc b/gcc/final.cc
index dd3e22547ac..80c112b91f7 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
reorg.cc, since the branch splitting exposes new instructions with delay
slots.  */
 
+static rtx call_from_call_insn (rtx_call_insn *insn);
+
 void
 shorten_branches (rtx_insn *first)
 {
@@ -850,6 +852,20 @@ shorten_branches (rtx_insn *first)
   for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
 {
   INSN_SHUID (insn) = i++;
+
+  /* If this is a `call' instruction implementing a libcall,
+ and this machine requires an external definition for library
+ functions, write one out.  */
+  if (CALL_P (insn))
+{
+  rtx x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn));
+  x = XEXP (x, 0);
+  if (x && MEM_P (x)
+  && SYMBOL_REF_P (XEXP (x, 0))
+  && SYMBOL_REF_LIBCALL (XEXP (x, 0)))
+assemble_external_libcall (XEXP (x, 0));
+}
+
   if (INSN_P (insn))
continue;
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e1c51156f90..945e3267a34 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -402,6 +402,8 @@ struct GTY((desc("0"), tag("0"),
  1 in a VALUE or DEBUG_EXPR is NO_LOC_P in var-tracking.cc.
  Dumped as "/i" in RTL dumps.  */
   unsigned return_val : 1;
+  /* 1 in a SYMBOL_REF if it is the target of a libcall.  */
+  unsigned is_libcall : 1;
 
   union {
 /* The final union field is aligned to 64 bits on LP64 hosts,
@@ -2734,6 +2736,10 @@ do { \
 #define SYMBOL_REF_USED(RTX)   \
   (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
 
+/* 1 if RTX is a symbol_ref that represents a libcall target.  */
+#define SYMBOL_REF_LIBCALL(RTX) \
+  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->is_libcall)
+
 /* 1 if RTX is a symbol_ref for a weak symbol.  */
 #define SYMBOL_REF_WEAK(RTX)   \
   (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
diff --git

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Richard Biener via Gcc-patches

On Fri, 18 Aug 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, August 18, 2023 2:53 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Fri, 18 Aug 2023, Tamar Christina wrote:
> > 
> > > > > Yeah if you comment it out one of the testcases should fail.
> > > >
> > > > using new_preheader instead of e->dest would make things clearer.
> > > >
> > > > You are now adding the same arg to every exit (you've just queried the
> > > > main exit redirect_edge_var_map_vector).
> > > >
> > > > OK, so I think I understand what you're doing.  If I understand
> > > > correctly we know that when we exit the main loop via one of the
> > > > early exits we are definitely going to enter the epilog but when
> > > > we take the main exit we might not.
> > > >
> > >
> > > Correct.. but..
> > >
> > > > Looking at the CFG we create currently this isn't reflected and
> > > > this complicates this PHI node updating.  What I'd try to do
> > > > is leave redirecting the alternate exits until after
> > >
> > > It is, in the case of the alternate exits this is reflected in copying
> > > the same values, as they are the values of the number of completed
> > > iterations since the scalar code restarts the last iteration.
> > >
> > > So all the PHI nodes of the alternate exits are correct.  The vector
> > > Iteration doesn't handle the partial iteration.
> > >
> > > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > > means leaving it almost unchanged besides the LC SSA maintaining
> > > > changes.  After that for the multi-exit case split the
> > > > epilog preheader edge and redirect all the alternate exits to the
> > > > new preheader.  So the CFG becomes
> > > >
> > > >  
> > > > /  |
> > > >/
> > > >   /  if (epilog)
> > > >alt exits //  \
> > > > //loop around
> > > > |   /
> > > >preheader with "header" PHIs
> > > >   |
> > > >   
> > > >
> > > > note you need the header PHIs also on the main exit path but you
> > > > only need the loop end PHIs there.
> > > >
> > > > It seems so that at least currently the order of things makes
> > > > them more complicated than necessary.
> > >
> > > I've been trying to, but this representation seems a lot harder to work 
> > > with,
> > > In particular at the moment once we exit
> > slpeel_tree_duplicate_loop_to_edge_cfg
> > > the loop structure is exactly the same as one expects from any normal 
> > > epilog
> > vectorization.
> > >
> > > But this new representation requires me to place the guard much earlier 
> > > than
> > the epilogue
> > > preheader,  yet I still have to adjust the PHI nodes in the preheader.  
> > > So it
> > seems that this split
> > > is there to only indicate that we always enter the epilog when taking an 
> > > early
> > exit.
> > >
> > > Today this is reflected in the values of the PHI nodes rather than 
> > > structurally.
> > Once we place
> > > The guard we update the nodes and the alternate exits get their value for
> > ivtmp updated to VF.
> > >
> > > This representation also forces me to do the redirection in every call 
> > > site of
> > > slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated
> > in all use sites.
> > >
> > > But I think this doesn't address the main reason why the
> > slpeel_tree_duplicate_loop_to_edge_cfg
> > > code has a large block of code to deal with PHI node updates.
> > >
> > > The reason as you mentioned somewhere else is that after we redirect the
> > edges I have to reconstruct
> > > the phi nodes.  For most it's straight forwards, but for live values or 
> > > vuse
> > chains it requires extra code.
> > >
> > > You're right in that before we redirect the edges they are all correct in 
> > > the exit
> > block, you mentioned that
> > > the API for the edge redirection is supposed to copy the values over if I
> > create the phi nodes before hand.
> > >
> > > However this doesn't seem to work:
> > >
> > >  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > >  !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > >   {
> > > gimple *from_phi = gsi_stmt (gsi_from);
> > > tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > create_phi_node (new_res, new_preheader);
> > >   }
> > >
> > >   for (edge exit : loop_exits)
> > >   redirect_edge_and_branch (exit, new_preheader);
> > >
> > > Still leaves them empty.  Grepping around most code seems to pair
> > redirect_edge_and_branch with
> > > copy_phi_arg_into_existing_phi.  The problem is that in all these cases 
> > > after
> > redirecting an edge they
> > > call copy_phi_arg_into_existing_phi from a predecessor edge to fill in 
> > > the phi
> > nodes.
> > 
>

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 18, 2023 2:53 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Fri, 18 Aug 2023, Tamar Christina wrote:
> 
> > > > Yeah if you comment it out one of the testcases should fail.
> > >
> > > using new_preheader instead of e->dest would make things clearer.
> > >
> > > You are now adding the same arg to every exit (you've just queried the
> > > main exit redirect_edge_var_map_vector).
> > >
> > > OK, so I think I understand what you're doing.  If I understand
> > > correctly we know that when we exit the main loop via one of the
> > > early exits we are definitely going to enter the epilog but when
> > > we take the main exit we might not.
> > >
> >
> > Correct.. but..
> >
> > > Looking at the CFG we create currently this isn't reflected and
> > > this complicates this PHI node updating.  What I'd try to do
> > > is leave redirecting the alternate exits until after
> >
> > It is, in the case of the alternate exits this is reflected in copying
> > the same values, as they are the values of the number of completed
> > iterations since the scalar code restarts the last iteration.
> >
> > So all the PHI nodes of the alternate exits are correct.  The vector
> > Iteration doesn't handle the partial iteration.
> >
> > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > means leaving it almost unchanged besides the LC SSA maintaining
> > > changes.  After that for the multi-exit case split the
> > > epilog preheader edge and redirect all the alternate exits to the
> > > new preheader.  So the CFG becomes
> > >
> > >  
> > > /  |
> > >/
> > >   /  if (epilog)
> > >alt exits //  \
> > > //loop around
> > > |   /
> > >preheader with "header" PHIs
> > >   |
> > >   
> > >
> > > note you need the header PHIs also on the main exit path but you
> > > only need the loop end PHIs there.
> > >
> > > It seems so that at least currently the order of things makes
> > > them more complicated than necessary.
> >
> > I've been trying to, but this representation seems a lot harder to work 
> > with,
> > In particular at the moment once we exit
> slpeel_tree_duplicate_loop_to_edge_cfg
> > the loop structure is exactly the same as one expects from any normal epilog
> vectorization.
> >
> > But this new representation requires me to place the guard much earlier than
> the epilogue
> > preheader,  yet I still have to adjust the PHI nodes in the preheader.  So 
> > it
> seems that this split
> > is there to only indicate that we always enter the epilog when taking an 
> > early
> exit.
> >
> > Today this is reflected in the values of the PHI nodes rather than 
> > structurally.
> Once we place
> > The guard we update the nodes and the alternate exits get their value for
> ivtmp updated to VF.
> >
> > This representation also forces me to do the redirection in every call site 
> > of
> > slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated
> in all use sites.
> >
> > But I think this doesn't address the main reason why the
> slpeel_tree_duplicate_loop_to_edge_cfg
> > code has a large block of code to deal with PHI node updates.
> >
> > The reason as you mentioned somewhere else is that after we redirect the
> edges I have to reconstruct
> > the phi nodes.  For most it's straight forwards, but for live values or vuse
> chains it requires extra code.
> >
> > You're right in that before we redirect the edges they are all correct in 
> > the exit
> block, you mentioned that
> > the API for the edge redirection is supposed to copy the values over if I
> create the phi nodes before hand.
> >
> > However this doesn't seem to work:
> >
> >  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> >!gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > {
> >   gimple *from_phi = gsi_stmt (gsi_from);
> >   tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> >   create_phi_node (new_res, new_preheader);
> > }
> >
> >   for (edge exit : loop_exits)
> > redirect_edge_and_branch (exit, new_preheader);
> >
> > Still leaves them empty.  Grepping around most code seems to pair
> redirect_edge_and_branch with
> > copy_phi_arg_into_existing_phi.  The problem is that in all these cases 
> > after
> redirecting an edge they
> > call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the 
> > phi
> nodes.
> 
> You need to call flush_pending_stmts on each edge you redirect.
> copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.

Oh. I'll give that a try, that would make sense.. I didn't flush it in the 
current approach
because I needed the map, but since I want to get rid of the map,
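
The behaviour discussed in this exchange, where redirecting an edge leaves the new PHI nodes empty until the edge is flushed, can be pictured with a toy model. Everything below is hypothetical (the Edge, Phi and Block types and the redirect and flush_pending helpers); it assumes that redirection parks the old PHI arguments on the edge and that a separate flush step attaches them, in order, to the PHIs of the new destination. It is not GCC's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Block;

struct Edge {
  Block *dest;
  std::vector<std::string> pending;  // arguments parked by redirect()
};

struct Phi {
  std::string result;
  std::map<const Edge *, std::string> args;  // one argument per incoming edge
};

struct Block {
  std::vector<Phi> phis;
};

// Move E to NEW_DEST.  The arguments E supplied to the PHIs of its old
// destination are removed there and parked on the edge, in PHI order.
void redirect(Edge &e, Block &new_dest) {
  e.pending.clear();
  for (Phi &phi : e.dest->phis) {
    e.pending.push_back(phi.args[&e]);  // empty string if E had no argument
    phi.args.erase(&e);
  }
  e.dest = &new_dest;
}

// Attach the parked arguments to the PHIs of the current destination,
// pairing them up by position.
void flush_pending(Edge &e) {
  std::vector<Phi> &phis = e.dest->phis;
  for (std::size_t i = 0; i < phis.size() && i < e.pending.size(); ++i)
    phis[i].args[&e] = e.pending[i];
  e.pending.clear();
}
```

In this model the PHIs created in the new block stay without arguments after redirect and only receive them once flush_pending runs on the edge.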

[PATCH] tree-optimization/111019 - invariant motion and aliasing

2023-08-18 Thread Richard Biener via Gcc-patches

The following fixes a bad choice in representing things to the alias
oracle by LIM which, while correct in pieces, is inconsistent with itself.
When canonicalizing a ref to a bare deref, instead of leaving the base
object and the extracted offset the same and just substituting an
alternate ref, the following replaces the base and the offset as well,
avoiding the confusion that otherwise will arise in
aliasing_matching_component_refs_p.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.

PR tree-optimization/111019
* tree-ssa-loop-im.cc (gather_mem_refs_stmt): When canonicalizing
also scrap base and offset in case the ref is indirect.

* g++.dg/torture/pr111019.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr111019.C | 65 +
 gcc/tree-ssa-loop-im.cc | 14 +-
 2 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr111019.C

diff --git a/gcc/testsuite/g++.dg/torture/pr111019.C 
b/gcc/testsuite/g++.dg/torture/pr111019.C
new file mode 100644
index 000..ce21a311c96
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr111019.C
@@ -0,0 +1,65 @@
+// { dg-do run }
+// { dg-additional-options "-fstrict-aliasing" }
+
+#include <cassert>
+#include <memory>
+#include <string>
+
+class Base
+{
+public:
+  Base* previous = nullptr;
+  Base* next = nullptr;
+  Base* target = nullptr;
+};
+
+class Target : public Base
+{
+public:
+  __attribute__((always_inline)) ~Target()
+  {
+while (this->next)
+{
+  Base* n = this->next;
+
+  if (n->previous)
+n->previous->next = n->next;
+  if (n->next)
+n->next->previous = n->previous;
+  n->previous = nullptr;
+  n->next = nullptr;
+  n->target = nullptr;
+}
+  }
+};
+
+template <typename T>
+class TargetWithData final : public Target
+{
+public:
+  TargetWithData(T data)
+: data(data)
+  {}
+  T data;
+};
+
+void test()
+{
+  printf("test\n");
+  Base ptr;
+  {
+auto data = std::make_unique<TargetWithData<std::string>>(std::string("asdf"));
+ptr.target = &*data;
+ptr.previous = &*data;
+data->next = &ptr;
+
+assert(ptr.target != nullptr);
+  }
+  assert(ptr.target == nullptr);
+}
+
+int main(int, char**)
+{
+  test();
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 268f466bdc9..b8e33a4d49a 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -1665,11 +1665,21 @@ gather_mem_refs_stmt (class loop *loop, gimple *stmt)
 unshare_expr (mem_base));
  if (TYPE_ALIGN (ref_type) != ref_align)
ref_type = build_aligned_type (ref_type, ref_align);
- (*slot)->mem.ref
+ tree new_ref
= fold_build2 (MEM_REF, ref_type, tmp,
   build_int_cst (ref_alias_type, mem_off));
  if ((*slot)->mem.volatile_p)
-   TREE_THIS_VOLATILE ((*slot)->mem.ref) = 1;
+   TREE_THIS_VOLATILE (new_ref) = 1;
+ (*slot)->mem.ref = new_ref;
+ /* Make sure the recorded base and offset are consistent
+with the newly built ref.  */
+ if (TREE_CODE (TREE_OPERAND (new_ref, 0)) == ADDR_EXPR)
+   ;
+ else
+   {
+ (*slot)->mem.base = new_ref;
+ (*slot)->mem.offset = 0;
+   }
  gcc_checking_assert (TREE_CODE ((*slot)->mem.ref) == MEM_REF
   && is_gimple_mem_ref_addr
(TREE_OPERAND ((*slot)->mem.ref,
-- 
2.35.3

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Richard Biener via Gcc-patches

On Fri, 18 Aug 2023, Tamar Christina wrote:

> > > Yeah if you comment it out one of the testcases should fail.
> > 
> > using new_preheader instead of e->dest would make things clearer.
> > 
> > You are now adding the same arg to every exit (you've just queried the
> > main exit redirect_edge_var_map_vector).
> > 
> > OK, so I think I understand what you're doing.  If I understand
> > correctly we know that when we exit the main loop via one of the
> > early exits we are definitely going to enter the epilog but when
> > we take the main exit we might not.
> > 
> 
> Correct.. but..
> 
> > Looking at the CFG we create currently this isn't reflected and
> > this complicates this PHI node updating.  What I'd try to do
> > is leave redirecting the alternate exits until after
> 
> It is, in the case of the alternate exits this is reflected in copying
> the same values, as they are the values of the number of completed 
> iterations since the scalar code restarts the last iteration.
> 
> So all the PHI nodes of the alternate exits are correct.  The vector
> Iteration doesn't handle the partial iteration.
> 
> > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > means leaving it almost unchanged besides the LC SSA maintaining
> > changes.  After that for the multi-exit case split the
> > epilog preheader edge and redirect all the alternate exits to the
> > new preheader.  So the CFG becomes
> > 
> >  
> > /  |
> >/
> >   /  if (epilog)
> >alt exits //  \
> > //loop around
> > |   /
> >preheader with "header" PHIs
> >   |
> >   
> > 
> > note you need the header PHIs also on the main exit path but you
> > only need the loop end PHIs there.
> > 
> > It seems so that at least currently the order of things makes
> > them more complicated than necessary.
> 
> I've been trying to, but this representation seems a lot harder to work with,
> In particular at the moment once we exit 
> slpeel_tree_duplicate_loop_to_edge_cfg
> the loop structure is exactly the same as one expects from any normal epilog 
> vectorization.
> 
> But this new representation requires me to place the guard much earlier than 
> the epilogue
> preheader,  yet I still have to adjust the PHI nodes in the preheader.  So it 
> seems that this split
> is there to only indicate that we always enter the epilog when taking an 
> early exit.
> 
> Today this is reflected in the values of the PHI nodes rather than 
> structurally.  Once we place
> The guard we update the nodes and the alternate exits get their value for 
> ivtmp updated to VF.
> 
> This representation also forces me to do the redirection in every call site of
> slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated in 
> all use sites.
> 
> But I think this doesn't address the main reason why the 
> slpeel_tree_duplicate_loop_to_edge_cfg
> code has a large block of code to deal with PHI node updates.
> 
> The reason as you mentioned somewhere else is that after we redirect the 
> edges I have to reconstruct
> the phi nodes.  For most it's straight forwards, but for live values or vuse 
> chains it requires extra code.
> 
> You're right in that before we redirect the edges they are all correct in the 
> exit block, you mentioned that
> the API for the edge redirection is supposed to copy the values over if I 
> create the phi nodes before hand.
> 
> However this doesn't seem to work:
> 
>  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
>  !gsi_end_p (gsi_from); gsi_next (&gsi_from))
>   {
> gimple *from_phi = gsi_stmt (gsi_from);
> tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> create_phi_node (new_res, new_preheader);
>   }
> 
>   for (edge exit : loop_exits)
>   redirect_edge_and_branch (exit, new_preheader);
> 
> Still leaves them empty.  Grepping around most code seems to pair 
> redirect_edge_and_branch with
> copy_phi_arg_into_existing_phi.  The problem is that in all these cases after 
> redirecting an edge they
> call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the 
> phi nodes.

You need to call flush_pending_stmts on each edge you redirect.
copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.
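
As a rough illustration of why the new PHIs stay empty until the pending
statements are flushed, here is a toy C++ model.  The types Phi, Block and
Edge and the functions redirect_edge and flush_pending are hypothetical
stand-ins, not GCC's data structures or API.  The model only assumes that a
redirect removes the edge's PHI arguments and stashes them on the edge, and
that a flush pairs the stashed values positionally with the PHIs of the new
destination.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types mimic the idea, not GCC's real structures.
struct Phi {
  std::string result;                 // name defined by the PHI
  std::map<int, std::string> args;    // incoming edge id -> incoming value
};

struct Block {
  std::vector<Phi> phis;
};

struct Edge {
  int id;
  Block *dest;
  std::vector<std::string> pending;   // values stashed by a redirect
};

// Redirect E to NEW_DEST.  The arguments E contributed to the PHIs of its
// old destination are removed and remembered on the edge, in PHI order.
void redirect_edge(Edge &e, Block &new_dest)
{
  e.pending.clear();
  for (Phi &phi : e.dest->phis) {
    auto it = phi.args.find(e.id);
    if (it == phi.args.end()) {
      e.pending.push_back(std::string());  // no argument on this edge
      continue;
    }
    e.pending.push_back(it->second);
    phi.args.erase(it);
  }
  e.dest = &new_dest;
}

// Hand the remembered values to the PHIs of the new destination, pairing
// them by position, then forget them.
void flush_pending(Edge &e)
{
  const std::size_t n = e.dest->phis.size() < e.pending.size()
                          ? e.dest->phis.size() : e.pending.size();
  for (std::size_t i = 0; i < n; ++i)
    e.dest->phis[i].args[e.id] = e.pending[i];
  e.pending.clear();
}
```

In this model the PHIs in the new block have to be created in the same order
as those in the old destination, which is what the loop over
gsi_start_phis (scalar_exit->dest) in the quoted snippet would give.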

> This is because as I redirect_edge_and_branch destroys the phi node entries 
> and copy_phi_arg_into_existing_phi
> simply just reads the gimple_phi_arg_def which would be NULL.
> 
> You could point it to the src block of the exit, in which case it copies the 
> wrong values in for the vuses.  At the end
> of vectorization the cfgcleanup code does the same thing to maintain LCSSA if 
> you haven't.  This code always goes
> wrong for multiple exits because of the problem described above.  There's no 
> node for it to copy the right value
> from.
> 
> As an alternate approach I can split the exit

[COMMITTED] bpf: bump maximum frame size limit to 32767 bytes

2023-08-18 Thread Jose E. Marchesi via Gcc-patches

This commit bumps the maximum stack frame size allowed for BPF
functions to the maximum possible value.

Tested in x86_64-linux-gnu host and target bpf-unknown-none.

gcc/ChangeLog

* config/bpf/bpf.opt (mframe-limit): Set default to 32767.

gcc/testsuite/ChangeLog

* gcc.target/bpf/frame-limit-1.c: New test.
* gcc.target/bpf/frame-limit-2.c: Likewise.
---
 gcc/config/bpf/bpf.opt   |  2 +-
 gcc/testsuite/gcc.target/bpf/frame-limit-1.c | 18 ++
 gcc/testsuite/gcc.target/bpf/frame-limit-2.c | 16 
 3 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/frame-limit-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/frame-limit-2.c

diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index 8e240d397e4..efa0380ee3f 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -38,7 +38,7 @@ Target RejectNegative InverseMask(BIG_ENDIAN)
 Generate little-endian eBPF.
 
 mframe-limit=
-Target Joined RejectNegative UInteger IntegerRange(0, 32767) 
Var(bpf_frame_limit) Init(512)
+Target Joined RejectNegative UInteger IntegerRange(0, 32767) 
Var(bpf_frame_limit) Init(32767)
 Set a hard limit for the size of each stack frame, in bytes.
 
 mco-re
diff --git a/gcc/testsuite/gcc.target/bpf/frame-limit-1.c 
b/gcc/testsuite/gcc.target/bpf/frame-limit-1.c
new file mode 100644
index 000..7843e04b5ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/frame-limit-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+/* The stack frame size is limited to 32767 bytes.  */
+
+int
+foo ()
+{
+  long data[4095];
+  return 0;
+}
+
+int
+bar ()
+{
+  long data[4096];
+  return 0;
+} /* { dg-error "stack limit" } */
diff --git a/gcc/testsuite/gcc.target/bpf/frame-limit-2.c 
b/gcc/testsuite/gcc.target/bpf/frame-limit-2.c
new file mode 100644
index 000..57f82e00567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/frame-limit-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mframe-limit=256" } */
+
+int
+foo ()
+{
+  long data[32];
+  return 0;
+}
+
+int
+bar ()
+{
+  long data[33];
+  return 0;
+} /* { dg-error "stack limit" } */
-- 
2.30.2
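
The array sizes in the two tests sit on either side of the limit.  A small
sketch of that arithmetic follows, assuming a 64-bit long as on the BPF
target and counting only the array itself (a real frame may hold more).  The
names kDefaultFrameLimit, array_bytes and fits_in_frame are made up for
illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default taken from the patch: -mframe-limit now defaults to 32767 bytes.
constexpr std::size_t kDefaultFrameLimit = 32767;

// Bytes needed for an array of ELEMS 64-bit elements.
constexpr std::size_t array_bytes(std::size_t elems)
{
  return elems * sizeof(std::int64_t);
}

// True if the array alone stays within LIMIT bytes.
constexpr bool fits_in_frame(std::size_t elems,
                             std::size_t limit = kDefaultFrameLimit)
{
  return array_bytes(elems) <= limit;
}

// frame-limit-1.c: 4095 longs fit, 4096 longs do not.
static_assert(array_bytes(4095) == 32760, "4095 * 8 bytes");
static_assert(fits_in_frame(4095), "32760 <= 32767");
static_assert(!fits_in_frame(4096), "32768 > 32767");

// frame-limit-2.c with -mframe-limit=256: 32 longs fit, 33 do not.
static_assert(fits_in_frame(32, 256), "256 <= 256");
static_assert(!fits_in_frame(33, 256), "264 > 256");
```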

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-18 Thread Richard Biener via Gcc-patches

On Fri, 18 Aug 2023, Richard Biener wrote:

> On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> 
> > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener 
> > > >>  wrote:
> > > >> > It doesn't seem to make a difference for x86.  That said, the "fix" 
> > > >> > is
> > > >> > probably sticking the correct target on the dump-check, it seems
> > > >> > that vect_fold_extract_last is no longer correct here.
> > > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > > >> sure which one will be appropriate for this case,
> > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > >
> > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > implemented the direct conversion support.  I suggest to implement
> > > > such dg-checks if they are not present (I can't find them),
> > > > possibly quite specific to the modes involved (like we have
> > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > just _float).
> > >
> > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > most (all?) of the tests were AArch64-specific.
> > Hi,
> > As Richi mentioned above, the test now vectorizes on AArch64 because
> > it has support for direct conversion
> > between vectors while x86 doesn't. IIUC this is because
> > supportable_convert_operation returns true
> > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > doing the conversion ?
> > 
> > In the attached patch, I added a new target check vect_extend which
> > (currently) returns 1 only for aarch64*-*-*,
> > which makes the test PASS on both the targets, altho I am not sure if
> > this is entirely correct.
> > Does the patch look OK ?
> 
> Can you make vect_extend more specific, say vect_extend_hi_si or
> what is specifically needed here?  Note I'll have to investigate
> why x86 cannot vectorize here since in fact it does have
> the extend operation ... it might be also worth splitting the
> sign/zero extend case, so - vect_sign_extend_hi_si or
> vect_extend_short_int?

And now, having analyzed _why_ x86 doesn't vectorize, the real question is
why we get this vectorized with NEON, which is because
static opt_machine_mode
aarch64_vectorize_related_mode (machine_mode vector_mode,
scalar_mode element_mode,
poly_uint64 nunits)
{
...
  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
  if (TARGET_SIMD
  && (vec_flags & VEC_ADVSIMD)
  && known_eq (nunits, 0U)
  && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
  && maybe_ge (GET_MODE_BITSIZE (element_mode)
   * GET_MODE_NUNITS (vector_mode), 128U))
{
  machine_mode res = aarch64_simd_container_mode (element_mode, 128);
  if (VECTOR_MODE_P (res))
return res;

which makes us get a V4SImode vector for a V4HImode loop vector_mode.

So I think the appropriate effective dejagnu target is
aarch64-*-* (there's none specifically to advsimd, not sure if one
can disable that?)

Richard.

> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
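
The size check in the quoted aarch64_vectorize_related_mode fragment can be
modelled with plain integers.  This is only a toy sketch: the function
preferred_container_bits is hypothetical, machine modes are reduced to bit
widths and lane counts, and the TARGET_SIMD and VEC_ADVSIMD tests are left
out.

```cpp
#include <cassert>

// Returns the container width in bits that the quoted check would prefer,
// or 0 when the check does not fire.
//   vector_bits      - size of the loop vector mode (e.g. 64 for V4HI)
//   vector_nunits    - lanes in the loop vector mode (e.g. 4 for V4HI)
//   element_bits     - size of the requested element mode (e.g. 32 for SI)
//   requested_nunits - the NUNITS argument; 0 means "not specified"
constexpr unsigned preferred_container_bits(unsigned vector_bits,
                                            unsigned vector_nunits,
                                            unsigned element_bits,
                                            unsigned requested_nunits)
{
  return (requested_nunits == 0
          && vector_bits == 64
          && element_bits * vector_nunits >= 128) ? 128u : 0u;
}
```

With a V4HI loop vector (64 bits, 4 lanes) and SI elements the product is
32 * 4 = 128, so the model picks a single 128-bit container, matching the
V4SI result described above; with HI elements the product is only 64 and the
check does not fire.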

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches

> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
> 

Correct.. but..

> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after

It is, in the case of the alternate exits this is reflected in copying
the same values, as they are the values of the number of completed 
iterations since the scalar code restarts the last iteration.

So all the PHI nodes of the alternate exits are correct.  The vector
iteration doesn't handle the partial iteration.

> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>  
> /  |
>/
>   /  if (epilog)
>alt exits //  \
> //loop around
> |   /
>preheader with "header" PHIs
>   |
>   
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
> 
> It seems so that at least currently the order of things makes
> them more complicated than necessary.

I've been trying to, but this representation seems a lot harder to work with.
In particular, at the moment, once we exit slpeel_tree_duplicate_loop_to_edge_cfg
the loop structure is exactly the same as one expects from any normal epilog
vectorization.

But this new representation requires me to place the guard much earlier than
the epilogue preheader, yet I still have to adjust the PHI nodes in the
preheader.  So it seems that this split is there only to indicate that we
always enter the epilog when taking an early exit.

Today this is reflected in the values of the PHI nodes rather than
structurally.  Once we place the guard we update the nodes and the alternate
exits get their value for ivtmp updated to VF.

This representation also forces me to do the redirection in every call site of
slpeel_tree_duplicate_loop_to_edge_cfg, making the code more complicated in all
use sites.

But I think this doesn't address the main reason why the
slpeel_tree_duplicate_loop_to_edge_cfg code has a large block of code to deal
with PHI node updates.

The reason, as you mentioned somewhere else, is that after we redirect the
edges I have to reconstruct the phi nodes.  For most it's straightforward, but
for live values or vuse chains it requires extra code.

You're right in that before we redirect the edges they are all correct in the
exit block; you mentioned that the API for the edge redirection is supposed to
copy the values over if I create the phi nodes beforehand.

However this doesn't seem to work:

 for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
{
  gimple *from_phi = gsi_stmt (gsi_from);
  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
  create_phi_node (new_res, new_preheader);
}

  for (edge exit : loop_exits)
redirect_edge_and_branch (exit, new_preheader);

Still leaves them empty.  Grepping around, most code seems to pair
redirect_edge_and_branch with copy_phi_arg_into_existing_phi.  The problem is
that in all these cases, after redirecting an edge, they call
copy_phi_arg_into_existing_phi from a predecessor edge to fill in the phi
nodes.

This is because redirect_edge_and_branch destroys the phi node entries and
copy_phi_arg_into_existing_phi simply reads the gimple_phi_arg_def, which
would be NULL.

You could point it to the src block of the exit, in which case it copies the
wrong values in for the vuses.  At the end of vectorization the cfgcleanup
code does the same thing to maintain LCSSA if you haven't.  This code always
goes wrong for multiple exits because of the problem described above: there's
no node for it to copy the right value from.

As an alternate approach I can split the exit edges, copy the phi nodes into
the split and after that redirect them.  This however creates the awkwardness
of having the exit edges no longer connect to the preheader.

All of this then raises the question of whether this is all easier than the
current approach, which is just to read the edge var map to figure out the
nodes that were removed during the redirect.

Maybe I'm still misunderstanding the API,

[committed] libstdc++: Replace non-type-dependent uses of wchar_t in <format> and <chrono>

2023-08-18 Thread Jonathan Wakely via Gcc-patches

This should be really fixed now!

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This is one more piece of the rework to make wchar_t support in
std::format depend on _GLIBCXX_USE_WCHAR_T.

In <format> the __to_wstring_numeric function is called with arguments
that aren't type-dependent, so a declaration needs to be available, or
the calls need to be guarded by _GLIBCXX_USE_WCHAR_T.

In <chrono> there is a similarly non-type-dependent call to std::format
with a wchar_t format string, which is ill-formed when the wchar_t
overloads of std::format are not declared. Use _GLIBCXX_WIDEN to make it
type-dependent.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (operator<<): Make uses of wide
strings with streams and std::format type-dependent on _CharT.
* include/std/format [!_GLIBCXX_USE_WCHAR_T] Do not use
__to_wstring_numeric.
---
 libstdc++-v3/include/bits/chrono_io.h | 17 ++---
 libstdc++-v3/include/std/format   | 10 --
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index e16302baf84..d558802e7d8 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2390,15 +2390,14 @@ namespace __detail
   __os2.imbue(__os.getloc());
   __os2 << __wdi.weekday();
   const auto __i = __wdi.index();
-  if constexpr (is_same_v<_CharT, char>)
-   __os2 << std::format("[{}", __i);
-  else
-   __os2 << std::format(L"[{}", __i);
-  basic_string_view<_CharT> __s = _GLIBCXX_WIDEN(" is not a valid index]");
+  basic_string_view<_CharT> __s
+   = _GLIBCXX_WIDEN("[ is not a valid index]");
+  __os2 << __s[0];
+  __os2 << std::format(_GLIBCXX_WIDEN("{}"), __i);
   if (__i >= 1 && __i <= 5)
__os2 << __s.back();
   else
-   __os2 << __s;
+   __os2 << __s.substr(1);
   __os << __os2.view();
   return __os;
 }
@@ -2457,11 +2456,7 @@ namespace __detail
   // As above, just write straight to a stringstream, as if by "{:L}/last"
   basic_stringstream<_CharT> __os2;
   __os2.imbue(__os.getloc());
-  __os2 << __mdl.month();
-  if constexpr (is_same_v<_CharT, char>)
-   __os2 << "/last";
-  else
-   __os2 << L"/last";
+  __os2 << __mdl.month() << _GLIBCXX_WIDEN("/last");
   __os << __os2.view();
   return __os;
 }
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 0b1ae8201af..648f847ad96 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1142,13 +1142,15 @@ namespace __format
  basic_string_view<_CharT> __str;
  if constexpr (is_same_v<_CharT, char>)
__str = __narrow_str;
+#ifdef _GLIBCXX_USE_WCHAR_T
  else
{
  size_t __n = __narrow_str.size();
  auto __p = (_CharT*)__builtin_alloca(__n * sizeof(_CharT));
- __to_wstring_numeric(__narrow_str.data(), __n, __p);
+ std::__to_wstring_numeric(__narrow_str.data(), __n, __p);
  __str = {__p, __n};
}
+#endif
 
  if (_M_spec._M_localized)
{
@@ -1624,11 +1626,13 @@ namespace __format
  basic_string_view<_CharT> __str;
  if constexpr (is_same_v<_CharT, char>)
__str = __narrow_str;
+#ifdef _GLIBCXX_USE_WCHAR_T
  else
{
  __wstr = std::__to_wstring_numeric(__narrow_str);
  __str = __wstr;
}
+#endif
 
  if (_M_spec._M_localized)
{
@@ -2290,12 +2294,14 @@ namespace __format
  basic_string_view<_CharT> __str;
  if constexpr (is_same_v<_CharT, char>)
__str = string_view(__buf, __n);
+#ifdef _GLIBCXX_USE_WCHAR_T
  else
{
  auto __p = (_CharT*)__builtin_alloca(__n * sizeof(_CharT));
- __to_wstring_numeric(__buf, __n, __p);
+ std::__to_wstring_numeric(__buf, __n, __p);
  __str = wstring_view(__p, __n);
}
+#endif
 
 #if _GLIBCXX_P2518R3
  if (_M_spec._M_zero_fill)
-- 
2.41.0
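
The chrono_io.h hunk keeps a single widened literal and slices it, so that
no narrow-only or wide-only literal appears outside a _CharT-dependent
expression.  Below is a minimal sketch of that slicing, assuming C++17.  The
helper widen_ascii is a hypothetical stand-in for _GLIBCXX_WIDEN, the
function index_suffix is invented for illustration, and the index is written
with operator<< rather than std::format to keep the example small.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical stand-in for _GLIBCXX_WIDEN: build a CharT string from an
// ASCII literal.  Every use below depends on CharT.
template<typename CharT>
std::basic_string<CharT> widen_ascii(const char* s)
{
  return std::basic_string<CharT>(s, s + std::strlen(s));
}

// Produce "[N]" for a valid index (1..5) and "[N is not a valid index]"
// otherwise, using one literal for all the pieces.
template<typename CharT>
std::basic_string<CharT> index_suffix(unsigned i)
{
  const std::basic_string<CharT> text
    = widen_ascii<CharT>("[ is not a valid index]");
  const std::basic_string_view<CharT> s = text;

  std::basic_ostringstream<CharT> os;
  os << s[0];            // the opening '['
  os << i;               // the index itself
  if (i >= 1 && i <= 5)
    os << s.back();      // just the closing ']'
  else
    os << s.substr(1);   // " is not a valid index]"
  return os.str();
}
```

Instantiating index_suffix with char or with wchar_t goes through the same
code path; nothing in the body names a specific character type.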

Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-08-18 Thread Stefan Schulze Frielinghaus via Gcc-patches

Ping.  Since this fixes bootstrap problem PR110939 for Loongarch I'm
pinging this one earlier.

On Thu, Aug 10, 2023 at 03:04:03PM +0200, Stefan Schulze Frielinghaus wrote:
> In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
> completely missed the fact that the normal form of a generated constant for a
> mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the
> actual constant.  This even holds true for unsigned constants.
> 
> Fixed by masking out the upper bits for the incoming constant and sign
> extending the resulting unsigned constant.
> 
> Bootstrapped and regtested on x64 and s390x.  Ok for mainline?
> 
> While reading existing optimizations in combine I stumbled across two
> optimizations where either my intuition about the representation of
> unsigned integers via a const_int rtx is wrong, which then in turn would
> probably also mean that this patch is wrong, or that the optimizations
> are missed sometimes.  In other words in the following I would assume
> that the upper bits are masked out:
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 468b7fde911..80c4ff0fbaf 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>/* (unsigned) < 0x8000 is equivalent to >= 0.  */
>else if (is_a <scalar_int_mode> (mode, &int_mode)
>&& GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> -  && ((unsigned HOST_WIDE_INT) const_op
> +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> (int_mode))
>== HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> 1)))
> {
>   const_op = 0;
> @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>/* (unsigned) >= 0x8000 is equivalent to < 0.  */
>else if (is_a <scalar_int_mode> (mode, &int_mode)
>&& GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
> -  && ((unsigned HOST_WIDE_INT) const_op
> +  && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK 
> (int_mode))
>== HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 
> 1)))
> {
>   const_op = 0;
> 
> For example, while bootstrapping on x64 the optimization is missed since
> a LTU comparison in QImode is done and the constant equals
> 0xffffffffffffff80.
> 
> Sorry for inlining another patch, but I would really like to make sure
> that my understanding is correct, now, before I come up with another
> patch.  Thus it would be great if someone could shed some light on this.
> 
> gcc/ChangeLog:
> 
>   * combine.cc (simplify_compare_const): Properly handle unsigned
>   constants while narrowing comparison of memory and constants.
> ---
>  gcc/combine.cc | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index e46d202d0a7..468b7fde911 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>&& !MEM_VOLATILE_P (op0)
>/* The optimization makes only sense for constants which are big enough
>so that we have a chance to chop off something at all.  */
> -  && (unsigned HOST_WIDE_INT) const_op > 0xff
> -  /* Bail out, if the constant does not fit into INT_MODE.  */
> -  && (unsigned HOST_WIDE_INT) const_op
> -  < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - 1)
> +  && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > 
> 0xff
>/* Ensure that we do not overflow during normalization.  */
> -  && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
> HOST_WIDE_INT_M1U))
> +  && (code != GTU
> +   || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
> +  < HOST_WIDE_INT_M1U)
> +  && trunc_int_for_mode (const_op, int_mode) == const_op)
>  {
> -  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op;
> +  unsigned HOST_WIDE_INT n
> + = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
>enum rtx_code adjusted_code;
>  
>/* Normalize code to either LEU or GEU.  */
> @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, 
> machine_mode mode,
>   HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
>   HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
>   GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> - (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME (adjusted_code),
> - n);
> + (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
> + GET_RTX_NAME (adjusted_code), n);
>   }
> poly_int64 offset = (BYTES_BIG_ENDIAN
>  ? 0
>  : (GET_MODE_SIZE (int_mode)
>
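
The point about sign-extended constants can be checked in isolation.  The
sketch below assumes a 64-bit HOST_WIDE_INT and uses made-up helpers
(mode_mask, sign_extend, is_sign_bit_unmasked, is_sign_bit_masked) in place
of GET_MODE_MASK and the comparison from the quoted hunks; it is not code
from combine.cc.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low PRECISION bits (1 <= PRECISION <= 64).
inline std::uint64_t mode_mask(unsigned precision)
{
  return precision >= 64 ? ~std::uint64_t{0}
                         : (std::uint64_t{1} << precision) - 1;
}

// Sign-extend the low PRECISION bits of VALUE to 64 bits, which is the
// normal form described above for a constant in a narrow mode.
inline std::int64_t sign_extend(std::uint64_t value, unsigned precision)
{
  const std::uint64_t sign = std::uint64_t{1} << (precision - 1);
  value &= mode_mask(precision);
  return static_cast<std::int64_t>((value ^ sign) - sign);
}

// Comparison against the sign bit without masking the upper bits.
inline bool is_sign_bit_unmasked(std::int64_t const_op, unsigned precision)
{
  return static_cast<std::uint64_t>(const_op)
         == (std::uint64_t{1} << (precision - 1));
}

// The same comparison after masking with the mode mask.
inline bool is_sign_bit_masked(std::int64_t const_op, unsigned precision)
{
  return (static_cast<std::uint64_t>(const_op) & mode_mask(precision))
         == (std::uint64_t{1} << (precision - 1));
}
```

For an 8-bit mode the constant 0x80 is held as -128, so the unmasked
comparison against 0x80 fails while the masked one succeeds.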

Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-18 Thread Lehua Ding

On 2023/8/11 23:57, Jeff Law wrote:

On 8/8/23 21:54, Lehua Ding wrote:

Hi Jeff,

 > The pattern's operand 0 explicitly allows MEMs as do the constraints.
 > So forcing the operand into a register just seems like it's papering
 > over the real problem.

The force_reg code was added to address a problem that appears after
fixing the erroneous combine.
The more restrictive condition on the pattern forbids the mem->mem form,
which is produced at -O0. I think the implementation forgot to do this
force_reg operation when expanding the intrinsics. The reason this problem
wasn't exposed before is that the reload pass converts mem->mem into
mem->reg; reg->mem based on the constraints.

So if the core issue is mem->mem, that is a common thing to avoid.

Basically in the expander you use a force_reg and then have a test like
!(MEM_P (op0) && MEM_P (op1)) in the define_insn's condition.

But the v1 had a much more complex condition.  It looks like that got 
cleaned up in the v2.  So I'll need to look at that one more closely.

Gentle ping V2, thanks.

 > This comment doesn't make sense in conjunction with your earlier details.

 > In particular combine doesn't run at -O0, so your earlier comment that
 > combine creates the problem seems inconsistent with the comment above.

As said above, the code addresses a problem which is produced
after addressing the combine problem.
But combine doesn't run at -O0.  So something is inconsistent.  I 
certainly believe we need to avoid the mem->mem case, but that's 
independent of combine and affects all optimization levels.

I think it's the comment written here that is the problem. I plan to 
change it to this:

  /* Since there is no intrinsic where target is a mem operand, it must
 be converted to reg if it is a mem operand.  */

--
Best,
Lehua
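
The expander-plus-condition idiom Jeff describes can be pictured with a toy
model.  Nothing here is GCC code: OperandKind, Move, insn_condition and
expand_move are hypothetical names, and replacing a memory source by a
register merely stands in for what force_reg would do.

```cpp
#include <cassert>

enum class OperandKind { Reg, Mem };

struct Move {
  OperandKind dest;   // operand 0
  OperandKind src;    // operand 1
};

// Models an insn condition of the form !(MEM_P (op0) && MEM_P (op1)).
constexpr bool insn_condition(const Move &m)
{
  return !(m.dest == OperandKind::Mem && m.src == OperandKind::Mem);
}

// Models the expander step: when both operands are memory, load the
// source into a register first so the insn condition can hold.
constexpr Move expand_move(Move m)
{
  if (m.dest == OperandKind::Mem && m.src == OperandKind::Mem)
    m.src = OperandKind::Reg;
  return m;
}
```

In the model every move that went through expand_move satisfies
insn_condition, independent of any later pass, which is why the idiom also
covers -O0.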

Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-18 Thread Juzhe Zhong




Thanks, Richi.

I will wait for Richard's comments, address the feedback from both of you,
and then send a V2 patch.

Re: [PATCH] c: Add support for [[extension ...]]

2023-08-18 Thread Richard Sandiford via Gcc-patches

Richard Sandiford  writes:
> Joseph Myers  writes:
>> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote:
>>
>>> Would it be OK to add support for:
>>> 
>>>   [[__extension__ ...]]
>>> 
>>> to suppress the pedwarn about using [[]] prior to C2X?  Then we can
>>
>> That seems like a plausible feature to add.
>
> Thanks.  Of course, once I actually tried it, I hit a snag:
> :: isn't a single lexing token prior to C2X, and so something like:
>
>   [[__extension__ arm::streaming]]
>
> would not be interpreted as a scoped attribute in C11.  The patch
> gets around that by allowing two colons in place of :: when
> __extension__ is used.  I realise that's pushing the bounds of
> acceptability though...
>
> I wondered about trying to require the two colons to be immediately
> adjacent.  But:
>
> (a) There didn't appear to be an existing API to check that, which seemed
> like a red flag.  The closest I could find was get_source_text_between.
>
> Similarly to that, it would in principle be possible to compare
> two expanded locations.  But...
>
> (b) I had a vague impression that locations were allowed to drop column
> information for very large inputs (maybe I'm wrong).
>
> (c) It wouldn't cope with token pasting.
>
> So in the end I just used a simple two-token test, like for [[ and ]].
>
> Bootstrapped & regression-tested on aarch64-linux-gnu.

Gah, as mentioned yesterday, the patch was peeking the wrong token.
I've fixed that, and added corresponding tests.  Sorry for missing
it first time.

Richard

-

[[]] attributes are a recent addition to C, but as a GNU extension,
GCC allows them to be used in C11 and earlier.  Normally this use
would trigger a pedwarn (for -pedantic, -Wc11-c2x-compat, etc.).

This patch allows the pedwarn to be suppressed by starting the
attribute-list with __extension__.

Also, :: is not a single lexing token prior to C2X, so it wasn't
possible to use scoped attributes in C11, even as a GNU extension.
The patch allows two colons to be used in place of :: when
__extension__ is used.  No attempt is made to check whether the
two colons are immediately adjacent.

gcc/
* doc/extend.texi: Document the C [[__extension__ ...]] construct.

gcc/c/
* c-parser.cc (c_parser_std_attribute): Conditionally allow
two colons to be used in place of ::.
(c_parser_std_attribute_list): New function, split out from...
(c_parser_std_attribute_specifier): ...here.  Allow the attribute-list
to start with __extension__.  When it does, also allow two colons
to be used in place of ::.

gcc/testsuite/
* gcc.dg/c2x-attr-syntax-6.c: New test.
* gcc.dg/c2x-attr-syntax-7.c: Likewise.
---
 gcc/c/c-parser.cc| 64 ++--
 gcc/doc/extend.texi  | 27 --
 gcc/testsuite/gcc.dg/c2x-attr-syntax-6.c | 62 +++
 gcc/testsuite/gcc.dg/c2x-attr-syntax-7.c | 60 ++
 4 files changed, 193 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c2x-attr-syntax-6.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-attr-syntax-7.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 33fe7b115ff..ca60c51ddb2 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -5390,10 +5390,18 @@ c_parser_balanced_token_sequence (c_parser *parser)
  ( balanced-token-sequence[opt] )
 
Keywords are accepted as identifiers for this purpose.
-*/
+
+   As an extension, we permit an attribute-specifier to be:
+
+ [ [ __extension__ attribute-list ] ]
+
+   Two colons are then accepted as a synonym for ::.  No attempt is made
+   to check whether the colons are immediately adjacent.  LOOSE_SCOPE_P
+   indicates whether this relaxation is in effect.  */
 
 static tree
-c_parser_std_attribute (c_parser *parser, bool for_tm)
+c_parser_std_attribute (c_parser *parser, bool for_tm,
+   bool loose_scope_p = false)
 {
   c_token *token = c_parser_peek_token (parser);
   tree ns, name, attribute;
@@ -5406,9 +5414,14 @@ c_parser_std_attribute (c_parser *parser, bool for_tm)
 }
   name = canonicalize_attr_name (token->value);
   c_parser_consume_token (parser);
-  if (c_parser_next_token_is (parser, CPP_SCOPE))
+  if (c_parser_next_token_is (parser, CPP_SCOPE)
+  || (loose_scope_p
+ && c_parser_next_token_is (parser, CPP_COLON)
+ && c_parser_peek_2nd_token (parser)->type == CPP_COLON))
 {
   ns = name;
+  if (c_parser_next_token_is (parser, CPP_COLON))
+   c_parser_consume_token (parser);
   c_parser_consume_token (parser);
   token = c_parser_peek_token (parser);
   if (token->type != CPP_NAME && token->type != CPP_KEYWORD)
@@ -5481,19 +5494,9 @@ c_parser_std_attribute (c_parser *parser, bool for_tm)
 }
 
 static tree
-c_parser_std_attribute_specifier (c_parser *parser, bool for_tm)
+c_parser_std_attribute_list (c_parser *parser, bool for_tm,
+

Re: [PATCH] vect: Factor out the handling on scatter store having gs_info.decl

2023-08-18 Thread Richard Biener via Gcc-patches

On Thu, Aug 17, 2023 at 8:22 AM Kewen.Lin  wrote:
>
> Hi,
>
> Similar to the existing function vect_build_gather_load_calls,
> this patch is to factor out the handling on scatter store
> having gs_info.decl to vect_build_scatter_store_calls which
> is a new function.  It also does some minor refactoring like
> moving some variables' declarations close to their uses and
> restrict the scope for some of them etc.
>
> It's a pre-patch for upcoming vectorizable_store re-structuring
> for costing.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

Richard.

> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vect_build_scatter_store_calls): New, factor
> out from ...
> (vectorizable_store): ... here.
> ---
>  gcc/tree-vect-stmts.cc | 411 +
>  1 file changed, 212 insertions(+), 199 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cd8e0a76374..f8a904de503 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2989,6 +2989,216 @@ vect_build_gather_load_calls (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>*vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>  }
>
> +/* Build a scatter store call while vectorizing STMT_INFO.  Insert new
> +   instructions before GSI and add them to VEC_STMT.  GS_INFO describes
> +   the scatter store operation.  If the store is conditional, MASK is the
> +   unvectorized condition, otherwise MASK is null.  */
> +
> +static void
> +vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
> +   gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +   gather_scatter_info *gs_info, tree mask)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  int ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +  enum { NARROW, NONE, WIDEN } modifier;
> +  poly_uint64 scatter_off_nunits
> += TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
> +
> +  tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE;
> +  if (known_eq (nunits, scatter_off_nunits))
> +modifier = NONE;
> +  else if (known_eq (nunits * 2, scatter_off_nunits))
> +{
> +  modifier = WIDEN;
> +
> +  /* Currently gathers and scatters are only supported for
> +fixed-length vectors.  */
> +  unsigned int count = scatter_off_nunits.to_constant ();
> +  vec_perm_builder sel (count, count, 1);
> +  for (unsigned i = 0; i < (unsigned int) count; ++i)
> +   sel.quick_push (i | (count / 2));
> +
> +  vec_perm_indices indices (sel, 1, count);
> +  perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, 
> indices);
> +  gcc_assert (perm_mask != NULL_TREE);
> +}
> +  else if (known_eq (nunits, scatter_off_nunits * 2))
> +{
> +  modifier = NARROW;
> +
> +  /* Currently gathers and scatters are only supported for
> +fixed-length vectors.  */
> +  unsigned int count = nunits.to_constant ();
> +  vec_perm_builder sel (count, count, 1);
> +  for (unsigned i = 0; i < (unsigned int) count; ++i)
> +   sel.quick_push (i | (count / 2));
> +
> +  vec_perm_indices indices (sel, 2, count);
> +  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
> +  gcc_assert (perm_mask != NULL_TREE);
> +  ncopies *= 2;
> +
> +  if (mask)
> +   mask_halfvectype = truth_type_for (gs_info->offset_vectype);
> +}
> +  else
> +gcc_unreachable ();
> +
> +  tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
> +  tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
> +  tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
> +  tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
> +  tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
> +  tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
> +  tree scaletype = TREE_VALUE (arglist);
> +
> +  gcc_checking_assert (TREE_CODE (masktype) == INTEGER_TYPE
> +  && TREE_CODE (rettype) == VOID_TYPE);
> +
> +  tree ptr = fold_convert (ptrtype, gs_info->base);
> +  if (!is_gimple_min_invariant (ptr))
> +{
> +  gimple_seq seq;
> +  ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  edge pe = loop_preheader_edge (loop);
> +  basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> +  gcc_assert (!new_bb);
> +}
> +
> +  tree mask_arg = NULL_TREE;
> +  if (mask == NULL_TREE)
> +{
> +  mask_arg = build_int_cst (masktype, -1);
> +  mask_arg = vect_init_vector (vinfo, stmt_info, mask_arg, masktype, 
> NULL);
> +}
> +
> +  tree scale = build_int_cst (scaletype, gs_info->scale);
> +
> +  auto_vec
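
As a small worked example of the selector built in the WIDEN and NARROW
branches above, the following sketch reproduces the `i | (count / 2)`
computation on a plain array.  The function name is hypothetical and the
array stands in for vec_perm_builder; COUNT is assumed to be a power of
two, as it is for the fixed-length vectors the comment mentions.

```c
/* Fill SEL[0..COUNT-1] the way the quoted loops do: element I selects
   lane I | (COUNT / 2).  With COUNT a power of two, OR-ing in COUNT / 2
   maps every index into the upper half of the vector.  */
static void
build_upper_half_selector (unsigned int *sel, unsigned int count)
{
  for (unsigned int i = 0; i < count; ++i)
    sel[i] = i | (count / 2);
}
```

For count == 8 this yields { 4, 5, 6, 7, 4, 5, 6, 7 }, i.e. the
permutation repeats the upper half of its input.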

Re: [PATCH] Makefile.in: Make TM_P_H depend on $(TREE_H) [PR111021]

2023-08-18 Thread Richard Biener via Gcc-patches

On Thu, Aug 17, 2023 at 8:15 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR111021 shows, the below ${port}-protos.h include tree.h
> for code_helper and tree_code:
>
>   arm/arm-protos.h:#include "tree.h"
>   cris/cris-protos.h:#include "tree.h" (H-P removed this in r14-3218)
>   microblaze/microblaze-protos.h:#include "tree.h"
>   rl78/rl78-protos.h:#include "tree.h"
>   stormy16/stormy16-protos.h:#include "tree.h"
>
> , when compiling build/gencondmd.cc, the include hierarchy
> makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
> which further includes (depends on) some files that are
> generated during the building, such as: all-tree.def,
> tree-check.h and so on.  The previous commit r14-3215
> should already force build/gencondmd.cc to depend on
> ${TREE_H}, so the reported build failure should be gone.
>
> But for a long term maintenance, especially one day some
> build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
> dependence could be missed and a build failure will show
> up.  So this patch is to make TM_P_H depend on $(TREE_H),
> any new build/xxx.cc depending on tm_p.h will be able to
> consider ${TREE_H}.
>
> It's tested with cross-builds for the affected ports with
> steps:
>  1) dropped the fix r14-3215;
>  2) reproduced the build failure with serial build;
>  3) applied this patch, serially built and verified all passed;
>  4) added back r14-3215, serially built and verified all passed;
>
> Also bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

> BR,
> Kewen
> -
> PR bootstrap/111021
>
> gcc/ChangeLog:
>
> * Makefile.in (TM_P_H): Add $(TREE_H) as dependence.
> ---
>  gcc/Makefile.in | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 9dddb65b45d..b85c967951b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -893,7 +893,8 @@ OPTIONS_C_EXTRA = $(PRETTY_PRINT_H)
>  BCONFIG_H = bconfig.h $(build_xm_file_list)
>  CONFIG_H  = config.h  $(host_xm_file_list)
>  TCONFIG_H = tconfig.h $(xm_file_list)
> -TM_P_H= tm_p.h$(tm_p_file_list)
> +# Some $(target)-protos.h depends on tree.h
> +TM_P_H= tm_p.h$(tm_p_file_list) $(TREE_H)
>  TM_D_H= tm_d.h$(tm_d_file_list)
>  GTM_H = tm.h  $(tm_file_list) insn-constants.h
>  TM_H  = $(GTM_H) insn-flags.h $(OPTIONS_H)
> --
> 2.39.1

Re: [PATCH] MATCH: Sink convert for vec_cond

2023-08-18 Thread Richard Biener via Gcc-patches

On Thu, Aug 17, 2023 at 3:38 AM Andrew Pinski via Gcc-patches
 wrote:
>
> A convert can be sunk into a vec_cond if both sides fold.  Unlike other
> unary operations, we need to check that the resulting vec_cond can still
> be handled, i.e. that the type of its first operand is the same as the
> new truth type.
>
> I tried a few different versions of this patch:
> view_convert to the new truth_type but that does not work as we always 
> support all vec_cond
> afterwards.
> using expand_vec_cond_expr_p; but that would allow too much.
>
> I also tried to see if view_convert can be handled here but we end up with:
>   _3 = VEC_COND_EXPR <_2, {  Nan(-1),  Nan(-1),  Nan(-1),  Nan(-1) }, { 0.0, 
> 0.0, 0.0, 0.0 }>;
> Which isel does not know how to handle as just being a view_convert from 
> `vector(4) `
> to `vector(4) float` and causes a regression with `g++.target/i386/pr88152.C`
>
> Note, in the case of the SVE testcase, we will sink negate after the convert 
> and be able
> to remove a few extra instructions in the end.
> Also with this change gcc.target/aarch64/sve/cond_unary_5.c will now pass.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
> PR tree-optimization/111006
> PR tree-optimization/110986
> * match.pd: (op(vec_cond(a,b,c))): Handle convert for op.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/111006
> * gcc.target/aarch64/sve/cond_convert_7.c: New test.
> ---
>  gcc/match.pd  |  9 
>  .../gcc.target/aarch64/sve/cond_convert_7.c   | 23 +++
>  2 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index acd2a964917..ca5ab6f289d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4704,6 +4704,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(op (vec_cond:s @0 @1 @2))
>(vec_cond @0 (op! @1) (op! @2
>
> +/* Sink unary conversions to branches, but only if we do fold both
> +   and the target's truth type is the same as we already have.  */
> +(for op (convert)

This (for ..) looks unneeded?

Otherwise looks OK.

Thanks,
Richard.

> + (simplify
> +  (op (vec_cond:s @0 @1 @2))
> +  (if (VECTOR_TYPE_P (type)
> +   && types_match (TREE_TYPE (@0), truth_type_for (type)))
> +   (vec_cond @0 (op! @1) (op! @2)
> +
>  /* Sink binary operation to branches, but only if we can fold it.  */
>  (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
>  lshift rshift rdiv trunc_div ceil_div floor_div round_div
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
> new file mode 100644
> index 000..4bb95b92195
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 -fdump-tree-optimized" } */
> +
> +/* This is a modified reduced version of cond_unary_5.c */
> +
> +void __attribute__ ((noipa))
> +f0 (unsigned short *__restrict r,
> +   int *__restrict a,
> +   int *__restrict pred)
> +{
> +  for (int i = 0; i < 1024; ++i)
> +  {
> +int p = pred[i]?-1:0;
> +r[i] = p ;
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } */
> +
> +/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
> --
> 2.31.1
>
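
The transformation discussed above can be sketched on scalars.  This is
only a per-lane model of what the match.pd pattern does to vectors; the
two function names are hypothetical, and the types mirror the quoted
testcase (an int select narrowed to unsigned short).

```c
/* Conversion applied after the select, as in the quoted testcase.  */
static unsigned short
convert_after_select (int pred)
{
  int p = pred ? -1 : 0;
  return (unsigned short) p;
}

/* Conversion sunk into both arms.  Each arm is now a conversion of a
   constant, which folds, so only the select remains.  */
static unsigned short
convert_in_arms (int pred)
{
  return pred ? (unsigned short) -1 : (unsigned short) 0;
}
```

Both functions return the same value for every input, which is why the
sinking is valid whenever the selecting condition can be reused with the
new element type.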

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-18 Thread Richard Biener via Gcc-patches

On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:

> On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener  
> > >> wrote:
> > >> > It doesn't seem to make a difference for x86.  That said, the "fix" is
> > >> > probably sticking the correct target on the dump-check, it seems
> > >> > that vect_fold_extract_last is no longer correct here.
> > >> Um sorry, I did go thru various checks in target-supports.exp, but not
> > >> sure which one will be appropriate for this case,
> > >> and am stuck here :/ Could you please suggest how to proceed ?
> > >
> > > Maybe Richard S. knows the magic thing to test, he originally
> > > implemented the direct conversion support.  I suggest to implement
> > > such dg-checks if they are not present (I can't find them),
> > > possibly quite specific to the modes involved (like we have
> > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > just _float).
> >
> > Yeah, can't remember specific selectors for that feature.  TBH I think
> > most (all?) of the tests were AArch64-specific.
> Hi,
> As Richi mentioned above, the test now vectorizes on AArch64 because
> it has support for direct conversion
> between vectors while x86 doesn't. IIUC this is because
> supportable_convert_operation returns true
> for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> doing the conversion ?
> 
> In the attached patch, I added a new target check vect_extend which
> (currently) returns 1 only for aarch64*-*-*,
> which makes the test PASS on both the targets, altho I am not sure if
> this is entirely correct.
> Does the patch look OK ?

Can you make vect_extend more specific, say vect_extend_hi_si or
what is specifically needed here?  Note I'll have to investigate
why x86 cannot vectorize here since in fact it does have
the extend operation ... it might be also worth splitting the
sign/zero extend case, so - vect_sign_extend_hi_si or
vect_extend_short_int?

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-18 Thread Richard Biener via Gcc-patches

On Wed, 16 Aug 2023, Juzhe-Zhong wrote:

> Hi, Richard and Richi.
> 
> Currently, GCC supports COND_LEN_FMA for floating-point without -ffast-math.
> This is handled in tree-ssa-math-opts.cc.  However, GCC fails to support
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
> 
> Consider the following case:
> #define TEST_TYPE(TYPE)                                                     \
>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,         \
>                                               TYPE *__restrict a,           \
>                                               TYPE *__restrict b, int n)    \
>   {                                                                         \
>     for (int i = 0; i < n; i++)                                             \
>       dst[i] -= a[i] * b[i];                                                \
>   }
> 
> #define TEST_ALL()                                                          \
>   TEST_TYPE (float)                                                         \
> 
> TEST_ALL ()
> 
> Gimple IR for RVV:
> 
> ...
> _39 = -vect__8.14_26;
> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, 
> vect__4.8_34, vect__4.8_34, _46, 0);
> ...
> 
> This is because of the following piece of code in tree-ssa-math-opts.cc:
> 
>   if (len)
>   fma_stmt
> = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>   addop, else_value, len, bias);
>   else if (cond)
>   fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>  op2, addop, else_value);
>   else
>   fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>   gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>   gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>  use_stmt));
>   gsi_replace (&gsi, fma_stmt, true);
>   /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>regardless of where the negation occurs.  */
>   gimple *orig_stmt = gsi_stmt (gsi);
>   if (fold_stmt (&gsi, follow_all_ssa_edges))
>   {
> if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>   gcc_unreachable ();
> update_stmt (gsi_stmt (gsi));
>   }
> 
> 'fold_stmt' fails to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.
> 
> This patch supports folding the statement into:
> 
> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, 
> vect__4.8_34, { 0.0, ... }, _46, 0);
> 
> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
> 
> Extend maximum num ops:
> -  static const unsigned int MAX_NUM_OPS = 5;
> +  static const unsigned int MAX_NUM_OPS = 7;
> 
> Bootstrap and Regtest on X86 passed.
> 
> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
> 
> Testing on aarch64 is in progress.
> 
> gcc/ChangeLog:
> 
> * genmatch.cc (decision_tree::gen): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> * gimple-match-exports.cc (gimple_simplify): Ditto.
> (gimple_resimplify6): New function.
> (gimple_resimplify7): New function.
> (gimple_match_op::resimplify): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> (convert_conditional_op): Ditto.
> (build_call_internal): Ditto.
> (try_conditional_simplification): Ditto.
> (gimple_extract): Ditto.
> * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
> * internal-fn.cc (CASE): Ditto.
> 
> ---
>  gcc/genmatch.cc |   2 +-
>  gcc/gimple-match-exports.cc | 124 ++--
>  gcc/gimple-match.h  |  19 +-
>  gcc/internal-fn.cc  |  11 ++--
>  4 files changed, 144 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f46d2e1520d..a1925a747a7 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec  , bool gimple)
>  }
>fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>  
> -  for (unsigned n = 1; n <= 5; ++n)
> +  for (unsigned n = 1; n <= 7; ++n)
>  {
>bool has_kids_p = false;
>  
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index 7aeb4ddb152..895950309b7 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq 
> *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree);
>  extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree, tree);
> +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
> +
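
To make the seven-operand form concrete, here is a scalar reference
model of a conditional, length-limited FNMA.  It is a sketch under the
assumption that a lane is active when its index is below LEN + BIAS and
its condition bit is set, with inactive lanes taking the else value; the
function name and the use of int for the mask are choices made for the
example, not taken from the patch.

```c
/* Scalar model of COND_LEN_FNMA (cond, a, b, c, else, len, bias).
   Active lanes compute -(A[I] * B[I]) + C[I]; inactive lanes copy
   ELSE_VALS[I].  A real FNMA rounds once, whereas this model may round
   twice, so it is only exact for values where that makes no difference.  */
static void
cond_len_fnma_model (float *dst, const int *cond, const float *a,
                     const float *b, const float *c,
                     const float *else_vals, int nunits, int len, int bias)
{
  for (int i = 0; i < nunits; i++)
    dst[i] = (i < len + bias && cond[i])
             ? -(a[i] * b[i]) + c[i]
             : else_vals[i];
}
```

The loop body `dst[i] -= a[i] * b[i]` from the quoted test corresponds to
calling this model with C equal to the old contents of dst.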

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-18 Thread Richard Biener via Gcc-patches

On Fri, 18 Aug 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following avoids running into somehow flawed logic in fold_vec_perm
> > for non-VLA vectors.
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > PR tree-optimization/111048
> > * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > vectors first.
> >
> > * gcc.dg/torture/pr111048.c: New testcase.
> 
> Please don't do this as a permanent thing.  It was a deliberate choice
> to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> logic gets more coverage.  Like you say, if something is wrong for VLS
> then the chances are that it's also wrong for VLA.

Sure, feel free to undo this change together with the fix for the
VLA case.

Richard.

> Thanks,
> Richard
> 
> 
> > ---
> >  gcc/fold-const.cc   | 12 ++--
> >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 
> >  2 files changed, 30 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 5c51c9d91be..144fd7481b3 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > arg1, const vec_perm_indices ,
> >unsigned res_npatterns, res_nelts_per_pattern;
> >unsigned HOST_WIDE_INT res_nelts;
> >  
> > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > +{
> > +  res_npatterns = res_nelts;
> > +  res_nelts_per_pattern = 1;
> > +}
> >/* (1) If SEL is a suitable mask as determined by
> >   valid_mask_for_fold_vec_perm_cst_p, then:
> >   res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
> > const vec_perm_indices ,
> >   res_npatterns = nelts in result vector.
> >   res_nelts_per_pattern = 1.
> >   This exception is made so that VLS ARG0, ARG1 and SEL work as before. 
> >  */
> > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >  {
> >res_npatterns
> > = std::max (VECTOR_CST_NPATTERNS (arg0),
> > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > arg1, const vec_perm_indices ,
> >  
> >res_nelts = res_npatterns * res_nelts_per_pattern;
> >  }
> > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > -{
> > -  res_npatterns = res_nelts;
> > -  res_nelts_per_pattern = 1;
> > -}
> >else
> >  return NULL_TREE;
> >  
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c 
> > b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > new file mode 100644
> > index 000..475978aae2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do run } */
> > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > +
> > +typedef unsigned char u8;
> > +
> > +__attribute__((noipa))
> > +static void check(const u8 * v) {
> > +if (*v != 15) __builtin_trap();
> > +}
> > +
> > +__attribute__((noipa))
> > +static void bug(void) {
> > +u8 in_lanes[32];
> > +for (unsigned i = 0; i < 32; i += 2) {
> > +  in_lanes[i + 0] = 0;
> > +  in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > +}
> > +
> > > +check(&in_lanes[13]);
> > +  }
> > +
> > +int main() {
> > +bug();
> > +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[committed] libstdc++: Fix incomplete rework of wchar_t support in std::format

2023-08-18 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

r14-3300-g023a62b77f999b left make_wformat_args and some uses of
std::wformat_context unguarded by _GLIBCXX_USE_WCHAR_T.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (operator<<): Use __format_context.
* include/std/format (__format::__format_context): New alias
template.
[!_GLIBCXX_USE_WCHAR_T] (wformat_args, make_wformat_args):
Disable.
---
 libstdc++-v3/include/bits/chrono_io.h | 15 +--
 libstdc++-v3/include/std/format   | 13 +
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 05caa64fb7c..e16302baf84 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2263,8 +2263,7 @@ namespace __detail
 inline basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __os, const day& __d)
 {
-  using _Ctx = __conditional_t<is_same_v<_CharT, char>,
-  format_context, wformat_context>;
+  using _Ctx = __format::__format_context<_CharT>;
   using _Str = basic_string_view<_CharT>;
   _Str __s = _GLIBCXX_WIDEN("{:02d} is not a valid day");
   if (__d.ok())
@@ -2291,8 +2290,7 @@ namespace __detail
 inline basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __os, const month& __m)
 {
-  using _Ctx = __conditional_t<is_same_v<_CharT, char>,
-  format_context, wformat_context>;
+  using _Ctx = __format::__format_context<_CharT>;
   using _Str = basic_string_view<_CharT>;
   _Str __s = _GLIBCXX_WIDEN("{:L%b}{} is not a valid month");
   if (__m.ok())
@@ -2322,8 +2320,7 @@ namespace __detail
 inline basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __os, const year& __y)
 {
-  using _Ctx = __conditional_t<is_same_v<_CharT, char>,
-  format_context, wformat_context>;
+  using _Ctx = __format::__format_context<_CharT>;
   using _Str = basic_string_view<_CharT>;
   _Str __s = _GLIBCXX_WIDEN("-{:04d} is not a valid year");
   if (__y.ok())
@@ -2355,8 +2352,7 @@ namespace __detail
 inline basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __os, const weekday& __wd)
 {
-  using _Ctx = __conditional_t<is_same_v<_CharT, char>,
-  format_context, wformat_context>;
+  using _Ctx = __format::__format_context<_CharT>;
   using _Str = basic_string_view<_CharT>;
   _Str __s = _GLIBCXX_WIDEN("{:L%a}{} is not a valid weekday");
   if (__wd.ok())
@@ -2544,8 +2540,7 @@ namespace __detail
 operator<<(basic_ostream<_CharT, _Traits>& __os,
   const year_month_day& __ymd)
 {
-  using _Ctx = __conditional_t<is_same_v<_CharT, char>,
-  format_context, wformat_context>;
+  using _Ctx = __format::__format_context<_CharT>;
   using _Str = basic_string_view<_CharT>;
   _Str __s = _GLIBCXX_WIDEN("{:%F} is not a valid date");
   __os << std::vformat(__ymd.ok() ? __s.substr(0, 5) : __s,
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 13f700a10bf..0b1ae8201af 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -74,20 +74,23 @@ namespace __format
   // Output iterator that writes to a type-erase character sink.
   template<typename _CharT>
 class _Sink_iter;
+
+  template<typename _CharT>
+using __format_context = basic_format_context<_Sink_iter<_CharT>, _CharT>;
 } // namespace __format
 /// @endcond
 
-  using format_context
-= basic_format_context<__format::_Sink_iter<char>, char>;
+  using format_context  = __format::__format_context<char>;
 #ifdef _GLIBCXX_USE_WCHAR_T
-  using wformat_context
-= basic_format_context<__format::_Sink_iter<wchar_t>, wchar_t>;
+  using wformat_context = __format::__format_context<wchar_t>;
 #endif
 
   // [format.args], class template basic_format_args
   template<typename _Context> class basic_format_args;
   using format_args = basic_format_args<format_context>;
+#ifdef _GLIBCXX_USE_WCHAR_T
   using wformat_args = basic_format_args<wformat_context>;
+#endif
 
   // [format.arguments], arguments
   // [format.arg], class template basic_format_arg
@@ -3505,12 +3508,14 @@ namespace __format
   return _Store(__fmt_args...);
 }
 
+#ifdef _GLIBCXX_USE_WCHAR_T
   /// Capture formatting arguments for use by `std::vformat` (for wide output).
   template<typename... _Args>
 [[nodiscard,__gnu__::__always_inline__]]
 inline auto
 make_wformat_args(_Args&&... __args) noexcept
 { return std::make_format_args<wformat_context>(__args...); }
+#endif
 
 /// @cond undocumented
 namespace __format
-- 
2.41.0

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-18 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> The following avoids running into somehow flawed logic in fold_vec_perm
> for non-VLA vectors.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
>   PR tree-optimization/111048
>   * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
>   vectors first.
>
>   * gcc.dg/torture/pr111048.c: New testcase.

Please don't do this as a permanent thing.  It was a deliberate choice
to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
logic gets more coverage.  Like you say, if something is wrong for VLS
then the chances are that it's also wrong for VLA.

Thanks,
Richard


> ---
>  gcc/fold-const.cc   | 12 ++--
>  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 
>  2 files changed, 30 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 5c51c9d91be..144fd7481b3 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>unsigned res_npatterns, res_nelts_per_pattern;
>unsigned HOST_WIDE_INT res_nelts;
>  
> +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> +{
> +  res_npatterns = res_nelts;
> +  res_nelts_per_pattern = 1;
> +}
>/* (1) If SEL is a suitable mask as determined by
>   valid_mask_for_fold_vec_perm_cst_p, then:
>   res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>   res_npatterns = nelts in result vector.
>   res_nelts_per_pattern = 1.
>   This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
>  {
>res_npatterns
>   = std::max (VECTOR_CST_NPATTERNS (arg0),
> @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>  
>res_nelts = res_npatterns * res_nelts_per_pattern;
>  }
> -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> -{
> -  res_npatterns = res_nelts;
> -  res_nelts_per_pattern = 1;
> -}
>else
>  return NULL_TREE;
>  
> diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c 
> b/gcc/testsuite/gcc.dg/torture/pr111048.c
> new file mode 100644
> index 000..475978aae2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> +
> +typedef unsigned char u8;
> +
> +__attribute__((noipa))
> +static void check(const u8 * v) {
> +if (*v != 15) __builtin_trap();
> +}
> +
> +__attribute__((noipa))
> +static void bug(void) {
> +u8 in_lanes[32];
> +for (unsigned i = 0; i < 32; i += 2) {
> +  in_lanes[i + 0] = 0;
> +  in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> +}
> +
> +check(&in_lanes[13]);
> +  }
> +
> +int main() {
> +bug();
> +}
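
As a check on the value the testcase expects, the contents of any lane
can be recomputed on scalars.  The helper below is a hypothetical sketch
(its name is not from the patch) that mirrors the initialization loop:
even lanes hold 0 and odd lane i + 1 holds 0xff >> (i & 7).

```c
/* Recompute lane INDEX of in_lanes from the quoted testcase.  */
static unsigned char
expected_lane (unsigned int index)
{
  if ((index & 1) == 0)
    return 0;
  /* The loop counter that wrote this odd lane.  */
  unsigned int i = index - 1;
  return (unsigned char) (0xffu >> (i & 7));
}
```

For lane 13 the counter is 12, 12 & 7 is 4, and 0xff >> 4 is 15, which
is the value check compares against.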

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread chenxiaolong

On Fri, 2023-08-18 at 15:19 +0800, Xi Ruoyao wrote:
> On Fri, 2023-08-18 at 15:05 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Fri, 2023-08-18 at 14:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> > > On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> > > > On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > > > > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > > > > 
> > > > > > So I guess we just need
> > > > > > 
> > > > > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > > > 
> > > > > > etc. to map the "q" builtins to "f128" builtins if we really
> > > > > > need the "q" builtins.
> > > > > > 
> > > > > > Joseph: the problem here is many customers of LoongArch CPUs
> > > > > > wish to compile their old code with minimal change.  Is it
> > > > > > acceptable to add these builtin_define's like rs6000-c.cc?
> > > > > > Note "a new architecture" does not mean we'll only compile
> > > > > > post-C2x-era programs onto it.
> > > > > 
> > > > > The powerpc support for __float128 started in GCC 6,
> > > > > predating the
> > > > > support 
> > > > > for _FloatN type names, built-in functions etc. in GCC 7 -
> > > > > that's
> > > > > why 
> > > > > there's such backwards compatibility support there.  That
> > > > > name only
> > > > > exists 
> > > > > on a few architectures.
> > > > > 
> > > > > If people really want to compile code using the old
> > > > > __float128 names
> > > > > for 
> > > > > LoongArch I suppose you could have such #defines, but it
> > > > > would be
> > > > > better 
> > > > > for people to make their code use the standard names (as
> > > > > supported
> > > > > from 
> > > > > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > > > > backwards 
> > > > > compatibility in their code for using the __float128 names if
> > > > > they
> > > > > want to 
> > > > > support the type with older GCC (GCC 6 or before for C; GCC
> > > > > 12 or
> > > > > before 
> > > > > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > > > > compatibility 
> > > > > in user code is more likely to be relevant for C++ than for
> > > > > C, given
> > > > > how 
> > > > > the C++ support was added to GCC much more recently.  (Note:
> > > > > I
> > > > > haven't 
> > > > > checked when other compilers added support for the _Float128
> > > > > name or
> > > > > associated built-in functions, whether for C or for C++,
> > > > > which might
> > > > > also 
> > > > > affect when user code wants such compatibility.)
> > > > > 
> > > > Thank you for your valuable comments. On the LoongArch
> > > > architecture,
> > > > the "__float128" type is associated with float128_type_node and
> > > > the "q"
> > > > suffix function is mapped to the "f128" function. This allows
> > > > compatibility with both "__float128" and "_Float128" types in
> > > > the GCC
> > > > compiler. The new code is modified as follows:
> > > >   Add the following to the loongarch-builtins.c file:
> > > > +lang_hooks.types.register_builtin_type (float128_type_node,
> > > > "__float128");
> > > >   Add the following to the loongarch-c.c file:
> > > > +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> > > > +builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > +builtin_define ("__builtin_nansq=__builtin_nansf128");
> > > > +builtin_define ("__builtin_infq=__builtin_inff128");
> > > > +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> > > > 
> > > >  The regression tests of the six functions were added without
> > > > problems.
> > > > However, the implementation of the __builtin_nansq() function
> > > > does not
> > > > get the result we want. The questions are as follows:
> > > >  x86_64:
> > > > _Float128 ret=__builtin_nansf128("NAN");
> > > > 
> > > > compiled to (with gcc test.c -O2 ):
> > > > .cfi_offset 1, -8
> > > > bl  %plt(__builtin_nansf128)
> > > > ..
> > > >  LoongArch:
> > > > _Float128 ret=__builtin_nansf128("NAN");
> > > >   compiled to (with gcc test.c -O2 ):
> > > > .cfi_offset 1, -8
> > > > bl  %plt(__builtin_nansf128)
> > > 
> > > It seems wrong.  It should be "bl %plt(nansf128)" instead,
> > > without the
> > > __builtin_ prefix so the implementation in libm (from Glibc) will
> > > be
> > > used instead.  AFAIK __builtin_nan and __builtin_nans are rarely
> > > called
> > > with a non-empty tagp so it's not worthy to inline the
> > > implementation
> > > for non-empty tagp here.
> > > 
> > > The same issue happens on x86_64:
> > > 
> > > > call    __builtin_nansf128@PLT
> > > 
> > > __builtin_nanf128 compiles correct:
> > > 
> > > > call    nanf128@PLT
> > > 
> > > I'll see if there is a ticket in https://gcc.gnu.org/bugzilla. 
> > > If not
> > > I'll create one.
> 
>

[PATCH] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-08-18 Thread Manos Anagnostakis

This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
to provide the requested behaviour for handling ldp and stp:

  /* Allow the tuning structure to disable LDP instruction formation
 from combining instructions (e.g., in peephole2).
 TODO: Implement fine-grained tuning control for LDP and STP:
   1. control policies for load and store separately;
   2. support the following policies:
  - default (use what is in the tuning structure)
  - always
  - never
  - aligned (only if the compiler can prove that the
load will be aligned to 2 * element_size)  */

It provides two new command-line options, -mldp-policy and -mstp-policy,
which control the load and store policies separately, as stated in
part 1 of the TODO.

The accepted values for both options are:
- default: Use the ldp/stp policy defined in the corresponding tuning
  structure.
- always: Emit ldp/stp regardless of alignment.
- never: Do not emit ldp/stp.
- aligned: In order to emit ldp/stp, first check if the load/store will
  be aligned to 2 * element_size.
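
As a rough illustration of the four values above, here is a minimal C
sketch of how such an option string could be parsed and applied.  It is
not the code from this patch: the names pair_policy, parse_pair_policy
and pair_allowed are hypothetical, and the alignment test is only an
assumption about what "aligned to 2 * element_size" means in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values mirroring the four accepted option strings.  */
enum pair_policy { POLICY_DEFAULT, POLICY_ALWAYS, POLICY_NEVER, POLICY_ALIGNED };

/* Parse an option string; returns false for an unknown value.  */
static bool parse_pair_policy(const char *s, enum pair_policy *out)
{
    static const struct { const char *name; enum pair_policy value; } table[] = {
        { "default", POLICY_DEFAULT },
        { "always",  POLICY_ALWAYS  },
        { "never",   POLICY_NEVER   },
        { "aligned", POLICY_ALIGNED },
    };
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(s, table[i].name) == 0) {
            *out = table[i].value;
            return true;
        }
    }
    return false;
}

/* Decide whether a load/store pair may be formed.  TUNED is the policy
   taken from the tuning structure and is used only when REQUESTED is
   POLICY_DEFAULT.  KNOWN_ALIGN and ELEM_SIZE are in bytes.  */
static bool pair_allowed(enum pair_policy requested, enum pair_policy tuned,
                         unsigned known_align, unsigned elem_size)
{
    enum pair_policy p = requested == POLICY_DEFAULT ? tuned : requested;
    switch (p) {
    case POLICY_ALWAYS:
        return true;
    case POLICY_NEVER:
        return false;
    case POLICY_ALIGNED:
        /* Assumed reading: the known alignment is a multiple of twice
           the element size.  */
        return elem_size != 0 && known_align % (2 * elem_size) == 0;
    default:
        /* The tuning structure itself said "default": be conservative.  */
        return false;
    }
}
```

With 8-byte elements, for example, the "aligned" policy in this sketch
accepts an access known to be 16-byte aligned and rejects one that is
only known to be 8-byte aligned.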

gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (struct tune_params): Add
appropriate enums for the policies.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
options.
* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
function to parse ldp-policy option.
(aarch64_parse_stp_policy): New function to parse stp-policy option.
(aarch64_override_options_internal): Call parsing functions.
(aarch64_operands_ok_for_ldpstp): Add option-value check and
alignment check and remove superseded ones.
(aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
alignment check and remove superseded ones.
* config/aarch64/aarch64.opt: Add options.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldp_aligned.c: New test.
* gcc.target/aarch64/ldp_always.c: New test.
* gcc.target/aarch64/ldp_never.c: New test.
* gcc.target/aarch64/stp_aligned.c: New test.
* gcc.target/aarch64/stp_always.c: New test.
* gcc.target/aarch64/stp_never.c: New test.

Signed-off-by: Manos Anagnostakis 
---

 gcc/config/aarch64/aarch64-protos.h   |  24 ++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
 gcc/config/aarch64/aarch64.cc | 229 ++
 gcc/config/aarch64/aarch64.opt|   8 +
 .../gcc.target/aarch64/ldp_aligned.c  |  64 +
 gcc/testsuite/gcc.target/aarch64/ldp_always.c |  64 +
 gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  64 +
 .../gcc.target/aarch64/stp_aligned.c  |  60 +
 gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
 gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
 10 files changed, 580 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 70303d6fd95..be1d73490ed 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -568,6 +568,30 @@ struct tune_params
   /* Place prefetch struct pointer at the end to enable type checking
  errors when tune_params misses elements (e.g., from erroneous merges).  */
   const struct cpu_prefetch_tune *prefetch;
+/* An enum specifying how to handle load pairs using a fine-grained policy:
+   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
+   - LDP_POLICY_NEVER: Do not emit ldp.  */
+
+  enum aarch64_ldp_policy_model
+  {
+LDP_POLICY_ALIGNED,
+LDP_POLICY_ALWAYS,
+LDP_POLICY_NEVER
+  } ldp_policy_model;
+/* An enum specifying how to handle store pairs using a fine-grained policy:
+   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
+   - STP_POLICY_NEVER: Do not emit stp.  */
+
+  enum aarch64_stp_policy_model
+  {
+STP_POLICY_ALIGNED,
+STP_POLICY_ALWAYS,
+STP_POLICY_NEVER
+  } stp_policy_model;
 };
 
 /* Classifies an address.
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 52112ba7c48..774568e9106 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def

[PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-18 Thread Richard Biener via Gcc-patches

The following avoids running into somehow flawed logic in fold_vec_perm
for non-VLA vectors.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

PR tree-optimization/111048
* fold-const.cc (fold_vec_perm_cst): Check for non-VLA
vectors first.

* gcc.dg/torture/pr111048.c: New testcase.
---
 gcc/fold-const.cc   | 12 ++--
 gcc/testsuite/gcc.dg/torture/pr111048.c | 24 
 2 files changed, 30 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 5c51c9d91be..144fd7481b3 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
const vec_perm_indices &sel,
   unsigned res_npatterns, res_nelts_per_pattern;
   unsigned HOST_WIDE_INT res_nelts;
 
+  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
+{
+  res_npatterns = res_nelts;
+  res_nelts_per_pattern = 1;
+}
   /* (1) If SEL is a suitable mask as determined by
  valid_mask_for_fold_vec_perm_cst_p, then:
  res_npatterns = max of npatterns between ARG0, ARG1, and SEL
@@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
const vec_perm_indices &sel,
  res_npatterns = nelts in result vector.
  res_nelts_per_pattern = 1.
  This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
-  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
+  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
 {
   res_npatterns
= std::max (VECTOR_CST_NPATTERNS (arg0),
@@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, 
const vec_perm_indices &sel,
 
   res_nelts = res_npatterns * res_nelts_per_pattern;
 }
-  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
-{
-  res_npatterns = res_nelts;
-  res_nelts_per_pattern = 1;
-}
   else
 return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c 
b/gcc/testsuite/gcc.dg/torture/pr111048.c
new file mode 100644
index 000..475978aae2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+typedef unsigned char u8;
+
+__attribute__((noipa))
+static void check(const u8 * v) {
+if (*v != 15) __builtin_trap();
+}
+
+__attribute__((noipa))
+static void bug(void) {
+u8 in_lanes[32];
+for (unsigned i = 0; i < 32; i += 2) {
+  in_lanes[i + 0] = 0;
+  in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
+}
+
+check(&in_lanes[13]);
+  }
+
+int main() {
+bug();
+}
-- 
2.35.3
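
For readers wondering why the testcase above expects the value 15, the
arithmetic can be checked with a scalar reference.  The helper below is
only an illustrative sketch; the name expected_lane is hypothetical and
does not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for the initialisation loop in pr111048.c: even
   indices are set to 0, and the odd index i + 1 is set to
   0xff >> (i & 7), where i is the even loop counter.  */
static uint8_t expected_lane(unsigned idx)
{
    assert(idx < 32);
    if ((idx & 1) == 0)
        return 0;                       /* in_lanes[i + 0] = 0 */
    unsigned i = idx - 1;               /* loop counter that wrote idx */
    return (uint8_t)(0xffu >> (i & 7)); /* in_lanes[i + 1] */
}
```

Index 13 is written when i is 12; 12 & 7 is 4, and 0xff >> 4 is 0x0f,
which is 15.  A miscompiled constant fold of the vectorised stores
would leave a different byte there and trip __builtin_trap.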

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches

On Fri, 2023-08-18 at 15:05 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-08-18 at 14:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> > > On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > > > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > > > 
> > > > > So I guess we just need
> > > > > 
> > > > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > > 
> > > > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > > > the
> > > > > "q" builtins.
> > > > > 
> > > > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > > > to
> > > > > compile their old code with minimal change.  Is it acceptable to
> > > > > add
> > > > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > > > does
> > > > > not mean we'll only compile post-C2x-era programs onto it.
> > > > 
> > > > The powerpc support for __float128 started in GCC 6, predating the
> > > > support 
> > > > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > > > why 
> > > > there's such backwards compatibility support there.  That name only
> > > > exists 
> > > > on a few architectures.
> > > > 
> > > > If people really want to compile code using the old __float128 names
> > > > for 
> > > > LoongArch I suppose you could have such #defines, but it would be
> > > > better 
> > > > for people to make their code use the standard names (as supported
> > > > from 
> > > > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > > > backwards 
> > > > compatibility in their code for using the __float128 names if they
> > > > want to 
> > > > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > > > before 
> > > > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > > > compatibility 
> > > > in user code is more likely to be relevant for C++ than for C, given
> > > > how 
> > > > the C++ support was added to GCC much more recently.  (Note: I
> > > > haven't 
> > > > checked when other compilers added support for the _Float128 name or
> > > > associated built-in functions, whether for C or for C++, which might
> > > > also 
> > > > affect when user code wants such compatibility.)
> > > > 
> > > Thank you for your valuable comments. On the LoongArch architecture,
> > > the "__float128" type is associated with float128_type_node and the "q"
> > > suffix function is mapped to the "f128" function. This allows
> > > compatibility with both "__float128" and "_Float128" types in the GCC
> > > compiler. The new code is modified as follows:
> > >   Add the following to the loongarch-builtins.c file:
> > > +lang_hooks.types.register_builtin_type (float128_type_node,
> > > "__float128");
> > >   Add the following to the loongarch-c.c file:
> > > +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> > > +builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > +builtin_define ("__builtin_nansq=__builtin_nansf128");
> > > +builtin_define ("__builtin_infq=__builtin_inff128");
> > > +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> > > 
> > >  The regression tests of the six functions were added without problems.
> > > However, the implementation of the __builtin_nansq() function does not
> > > get the result we want. The questions are as follows:
> > >  x86_64:
> > >     _Float128 ret=__builtin_nansf128("NAN");
> > > 
> > >     compiled to (with gcc test.c -O2 ):
> > > .cfi_offset 1, -8
> > > bl  %plt(__builtin_nansf128)
> > >     ..
> > >  LoongArch:
> > >     _Float128 ret=__builtin_nansf128("NAN");
> > >   compiled to (with gcc test.c -O2 ):
> > > .cfi_offset 1, -8
> > > bl  %plt(__builtin_nansf128)
> > 
> > It seems wrong.  It should be "bl %plt(nansf128)" instead, without the
> > __builtin_ prefix so the implementation in libm (from Glibc) will be
> > used instead.  AFAIK __builtin_nan and __builtin_nans are rarely called
> > with a non-empty tagp so it's not worthy to inline the implementation
> > for non-empty tagp here.
> > 
> > The same issue happens on x86_64:
> > 
> > call    __builtin_nansf128@PLT
> > 
> > __builtin_nanf128 compiles correct:
> > 
> > call    nanf128@PLT
> > 
> > I'll see if there is a ticket in https://gcc.gnu.org/bugzilla.  If not
> > I'll create one.

https://gcc.gnu.org/PR111058

> Alright, Glibc does not have a "nansf128" function yet.  Actually there
> is even no "nans" function for the plain double type.  So even a plain
> __builtin_nans("114") won't work too.
> 
> If we'll fix this, we need to do it in a generic, target-independent way
> (i. e. fix it all at once for all targets).
> 
> So for now, and for LoongArch specific code, the proper thing to do is
> aliasing float128_type_node as __float128 and the six
> __builtin_define's.
> 
>

Re: [PATCH] RISC-V: Fix -march error of zhinxmin testcases

2023-08-18 Thread Lehua Ding


On 2023/8/18 14:39, Robin Dapp wrote:
> > This little patch fixes the -march error of a zhinxmin testcase I added earlier
> > and an old zhinxmin testcase, since these testcases are for zhinxmin extension
> > and not zfhmin extension.
>
> Arg, I should have noticed that ;)
> OK, of course.
>
> Regards
>  Robin

Committed, thanks Robin.

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches

On Fri, 2023-08-18 at 14:58 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> > On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > > 
> > > > So I guess we just need
> > > > 
> > > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > > 
> > > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > > the
> > > > "q" builtins.
> > > > 
> > > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > > to
> > > > compile their old code with minimal change.  Is it acceptable to
> > > > add
> > > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > > does
> > > > not mean we'll only compile post-C2x-era programs onto it.
> > > 
> > > The powerpc support for __float128 started in GCC 6, predating the
> > > support 
> > > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > > why 
> > > there's such backwards compatibility support there.  That name only
> > > exists 
> > > on a few architectures.
> > > 
> > > If people really want to compile code using the old __float128 names
> > > for 
> > > LoongArch I suppose you could have such #defines, but it would be
> > > better 
> > > for people to make their code use the standard names (as supported
> > > from 
> > > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > > backwards 
> > > compatibility in their code for using the __float128 names if they
> > > want to 
> > > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > > before 
> > > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > > compatibility 
> > > in user code is more likely to be relevant for C++ than for C, given
> > > how 
> > > the C++ support was added to GCC much more recently.  (Note: I
> > > haven't 
> > > checked when other compilers added support for the _Float128 name or
> > > associated built-in functions, whether for C or for C++, which might
> > > also 
> > > affect when user code wants such compatibility.)
> > > 
> > Thank you for your valuable comments. On the LoongArch architecture,
> > the "__float128" type is associated with float128_type_node and the "q"
> > suffix function is mapped to the "f128" function. This allows
> > compatibility with both "__float128" and "_Float128" types in the GCC
> > compiler. The new code is modified as follows:
> >   Add the following to the loongarch-builtins.c file:
> > +lang_hooks.types.register_builtin_type (float128_type_node,
> > "__float128");
> >   Add the following to the loongarch-c.c file:
> > +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> > +builtin_define ("__builtin_nanq=__builtin_nanf128");
> > +builtin_define ("__builtin_nansq=__builtin_nansf128");
> > +builtin_define ("__builtin_infq=__builtin_inff128");
> > +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> > 
> >  The regression tests of the six functions were added without problems.
> > However, the implementation of the __builtin_nansq() function does not
> > get the result we want. The questions are as follows:
> >  x86_64:
> >     _Float128 ret=__builtin_nansf128("NAN");
> > 
> >     compiled to (with gcc test.c -O2 ):
> > .cfi_offset 1, -8
> > bl  %plt(__builtin_nansf128)
> >     ..
> >  LoongArch:
> >     _Float128 ret=__builtin_nansf128("NAN");
> >   compiled to (with gcc test.c -O2 ):
> > .cfi_offset 1, -8
> > bl  %plt(__builtin_nansf128)
> 
> It seems wrong.  It should be "bl %plt(nansf128)" instead, without the
> __builtin_ prefix so the implementation in libm (from Glibc) will be
> used instead.  AFAIK __builtin_nan and __builtin_nans are rarely called
> with a non-empty tagp so it's not worthy to inline the implementation
> for non-empty tagp here.
> 
> The same issue happens on x86_64:
> 
> call    __builtin_nansf128@PLT
> 
> __builtin_nanf128 compiles correct:
> 
> call    nanf128@PLT
> 
> I'll see if there is a ticket in https://gcc.gnu.org/bugzilla.  If not
> I'll create one.

Alright, Glibc does not have a "nansf128" function yet.  Actually there
is not even a "nans" function for the plain double type.  So even a
plain __builtin_nans("114") won't work either.

If we fix this, we need to do it in a generic, target-independent way
(i.e. fix it all at once for all targets).

So for now, and for LoongArch-specific code, the proper thing to do is
to alias float128_type_node as __float128 and add the six
builtin_define calls.

Please commit them to trunk if the regression tests pass.  You also
need to add LoongArch as a target supporting __float128 in extend.texi.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread Xi Ruoyao via Gcc-patches

On Fri, 2023-08-18 at 14:39 +0800, chenxiaolong wrote:
> On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> > On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> > 
> > > So I guess we just need
> > > 
> > > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > > 
> > > etc. to map the "q" builtins to "f128" builtins if we really need
> > > the
> > > "q" builtins.
> > > 
> > > Joseph: the problem here is many customers of LoongArch CPUs wish
> > > to
> > > compile their old code with minimal change.  Is it acceptable to
> > > add
> > > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > > does
> > > not mean we'll only compile post-C2x-era programs onto it.
> > 
> > The powerpc support for __float128 started in GCC 6, predating the
> > support 
> > for _FloatN type names, built-in functions etc. in GCC 7 - that's
> > why 
> > there's such backwards compatibility support there.  That name only
> > exists 
> > on a few architectures.
> > 
> > If people really want to compile code using the old __float128 names
> > for 
> > LoongArch I suppose you could have such #defines, but it would be
> > better 
> > for people to make their code use the standard names (as supported
> > from 
> > GCC 7 onwards, though only from GCC 13 in C++) and then put
> > backwards 
> > compatibility in their code for using the __float128 names if they
> > want to 
> > support the type with older GCC (GCC 6 or before for C; GCC 12 or
> > before 
> > for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> > compatibility 
> > in user code is more likely to be relevant for C++ than for C, given
> > how 
> > the C++ support was added to GCC much more recently.  (Note: I
> > haven't 
> > checked when other compilers added support for the _Float128 name or
> > associated built-in functions, whether for C or for C++, which might
> > also 
> > affect when user code wants such compatibility.)
> > 
> Thank you for your valuable comments. On the LoongArch architecture,
> the "__float128" type is associated with float128_type_node and the "q"
> suffix function is mapped to the "f128" function. This allows
> compatibility with both "__float128" and "_Float128" types in the GCC
> compiler. The new code is modified as follows:
>   Add the following to the loongarch-builtins.c file:
> +lang_hooks.types.register_builtin_type (float128_type_node,
> "__float128");
>   Add the following to the loongarch-c.c file:
> +builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> +builtin_define ("__builtin_copysignq=__builtin_copysignf128");
> +builtin_define ("__builtin_nanq=__builtin_nanf128");
> +builtin_define ("__builtin_nansq=__builtin_nansf128");
> +builtin_define ("__builtin_infq=__builtin_inff128");
> +builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
> 
>  The regression tests of the six functions were added without problems.
> However, the implementation of the __builtin_nansq() function does not
> get the result we want. The questions are as follows:
>  x86_64:
>     _Float128 ret=__builtin_nansf128("NAN");
> 
>     compiled to (with gcc test.c -O2 ):
> .cfi_offset 1, -8
> bl  %plt(__builtin_nansf128)
>     ..
>  LoongArch:
>     _Float128 ret=__builtin_nansf128("NAN");
>   compiled to (with gcc test.c -O2 ):
> .cfi_offset 1, -8
> bl  %plt(__builtin_nansf128)

It seems wrong.  It should be "bl %plt(nansf128)" instead, without the
__builtin_ prefix, so that the implementation in libm (from Glibc) is
used.  AFAIK __builtin_nan and __builtin_nans are rarely called with a
non-empty tagp, so it's not worth inlining the implementation for a
non-empty tagp here.

The same issue happens on x86_64:

call    __builtin_nansf128@PLT

__builtin_nanf128 compiles correctly:

call    nanf128@PLT

I'll see if there is a ticket in https://gcc.gnu.org/bugzilla.  If not
I'll create one.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] Support -march=gracemont

2023-08-18 Thread liuhongt via Gcc-patches

Alderlake-N is E-core only, add it as an alias of Alderlake.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Detect
Alderlake-N.
* common/config/i386/i386-common.cc (alias_table): Support
-march=gracemont as an alias of -march=alderlake.
---
 gcc/common/config/i386/cpuinfo.h  | 3 +++
 gcc/common/config/i386/i386-common.cc | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 13102b9c5dc..941f728b48b 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -533,6 +533,9 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_COREI7;
   cpu_model->__cpu_subtype = INTEL_COREI7_TIGERLAKE;
   break;
+
+case 0xbe:
+  /* Alder Lake N, E-core only.  */
 case 0x97:
 case 0x9a:
   /* Alder Lake.  */
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 26005914079..8aa8bf12d76 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2190,6 +2190,8 @@ const pta processor_alias_table[] =
 M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2},
   {"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT,
 M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
+  {"gracemont", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
+   M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
 M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
   {"grandridge", PROCESSOR_GRANDRIDGE, CPU_HASWELL, PTA_GRANDRIDGE,
-- 
2.31.1
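
To make the effect of the two hunks easier to follow, here is a
simplified C sketch.  It is not the GCC code: intel_model_to_march and
resolve_march_alias are hypothetical helpers, and only the three model
numbers and the single alias visible in this patch are covered.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the model switch in
   get_intel_cpu: 0xbe (Alder Lake N) falls through to the existing
   Alder Lake cases 0x97 and 0x9a.  */
static const char *intel_model_to_march(unsigned model)
{
    switch (model) {
    case 0xbe:  /* Alder Lake N, E-core only.  */
    case 0x97:
    case 0x9a:  /* Alder Lake.  */
        return "alderlake";
    default:
        return "unknown";
    }
}

/* Hypothetical alias lookup: "gracemont" resolves to "alderlake";
   any other name is returned unchanged.  */
static const char *resolve_march_alias(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "gracemont") == 0 ? "alderlake" : name;
}
```

In this sketch, both detecting model 0xbe at run time and passing the
gracemont name end up at the same alderlake target.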

Re: [PATCH] RISC-V: Fix -march error of zhinxmin testcases

2023-08-18 Thread Robin Dapp via Gcc-patches

> This little patch fixes the -march error of a zhinxmin testcase I added earlier
> and an old zhinxmin testcase, since these testcases are for zhinxmin extension
> and not zfhmin extension.

Arg, I should have noticed that ;)
OK, of course.

Regards
 Robin

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-18 Thread chenxiaolong

On Thu, 2023-08-17 at 15:08 +, Joseph Myers wrote:
> On Thu, 17 Aug 2023, Xi Ruoyao via Gcc-patches wrote:
> 
> > So I guess we just need
> > 
> > builtin_define ("__builtin_fabsq=__builtin_fabsf128");
> > builtin_define ("__builtin_nanq=__builtin_nanf128");
> > 
> > etc. to map the "q" builtins to "f128" builtins if we really need
> > the
> > "q" builtins.
> > 
> > Joseph: the problem here is many customers of LoongArch CPUs wish
> > to
> > compile their old code with minimal change.  Is it acceptable to
> > add
> > these builtin_define's like rs6000-c.cc?  Note "a new architecture"
> > does
> > not mean we'll only compile post-C2x-era programs onto it.
> 
> The powerpc support for __float128 started in GCC 6, predating the
> support 
> for _FloatN type names, built-in functions etc. in GCC 7 - that's
> why 
> there's such backwards compatibility support there.  That name only
> exists 
> on a few architectures.
> 
> If people really want to compile code using the old __float128 names
> for 
> LoongArch I suppose you could have such #defines, but it would be
> better 
> for people to make their code use the standard names (as supported
> from 
> GCC 7 onwards, though only from GCC 13 in C++) and then put
> backwards 
> compatibility in their code for using the __float128 names if they
> want to 
> support the type with older GCC (GCC 6 or before for C; GCC 12 or
> before 
> for C++) on x86_64 / i386 / powerpc / ia64.  Such backwards
> compatibility 
> in user code is more likely to be relevant for C++ than for C, given
> how 
> the C++ support was added to GCC much more recently.  (Note: I
> haven't 
> checked when other compilers added support for the _Float128 name or 
> associated built-in functions, whether for C or for C++, which might
> also 
> affect when user code wants such compatibility.)
> 
Thank you for your valuable comments. On the LoongArch architecture,
the "__float128" type is associated with float128_type_node and the "q"
suffix function is mapped to the "f128" function. This allows
compatibility with both "__float128" and "_Float128" types in the GCC
compiler. The new code is modified as follows:
  Add the following to the loongarch-builtins.c file:
+lang_hooks.types.register_builtin_type (float128_type_node,
"__float128");
  Add the following to the loongarch-c.c file:
+builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+builtin_define ("__builtin_nanq=__builtin_nanf128");
+builtin_define ("__builtin_nansq=__builtin_nansf128");
+builtin_define ("__builtin_infq=__builtin_inff128");
+builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");

Regression tests for the six functions were added without problems.
However, the __builtin_nansq() function does not produce the result we
want. The problem is as follows:
 x86_64:
_Float128 ret=__builtin_nansf128("NAN");

compiled to (with gcc test.c -O2 ):
.cfi_offset 1, -8
bl  %plt(__builtin_nansf128)
..
 LoongArch:
_Float128 ret=__builtin_nansf128("NAN");
  compiled to (with gcc test.c -O2 ):
.cfi_offset 1, -8
bl  %plt(__builtin_nansf128)
..
   Apparently the implementation has had problems since
"_Float128 __builtin_nansf128()" was first supported. It does not work
on architectures including LoongArch, x86_64 and arm, and the remaining
architectures are untested.
   I will continue to follow up on the implementation of the builtin
function and complete it.
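
The user-side compatibility that Joseph Myers describes in the quoted
text (prefer the standard _Float128 name and fall back to __float128
for older compilers) could look roughly like the C sketch below.  The
names quad_t, quad_abs and quad_selfcheck are hypothetical.  Using
__FLT128_MANT_DIG__ and __SIZEOF_FLOAT128__ as feature tests is an
assumption that should hold for GCC on x86, but other targets and
compilers may need different checks.

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__) && !defined(__cplusplus) \
    && defined(__FLT128_MANT_DIG__)
/* GCC 7 or later compiling C: use the standard spelling and builtin.  */
typedef _Float128 quad_t;
static quad_t quad_abs(quad_t x) { return __builtin_fabsf128(x); }
#elif defined(__SIZEOF_FLOAT128__)
/* Only the legacy name is assumed to exist here.  The plain comparison
   ignores the sign of zero and of NaN, unlike a real fabs.  */
typedef __float128 quad_t;
static quad_t quad_abs(quad_t x) { return x < 0 ? -x : x; }
#else
/* No 128-bit type detected: long double keeps the sketch compilable.  */
typedef long double quad_t;
static quad_t quad_abs(quad_t x) { return x < 0 ? -x : x; }
#endif

/* Small usage example: the same call works whichever branch was taken.  */
static void quad_selfcheck(void)
{
    assert(quad_abs((quad_t)-2.5) == (quad_t)2.5);
}
```

Code written against quad_t and quad_abs does not need the "q" builtin
names at all, which is the situation the builtin_define mapping is
meant to handle for code that cannot be changed.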

Re: [PATCH] Document cond_neg, cond_one_cmpl, cond_len_neg and cond_len_one_cmpl standard patterns

2023-08-18 Thread Richard Biener via Gcc-patches

On Thu, Aug 17, 2023 at 9:26 PM Andrew Pinski via Gcc-patches
 wrote:
>
> When I added `cond_one_cmpl` (and the corresponding IFN) I noticed that the
> cond_neg standard named pattern was not documented; this adds the
> documentation for all 4 named patterns now.
>
> OK? Tested by building the manual.

OK.

> gcc/ChangeLog:
>
> * doc/md.texi (Standard patterns): Document cond_neg, cond_one_cmpl,
> cond_len_neg and cond_len_one_cmpl.
> ---
>  gcc/doc/md.texi | 62 +
>  1 file changed, 62 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 70590e68ffe..89562fdb43c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7194,6 +7194,40 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the
>  comparison in operand 1.  If the comparison is false, operand 2 is moved into
>  operand 0, otherwise (operand 2 + operand 3) is moved.
>
> +@cindex @code{cond_neg@var{mode}} instruction pattern
> +@cindex @code{cond_one_cmpl@var{mode}} instruction pattern
> +@item @samp{cond_neg@var{mode}}
> +@itemx @samp{cond_one_cmpl@var{mode}}
> +When operand 1 is true, perform an operation on operands 2 and
> +store the result in operand 0, otherwise store operand 3 in operand 0.
> +The operation works elementwise if the operands are vectors.
> +
> +The scalar case is equivalent to:
> +
> +@smallexample
> +op0 = op1 ? @var{op} op2 : op3;
> +@end smallexample
> +
> +while the vector case is equivalent to:
> +
> +@smallexample
> +for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
> +  op0[i] = op1[i] ? @var{op} op2[i] : op3[i];
> +@end smallexample
> +
> +where, for example, @var{op} is @code{~} for @samp{cond_one_cmpl@var{mode}}.
> +
> +When defined for floating-point modes, the contents of @samp{op2[i]}
> +are not interpreted if @samp{op1[i]} is false, just like they would not
> +be in a normal C @samp{?:} condition.
> +
> +Operands 0, 2, and 3 all have mode @var{m}.  Operand 1 is a scalar
> +integer if @var{m} is scalar, otherwise it has the mode returned by
> +@code{TARGET_VECTORIZE_GET_MASK_MODE}.
> +
> +@samp{cond_@var{op}@var{mode}} generally corresponds to a conditional
> +form of @samp{@var{op}@var{mode}2}.
> +
>  @cindex @code{cond_add@var{mode}} instruction pattern
>  @cindex @code{cond_sub@var{mode}} instruction pattern
>  @cindex @code{cond_mul@var{mode}} instruction pattern
> @@ -7281,6 +7315,34 @@ for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
>op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];
>  @end smallexample
>
> +@cindex @code{cond_len_neg@var{mode}} instruction pattern
> +@cindex @code{cond_len_one_cmpl@var{mode}} instruction pattern
> +@item @samp{cond_len_neg@var{mode}}
> +@itemx @samp{cond_len_one_cmpl@var{mode}}
> +When operand 1 is true and element index < operand 4 + operand 5, perform an operation on operands 1 and
> +store the result in operand 0, otherwise store operand 2 in operand 0.
> +The operation only works for the operands are vectors.
> +
> +@smallexample
> +for (i = 0; i < ops[4] + ops[5]; i++)
> +  op0[i] = op1[i] ? @var{op} op2[i] : op3[i];
> +@end smallexample
> +
> +where, for example, @var{op} is @code{~} for @samp{cond_len_one_cmpl@var{mode}}.
> +
> +When defined for floating-point modes, the contents of @samp{op2[i]}
> +are not interpreted if @samp{op1[i]} is false, just like they would not
> +be in a normal C @samp{?:} condition.
> +
> +Operands 0, 2, and 3 all have mode @var{m}.  Operand 1 is a scalar
> +integer if @var{m} is scalar, otherwise it has the mode returned by
> +@code{TARGET_VECTORIZE_GET_MASK_MODE}.  Operand 4 has whichever
> +integer mode the target prefers.
> +
> +@samp{cond_len_@var{op}@var{mode}} generally corresponds to a conditional
> +form of @samp{@var{op}@var{mode}2}.
> +
> +
>  @cindex @code{cond_len_add@var{mode}} instruction pattern
>  @cindex @code{cond_len_sub@var{mode}} instruction pattern
>  @cindex @code{cond_len_mul@var{mode}} instruction pattern
> --
> 2.31.1
>
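
To make the documented semantics concrete, here is a minimal C model of the
two kinds of pattern described in the patch above. It is only a sketch: the
function names model_cond_neg and model_cond_len_one_cmpl, the int32_t element
type and the byte-per-lane mask are assumptions chosen for illustration and are
not part of GCC. The cond_len variant assumes, following the "otherwise" wording
in the text, that lanes at or beyond operand 4 + operand 5 take the fallback
value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of cond_neg on an n-lane vector of int32_t.
   Lane i of the mask op1 selects between -op2[i] and the fallback op3[i].  */
static void
model_cond_neg (int32_t *op0, const uint8_t *op1, const int32_t *op2,
                const int32_t *op3, size_t n)
{
  for (size_t i = 0; i < n; i++)
    /* Negate in unsigned arithmetic so that INT32_MIN does not trigger
       signed-overflow undefined behaviour.  */
    op0[i] = op1[i] ? (int32_t) (0u - (uint32_t) op2[i]) : op3[i];
}

/* Hypothetical model of cond_len_one_cmpl: as above with ~ as the
   operation, but only the first len + bias lanes are eligible; the
   remaining lanes take the fallback value (an assumption, see above).  */
static void
model_cond_len_one_cmpl (int32_t *op0, const uint8_t *op1,
                         const int32_t *op2, const int32_t *op3,
                         size_t n, size_t len, int bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (size_t i = 0; i < n; i++)
    op0[i] = ((ptrdiff_t) i < active && op1[i]) ? ~op2[i] : op3[i];
}
```

With a mask of {1, 0, 1, 1}, only lanes 0, 2 and 3 are transformed; passing
len = 3 and bias = 0 to the second function additionally leaves lane 3 at its
fallback value.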

Re: [PATCH V4 1/4] rs6000: build constant via li;rotldi

2023-08-18 Thread guojiufu via Gcc-patches




Hi Segher,

As discussed on "~" vs. "-",  "~" is correct for this patch.

I updated the patch according to Kewen's comments.

If ok,  I would commit to trunk.

BR,
Jeff (Jiufu Guo)


On 2023-07-04 11:28, Kewen.Lin wrote:

Hi Jeff,

on 2023/7/4 10:18, Jiufu Guo via Gcc-patches wrote:

Hi,

If a constant is possible to be rotated to/from a positive or negative
value from "li", then "li;rotldi" can be used to build the constant.

Compare with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621961.html
This patch just did minor changes to the style and comments.

Bootstrap and regtest pass on ppc64{,le}.

Since the previous version is approved with conditions, this version
explained the concern too.  If no objection, I would like to apply
this patch to trunk.


BR,
Jeff (Jiufu)

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
	(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +--
 .../gcc.target/powerpc/const-build.c  | 57 +++
 2 files changed, 98 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..acc332acc05 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }

+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.

Nit: different style, li is with "'" but rotldi isn't.

+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to the mask operand of rotldi(rldicl), and return true.
+   Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  /* If C or ~C contains at least 49 successive zeros, then C can be rotated
+ to/from a positive or negative value that 'li' is able to load.  */
+  int n;
+  if (can_be_rotated_to_lowbits (c, 15, &n)
+  || can_be_rotated_to_lowbits (~c, 15, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;

   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;

   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..69b37e2bb53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Verify that two instructions are sucessfully used to build constants.

s/sucessfully/successfully/

+   One insn is li or lis, another is rotate: rldicl, rldicr or rldic.  */

Nit: This patch is for insn li + insn rldicl only, you probably want to keep
consistent in the comments.

The others look good to me, thanks!

Segher had one question on "~c" before, I saw you had explained for it, it
makes sense to me, but in case he has more questions I'd defer the final
approval to him.

BR,
Kewen

Re: [PATCH] i386: Add AVX2 pragma wrapper for AVX512DQVL intrins

2023-08-18 Thread Hongtao Liu via Gcc-patches

On Fri, Aug 18, 2023 at 2:01 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch aims to fix PR111051 by making sure that AVX2
> intrins are visible to AVX512/AVX10 intrins under any circumstances.
>
> I will also apply the same fix on AVX512DQ scalar intrins.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> PR target/111051
>
> gcc/ChangeLog:
>
> * config/i386/avx512vldqintrin.h: Push AVX2 when AVX2 is
> disabled.
>
> gcc/testsuite/ChangeLog:
>
> PR target/111051
> * gcc.target/i386/pr111051-1.c: New test.
> ---
>  gcc/config/i386/avx512vldqintrin.h | 11 +++
>  gcc/testsuite/gcc.target/i386/pr111051-1.c | 11 +++
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr111051-1.c
>
> diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
> index 1fbf93a0b52..db900ebf467 100644
> --- a/gcc/config/i386/avx512vldqintrin.h
> +++ b/gcc/config/i386/avx512vldqintrin.h
> @@ -28,6 +28,12 @@
>  #ifndef _AVX512VLDQINTRIN_H_INCLUDED
>  #define _AVX512VLDQINTRIN_H_INCLUDED
>
> +#if !defined(__AVX2__)
> +#pragma GCC push_options
> +#pragma GCC target("avx2")
> +#define __DISABLE_AVX2__
> +#endif /* __AVX2__ */
> +
>  extern __inline __m256i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_cvttpd_epi64 (__m256d __A)
> @@ -2002,4 +2008,9 @@ _mm256_maskz_insertf64x2 (__mmask8 __U, __m256d __A, __m128d __B,
>
>  #endif
>
> +#ifdef __DISABLE_AVX2__
> +#undef __DISABLE_AVX2__
> +#pragma GCC pop_options
> +#endif /* __DISABLE_AVX2__ */
> +
>  #endif /* _AVX512VLDQINTRIN_H_INCLUDED */
> diff --git a/gcc/testsuite/gcc.target/i386/pr111051-1.c b/gcc/testsuite/gcc.target/i386/pr111051-1.c
> new file mode 100644
> index 000..973007043cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr111051-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +
> +#include <immintrin.h>
> +
> +#pragma GCC target("avx512vl,avx512dq")
> +
> +void foo (__m256i i)
> +{
> +  volatile __m256d v1 = _mm256_cvtepi64_pd (i);
> +}
> +
> --
> 2.31.1
>


-- 
BR,
Hongtao

[PATCH] i386: Add AVX2 pragma wrapper for AVX512DQVL intrins

2023-08-18 Thread Haochen Jiang via Gcc-patches

Hi all,

This patch aims to fix PR111051 by making sure that AVX2
intrins are visible to AVX512/AVX10 intrins under any circumstances.

I will also apply the same fix on AVX512DQ scalar intrins.

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

PR target/111051

gcc/ChangeLog:

* config/i386/avx512vldqintrin.h: Push AVX2 when AVX2 is
disabled.

gcc/testsuite/ChangeLog:

PR target/111051
* gcc.target/i386/pr111051-1.c: New test.
---
 gcc/config/i386/avx512vldqintrin.h | 11 +++
 gcc/testsuite/gcc.target/i386/pr111051-1.c | 11 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111051-1.c

diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
index 1fbf93a0b52..db900ebf467 100644
--- a/gcc/config/i386/avx512vldqintrin.h
+++ b/gcc/config/i386/avx512vldqintrin.h
@@ -28,6 +28,12 @@
 #ifndef _AVX512VLDQINTRIN_H_INCLUDED
 #define _AVX512VLDQINTRIN_H_INCLUDED
 
+#if !defined(__AVX2__)
+#pragma GCC push_options
+#pragma GCC target("avx2")
+#define __DISABLE_AVX2__
+#endif /* __AVX2__ */
+
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvttpd_epi64 (__m256d __A)
@@ -2002,4 +2008,9 @@ _mm256_maskz_insertf64x2 (__mmask8 __U, __m256d __A, __m128d __B,
 
 #endif
 
+#ifdef __DISABLE_AVX2__
+#undef __DISABLE_AVX2__
+#pragma GCC pop_options
+#endif /* __DISABLE_AVX2__ */
+
 #endif /* _AVX512VLDQINTRIN_H_INCLUDED */
diff --git a/gcc/testsuite/gcc.target/i386/pr111051-1.c b/gcc/testsuite/gcc.target/i386/pr111051-1.c
new file mode 100644
index 000..973007043cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111051-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+#include <immintrin.h>
+
+#pragma GCC target("avx512vl,avx512dq")
+
+void foo (__m256i i)
+{
+  volatile __m256d v1 = _mm256_cvtepi64_pd (i);
+}
+
-- 
2.31.1
