Re: [PATCH] Predefine __STRICT_ALIGN__ if STRICT_ALIGNMENT

2024-03-16 Thread Sam James
YunQiang Su  writes:

> Arm32 predefines __ARM_FEATURE_UNALIGNED if -mno-unaligned-access,
> and RISC-V predefines __riscv_misaligned_avoid, while other ports
> that support -mstrict-align/-mno-unaligned-access don't have such
> macro, and these backend macros are only avaiable for c-family.
> Note: Arm64 always predefine __ARM_FEATURE_UNALIGNED: See #111555.

I would say tag the bug even if you're not fixing it, as it was related
enough for you to cite it.

>
> Let's add a generic one.
>
> __STRICT_ALIGN__ is used instead of __STRICT_ALIGNMENT__, due to that
> the later is used by some softwares, such as lzo2, syslinux etc.
>
> gcc
>   * cppbuiltin.cc: Predefine __STRICT_ALIGNMENT__ if
>   STRICT_ALIGNMENT.
> ---
>  gcc/cppbuiltin.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/cppbuiltin.cc b/gcc/cppbuiltin.cc
> index c4bfc2917dc..d32efdf9a07 100644
> --- a/gcc/cppbuiltin.cc
> +++ b/gcc/cppbuiltin.cc
> @@ -123,6 +123,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader 
> *pfile)
>  
>cpp_define_formatted (pfile, "__FINITE_MATH_ONLY__=%d",
>   flag_finite_math_only);
> +
> +  if (STRICT_ALIGNMENT)
> +cpp_define (pfile, "__STRICT_ALIGNMENT__");
>  }


[PATCH] Predefine __STRICT_ALIGN__ if STRICT_ALIGNMENT

2024-03-16 Thread YunQiang Su
Arm32 predefines __ARM_FEATURE_UNALIGNED if -mno-unaligned-access,
and RISC-V predefines __riscv_misaligned_avoid, while other ports
that support -mstrict-align/-mno-unaligned-access don't have such
macro, and these backend macros are only avaiable for c-family.
Note: Arm64 always predefine __ARM_FEATURE_UNALIGNED: See #111555.

Let's add a generic one.

__STRICT_ALIGN__ is used instead of __STRICT_ALIGNMENT__, due to that
the later is used by some softwares, such as lzo2, syslinux etc.

gcc
* cppbuiltin.cc: Predefine __STRICT_ALIGNMENT__ if
STRICT_ALIGNMENT.
---
 gcc/cppbuiltin.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cppbuiltin.cc b/gcc/cppbuiltin.cc
index c4bfc2917dc..d32efdf9a07 100644
--- a/gcc/cppbuiltin.cc
+++ b/gcc/cppbuiltin.cc
@@ -123,6 +123,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader 
*pfile)
 
   cpp_define_formatted (pfile, "__FINITE_MATH_ONLY__=%d",
flag_finite_math_only);
+
+  if (STRICT_ALIGNMENT)
+cpp_define (pfile, "__STRICT_ALIGNMENT__");
 }
 
 
-- 
2.39.2



Re: [PATCH] c-c++-common/Wrestrict.c: fix some typos and enable for LLP64

2024-03-16 Thread NightStrike
On Sat, Mar 2, 2024, 08:24 Jonathan Yong <10wa...@gmail.com> wrote:

> On 2/15/24 14:08, Jonathan Yong wrote:
> > Attached patch OK?
> >
> > Copy/pasted for review convenience.
> Ping.
>

Pinging this for Jon. It's been a couple months, and this should be mostly
obvious if someone has a spare minute.

>


[pushed] Darwin: Fix bootstrap break on X86.

2024-03-16 Thread Iain Sandoe
Tested on x86_64-darwin21, pushed to the branch, thanks
Iain

--- 8< ---

The changes in r12-9536-gefcca6481eab18 mangled the whitespace in
the ENDFILE_SPEC i386/darwin.h leading to a bootstrap fail.

gcc/ChangeLog:

* config/i386/darwin.h (ENDFILE_SPEC): Fix whitespace.

Signed-off-by: Iain Sandoe 
---
 gcc/config/i386/darwin.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 2f773924d6e..12cdc34a19e 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -109,8 +109,8 @@ along with GCC; see the file COPYING3.  If not see
 "%{!force_cpusubtype_ALL:-force_cpusubtype_ALL} "
 
 #undef ENDFILE_SPEC
-#define ENDFILE_SPEC
-\  
"%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}}
 \
+#define ENDFILE_SPEC \
+  
"%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}}
 \
%{mpc32:crtprec32.o%s} \
%{mpc64:crtprec64.o%s} \
%{mpc80:crtprec80.o%s}" TM_DESTRUCTOR
-- 
2.39.2 (Apple Git-143)



[PATCH] LoongArch: gcc13: Implement option save/restore.

2024-03-16 Thread Lulu Cheng
LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.

We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).

PR target/113233

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch-opts.cc
(loongarch_update_gcc_opt_status): Update the value of the
la_target to global_options.
* config/loongarch/loongarch-opts.h
(loongarch_update_gcc_opt_status): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Call the function
loongarch_update_gcc_opt_status.
(loongarch_option_save): New functions.
(loongarch_option_restore): Likewise.
(TARGET_OPTION_SAVE): Define macro.
(TARGET_OPTION_RESTORE): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
---
 gcc/config/loongarch/genopts/loongarch.opt.in | 24 ++---
 gcc/config/loongarch/loongarch-opts.cc| 22 
 gcc/config/loongarch/loongarch-opts.h |  6 
 gcc/config/loongarch/loongarch.cc | 34 +--
 gcc/config/loongarch/loongarch.opt| 24 ++---
 5 files changed, 84 insertions(+), 26 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 76acd35d39c..aea4f2a4f61 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -58,7 +58,7 @@ EnumValue
 Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64)
 
 m@@OPTSTR_ISA_EXT_FPU@@=
-Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_ISA_EXT_FPU@@=FPU   Generate code for the given FPU.
 
 m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@
@@ -92,11 +92,11 @@ EnumValue
 Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464)
 
 m@@OPTSTR_ARCH@@=
-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_ARCH@@=PROCESSORGenerate code for the given PROCESSOR ISA.
 
 m@@OPTSTR_TUNE@@=
-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_TUNE@@=PROCESSORGenerate optimized code for PROCESSOR.
 
 
@@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
 
 mbranch-cost=
-Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
+Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
 mcheck-zero-division
-Target Mask(CHECK_ZERO_DIV)
+Target Mask(CHECK_ZERO_DIV) Save
 Trap on integer divide by zero.
 
 mcond-move-int
-Target Var(TARGET_COND_MOVE_INT) Init(1)
+Target Var(TARGET_COND_MOVE_INT) Init(1) Save
 Conditional moves for integral are enabled.
 
 mcond-move-float
-Target Var(TARGET_COND_MOVE_FLOAT) Init(1)
+Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save
 Conditional moves for float are enabled.
 
 mmemcpy
-Target Mask(MEMCPY)
+Target Mask(MEMCPY) Save
 Prevent optimizing block moves, which is also the default behavior of -Os.
 
 mstrict-align
-Target Var(TARGET_STRICT_ALIGN) Init(0)
+Target Var(TARGET_STRICT_ALIGN) Init(0) Save
 Do not generate unaligned memory accesses.
 
 mmax-inline-memcpy-size=
-Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024)
+Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024) Save
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 mexplicit-relocs
@@ -182,11 +182,11 @@ EnumValue
 Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME)
 
 mcmodel=
-Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL)
+Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL) Save
 Specify the code model.
 
 mdirect-extern-access
-Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
+Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0) Save
 Avoid using the GOT to access external symbols.
 
 mrelax
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index a52e25236ea..e158de9a12f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -594,3 +594,25 @@ multilib_enabled_abi_list ()
 
   return XOBFINISH (&msg_obstack, const char *);
 }
+
+/* option status 

[PATCH] LoongArch: gcc12: Implement option save/restore.

2024-03-16 Thread Lulu Cheng
LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.

We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).

PR target/113233

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch-opts.cc
(loongarch_update_gcc_opt_status): Update the value of the
la_target to global_options.
* config/loongarch/loongarch-opts.h
(loongarch_update_gcc_opt_status): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Call the function
loongarch_update_gcc_opt_status.
(loongarch_option_save): New functions.
(loongarch_option_restore): Likewise.
(TARGET_OPTION_SAVE): Define macro.
(TARGET_OPTION_RESTORE): Likewise.
* config/loongarch/loongarch.opt: Regenerate.
---
 gcc/config/loongarch/genopts/loongarch.opt.in | 22 ++--
 gcc/config/loongarch/loongarch-opts.cc| 22 
 gcc/config/loongarch/loongarch-opts.h |  6 
 gcc/config/loongarch/loongarch.cc | 34 +--
 gcc/config/loongarch/loongarch.opt| 22 ++--
 5 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 420a3941b3b..a3107cb2294 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -58,7 +58,7 @@ EnumValue
 Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64)
 
 m@@OPTSTR_ISA_EXT_FPU@@=
-Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_ISA_EXT_FPU@@=FPU   Generate code for the given FPU.
 
 m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@
@@ -92,11 +92,11 @@ EnumValue
 Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464)
 
 m@@OPTSTR_ARCH@@=
-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_ARCH@@=PROCESSORGenerate code for the given PROCESSOR ISA.
 
 m@@OPTSTR_TUNE@@=
-Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN)
+Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) 
Init(M_OPTION_NOT_SEEN) Save
 -m@@OPTSTR_TUNE@@=PROCESSORGenerate optimized code for PROCESSOR.
 
 
@@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN
 
 
 mbranch-cost=
-Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
+Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
 mcheck-zero-division
-Target Mask(CHECK_ZERO_DIV)
+Target Mask(CHECK_ZERO_DIV) Save
 Trap on integer divide by zero.
 
 mcond-move-int
-Target Var(TARGET_COND_MOVE_INT) Init(1)
+Target Var(TARGET_COND_MOVE_INT) Init(1) Save
 Conditional moves for integral are enabled.
 
 mcond-move-float
-Target Var(TARGET_COND_MOVE_FLOAT) Init(1)
+Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save
 Conditional moves for float are enabled.
 
 mmemcpy
-Target Mask(MEMCPY)
+Target Mask(MEMCPY) Save
 Prevent optimizing block moves, which is also the default behavior of -Os.
 
 mstrict-align
-Target Var(TARGET_STRICT_ALIGN) Init(0)
+Target Var(TARGET_STRICT_ALIGN) Init(0) Save
 Do not generate unaligned memory accesses.
 
 mmax-inline-memcpy-size=
-Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024)
+Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) 
Init(1024) Save
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 ; The code model option names for -mcmodel.
@@ -175,7 +175,7 @@ EnumValue
 Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME)
 
 mcmodel=
-Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL)
+Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) 
Init(CMODEL_NORMAL) Save
 Specify the code model.
 
 mrelax
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index eb9c2a52f9e..b55baeccd2f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -575,3 +575,25 @@ multilib_enabled_abi_list ()
 
   return XOBFINISH (&msg_obstack, const char *);
 }
+
+/* option status feedback for "gcc --help=target -Q" */
+void
+loongarch_update_gcc_opt_status (struct loongarch_target *target,
+struct gc

Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-03-16 Thread Jeff Law




On 3/16/24 11:35 AM, Vineet Gupta wrote:

... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
The helps
  - avoid an additional LUI insn
  - side benefits of not clobbering a reg

e.g.
 w/o patch w/ patch
long  | |
plus(unsigned long i) | li  a5,4096 |
{ | addia5,a5,-2032 | addi a0, a0, 2047
return i + 2064;   |add a0,a0,a5| addi a0, a0, 17
} | ret | ret

NOTE: In theory not having const in a standalone reg might seem less
   CSE friendly, but for workloads in consideration these mat are
   from very late LRA reloads and follow on GCSE is not doing much
   currently.
As you note, there's a bit of natural tension between what to expose and 
when.  There's no clear cut answer and I might argue there never will be 
given the design and implementation of various parts of the RTL pipeline.


We have some ports that expose early and just say "tough" if it's a 
minor loss in some cases.  Others choose to expose late.  We're kindof a 
mix of both in RISC-V land.   The limited offsets we've got will tend to 
make this problem a bit worse for RISC-V compared to other architectures.






The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.
Yea, this is a known sore spot.  I spent a good deal of time working on 
the PA where we only have a 5 bit displacement for FP memory ops.  You 
can assume this caused reload fits and often resulted in horrific (and 
repetitive code) code.


We did some interesting tricks to avoid the worst of the codegen issues. 
 None of those tricks really apply here since we're in an LRA world and 
there's no difference in the allowed offset in a memory reference vs an 
addi instruction.






This shaves off 290 billion dynamic instrustions (QEMU icounts) in
SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of
suite, there additional 10 billion shaved, with both gains and losses
in indiv workloads as is usual with compiler changes.

That's a fantastic result.



This should still be considered damage control as the real/deeper fix
would be to reduce number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs.
Agreed.  But I think it's worth tackling from both ends.  Generate 
better code when we do have these reloads and avoid the reloads when we can.





Implementation Details (for posterity)
--
  - basic idea is to have a splitter selected via a new predicate for constant
being possible sum of two S12 and provide the transform.
This is however a 2 -> 2 transform which combine can't handle.
So we specify it using a define_insn_and_split.
Right.  For the record it looks like a 2->2 case because of the 
mvconst_internal define_insn_and_split.


What I was talking about in the Tuesday meeting last week was the desire 
to rethink that pattern precisely because it drives us to need to 
implement patterns like yours rather than the more natural 3->2 or 4->3/2.


Furthermore, I have a suspicion that there's logicals where we're going 
to want to do something similar and if we keep the mvconst_internal 
pattern they're all going to have to be implemented as 2->2s with a 
define_insn_and_split rather than the more natural 3->2.


Having said all that, I suspect the case you're diving into is far more 
important than the logicals.






| linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
  build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
|
| add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
| Function old new   delta
| getnameinfo 27562892+136
...
| tempnam  136 144  +8
| padzero  276 284  +8
...
| __GI___register_printf_specifier 284 280  -4
| __EI_xdr_array   468 464  -4
| try_file_lock268 260  -8
| pthread_create@GLIBC_2  35203508 -12
| __pthread_create_2_135203508 -12
...
| _nss_files_setnetgrent   932 904 -28
| _nss_dns_gethostbyaddr2_r   15241480 -44
| build_trtable   33123208-104
| printf_positional  25000   22580   -2420
| Total: Before=2107999, After=2105463, chg -0.12%
That's a bi

Re: [gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-03-16 Thread Jeff Law




On 3/16/24 11:35 AM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.

This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

gcc-13.1 release   |  gcc 230823 |   |
   |g6619b3d4c15c|   This patch  |  clang/llvm
-
li  t0,-4096 | lit0,-4096  | addi  sp,sp,-2048 | addi 
sp,sp,-2048
addit0,t0,2016   | addi  t0,t0,2032| add   sp,sp,-16   | addi sp,sp,-32
li  a4,4096  | add   sp,sp,t0  | add   a5,sp,a0| add  a1,sp,16
add sp,sp,t0 | addi  a5,sp,-2032   | sbzero,0(a5)  | add  a0,a0,a1
li  a5,-4096 | add   a0,a5,a0  | addi  sp,sp,2032  | sb   zero,0(a0)
addia4,a4,-2032  | lit0, 4096  | addi  sp,sp,32| addi sp,sp,2032
add a4,a4,a5 | sbzero,2032(a0) | ret   | addi sp,sp,48
addia5,sp,16 | addi  t0,t0,-2032   |   | ret
add a5,a4,a5 | add   sp,sp,t0  |
add a0,a5,a0 | ret |
li  t0,4096  |
sd  a5,8(sp) |
sb  zero,2032(a0)|
addit0,t0,-2016  |
add sp,sp,t0 |
ret  |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.
Yea, wouldn't expect this to move the needle on spec since it's just 
hitting the prologue/epilogue.  In fact, I wouldn't be surprised if 
there were other stack frame sizes that could be improved.  But I 
wouldn't bother chasing down those other cases.


If we think about the embedded space, they're probably not going to want 
to see functions with large frames to begin with.  So optimizing those 
cases for the embedded space just doesn't make much sense.


In the distro space, by this time next year we'll be living in a world 
where stack clash mitigations are enabled.  So for any given size stack 
frame, it'll be allocated in at most 1 page chunks.   So again, going to 
any significant length to optimize other cases just doesn't make much sense.


So we probably should go with this patch in the gcc-15 space, but I 
wouldn't suggest heroic efforts for other sized stack frames.


jeff


Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-16 Thread Jeff Law




On 3/16/24 11:35 AM, Vineet Gupta wrote:

Noticed that new sum of two s12 splitter was generating following:

| 059930 :
|   59930:  add sp,sp,-16
|   59934:  lui t0,0xf
|   59938:  sd  s0,0(sp)
|   5993c:  sd  ra,8(sp)
|   59940:  add sp,sp,t0
|   59944:  add s0,sp,2047  <
|   59948:  mv  a2,a0
|   5994c:  mv  a3,a1
|   59950:  mv  a0,sp
|   59954:  li  a4,1
|   59958:  lui a1,0x1
|   5995c:  add s0,s0,1 <---
|   59960:  jal 59a3c

SP here becomes unaligned, even if transitively which is undesirable as
well as incorrect:
  - ABI requires stack to be 8 byte aligned
  - asm code looks weird and unexpected
  - to the user it might falsely seem like a compiler bug even when not,
specially when staring at asm for debugging unrelated issue.
It's not ideal, but I think it's still ABI compliant as-is.  If it 
wasn't, then I suspect things like virtual origins in Ada couldn't be 
made ABI compliant.





Fix this by using 2032+addend idiom when handling register operands
related to stack. This only affects positive S12 values, negative values
are already -2048 based.

Unfortunately implementation requires making a copy of splitter, since
it needs different varaints of predicate and constraint which cant be
done conditionally in the same MD pattern (constraint with restricted
range prevents LRA from allowing such insn despite new predicate)

With the patch, we get following correct code instead:

| ..
| 59944:add s0,sp,2032
| ..
| 5995c:add s0,s0,16
Alternately you could tighten the positive side of the range of the 
splitter from patch 1/3 so that you could always use 2032 rather than 
2047 on the first addi.   ie instead of allowing 2048..4094, allow 
2048..4064.


I don't have a strong opinion on that vs the direction you've gone here.


Jeff


[gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-03-16 Thread Vineet Gupta
... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
The helps
 - avoid an additional LUI insn
 - side benefits of not clobbering a reg

e.g.
w/o patch w/ patch
long  | |
plus(unsigned long i) | li  a5,4096 |
{ | addia5,a5,-2032 | addi a0, a0, 2047
   return i + 2064;   | add a0,a0,a5| addi a0, a0, 17
} | ret | ret

NOTE: In theory not having const in a standalone reg might seem less
  CSE friendly, but for workloads in consideration these mat are
  from very late LRA reloads and follow on GCSE is not doing much
  currently.

The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.

This shaves off 290 billion dynamic instrustions (QEMU icounts) in
SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of
suite, there additional 10 billion shaved, with both gains and losses
in indiv workloads as is usual with compiler changes.

1,214,172,747,858 | 1,212,530,889,930 |  0.14%  <-
  740,618,658,160 |   739,515,555,243 |  0.15%  <-
  692,128,469,436 |   691,175,291,593 |  0.14%  <-
  190,811,495,310 |   190,854,437,311 | -0.02%
  225,751,808,189 |   225,818,881,019 | -0.03%
  220,364,237,915 |   220,405,868,038 | -0.02%
  179,157,361,261 |   179,186,059,187 | -0.02%
  219,313,921,837 |   219,337,232,829 | -0.01%
  278,733,265,382 |   278,733,264,380 |  0.00%
  442,397,437,578 |   442,397,436,026 |  0.00%
  344,112,144,242 |   344,112,142,910 |  0.00%
  417,561,390,319 |   417,561,388,877 |  0.00%
  669,319,255,911 |   669,318,761,114 |  0.00%
2,852,781,330,203 | 2,564,750,360,011 | 10.10% <--
1,855,884,349,001 | 1,855,881,122,280 |  0.00%
1,653,856,904,409 | 1,653,733,205,926 |  0.01%
2,974,347,350,401 | 2,974,174,999,505 |  0.01%
1,158,337,609,995 | 1,158,337,608,826 |  0.00%
1,021,799,191,806 | 1,021,425,634,319 |  0.04%
1,716,448,413,824 | 1,714,912,501,736 |  0.09%
  849,330,283,510 |   849,321,128,484 |  0.00%
  276,556,968,005 |   276,232,659,282 |  0.12%  <-
  913,352,953,686 |   912,822,592,719 |  0.06%
  903,792,550,116 |   903,107,637,917 |  0.08%
1,654,904,617,482 | 1,655,327,175,238 | -0.03%
1,495,841,421,311 | 1,493,418,761,599 |  0.16%  <-
1,641,969,530,523 | 1,642,126,596,979 | -0.01%
2,546,170,307,385 | 2,546,151,690,122 |  0.00%
1,983,551,321,388 | 1,983,525,296,345 |  0.00%
1,516,077,036,742 | 1,516,076,833,481 |  0.00%
2,055,868,055,165 | 2,055,865,373,175 |  0.00%
1,000,621,723,807 | 1,000,601,360,859 |  0.00%
1,024,149,313,694 | 1,024,127,472,779 |  0.00%
  363,827,037,541 |   363,057,012,239 |  0.21%  <-
  906,649,110,459 |   905,928,886,712 |  0.08%
  509,023,896,044 |   508,140,354,911 |  0.17%  <-
  403,238,430 |   403,238,485 |  0.00%
  403,238,430 |   403,238,485 |  0.00%

This should still be considered damage control as the real/deeper fix
would be to reduce number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs.

Implementation Details (for posterity)
--
 - basic idea is to have a splitter selected via a new predicate for constant
   being possible sum of two S12 and provide the transform.
   This is however a 2 -> 2 transform which combine can't handle.
   So we specify it using a define_insn_and_split.

 - the initial loose "i" constraint caused LRA to accept invalid insns thus
   needing a tighter new constraint as well.

 - An additional fallback alternate with catch-all "r" register
   constraint also needed to allow any "reloads" that LRA might
   require for ADDI with const larger than S12.

Testing

This is testsuite clean (rv64 only).
I'll rely on post-commit CI multlib run for any possible fallout for
other setups such as rv32.

||  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  181 /44 |4 / 1 | 
  72 /12 |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  181 /44 |4 / 1 | 
  73 /13 |
|

I also threw this into a buildroot run, it obviously boots Linux to
userspace. bloat-o-meter on glibc and kernel show overall decrease in
staic instruction counts with some minor spot increases.
These are generally in the category of

 - LUI + ADDI are 2 byte each vs. two ADD being 4 byte each.
 - Knock on effects due to inlining changes.
 - Sometimes the slightly shorter 2-insn seq in a mult-exit function
   can cause in-place epilogue duplication (vs. a jump back).
   This is slightly larger but more efficient in execution.
In summary nothing to fret abo

[gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-03-16 Thread Vineet Gupta
If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.

This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

   gcc-13.1 release   |  gcc 230823 |   |
  |g6619b3d4c15c|   This patch  |  clang/llvm
-
li  t0,-4096 | lit0,-4096  | addi  sp,sp,-2048 | addi 
sp,sp,-2048
addit0,t0,2016   | addi  t0,t0,2032| add   sp,sp,-16   | addi sp,sp,-32
li  a4,4096  | add   sp,sp,t0  | add   a5,sp,a0| add  a1,sp,16
add sp,sp,t0 | addi  a5,sp,-2032   | sbzero,0(a5)  | add  a0,a0,a1
li  a5,-4096 | add   a0,a5,a0  | addi  sp,sp,2032  | sb   zero,0(a0)
addia4,a4,-2032  | lit0, 4096  | addi  sp,sp,32| addi sp,sp,2032
add a4,a4,a5 | sbzero,2032(a0) | ret   | addi sp,sp,48
addia5,sp,16 | addi  t0,t0,-2032   |   | ret
add a5,a4,a5 | add   sp,sp,t0  |
add a0,a5,a0 | ret |
li  t0,4096  |
sd  a5,8(sp) |
sb  zero,2032(a0)|
addit0,t0,-2016  |
add sp,sp,t0 |
ret  |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv.cc | 80 +--
 gcc/testsuite/gcc.target/riscv/pr105733.c | 15 
 .../riscv/rvv/autovec/vls/spill-1.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c   |  4 +-
 10 files changed, 104 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f9e407bf5768..8b3f7ce181cc 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -165,6 +165,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx 
*, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 extern bool riscv_reg_frame_related (rtx);
+extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, bool,
+   HOST_WIDE_INT *, HOST_WIDE_INT *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38aebefa2590..b4a7947433a7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3810,6 +3810,38 @@ riscv_split_doubleword_move (rtx dest, rtx src)
riscv_emit_move (riscv_subword (dest, true), riscv_subword (src, true));
  }
 }
+
+/* Constant VAL is known to be sum of two S12 constants.  Break it into
+   comprising BASE and OFF.
+   Numerically S12 is -2048 to 2047, however if ALIGN_BASE is true, use the
+   more conservative -2048 to 2032 range for offsets pertaining to stack
+   related registers.  */
+
+void
+riscv_split_sum_of_two_s12 (HOST_WIDE_INT val, bool align_base,
+   HOST_WIDE_INT *base, HOST_WIDE_INT *off)
+{
+  if (SUM_OF_TWO_S12_N (val))
+{
+  *base = -2048;
+  *off = val - (-2048);
+}
+  else if (SUM_OF_TWO_S12_P (val, true) && align_base)
+{
+  *base = 2032;
+  *off = val - 2032;
+}
+  else if (SUM_OF_TWO_

[gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-16 Thread Vineet Gupta
Noticed that new sum of two s12 splitter was generating following:

| 059930 :
|   59930:  add sp,sp,-16
|   59934:  lui t0,0xf
|   59938:  sd  s0,0(sp)
|   5993c:  sd  ra,8(sp)
|   59940:  add sp,sp,t0
|   59944:  add s0,sp,2047  <
|   59948:  mv  a2,a0
|   5994c:  mv  a3,a1
|   59950:  mv  a0,sp
|   59954:  li  a4,1
|   59958:  lui a1,0x1
|   5995c:  add s0,s0,1 <---
|   59960:  jal 59a3c

SP here becomes unaligned, even if transitively which is undesirable as
well as incorrect:
 - ABI requires stack to be 8 byte aligned
 - asm code looks weird and unexpected
 - to the user it might falsely seem like a compiler bug even when not,
   specially when staring at asm for debugging unrelated issue.

Fix this by using 2032+addend idiom when handling register operands
related to stack. This only affects positive S12 values, negative values
are already -2048 based.

Unfortunately implementation requires making a copy of splitter, since
it needs different varaints of predicate and constraint which cant be
done conditionally in the same MD pattern (constraint with restricted
range prevents LRA from allowing such insn despite new predicate)

With the patch, we get following correct code instead:

| ..
| 59944:add s0,sp,2032
| ..
| 5995c:add s0,s0,16

gcc/Changelog:
* config/riscv/riscv.h: Add alignment arg to new macros.
* config/riscv/constraints.md: Variant of new constraint.
* config/riscv/predicates.md: Variant of new predicate.
* config/riscv/riscv.md: Variant of new splitter which offsets
off of 2032 (vs. 2047).
Gate existing splitter behind riscv_reg_frame_related.
* config/riscv/riscv.cc (riscv_reg_frame_related): New helper to
conditionalize the existing and new spitters.
* config/riscv/riscv-protos.h: Add new prototype.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/constraints.md |  6 ++
 gcc/config/riscv/predicates.md  |  8 ++-
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc   | 11 ++
 gcc/config/riscv/riscv.h| 15 -
 gcc/config/riscv/riscv.md   | 37 ++---
 6 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 435461180c7e..a9446c54ee45 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -86,6 +86,12 @@
(ior (match_test "IN_RANGE (ival,  2048,  4094)")
(match_test "IN_RANGE (ival, -4096, -2049)"
 
+(define_constraint "MiA"
+  "const can be represented as sum of two S12 values with first aligned."
+  (and (match_code "const_int")
+   (ior (match_test "IN_RANGE (ival,  2033,  4064)")
+   (match_test "IN_RANGE (ival, -4096, -2049)"
+
 (define_constraint "Ds3"
   "@internal
1, 2 or 3 immediate"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 89490339c2da..56f9919daafa 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -423,7 +423,13 @@
 (define_predicate "const_two_s12"
   (match_code "const_int")
 {
-  return SUM_OF_TWO_S12 (INTVAL (op));
+  return SUM_OF_TWO_S12 (INTVAL (op), false);
+})
+
+(define_predicate "const_two_s12_algn"
+  (match_code "const_int")
+{
+  return SUM_OF_TWO_S12 (INTVAL (op), true);
 })
 
 ;; CORE-V Predicates:
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b87355938052..f9e407bf5768 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -164,6 +164,7 @@ extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
+extern bool riscv_reg_frame_related (rtx);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 680c4a728e92..38aebefa2590 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6667,6 +6667,17 @@ riscv_can_eliminate (const int from ATTRIBUTE_UNUSED, 
const int to)
   return (to == HARD_FRAME_POINTER_REGNUM || to == STACK_POINTER_REGNUM);
 }
 
+/* Helper to determine if reg X pertains to stack.  */
+bool
+riscv_reg_frame_related (rtx x)
+{
+  return REG_P (x)
+&& (REGNO (x) == FRAME_POINTER_REGNUM
+|| REGNO (x) == HARD_FRAME_POINTER_REGNUM
+|| REGNO (x) == ARG_POINTER_REGNUM
+|| REGNO (x) == VIRTUAL_STACK_VARS_REGNUM);
+}
+
 /* Implement INITIAL_ELIMINATION_OFFSET.  FROM is either the frame pointer
or argument pointer.  TO is either the stack pointer or hard frame
pointer.  */
diff --git a/gcc

[gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak

2024-03-16 Thread Vineet Gupta
Hi,

This set of patches (for gcc-15) help improve stack/array accesses
by improving constant materialization. Details are in respective
patches.

The first patch is the main change which improves SPEC cactu by 10%.

There is at least one more pending change with same theme
(riscv_add_offset/riscv_legitimize_address) but that is tripping up
and currently being debugged.

Thx,
-Vineet

Vineet Gupta (3):
  RISC-V: avoid LUI based const materialization ... [part of PR/106265]
  RISC-V: avoid LUI based const mat: keep stack offsets aligned
  RISC-V: avoid LUI based const mat in prologue/epilogue expansion
[PR/105733]

 gcc/config/riscv/constraints.md   | 12 +++
 gcc/config/riscv/predicates.md| 12 +++
 gcc/config/riscv/riscv-protos.h   |  3 +
 gcc/config/riscv/riscv.cc | 91 +--
 gcc/config/riscv/riscv.h  | 20 
 gcc/config/riscv/riscv.md | 65 +
 gcc/testsuite/gcc.target/riscv/pr105733.c | 15 +++
 .../riscv/rvv/autovec/vls/spill-1.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c   |  4 +-
 .../gcc.target/riscv/sum-of-two-s12-const-1.c | 45 +
 .../gcc.target/riscv/sum-of-two-s12-const-2.c | 22 +
 16 files changed, 292 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c

-- 
2.34.1



New French PO file for 'gcc' (version 14.1-b20240218)

2024-03-16 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-14.1-b20240218.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] i386: Fix setup of incoming varargs for (...) functions which return large aggregates [PR114175]

2024-03-16 Thread Richard Biener



> Am 16.03.2024 um 08:24 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The c23-stdarg-6.c testcase I've added recently apparently works fine with
> -O0 but aborts with -O1 and higher on x86_64-linux.
> The problem is in setup of incoming varargs.
> 
> Like function.cc before r14-9249 even ix86_setup_incoming_varargs assumes
> that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there
> is nothing to advance, but that is not the case for (...) functions
> returning by hidden reference which have one such artificial argument.
> If the setup_incoming_varargs hook is called from the
>  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl))
>  && fnargs.is_empty ())
>{
>  struct assign_parm_data_one data = {};
>  assign_parms_setup_varargs (&all, &data, false);
>}
> spot, i.e. where there is no hidden return argument passed, arg.type
> is always NULL, while when it is called in the
>  if (cfun->stdarg && !DECL_CHAIN (parm))
>assign_parms_setup_varargs (&all, &data, false);
> spot, even when it is TYPE_NO_NAMED_ARGS_STDARG_P arg.type will be non-NULL.
> The tree-stdarg.cc pass in f in c23-stdarg-6.cc at -O1 or higher determines
> that va_arg is used on integral types at most twice (loads 2 words),
> and because ix86_setup_incoming_varargs doesn't advance, the code saves
> just the %rdi and %rsi registers to the save area.  But that isn't correct,
> it should save %rsi and %rdx because %rdi is the hidden return argument.
> With -O0 tree-stdarg.cc doesn't attempt to optimize and we save all the
> registers, so it works fine in that case.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

Ok

> Now, I think we'll need the same fix also on
> aarch64, alpha, arc, csky, ia64, loongarch, mips, mmix, nios2, riscv, visium
> which have pretty much the similarly looking snippet in their hooks
> changed by the r13-3549 commit.
> Then arm, epiphany, fr30, frv, ft32, m32r, mcore, nds32, rs6000, sh
> have different changes but most likely need something similar too.
> I don't have access to most of those, could test aarch64 and rs6000 I guess.

If we have testsuite coverage after this I’d say go ahead for unmaintained 
ports.

Richard 

> 2024-03-16  Jakub Jelinek  
> 
>PR target/114175
>* config/i386/i386.cc (ix86_setup_incoming_varargs): Only skip
>ix86_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
>if arg.type is NULL.
> 
>* gcc.dg/c23-stdarg-7.c: New test.
>* gcc.dg/c23-stdarg-8.c: New test.
> 
> --- gcc/config/i386/i386.cc.jj2024-03-05 10:26:19.424029862 +0100
> +++ gcc/config/i386/i386.cc2024-03-15 23:50:08.821823213 +0100
> @@ -4643,7 +4643,8 @@ ix86_setup_incoming_varargs (cumulative_
>   /* For varargs, we do not want to skip the dummy va_dcl argument.
>  For stdargs, we do want to skip the last named argument.  */
>   next_cum = *cum;
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  if ((!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +   || arg.type != NULL_TREE)
>   && stdarg_p (fntype))
> ix86_function_arg_advance (pack_cumulative_args (&next_cum), arg);
> 
> --- gcc/testsuite/gcc.dg/c23-stdarg-7.c.jj2024-03-15 23:50:56.857185518 
> +0100
> +++ gcc/testsuite/gcc.dg/c23-stdarg-7.c2024-03-15 23:50:45.611334813 +0100
> @@ -0,0 +1,6 @@
> +/* Test C23 variadic functions with no named parameters, or last named
> +   parameter with a declaration not allowed in C17.  Execution tests.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */
> +
> +#include "c23-stdarg-4.c"
> --- gcc/testsuite/gcc.dg/c23-stdarg-8.c.jj2024-03-15 23:51:20.814867467 
> +0100
> +++ gcc/testsuite/gcc.dg/c23-stdarg-8.c2024-03-15 23:51:06.973051224 +0100
> @@ -0,0 +1,6 @@
> +/* Test C23 variadic functions with no named parameters, or last named
> +   parameter with a declaration not allowed in C17.  Execution tests.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */
> +
> +#include "c23-stdarg-6.c"
> 
>Jakub
> 


Re: [PATCH] bitint: Fix up stores to large/huge _BitInt bitfields [PR114329]

2024-03-16 Thread Richard Biener



> Am 16.03.2024 um 08:11 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The verifier requires BIT_FIELD_REFs with INTEGRAL_TYPE_P first operand
> to have mode precision.  In most cases for the large/huge _BitInt bitfield
> stores the code uses bitfield representatives, which are typically arrays
> of chars, but if the bitfield starts at byte boundary on big endian,
> the code uses as nlhs in lower_mergeable_store COMPONENT_REF of the
> bitfield FIELD_DECL instead, which is fine for the limb accesses,
> but when used for the most significant limb can result in invalid
> BIT_FIELD_REF because the first operand then has BITINT_TYPE and
> usually VOIDmode.
> 
> The following patch adds a helper method for the 4 creatikons of
> BIT_FIELD_REF which when needed adds a VIEW_CONVERT_EXPR.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-03-16  Jakub Jelinek  
> 
>PR tree-optimization/114329
>* gimple-lower-bitint.cc (struct bitint_large_huge): Declare
>build_bit_field_ref method.
>(bitint_large_huge::build_bit_field_ref): New method.
>(bitint_large_huge::lower_mergeable_stmt): Use it.
> 
>* gcc.dg/bitint-101.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj2024-03-15 09:16:36.464033118 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-15 17:53:30.200276578 +0100
> @@ -427,6 +427,8 @@ struct bitint_large_huge
>   void insert_before (gimple *);
>   tree limb_access_type (tree, tree);
>   tree limb_access (tree, tree, tree, bool);
> +  tree build_bit_field_ref (tree, tree, unsigned HOST_WIDE_INT,
> +unsigned HOST_WIDE_INT);
>   void if_then (gimple *, profile_probability, edge &, edge &);
>   void if_then_else (gimple *, profile_probability, edge &, edge &);
>   void if_then_if_then_else (gimple *g, gimple *,
> @@ -659,6 +661,31 @@ bitint_large_huge::limb_access (tree typ
>   return ret;
> }
> 
> +/* Build a BIT_FIELD_REF to access BITSIZE bits with FTYPE type at
> +   offset BITPOS inside of OBJ.  */
> +
> +tree
> +bitint_large_huge::build_bit_field_ref (tree ftype, tree obj,
> +unsigned HOST_WIDE_INT bitsize,
> +unsigned HOST_WIDE_INT bitpos)
> +{
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (obj))
> +  && !type_has_mode_precision_p (TREE_TYPE (obj)))
> +{
> +  unsigned HOST_WIDE_INT nelts
> += CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (obj))), limb_prec);
> +  tree ltype = m_limb_type;
> +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj));
> +  if (as != TYPE_ADDR_SPACE (ltype))
> +ltype = build_qualified_type (ltype, TYPE_QUALS (ltype)
> +  | ENCODE_QUAL_ADDR_SPACE (as));
> +  tree atype = build_array_type_nelts (ltype, nelts);
> +  obj = build1 (VIEW_CONVERT_EXPR, atype, obj);
> +}
> +  return build3 (BIT_FIELD_REF, ftype, obj, bitsize_int (bitsize),
> + bitsize_int (bitpos));
> +}
> +
> /* Emit a half diamond,
>if (COND)
>  |\
> @@ -2651,9 +2678,9 @@ bitint_large_huge::lower_mergeable_stmt
>}
>  tree ftype
>= build_nonstandard_integer_type (limb_prec - bo_bit, 1);
> -  tree bfr = build3 (BIT_FIELD_REF, ftype, unshare_expr (nlhs),
> - bitsize_int (limb_prec - bo_bit),
> - bitsize_int (bo_idx * limb_prec + bo_bit));
> +  tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs),
> +  limb_prec - bo_bit,
> +  bo_idx * limb_prec + bo_bit);
>  tree t = add_cast (ftype, rhs1);
>  g = gimple_build_assign (bfr, t);
>  insert_before (g);
> @@ -2714,12 +2741,11 @@ bitint_large_huge::lower_mergeable_stmt
>{
>  tree ftype
>= build_nonstandard_integer_type (rprec + bo_bit, 1);
> -  tree bfr = build3 (BIT_FIELD_REF, ftype,
> - unshare_expr (nlhs),
> - bitsize_int (rprec + bo_bit),
> - bitsize_int ((bo_idx
> -   + tprec / limb_prec)
> -  * limb_prec));
> +  tree bfr
> += build_bit_field_ref (ftype, unshare_expr (nlhs),
> +   rprec + bo_bit,
> +   (bo_idx + tprec / limb_prec)
> +   * limb_prec);
>  tree t = add_cast (ftype, rhs1);
>  g = gimple_build_assign (bfr, t);
>  done = true;
> @@ -2860,11 +2886,11 @@ bitint_large_huge::lower_mergeable_stmt
>{
>  tree ftype
>= build_nonstandard_integer_type (rprec + bo_bit, 1);
> -  tree bfr = build3 (BIT_FIELD_REF, ftype,
> - unshare_expr (nlhs),
> - bitsize_int (rprec + bo_bit),
> - bitsize_int ((bo_idx + tprec / limb_prec)
> -  * limb_prec));
> +  tree bfr
> += build_bit_field_ref (ftype, unshare_expr 

Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-16 Thread François Dumont

With the patch, sorry.

On 14/03/2024 22:49, François Dumont wrote:

Hi

This is what I started to do.

For now I haven't touch to __cpp_lib_null_iterators definition as 
_Safe_local_iterator still need some work.


libstdc++: Implement N3644 on _Safe_iterator<> [PR114316]

Consider range of value-initialized iterators as valid and empty.

libstdc++-v3/ChangeLog:

    PR libstdc++/114316
    * include/debug/safe_iterator.tcc 
(_Safe_iterator<>::_M_valid_range):
    First check if both iterators are value-initialized before 
checking if

    singular.
    * testsuite/23_containers/set/debug/114316.cc: New test case.
    * testsuite/23_containers/vector/debug/114316.cc: New test case.

Tested under Linux x86_64, ok to commit ?

François


On 12/03/2024 10:52, Jonathan Wakely wrote:
On Tue, 12 Mar 2024 at 01:03, Jonathan Wakely  
wrote:
On Tue, 12 Mar 2024 at 00:55, Maciej Miera  
wrote:



Wiadomość napisana przez Jonathan Wakely  w 
dniu 11.03.2024, o godz. 21:40:


On Mon, 11 Mar 2024 at 20:07, Maciej Miera  
wrote:



Hello,

I have tried to introduce an extra level of safety to my codebase 
and utilize _GLIBCXX_DEBUG in my test builds in order to catch 
faulty iterators.
However, I have encountered the following problem: I would like to 
utilize singular, value-initialized iterators as an arbitrary "null 
range”.
However, this leads to failed assertions in std:: algorithms taking 
such range.


Consider the following code sample with find_if:

#include 
#include 
#include 

#ifndef __cpp_lib_null_iterators
#warning "Not standard compliant"
#endif

int main()
{
    std::multimap::iterator it1{};
    std::multimap::iterator it2{};

    (void) (it1==it2); // OK
    (void) std::find_if(
    it1, it2, [](const auto& el) { return el.second == 8;});
}

Compiled with -std=c++20 and -D_GLIBCXX_DEBUG it produces the 
warning "Not standard compliant"

and the execution results in the following assert failure:

/opt/compiler-explorer/gcc-12.2.0/include/c++/12.2.0/bits/stl_algo.h:3875: 


In function:
    constexpr _IIter std::find_if(_IIter, _IIter, _Predicate) [with 
_IIter =

gnu_debug::_Safe_iterator<_Rb_tree_iterator >,
    debug::multimap, bidirectional_iterator_tag>; 
_Predicate =

    main()::]

The question is though: is it by design, or is it just a mere 
oversight? The warning actually suggest the first option.
If it is an intentional design choice, could you provide some 
rationale behind it, please?



The macro was not defined because the C++14 rule wasn't implemented
for debug mode, but that should have been fixed for GCC 11, according
to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70303
So we should be able to define macro now, except maybe it wasn't fixed
for the RB tree containers.



Just to make sure there are no misunderstandings: comparison via == 
works fine. The feature check macro without _GLIBCXX_DEBUG and with 
 included is also fine. Maybe the need to include a 
header is the issue, but that’s not the core of the problem anyway.

No, it has nothing to do with the headers included. The feature test
macro is defined like so:

# if (__cplusplus >= 201402L) && (!defined(_GLIBCXX_DEBUG))
#  define __glibcxx_null_iterators 201304L

It's a very deliberate choice to not define it when _GLIBCXX_DEBUG is
defined. But as I said, I think we should have changed that.

The actual question is though, whether passing singular iterators 
to std algorithms (like find_if) should make the asserts at the 
beginning of the algo function fail when compiled with 
_GLIBCXX_DEBUG. IMHO, intuitively it should not, as comparing 
iterators equal would just ensure early exit and return of the same 
singular iterator.

This seems not to be the case though. The actual message is this:
Error: the function requires a valid iterator range [first, last).
What bothers me is whether the empty virtual range limited by two 
same singular iterators is actually valid or not.
Yes, it's valid. So the bug is in the __glibcxx_requires_valid_range 
macro.

Thanks for the bugzilla report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114316
We'll get it fixed!
diff --git a/libstdc++-v3/include/debug/safe_iterator.tcc 
b/libstdc++-v3/include/debug/safe_iterator.tcc
index a8b24233e85..4b2baf2980e 100644
--- a/libstdc++-v3/include/debug/safe_iterator.tcc
+++ b/libstdc++-v3/include/debug/safe_iterator.tcc
@@ -194,6 +194,12 @@ namespace __gnu_debug
   std::pair& __dist,
   bool __check_dereferenceable) const
 {
+  if (_M_value_initialized() && __rhs._M_value_initialized())
+   {
+ __dist = std::make_pair(0, __dp_exact);
+ return true;
+   }
+
   if (_M_singular() || __rhs._M_singular() || !_M_can_compare(__rhs))
return false;
 
@@ -218,6 +224,12 @@ namespace __gnu_debug
   std::pair& __dist) const
 {
+  if (this->_M_value_initialized() && __rhs._M_value_initial

[PATCH v3] c++: Fix handling of no-linkage decls for modules

2024-03-16 Thread Nathaniel Shead
On Mon, Mar 11, 2024 at 02:13:34PM -0400, Jason Merrill wrote:
> On 3/8/24 18:18, Nathaniel Shead wrote:
> > On Fri, Mar 08, 2024 at 10:19:52AM -0500, Jason Merrill wrote:
> > > On 3/7/24 21:55, Nathaniel Shead wrote:
> > > > On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote:
> > > > > On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote:
> > > > > > On 11/20/23 04:47, Nathaniel Shead wrote:
> > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have 
> > > > > > > write
> > > > > > > access.
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > 
> > > > > > > Block-scope declarations of functions or extern values are not 
> > > > > > > allowed
> > > > > > > when attached to a named module. Similarly, class member 
> > > > > > > functions are
> > > > > > > not inline if attached to a named module. However, in both these 
> > > > > > > cases
> > > > > > > we currently only check if the declaration is within the module 
> > > > > > > purview;
> > > > > > > it is possible for such a declaration to occur within the module 
> > > > > > > purview
> > > > > > > but not be attached to a named module (e.g. in an 'extern "C++"' 
> > > > > > > block).
> > > > > > > This patch makes the required adjustments.
> > > > > > 
> > > > > > 
> > > > > > Ah I'd been puzzling over the default inlinedness of  member-fns of
> > > > > > block-scope structs.  Could you augment the testcase to make sure 
> > > > > > that's
> > > > > > right too?
> > > > > > 
> > > > > > Something like:
> > > > > > 
> > > > > > // dg-module-do link
> > > > > > export module Mod;
> > > > > > 
> > > > > > export auto Get () {
> > > > > > struct X { void Fn () {} };
> > > > > > return X();
> > > > > > }
> > > > > > 
> > > > > > 
> > > > > > ///
> > > > > > import Mod
> > > > > > void Frob () { Get().Fn(); }
> > > > > > 
> > > > > 
> > > > > I gave this a try and it indeed doesn't work correctly; 'Fn' needs to 
> > > > > be
> > > > > marked 'inline' for this to link (whether or not 'Get' itself is
> > > > > inline). I've tried tracing the code to work out what's going on but
> > > > > I've been struggling to work out how all the different flags (e.g.
> > > > > TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN)
> > > > > interact, which flags we want to be set where, and where the decision 
> > > > > of
> > > > > what function definitions to emit is actually made.
> > > > > 
> > > > > I did find that doing 'mark_used(decl)' on all member functions in
> > > > > block-scope structs seems to work however, but I wonder if that's 
> > > > > maybe
> > > > > too aggressive or if there's something else we should be doing?
> > > > 
> > > > I got around to looking at this again, here's an updated version of this
> > > > patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > 
> > > > (I'm not sure if 'start_preparsed_function' is the right place to be
> > > > putting this kind of logic or if it should instead be going in
> > > > 'grokfndecl', e.g. decl.cc:10761 where the rules for making local
> > > > functions have no linkage are initially determined, but I found this
> > > > easier to implement: happy to move around though if preferred.)
> > > > 
> > > > -- >8 --
> > > > 
> > > > Block-scope declarations of functions or extern values are not allowed
> > > > when attached to a named module. Similarly, class member functions are
> > > > not inline if attached to a named module. However, in both these cases
> > > > we currently only check if the declaration is within the module purview;
> > > > it is possible for such a declaration to occur within the module purview
> > > > but not be attached to a named module (e.g. in an 'extern "C++"' block).
> > > > This patch makes the required adjustments.
> > > > 
> > > > While implementing this we discovered that block-scope local functions
> > > > are not correctly emitted, causing link failures; this patch also
> > > > corrects some assumptions here and ensures that they are emitted when
> > > > needed.
> > > > 
> > > > PR c++/112631
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * cp-tree.h (named_module_attach_p): New function.
> > > > * decl.cc (start_decl): Check for attachment not purview.
> > > > (grokmethod): Likewise.
> > > 
> > > These changes are OK; the others I want to consider more.
> > 
> > Thanks, I can commit this as a separate commit if you prefer?
> 
> Please.
> 

Thanks, committed as r14-9501-gead3075406ece9.

> > > > +export auto n_n() {
> > > > +  internal();
> > > > +  struct X { void f() { internal(); } };
> > > > +  return X{};
> > > 
> > > Hmm, is this not a prohibited exposure?  Seems like X has no linkage 
> > > because
> > > it's at block scope, and the deduced return type names it.
> > > 
> > > I know we try to support this "voldemort" pattern, but is that actually
> > > correct?
> > 
> > I had similar doubts, but this is not an especially uncommon pattern in
> > the wild e

[PATCH v4 2/2] extend.texi: Add documentation for all missing built-in traits [PR87343]

2024-03-16 Thread Ken Matsui
Fixed some wording and consistency issues in the documentation.

-- >8 --

PR c++/87343

gcc/ChangeLog:

* doc/extend.texi (Expression-yielding Type Traits): New
subsection.
(Type-yielding Type Traits): Likewise.
(C++ Concepts): Move __is_same to ...
(Expression-yielding Type Traits): ... here.
(__is_enum): Add @anchor and @ref to related traits.
(__is_trivial): Likewise.
(__is_class): Fix documentation.
(__underlying_type): Likewise.
(__has_nothrow_assign): Ensure consistency.
(__has_trivial_constructor): Likewise.
(__has_trivial_copy): Likewise.
(__has_trivial_assign): Likewise.
(__has_trivial_constructor): Likewise.
(__has_trivial_copy): Likewise.
(__has_trivial_destructor): Likewise.
(__has_virtual_destructor): Likewise.
(__is_abstract): Likewise.
(__is_aggregate): Likewise.
(__is_base_of): Likewise.
(__is_empty): Likewise.
(__is_enum): Likewise.
(__is_literal_type): Likewise.
(__is_pod): Likewise.
(__is_polymorphic): Likewise.
(__is_standard_layout): Likewise.
(__is_trivial): Likewise.
(__is_union): Likewise.
(__has_unique_object_representations): New documentation.
(__is_array): Likewise.
(__is_assignable): Likewise.
(__is_bounded_array): Likewise.
(__is_constructible): Likewise.
(__is_convertible): Likewise.
(__is_function): Likewise.
(__is_layout_compatible): Likewise.
(__is_member_function_pointer): Likewise.
(__is_member_object_pointer): Likewise.
(__is_member_pointer): Likewise.
(__is_nothrow_assignable): Likewise.
(__is_nothrow_constructible): Likewise.
(__is_nothrow_convertible): Likewise.
(__is_object): Likewise.
(__is_pointer_interconvertible_base_of): Likewise.
(__is_reference): Likewise.
(__is_scoped_enum): Likewise.
(__is_trivially_assignable): Likewise.
(__is_trivially_constructible): Likewise.
(__is_trivially_copyable): Likewise.
(__reference_constructs_from_temporary): Likewise.
(__reference_converts_from_temporary): Likewise.
(__bases): Likewise.
(__direct_bases): Likewise.
(__remove_cv): Likewise.
(__remove_cvref): Likewise.
(__remove_pointer): Likewise.
(__remove_reference): Likewise.
(__type_pack_element): Likewise.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 355 +---
 1 file changed, 304 insertions(+), 51 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 1c61682b102..eb61f619e02 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29485,79 +29485,91 @@ Function Multiversioning} for more details.
 @section Type Traits
 
 The C++ front end implements syntactic extensions that allow
-compile-time determination of 
-various characteristics of a type (or of a
-pair of types).
+compile-time determination of various characteristics of a type
+(or of a pair of types).
+
+
+@subsection Value-yielding Type Traits
+
+Most of these built-ins return a @code{bool} result when invoked; the rest
+yields a @code{size_t} value.
 
 @defbuiltin{bool __has_nothrow_assign (@var{type})}
-If @var{type} is @code{const}-qualified or is a reference type then
-the trait is @code{false}.  Otherwise if @code{__has_trivial_assign (type)}
-is @code{true} then the trait is @code{true}, else if @var{type} is
+If @var{type} is @code{const}-qualified or is a reference type, then
+the trait is @code{false}.  Otherwise, if @code{__has_trivial_assign (type)}
+is @code{true}, then the trait is @code{true}, else if @var{type} is
 a cv-qualified class or union type with copy assignment operators that are
-known not to throw an exception then the trait is @code{true}, else it is
+known not to throw an exception, then the trait is @code{true}, else it is
 @code{false}.
 Requires: @var{type} shall be a complete type, (possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
 @defbuiltin{bool __has_nothrow_constructor (@var{type})}
-If @code{__has_trivial_constructor (type)} is @code{true} then the trait
+If @code{__has_trivial_constructor (type)} is @code{true}, then the trait
 is @code{true}, else if @var{type} is a cv class or union type (or array
 thereof) with a default constructor that is known not to throw an
-exception then the trait is @code{true}, else it is @code{false}.
+exception, then the trait is @code{true}, else it is @code{false}.
 Requires: @var{type} shall be a complete type, (possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
 @defbuiltin{bool __has_nothrow_copy (@var{type})}
-If @code{__has_trivial_copy (type)} is @code{true} then the trait is
+If @code{__has_trivial_copy (type)} is @code{true}, 

[PING] Re: [PATCH] testsuite: Define _POSIX_C_SOURCE for test

2024-03-16 Thread Torbjorn SVENSSON

Ping!

Kind regards,
Torbjörn

On 2024-03-10 18:26, Torbjörn SVENSSON wrote:

Ok for trunk?

--

As the tests assume that strndup() is visible (only part of
POSIX.1-2008) define the guard to ensure that it's visible.  Currently,
glibc appears to always have this defined in C++, newlib does not.

Without this patch, fails like this can be seen:

Testing analyzer/strndup-1.c,  -std=c++98
.../strndup-1.c: In function 'void test_1(const char*)':
.../strndup-1.c:11:13: error: 'strndup' was not declared in this scope; did you 
mean 'strncmp'?
.../strndup-1.c: In function 'void test_2(const char*)':
.../strndup-1.c:16:13: error: 'strndup' was not declared in this scope; did you 
mean 'strncmp'?
.../strndup-1.c: In function 'void test_3(const char*)':
.../strndup-1.c:21:13: error: 'strndup' was not declared in this scope; did you 
mean 'strncmp'?

Patch has been verified on Linux.

gcc/testsuite/ChangeLog:

* c-c++-common/analyzer/strndup-1.c: Define _POSIX_C_SOURCE.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/c-c++-common/analyzer/strndup-1.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c 
b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c
index 85ccae85d83..577ece0cfba 100644
--- a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c
+++ b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c
@@ -1,4 +1,5 @@
  /* { dg-skip-if "no strndup in libc" { *-*-darwin[789]* *-*-darwin10* 
hppa*-*-hpux* *-*-mingw* *-*-vxworks* } } */
+/* { dg-additional-options "-D_POSIX_C_SOURCE=200809L" } */
  
  #include 

  #include 


Re: [PATCH] Fix libcc1plugin and libc1plugin to avoid poisoned identifiers

2024-03-16 Thread FX Coudert
Given its fixes build, is obvious, and tested appropriately: patch pushed as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5213047b1d50af63dfabb5e5649821a6cb157e33

FX

[PATCH] i386: Fix setup of incoming varargs for (...) functions which return large aggregates [PR114175]

2024-03-16 Thread Jakub Jelinek
Hi!

The c23-stdarg-6.c testcase I've added recently apparently works fine with
-O0 but aborts with -O1 and higher on x86_64-linux.
The problem is in setup of incoming varargs.

Like function.cc before r14-9249 even ix86_setup_incoming_varargs assumes
that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there
is nothing to advance, but that is not the case for (...) functions
returning by hidden reference which have one such artificial argument.
If the setup_incoming_varargs hook is called from the
  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl))
  && fnargs.is_empty ())
{
  struct assign_parm_data_one data = {};
  assign_parms_setup_varargs (&all, &data, false);
}
spot, i.e. where there is no hidden return argument passed, arg.type
is always NULL, while when it is called in the
  if (cfun->stdarg && !DECL_CHAIN (parm))
assign_parms_setup_varargs (&all, &data, false);
spot, even when it is TYPE_NO_NAMED_ARGS_STDARG_P arg.type will be non-NULL.
The tree-stdarg.cc pass in f in c23-stdarg-6.cc at -O1 or higher determines
that va_arg is used on integral types at most twice (loads 2 words),
and because ix86_setup_incoming_varargs doesn't advance, the code saves
just the %rdi and %rsi registers to the save area.  But that isn't correct,
it should save %rsi and %rdx because %rdi is the hidden return argument.
With -O0 tree-stdarg.cc doesn't attempt to optimize and we save all the
registers, so it works fine in that case.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

Now, I think we'll need the same fix also on
aarch64, alpha, arc, csky, ia64, loongarch, mips, mmix, nios2, riscv, visium
which have pretty much the similarly looking snippet in their hooks
changed by the r13-3549 commit.
Then arm, epiphany, fr30, frv, ft32, m32r, mcore, nds32, rs6000, sh
have different changes but most likely need something similar too.
I don't have access to most of those, could test aarch64 and rs6000 I guess.

2024-03-16  Jakub Jelinek  

PR target/114175
* config/i386/i386.cc (ix86_setup_incoming_varargs): Only skip
ix86_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions
if arg.type is NULL.

* gcc.dg/c23-stdarg-7.c: New test.
* gcc.dg/c23-stdarg-8.c: New test.

--- gcc/config/i386/i386.cc.jj  2024-03-05 10:26:19.424029862 +0100
+++ gcc/config/i386/i386.cc 2024-03-15 23:50:08.821823213 +0100
@@ -4643,7 +4643,8 @@ ix86_setup_incoming_varargs (cumulative_
   /* For varargs, we do not want to skip the dummy va_dcl argument.
  For stdargs, we do want to skip the last named argument.  */
   next_cum = *cum;
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  if ((!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+   || arg.type != NULL_TREE)
   && stdarg_p (fntype))
 ix86_function_arg_advance (pack_cumulative_args (&next_cum), arg);
 
--- gcc/testsuite/gcc.dg/c23-stdarg-7.c.jj  2024-03-15 23:50:56.857185518 
+0100
+++ gcc/testsuite/gcc.dg/c23-stdarg-7.c 2024-03-15 23:50:45.611334813 +0100
@@ -0,0 +1,6 @@
+/* Test C23 variadic functions with no named parameters, or last named
+   parameter with a declaration not allowed in C17.  Execution tests.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c23 -pedantic-errors" } */
+
+#include "c23-stdarg-4.c"
--- gcc/testsuite/gcc.dg/c23-stdarg-8.c.jj  2024-03-15 23:51:20.814867467 
+0100
+++ gcc/testsuite/gcc.dg/c23-stdarg-8.c 2024-03-15 23:51:06.973051224 +0100
@@ -0,0 +1,6 @@
+/* Test C23 variadic functions with no named parameters, or last named
+   parameter with a declaration not allowed in C17.  Execution tests.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c23 -pedantic-errors" } */
+
+#include "c23-stdarg-6.c"

Jakub



[PATCH] bitint: Fix up stores to large/huge _BitInt bitfields [PR114329]

2024-03-16 Thread Jakub Jelinek
Hi!

The verifier requires BIT_FIELD_REFs with INTEGRAL_TYPE_P first operand
to have mode precision.  In most cases for the large/huge _BitInt bitfield
stores the code uses bitfield representatives, which are typically arrays
of chars, but if the bitfield starts at byte boundary on big endian,
the code uses as nlhs in lower_mergeable_store COMPONENT_REF of the
bitfield FIELD_DECL instead, which is fine for the limb accesses,
but when used for the most significant limb can result in invalid
BIT_FIELD_REF because the first operand then has BITINT_TYPE and
usually VOIDmode.

The following patch adds a helper method for the 4 creatikons of
BIT_FIELD_REF which when needed adds a VIEW_CONVERT_EXPR.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-16  Jakub Jelinek  

PR tree-optimization/114329
* gimple-lower-bitint.cc (struct bitint_large_huge): Declare
build_bit_field_ref method.
(bitint_large_huge::build_bit_field_ref): New method.
(bitint_large_huge::lower_mergeable_stmt): Use it.

* gcc.dg/bitint-101.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-15 09:16:36.464033118 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-15 17:53:30.200276578 +0100
@@ -427,6 +427,8 @@ struct bitint_large_huge
   void insert_before (gimple *);
   tree limb_access_type (tree, tree);
   tree limb_access (tree, tree, tree, bool);
+  tree build_bit_field_ref (tree, tree, unsigned HOST_WIDE_INT,
+   unsigned HOST_WIDE_INT);
   void if_then (gimple *, profile_probability, edge &, edge &);
   void if_then_else (gimple *, profile_probability, edge &, edge &);
   void if_then_if_then_else (gimple *g, gimple *,
@@ -659,6 +661,31 @@ bitint_large_huge::limb_access (tree typ
   return ret;
 }
 
+/* Build a BIT_FIELD_REF to access BITSIZE bits with FTYPE type at
+   offset BITPOS inside of OBJ.  */
+
+tree
+bitint_large_huge::build_bit_field_ref (tree ftype, tree obj,
+   unsigned HOST_WIDE_INT bitsize,
+   unsigned HOST_WIDE_INT bitpos)
+{
+  if (INTEGRAL_TYPE_P (TREE_TYPE (obj))
+  && !type_has_mode_precision_p (TREE_TYPE (obj)))
+{
+  unsigned HOST_WIDE_INT nelts
+   = CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (obj))), limb_prec);
+  tree ltype = m_limb_type;
+  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj));
+  if (as != TYPE_ADDR_SPACE (ltype))
+   ltype = build_qualified_type (ltype, TYPE_QUALS (ltype)
+ | ENCODE_QUAL_ADDR_SPACE (as));
+  tree atype = build_array_type_nelts (ltype, nelts);
+  obj = build1 (VIEW_CONVERT_EXPR, atype, obj);
+}
+  return build3 (BIT_FIELD_REF, ftype, obj, bitsize_int (bitsize),
+bitsize_int (bitpos));
+}
+
 /* Emit a half diamond,
if (COND)
  |\
@@ -2651,9 +2678,9 @@ bitint_large_huge::lower_mergeable_stmt
}
  tree ftype
= build_nonstandard_integer_type (limb_prec - bo_bit, 1);
- tree bfr = build3 (BIT_FIELD_REF, ftype, unshare_expr (nlhs),
-bitsize_int (limb_prec - bo_bit),
-bitsize_int (bo_idx * limb_prec + bo_bit));
+ tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs),
+ limb_prec - bo_bit,
+ bo_idx * limb_prec + bo_bit);
  tree t = add_cast (ftype, rhs1);
  g = gimple_build_assign (bfr, t);
  insert_before (g);
@@ -2714,12 +2741,11 @@ bitint_large_huge::lower_mergeable_stmt
{
  tree ftype
= build_nonstandard_integer_type (rprec + bo_bit, 1);
- tree bfr = build3 (BIT_FIELD_REF, ftype,
-unshare_expr (nlhs),
-bitsize_int (rprec + bo_bit),
-bitsize_int ((bo_idx
-  + tprec / limb_prec)
- * limb_prec));
+ tree bfr
+   = build_bit_field_ref (ftype, unshare_expr (nlhs),
+  rprec + bo_bit,
+  (bo_idx + tprec / limb_prec)
+  * limb_prec);
  tree t = add_cast (ftype, rhs1);
  g = gimple_build_assign (bfr, t);
  done = true;
@@ -2860,11 +2886,11 @@ bitint_large_huge::lower_mergeable_stmt
{
  tree ftype
= build_nonstandard_integer_type (rprec + bo_bit, 1);
- tree bfr = build3 (BIT_FIELD_REF, ftype,
-