Re: [PATCH] Predefine __STRICT_ALIGN__ if STRICT_ALIGNMENT
YunQiang Su writes: > Arm32 predefines __ARM_FEATURE_UNALIGNED if -mno-unaligned-access, > and RISC-V predefines __riscv_misaligned_avoid, while other ports > that support -mstrict-align/-mno-unaligned-access don't have such > macro, and these backend macros are only avaiable for c-family. > Note: Arm64 always predefine __ARM_FEATURE_UNALIGNED: See #111555. I would say tag the bug even if you're not fixing it, as it was related enough for you to cite it. > > Let's add a generic one. > > __STRICT_ALIGN__ is used instead of __STRICT_ALIGNMENT__, due to that > the later is used by some softwares, such as lzo2, syslinux etc. > > gcc > * cppbuiltin.cc: Predefine __STRICT_ALIGNMENT__ if > STRICT_ALIGNMENT. > --- > gcc/cppbuiltin.cc | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gcc/cppbuiltin.cc b/gcc/cppbuiltin.cc > index c4bfc2917dc..d32efdf9a07 100644 > --- a/gcc/cppbuiltin.cc > +++ b/gcc/cppbuiltin.cc > @@ -123,6 +123,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader > *pfile) > >cpp_define_formatted (pfile, "__FINITE_MATH_ONLY__=%d", > flag_finite_math_only); > + > + if (STRICT_ALIGNMENT) > +cpp_define (pfile, "__STRICT_ALIGNMENT__"); > }
[PATCH] Predefine __STRICT_ALIGN__ if STRICT_ALIGNMENT
Arm32 predefines __ARM_FEATURE_UNALIGNED if -mno-unaligned-access, and RISC-V predefines __riscv_misaligned_avoid, while other ports that support -mstrict-align/-mno-unaligned-access don't have such macro, and these backend macros are only avaiable for c-family. Note: Arm64 always predefine __ARM_FEATURE_UNALIGNED: See #111555. Let's add a generic one. __STRICT_ALIGN__ is used instead of __STRICT_ALIGNMENT__, due to that the later is used by some softwares, such as lzo2, syslinux etc. gcc * cppbuiltin.cc: Predefine __STRICT_ALIGNMENT__ if STRICT_ALIGNMENT. --- gcc/cppbuiltin.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/cppbuiltin.cc b/gcc/cppbuiltin.cc index c4bfc2917dc..d32efdf9a07 100644 --- a/gcc/cppbuiltin.cc +++ b/gcc/cppbuiltin.cc @@ -123,6 +123,9 @@ define_builtin_macros_for_compilation_flags (cpp_reader *pfile) cpp_define_formatted (pfile, "__FINITE_MATH_ONLY__=%d", flag_finite_math_only); + + if (STRICT_ALIGNMENT) +cpp_define (pfile, "__STRICT_ALIGNMENT__"); } -- 2.39.2
Re: [PATCH] c-c++-common/Wrestrict.c: fix some typos and enable for LLP64
On Sat, Mar 2, 2024, 08:24 Jonathan Yong <10wa...@gmail.com> wrote: > On 2/15/24 14:08, Jonathan Yong wrote: > > Attached patch OK? > > > > Copy/pasted for review convenience. > Ping. > Pinging this for Jon. It's been a couple months, and this should be mostly obvious if someone has a spare minute. >
[pushed] Darwin: Fix bootstrap break on X86.
Tested on x86_64-darwin21, pushed to the branch, thanks Iain --- 8< --- The changes in r12-9536-gefcca6481eab18 mangled the whitespace in the ENDFILE_SPEC i386/darwin.h leading to a bootstrap fail. gcc/ChangeLog: * config/i386/darwin.h (ENDFILE_SPEC): Fix whitespace. Signed-off-by: Iain Sandoe --- gcc/config/i386/darwin.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h index 2f773924d6e..12cdc34a19e 100644 --- a/gcc/config/i386/darwin.h +++ b/gcc/config/i386/darwin.h @@ -109,8 +109,8 @@ along with GCC; see the file COPYING3. If not see "%{!force_cpusubtype_ALL:-force_cpusubtype_ALL} " #undef ENDFILE_SPEC -#define ENDFILE_SPEC -\ "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \ +#define ENDFILE_SPEC \ + "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \ %{mpc32:crtprec32.o%s} \ %{mpc64:crtprec64.o%s} \ %{mpc80:crtprec80.o%s}" TM_DESTRUCTOR -- 2.39.2 (Apple Git-143)
[PATCH] LoongArch: gcc13: Implement option save/restore.
LTO option streaming and target attributes both require per-function target configuration, which is achieved via option save/restore. We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target context in addition to other automatically maintained option states (via the "Save" option property in the .opt files). PR target/113233 gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Mark options with the "Save" property. * config/loongarch/loongarch-opts.cc (loongarch_update_gcc_opt_status): Update the value of the la_target to global_options. * config/loongarch/loongarch-opts.h (loongarch_update_gcc_opt_status): Add a function declaration. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Call the function loongarch_update_gcc_opt_status. (loongarch_option_save): New functions. (loongarch_option_restore): Likewise. (TARGET_OPTION_SAVE): Define macro. (TARGET_OPTION_RESTORE): Likewise. * config/loongarch/loongarch.opt: Regenerate. --- gcc/config/loongarch/genopts/loongarch.opt.in | 24 ++--- gcc/config/loongarch/loongarch-opts.cc| 22 gcc/config/loongarch/loongarch-opts.h | 6 gcc/config/loongarch/loongarch.cc | 34 +-- gcc/config/loongarch/loongarch.opt| 24 ++--- 5 files changed, 84 insertions(+), 26 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 76acd35d39c..aea4f2a4f61 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -58,7 +58,7 @@ EnumValue Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64) m@@OPTSTR_ISA_EXT_FPU@@= -Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_ISA_EXT_FPU@@=FPU Generate code for the given FPU. m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@ @@ -92,11 +92,11 @@ EnumValue Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464) m@@OPTSTR_ARCH@@= -Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_ARCH@@=PROCESSORGenerate code for the given PROCESSOR ISA. m@@OPTSTR_TUNE@@= -Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_TUNE@@=PROCESSORGenerate optimized code for PROCESSOR. @@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN mbranch-cost= -Target RejectNegative Joined UInteger Var(loongarch_branch_cost) +Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save -mbranch-cost=COST Set the cost of branches to roughly COST instructions. mcheck-zero-division -Target Mask(CHECK_ZERO_DIV) +Target Mask(CHECK_ZERO_DIV) Save Trap on integer divide by zero. mcond-move-int -Target Var(TARGET_COND_MOVE_INT) Init(1) +Target Var(TARGET_COND_MOVE_INT) Init(1) Save Conditional moves for integral are enabled. mcond-move-float -Target Var(TARGET_COND_MOVE_FLOAT) Init(1) +Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save Conditional moves for float are enabled. mmemcpy -Target Mask(MEMCPY) +Target Mask(MEMCPY) Save Prevent optimizing block moves, which is also the default behavior of -Os. mstrict-align -Target Var(TARGET_STRICT_ALIGN) Init(0) +Target Var(TARGET_STRICT_ALIGN) Init(0) Save Do not generate unaligned memory accesses. mmax-inline-memcpy-size= -Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024) +Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024) Save -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. mexplicit-relocs @@ -182,11 +182,11 @@ EnumValue Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME) mcmodel= -Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL) +Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL) Save Specify the code model. mdirect-extern-access -Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0) +Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0) Save Avoid using the GOT to access external symbols. mrelax diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index a52e25236ea..e158de9a12f 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -594,3 +594,25 @@ multilib_enabled_abi_list () return XOBFINISH (&msg_obstack, const char *); } + +/* option status
[PATCH] LoongArch: gcc12: Implement option save/restore.
LTO option streaming and target attributes both require per-function target configuration, which is achieved via option save/restore. We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target context in addition to other automatically maintained option states (via the "Save" option property in the .opt files). PR target/113233 gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Mark options with the "Save" property. * config/loongarch/loongarch-opts.cc (loongarch_update_gcc_opt_status): Update the value of the la_target to global_options. * config/loongarch/loongarch-opts.h (loongarch_update_gcc_opt_status): Add a function declaration. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Call the function loongarch_update_gcc_opt_status. (loongarch_option_save): New functions. (loongarch_option_restore): Likewise. (TARGET_OPTION_SAVE): Define macro. (TARGET_OPTION_RESTORE): Likewise. * config/loongarch/loongarch.opt: Regenerate. --- gcc/config/loongarch/genopts/loongarch.opt.in | 22 ++-- gcc/config/loongarch/loongarch-opts.cc| 22 gcc/config/loongarch/loongarch-opts.h | 6 gcc/config/loongarch/loongarch.cc | 34 +-- gcc/config/loongarch/loongarch.opt| 22 ++-- 5 files changed, 82 insertions(+), 24 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 420a3941b3b..a3107cb2294 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -58,7 +58,7 @@ EnumValue Enum(isa_ext_fpu) String(@@STR_ISA_EXT_FPU64@@) Value(ISA_EXT_FPU64) m@@OPTSTR_ISA_EXT_FPU@@= -Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined ToLower Enum(isa_ext_fpu) Var(la_opt_fpu) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_ISA_EXT_FPU@@=FPU Generate code for the given FPU. m@@OPTSTR_ISA_EXT_FPU@@=@@STR_ISA_EXT_FPU0@@ @@ -92,11 +92,11 @@ EnumValue Enum(cpu_type) String(@@STR_CPU_LA464@@) Value(CPU_LA464) m@@OPTSTR_ARCH@@= -Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_arch) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_ARCH@@=PROCESSORGenerate code for the given PROCESSOR ISA. m@@OPTSTR_TUNE@@= -Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) Init(M_OPTION_NOT_SEEN) +Target RejectNegative Joined Enum(cpu_type) Var(la_opt_cpu_tune) Init(M_OPTION_NOT_SEEN) Save -m@@OPTSTR_TUNE@@=PROCESSORGenerate optimized code for PROCESSOR. @@ -127,31 +127,31 @@ int la_opt_abi_ext = M_OPTION_NOT_SEEN mbranch-cost= -Target RejectNegative Joined UInteger Var(loongarch_branch_cost) +Target RejectNegative Joined UInteger Var(loongarch_branch_cost) Save -mbranch-cost=COST Set the cost of branches to roughly COST instructions. mcheck-zero-division -Target Mask(CHECK_ZERO_DIV) +Target Mask(CHECK_ZERO_DIV) Save Trap on integer divide by zero. mcond-move-int -Target Var(TARGET_COND_MOVE_INT) Init(1) +Target Var(TARGET_COND_MOVE_INT) Init(1) Save Conditional moves for integral are enabled. mcond-move-float -Target Var(TARGET_COND_MOVE_FLOAT) Init(1) +Target Var(TARGET_COND_MOVE_FLOAT) Init(1) Save Conditional moves for float are enabled. mmemcpy -Target Mask(MEMCPY) +Target Mask(MEMCPY) Save Prevent optimizing block moves, which is also the default behavior of -Os. mstrict-align -Target Var(TARGET_STRICT_ALIGN) Init(0) +Target Var(TARGET_STRICT_ALIGN) Init(0) Save Do not generate unaligned memory accesses. mmax-inline-memcpy-size= -Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024) +Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024) Save -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. ; The code model option names for -mcmodel. @@ -175,7 +175,7 @@ EnumValue Enum(cmodel) String(@@STR_CMODEL_EXTREME@@) Value(CMODEL_EXTREME) mcmodel= -Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL) +Target RejectNegative Joined Enum(cmodel) Var(la_opt_cmodel) Init(CMODEL_NORMAL) Save Specify the code model. mrelax diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index eb9c2a52f9e..b55baeccd2f 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -575,3 +575,25 @@ multilib_enabled_abi_list () return XOBFINISH (&msg_obstack, const char *); } + +/* option status feedback for "gcc --help=target -Q" */ +void +loongarch_update_gcc_opt_status (struct loongarch_target *target, +struct gc
Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
On 3/16/24 11:35 AM, Vineet Gupta wrote: ... if the constant can be represented as sum of two S12 values. The two S12 values could instead be fused with subsequent ADD insn. The helps - avoid an additional LUI insn - side benefits of not clobbering a reg e.g. w/o patch w/ patch long | | plus(unsigned long i) | li a5,4096 | { | addia5,a5,-2032 | addi a0, a0, 2047 return i + 2064; |add a0,a0,a5| addi a0, a0, 17 } | ret | ret NOTE: In theory not having const in a standalone reg might seem less CSE friendly, but for workloads in consideration these mat are from very late LRA reloads and follow on GCSE is not doing much currently. As you note, there's a bit of natural tension between what to expose and when. There's no clear cut answer and I might argue there never will be given the design and implementation of various parts of the RTL pipeline. We have some ports that expose early and just say "tough" if it's a minor loss in some cases. Others choose to expose late. We're kindof a mix of both in RISC-V land. The limited offsets we've got will tend to make this problem a bit worse for RISC-V compared to other architectures. The real benefit however is seen in base+offset computation for array accesses and especially for stack accesses which are finalized late in optim pipeline, during LRA register allocation. Often the finalized offsets trigger LRA reloads resulting in mind boggling repetition of exact same insn sequence including LUI based constant materialization. Yea, this is a known sore spot. I spent a good deal of time working on the PA where we only have a 5 bit displacement for FP memory ops. You can assume this caused reload fits and often resulted in horrific (and repetitive code) code. We did some interesting tricks to avoid the worst of the codegen issues. None of those tricks really apply here since we're in an LRA world and there's no difference in the allowed offset in a memory reference vs an addi instruction. This shaves off 290 billion dynamic instrustions (QEMU icounts) in SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of suite, there additional 10 billion shaved, with both gains and losses in indiv workloads as is usual with compiler changes. That's a fantastic result. This should still be considered damage control as the real/deeper fix would be to reduce number of LRA reloads or CSE/anchor those during LRA constraint sub-pass (re)runs. Agreed. But I think it's worth tackling from both ends. Generate better code when we do have these reloads and avoid the reloads when we can. Implementation Details (for posterity) -- - basic idea is to have a splitter selected via a new predicate for constant being possible sum of two S12 and provide the transform. This is however a 2 -> 2 transform which combine can't handle. So we specify it using a define_insn_and_split. Right. For the record it looks like a 2->2 case because of the mvconst_internal define_insn_and_split. What I was talking about in the Tuesday meeting last week was the desire to rethink that pattern precisely because it drives us to need to implement patterns like yours rather than the more natural 3->2 or 4->3/2. Furthermore, I have a suspicion that there's logicals where we're going to want to do something similar and if we keep the mvconst_internal pattern they're all going to have to be implemented as 2->2s with a define_insn_and_split rather than the more natural 3->2. Having said all that, I suspect the case you're diving into is far more important than the logicals. | linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \ build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6 | | add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536) | Function old new delta | getnameinfo 27562892+136 ... | tempnam 136 144 +8 | padzero 276 284 +8 ... | __GI___register_printf_specifier 284 280 -4 | __EI_xdr_array 468 464 -4 | try_file_lock268 260 -8 | pthread_create@GLIBC_2 35203508 -12 | __pthread_create_2_135203508 -12 ... | _nss_files_setnetgrent 932 904 -28 | _nss_dns_gethostbyaddr2_r 15241480 -44 | build_trtable 33123208-104 | printf_positional 25000 22580 -2420 | Total: Before=2107999, After=2105463, chg -0.12% That's a bi
Re: [gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
On 3/16/24 11:35 AM, Vineet Gupta wrote: If the constant used for stack offset can be expressed as sum of two S12 values, the constant need not be materialized (in a reg) and instead the two S12 bits can be added to instructions involved with frame pointer. This avoids burning a register and more importantly can often get down to be 2 insn vs. 3. The prev patches to generally avoid LUI based const materialization didn't fix this PR and need this directed fix in funcion prologue/epilogue expansion. This fix doesn't move the neddle for SPEC, at all, but it is still a win considering gcc generates one insn fewer than llvm for the test ;-) gcc-13.1 release | gcc 230823 | | |g6619b3d4c15c| This patch | clang/llvm - li t0,-4096 | lit0,-4096 | addi sp,sp,-2048 | addi sp,sp,-2048 addit0,t0,2016 | addi t0,t0,2032| add sp,sp,-16 | addi sp,sp,-32 li a4,4096 | add sp,sp,t0 | add a5,sp,a0| add a1,sp,16 add sp,sp,t0 | addi a5,sp,-2032 | sbzero,0(a5) | add a0,a0,a1 li a5,-4096 | add a0,a5,a0 | addi sp,sp,2032 | sb zero,0(a0) addia4,a4,-2032 | lit0, 4096 | addi sp,sp,32| addi sp,sp,2032 add a4,a4,a5 | sbzero,2032(a0) | ret | addi sp,sp,48 addia5,sp,16 | addi t0,t0,-2032 | | ret add a5,a4,a5 | add sp,sp,t0 | add a0,a5,a0 | ret | li t0,4096 | sd a5,8(sp) | sb zero,2032(a0)| addit0,t0,-2016 | add sp,sp,t0 | ret | gcc/ChangeLog: PR target/105733 * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New function to split a sum of two s12 values into constituents. (riscv_expand_prologue): Handle offset being sum of two S12. (riscv_expand_epilogue): Ditto. * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr105733.c: New Test. * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not expect LUI 4096. * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto. Yea, wouldn't expect this to move the needle on spec since it's just hitting the prologue/epilogue. In fact, I wouldn't be surprised if there were other stack frame sizes that could be improved. But I wouldn't bother chasing down those other cases. If we think about the embedded space, they're probably not going to want to see functions with large frames to begin with. So optimizing those cases for the embedded space just doesn't make much sense. In the distro space, by this time next year we'll be living in a world where stack clash mitigations are enabled. So for any given size stack frame, it'll be allocated in at most 1 page chunks. So again, going to any significant length to optimize other cases just doesn't make much sense. So we probably should go with this patch in the gcc-15 space, but I wouldn't suggest heroic efforts for other sized stack frames. jeff
Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned
On 3/16/24 11:35 AM, Vineet Gupta wrote: Noticed that new sum of two s12 splitter was generating following: | 059930 : | 59930: add sp,sp,-16 | 59934: lui t0,0xf | 59938: sd s0,0(sp) | 5993c: sd ra,8(sp) | 59940: add sp,sp,t0 | 59944: add s0,sp,2047 < | 59948: mv a2,a0 | 5994c: mv a3,a1 | 59950: mv a0,sp | 59954: li a4,1 | 59958: lui a1,0x1 | 5995c: add s0,s0,1 <--- | 59960: jal 59a3c SP here becomes unaligned, even if transitively which is undesirable as well as incorrect: - ABI requires stack to be 8 byte aligned - asm code looks weird and unexpected - to the user it might falsely seem like a compiler bug even when not, specially when staring at asm for debugging unrelated issue. It's not ideal, but I think it's still ABI compliant as-is. If it wasn't, then I suspect things like virtual origins in Ada couldn't be made ABI compliant. Fix this by using 2032+addend idiom when handling register operands related to stack. This only affects positive S12 values, negative values are already -2048 based. Unfortunately implementation requires making a copy of splitter, since it needs different varaints of predicate and constraint which cant be done conditionally in the same MD pattern (constraint with restricted range prevents LRA from allowing such insn despite new predicate) With the patch, we get following correct code instead: | .. | 59944:add s0,sp,2032 | .. | 5995c:add s0,s0,16 Alternately you could tighten the positive side of the range of the splitter from patch 1/3 so that you could always use 2032 rather than 2047 on the first addi. ie instead of allowing 2048..4094, allow 2048..4064. I don't have a strong opinion on that vs the direction you've gone here. Jeff
[gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
... if the constant can be represented as sum of two S12 values. The two S12 values could instead be fused with subsequent ADD insn. The helps - avoid an additional LUI insn - side benefits of not clobbering a reg e.g. w/o patch w/ patch long | | plus(unsigned long i) | li a5,4096 | { | addia5,a5,-2032 | addi a0, a0, 2047 return i + 2064; | add a0,a0,a5| addi a0, a0, 17 } | ret | ret NOTE: In theory not having const in a standalone reg might seem less CSE friendly, but for workloads in consideration these mat are from very late LRA reloads and follow on GCSE is not doing much currently. The real benefit however is seen in base+offset computation for array accesses and especially for stack accesses which are finalized late in optim pipeline, during LRA register allocation. Often the finalized offsets trigger LRA reloads resulting in mind boggling repetition of exact same insn sequence including LUI based constant materialization. This shaves off 290 billion dynamic instrustions (QEMU icounts) in SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of suite, there additional 10 billion shaved, with both gains and losses in indiv workloads as is usual with compiler changes. 1,214,172,747,858 | 1,212,530,889,930 | 0.14% <- 740,618,658,160 | 739,515,555,243 | 0.15% <- 692,128,469,436 | 691,175,291,593 | 0.14% <- 190,811,495,310 | 190,854,437,311 | -0.02% 225,751,808,189 | 225,818,881,019 | -0.03% 220,364,237,915 | 220,405,868,038 | -0.02% 179,157,361,261 | 179,186,059,187 | -0.02% 219,313,921,837 | 219,337,232,829 | -0.01% 278,733,265,382 | 278,733,264,380 | 0.00% 442,397,437,578 | 442,397,436,026 | 0.00% 344,112,144,242 | 344,112,142,910 | 0.00% 417,561,390,319 | 417,561,388,877 | 0.00% 669,319,255,911 | 669,318,761,114 | 0.00% 2,852,781,330,203 | 2,564,750,360,011 | 10.10% <-- 1,855,884,349,001 | 1,855,881,122,280 | 0.00% 1,653,856,904,409 | 1,653,733,205,926 | 0.01% 2,974,347,350,401 | 2,974,174,999,505 | 0.01% 1,158,337,609,995 | 1,158,337,608,826 | 0.00% 1,021,799,191,806 | 1,021,425,634,319 | 0.04% 1,716,448,413,824 | 1,714,912,501,736 | 0.09% 849,330,283,510 | 849,321,128,484 | 0.00% 276,556,968,005 | 276,232,659,282 | 0.12% <- 913,352,953,686 | 912,822,592,719 | 0.06% 903,792,550,116 | 903,107,637,917 | 0.08% 1,654,904,617,482 | 1,655,327,175,238 | -0.03% 1,495,841,421,311 | 1,493,418,761,599 | 0.16% <- 1,641,969,530,523 | 1,642,126,596,979 | -0.01% 2,546,170,307,385 | 2,546,151,690,122 | 0.00% 1,983,551,321,388 | 1,983,525,296,345 | 0.00% 1,516,077,036,742 | 1,516,076,833,481 | 0.00% 2,055,868,055,165 | 2,055,865,373,175 | 0.00% 1,000,621,723,807 | 1,000,601,360,859 | 0.00% 1,024,149,313,694 | 1,024,127,472,779 | 0.00% 363,827,037,541 | 363,057,012,239 | 0.21% <- 906,649,110,459 | 905,928,886,712 | 0.08% 509,023,896,044 | 508,140,354,911 | 0.17% <- 403,238,430 | 403,238,485 | 0.00% 403,238,430 | 403,238,485 | 0.00% This should still be considered damage control as the real/deeper fix would be to reduce number of LRA reloads or CSE/anchor those during LRA constraint sub-pass (re)runs. Implementation Details (for posterity) -- - basic idea is to have a splitter selected via a new predicate for constant being possible sum of two S12 and provide the transform. This is however a 2 -> 2 transform which combine can't handle. So we specify it using a define_insn_and_split. - the initial loose "i" constraint caused LRA to accept invalid insns thus needing a tighter new constraint as well. - An additional fallback alternate with catch-all "r" register constraint also needed to allow any "reloads" that LRA might require for ADDI with const larger than S12. Testing This is testsuite clean (rv64 only). I'll rely on post-commit CI multlib run for any possible fallout for other setups such as rv32. || gcc | g++ | gfortran | | rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 181 /44 |4 / 1 | 72 /12 | | rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 181 /44 |4 / 1 | 73 /13 | | I also threw this into a buildroot run, it obviously boots Linux to userspace. bloat-o-meter on glibc and kernel show overall decrease in staic instruction counts with some minor spot increases. These are generally in the category of - LUI + ADDI are 2 byte each vs. two ADD being 4 byte each. - Knock on effects due to inlining changes. - Sometimes the slightly shorter 2-insn seq in a mult-exit function can cause in-place epilogue duplication (vs. a jump back). This is slightly larger but more efficient in execution. In summary nothing to fret abo
[gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
If the constant used for stack offset can be expressed as sum of two S12 values, the constant need not be materialized (in a reg) and instead the two S12 bits can be added to instructions involved with frame pointer. This avoids burning a register and more importantly can often get down to be 2 insn vs. 3. The prev patches to generally avoid LUI based const materialization didn't fix this PR and need this directed fix in funcion prologue/epilogue expansion. This fix doesn't move the neddle for SPEC, at all, but it is still a win considering gcc generates one insn fewer than llvm for the test ;-) gcc-13.1 release | gcc 230823 | | |g6619b3d4c15c| This patch | clang/llvm - li t0,-4096 | lit0,-4096 | addi sp,sp,-2048 | addi sp,sp,-2048 addit0,t0,2016 | addi t0,t0,2032| add sp,sp,-16 | addi sp,sp,-32 li a4,4096 | add sp,sp,t0 | add a5,sp,a0| add a1,sp,16 add sp,sp,t0 | addi a5,sp,-2032 | sbzero,0(a5) | add a0,a0,a1 li a5,-4096 | add a0,a5,a0 | addi sp,sp,2032 | sb zero,0(a0) addia4,a4,-2032 | lit0, 4096 | addi sp,sp,32| addi sp,sp,2032 add a4,a4,a5 | sbzero,2032(a0) | ret | addi sp,sp,48 addia5,sp,16 | addi t0,t0,-2032 | | ret add a5,a4,a5 | add sp,sp,t0 | add a0,a5,a0 | ret | li t0,4096 | sd a5,8(sp) | sb zero,2032(a0)| addit0,t0,-2016 | add sp,sp,t0 | ret | gcc/ChangeLog: PR target/105733 * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New function to split a sum of two s12 values into constituents. (riscv_expand_prologue): Handle offset being sum of two S12. (riscv_expand_epilogue): Ditto. * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr105733.c: New Test. * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not expect LUI 4096. * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto. Signed-off-by: Vineet Gupta --- gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv.cc | 80 +-- gcc/testsuite/gcc.target/riscv/pr105733.c | 15 .../riscv/rvv/autovec/vls/spill-1.c | 4 +- .../riscv/rvv/autovec/vls/spill-2.c | 4 +- .../riscv/rvv/autovec/vls/spill-3.c | 4 +- .../riscv/rvv/autovec/vls/spill-4.c | 4 +- .../riscv/rvv/autovec/vls/spill-5.c | 4 +- .../riscv/rvv/autovec/vls/spill-6.c | 4 +- .../riscv/rvv/autovec/vls/spill-7.c | 4 +- 10 files changed, 104 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index f9e407bf5768..8b3f7ce181cc 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -165,6 +165,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *); extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *); extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel); extern bool riscv_reg_frame_related (rtx); +extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, bool, + HOST_WIDE_INT *, HOST_WIDE_INT *); /* Routines implemented in riscv-c.cc. */ void riscv_cpu_cpp_builtins (cpp_reader *); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 38aebefa2590..b4a7947433a7 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -3810,6 +3810,38 @@ riscv_split_doubleword_move (rtx dest, rtx src) riscv_emit_move (riscv_subword (dest, true), riscv_subword (src, true)); } } + +/* Constant VAL is known to be sum of two S12 constants. Break it into + comprising BASE and OFF. + Numerically S12 is -2048 to 2047, however if ALIGN_BASE is true, use the + more conservative -2048 to 2032 range for offsets pertaining to stack + related registers. */ + +void +riscv_split_sum_of_two_s12 (HOST_WIDE_INT val, bool align_base, + HOST_WIDE_INT *base, HOST_WIDE_INT *off) +{ + if (SUM_OF_TWO_S12_N (val)) +{ + *base = -2048; + *off = val - (-2048); +} + else if (SUM_OF_TWO_S12_P (val, true) && align_base) +{ + *base = 2032; + *off = val - 2032; +} + else if (SUM_OF_TWO_
[gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned
Noticed that new sum of two s12 splitter was generating following: | 059930 : | 59930: add sp,sp,-16 | 59934: lui t0,0xf | 59938: sd s0,0(sp) | 5993c: sd ra,8(sp) | 59940: add sp,sp,t0 | 59944: add s0,sp,2047 < | 59948: mv a2,a0 | 5994c: mv a3,a1 | 59950: mv a0,sp | 59954: li a4,1 | 59958: lui a1,0x1 | 5995c: add s0,s0,1 <--- | 59960: jal 59a3c SP here becomes unaligned, even if transitively which is undesirable as well as incorrect: - ABI requires stack to be 8 byte aligned - asm code looks weird and unexpected - to the user it might falsely seem like a compiler bug even when not, specially when staring at asm for debugging unrelated issue. Fix this by using 2032+addend idiom when handling register operands related to stack. This only affects positive S12 values, negative values are already -2048 based. Unfortunately implementation requires making a copy of splitter, since it needs different varaints of predicate and constraint which cant be done conditionally in the same MD pattern (constraint with restricted range prevents LRA from allowing such insn despite new predicate) With the patch, we get following correct code instead: | .. | 59944:add s0,sp,2032 | .. | 5995c:add s0,s0,16 gcc/Changelog: * config/riscv/riscv.h: Add alignment arg to new macros. * config/riscv/constraints.md: Variant of new constraint. * config/riscv/predicates.md: Variant of new predicate. * config/riscv/riscv.md: Variant of new splitter which offsets off of 2032 (vs. 2047). Gate existing splitter behind riscv_reg_frame_related. * config/riscv/riscv.cc (riscv_reg_frame_related): New helper to conditionalize the existing and new spitters. * config/riscv/riscv-protos.h: Add new prototype. Signed-off-by: Vineet Gupta --- gcc/config/riscv/constraints.md | 6 ++ gcc/config/riscv/predicates.md | 8 ++- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv.cc | 11 ++ gcc/config/riscv/riscv.h| 15 - gcc/config/riscv/riscv.md | 37 ++--- 6 files changed, 69 insertions(+), 9 deletions(-) diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index 435461180c7e..a9446c54ee45 100644 --- a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -86,6 +86,12 @@ (ior (match_test "IN_RANGE (ival, 2048, 4094)") (match_test "IN_RANGE (ival, -4096, -2049)" +(define_constraint "MiA" + "const can be represented as sum of two S12 values with first aligned." + (and (match_code "const_int") + (ior (match_test "IN_RANGE (ival, 2033, 4064)") + (match_test "IN_RANGE (ival, -4096, -2049)" + (define_constraint "Ds3" "@internal 1, 2 or 3 immediate" diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 89490339c2da..56f9919daafa 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -423,7 +423,13 @@ (define_predicate "const_two_s12" (match_code "const_int") { - return SUM_OF_TWO_S12 (INTVAL (op)); + return SUM_OF_TWO_S12 (INTVAL (op), false); +}) + +(define_predicate "const_two_s12_algn" + (match_code "const_int") +{ + return SUM_OF_TWO_S12 (INTVAL (op), true); }) ;; CORE-V Predicates: diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index b87355938052..f9e407bf5768 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -164,6 +164,7 @@ extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT); extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *); extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *); extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel); +extern bool riscv_reg_frame_related (rtx); /* Routines implemented in riscv-c.cc. */ void riscv_cpu_cpp_builtins (cpp_reader *); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 680c4a728e92..38aebefa2590 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -6667,6 +6667,17 @@ riscv_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to) return (to == HARD_FRAME_POINTER_REGNUM || to == STACK_POINTER_REGNUM); } +/* Helper to determine if reg X pertains to stack. */ +bool +riscv_reg_frame_related (rtx x) +{ + return REG_P (x) +&& (REGNO (x) == FRAME_POINTER_REGNUM +|| REGNO (x) == HARD_FRAME_POINTER_REGNUM +|| REGNO (x) == ARG_POINTER_REGNUM +|| REGNO (x) == VIRTUAL_STACK_VARS_REGNUM); +} + /* Implement INITIAL_ELIMINATION_OFFSET. FROM is either the frame pointer or argument pointer. TO is either the stack pointer or hard frame pointer. */ diff --git a/gcc
[gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak
Hi, This set of patches (for gcc-15) help improve stack/array accesses by improving constant materialization. Details are in respective patches. The first patch is the main change which improves SPEC cactu by 10%. There is at least one more pending change with same theme (riscv_add_offset/riscv_legitimize_address) but that is tripping up and currently being debugged. Thx, -Vineet Vineet Gupta (3): RISC-V: avoid LUI based const materialization ... [part of PR/106265] RISC-V: avoid LUI based const mat: keep stack offsets aligned RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733] gcc/config/riscv/constraints.md | 12 +++ gcc/config/riscv/predicates.md| 12 +++ gcc/config/riscv/riscv-protos.h | 3 + gcc/config/riscv/riscv.cc | 91 +-- gcc/config/riscv/riscv.h | 20 gcc/config/riscv/riscv.md | 65 + gcc/testsuite/gcc.target/riscv/pr105733.c | 15 +++ .../riscv/rvv/autovec/vls/spill-1.c | 4 +- .../riscv/rvv/autovec/vls/spill-2.c | 4 +- .../riscv/rvv/autovec/vls/spill-3.c | 4 +- .../riscv/rvv/autovec/vls/spill-4.c | 4 +- .../riscv/rvv/autovec/vls/spill-5.c | 4 +- .../riscv/rvv/autovec/vls/spill-6.c | 4 +- .../riscv/rvv/autovec/vls/spill-7.c | 4 +- .../gcc.target/riscv/sum-of-two-s12-const-1.c | 45 + .../gcc.target/riscv/sum-of-two-s12-const-2.c | 22 + 16 files changed, 292 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c -- 2.34.1
New French PO file for 'gcc' (version 14.1-b20240218)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the French team of translators. The file is available at: https://translationproject.org/latest/gcc/fr.po (This file, 'gcc-14.1-b20240218.fr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] i386: Fix setup of incoming varargs for (...) functions which return large aggregates [PR114175]
> Am 16.03.2024 um 08:24 schrieb Jakub Jelinek : > > Hi! > > The c23-stdarg-6.c testcase I've added recently apparently works fine with > -O0 but aborts with -O1 and higher on x86_64-linux. > The problem is in setup of incoming varargs. > > Like function.cc before r14-9249 even ix86_setup_incoming_varargs assumes > that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there > is nothing to advance, but that is not the case for (...) functions > returning by hidden reference which have one such artificial argument. > If the setup_incoming_varargs hook is called from the > if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl)) > && fnargs.is_empty ()) >{ > struct assign_parm_data_one data = {}; > assign_parms_setup_varargs (&all, &data, false); >} > spot, i.e. where there is no hidden return argument passed, arg.type > is always NULL, while when it is called in the > if (cfun->stdarg && !DECL_CHAIN (parm)) >assign_parms_setup_varargs (&all, &data, false); > spot, even when it is TYPE_NO_NAMED_ARGS_STDARG_P arg.type will be non-NULL. > The tree-stdarg.cc pass in f in c23-stdarg-6.cc at -O1 or higher determines > that va_arg is used on integral types at most twice (loads 2 words), > and because ix86_setup_incoming_varargs doesn't advance, the code saves > just the %rdi and %rsi registers to the save area. But that isn't correct, > it should save %rsi and %rdx because %rdi is the hidden return argument. > With -O0 tree-stdarg.cc doesn't attempt to optimize and we save all the > registers, so it works fine in that case. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, > ok for trunk? Ok > Now, I think we'll need the same fix also on > aarch64, alpha, arc, csky, ia64, loongarch, mips, mmix, nios2, riscv, visium > which have pretty much the similarly looking snippet in their hooks > changed by the r13-3549 commit. > Then arm, epiphany, fr30, frv, ft32, m32r, mcore, nds32, rs6000, sh > have different changes but most likely need something similar too. > I don't have access to most of those, could test aarch64 and rs6000 I guess. If we have testsuite coverage after this I’d say go ahead for unmaintained ports. Richard > 2024-03-16 Jakub Jelinek > >PR target/114175 >* config/i386/i386.cc (ix86_setup_incoming_varargs): Only skip >ix86_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions >if arg.type is NULL. > >* gcc.dg/c23-stdarg-7.c: New test. >* gcc.dg/c23-stdarg-8.c: New test. > > --- gcc/config/i386/i386.cc.jj2024-03-05 10:26:19.424029862 +0100 > +++ gcc/config/i386/i386.cc2024-03-15 23:50:08.821823213 +0100 > @@ -4643,7 +4643,8 @@ ix86_setup_incoming_varargs (cumulative_ > /* For varargs, we do not want to skip the dummy va_dcl argument. > For stdargs, we do want to skip the last named argument. */ > next_cum = *cum; > - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) > + if ((!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) > + || arg.type != NULL_TREE) > && stdarg_p (fntype)) > ix86_function_arg_advance (pack_cumulative_args (&next_cum), arg); > > --- gcc/testsuite/gcc.dg/c23-stdarg-7.c.jj2024-03-15 23:50:56.857185518 > +0100 > +++ gcc/testsuite/gcc.dg/c23-stdarg-7.c2024-03-15 23:50:45.611334813 +0100 > @@ -0,0 +1,6 @@ > +/* Test C23 variadic functions with no named parameters, or last named > + parameter with a declaration not allowed in C17. Execution tests. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */ > + > +#include "c23-stdarg-4.c" > --- gcc/testsuite/gcc.dg/c23-stdarg-8.c.jj2024-03-15 23:51:20.814867467 > +0100 > +++ gcc/testsuite/gcc.dg/c23-stdarg-8.c2024-03-15 23:51:06.973051224 +0100 > @@ -0,0 +1,6 @@ > +/* Test C23 variadic functions with no named parameters, or last named > + parameter with a declaration not allowed in C17. Execution tests. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */ > + > +#include "c23-stdarg-6.c" > >Jakub >
Re: [PATCH] bitint: Fix up stores to large/huge _BitInt bitfields [PR114329]
> Am 16.03.2024 um 08:11 schrieb Jakub Jelinek : > > Hi! > > The verifier requires BIT_FIELD_REFs with INTEGRAL_TYPE_P first operand > to have mode precision. In most cases for the large/huge _BitInt bitfield > stores the code uses bitfield representatives, which are typically arrays > of chars, but if the bitfield starts at byte boundary on big endian, > the code uses as nlhs in lower_mergeable_store COMPONENT_REF of the > bitfield FIELD_DECL instead, which is fine for the limb accesses, > but when used for the most significant limb can result in invalid > BIT_FIELD_REF because the first operand then has BITINT_TYPE and > usually VOIDmode. > > The following patch adds a helper method for the 4 creatikons of > BIT_FIELD_REF which when needed adds a VIEW_CONVERT_EXPR. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok Richard > 2024-03-16 Jakub Jelinek > >PR tree-optimization/114329 >* gimple-lower-bitint.cc (struct bitint_large_huge): Declare >build_bit_field_ref method. >(bitint_large_huge::build_bit_field_ref): New method. >(bitint_large_huge::lower_mergeable_stmt): Use it. > >* gcc.dg/bitint-101.c: New test. > > --- gcc/gimple-lower-bitint.cc.jj2024-03-15 09:16:36.464033118 +0100 > +++ gcc/gimple-lower-bitint.cc2024-03-15 17:53:30.200276578 +0100 > @@ -427,6 +427,8 @@ struct bitint_large_huge > void insert_before (gimple *); > tree limb_access_type (tree, tree); > tree limb_access (tree, tree, tree, bool); > + tree build_bit_field_ref (tree, tree, unsigned HOST_WIDE_INT, > +unsigned HOST_WIDE_INT); > void if_then (gimple *, profile_probability, edge &, edge &); > void if_then_else (gimple *, profile_probability, edge &, edge &); > void if_then_if_then_else (gimple *g, gimple *, > @@ -659,6 +661,31 @@ bitint_large_huge::limb_access (tree typ > return ret; > } > > +/* Build a BIT_FIELD_REF to access BITSIZE bits with FTYPE type at > + offset BITPOS inside of OBJ. */ > + > +tree > +bitint_large_huge::build_bit_field_ref (tree ftype, tree obj, > +unsigned HOST_WIDE_INT bitsize, > +unsigned HOST_WIDE_INT bitpos) > +{ > + if (INTEGRAL_TYPE_P (TREE_TYPE (obj)) > + && !type_has_mode_precision_p (TREE_TYPE (obj))) > +{ > + unsigned HOST_WIDE_INT nelts > += CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (obj))), limb_prec); > + tree ltype = m_limb_type; > + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj)); > + if (as != TYPE_ADDR_SPACE (ltype)) > +ltype = build_qualified_type (ltype, TYPE_QUALS (ltype) > + | ENCODE_QUAL_ADDR_SPACE (as)); > + tree atype = build_array_type_nelts (ltype, nelts); > + obj = build1 (VIEW_CONVERT_EXPR, atype, obj); > +} > + return build3 (BIT_FIELD_REF, ftype, obj, bitsize_int (bitsize), > + bitsize_int (bitpos)); > +} > + > /* Emit a half diamond, >if (COND) > |\ > @@ -2651,9 +2678,9 @@ bitint_large_huge::lower_mergeable_stmt >} > tree ftype >= build_nonstandard_integer_type (limb_prec - bo_bit, 1); > - tree bfr = build3 (BIT_FIELD_REF, ftype, unshare_expr (nlhs), > - bitsize_int (limb_prec - bo_bit), > - bitsize_int (bo_idx * limb_prec + bo_bit)); > + tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs), > + limb_prec - bo_bit, > + bo_idx * limb_prec + bo_bit); > tree t = add_cast (ftype, rhs1); > g = gimple_build_assign (bfr, t); > insert_before (g); > @@ -2714,12 +2741,11 @@ bitint_large_huge::lower_mergeable_stmt >{ > tree ftype >= build_nonstandard_integer_type (rprec + bo_bit, 1); > - tree bfr = build3 (BIT_FIELD_REF, ftype, > - unshare_expr (nlhs), > - bitsize_int (rprec + bo_bit), > - bitsize_int ((bo_idx > - + tprec / limb_prec) > - * limb_prec)); > + tree bfr > += build_bit_field_ref (ftype, unshare_expr (nlhs), > + rprec + bo_bit, > + (bo_idx + tprec / limb_prec) > + * limb_prec); > tree t = add_cast (ftype, rhs1); > g = gimple_build_assign (bfr, t); > done = true; > @@ -2860,11 +2886,11 @@ bitint_large_huge::lower_mergeable_stmt >{ > tree ftype >= build_nonstandard_integer_type (rprec + bo_bit, 1); > - tree bfr = build3 (BIT_FIELD_REF, ftype, > - unshare_expr (nlhs), > - bitsize_int (rprec + bo_bit), > - bitsize_int ((bo_idx + tprec / limb_prec) > - * limb_prec)); > + tree bfr > += build_bit_field_ref (ftype, unshare_expr
Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]
With the patch, sorry. On 14/03/2024 22:49, François Dumont wrote: Hi This is what I started to do. For now I haven't touch to __cpp_lib_null_iterators definition as _Safe_local_iterator still need some work. libstdc++: Implement N3644 on _Safe_iterator<> [PR114316] Consider range of value-initialized iterators as valid and empty. libstdc++-v3/ChangeLog: PR libstdc++/114316 * include/debug/safe_iterator.tcc (_Safe_iterator<>::_M_valid_range): First check if both iterators are value-initialized before checking if singular. * testsuite/23_containers/set/debug/114316.cc: New test case. * testsuite/23_containers/vector/debug/114316.cc: New test case. Tested under Linux x86_64, ok to commit ? François On 12/03/2024 10:52, Jonathan Wakely wrote: On Tue, 12 Mar 2024 at 01:03, Jonathan Wakely wrote: On Tue, 12 Mar 2024 at 00:55, Maciej Miera wrote: Wiadomość napisana przez Jonathan Wakely w dniu 11.03.2024, o godz. 21:40: On Mon, 11 Mar 2024 at 20:07, Maciej Miera wrote: Hello, I have tried to introduce an extra level of safety to my codebase and utilize _GLIBCXX_DEBUG in my test builds in order to catch faulty iterators. However, I have encountered the following problem: I would like to utilize singular, value-initialized iterators as an arbitrary "null range”. However, this leads to failed assertions in std:: algorithms taking such range. Consider the following code sample with find_if: #include #include #include #ifndef __cpp_lib_null_iterators #warning "Not standard compliant" #endif int main() { std::multimap::iterator it1{}; std::multimap::iterator it2{}; (void) (it1==it2); // OK (void) std::find_if( it1, it2, [](const auto& el) { return el.second == 8;}); } Compiled with -std=c++20 and -D_GLIBCXX_DEBUG it produces the warning "Not standard compliant" and the execution results in the following assert failure: /opt/compiler-explorer/gcc-12.2.0/include/c++/12.2.0/bits/stl_algo.h:3875: In function: constexpr _IIter std::find_if(_IIter, _IIter, _Predicate) [with _IIter = gnu_debug::_Safe_iterator<_Rb_tree_iterator >, debug::multimap, bidirectional_iterator_tag>; _Predicate = main()::] The question is though: is it by design, or is it just a mere oversight? The warning actually suggest the first option. If it is an intentional design choice, could you provide some rationale behind it, please? The macro was not defined because the C++14 rule wasn't implemented for debug mode, but that should have been fixed for GCC 11, according to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70303 So we should be able to define macro now, except maybe it wasn't fixed for the RB tree containers. Just to make sure there are no misunderstandings: comparison via == works fine. The feature check macro without _GLIBCXX_DEBUG and with included is also fine. Maybe the need to include a header is the issue, but that’s not the core of the problem anyway. No, it has nothing to do with the headers included. The feature test macro is defined like so: # if (__cplusplus >= 201402L) && (!defined(_GLIBCXX_DEBUG)) # define __glibcxx_null_iterators 201304L It's a very deliberate choice to not define it when _GLIBCXX_DEBUG is defined. But as I said, I think we should have changed that. The actual question is though, whether passing singular iterators to std algorithms (like find_if) should make the asserts at the beginning of the algo function fail when compiled with _GLIBCXX_DEBUG. IMHO, intuitively it should not, as comparing iterators equal would just ensure early exit and return of the same singular iterator. This seems not to be the case though. The actual message is this: Error: the function requires a valid iterator range [first, last). What bothers me is whether the empty virtual range limited by two same singular iterators is actually valid or not. Yes, it's valid. So the bug is in the __glibcxx_requires_valid_range macro. Thanks for the bugzilla report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114316 We'll get it fixed! diff --git a/libstdc++-v3/include/debug/safe_iterator.tcc b/libstdc++-v3/include/debug/safe_iterator.tcc index a8b24233e85..4b2baf2980e 100644 --- a/libstdc++-v3/include/debug/safe_iterator.tcc +++ b/libstdc++-v3/include/debug/safe_iterator.tcc @@ -194,6 +194,12 @@ namespace __gnu_debug std::pair& __dist, bool __check_dereferenceable) const { + if (_M_value_initialized() && __rhs._M_value_initialized()) + { + __dist = std::make_pair(0, __dp_exact); + return true; + } + if (_M_singular() || __rhs._M_singular() || !_M_can_compare(__rhs)) return false; @@ -218,6 +224,12 @@ namespace __gnu_debug std::pair& __dist) const { + if (this->_M_value_initialized() && __rhs._M_value_initial
[PATCH v3] c++: Fix handling of no-linkage decls for modules
On Mon, Mar 11, 2024 at 02:13:34PM -0400, Jason Merrill wrote: > On 3/8/24 18:18, Nathaniel Shead wrote: > > On Fri, Mar 08, 2024 at 10:19:52AM -0500, Jason Merrill wrote: > > > On 3/7/24 21:55, Nathaniel Shead wrote: > > > > On Mon, Nov 27, 2023 at 03:59:39PM +1100, Nathaniel Shead wrote: > > > > > On Thu, Nov 23, 2023 at 03:03:37PM -0500, Nathan Sidwell wrote: > > > > > > On 11/20/23 04:47, Nathaniel Shead wrote: > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have > > > > > > > write > > > > > > > access. > > > > > > > > > > > > > > -- >8 -- > > > > > > > > > > > > > > Block-scope declarations of functions or extern values are not > > > > > > > allowed > > > > > > > when attached to a named module. Similarly, class member > > > > > > > functions are > > > > > > > not inline if attached to a named module. However, in both these > > > > > > > cases > > > > > > > we currently only check if the declaration is within the module > > > > > > > purview; > > > > > > > it is possible for such a declaration to occur within the module > > > > > > > purview > > > > > > > but not be attached to a named module (e.g. in an 'extern "C++"' > > > > > > > block). > > > > > > > This patch makes the required adjustments. > > > > > > > > > > > > > > > > > > Ah I'd been puzzling over the default inlinedness of member-fns of > > > > > > block-scope structs. Could you augment the testcase to make sure > > > > > > that's > > > > > > right too? > > > > > > > > > > > > Something like: > > > > > > > > > > > > // dg-module-do link > > > > > > export module Mod; > > > > > > > > > > > > export auto Get () { > > > > > > struct X { void Fn () {} }; > > > > > > return X(); > > > > > > } > > > > > > > > > > > > > > > > > > /// > > > > > > import Mod > > > > > > void Frob () { Get().Fn(); } > > > > > > > > > > > > > > > > I gave this a try and it indeed doesn't work correctly; 'Fn' needs to > > > > > be > > > > > marked 'inline' for this to link (whether or not 'Get' itself is > > > > > inline). I've tried tracing the code to work out what's going on but > > > > > I've been struggling to work out how all the different flags (e.g. > > > > > TREE_PUBLIC, TREE_EXTERNAL, TREE_COMDAT, DECL_NOT_REALLY_EXTERN) > > > > > interact, which flags we want to be set where, and where the decision > > > > > of > > > > > what function definitions to emit is actually made. > > > > > > > > > > I did find that doing 'mark_used(decl)' on all member functions in > > > > > block-scope structs seems to work however, but I wonder if that's > > > > > maybe > > > > > too aggressive or if there's something else we should be doing? > > > > > > > > I got around to looking at this again, here's an updated version of this > > > > patch. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > > > > > > > (I'm not sure if 'start_preparsed_function' is the right place to be > > > > putting this kind of logic or if it should instead be going in > > > > 'grokfndecl', e.g. decl.cc:10761 where the rules for making local > > > > functions have no linkage are initially determined, but I found this > > > > easier to implement: happy to move around though if preferred.) > > > > > > > > -- >8 -- > > > > > > > > Block-scope declarations of functions or extern values are not allowed > > > > when attached to a named module. Similarly, class member functions are > > > > not inline if attached to a named module. However, in both these cases > > > > we currently only check if the declaration is within the module purview; > > > > it is possible for such a declaration to occur within the module purview > > > > but not be attached to a named module (e.g. in an 'extern "C++"' block). > > > > This patch makes the required adjustments. > > > > > > > > While implementing this we discovered that block-scope local functions > > > > are not correctly emitted, causing link failures; this patch also > > > > corrects some assumptions here and ensures that they are emitted when > > > > needed. > > > > > > > > PR c++/112631 > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > * cp-tree.h (named_module_attach_p): New function. > > > > * decl.cc (start_decl): Check for attachment not purview. > > > > (grokmethod): Likewise. > > > > > > These changes are OK; the others I want to consider more. > > > > Thanks, I can commit this as a separate commit if you prefer? > > Please. > Thanks, committed as r14-9501-gead3075406ece9. > > > > +export auto n_n() { > > > > + internal(); > > > > + struct X { void f() { internal(); } }; > > > > + return X{}; > > > > > > Hmm, is this not a prohibited exposure? Seems like X has no linkage > > > because > > > it's at block scope, and the deduced return type names it. > > > > > > I know we try to support this "voldemort" pattern, but is that actually > > > correct? > > > > I had similar doubts, but this is not an especially uncommon pattern in > > the wild e
[PATCH v4 2/2] extend.texi: Add documentation for all missing built-in traits [PR87343]
Fixed some wording and consistency issues in the documentation. -- >8 -- PR c++/87343 gcc/ChangeLog: * doc/extend.texi (Expression-yielding Type Traits): New subsection. (Type-yielding Type Traits): Likewise. (C++ Concepts): Move __is_same to ... (Expression-yielding Type Traits): ... here. (__is_enum): Add @anchor and @ref to related traits. (__is_trivial): Likewise. (__is_class): Fix documentation. (__underlying_type): Likewise. (__has_nothrow_assign): Ensure consistency. (__has_trivial_constructor): Likewise. (__has_trivial_copy): Likewise. (__has_trivial_assign): Likewise. (__has_trivial_constructor): Likewise. (__has_trivial_copy): Likewise. (__has_trivial_destructor): Likewise. (__has_virtual_destructor): Likewise. (__is_abstract): Likewise. (__is_aggregate): Likewise. (__is_base_of): Likewise. (__is_empty): Likewise. (__is_enum): Likewise. (__is_literal_type): Likewise. (__is_pod): Likewise. (__is_polymorphic): Likewise. (__is_standard_layout): Likewise. (__is_trivial): Likewise. (__is_union): Likewise. (__has_unique_object_representations): New documentation. (__is_array): Likewise. (__is_assignable): Likewise. (__is_bounded_array): Likewise. (__is_constructible): Likewise. (__is_convertible): Likewise. (__is_function): Likewise. (__is_layout_compatible): Likewise. (__is_member_function_pointer): Likewise. (__is_member_object_pointer): Likewise. (__is_member_pointer): Likewise. (__is_nothrow_assignable): Likewise. (__is_nothrow_constructible): Likewise. (__is_nothrow_convertible): Likewise. (__is_object): Likewise. (__is_pointer_interconvertible_base_of): Likewise. (__is_reference): Likewise. (__is_scoped_enum): Likewise. (__is_trivially_assignable): Likewise. (__is_trivially_constructible): Likewise. (__is_trivially_copyable): Likewise. (__reference_constructs_from_temporary): Likewise. (__reference_converts_from_temporary): Likewise. (__bases): Likewise. (__direct_bases): Likewise. (__remove_cv): Likewise. (__remove_cvref): Likewise. (__remove_pointer): Likewise. (__remove_reference): Likewise. (__type_pack_element): Likewise. Signed-off-by: Ken Matsui --- gcc/doc/extend.texi | 355 +--- 1 file changed, 304 insertions(+), 51 deletions(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 1c61682b102..eb61f619e02 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -29485,79 +29485,91 @@ Function Multiversioning} for more details. @section Type Traits The C++ front end implements syntactic extensions that allow -compile-time determination of -various characteristics of a type (or of a -pair of types). +compile-time determination of various characteristics of a type +(or of a pair of types). + + +@subsection Value-yielding Type Traits + +Most of these built-ins return a @code{bool} result when invoked; the rest +yields a @code{size_t} value. @defbuiltin{bool __has_nothrow_assign (@var{type})} -If @var{type} is @code{const}-qualified or is a reference type then -the trait is @code{false}. Otherwise if @code{__has_trivial_assign (type)} -is @code{true} then the trait is @code{true}, else if @var{type} is +If @var{type} is @code{const}-qualified or is a reference type, then +the trait is @code{false}. Otherwise, if @code{__has_trivial_assign (type)} +is @code{true}, then the trait is @code{true}, else if @var{type} is a cv-qualified class or union type with copy assignment operators that are -known not to throw an exception then the trait is @code{true}, else it is +known not to throw an exception, then the trait is @code{true}, else it is @code{false}. Requires: @var{type} shall be a complete type, (possibly cv-qualified) @code{void}, or an array of unknown bound. @enddefbuiltin @defbuiltin{bool __has_nothrow_constructor (@var{type})} -If @code{__has_trivial_constructor (type)} is @code{true} then the trait +If @code{__has_trivial_constructor (type)} is @code{true}, then the trait is @code{true}, else if @var{type} is a cv class or union type (or array thereof) with a default constructor that is known not to throw an -exception then the trait is @code{true}, else it is @code{false}. +exception, then the trait is @code{true}, else it is @code{false}. Requires: @var{type} shall be a complete type, (possibly cv-qualified) @code{void}, or an array of unknown bound. @enddefbuiltin @defbuiltin{bool __has_nothrow_copy (@var{type})} -If @code{__has_trivial_copy (type)} is @code{true} then the trait is +If @code{__has_trivial_copy (type)} is @code{true},
[PING] Re: [PATCH] testsuite: Define _POSIX_C_SOURCE for test
Ping! Kind regards, Torbjörn On 2024-03-10 18:26, Torbjörn SVENSSON wrote: Ok for trunk? -- As the tests assume that strndup() is visible (only part of POSIX.1-2008) define the guard to ensure that it's visible. Currently, glibc appears to always have this defined in C++, newlib does not. Without this patch, fails like this can be seen: Testing analyzer/strndup-1.c, -std=c++98 .../strndup-1.c: In function 'void test_1(const char*)': .../strndup-1.c:11:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? .../strndup-1.c: In function 'void test_2(const char*)': .../strndup-1.c:16:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? .../strndup-1.c: In function 'void test_3(const char*)': .../strndup-1.c:21:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? Patch has been verified on Linux. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/strndup-1.c: Define _POSIX_C_SOURCE. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/c-c++-common/analyzer/strndup-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c index 85ccae85d83..577ece0cfba 100644 --- a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c +++ b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c @@ -1,4 +1,5 @@ /* { dg-skip-if "no strndup in libc" { *-*-darwin[789]* *-*-darwin10* hppa*-*-hpux* *-*-mingw* *-*-vxworks* } } */ +/* { dg-additional-options "-D_POSIX_C_SOURCE=200809L" } */ #include #include
Re: [PATCH] Fix libcc1plugin and libc1plugin to avoid poisoned identifiers
Given its fixes build, is obvious, and tested appropriately: patch pushed as https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5213047b1d50af63dfabb5e5649821a6cb157e33 FX
[PATCH] i386: Fix setup of incoming varargs for (...) functions which return large aggregates [PR114175]
Hi! The c23-stdarg-6.c testcase I've added recently apparently works fine with -O0 but aborts with -O1 and higher on x86_64-linux. The problem is in setup of incoming varargs. Like function.cc before r14-9249 even ix86_setup_incoming_varargs assumes that TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there is nothing to advance, but that is not the case for (...) functions returning by hidden reference which have one such artificial argument. If the setup_incoming_varargs hook is called from the if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl)) && fnargs.is_empty ()) { struct assign_parm_data_one data = {}; assign_parms_setup_varargs (&all, &data, false); } spot, i.e. where there is no hidden return argument passed, arg.type is always NULL, while when it is called in the if (cfun->stdarg && !DECL_CHAIN (parm)) assign_parms_setup_varargs (&all, &data, false); spot, even when it is TYPE_NO_NAMED_ARGS_STDARG_P arg.type will be non-NULL. The tree-stdarg.cc pass in f in c23-stdarg-6.cc at -O1 or higher determines that va_arg is used on integral types at most twice (loads 2 words), and because ix86_setup_incoming_varargs doesn't advance, the code saves just the %rdi and %rsi registers to the save area. But that isn't correct, it should save %rsi and %rdx because %rdi is the hidden return argument. With -O0 tree-stdarg.cc doesn't attempt to optimize and we save all the registers, so it works fine in that case. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Now, I think we'll need the same fix also on aarch64, alpha, arc, csky, ia64, loongarch, mips, mmix, nios2, riscv, visium which have pretty much the similarly looking snippet in their hooks changed by the r13-3549 commit. Then arm, epiphany, fr30, frv, ft32, m32r, mcore, nds32, rs6000, sh have different changes but most likely need something similar too. I don't have access to most of those, could test aarch64 and rs6000 I guess. 2024-03-16 Jakub Jelinek PR target/114175 * config/i386/i386.cc (ix86_setup_incoming_varargs): Only skip ix86_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions if arg.type is NULL. * gcc.dg/c23-stdarg-7.c: New test. * gcc.dg/c23-stdarg-8.c: New test. --- gcc/config/i386/i386.cc.jj 2024-03-05 10:26:19.424029862 +0100 +++ gcc/config/i386/i386.cc 2024-03-15 23:50:08.821823213 +0100 @@ -4643,7 +4643,8 @@ ix86_setup_incoming_varargs (cumulative_ /* For varargs, we do not want to skip the dummy va_dcl argument. For stdargs, we do want to skip the last named argument. */ next_cum = *cum; - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) + if ((!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) + || arg.type != NULL_TREE) && stdarg_p (fntype)) ix86_function_arg_advance (pack_cumulative_args (&next_cum), arg); --- gcc/testsuite/gcc.dg/c23-stdarg-7.c.jj 2024-03-15 23:50:56.857185518 +0100 +++ gcc/testsuite/gcc.dg/c23-stdarg-7.c 2024-03-15 23:50:45.611334813 +0100 @@ -0,0 +1,6 @@ +/* Test C23 variadic functions with no named parameters, or last named + parameter with a declaration not allowed in C17. Execution tests. */ +/* { dg-do run } */ +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */ + +#include "c23-stdarg-4.c" --- gcc/testsuite/gcc.dg/c23-stdarg-8.c.jj 2024-03-15 23:51:20.814867467 +0100 +++ gcc/testsuite/gcc.dg/c23-stdarg-8.c 2024-03-15 23:51:06.973051224 +0100 @@ -0,0 +1,6 @@ +/* Test C23 variadic functions with no named parameters, or last named + parameter with a declaration not allowed in C17. Execution tests. */ +/* { dg-do run } */ +/* { dg-options "-O2 -std=c23 -pedantic-errors" } */ + +#include "c23-stdarg-6.c" Jakub
[PATCH] bitint: Fix up stores to large/huge _BitInt bitfields [PR114329]
Hi! The verifier requires BIT_FIELD_REFs with INTEGRAL_TYPE_P first operand to have mode precision. In most cases for the large/huge _BitInt bitfield stores the code uses bitfield representatives, which are typically arrays of chars, but if the bitfield starts at byte boundary on big endian, the code uses as nlhs in lower_mergeable_store COMPONENT_REF of the bitfield FIELD_DECL instead, which is fine for the limb accesses, but when used for the most significant limb can result in invalid BIT_FIELD_REF because the first operand then has BITINT_TYPE and usually VOIDmode. The following patch adds a helper method for the 4 creatikons of BIT_FIELD_REF which when needed adds a VIEW_CONVERT_EXPR. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-03-16 Jakub Jelinek PR tree-optimization/114329 * gimple-lower-bitint.cc (struct bitint_large_huge): Declare build_bit_field_ref method. (bitint_large_huge::build_bit_field_ref): New method. (bitint_large_huge::lower_mergeable_stmt): Use it. * gcc.dg/bitint-101.c: New test. --- gcc/gimple-lower-bitint.cc.jj 2024-03-15 09:16:36.464033118 +0100 +++ gcc/gimple-lower-bitint.cc 2024-03-15 17:53:30.200276578 +0100 @@ -427,6 +427,8 @@ struct bitint_large_huge void insert_before (gimple *); tree limb_access_type (tree, tree); tree limb_access (tree, tree, tree, bool); + tree build_bit_field_ref (tree, tree, unsigned HOST_WIDE_INT, + unsigned HOST_WIDE_INT); void if_then (gimple *, profile_probability, edge &, edge &); void if_then_else (gimple *, profile_probability, edge &, edge &); void if_then_if_then_else (gimple *g, gimple *, @@ -659,6 +661,31 @@ bitint_large_huge::limb_access (tree typ return ret; } +/* Build a BIT_FIELD_REF to access BITSIZE bits with FTYPE type at + offset BITPOS inside of OBJ. */ + +tree +bitint_large_huge::build_bit_field_ref (tree ftype, tree obj, + unsigned HOST_WIDE_INT bitsize, + unsigned HOST_WIDE_INT bitpos) +{ + if (INTEGRAL_TYPE_P (TREE_TYPE (obj)) + && !type_has_mode_precision_p (TREE_TYPE (obj))) +{ + unsigned HOST_WIDE_INT nelts + = CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (obj))), limb_prec); + tree ltype = m_limb_type; + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (obj)); + if (as != TYPE_ADDR_SPACE (ltype)) + ltype = build_qualified_type (ltype, TYPE_QUALS (ltype) + | ENCODE_QUAL_ADDR_SPACE (as)); + tree atype = build_array_type_nelts (ltype, nelts); + obj = build1 (VIEW_CONVERT_EXPR, atype, obj); +} + return build3 (BIT_FIELD_REF, ftype, obj, bitsize_int (bitsize), +bitsize_int (bitpos)); +} + /* Emit a half diamond, if (COND) |\ @@ -2651,9 +2678,9 @@ bitint_large_huge::lower_mergeable_stmt } tree ftype = build_nonstandard_integer_type (limb_prec - bo_bit, 1); - tree bfr = build3 (BIT_FIELD_REF, ftype, unshare_expr (nlhs), -bitsize_int (limb_prec - bo_bit), -bitsize_int (bo_idx * limb_prec + bo_bit)); + tree bfr = build_bit_field_ref (ftype, unshare_expr (nlhs), + limb_prec - bo_bit, + bo_idx * limb_prec + bo_bit); tree t = add_cast (ftype, rhs1); g = gimple_build_assign (bfr, t); insert_before (g); @@ -2714,12 +2741,11 @@ bitint_large_huge::lower_mergeable_stmt { tree ftype = build_nonstandard_integer_type (rprec + bo_bit, 1); - tree bfr = build3 (BIT_FIELD_REF, ftype, -unshare_expr (nlhs), -bitsize_int (rprec + bo_bit), -bitsize_int ((bo_idx - + tprec / limb_prec) - * limb_prec)); + tree bfr + = build_bit_field_ref (ftype, unshare_expr (nlhs), + rprec + bo_bit, + (bo_idx + tprec / limb_prec) + * limb_prec); tree t = add_cast (ftype, rhs1); g = gimple_build_assign (bfr, t); done = true; @@ -2860,11 +2886,11 @@ bitint_large_huge::lower_mergeable_stmt { tree ftype = build_nonstandard_integer_type (rprec + bo_bit, 1); - tree bfr = build3 (BIT_FIELD_REF, ftype, -