Re: [PATCH 2/2] arm: Add cortex-m52 doc
On 2024/01/08 22:32 UTC+8, Kyrylo Tkachov wrote:
>> -----Original Message-----
>> From: Chung-Ju Wu
>> Sent: Monday, January 8, 2024 6:17 AM
>> To: gcc-patches; Kyrylo Tkachov; Richard Earnshaw
>> Cc: jason...@anshingtek.com.tw
>> Subject: [PATCH 2/2] arm: Add cortex-m52 doc
>>
>> Hi,
>>
>> This is the patch to add cortex-m52 in the Arm-related options
>> sections of the gcc invoke.texi documentation.
>>
>> Is it OK for trunk?
>
> In the ChangeLog entry:
> gcc/ChangeLog:
>
>     * doc/invoke.texi: Update docs.
>
> Let's be more specific and specify something like
>     * doc/invoke.texi (Arm Options): Document Cortex-m52 options.
>
> Ok with a better ChangeLog entry.

Hi Kyrylo,

Thanks for the suggestion and approval.
The patch is revised and committed as:
https://gcc.gnu.org/g:43c4f982113076ad54c3405f865cc63b0a5ba5aa

>> Thanks,
>> jasonwucj
>
> Thanks,
> Kyrill

Regards,
jasonwucj
Re: [PATCH 1/2] arm: Add cortex-m52 core
On 2024/01/08 22:31 UTC+8, Kyrylo Tkachov wrote:
> Hi jasonwucj,
>
>> -----Original Message-----
>> From: Chung-Ju Wu
>> Sent: Monday, January 8, 2024 6:16 AM
>> To: gcc-patches; Kyrylo Tkachov; Richard Earnshaw
>> Cc: jason...@anshingtek.com.tw
>> Subject: [PATCH 1/2] arm: Add cortex-m52 core
>>
>> Hi,
>>
>> Recently, Arm announced the Cortex-M52, delivering increased
>> performance in DSP and ML along with a range of other features
>> and benefits. For the completeness of Arm ecosystem, we hope that
>> cortex-m52 support could be available in gcc-14.
>>
>> Attached is the patch to support cortex-m52 cpu with MVE and
>> PACBTI enabled in GCC.
>>
>> Bootstrapped and tested on arm-none-eabi.
>>
>> Is it OK for trunk?
>
> The patch looks good to me.
> It should be safe to include it in GCC 14 as it doesn’t add any new
> logic beyond a new entry in arm-cpus.in.
> Do you have commit rights to push it?

Hi Kyrylo,

Thanks for the approval.
Yes, I have commit right to push it.
The patch is committed as:
https://gcc.gnu.org/g:6e249a9ad9d26fb01b147d33be9f9bfebca85c24

>> Thanks,
>> jasonwucj
>
> Thanks,
> Kyrill

Regards,
jasonwucj
RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option
The test case pr30957-1.c first came from this commit about 19 years
ago, which expected -1.0 for testing.

https://github.com/gcc-mirror/gcc/commit/290358f770d21d9204ea621f839ee8fba606a275

The commit below then changed the expected value from -1.0 to +1.0 for
this test file only, because variable expansion instantiates copies of
the accumulator, which it initializes with +0.0.

https://github.com/gcc-mirror/gcc/commit/ffefa9288ab95b06b1dfed95e7235f4c09619a91

According to the implementation details of
insert_var_expansion_initialization, zero_init will be
CONST0_RTX (mode) if HONOR_SIGNED_ZEROS is false.

If my understanding is correct, maybe the test case is not well designed
for variable expansion in unrolling? At least it is not a good idea to
mix in or rely on HONOR_SIGNED_ZEROS when testing
variable-expansion-in-unroller.

CC the original author; please feel free to correct me if I have
misunderstood anything.

Pan

-----Original Message-----
From: Li, Pan2
Sent: Tuesday, January 9, 2024 9:22 AM
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

Thanks Richard B for comments.

> We don't really expect targets to do this. The small testcase above
> is somewhat ill-formed with -fno-signed-zeros. Note there's no
> -0.0 in pr30957-1.c so why does that one fail for you? Does
> the -fvariable-expansion-in-unroller code maybe not trigger for
> riscv?

Sorry, this confused me a little about the semantics of the option
-fno-signed-zeros. I wonder what the target/backend needs to do for
this option.

About the failure, it comes from the code below in pr30957-1.c. The
0.0 / -5.0 is initialized to -0.0 in riscv but +0.0 in aarch64.

  if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0)
    abort ();

If my understanding is correct, the loop will be vectorized during
vect_transform_loop with a variable factor. It won't benefit from
unrolling/peeling, so loop->unroll is marked as 1, and we then get a
tree-vect log similar to the one below:

Disabling unrolling due to variable-length vectorization factor.

> I think we should go to PR30957 and see what that was filed originally
> for, the testcase doesn't make much sense to me.

Sure thing, will take a look and get back to you later.

Pan

-----Original Message-----
From: Richard Biener
Sent: Monday, January 8, 2024 6:45 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang; kito.ch...@gmail.com; jeffreya...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

On Tue, Jan 2, 2024 at 2:37 PM wrote:
>
> From: Pan Li
>
> According to the semantics of no-signed-zeros option, the backend
> like RISC-V should treat the minus zero -0.0f as plus zero 0.0f.
>
> Consider below example with option -fno-signed-zeros.
>
> void
> test (float *a)
> {
>   *a = -0.0;
> }
>
> We will generate code as below, which doesn't treat the minus zero
> as plus zero.
>
> test:
>   lui  a5,%hi(.LC0)
>   flw  fa5,%lo(.LC0)(a5)
>   fsw  fa5,0(a0)
>   ret
>
> .LC0:
>   .word -2147483648 // aka -0.0 (0x80000000 in hex)
>
> This patch would like to fix the bug and treat the minus zero -0.0
> as plus zero, aka +0.0. Thus after this patch we will have asm code
> as below for the above sample code.
>
> test:
>   sw zero,0(a0)
>   ret
>
> This patch also fix the run failure of the test case pr30957-1.c. The
> below tests are passed for this patch.

We don't really expect targets to do this. The small testcase above
is somewhat ill-formed with -fno-signed-zeros. Note there's no
-0.0 in pr30957-1.c so why does that one fail for you? Does
the -fvariable-expansion-in-unroller code maybe not trigger for
riscv?

I think we should go to PR30957 and see what that was filed originally
for, the testcase doesn't make much sense to me.

> * The riscv regression tests.
> * The pr30957-1.c run tests.
>
> gcc/ChangeLog:
>
>     * config/riscv/constraints.md: Leverage func
>     riscv_float_const_zero_rtx_p for predicating the rtx is const
>     zero float or not.
>     * config/riscv/predicates.md: Ditto.
>     * config/riscv/riscv.cc (riscv_const_insns): Ditto.
>     (riscv_float_const_zero_rtx_p): New func impl for predicating
>     the rtx is const zero float or not.
>     (riscv_const_zero_rtx_p): New func impl for predicating the rtx
>     is const zero (both int and fp) or not.
>     * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p):
>     New func decl.
>     (riscv_const_zero_rtx_p): Ditto.
>     * config/riscv/riscv.md: Making sure the operand[1] of movfp is
>     CONST0_RTX when the operand[1] is const zero float.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/riscv/no-signed-zeros-0.c: New test.
>     * gcc.target/riscv/no-signed-zeros-1.c: New test.
>     * gcc.target/riscv/no-signed-zeros-2.c: New test.
>     *
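For readers following the signed-zero discussion in this thread, here is a minimal C sketch of the two values involved. It assumes IEEE 754 binary32 `float` and default compiler flags (no `-fno-signed-zeros`, no `-ffast-math`). The helper names `float_bits`, `negative_zero_quotient` and `drop_zero_sign` are made up for illustration and are not part of GCC or of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of F (assumes a 32-bit IEEE 754 float).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  assert (sizeof u == sizeof f);
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Compute 0.0 / -5.0 as in pr30957-1.c.  The volatile operands keep the
   division at run time instead of letting the compiler fold it.  */
static float
negative_zero_quotient (void)
{
  volatile float num = 0.0f;
  volatile float den = -5.0f;
  return num / den;
}

/* Illustrative model of "treat -0.0 as +0.0": if every bit except the
   sign bit is clear, return +0.0; otherwise return F unchanged.  */
static float
drop_zero_sign (float f)
{
  if ((uint32_t) (float_bits (f) << 1) == 0)
    return 0.0f;
  return f;
}
```

Under those assumptions the quotient has only the sign bit set, which is the `.word -2147483648` constant shown in the quoted assembly, while `drop_zero_sign` yields the all-zero pattern that a plain `sw zero` would store.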
[wwwdocs][PATCH] gcc-14/changes: Update APX inline asm behavior for x86_64
Hi,

This patch adds missing description for inline asm behavior and
related compiler switch for APX.

Ok for gcc-wwwdocs?

---
 htdocs/gcc-14/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e3a68998..73a90d30 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -342,6 +342,12 @@ a work-in-progress.
 NDD, PPX and PUSH2POP2. APX support is available via the
 -mapxf compiler switch.
+ For inline asm support with APX, by default the EGPR feature was
+ disabled to prevent potential illegal instruction with EGPR occurs.
+ To invoke egpr usage in inline asm, use new compiler option
+ -mapx-inline-asm-use-gpr32 and user should ensure the instruction
+ supports EGPR.
+
 New ISA extension support for Intel AVX10.1 was added.
 AVX10.1 intrinsics are available via the
 -mavx10.1 or -mavx10.1-256 compiler
 switch with 256-bit vector size
-- 
2.31.1
[PATCH] i386: [APX] Document inline asm behavior and new switch for APX
Hi,

For APX, the inline asm behavior was not mentioned in any document
before. Add description for it.

Ok for trunk?

gcc/ChangeLog:

    * config/i386/i386.opt: Adjust document.
    * doc/invoke.texi: Add description for
    -mapx-inline-asm-use-gpr32.
---
 gcc/config/i386/i386.opt | 3 +--
 gcc/doc/invoke.texi      | 7 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index a38e92baf92..5b4f1bff25f 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1357,8 +1357,7 @@ Enum(apx_features) String(all) Value(apx_all) Set(1)
 
 mapx-inline-asm-use-gpr32
 Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
-Enable GPR32 in inline asm when APX_EGPR enabled, do not
-hook reg or mem constraint in inline asm to GPR16.
+Enable GPR32 in inline asm when APX_F enabled.
 
 mevex512
 Target Mask(ISA2_EVEX512) Var(ix86_isa_flags2) Save
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..47fd96648d8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -35272,6 +35272,13 @@ r8-r15 registers so that the call and jmp instruction length is 6 bytes
 to allow them to be replaced with @samp{lfence; call *%r8-r15} or
 @samp{lfence; jmp *%r8-r15} at run-time.
 
+@opindex mapx-inline-asm-use-gpr32
+@item -mapx-inline-asm-use-gpr32
+When APX_F enabled, EGPR usage was by default disabled to prevent
+unexpected EGPR generation in instructions that does not support it.
+To invoke EGPR usage in inline asm, use this switch to allow EGPR in
+inline asm, while user should ensure the asm actually supports EGPR.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
-- 
2.31.1
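To make the documented behavior concrete, below is a small C sketch of the kind of inline asm statement the new text is about: the operands use the generic "r" constraint, so the compiler picks the registers. As I read the patch, with APX enabled those operands stay in the legacy 16 GPRs unless -mapx-inline-asm-use-gpr32 is given, and it remains the user's job to check that the instruction in the template accepts EGPRs. The function names `add_via_asm` and `add_via_asm_selfcheck` are hypothetical, and the non-x86 fallback exists only to keep the sketch portable.

```c
#include <assert.h>
#include <stdint.h>

/* Add B to A.  On x86-64 with a GNU-compatible compiler this goes
   through an inline asm statement (AT&T syntax) whose operands use the
   generic "r" constraint; elsewhere it falls back to plain C.  */
static uint64_t
add_via_asm (uint64_t a, uint64_t b)
{
#if defined (__x86_64__) && defined (__GNUC__)
  __asm__ ("addq %1, %0" : "+r" (a) : "r" (b) : "cc");
#else
  a += b;
#endif
  return a;
}

/* Tiny self-check showing the expected arithmetic, including wraparound.  */
static void
add_via_asm_selfcheck (void)
{
  assert (add_via_asm (2, 40) == 42);
  assert (add_via_asm (UINT64_MAX, 1) == 0);
}
```

Nothing in the source changes between the two modes; only the set of registers the compiler may substitute for `%0` and `%1` differs, which is why the option text puts the responsibility for the asm template on the user.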
[PATCH] Add -mevex512 into invoke.texi
Hi all,

In invoke.texi, -mevex512 is missing. This patch adds that.

Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

    * doc/invoke.texi: Add -mevex512.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68d1f364ac0..1a92dcdc1ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1463,7 +1463,7 @@ See RS/6000 and PowerPC Options.
 -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni
 -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd -mamx-fp16
 -mprefetchi -mraoint -mamx-complex -mavxvnniint16 -msm3 -msha512 -msm4 -mapxf
--musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512
+-musermsr -mavx10.1 -mavx10.1-256 -mavx10.1-512 -mevex512
 -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops
 -minline-stringops-dynamically -mstringop-strategy=@var{alg}
 -mkl -mwidekl
-- 
2.31.1
Re:[pushed] [PATCH] LoongArch: Implement vec_init<M><N> where N is a LSX vector mode
Pushed to r14-7022.

On 2024/1/5 3:38 PM, Jiahao Xu wrote:

This patch implements more vec_init optabs that can handle two LSX
vectors producing a LASX vector by concatenating them. When an LSX
vector is concatenated with an LSX const_vector of zeroes, the
vec_concatz pattern can be used effectively. For example as below

typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));
v8hi a, b;

v16hi
vec_initv16hiv8hi ()
{
  return __builtin_shufflevector (a, b, 0, 8, 1, 9, 2, 10, 3, 11, 4, 12,
                                  5, 13, 6, 14, 7, 15);
}

Before this patch:

vec_initv16hiv8hi:
    addi.d    $r3,$r3,-64
    .cfi_def_cfa_offset 64
    xvrepli.h $xr0,0
    la.local  $r12,.LANCHOR0
    xvst      $xr0,$r3,0
    xvst      $xr0,$r3,32
    vld       $vr0,$r12,0
    vst       $vr0,$r3,0
    vld       $vr0,$r12,16
    vst       $vr0,$r3,32
    xvld      $xr1,$r3,32
    xvld      $xr2,$r3,32
    xvld      $xr0,$r3,0
    xvilvh.h  $xr0,$xr1,$xr0
    xvld      $xr1,$r3,0
    xvilvl.h  $xr1,$xr2,$xr1
    addi.d    $r3,$r3,64
    .cfi_def_cfa_offset 0
    xvpermi.q $xr0,$xr1,32
    jr        $r1

After this patch:

vec_initv16hiv8hi:
    la.local  $r12,.LANCHOR0
    vld       $vr0,$r12,32
    vld       $vr2,$r12,48
    xvilvh.h  $xr1,$xr2,$xr0
    xvilvl.h  $xr0,$xr2,$xr0
    xvpermi.q $xr1,$xr0,32
    xvst      $xr1,$r4,0
    jr        $r1

gcc/ChangeLog:

    * config/loongarch/lasx.md (vec_initv32qiv16qi): Rename to ..
    (vec_init): .. this, and extend to mode.
    (@vec_concatz): New insn pattern.
    * config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
    Handle VALS containing two vectors.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e196613ffe4..36dc3d95eac 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -465,6 +465,11 @@
    (V16HI "w")
    (V32QI "w")])
 
+;; Half modes of all LASX vector modes, in lower-case.
+(define_mode_attr lasxhalf [(V32QI "v16qi") (V16HI "v8hi") + (V8SI "v4si") (V4DI "v2di") + (V8SF "v4sf") (V4DF "v2df")]) + (define_expand "vec_init" [(match_operand:LASX 0 "register_operand") (match_operand:LASX 1 "")] @@ -474,9 +479,9 @@ DONE; }) -(define_expand "vec_initv32qiv16qi" - [(match_operand:V32QI 0 "register_operand") - (match_operand:V16QI 1 "")] +(define_expand "vec_init" + [(match_operand:LASX 0 "register_operand") + (match_operand: 1 "")] "ISA_HAS_LASX" { loongarch_expand_vector_group_init (operands[0], operands[1]); @@ -577,6 +582,21 @@ [(set_attr "type" "simd_insert") (set_attr "mode" "")]) +(define_insn "@vec_concatz" + [(set (match_operand:LASX 0 "register_operand" "=f") +(vec_concat:LASX + (match_operand: 1 "nonimmediate_operand") + (match_operand: 2 "const_0_operand")))] + "ISA_HAS_LASX" +{ + if (MEM_P (operands[1])) +return "vld\t%w0,%1"; + else +return "vori.b\t%w0,%w1,0"; +} + [(set_attr "type" "simd_splat") + (set_attr "mode" "")]) + (define_insn "vec_concat" [(set (match_operand:LASX 0 "register_operand" "=f") (vec_concat:LASX diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 28d64135c54..b2a296a1dd9 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -9858,10 +9858,46 @@ loongarch_gen_const_int_vector_shuffle (machine_mode mode, int val) void loongarch_expand_vector_group_init (rtx target, rtx vals) { - rtx ops[2] = { force_reg (E_V16QImode, XVECEXP (vals, 0, 0)), - force_reg (E_V16QImode, XVECEXP (vals, 0, 1)) }; - emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (E_V32QImode, ops[0], - ops[1]))); + machine_mode vmode = GET_MODE (target); + machine_mode half_mode = VOIDmode; + rtx low = XVECEXP (vals, 0, 0); + rtx high = XVECEXP (vals, 0, 1); + + switch (vmode) +{ +case E_V32QImode: + half_mode = V16QImode; + break; +case E_V16HImode: + half_mode = V8HImode; + break; +case E_V8SImode: + half_mode = V4SImode; + break; +case E_V4DImode: + half_mode = V2DImode; 
+ break; +case E_V8SFmode: + half_mode = V4SFmode; + break; +case E_V4DFmode: + half_mode = V2DFmode; + break; +default: + gcc_unreachable (); +} + + if (high == CONST0_RTX (half_mode)) +emit_insn (gen_vec_concatz (vmode, target, low, high)); + else +{ + if (!register_operand (low, half_mode)) + low = force_reg (half_mode, low); + if (!register_operand (high, half_mode)) + high = force_reg (half_mode, high); + emit_insn (gen_rtx_SET (target, + gen_rtx_VEC_CONCAT (vmode, low, high))); +} } /* Expand initialization of a vector which has all same
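As a reading aid for the `__builtin_shufflevector` example in this message, the following plain C sketch models the lane selection with ordinary arrays instead of vector types. It is only a scalar reference for the index semantics (indices 0..7 pick from the first input, 8..15 from the second); the names `shuffle_two_halves` and `interleave_v8hi` are invented here and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { HALF_LANES = 8, FULL_LANES = 16 };

/* Scalar model of a two-input shuffle: index i < HALF_LANES selects
   a[i], index i >= HALF_LANES selects b[i - HALF_LANES].  */
static void
shuffle_two_halves (const short *a, const short *b,
                    const int *idx, short *out)
{
  for (size_t i = 0; i < FULL_LANES; i++)
    {
      assert (idx[i] >= 0 && idx[i] < FULL_LANES);
      out[i] = idx[i] < HALF_LANES ? a[idx[i]] : b[idx[i] - HALF_LANES];
    }
}

/* Build the index list 0, 8, 1, 9, ..., 7, 15 used in the message's
   example and apply it, interleaving the lanes of A and B.  */
static void
interleave_v8hi (const short *a, const short *b, short *out)
{
  int idx[FULL_LANES];
  for (int i = 0; i < HALF_LANES; i++)
    {
      idx[2 * i] = i;
      idx[2 * i + 1] = i + HALF_LANES;
    }
  shuffle_two_halves (a, b, idx, out);
}
```

Viewing the two 8-lane inputs as one 16-lane concatenation is what lets the expander treat the operation as a group init of two half-width vectors followed by a permute, which matches the shorter "after" assembly quoted above.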
Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector
It has been updated.

[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector (gnu.org)

--
From: 钟居哲
Sent: Tuesday, January 9, 2024 07:08
To: "cooper.joshua"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; Jeff Law; "Christoph Müllner"; "cooper.joshua"; jinma; Cooper Qu
Subject: Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

- return TAIL_ANY;
+ return TARGET_XTHEADVECTOR ? TAIL_AGNOSTIC : TAIL_ANY;

- return MASK_ANY;
+ return TARGET_XTHEADVECTOR ? MASK_UNDISTURBED : MASK_ANY;

You shouldn't change this.

- "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
+ { return TARGET_XTHEADVECTOR ? "vsetvli\t%0,%1,e%2,%m3" : "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"; }

I prefer to do it in ASM_OUTPUT.

+ Copyright (C) 2022-2023 Free Software Foundation, Inc.

Copyright is not correct.

juzhe.zh...@rivai.ai

From: Jun Sha (Joshua)
Date: 2024-01-03 14:15
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV1.0 by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV1.0 instructions, we will use an ASM target hook to rewrite the whole
instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to
generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.

gcc/ChangeLog:

    * config.gcc: Add files for XTheadVector intrinsics.
    * config/riscv/autovec.md: Guard XTheadVector.
    * config/riscv/riscv-c.cc: Add pragma for XTheadVector.
    * config/riscv/riscv-string.cc (expand_block_move): Guard XTheadVector.
    (get_prefer_tail_policy): Give specific value for tail.
    (get_prefer_mask_policy): Give specific value for mask.
    (vls_mode_valid_p): Avoid autovec.
    * config/riscv/riscv-vector-builtins-shapes.cc (check_type):
    (build_one): New function.
    * config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
    (DEF_THEAD_RVV_FUNCTION): Add new macros.
    (check_required_extensions):
    (handle_pragma_vector):
    * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
    (RVV_REQUIRE_XTHEADVECTOR): Add RVV_REQUIRE_VECTOR and
    RVV_REQUIRE_XTHEADVECTOR.
    (struct function_group_info):
    * config/riscv/riscv-vector-switch.def (ENTRY): Disable fractional
    mode for the XTheadVector extension.
    (TUPLE_ENTRY): Likewise.
    * config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
    * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Guard XTheadVector.
    (riscv_v_adjust_bytesize): Likewise.
    (riscv_preferred_simd_mode): Likewise.
    (riscv_autovectorize_vector_modes): Likewise.
    (riscv_vector_mode_supported_any_target_p): Likewise.
    (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
    * config/riscv/vector-iterators.md: Remove fractional LMUL.
    * config/riscv/vector.md: Include thead-vector.md.
    * config/riscv/riscv_th_vector.h: New file.
    * config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
    * gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
    * lib/target-supports.exp: Add target for XTheadVector.
Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc | 2 +- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/predicates.md | 4 +- gcc/config/riscv/riscv-c.cc | 3 +- gcc/config/riscv/riscv-string.cc | 3 +- gcc/config/riscv/riscv-v.cc | 6 +- .../riscv/riscv-vector-builtins-bases.cc | 48 +++-- .../riscv/riscv-vector-builtins-shapes.cc | 23 +++ gcc/config/riscv/riscv-vector-switch.def | 150 +++--- gcc/config/riscv/riscv.cc | 20 +- gcc/config/riscv/riscv_th_vector.h | 49 + gcc/config/riscv/thead-vector.md | 69 +++ gcc/config/riscv/vector-iterators.md | 186 +- gcc/config/riscv/vector.md | 55 -- .../gcc.target/riscv/rvv/base/abi-1.c | 2 +- .../gcc.target/riscv/rvv/base/pragma-1.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 17 files changed, 427 insertions(+), 209 deletions(-)
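The patch description talks about reusing RVV 1.0 templates and "simply adding" a "th." prefix when XTheadVector is active. The following C sketch shows that idea in isolation. It is a hypothetical stand-alone helper (`emit_vector_insn`, with an `xtheadvector` flag standing in for TARGET_XTHEADVECTOR), not the hook the patch series actually implements, and the rule "prefix anything whose mnemonic starts with 'v'" is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: copy INSN into BUF, prepending
   "th." when XTHEADVECTOR is nonzero and the mnemonic starts with 'v'.
   Returns 0 on success and -1 if BUF (of SIZE bytes) is too small.  */
static int
emit_vector_insn (char *buf, size_t size, const char *insn, int xtheadvector)
{
  const char *prefix = (xtheadvector && insn[0] == 'v') ? "th." : "";
  int n = snprintf (buf, size, "%s%s", prefix, insn);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Small self-check of the prefixing rule.  */
static void
emit_vector_insn_selfcheck (void)
{
  char buf[64];
  assert (emit_vector_insn (buf, sizeof buf, "vadd.vv\tv1,v2,v3", 1) == 0);
  assert (strcmp (buf, "th.vadd.vv\tv1,v2,v3") == 0);
}
```

Doing the rewrite at output time, as the review comment about ASM_OUTPUT suggests, would keep the machine-description templates shared between the two extensions; the per-pattern ternary quoted in the review is the alternative that touches each template individually.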
[PATCH v5] RISC-V: Handle differences between XTheadvector and Vector
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV1.0 by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV1.0 instructions, we will use an ASM target hook to rewrite the whole
instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to
generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.

gcc/ChangeLog:

    * config.gcc: Add files for XTheadVector intrinsics.
    * config/riscv/autovec.md: Guard XTheadVector.
    * config/riscv/riscv-c.cc: Add pragma for XTheadVector.
    * config/riscv/riscv-string.cc (expand_block_move): Guard XTheadVector.
    (get_prefer_tail_policy): Give specific value for tail.
    (get_prefer_mask_policy): Give specific value for mask.
    (vls_mode_valid_p): Avoid autovec.
    * config/riscv/riscv-vector-builtins-shapes.cc (check_type):
    (build_one): New function.
    * config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
    (DEF_THEAD_RVV_FUNCTION): Add new macros.
    (check_required_extensions):
    (handle_pragma_vector):
    * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
    (RVV_REQUIRE_XTHEADVECTOR): Add RVV_REQUIRE_VECTOR and
    RVV_REQUIRE_XTHEADVECTOR.
    (struct function_group_info):
    * config/riscv/riscv-vector-switch.def (ENTRY): Disable fractional
    mode for the XTheadVector extension.
    (TUPLE_ENTRY): Likewise.
    * config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
    * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Guard XTheadVector.
    (riscv_v_adjust_bytesize): Likewise.
    (riscv_preferred_simd_mode): Likewise.
    (riscv_autovectorize_vector_modes): Likewise.
    (riscv_vector_mode_supported_any_target_p): Likewise.
    (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL. * config/riscv/vector.md: Include thead-vector.md. * config/riscv/riscv_th_vector.h: New file. * config/riscv/thead-vector.md: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector. * gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector. * lib/target-supports.exp: Add target for XTheadVector. Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc| 2 +- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/predicates.md| 4 +- gcc/config/riscv/riscv-c.cc | 3 +- gcc/config/riscv/riscv-string.cc | 3 +- gcc/config/riscv/riscv-v.cc | 2 +- .../riscv/riscv-vector-builtins-bases.cc | 48 +++-- .../riscv/riscv-vector-builtins-shapes.cc | 23 +++ gcc/config/riscv/riscv-vector-switch.def | 150 +++--- gcc/config/riscv/riscv.cc | 20 +- gcc/config/riscv/riscv_th_vector.h| 49 + gcc/config/riscv/thead-vector.md | 102 ++ gcc/config/riscv/thead.cc | 23 ++- gcc/config/riscv/vector-iterators.md | 186 +- gcc/config/riscv/vector.md| 49 - .../gcc.target/riscv/rvv/base/abi-1.c | 2 +- .../gcc.target/riscv/rvv/base/pragma-1.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 18 files changed, 476 insertions(+), 206 deletions(-) create mode 100644 gcc/config/riscv/riscv_th_vector.h create mode 100644 gcc/config/riscv/thead-vector.md diff --git a/gcc/config.gcc b/gcc/config.gcc index 7e583390024..047e4c02cf4 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -549,7 +549,7 @@ riscv*) extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o" extra_objs="${extra_objs} thead.o riscv-target-attr.o" d_target_objs="riscv-d.o" - extra_headers="riscv_vector.h" + extra_headers="riscv_vector.h riscv_th_vector.h" target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc" target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h" ;; diff --git 
a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 775eaa825b0..0477781cabe 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2579,7 +2579,7 @@ [(match_operand 0 "register_operand") (match_operand 1 "memory_operand") (match_operand:ANYI 2 "const_int_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR &&
[Committed] RISC-V: Fix comments of segment load/store intrinsic
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments. (vundefined): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def index 96dd0d95dec..f742c98be8a 100644 --- a/gcc/config/riscv/riscv-vector-builtins-functions.def +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def @@ -79,8 +79,6 @@ DEF_RVV_FUNCTION (vsoxei64, indexed_loadstore, none_m_preds, all_v_scalar_ptr_ee // 7.7. Unit-stride Fault-Only-First Loads DEF_RVV_FUNCTION (vleff, fault_load, full_preds, all_v_scalar_const_ptr_size_ptr_ops) -// TODO: 7.8. Vector Load/Store Segment Instructions - /* 11. Vector Integer Arithmetic Instructions. */ // 11.1. Vector Single-Width Integer Add and Subtract @@ -630,6 +628,8 @@ DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops) DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops) DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops) DEF_RVV_FUNCTION (vundefined, vundefined, none_preds, all_none_void_tuple_ops) + +// 7.8. Vector Load/Store Segment Instructions DEF_RVV_FUNCTION (vlseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_ops) DEF_RVV_FUNCTION (vsseg, seg_loadstore, none_m_preds, tuple_v_scalar_ptr_ops) DEF_RVV_FUNCTION (vlsseg, seg_loadstore, full_preds, tuple_v_scalar_const_ptr_ptrdiff_ops) -- 2.36.3
Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector
We discussed the vsetvl issue last week. The riscv_asm_output function
probably cannot return instructions the way riscv_output_move does. The
simplest approach may be to add some logic to the vsetvl patterns. Only
3 patterns need to be modified, so that will not be too invasive.

--
From: 钟居哲
Sent: Tuesday, January 9, 2024 07:08
To: "cooper.joshua"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; Jeff Law; "Christoph Müllner"; "cooper.joshua"; jinma; Cooper Qu
Subject: Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

- return TAIL_ANY;
+ return TARGET_XTHEADVECTOR ? TAIL_AGNOSTIC : TAIL_ANY;

- return MASK_ANY;
+ return TARGET_XTHEADVECTOR ? MASK_UNDISTURBED : MASK_ANY;

You shouldn't change this.

- "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
+ { return TARGET_XTHEADVECTOR ? "vsetvli\t%0,%1,e%2,%m3" : "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"; }

I prefer to do it in ASM_OUTPUT.

+ Copyright (C) 2022-2023 Free Software Foundation, Inc.

Copyright is not correct.

juzhe.zh...@rivai.ai

From: Jun Sha (Joshua)
Date: 2024-01-03 14:15
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can be derived directly
from current RVV1.0 by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV1.0 instructions, we will use an ASM target hook to rewrite the whole
instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to
generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.

gcc/ChangeLog:

    * config.gcc: Add files for XTheadVector intrinsics.
    * config/riscv/autovec.md: Guard XTheadVector.
    * config/riscv/riscv-c.cc: Add pragma for XTheadVector.
    * config/riscv/riscv-string.cc (expand_block_move): Guard XTheadVector.
    (get_prefer_tail_policy): Give specific value for tail.
    (get_prefer_mask_policy): Give specific value for mask.
    (vls_mode_valid_p): Avoid autovec.
    * config/riscv/riscv-vector-builtins-shapes.cc (check_type):
    (build_one): New function.
    * config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
    (DEF_THEAD_RVV_FUNCTION): Add new macros.
    (check_required_extensions):
    (handle_pragma_vector):
    * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
    (RVV_REQUIRE_XTHEADVECTOR): Add RVV_REQUIRE_VECTOR and
    RVV_REQUIRE_XTHEADVECTOR.
    (struct function_group_info):
    * config/riscv/riscv-vector-switch.def (ENTRY): Disable fractional
    mode for the XTheadVector extension.
    (TUPLE_ENTRY): Likewise.
    * config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
    * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Guard XTheadVector.
    (riscv_v_adjust_bytesize): Likewise.
    (riscv_preferred_simd_mode): Likewise.
    (riscv_autovectorize_vector_modes): Likewise.
    (riscv_vector_mode_supported_any_target_p): Likewise.
    (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
    * config/riscv/vector-iterators.md: Remove fractional LMUL.
    * config/riscv/vector.md: Include thead-vector.md.
    * config/riscv/riscv_th_vector.h: New file.
    * config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
    * gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
    * lib/target-supports.exp: Add target for XTheadVector.
Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc | 2 +- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/predicates.md | 4 +- gcc/config/riscv/riscv-c.cc | 3 +- gcc/config/riscv/riscv-string.cc | 3 +- gcc/config/riscv/riscv-v.cc | 6 +- .../riscv/riscv-vector-builtins-bases.cc | 48 +++-- .../riscv/riscv-vector-builtins-shapes.cc | 23 +++ gcc/config/riscv/riscv-vector-switch.def | 150 +++--- gcc/config/riscv/riscv.cc | 20 +- gcc/config/riscv/riscv_th_vector.h | 49 + gcc/config/riscv/thead-vector.md | 69 +++ gcc/config/riscv/vector-iterators.md | 186 +- gcc/config/riscv/vector.md | 55 -- .../gcc.target/riscv/rvv/base/abi-1.c
[Committed] RISC-V: Fix comments of segment load/store intrinsic[NFC]
We have supported segment load/store intrinsics. Committed as it is obvious. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-functions.def (vleff): Move comments to real place. (vcreate): Ditto. --- gcc/config/riscv/riscv-vector-builtins-functions.def | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def index 96dd0d95dec..14560923d11 100644 --- a/gcc/config/riscv/riscv-vector-builtins-functions.def +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def @@ -79,8 +79,6 @@ DEF_RVV_FUNCTION (vsoxei64, indexed_loadstore, none_m_preds, all_v_scalar_ptr_ee // 7.7. Unit-stride Fault-Only-First Loads DEF_RVV_FUNCTION (vleff, fault_load, full_preds, all_v_scalar_const_ptr_size_ptr_ops) -// TODO: 7.8. Vector Load/Store Segment Instructions - /* 11. Vector Integer Arithmetic Instructions. */ // 11.1. Vector Single-Width Integer Add and Subtract @@ -625,7 +623,7 @@ DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x2_ops) DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x4_ops) DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul4_x2_ops) -// Tuple types +// 7.8. Vector Load/Store Segment Instructions DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops) DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops) DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops) -- 2.36.3
Re: Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
Committed, thanks Juzhe.

From: 钟居哲
Sent: 2024-01-09 07:02
To: wangfeng; gcc-patches
Cc: kito.cheng; Jeff Law; wangfeng
Subject: Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.

LGTM.

juzhe.zh...@rivai.ai

From: Feng Wang
Date: 2024-01-08 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.

Patch v7: Resubmit after fixing the rtl-checking issue. Passed all the
riscv regression tests.
Patch v6: Remove unused code.
Patch v5: Rebase.
Patch v4: Merge crypto vector function.def into vector.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
into riscv-vector-builtins-types.def.
Patch v2: Optimize function_shape class for crypto_vector.

This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).

Co-Authored by: Songhe Zhu
Co-Authored by: Ciyan Pan

gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins-bases.cc (class vandn):
    Add new function_base for crypto vector.
    (class bitmanip): Ditto.
    (class b_reverse): Ditto.
    (class vwsll): Ditto.
    (class clmul): Ditto.
    (class vg_nhab): Ditto.
    (class crypto_vv): Ditto.
    (class crypto_vi): Ditto.
    (class vaeskf2_vsm3c): Ditto.
    (class vsm3me): Ditto.
    (BASE): Add BASE declaration for crypto vector.
    * config/riscv/riscv-vector-builtins-bases.h: Ditto.
    * config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
    Add crypto vector intrinsic definition.
    (vbrev): Ditto.
    (vclz): Ditto.
    (vctz): Ditto.
    (vwsll): Ditto.
    (vandn): Ditto.
    (vbrev8): Ditto.
    (vrev8): Ditto.
    (vrol): Ditto.
    (vror): Ditto.
    (vclmul): Ditto.
    (vclmulh): Ditto.
    (vghsh): Ditto.
    (vgmul): Ditto.
    (vaesef): Ditto.
    (vaesem): Ditto.
    (vaesdf): Ditto.
    (vaesdm): Ditto.
    (vaesz): Ditto.
    (vaeskf1): Ditto.
    (vaeskf2): Ditto.
    (vsha2ms): Ditto.
    (vsha2ch): Ditto.
    (vsha2cl): Ditto.
    (vsm4k): Ditto.
    (vsm4r): Ditto.
    (vsm3me): Ditto.
    (vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def): Add new function_shape for crypto vector. (struct crypto_vi_def): Ditto. (struct crypto_vv_no_op_type_def): Ditto. (SHAPE): Add SHAPE declaration of crypto vector. * config/riscv/riscv-vector-builtins-shapes.h: Ditto. * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS): Add new data type for crypto vector. (DEF_RVV_CRYPTO_SEW64_OPS): Ditto. (vuint32mf2_t): Ditto. (vuint32m1_t): Ditto. (vuint32m2_t): Ditto. (vuint32m4_t): Ditto. (vuint32m8_t): Ditto. (vuint64m1_t): Ditto. (vuint64m2_t): Ditto. (vuint64m4_t): Ditto. (vuint64m8_t): Ditto. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS): Add new data struct for crypto vector. (DEF_RVV_CRYPTO_SEW64_OPS): Ditto. (registered_function::overloaded_hash): Processing size_t uimm for C overloaded func. * config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE. --- .../riscv/riscv-vector-builtins-bases.cc | 264 +- .../riscv/riscv-vector-builtins-bases.h | 28 ++ .../riscv/riscv-vector-builtins-functions.def | 94 +++ .../riscv/riscv-vector-builtins-shapes.cc | 87 +- .../riscv/riscv-vector-builtins-shapes.h | 4 + .../riscv/riscv-vector-builtins-types.def | 25 ++ gcc/config/riscv/riscv-vector-builtins.cc | 133 - gcc/config/riscv/riscv-vector-builtins.def| 1 + 8 files changed, 633 insertions(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index d70468542ee..d12bb89f91c 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -2127,6 +2127,212 @@ public: } }; +/* Below implements are vector crypto */ +/* Implements vandn.[vv,vx] */ +class vandn : public function_base +{ +public: + rtx expand (function_expander ) const override + { +switch (e.op_info->op) + { + case OP_TYPE_vv: +return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ())); + case OP_TYPE_vx: +return 
e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ())); + default: +gcc_unreachable (); + } + } +}; + +/* Implements vrol/vror/clz/ctz. */ +template +class bitmanip : public function_base +{ +public: + bool apply_tail_policy_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + bool apply_mask_policy_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + bool has_merge_operand_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + + rtx expand (function_expander ) const override + { +switch (e.op_info->op) +{ + case OP_TYPE_v: + case OP_TYPE_vv: +return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ())); + case OP_TYPE_vx: +return e.use_exact_insn
Reply: Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
Committed, thanks Juzhe. From: 钟居哲 Sent: 2024-01-09 07:02 To: wangfeng; gcc-patches Cc: kito.cheng; Jeff Law; wangfeng Subject: Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases. LGTM. juzhe.zh...@rivai.ai From: Feng Wang Date: 2024-01-08 17:12 To: gcc-patches CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang Subject: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases. Patch v8: Resubmit after fix the rtl-checking issue. Passed all the riscv regression test. Patch v7: Add newline at the end of file. Patch v6: Move intrinsic tests into rvv/base. Patch v5: Rebase Patch v4: Add some RV32 vx constraint testcase. Patch v3: Refine crypto vector api-testing cases. Patch v2: Update march info according to the change of riscv-common.c This patch add crypto vector api-testing cases based on https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test. * gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test. * gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test. * gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test. * gcc.target/riscv/zvkb.c: New test. 
--- .../riscv/rvv/base/zvbb-intrinsic.c | 179 ++ .../riscv/rvv/base/zvbb_vandn_vx_constraint.c | 15 ++ .../riscv/rvv/base/zvbc-intrinsic.c | 62 ++ .../riscv/rvv/base/zvbc_vx_constraint-1.c | 14 ++ .../riscv/rvv/base/zvbc_vx_constraint-2.c | 14 ++ .../riscv/rvv/base/zvkg-intrinsic.c | 24 +++ .../riscv/rvv/base/zvkned-intrinsic.c | 104 ++ .../riscv/rvv/base/zvknha-intrinsic.c | 33 .../riscv/rvv/base/zvknhb-intrinsic.c | 33 .../riscv/rvv/base/zvksed-intrinsic.c | 33 .../riscv/rvv/base/zvksh-intrinsic.c | 24 +++ gcc/testsuite/gcc.target/riscv/zvkb.c | 13 ++ 12 files changed, 548 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c new file mode 100644 index 000..b7e25bfe819 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c @@ -0,0 +1,179 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */ +#include "riscv_vector.h" + +vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) { + 
return __riscv_vandn_vv_u8mf8(vs2, vs1, vl); +} + +vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) { + return __riscv_vandn_vx_u32m1(vs2, rs1, vl); +} + +vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) { + return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl); +} + +vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) { + return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl); +} + +vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) { + return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl); +} + +vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) { + return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl); +} + +vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) { + return __riscv_vbrev_v_u8m8(vs2, vl); +} + +vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) { + return __riscv_vbrev_v_u16m1_m(mask, vs2, vl); +} + +vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff,
Re: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]
Yes, it is sufficient. I have sent a patch: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642216.html juzhe.zh...@rivai.ai From: Robin Dapp Date: 2024-01-09 00:45 To: 钟居哲; gcc-patches CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law Subject: Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL] > > + if (is_gimple_min_invariant (op)) > > +return true; > > + if (SSA_NAME_IS_DEFAULT_DEF (op) > > + || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op)))) > > +return true; > > + return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1; > > +} > > + Does gimple_uid ever return something useful for us here? In tree-ssa-loop-ch it is populated first and then used, but I don't think we populate it properly. So my question would be: aren't is_gimple_constant and flow_bb_inside_loop_p sufficient for our purpose? Regards Robin
[PATCH] RISC-V: Fix loop invariant check
As Robin suggested, remove the gimple_uid check; the remaining checks are sufficient for our needs. Tested on both RV32 and RV64 with no regressions. OK for trunk? gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (loop_invariant_op_p): Fix loop invariant check. --- gcc/config/riscv/riscv-vector-costs.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc index 3bae581d6fd..f4a1a789f23 100644 --- a/gcc/config/riscv/riscv-vector-costs.cc +++ b/gcc/config/riscv/riscv-vector-costs.cc @@ -241,7 +241,7 @@ loop_invariant_op_p (class loop *loop, if (SSA_NAME_IS_DEFAULT_DEF (op) || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op)))) return true; - return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1; + return false; } /* Return true if the variable should be counted into liveness. */ -- 2.36.3
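The resulting check can be modelled outside of GCC roughly as follows. This is only a sketch in plain C, not the GCC code: `struct loop`, `struct operand`, their fields, and `bb_inside_loop_p` are hypothetical stand-ins for GCC's loop and SSA data structures, chosen to show the three conditions under which an operand is treated as loop invariant (it is a constant, it is a default definition, or its defining statement lies outside the loop).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a loop: the basic blocks it contains are
   modelled as a contiguous range of block ids.  */
struct loop
{
  int first_bb;
  int last_bb;
};

/* Hypothetical stand-in for an SSA operand.  */
struct operand
{
  bool is_constant;    /* models is_gimple_min_invariant (op) */
  bool is_default_def; /* models SSA_NAME_IS_DEFAULT_DEF (op) */
  int def_bb;          /* block that holds the defining statement */
};

/* Models flow_bb_inside_loop_p for the toy loop representation.  */
static bool
bb_inside_loop_p (const struct loop *loop, int bb)
{
  return bb >= loop->first_bb && bb <= loop->last_bb;
}

/* An operand is loop invariant if it is a constant, a default
   definition, or defined outside LOOP.  Anything defined inside the
   loop is conservatively treated as variant.  */
static bool
loop_invariant_op_p (const struct loop *loop, const struct operand *op)
{
  if (op->is_constant)
    return true;
  if (op->is_default_def || !bb_inside_loop_p (loop, op->def_bb))
    return true;
  return false;
}
```

With a loop covering blocks 2 to 5, an operand defined in block 1 is reported as invariant, while one defined in block 3 is not, regardless of any per-statement flag.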
RE: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option
Thanks, Richard B, for the comments. > We don't really expect targets to do this. The small testcase above > is somewhat ill-formed with -fno-signed-zeros. Note there's no > -0.0 in pr30957-1.c so why does that one fail for you? Does > the -fvariable-expansion-in-unroller code maybe not trigger for > riscv? Sorry, I am a little confused about the semantics of the option -fno-signed-zeros. I wonder what the target/backend needs to do for this option. The failure comes from the code below in pr30957-1.c. The 0.0 / -5.0 is initialized to -0.0 on riscv but +0.0 on aarch64. if (__builtin_copysignf (1.0, foo (0.0 / -5.0, 10)) != 1.0) abort (); If my understanding is correct, the loop will be vectorized during vect_transform_loop with a variable factor. It won't benefit from unrolling/peeling, so loop->unroll is marked as 1, and then we have a tree-vect log similar to the one below: Disabling unrolling due to variable-length vectorization factor. > I think we should go to PR30957 and see what that was filed originally > for, the testcase doesn't make much sense to me. Sure thing, I will take a look and get back to you later. Pan -Original Message- From: Richard Biener Sent: Monday, January 8, 2024 6:45 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang ; kito.ch...@gmail.com; jeffreya...@gmail.com Subject: Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option On Tue, Jan 2, 2024 at 2:37 PM wrote: > > From: Pan Li > > According to the semantics of no-signed-zeros option, the backend > like RISC-V should treat the minus zero -0.0f as plus zero 0.0f. > > Consider below example with option -fno-signed-zeros. > > void > test (float *a) > { > *a = -0.0; > } > > We will generate code as below, which doesn't treat the minus zero > as plus zero. 
> > test: > lui a5,%hi(.LC0) > flw fa5,%lo(.LC0)(a5) > fsw fa5,0(a0) > ret > > .LC0: > .word -2147483648 // aka -0.0 (0x80000000 in hex) > > This patch would like to fix the bug and treat the minus zero -0.0 > as plus zero, aka +0.0. Thus after this patch we will have asm code > as below for the above sample code. > > test: > sw zero,0(a0) > ret > > This patch also fixes the run failure of the test case pr30957-1.c. The > below tests are passed for this patch. We don't really expect targets to do this. The small testcase above is somewhat ill-formed with -fno-signed-zeros. Note there's no -0.0 in pr30957-1.c so why does that one fail for you? Does the -fvariable-expansion-in-unroller code maybe not trigger for riscv? I think we should go to PR30957 and see what that was filed originally for, the testcase doesn't make much sense to me. > * The riscv regression tests. > * The pr30957-1.c run tests. > > gcc/ChangeLog: > > * config/riscv/constraints.md: Leverage func > riscv_float_const_zero_rtx_p > for predicating the rtx is const zero float or not. > * config/riscv/predicates.md: Ditto. > * config/riscv/riscv.cc (riscv_const_insns): Ditto. > (riscv_float_const_zero_rtx_p): New func impl for predicating the rtx > is > const zero float or not. > (riscv_const_zero_rtx_p): New func impl for predicating the rtx > is const zero (both int and fp) or not. > * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p): > New func decl. > (riscv_const_zero_rtx_p): Ditto. > * config/riscv/riscv.md: Making sure the operand[1] of movfp is > CONST0_RTX when the operand[1] is const zero float. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/no-signed-zeros-0.c: New test. > * gcc.target/riscv/no-signed-zeros-1.c: New test. > * gcc.target/riscv/no-signed-zeros-2.c: New test. > * gcc.target/riscv/no-signed-zeros-3.c: New test. > * gcc.target/riscv/no-signed-zeros-4.c: New test. > * gcc.target/riscv/no-signed-zeros-5.c: New test. > * gcc.target/riscv/no-signed-zeros-run-0.c: New test. 
> * gcc.target/riscv/no-signed-zeros-run-1.c: New test. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/constraints.md | 2 +- > gcc/config/riscv/predicates.md| 2 +- > gcc/config/riscv/riscv-protos.h | 2 + > gcc/config/riscv/riscv.cc | 35 - > gcc/config/riscv/riscv.md | 49 --- > .../gcc.target/riscv/no-signed-zeros-0.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-1.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-2.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-3.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-4.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-5.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-run-0.c | 36 ++ > .../gcc.target/riscv/no-signed-zeros-run-1.c | 36 ++ > 13 files changed, 314 insertions(+), 10
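To make the signed-zero discussion in this thread concrete, here is a small sketch in C of how a negative zero arises and how it can be recognised by its bit pattern. The helper names `float_bits`, `is_negative_zero`, and `divide` are made up for illustration and are not part of the patch or of pr30957-1.c. The sketch assumes that `float` is IEEE 754 binary32 and that the code is built without `-fno-signed-zeros` or `-ffast-math`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of X, assuming float is IEEE 754 binary32.  */
static uint32_t
float_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* True only for -0.0f: sign bit set, all other bits clear.  This is
   the same 32-bit pattern as the constant -2147483648 stored in .LC0.  */
static bool
is_negative_zero (float x)
{
  return float_bits (x) == UINT32_C (0x80000000);
}

/* Divide NUM by DEN.  The volatile copies force the division to happen
   at run time instead of being folded by the compiler.  */
static float
divide (float num, float den)
{
  volatile float n = num;
  volatile float d = den;
  return n / d;
}
```

Under these assumptions `divide (0.0f, -5.0f)` produces a value for which `is_negative_zero` is true, even though it still compares equal to `0.0f`. That is why `__builtin_copysignf (1.0, ...)` applied to such a quotient yields -1.0 when signed zeros are honored.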
RE: [PATCH] i386: Fix recent testcase fail
> -Original Message- > From: Jiang, Haochen > Sent: Monday, January 8, 2024 4:41 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Fix recent testcase fail > > After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break > vectorization is supported. The two testcases need to be fixed. Ok. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase. > * gcc.target/i386/part-vect-absneghf.c: Ditto. > --- > gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c | 2 +- > gcc/testsuite/gcc.target/i386/part-vect-absneghf.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c > b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c > index a22a6ceabff..f5dd457c9eb 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c > +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-xorsign-1.c > @@ -35,7 +35,7 @@ do_test (void) >abort (); > } > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } > +*/ > /* { dg-final { scan-assembler "\[ \t\]xor" } } */ > /* { dg-final { scan-assembler "\[ \t\]and" } } */ > /* { dg-final { scan-assembler-not "copysign" } } */ diff --git > a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > index 48aed14d604..713f0bff4dd 100644 > --- a/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > +++ b/gcc/testsuite/gcc.target/i386/part-vect-absneghf.c > @@ -1,5 +1,5 @@ > /* { dg-do run { target avx512fp16 } } */ > -/* { dg-options "-O1 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump- > tree-slp-details -fdump-tree-optimized" } */ > +/* { dg-options "-O1 -mavx512fp16 -mavx512vl -fdump-tree-slp-details > +-fdump-tree-optimized" } */ > > extern void abort (); > > -- > 2.31.1
ping^3: [PATCH] diagnostics: Fix behavior of permerror options after diagnostic pop [PR111918]
Can I please ping this one again? It's 3 lines or so to fix the PR. Thanks! https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html On Tue, Dec 19, 2023 at 6:20 PM Lewis Hyatt wrote: > > Hello- > > May I please ping this one? Thanks... > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html > > -Lewis > > On Wed, Nov 29, 2023 at 7:05 PM Lewis Hyatt wrote: > > > > On Thu, Nov 09, 2023 at 04:16:10PM -0500, Lewis Hyatt wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111918 > > > > > > This patch fixes the behavior of `#pragma GCC diagnostic pop' for > > > permissive > > > error diagnostics such as -Wnarrowing (in C++11). Those currently do not > > > return to the correct state after the last pop; they become effectively > > > simple warnings instead. Bootstrap + regtest all languages on x86-64, does > > > it look OK please? Thanks! > > > > Hello- > > > > May I please ping this bug fix? > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635871.html > > > > Please note, it requires a trivial rebase on top of recent changes to > > the class diagnostic_context public interface. I attached the rebased patch > > here as well. Thanks! > > > > -Lewis
[PATCH] Resolve issue with Canadian build for x86_64-w64-mingw32 multilibs
From: trcrsired In the case of x86_64-w64-mingw32 gcc with multilibs, a conflict arises as both 64-bit and 32-bit DLLs attempt to copy into the bin/ directory. This discrepancy results in coverage issues. This commit aligns the Canadian build process for gcc targeting Windows with cross builds. Consequently, DLLs will no longer be copied into bin/ but will instead reside in the lib and lib32 directories. --- gcc/configure | 32 ++-- libatomic/configure | 16 +++- libbacktrace/configure | 16 +++- libcc1/configure| 32 ++-- libffi/configure| 32 ++-- libgfortran/configure | 32 ++-- libgm2/configure| 32 ++-- libgo/config/libtool.m4 | 16 +++- libgo/configure | 16 +++- libgomp/configure | 32 ++-- libgrust/configure | 32 ++-- libitm/configure| 32 ++-- libobjc/configure | 16 +++- libphobos/configure | 16 +++- libquadmath/configure | 16 +++- libsanitizer/configure | 32 ++-- libssp/configure| 16 +++- libstdc++-v3/configure | 32 ++-- libtool.m4 | 16 +++- libvtv/configure| 32 ++-- lto-plugin/configure| 16 +++- zlib/configure | 16 +++- 22 files changed, 495 insertions(+), 33 deletions(-) diff --git a/gcc/configure b/gcc/configure index 996046f5198..db9a5c8f40b 100755 --- a/gcc/configure +++ b/gcc/configure @@ -20631,6 +20631,19 @@ cygwin* | mingw* | pw32* | cegcc*) yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds +# If user builds GCC with mulitlibs enabled, +# it should just install on $(libdir) +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones. +if test ${multilib} = yes; then +postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. 
$dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + $install_prog $dir/$dlname $destdir/$dlname~ + chmod a+x $destdir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then + eval '\''$striplib $destdir/$dlname'\'' || exit \$?; + fi' +else postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ @@ -20638,8 +20651,9 @@ cygwin* | mingw* | pw32* | cegcc*) $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname~ if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then -eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; + eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; fi' +fi postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $RM \$dlpath' @@ -24359,6 +24373,19 @@ cygwin* | mingw* | pw32* | cegcc*) yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds +# If user builds GCC with mulitlibs enabled, +# it should just install on $(libdir) +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones. +if test ${multilib} = yes; then +postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + $install_prog $dir/$dlname $destdir/$dlname~ + chmod a+x $destdir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then + eval '\''$striplib $destdir/$dlname'\'' || exit \$?; + fi' +else postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. 
$dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ @@ -24366,8 +24393,9 @@ cygwin* | mingw* | pw32* | cegcc*) $install_prog $dir/$dlname \$dldir/$dlname~ chmod a+x \$dldir/$dlname~ if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then -eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; + eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; fi' +fi postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $RM \$dlpath' diff --git a/libatomic/configure b/libatomic/configure index d579bab96f8..bf5e3858f94 100755 --- a/libatomic/configure +++ b/libatomic/configure @@ -10518,6 +10518,19 @@
Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
This patch looks ok from myside. juzhe.zh...@rivai.ai From: Jun Sha (Joshua) Date: 2024-01-03 14:08 To: gcc-patches CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu Subject: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector. This patch adds th. prefix to all XTheadVector instructions by implementing new assembly output functions. We only check the prefix is 'v', so that no extra attribute is needed. gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_asm_output_opcode): New function to add assembler insn code prefix/suffix. (th_asm_output_opcode): Thead function to add assembler insn code prefix/suffix. * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise. * config/riscv/thead.cc (th_asm_output_opcode): Likewise gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/prefix.c: New test. Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config/riscv/riscv-protos.h | 2 ++ gcc/config/riscv/riscv.cc | 11 +++ gcc/config/riscv/riscv.h| 4 gcc/config/riscv/thead.cc | 13 + .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 5 files changed, 42 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 31049ef7523..71724dabdb5 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -102,6 +102,7 @@ struct riscv_address_info { }; /* Routines implemented in riscv.cc. 
*/ +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p); extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx); extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *); extern int riscv_float_const_rtx_index_for_fli (rtx); @@ -717,6 +718,7 @@ extern void th_mempair_prepare_save_restore_operands (rtx[4], bool, int, HOST_WIDE_INT, int, HOST_WIDE_INT); extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode); +extern const char *th_asm_output_opcode (FILE *asm_out_file, const char *p); #ifdef RTX_CODE extern const char* th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 0d1cbc5cb5f..51878797287 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5636,6 +5636,17 @@ riscv_get_v_regno_alignment (machine_mode mode) return lmul; } +/* Define ASM_OUTPUT_OPCODE to do anything special before + emitting an opcode. */ +const char * +riscv_asm_output_opcode (FILE *asm_out_file, const char *p) +{ + if (TARGET_XTHEADVECTOR) +return th_asm_output_opcode (asm_out_file, p); + + return p; +} + /* Implement TARGET_PRINT_OPERAND. 
The RISCV-specific operand codes are: 'h' Print the high-part relocation associated with OP, after stripping diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 6df9ec73c5e..c33361a254d 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use); asm_fprintf ((FILE), "%U%s", (NAME)); \ } while (0) +#undef ASM_OUTPUT_OPCODE +#define ASM_OUTPUT_OPCODE(STREAM, PTR) \ + (PTR) = riscv_asm_output_opcode(STREAM, PTR) + #define JUMP_TABLES_IN_TEXT_SECTION 0 #define CASE_VECTOR_MODE SImode #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW) diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc index 20353995931..dc3aed3904d 100644 --- a/gcc/config/riscv/thead.cc +++ b/gcc/config/riscv/thead.cc @@ -883,6 +883,19 @@ th_output_move (rtx dest, rtx src) return NULL; } +/* Define ASM_OUTPUT_OPCODE to do anything special before + emitting an opcode. */ +const char * +th_asm_output_opcode (FILE *asm_out_file, const char *p) +{ + /* We need to add th. prefix to all the xtheadvector + instructions here.*/ + if (current_output_insn != NULL && p[0] == 'v') +fputs ("th.", asm_out_file); + + return p; +} + /* Implement TARGET_PRINT_OPERAND_ADDRESS for XTheadMemIdx. */ bool diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c new file mode 100644 index 000..eee727ef6b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */ + +#include "riscv_vector.h" + +vint32m1_t +prefix (vint32m1_t vx, vint32m1_t vy, size_t vl) +{ + return __riscv_vadd_vv_i32m1 (vx, vy, vl); +} + +/* { dg-final { scan-assembler {\mth\.v\M} } } */ -- 2.17.1
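The prefixing rule in this patch can be sketched in isolation as follows. This is a simplified model in plain C rather than the GCC hook itself: `opcode_prefix`, `model_asm_output_opcode`, and `format_opcode` are hypothetical names, the `xtheadvector` flag stands in for `TARGET_XTHEADVECTOR`, and the `current_output_insn != NULL` guard from the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return "th." when the extension is enabled and the opcode text
   starts with 'v', otherwise the empty string.  */
static const char *
opcode_prefix (const char *p, bool xtheadvector)
{
  return (xtheadvector && p[0] == 'v') ? "th." : "";
}

/* Model of the hook: emit the prefix, if any, to OUT and return P
   unchanged so that the caller goes on to print the opcode itself.  */
static const char *
model_asm_output_opcode (FILE *out, const char *p, bool xtheadvector)
{
  fputs (opcode_prefix (p, xtheadvector), out);
  return p;
}

/* Convenience helper for inspection: write prefix plus opcode into
   BUF and return the snprintf result.  */
static int
format_opcode (char *buf, size_t size, const char *p, bool xtheadvector)
{
  return snprintf (buf, size, "%s%s", opcode_prefix (p, xtheadvector), p);
}
```

In this model `vadd.vv` becomes `th.vadd.vv` when the flag is set, while `add` and any opcode emitted with the flag clear are left alone. Because only the first character is inspected, every opcode starting with `v` is prefixed, which matches the patch description ("We only check the prefix is 'v'").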
Re: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases.
LGTM. juzhe.zh...@rivai.ai From: Feng Wang Date: 2024-01-08 17:12 To: gcc-patches CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang Subject: [PATCH v8 2/2] RISC-V: Add crypto vector api-testing cases. Patch v8: Resubmit after fix the rtl-checking issue. Passed all the riscv regression test. Patch v7: Add newline at the end of file. Patch v6: Move intrinsic tests into rvv/base. Patch v5: Rebase Patch v4: Add some RV32 vx constraint testcase. Patch v3: Refine crypto vector api-testing cases. Patch v2: Update march info according to the change of riscv-common.c This patch add crypto vector api-testing cases based on https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test. * gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test. * gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test. * gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test. * gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test. * gcc.target/riscv/zvkb.c: New test. 
--- .../riscv/rvv/base/zvbb-intrinsic.c | 179 ++ .../riscv/rvv/base/zvbb_vandn_vx_constraint.c | 15 ++ .../riscv/rvv/base/zvbc-intrinsic.c | 62 ++ .../riscv/rvv/base/zvbc_vx_constraint-1.c | 14 ++ .../riscv/rvv/base/zvbc_vx_constraint-2.c | 14 ++ .../riscv/rvv/base/zvkg-intrinsic.c | 24 +++ .../riscv/rvv/base/zvkned-intrinsic.c | 104 ++ .../riscv/rvv/base/zvknha-intrinsic.c | 33 .../riscv/rvv/base/zvknhb-intrinsic.c | 33 .../riscv/rvv/base/zvksed-intrinsic.c | 33 .../riscv/rvv/base/zvksh-intrinsic.c | 24 +++ gcc/testsuite/gcc.target/riscv/zvkb.c | 13 ++ 12 files changed, 548 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c new file mode 100644 index 000..b7e25bfe819 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c @@ -0,0 +1,179 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */ +#include "riscv_vector.h" + +vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) { + 
return __riscv_vandn_vv_u8mf8(vs2, vs1, vl); +} + +vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) { + return __riscv_vandn_vx_u32m1(vs2, rs1, vl); +} + +vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) { + return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl); +} + +vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) { + return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl); +} + +vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) { + return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl); +} + +vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) { + return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl); +} + +vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) { + return __riscv_vbrev_v_u8m8(vs2, vl); +} + +vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) { + return __riscv_vbrev_v_u16m1_m(mask, vs2, vl); +} + +vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) { + return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl); +} + +vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) { + return
Re: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function.
LGTM. juzhe.zh...@rivai.ai From: Feng Wang Date: 2024-01-08 17:12 To: gcc-patches CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang Subject: [PATCH v7 1/2] RISC-V: Add crypto vector builtin function. Patch v7:Resubmit after fix trl-checking issue. Passed all the riscv regression test. Patch v6:Remove unused code. Patch v5:Rebase. Patch v4:Merge crypto vector function.def into vector. Patch v3:Define a shape for vaesz and merge vector-crypto-types.def into riscv-vector-builtins-types.def. Patch v2:Optimize function_shape class for crypto_vector. This patch add the intrinsic funtions of crypto vector based on the intrinsic doc(https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob /eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md). Co-Authored by: Songhe Zhu Co-Authored by: Ciyan Pan gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class vandn): Add new function_base for crypto vector. (class bitmanip): Ditto. (class b_reverse):Ditto. (class vwsll): Ditto. (class clmul): Ditto. (class vg_nhab): Ditto. (class crypto_vv):Ditto. (class crypto_vi):Ditto. (class vaeskf2_vsm3c):Ditto. (class vsm3me): Ditto. (BASE): Add BASE declaration for crypto vector. * config/riscv/riscv-vector-builtins-bases.h: Ditto. * config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS): Add crypto vector intrinsic definition. (vbrev): Ditto. (vclz): Ditto. (vctz): Ditto. (vwsll): Ditto. (vandn): Ditto. (vbrev8): Ditto. (vrev8): Ditto. (vrol): Ditto. (vror): Ditto. (vclmul): Ditto. (vclmulh): Ditto. (vghsh): Ditto. (vgmul): Ditto. (vaesef): Ditto. (vaesem): Ditto. (vaesdf): Ditto. (vaesdm): Ditto. (vaesz): Ditto. (vaeskf1): Ditto. (vaeskf2): Ditto. (vsha2ms): Ditto. (vsha2ch): Ditto. (vsha2cl): Ditto. (vsm4k): Ditto. (vsm4r): Ditto. (vsm3me): Ditto. (vsm3c): Ditto. * config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def): Add new function_shape for crypto vector. (struct crypto_vi_def): Ditto. (struct crypto_vv_no_op_type_def): Ditto. 
(SHAPE): Add SHAPE declaration of crypto vector. * config/riscv/riscv-vector-builtins-shapes.h: Ditto. * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS): Add new data type for crypto vector. (DEF_RVV_CRYPTO_SEW64_OPS): Ditto. (vuint32mf2_t): Ditto. (vuint32m1_t): Ditto. (vuint32m2_t): Ditto. (vuint32m4_t): Ditto. (vuint32m8_t): Ditto. (vuint64m1_t): Ditto. (vuint64m2_t): Ditto. (vuint64m4_t): Ditto. (vuint64m8_t): Ditto. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS): Add new data struct for crypto vector. (DEF_RVV_CRYPTO_SEW64_OPS): Ditto. (registered_function::overloaded_hash): Processing size_t uimm for C overloaded func. * config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE. --- .../riscv/riscv-vector-builtins-bases.cc | 264 +- .../riscv/riscv-vector-builtins-bases.h | 28 ++ .../riscv/riscv-vector-builtins-functions.def | 94 +++ .../riscv/riscv-vector-builtins-shapes.cc | 87 +- .../riscv/riscv-vector-builtins-shapes.h | 4 + .../riscv/riscv-vector-builtins-types.def | 25 ++ gcc/config/riscv/riscv-vector-builtins.cc | 133 - gcc/config/riscv/riscv-vector-builtins.def| 1 + 8 files changed, 633 insertions(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc index d70468542ee..d12bb89f91c 100644 --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc @@ -2127,6 +2127,212 @@ public: } }; +/* Below implements are vector crypto */ +/* Implements vandn.[vv,vx] */ +class vandn : public function_base +{ +public: + rtx expand (function_expander ) const override + { +switch (e.op_info->op) + { + case OP_TYPE_vv: +return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ())); + case OP_TYPE_vx: +return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ())); + default: +gcc_unreachable (); + } + } +}; + +/* Implements vrol/vror/clz/ctz. 
*/ +template +class bitmanip : public function_base +{ +public: + bool apply_tail_policy_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + bool apply_mask_policy_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + bool has_merge_operand_p () const override + { +return (CODE == CLZ || CODE == CTZ) ? false : true; + } + + rtx expand (function_expander ) const override + { +switch (e.op_info->op) +{ + case OP_TYPE_v: + case OP_TYPE_vv: +return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ())); + case OP_TYPE_vx: +return e.use_exact_insn (code_for_pred_v_scalar (CODE, e.vector_mode ())); + default: +gcc_unreachable (); +} + } +}; + +/* Implements vbrev/vbrev8/vrev8. */ +template +class b_reverse : public
Re: [committed V3] libstdc++: Add Unicode-aware width estimation for std::format
On Mon, 8 Jan 2024 at 01:19, Jonathan Wakely wrote:
>
> I decided to push this now, not wait for the morning.
>
> This is mostly the same as V2, but adds to the contrib/unicode/README as
> suggested by Lewis, and avoids a trailing whitespace character in the
> generated header.
>
> Tested x86_64-linux and aarch64-linux. Pushed to trunk.
>
> -- >8 --
>
> This implements the requirements in the following proposals, which
> dictate how std::format deals with non-ASCII strings:
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1868r1.html
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2572r1.html
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2675r1.pdf
>
> There are two parts to this. The width estimation for strings must only
> count the width of the first character in an extended grapheme cluster.
> That requires implementing the algorithm for detecting cluster breaks,
> which requires a number of lookup tables of the grapheme cluster break
> properties (and Indic_Conjunct_Break and Extended_Pictographic
> properties) of every code point. Additionally, some characters have a
> field width of 2, which requires another lookup table of field widths
> for every code point. The tables added in this commit do not contain
> entries for every code point from 0 to 0x10FFFF as that would be very
> inefficient and use too much memory. Instead the tables only contain the
> code points that form an "edge" for a property, omitting all the code
> points that have the same property as the preceding one. We can use a
> binary search to find the closest code point in the table that is not
> greater than the one we're looking for.
>
> The tables are generated by a new Python script added to the
> contrib/unicode directory, and a new data file downloaded from the
> Unicode Consortium website.
>
> The rules for extended grapheme cluster breaking are implemented for the
> latest Unicode standard, version 15.1.0.
>
> libstdc++-v3/ChangeLog:
>
>         * include/Makefile.am: Add new headers.
>         * include/Makefile.in: Regenerate.
>         * include/bits/unicode.h: New file.
>         * include/bits/unicode-data.h: New file.
>         * include/std/format: Include .
>         (__literal_encoding_is_utf8): Move to .
>         (_Spec::_M_fill): Change type to char32_t.
>         (_Spec::_M_parse_fill_and_align): Read a Unicode scalar value
>         instead of a single character.
>         (__write_padded): Change __fill_char parameter to char32_t and
>         encode it into the output.
>         (__formatter_str::format): Use new __unicode::__field_width and
>         __unicode::__truncate functions.
>         * include/std/ostream: Adjust namespace qualification for
>         __literal_encoding_is_utf8.
>         * include/std/print: Likewise.
>         * src/c++23/print.cc: Add [[unlikely]] attribute to error path.
>         * testsuite/ext/unicode/view.cc: New test.
>         * testsuite/std/format/functions/format.cc: Add missing examples
>         from the standard demonstrating alignment with non-ASCII
>         characters. Add examples checking correct handling of extended
>         grapheme clusters.
>
> contrib/ChangeLog:
>
>         * unicode/README: Add notes about generating libstdc++ tables.
>         * unicode/GraphemeBreakProperty.txt: New file.
>         * unicode/emoji-data.txt: New file.
>         * unicode/gen_libstdcxx_unicode_data.py: New file.
> ---

While writing some more tests I realised I'd forgotten to finish this
function, and had left it as a copy from __field_width(char32_t) above:

> +  constexpr bool
> +  __is_extended_pictographic(char32_t __c)
> +  {
> +    if (__c < __xpicto_edges[0]) [[likely]]
> +      return 1;
> +
> +    auto* __p = std::upper_bound(__xpicto_edges, std::end(__xpicto_edges), __c);
> +    return (__p - __xpicto_edges) % 2 + 1;
> +  }

It should be:

  constexpr bool
  __is_extended_pictographic(char32_t __c)
  {
    if (__c < __xpicto_edges[0]) [[likely]]
      return false;

    auto* __p = std::upper_bound(__xpicto_edges, std::end(__xpicto_edges), __c);
    return (__p - __xpicto_edges) % 2;
  }

I'll push a fix for that (and add my new tests) tomorrow.
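For readers following the "edge" table idea, here is a minimal sketch of how such a lookup can work. It is not the libstdc++ implementation: the table `example_edges` and the function `has_property` are hypothetical names with made-up data, chosen only to show why the parity of the `std::upper_bound` result answers a boolean property query.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical edge table (NOT the real libstdc++ data).  Each entry is the
// first code point of a run, and the property flips at every edge.  With
// this data the property holds for [0x100, 0x200) and [0x300, 0x400).
static const char32_t example_edges[] = { 0x100, 0x200, 0x300, 0x400 };

// Returns true if code point c lies in a run that has the property.
inline bool has_property(char32_t c)
{
  assert(c <= 0x10FFFF);        // largest valid Unicode code point

  // Everything before the first edge lacks the property.
  if (c < example_edges[0])
    return false;

  // upper_bound finds the first edge strictly greater than c, so the
  // number of edges at or below c is (p - example_edges).  An odd count
  // means the property was switched on and not yet switched off.
  const char32_t* p
    = std::upper_bound(example_edges, std::end(example_edges), c);
  return (p - example_edges) % 2 != 0;
}
```

With this made-up table, 0x1FF falls in the first run and 0x200 starts a gap, which is the same odd/even reasoning the corrected `__is_extended_pictographic` relies on.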
Re: c++/modules: Emit definitions of ODR-used static members imported from modules [PR112899]
On 1/8/24 04:21, Iain Sandoe wrote: On 6 Jan 2024, at 22:30, Nathan Sidwell wrote: Richard Smith & I discussed whether we should use the module interface's capability of giving vague linkage entities a strong location. I didn't want to go messing with that, 'cos it was changing yet more stuff. But, perhaps we should revisit that? Any keyless polymorphic class in module purview gets its vtables etc emitted in the module's object file? Likewise these kinds of entities. cc'ing Iain, who probably knows more about Clang's state here. I have been trying to keep up with this thread, but not sure if I can throw a whole lot of light on things. There is an on-going attempt (now some 3 or 4 papers in) to try and figure out how to handle `static inline` entities at least at file scope - but that appears to be a different case (I can try an locate the latest paper on this if needed; the topic was discussed in Varna and Kona, but no new paper yet - perhaps Michael [Spencer] will bring a paper in Tokyo). clang ran into some issues with vtables and that resulted in some discussion about whether there should be an amendment to the Itanium ABI to deal with the module-specific stuff. https://github.com/itanium-cxx-abi/cxx-abi/issues/170 https://github.com/llvm/llvm-project/pull/75912#discussion_r1444150069 Sorry I cannot be much more specific at present, That's pretty specific that vtables at least get emitted in the module whether or not there's a key function. I've asked on that issue why this only applies to vtables. Jason
[committed] xfail dg-final "Sunk statements: 5" on hppa*64*-*-*
Tested on hppa64-hp-hpux11.11. Committed to trunk.

Dave
---

xfail dg-final "Sunk statements: 5" on hppa*64*-*-*

2024-01-08  John David Anglin

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-sink-18.c: xfail dg-final "Sunk statements: 5"
	on hppa*64*-*-*.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
index 1372100882e..b199df26a0f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-18.c
@@ -215,4 +215,4 @@ compute_on_bytes (uint8_t *in_data, int in_len, uint8_t *out_data, int out_len)
    base+index addressing modes, so the ip[len] address computation can't be made
    from the IV computation above. powerpc64le similarly is affected. */

- /* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink2" { target lp64 xfail { riscv64-*-* powerpc64le-*-* } } } } */
+ /* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink2" { target lp64 xfail { riscv64-*-* powerpc64le-*-* hppa*64*-*-* } } } } */
[committed] Skip gfortran.dg/dec_math.f90 on hppa*-*-hpux*
Tested on hppa64-hp-hpux11.11. Committed to trunk.

Dave
---

Skip gfortran.dg/dec_math.f90 on hppa

hppa*-*-hpux* doesn't have any long double trig functions.

2024-01-08  John David Anglin

gcc/testsuite/ChangeLog:

	* gfortran.dg/dec_math.f90: Skip on hppa*-*-hpux*.

diff --git a/gcc/testsuite/gfortran.dg/dec_math.f90 b/gcc/testsuite/gfortran.dg/dec_math.f90
index d95233a5169..393e7def88e 100644
--- a/gcc/testsuite/gfortran.dg/dec_math.f90
+++ b/gcc/testsuite/gfortran.dg/dec_math.f90
@@ -1,5 +1,6 @@
 ! { dg-options "-cpp -std=gnu" }
 ! { dg-do run { xfail i?86-*-freebsd* } }
+! { dg-skip-if "No long double libc functions" { hppa*-*-hpux* } }
 !
 ! Test extra math intrinsics formerly offered by -fdec-math,
 ! now included with -std=gnu or -std=legacy.
[r14-7003 Regression] FAIL: gfortran.dg/power_8.f90 -O3 -g execution test on Linux/x86_64
On Linux/x86_64, b3cc5a1efead520bc977b4ba51f1328d01b3e516 is the first bad commit commit b3cc5a1efead520bc977b4ba51f1328d01b3e516 Author: Richard Biener Date: Fri Dec 15 10:32:29 2023 +0100 tree-optimization/113026 - avoid vector epilog in more cases caused FAIL: gcc.c-torture/execute/950612-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/950612-1.c -O3 -g execution test FAIL: gcc.c-torture/execute/builtin-bitops-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/builtin-bitops-1.c -O3 -g execution test FAIL: gcc.dg/vect/vect-early-break_74.c execution test FAIL: gcc.dg/vect/vect-early-break_74.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/vect-early-break_78.c execution test FAIL: gcc.dg/vect/vect-early-break_78.c -flto -ffat-lto-objects execution test FAIL: gfortran.dg/power_8.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gfortran.dg/power_8.f90 -O3 -g execution test with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-7003/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/950612-1.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/950612-1.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/builtin-bitops-1.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/builtin-bitops-1.c 
--target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_74.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_74.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_78.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-early-break_78.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/power_8.f90 --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/power_8.f90 --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at haochen dot jiang at intel.com.) (If you met problems with cascadelake related, disabling AVX512F in command line might save that.) (However, please make sure that there is no potential problems with AVX512.)
Re: [PATCH][frontend]: don't ice with pragma NOVECTOR if loop in C has no condition [PR113267]
On Mon, 8 Jan 2024, Tamar Christina wrote:

> Hi All,
>
> In C you can have loops without a condition. The original version of the
> patch rejected the use of #pragma GCC novector on such loops; however,
> during review it was changed not to do this, because we didn't want to
> give a compile error in such cases.
>
> However, because annotations seem to only be allowed on conditions (unless
> I'm mistaken?) the attached example ICEs because there's no condition.
>
> This will have it ignore the pragma instead of ICEing. I don't know if this
> is the best solution, but as far as I can tell we can't attach the
> annotation to anything else.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com
Re: [Patch, fortran PR89645/99065 No IMPLICIT type error with: ASSOCIATE( X => function() )
Hi Paul, your patch looks already very impressive! Regarding the patch as is, I am still trying to grok it, even with your explanations at hand... While the testcase works as advertised, I noticed that it exhibits a runtime memleak that occurs for (likely) each case where the associate target is an allocatable, class-valued function result. I tried to produce a minimal testcase using class(*), which apparently is not handled by your patch (it ICEs for me): program p implicit none class(*), allocatable :: x(:) x = foo() call prt (x) deallocate (x) ! up to here no memleak... associate (var => foo()) call prt (var) end associate contains function foo() result(res) class(*), allocatable :: res(:) res = [42] end function foo subroutine prt (x) class(*), intent(in) :: x(:) select type (x) type is (integer) print *, x class default stop 99 end select end subroutine prt end Traceback (truncated): foo.f90:9:18: 9 | call prt (var) | 1 internal compiler error: tree check: expected record_type or union_type or qual_union_type, have function_type in gfc_class_len_get, at fortran/trans-expr.cc:271 0x19fd5d5 tree_check_failed(tree_node const*, char const*, int, char const*, ...) ../../gcc-trunk/gcc/tree.cc:8952 0xe1562d tree_check3(tree_node*, char const*, int, char const*, tree_code, tree_code, tree_code) ../../gcc-trunk/gcc/tree.h:3652 0xe3e264 gfc_class_len_get(tree_node*) ../../gcc-trunk/gcc/fortran/trans-expr.cc:271 0xecda48 trans_associate_var ../../gcc-trunk/gcc/fortran/trans-stmt.cc:2325 0xecdd09 gfc_trans_block_construct(gfc_code*) ../../gcc-trunk/gcc/fortran/trans-stmt.cc:2383 [...] I don't see anything wrong with it: NAG groks it, like Nvidia and Flang, while Intel crashes at runtime. Can you have another brief look? Thanks, Harald On 1/6/24 18:26, Paul Richard Thomas wrote: These PRs come about because of gfortran's single pass parsing. If the function in the title is parsed after the associate construct, then its type and rank are not known. 
The point at which this becomes a problem is when expressions within the associate block are parsed. primary.cc (gfc_match_varspec) could already deal with intrinsic types and so component references were the trigger for the problem. The two major parts of this patch are the fixup needed in gfc_match_varspec and the resolution of expressions with references in resolve.cc (gfc_fixup_inferred_type_refs). The former relies on the two new functions in symbol.cc to search for derived types with an appropriate component to match the component reference and then set the associate name to have a matching derived type. gfc_fixup_inferred_type_refs is called in resolution and so the type of the selector function is known. gfc_fixup_inferred_type_refs ensures that the component references use this derived type and that array references occur in the right place in expressions and match preceding array specs. Most of the work in preparing the patch was sorting out cases where the selector was not a derived type but, instead, a class function. If it were not for this, the patch would have been submitted six months ago :-( The patch is relatively safe because most of the chunks are guarded by testing for the associate name being an inferred type, which is set in gfc_match_varspec. For this reason, I do not think it likely that the patch will cause regressions. However, it is more than possible that variants not appearing in the submitted testcase will throw up new bugs. Jerry has already given the patch a whirl and found that it applies cleanly, regtests OK and works as advertised. OK for trunk? Paul Fortran: Fix class/derived type function associate selectors [PR87477] 2024-01-06 Paul Thomas gcc/fortran PR fortran/87477 PR fortran/89645 PR fortran/99065 * class.cc (gfc_change_class): New function needed for associate names, when rank changes or a derived type is produced by resolution * dump-parse-tree.cc (show_code_node): Make output for SELECT TYPE more comprehensible. 
* gfortran.h : Add 'gfc_association_list' to structure 'gfc_association_list'. Add prototypes for 'gfc_find_derived_types', 'gfc_fixup_inferred_type_refs' and 'gfc_change_class'. Add macro IS_INFERRED_TYPE. * match.cc (copy_ts_from_selector_to_associate): Add bolean arg 'select_type' with default false. If this is a select type name and the selector is a inferred type, build the class type and apply it to the associate name. (build_associate_name): Pass true to 'select_type' in call to previous. * parse.cc (parse_associate): If the selector is a inferred type the associate name is too. Make sure that function selector class and rank, if known, are passed to the associate name. If a function result exists, pass its typespec to the associate name. * primary.cc (gfc_match_varspec): If a scalar derived type select type temporary has an array reference,
Re: [PATCH] Add a late-combine pass [PR106594]
On 1/8/24 12:11, Richard Sandiford wrote: Thanks. That led me to the following, which seems a bit more plausible than my first attempt. I'll test it on aarch64-linux-gnu and x86_64-linux-gnu. Does it look OK? It looks reasonable to me. I'm going to send another failure (ICE in finalize_new_accesses on a different target) separately. Jeff
[committed] hppa: Fix bind_c_coms.f90 and bind_c_vars.f90 tests on hppa
Tested on hppa64-hp-hpux11.11. Committed to trunk.

Dave
---

hppa: Fix bind_c_coms.f90 and bind_c_vars.f90 tests on hppa

Commit 6271dd98 changed the default from -fcommon to -fno-common.
This silently changed the alignment of uninitialized BSS data on
hppa where the alignment of common data must be greater or equal
to the alignment of the largest type that will fit in the block.
For example, the alignment of `double d[2];' changed from 16 to 8
on hppa64.

The hppa architecture requires strict alignment and the linker
warns about inconsistent alignment of variables. This change
broke the gfortran.dg/bind_c_coms.f90 and gfortran.dg/bind_c_vars.f90
tests. These tests check whether bind_c works between fortran and C.

Adding the -fcommon option fixes the tests. Probably, gcc and HP C
are now by default inconsistent but that's water under the bridge.

2024-01-08  John David Anglin

gcc/testsuite/ChangeLog:

	PR testsuite/94253
	* gfortran.dg/bind_c_coms.f90: Add -fcommon option on hppa*-*-*.
	* gfortran.dg/bind_c_vars.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/bind_c_coms.f90 b/gcc/testsuite/gfortran.dg/bind_c_coms.f90
index 85ead9fb636..2f9714947c7 100644
--- a/gcc/testsuite/gfortran.dg/bind_c_coms.f90
+++ b/gcc/testsuite/gfortran.dg/bind_c_coms.f90
@@ -3,6 +3,7 @@
 ! { dg-options "-w" }
 ! the -w option is to prevent the warning about long long ints
 module bind_c_coms
+! { dg-additional-options "-fcommon" { target hppa*-*-hpux* } }
 use, intrinsic :: iso_c_binding
 implicit none

diff --git a/gcc/testsuite/gfortran.dg/bind_c_vars.f90 b/gcc/testsuite/gfortran.dg/bind_c_vars.f90
index 4f4a0cfd795..ede3ffd8c21 100644
--- a/gcc/testsuite/gfortran.dg/bind_c_vars.f90
+++ b/gcc/testsuite/gfortran.dg/bind_c_vars.f90
@@ -1,6 +1,7 @@
 ! { dg-do run }
 ! { dg-additional-sources bind_c_vars_driver.c }
 module bind_c_vars
+! { dg-additional-options "-fcommon" { target hppa*-*-hpux* } }
 use, intrinsic :: iso_c_binding
 implicit none
Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
On Mon, Jan 8, 2024 at 5:57 PM Andrew Pinski wrote: > > On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak wrote: > > > > Instead of converting XOR or PLUS of two values, ANDed with two constants > > that > > have no bits in common, to IOR expression, convert IOR or XOR of said two > > ANDed values to PLUS expression. > > I think this only helps targets which have leal like instruction. Also > I think it is the same issue as I recorded as PR 111763 . I suspect > BIT_IOR is more of a Canonical form for GIMPLE while we should handle > this in expand to decide if we want to use PLUS or IOR. For the pr108477.c testcase, expand pass expands: r_3 = a_2(D) & 1; p_5 = b_4(D) & 4294967292; _1 = r_3 | p_5; _6 = _1 + 2; return _6; The transformation ( | -> + ) is valid only when CST1 & CST2 == 0, so we need to determine values of constants. Is this information available in the expand pass? IMO, the transformation from (ra | rb | cst) to (ra + rb + cst) as in the shown testcase would be beneficial when constructing control register values (see e.g. mesa-3d). We can use LEA instead of OR+ADD sequence in this case. Uros.
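As a small illustration of the identity under discussion, the sketch below checks that IOR and PLUS agree when the two AND masks share no bits. The masks come from the pr108477.c dump quoted above; the function names `combine_ior`, `combine_plus` and `check_same` are made up for this example and say nothing about how match.pd or expand would implement the transformation.

```cpp
#include <cassert>
#include <cstdint>

// Masks from the quoted dump: r_3 = a & 1; p_5 = b & 4294967292.
constexpr std::uint32_t mask_a = 1u;
constexpr std::uint32_t mask_b = 4294967292u;   // 0xFFFFFFFC

// The rewrite is only valid because the masks have no bits in common.
static_assert((mask_a & mask_b) == 0, "masks must be disjoint");

// Form shown in the dump: (a & CST1) | (b & CST2), then + 2.
inline std::uint32_t combine_ior(std::uint32_t a, std::uint32_t b)
{
  return ((a & mask_a) | (b & mask_b)) + 2u;
}

// Proposed form: with disjoint masks no bit position can carry, so the
// IOR can be written as PLUS and the whole expression becomes a sum.
inline std::uint32_t combine_plus(std::uint32_t a, std::uint32_t b)
{
  return (a & mask_a) + (b & mask_b) + 2u;
}

// Helper for spot checks: both forms must give the same result.
inline void check_same(std::uint32_t a, std::uint32_t b)
{
  assert(combine_ior(a, b) == combine_plus(a, b));
}
```

The all-sum form is the shape that a three-operand add such as LEA can cover, which is the benefit argued for above; whether that choice belongs in match.pd or in expand is the open question in the thread.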
Re: [Patch] GCN: Add pre-initial support for gfx1100
Hi! On 2024-01-08T15:30:06+0100, Tobias Burnus wrote: > Andrew Stubbs wrote: >> I know there will be things that need fixing for >> both experimental architectures. > > Indeed. [...] ..., like, making it even build? ;-P >> P.S. Apologies, but I think my commits today conflict a little; you >> should be able to drop the hunks that patch deleted code. > > I did so - but I then realized that I should have also added gfx1100 to > the new chunk. > > Committed as r14-7006-g97a52f69d209f6 (see attachment) - as follow up to > the original r14-7005-g52a2c659ae6c21 Pushed to master branch commit f9290cdf4697f467fd0fb7c710f58cc12e497889 "GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100'", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From f9290cdf4697f467fd0fb7c710f58cc12e497889 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 8 Jan 2024 20:35:27 +0100 Subject: [PATCH] GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_hsa_name’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 
1666 | case EF_AMDGPU_MACH_AMDGCN_GFX1100: | ^ | EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: note: each undeclared identifier is reported only once for each function it appears in ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_code’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1711:12: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1711 | return EF_AMDGPU_MACH_AMDGCN_GFX1100; |^ |EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘max_isa_vgprs’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1728:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1728 | case EF_AMDGPU_MACH_AMDGCN_GFX1100: | ^ | EF_AMDGPU_MACH_AMDGCN_GFX1030 make[4]: *** [Makefile:813: libgomp_plugin_gcn_la-plugin-gcn.lo] Error 1 Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0 "GCN: Add pre-initial support for gfx1100". libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add 'EF_AMDGPU_MACH_AMDGCN_GFX1100'. --- libgomp/plugin/plugin-gcn.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index f24a28faa22..0339848451e 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -389,7 +389,8 @@ typedef enum { EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f, EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030, EF_AMDGPU_MACH_AMDGCN_GFX90a = 0x03f, - EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036 + EF_AMDGPU_MACH_AMDGCN_GFX1030 = 0x036, + EF_AMDGPU_MACH_AMDGCN_GFX1100 = 0x041 } EF_AMDGPU_MACH; const static int EF_AMDGPU_MACH_MASK = 0x00ff; -- 2.34.1
Re: [PATCH] Add a late-combine pass [PR106594]
Jeff Law writes: > On 1/8/24 09:59, Richard Sandiford wrote: >> This is a bit of a hopeful stab, but is the problem that recog_data still >> had the previous contents of insn 3674, and so extract_insn_cached wrongly >> thought that it doesn't need to do anything? If so, does something like: >> >> diff --git a/gcc/recog.cc b/gcc/recog.cc >> index a6799e3f5e6..8ba63c78179 100644 >> --- a/gcc/recog.cc >> +++ b/gcc/recog.cc >> @@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, >> bool in_group, >> case invalid. */ >> changes[num_changes].old_code = INSN_CODE (object); >> INSN_CODE (object) = -1; >> + if (recog_data.insn == object) >> +recog_data.insn = nullptr; >> } >> >> num_changes++; >> >> fix it? I suppose there's an argument that this belongs in whatever code >> sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA). >> But doing it in validate_change_1 seems more robust, since anything >> calling that function is considering changing the insn code. > Nope, doesn't help at all. Yeah, in hindsight it was a dull guess. recog resets recog_data.insn itself, so doing it here wasn't likely to help. > I'd briefly put a reset of the INSN_CODE > and a call to recog_memoized in the costing path of rtl-ssa to see if > that would allow things to move forward, but it failed miserably. > > I'll pass along the .i file separately. Hopefully it'll fail for you > and you can debug. But given failure depends on stale bits in > recog_data, it may not. Thanks. That led me to the following, which seems a bit more plausible than my first attempt. I'll test it on aarch64-linux-gnu and x86_64-linux-gnu. Does it look OK? Richard insn_info::calculate_cost computes the costs of unchanged insns lazily, so that we don't waste time costing instructions that we never try to change. It therefore has to revert any in-progress changes, cost the original instruction, and then reapply the in-progress changes. 
However, doing that temporarily changes the INSN_CODEs, and so temporarily invalidates any information cached about the insn. This means that insn_cost can end up looking at stale data, or can cache data that becomes stale once the in-progress changes are reapplied. This could in principle happen for any use of temporarily_undo_changes and redo_changes. Those functions in turn share a common subroutine, swap_change, so that seems like the best place to fix this. gcc/ * recog.cc (swap_change): Invalidate the cached recog_data if it describes an insn that is being changed. --- gcc/recog.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/recog.cc b/gcc/recog.cc index a6799e3f5e6..56370e40e01 100644 --- a/gcc/recog.cc +++ b/gcc/recog.cc @@ -614,7 +614,11 @@ swap_change (int num) else std::swap (*changes[num].loc, changes[num].old); if (changes[num].object && !MEM_P (changes[num].object)) -std::swap (INSN_CODE (changes[num].object), changes[num].old_code); +{ + std::swap (INSN_CODE (changes[num].object), changes[num].old_code); + if (recog_data.insn == changes[num].object) + recog_data.insn = nullptr; +} } /* Temporarily undo all the changes numbered NUM and up, with a view -- 2.25.1
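The stale-cache hazard described above can be modelled with a toy one-entry cache. This is only a sketch under simplifying assumptions: `Insn`, `Cache`, `extract_cached` and `swap_code` are invented stand-ins for an insn with its INSN_CODE, recog_data, extract_insn_cached and swap_change, and none of it is GCC code.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for an insn: only the instruction code matters here.
struct Insn { int code; };

// Toy stand-in for recog_data: remembers which insn was last decoded.
struct Cache {
  const Insn* insn = nullptr;
  int decoded_code = 0;
};

static Cache g_cache;

// Like a cached extract: redo the work only if the cache describes a
// different insn.
inline int extract_cached(const Insn& insn)
{
  if (g_cache.insn != &insn)
    {
      g_cache.insn = &insn;
      g_cache.decoded_code = insn.code;
    }
  assert(g_cache.insn == &insn);
  return g_cache.decoded_code;
}

// Like the patched swap: exchange the current and saved codes, and drop
// the cache entry if it describes the insn being changed.  Without the
// invalidation, extract_cached would keep returning the old decode.
inline void swap_code(Insn& insn, int& saved_code)
{
  std::swap(insn.code, saved_code);
  if (g_cache.insn == &insn)
    g_cache.insn = nullptr;
}
```

Calling `swap_code` twice in a row mirrors the temporarily-undo and redo pair: each swap invalidates the entry, so the next `extract_cached` sees the code that is actually in place.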
[committed] steering.html: Update my affiliation
diff --git a/htdocs/steering.html b/htdocs/steering.html index 95d6a4a8..6039a503 100644 --- a/htdocs/steering.html +++ b/htdocs/steering.html @@ -36,7 +36,7 @@ place to reach them is the gcc mailing list. Jason Merrill (Red Hat) David Miller (Red Hat) Toon Moene (Koninklijk Nederlands Meteorologisch Instituut) -Joseph Myers (CodeSourcery / Mentor Graphics) [co-Release Manager] +Joseph Myers (Red Hat) [co-Release Manager] Gerald Pfeifer (SUSE) Ramana Radhakrishnan Joel Sherrill (OAR Corporation) -- Joseph S. Myers josmy...@redhat.com
[committed] MAINTAINERS: Update my email address
* MAINTAINERS: Update my email address. diff --git a/MAINTAINERS b/MAINTAINERS index fe5d95ae970..882694cc47d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -34,7 +34,7 @@ Jeff Law Michael Meissner Jason Merrill David S. Miller -Joseph Myers +Joseph Myers Richard Sandiford Bernd Schmidt Ian Lance Taylor @@ -155,7 +155,7 @@ cygwin, mingw-w64 Jonathan Yong <10wa...@gmail.com> Language Front Ends Maintainers -C front end/ISO C99Joseph Myers +C front end/ISO C99Joseph Myers Ada front end Arnaud Charlet Ada front end Eric Botcazou Ada front end Marc Poulhiès @@ -192,7 +192,7 @@ libquadmath Jakub Jelinek libvtv Caroline Tice libphobos Iain Buclaw line map Dodji Seketeli -soft-fpJoseph Myers +soft-fpJoseph Myers scheduler (+ haifa)Jim Wilson scheduler (+ haifa)Michael Meissner scheduler (+ haifa)Jeff Law @@ -219,7 +219,7 @@ jump.cc David S. Miller web pages Gerald Pfeifer config.sub/config.guessBen Elliston i18n Philipp Thomas -i18n Joseph Myers +i18n Joseph Myers diagnostic messagesDodji Seketeli diagnostic messagesDavid Malcolm build machinery (*.in) Paolo Bonzini @@ -227,14 +227,14 @@ build machinery (*.in)Nathanael Nerode build machinery (*.in) Alexandre Oliva build machinery (*.in) Ralf Wildenhues docs co-maintainer Gerald Pfeifer -docs co-maintainer Joseph Myers +docs co-maintainer Joseph Myers docs co-maintainer Sandra Loosemore docstring relicensing Gerald Pfeifer -docstring relicensing Joseph Myers +docstring relicensing Joseph Myers predict.defJan Hubicka gcov Jan Hubicka gcov Nathan Sidwell -option handlingJoseph Myers +option handlingJoseph Myers middle-end Jeff Law middle-end Ian Lance Taylor middle-end Richard Biener @@ -278,7 +278,7 @@ CTF, BTF, bpf port David Faust dataflow Paolo Bonzini dataflow Seongbae Park dataflow Kenneth Zadeck -driver Joseph Myers +driver Joseph Myers FortranHarald Anlauf FortranJanne Blomqvist FortranTobias Burnus -- Joseph S. Myers josmy...@redhat.com
[wwwdocs] gcc-14/changes.html: OpenMP - improve wording
The attached patch makes a tiny update to the OpenMP features (AMD GCN now also has an optimized memcpy_rect, not only nvptx), but the main change is some shifting around to make the section more consistent and more readable. I intend to commit this relatively soon; as always, comments and suggestions are welcome, be it before or after the commit. Current version: http://gcc.gnu.org/gcc-14/changes.html Thanks, Tobias
[PATCH] c++: non-dep array list-init w/ non-triv dtor [PR109899]
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/13/12? -- >8 -- The get_target_expr call added in r12-7069-g119cea98f66476 causes us, for the testcase below, to call build_vec_delete in a template context, which builds a templated destructor call and checks expr_noexcept_p for it, which ICEs because the call has templated form. Much of the work of build_vec_delete however is code generation and thus will just get thrown away in a template context, including this expr_noexcept_p check and the code generation guarded by it. So this patch narrowly fixes this ICE by assuming the expr_noexcept_p call returns true in a template context. PR c++/109899 gcc/cp/ChangeLog: * init.cc (build_vec_delete_1): Assume expr_noexcept_p is true in a template context. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-array21.C: New test. --- gcc/cp/init.cc| 3 ++- gcc/testsuite/g++.dg/cpp0x/initlist-array21.C | 12 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-array21.C diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc index 09584719ee6..aa0a35a3885 100644 --- a/gcc/cp/init.cc +++ b/gcc/cp/init.cc @@ -4155,7 +4155,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type, /* If one destructor throws, keep trying to clean up the rest, unless we're already in a build_vec_init cleanup. 
*/ - if (flag_exceptions && !in_cleanup && !expr_noexcept_p (tmp, tf_none)) + if (flag_exceptions && !in_cleanup && !processing_template_decl + && !expr_noexcept_p (tmp, tf_none)) { loop = build2 (TRY_CATCH_EXPR, void_type_node, loop, unshare_expr (loop)); diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C new file mode 100644 index 000..5e37e3de62a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-array21.C @@ -0,0 +1,12 @@ +// PR c++/109899 +// { dg-do compile { target c++11 } } + +struct A { A(); ~A(); }; + +template +using array = T[42]; + +template +void f() { + array{}; +} -- 2.43.0.254.ga26002b628
Re: [PATCH] btf: print string position as comment for validation and testing purposes.
Thanks! Committed. David Faust writes: > Hi Cupertino, > > On 1/8/24 02:55, Cupertino Miranda wrote: >> Hi everyone, >> >> This patch adds a comment to the BTF strings regarding their position >> within the section. This is useful for assembly inspection purposes. >> >> Regards, >> Cupertino >> >> When using -dA, this function was only printing as comment btf_string or >> btf_aux_string. >> This patch changes the comment to also include the position of the >> string within the section in hexadecimal format. >> >> gcc/ChangeLog: >> * btfout.cc (output_btf_strs): Changed. > > Please be a little bit more expressive in the ChangeLog. > Something along the lines of "print string offset in comment" will be > much more useful. > > LGTM with that change, please apply. > Thanks! > >> --- >> gcc/btfout.cc | 7 +-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/btfout.cc b/gcc/btfout.cc >> index db4f1084f85c..04218adc9e66 100644 >> --- a/gcc/btfout.cc >> +++ b/gcc/btfout.cc >> @@ -1081,17 +1081,20 @@ static void >> output_btf_strs (ctf_container_ref ctfc) >> { >>ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head; >> + static int str_pos = 0; >> >>while (ctf_string) >> { >> - dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string"); >> + dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos >> = 0x%x", str_pos); >> + str_pos += strlen(ctf_string->cts_str) + 1; >>ctf_string = ctf_string->cts_next; >> } >> >>ctf_string = ctfc->ctfc_aux_strtable.ctstab_head; >>while (ctf_string) >> { >> - dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string"); >> + dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, >> str_pos = 0x%x", str_pos); >> + str_pos += strlen(ctf_string->cts_str) + 1; >>ctf_string = ctf_string->cts_next; >> } >> }
Re: [PATCH] bpf: Correct BTF for kernel_helper attributed decls.
Thanks! Committed. David Faust writes: > Hi Cupetino, > > On 1/8/24 03:05, Cupertino Miranda wrote: >> Hi everyone, >> >> This patch address the problem reported in: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225 >> >> Looking forward to your review. > > LGTM, thanks. Please apply. > >> >> Cheers, >> Cupertino >> >> >> This patch fix a problem with kernel_helper attribute BTF information, >> which incorrectly generates BTF_KIND_FUNC entry. >> This BTF entry although accurate with traditional extern function >> declarations, once the function is attributed with kernel_helper, it is >> semantically incompatible of the kernel helpers in BPF infrastructure. >> >> gcc/ChangeLog: >> PR target/113225 >> * btfout.cc (btf_collect_datasec): Skip creating BTF info for >> extern and kernel_helper attributed function decls. >> gcc/testsuite/ChangeLog: >> * gcc.target/bpf/attr-kernel-helper.c: New test. >> --- >> gcc/btfout.cc | 7 +++ >> gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++ >> 2 files changed, 22 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c >> >> diff --git a/gcc/btfout.cc b/gcc/btfout.cc >> index 04218adc9e66..39e7bec43bfb 100644 >> --- a/gcc/btfout.cc >> +++ b/gcc/btfout.cc >> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see >> #include "diagnostic-core.h" >> #include "cgraph.h" >> #include "varasm.h" >> +#include "stringpool.h" >> +#include "attribs.h" >> #include "dwarf2out.h" /* For lookup_decl_die. */ >> >> static int btf_label_num; >> @@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc) >>if (dtd == NULL) >> continue; >> >> + if (DECL_EXTERNAL (func->decl) >> + && (lookup_attribute ("kernel_helper", >> +DECL_ATTRIBUTES (func->decl))) != NULL_TREE) >> +continue; >> + >>/* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and >> also a BTF_KIND_FUNC. But the CTF container only allocates one >> type per function, which matches closely with BTF_KIND_FUNC_PROTO. 
>> diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c >> b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c >> new file mode 100644 >> index ..7c5a0007c979 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c >> @@ -0,0 +1,15 @@ >> +/* Basic test for kernel_helper attribute BTF information. */ >> + >> +/* { dg-do compile } */ >> +/* { dg-options "-O0 -dA -gbtf" } */ >> + >> +extern int foo_helper(int) __attribute((kernel_helper(42))); >> +extern int foo_nohelper(int); >> + >> +int bar (int arg) >> +{ >> + return foo_helper (arg) + foo_nohelper (arg); >> +} >> + >> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */ >> +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */
Re: [PATCH] OpenMP: Support accelerated 2D/3D memory copies for AMD GCN
On Thu, 21 Dec 2023 17:05:18 +0100 Tobias Burnus wrote: > I think it makes sense to split this patch into two parts: > > * The libgomp/plugin/plugin-gcn.c – which is independent and would > already be used by omp_memcpy_rect. I will commit this version in a moment. I needed to add the DLSYM_OPT_FN bit from one of Andrew Stubbs's patches elsewhere. Re-tested with offloading to AMD GCN (...with a couple of patches applied locally to get working test results, as plain mainline as of a few days ago wasn't working too well for GCN offloading). Thanks for the review! Julian commit 34c6e9132b3ea33c2e15c88e127c4134a5e88b8d Author: Julian Brown Date: Thu Jan 4 16:44:18 2024 + OpenMP: Support accelerated 2D/3D memory copies for AMD GCN This patch adds support for 2D/3D memory copies for omp_target_memcpy_rect using AMD extensions to the HSA API. This is just the AMD GCN-specific part of the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631001.html 2024-01-04 Julian Brown libgomp/ * plugin/plugin-gcn.c (hsa_runtime_fn_info): Add hsa_amd_memory_lock_fn, hsa_amd_memory_unlock_fn, hsa_amd_memory_async_copy_rect_fn function pointers. (init_hsa_runtime_functions): Add above functions, with DLSYM_OPT_FN. (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New functions. 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index e3e8b31c558..f24a28faa22 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -196,6 +196,16 @@ struct hsa_runtime_fn_info hsa_status_t (*hsa_code_object_deserialize_fn) (void *serialized_code_object, size_t serialized_code_object_size, const char *options, hsa_code_object_t *code_object); + hsa_status_t (*hsa_amd_memory_lock_fn) +(void *host_ptr, size_t size, hsa_agent_t *agents, int num_agent, + void **agent_ptr); + hsa_status_t (*hsa_amd_memory_unlock_fn) (void *host_ptr); + hsa_status_t (*hsa_amd_memory_async_copy_rect_fn) +(const hsa_pitched_ptr_t *dst, const hsa_dim3_t *dst_offset, + const hsa_pitched_ptr_t *src, const hsa_dim3_t *src_offset, + const hsa_dim3_t *range, hsa_agent_t copy_agent, + hsa_amd_copy_direction_t dir, uint32_t num_dep_signals, + const hsa_signal_t *dep_signals, hsa_signal_t completion_signal); }; /* Structure describing the run-time and grid properties of an HSA kernel @@ -1371,6 +1381,8 @@ init_hsa_runtime_functions (void) hsa_fns.function##_fn = dlsym (handle, #function); \ if (hsa_fns.function##_fn == NULL) \ return false; +#define DLSYM_OPT_FN(function) \ + hsa_fns.function##_fn = dlsym (handle, #function); void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY); if (handle == NULL) return false; @@ -1405,7 +1417,11 @@ init_hsa_runtime_functions (void) DLSYM_FN (hsa_signal_load_acquire) DLSYM_FN (hsa_queue_destroy) DLSYM_FN (hsa_code_object_deserialize) + DLSYM_OPT_FN (hsa_amd_memory_lock) + DLSYM_OPT_FN (hsa_amd_memory_unlock) + DLSYM_OPT_FN (hsa_amd_memory_async_copy_rect) return true; +#undef DLSYM_OPT_FN #undef DLSYM_FN } @@ -3933,6 +3949,352 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst, const void *src, size_t n) return true; } +/* Here _size refers to multiplied by size -- i.e. + measured in bytes. 
So we have: + + dim1_size: number of bytes to copy on innermost dimension ("row") + dim0_len: number of rows to copy + dst: base pointer for destination of copy + dst_offset1_size: innermost row offset (for dest), in bytes + dst_offset0_len: offset, number of rows (for dest) + dst_dim1_size: whole-array dest row length, in bytes (pitch) + src: base pointer for source of copy + src_offset1_size: innermost row offset (for source), in bytes + src_offset0_len: offset, number of rows (for source) + src_dim1_size: whole-array source row length, in bytes (pitch) +*/ + +int +GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size, + size_t dim0_len, void *dst, size_t dst_offset1_size, + size_t dst_offset0_len, size_t dst_dim1_size, + const void *src, size_t src_offset1_size, + size_t src_offset0_len, size_t src_dim1_size) +{ + if (!hsa_fns.hsa_amd_memory_lock_fn + || !hsa_fns.hsa_amd_memory_unlock_fn + || !hsa_fns.hsa_amd_memory_async_copy_rect_fn) +return -1; + + /* GCN hardware requires 4-byte alignment for base addresses & pitches. Bail + out quietly if we have anything oddly-aligned rather than letting the + driver raise an error. */ + if ((((uintptr_t) dst) & 3) != 0 || (((uintptr_t) src) & 3) != 0) +return -1; + + if ((dst_dim1_size & 3) != 0 || (src_dim1_size & 3) != 0) +return -1; + + /* Only handle host to device or device to host transfers here. */ + if ((dst_ord == -1 && src_ord == -1) + || (dst_ord != -1 && src_ord != -1)) +return -1; + +
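For readers skimming the truncated hunk above, the early-exit checks it performs can be sketched in isolation as follows. This is only an illustration: `rect_copy_precheck` and its parameter names are hypothetical and not part of the plugin, and the reading of an ordinal of -1 as "host" is an assumption taken from the hunk's own host/device comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in plugin-gcn.c): returns 0 when a rectangular
   copy passes the same preconditions the quoted hunk tests, and -1 when the
   caller should fall back to a generic path.  */
int
rect_copy_precheck (int dst_ord, int src_ord,
                    const void *dst, const void *src,
                    size_t dst_pitch, size_t src_pitch)
{
  /* Base addresses must be 4-byte aligned.  */
  if ((((uintptr_t) dst) & 3) != 0 || (((uintptr_t) src) & 3) != 0)
    return -1;

  /* Row pitches (whole-array row lengths in bytes) must be too.  */
  if ((dst_pitch & 3) != 0 || (src_pitch & 3) != 0)
    return -1;

  /* Exactly one side may be the host (assumed here to be ordinal -1):
     reject host-to-host and device-to-device.  */
  if ((dst_ord == -1) == (src_ord == -1))
    return -1;

  return 0;
}
```

Returning -1 rather than raising an error matches the "bail out quietly" intent stated in the hunk's comment.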
[PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled
Hi, The Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is specified (via the -march option or a target pragma), whereas it is supposed to be tested before including arm_bf16.h (as specified in the ACLE document: https://arm-software.github.io/acle/main/acle.html#arm_bf16h). gcc/ChangeLog: * config/arm/arm-c.cc (arm_cpu_builtins): define __ARM_FEATURE_BF16 * config/arm/arm.h: define TARGET_BF16 OK for master? Matthieu diff --git a/gcc/config/arm/arm-c.cc b/gcc/config/arm/arm-c.cc index 2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813 100644 --- a/gcc/config/arm/arm-c.cc +++ b/gcc/config/arm/arm-c.cc @@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile) arm_arch_cde_coproc); def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM); + + def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16); + def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE", + TARGET_BF16_FP); def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC", TARGET_BF16_FP); def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC", TARGET_BF16_SIMD); - def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE", - TARGET_BF16_FP || TARGET_BF16_SIMD); } void diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 2a2207c0ba1acef1c7082c89bf5f542b1466d033..e7a7fc47e606d2ead5f778dca2e63b2e894d0efe 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -252,10 +252,10 @@ emission of floating point pcs attributes. */ #define TARGET_I8MM (TARGET_NEON && arm_arch8_2 && arm_arch_i8mm) /* FPU supports Brain half-precision floating-point (BFloat16) extension. 
*/ -#define TARGET_BF16_FP (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 \ - && arm_arch8_2 && arm_arch_bf16) -#define TARGET_BF16_SIMD (TARGET_NEON && TARGET_VFP5 \ - && arm_arch8_2 && arm_arch_bf16) +#define TARGET_BF16 (TARGET_32BIT && TARGET_HARD_FLOAT && arm_arch8_2 \ + && TARGET_VFP5 && arm_arch_bf16) +#define TARGET_BF16_FP (TARGET_BF16) +#define TARGET_BF16_SIMD (TARGET_BF16 && TARGET_NEON) /* Q-bit is present. */ #define TARGET_ARM_QBIT \
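As an illustration of why the macro matters to users, the ACLE usage pattern the patch is meant to enable might look like the sketch below. `HAVE_BF16` and `bf16_supported` are made-up names for this example only; the feature macro and the header name are the ones mentioned in the mail.

```c
/* Sketch: only pull in the ACLE header when the compiler advertises the
   feature.  HAVE_BF16 and bf16_supported are illustrative names, not part
   of the patch or of ACLE.  */
#if defined (__ARM_FEATURE_BF16)
#include <arm_bf16.h>
#define HAVE_BF16 1
#else
#define HAVE_BF16 0
#endif

/* Returns 1 if this translation unit was compiled with the BF16 feature
   macro defined, 0 otherwise.  */
int
bf16_supported (void)
{
  return HAVE_BF16;
}
```

Without the macro being defined under +bf16, code written this way would silently take the fallback branch even though the extension is enabled.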
Re: [PATCH] bpf: Correct BTF for kernel_helper attributed decls.
Hi Cupetino, On 1/8/24 03:05, Cupertino Miranda wrote: > Hi everyone, > > This patch address the problem reported in: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225 > > Looking forward to your review. LGTM, thanks. Please apply. > > Cheers, > Cupertino > > > This patch fix a problem with kernel_helper attribute BTF information, > which incorrectly generates BTF_KIND_FUNC entry. > This BTF entry although accurate with traditional extern function > declarations, once the function is attributed with kernel_helper, it is > semantically incompatible of the kernel helpers in BPF infrastructure. > > gcc/ChangeLog: > PR target/113225 > * btfout.cc (btf_collect_datasec): Skip creating BTF info for > extern and kernel_helper attributed function decls. > gcc/testsuite/ChangeLog: > * gcc.target/bpf/attr-kernel-helper.c: New test. > --- > gcc/btfout.cc | 7 +++ > gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++ > 2 files changed, 22 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c > > diff --git a/gcc/btfout.cc b/gcc/btfout.cc > index 04218adc9e66..39e7bec43bfb 100644 > --- a/gcc/btfout.cc > +++ b/gcc/btfout.cc > @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see > #include "diagnostic-core.h" > #include "cgraph.h" > #include "varasm.h" > +#include "stringpool.h" > +#include "attribs.h" > #include "dwarf2out.h" /* For lookup_decl_die. */ > > static int btf_label_num; > @@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc) >if (dtd == NULL) > continue; > > + if (DECL_EXTERNAL (func->decl) > + && (lookup_attribute ("kernel_helper", > + DECL_ATTRIBUTES (func->decl))) != NULL_TREE) > + continue; > + >/* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and >also a BTF_KIND_FUNC. But the CTF container only allocates one >type per function, which matches closely with BTF_KIND_FUNC_PROTO. 
> diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c > b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c > new file mode 100644 > index ..7c5a0007c979 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c > @@ -0,0 +1,15 @@ > +/* Basic test for kernel_helper attribute BTF information. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O0 -dA -gbtf" } */ > + > +extern int foo_helper(int) __attribute((kernel_helper(42))); > +extern int foo_nohelper(int); > + > +int bar (int arg) > +{ > + return foo_helper (arg) + foo_nohelper (arg); > +} > + > +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */ > +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */
Re: [PATCH] Add a late-combine pass [PR106594]
On 1/8/24 09:59, Richard Sandiford wrote: This is a bit of a hopeful stab, but is the problem that recog_data still had the previous contents of insn 3674, and so extract_insn_cached wrongly thought that it doesn't need to do anything? If so, does something like: diff --git a/gcc/recog.cc b/gcc/recog.cc index a6799e3f5e6..8ba63c78179 100644 --- a/gcc/recog.cc +++ b/gcc/recog.cc @@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group, case invalid. */ changes[num_changes].old_code = INSN_CODE (object); INSN_CODE (object) = -1; + if (recog_data.insn == object) + recog_data.insn = nullptr; } num_changes++; fix it? I suppose there's an argument that this belongs in whatever code sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA). But doing it in validate_change_1 seems more robust, since anything calling that function is considering changing the insn code. Nope, doesn't help at all. I'd briefly put a reset of the INSN_CODE and a call to recog_memoized in the costing path of rtl-ssa to see if that would allow things to move forward, but it failed miserably. I'll pass along the .i file separately. Hopefully it'll fail for you and you can debug. But given failure depends on stale bits in recog_data, it may not. Jeff
Re: [PATCH] btf: print string position as comment for validation and testing purposes.
Hi Cupertino, On 1/8/24 02:55, Cupertino Miranda wrote: > Hi everyone, > > This patch adds a comment to the BTF strings regarding their position > within the section. This is useful for assembly inspection purposes. > > Regards, > Cupertino > > When using -dA, this function was only printing as comment btf_string or > btf_aux_string. > This patch changes the comment to also include the position of the > string within the section in hexadecimal format. > > gcc/ChangeLog: > * btfout.cc (output_btf_strs): Changed. Please be a little bit more expressive in the ChangeLog. Something along the lines of "print string offset in comment" will be much more useful. LGTM with that change, please apply. Thanks! > --- > gcc/btfout.cc | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/gcc/btfout.cc b/gcc/btfout.cc > index db4f1084f85c..04218adc9e66 100644 > --- a/gcc/btfout.cc > +++ b/gcc/btfout.cc > @@ -1081,17 +1081,20 @@ static void > output_btf_strs (ctf_container_ref ctfc) > { >ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head; > + static int str_pos = 0; > >while (ctf_string) > { > - dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string"); > + dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos > = 0x%x", str_pos); > + str_pos += strlen(ctf_string->cts_str) + 1; >ctf_string = ctf_string->cts_next; > } > >ctf_string = ctfc->ctfc_aux_strtable.ctstab_head; >while (ctf_string) > { > - dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string"); > + dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, > str_pos = 0x%x", str_pos); > + str_pos += strlen(ctf_string->cts_str) + 1; >ctf_string = ctf_string->cts_next; > } > }
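The offset bookkeeping in the patch (advance by `strlen` plus one for the terminating NUL) can be tried out standalone with a small sketch such as the one below. It is a toy model, not btfout.cc code: `string_table_offsets`, `strs` and `offsets` are hypothetical names, and the sketch uses a local counter starting at 0 rather than a static one.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: given N NUL-terminated strings laid out back to back, record
   the byte offset at which each one starts and return the total size of
   the resulting table.  */
size_t
string_table_offsets (const char *const *strs, size_t n, size_t *offsets)
{
  size_t pos = 0;
  for (size_t i = 0; i < n; i++)
    {
      offsets[i] = pos;
      /* Each string occupies its characters plus the terminating NUL.  */
      pos += strlen (strs[i]) + 1;
    }
  return pos;
}
```

Each recorded offset corresponds to the value the patch prints as `str_pos` in the assembly comment for that string.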
Re: [PATCH] c++/modules: Prevent overwriting arguments for duplicates [PR112588]
On Mon, 8 Jan 2024, Nathaniel Shead wrote: > On Sat, Jan 06, 2024 at 05:32:37PM -0500, Nathan Sidwell wrote: > > I;m not sure about this, there was clearly a reason I did it the way it is, > > but perhaps that reasoning became obsolete -- something about an existing > > declaration and reading in a definition maybe? > > > > nathan > > So I took a bit of a closer look and this is actually a regression, > seeming to start with r13-3134-g09df0d8b14dda6. I haven't looked more > closely at the actual change though to see whether this implies a > different fix yet though. Interesting.. FWIW I applied your patch to the gcc 12 release branch, which doesn't have r13-3134, and there were no modules testsuite regressions there either, which at least suggests that this maybe_dup logic isn't directly related to the optimization that r13-3134 removed. Your patch also seems to fix PR99244 (which AFAICT is not a regression) > > Nathaniel > > > On 11/22/23 06:33, Nathaniel Shead wrote: > > > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write > > > access. > > > > > > -- >8 -- > > > > > > When merging duplicate instantiations of function templates, currently > > > read_function_def overwrites the arguments with that of the existing > > > duplicate. This is problematic, however, since this means that the > > > PARM_DECLs in the body of the function definition no longer match with > > > the PARM_DECLs in the argument list, which causes issues when it comes > > > to generating RTL. > > > > > > There doesn't seem to be any reason to do this replacement, so this > > > patch removes that logic. > > > > > > PR c++/112588 > > > > > > gcc/cp/ChangeLog: > > > > > > * module.cc (trees_in::read_function_def): Don't overwrite > > > arguments. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * g++.dg/modules/merge-16.h: New test. > > > * g++.dg/modules/merge-16_a.C: New test. > > > * g++.dg/modules/merge-16_b.C: New test. 
> > > > > > Signed-off-by: Nathaniel Shead > > > --- > > > gcc/cp/module.cc | 2 -- > > > gcc/testsuite/g++.dg/modules/merge-16.h | 10 ++ > > > gcc/testsuite/g++.dg/modules/merge-16_a.C | 7 +++ > > > gcc/testsuite/g++.dg/modules/merge-16_b.C | 5 + > > > 4 files changed, 22 insertions(+), 2 deletions(-) > > > create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h > > > create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C > > > create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C > > > > > > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc > > > index 4f5b6e2747a..2520ab659cc 100644 > > > --- a/gcc/cp/module.cc > > > +++ b/gcc/cp/module.cc > > > @@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree > > > maybe_template) > > > DECL_RESULT (decl) = result; > > > DECL_INITIAL (decl) = initial; > > > DECL_SAVED_TREE (decl) = saved; > > > - if (maybe_dup) > > > - DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup); > > > if (context) > > > SET_DECL_FRIEND_CONTEXT (decl, context); > > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h > > > b/gcc/testsuite/g++.dg/modules/merge-16.h > > > new file mode 100644 > > > index 000..fdb38551103 > > > --- /dev/null > > > +++ b/gcc/testsuite/g++.dg/modules/merge-16.h > > > @@ -0,0 +1,10 @@ > > > +// PR c++/112588 > > > + > > > +void f(int*); > > > + > > > +template > > > +struct S { > > > + void g(int n) { f(); } > > > +}; > > > + > > > +template struct S; If we use a partial specialization here instead (which would have disabled the removed optimization, demonstrating how fragile/inconsistent it was) void f(int*); template struct S { }; template struct S { void g(int n) { f(); } }; template struct S; then the ICE appears earlier, since GCC 12 instead of 13. 
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C > > > b/gcc/testsuite/g++.dg/modules/merge-16_a.C > > > new file mode 100644 > > > index 000..c243224c875 > > > --- /dev/null > > > +++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C > > > @@ -0,0 +1,7 @@ > > > +// PR c++/112588 > > > +// { dg-additional-options "-fmodules-ts" } > > > +// { dg-module-cmi merge16 } > > > + > > > +module; > > > +#include "merge-16.h" > > > +export module merge16; > > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C > > > b/gcc/testsuite/g++.dg/modules/merge-16_b.C > > > new file mode 100644 > > > index 000..8c7b1f0511f > > > --- /dev/null > > > +++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C > > > @@ -0,0 +1,5 @@ > > > +// PR c++/112588 > > > +// { dg-additional-options "-fmodules-ts" } > > > + > > > +#include "merge-16.h" > > > +import merge16; > > > > -- > > Nathan Sidwell > > > >
Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
On 1/8/24 09:57, Andrew Pinski wrote: On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak wrote: Instead of converting XOR or PLUS of two values, ANDed with two constants that have no bits in common, to IOR expression, convert IOR or XOR of said two ANDed values to PLUS expression. I think this only helps targets which have leal like instruction. Also I think it is the same issue as I recorded as PR 111763 . I suspect BIT_IOR is more of a Canonical form for GIMPLE while we should handle this in expand to decide if we want to use PLUS or IOR. Actually there's benefit on RISC-V to using PLUS over IOR/XOR when there's no bits in common. In fact, I've been asked to do that by Andrew W. for a case where we know ahead of time there's no bits in common in a sequence that currently uses IOR. Specifically it can allow more use of the compact instructions as the compact PLUS allows the full set of hard registers while compact IOR/XOR only allow a subset of registers. jeff
Re: [PATCH] Add a late-combine pass [PR106594]
Jeff Law writes: > On 1/8/24 04:52, Richard Sandiford wrote: >> Jeff Law writes: >>> The other issue that's been in the back of my mind is costing. But I >>> think the model here is combine without regards to cost. >> >> No, it does take costing into account. For size, it's the usual >> "sum up the before and after insn costs and see which one is lower". >> For speed, the costs are weighted by execution frequency, so e.g. >> two insns of cost 4 in the same block can be combined into a single >> instruction of cost 8, but a hoisted invariant can only be combined >> into a loop body instruction if the loop body instruction's cost >> doesn't increase significantly. >> >> This is done by rtl_ssa::changes_are_worthwhile. > You're absolutely correct. My bad. > > Interesting that's exactly where we do have a notable concern. Gah. > If you remember, there were a few ports that failed to build > newlib/libgcc that we initially ignored. I went back and looked at one > (arc-elf). > > What appears to be happening for arc-elf is we're testing to see if the > change is profitable. On arc-elf the costing model is highly dependent > on the length of the insns. 
> > We've got a very reasonable looking insn: > >> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) >> (ashift:SI (reg:SI 27 fp [548]) >> (const_int 4 [0x4]))) >> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 >> {*ashlsi3_insn} >> (nil)) > > We call rtl_ssa::changes_are_profitable -> insn_cost -> arc_insn_cost -> > get_attr_length -> get_attr_length_1 -> insn_default_length > > insn_default_length grubs around looking at the operands via recog_data > which appears to be stale: > > > >> (gdb) p debug_rtx(recog_data.operand[0]) >> (reg/v:SI 18 r18 [orig:300 inex ] [300]) >> $4 = void >> (gdb) p debug_rtx(recog_data.operand[1]) >> (reg/v:SI 3 r3 [orig:300 inex ] [300]) >> $5 = void >> (gdb) p debug_rtx(recog_data.operand[2]) >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x01432955 in rtx_writer::print_rtx (this=0x7fffe0e0, >> in_rtx=0xabababababababab) at /home/jlaw/test/gcc/gcc/print-rtl.cc:809 >> 809 else if (GET_CODE (in_rtx) > NUM_RTX_CODE) > > Note the 0xabab That was accessing operand #2, which should have > been (const_int 4). > > Sure enough if I force re-recognition then look at the recog_data I get > the right values. 
> > After LRA we have: > >> (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300]) >> (ashift:SI (reg:SI 27 fp [548]) >> (const_int 4 [0x4]))) >> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 >> {*ashlsi3_insn} >> (nil)) >> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) >> (reg/v:SI 3 r3 [orig:300 inex ] [300])) >> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 3 >> {*movsi_insn} >> (nil)) > > In the emergency dump in late_combine2 (so cleanup hasn't been done): > >> (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300]) >> (ashift:SI (reg:SI 27 fp [548]) >> (const_int 4 [0x4]))) >> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 >> {*ashlsi3_insn} >> (nil)) >> (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) >> (ashift:SI (reg:SI 27 fp [548]) >> (const_int 4 [0x4]))) >> "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 >> {*ashlsi3_insn} >> (nil)) > > > Which brings us to the question. If we change the form of an insn, then > ask for its cost, don't we need to make sure the insn is re-recognized > as the costing function may do things like query the insn's length which > would use cached recog_data? Yeah, this only happens once we've verified that the new instruction is valid. And it looks from the emergency dump above that the insn code has been correctly updated to *ashlsi3_insn. This is a bit of a hopeful stab, but is the problem that recog_data still had the previous contents of insn 3674, and so extract_insn_cached wrongly thought that it doesn't need to do anything? If so, does something like: diff --git a/gcc/recog.cc b/gcc/recog.cc index a6799e3f5e6..8ba63c78179 100644 --- a/gcc/recog.cc +++ b/gcc/recog.cc @@ -267,6 +267,8 @@ validate_change_1 (rtx object, rtx *loc, rtx new_rtx, bool in_group, case invalid. 
*/ changes[num_changes].old_code = INSN_CODE (object); INSN_CODE (object) = -1; + if (recog_data.insn == object) + recog_data.insn = nullptr; } num_changes++; fix it? I suppose there's an argument that this belongs in whatever code sets INSN_CODE to a new nonnegative value (so recog_level2 for RTL-SSA). But doing it in validate_change_1 seems more robust, since anything calling that function is considering changing the insn code. Thanks for debugging the problem. Richard
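The failure mode hypothesised above (a one-entry cache keyed on the insn pointer that goes stale when the insn is modified in place) can be modelled with a few lines of plain C. This is a toy model only: `struct insn`, `cache`, `extract_cached` and `change_insn` are invented for the illustration and are not GCC's data structures, and the sketch says nothing about whether this is the actual cause of the arc-elf failure.

```c
#include <stddef.h>

/* Toy stand-in for an instruction with a single operand.  */
struct insn
{
  int code;
  int operand;
};

/* One-entry cache keyed on the insn pointer, loosely playing the role that
   recog_data plays in the discussion.  */
static struct
{
  const struct insn *insn;
  int operand;
} cache;

/* Return the operand of I, re-extracting only if I is not the cached insn.  */
int
extract_cached (const struct insn *i)
{
  if (cache.insn != i)
    {
      cache.insn = i;
      cache.operand = i->operand;
    }
  return cache.operand;
}

/* Modify I in place.  With INVALIDATE nonzero, also drop the cache entry if
   it refers to I, which is the shape of the proposed validate_change_1
   tweak.  */
void
change_insn (struct insn *i, int new_operand, int invalidate)
{
  i->operand = new_operand;
  i->code = -1;
  if (invalidate && cache.insn == i)
    cache.insn = NULL;
}
```

In this model, changing an insn without invalidation leaves `extract_cached` returning the old operand, because the pointer comparison still matches; clearing the key forces a fresh extraction on the next query.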
Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20
On Mon, 8 Jan 2024 at 16:25, Hans-Peter Nilsson wrote: > > (Sorry, never a bringer of good news...) Regarding this bit ... even if you're reporting something I've broken, I like to see it as an incremental step towards better portability, so it's always good news ;-)
Re: [PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
On Mon, Jan 8, 2024 at 6:44 AM Uros Bizjak wrote:
>
> Instead of converting XOR or PLUS of two values, ANDed with two constants that
> have no bits in common, to IOR expression, convert IOR or XOR of said two
> ANDed values to PLUS expression.

I think this only helps targets which have an leal-like instruction.
Also I think it is the same issue as I recorded as PR 111763.
I suspect BIT_IOR is more of a canonical form for GIMPLE, while we should
handle this in expand to decide if we want to use PLUS or IOR.

Thanks,
Andrew Pinski

>
> If we consider the following testcase:
>
> --cut here--
> unsigned int foo (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r + p + 2;
> }
>
> unsigned int bar (unsigned int a, unsigned int b)
> {
>   unsigned int r = a & 0x1;
>   unsigned int p = b & ~0x3;
>
>   return r | p | 2;
> }
> --cut here--
>
> the above testcase compiles (x86_64 -O2) to:
>
> foo:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         leal    2(%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         orl     %esi, %edi
>         movl    %edi, %eax
>         orl     $2, %eax
>         ret
>
> There is no further simplification possible in any case, we can't combine
> OR with a PLUS in the first case, and we don't have OR instruction with
> multiple inputs in the second case.
>
> If we switch around the logic in the conversion and convert from IOR/XOR
> to PLUS, then the resulting assembly reads:
>
> foo:
>         andl    $-4, %esi
>         andl    $1, %edi
>         leal    2(%rsi,%rdi), %eax
>         ret
>
> bar:
>         andl    $1, %edi
>         andl    $-4, %esi
>         leal    (%rdi,%rsi), %eax
>         orl     $2, %eax
>         ret
>
> On x86, the conversion can now use LEA instruction, which is much more
> usable than OR instruction. In the first case, LEA implements three input
> ADD instruction, while in the second case, even though the instruction
> can't be combined with a follow-up OR, the non-destructive LEA avoids a move.
>
> PR target/108477
>
> gcc/ChangeLog:
>
>     * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
>     Do not convert PLUS of two values, ANDed with two constants
>     that have no bits in common to IOR expression, convert
>     IOR or XOR of said two ANDed values to PLUS expression.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/i386/pr108477.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for mainline?
>
> Uros.
Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20
On Mon, 8 Jan 2024 at 16:28, Hans-Peter Nilsson wrote: > > > From: Hans-Peter Nilsson > > Date: Mon, 8 Jan 2024 17:24:35 +0100 > > > For some reason, this (r14-6990-g74a0dab18292be) breaks a > > build of (newlib targets) at least cris-elf and arm-eabi: > > ...aaand, just now fixed in r14-7007-geb846114ed7c49. > (Thanks!) Yup, it got reported on IRC this morning, but I had to finish testing the fix. Sorry for the temporary breakage.
Re: [PATCH] RISC-V: Teach liveness computation loop invariant shift amount[Dynamic LMUL]
> > +  if (is_gimple_min_invariant (op))
> > +    return true;
> > +  if (SSA_NAME_IS_DEFAULT_DEF (op)
> > +      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> > +    return true;
> > +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> > +}
> > +

Does gimple_uid ever return something useful for us here? In
tree-ssa-loop-ch it is being populated before and then used but I
don't think we populate it properly?

So my question would be, isn't is_gimple_constant and
flow_bb_inside_loop_p sufficient for our purpose?

Regards
 Robin
Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a
Hi Richard, >> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance >> once >> the atomic is acquire, release or both. Given there is already a significant >> overhead due >> to the function call, PLT indirection and argument setup, it doesn't make >> sense to add >> extra taken branches that may mispredict or cause extra fetch cycles... > > Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking. > If there isn't any difference for acquire vs. the rest, is there a > justification we can use for keeping the acquire branch, rather than > using SWPAL for everything except relaxed? The results showed that acquire is typically slightly faster than release (5-10%), so for the most frequently used atomics (CAS and SWP) it makes sense to add support for acquire. In most cases once you have release semantics, adding acquire didn't make things slower, so combining release/acq_rel/seq_cst avoids unnecessary extra branches and keeps the code small. > If so, then Victor, could you include that in the explanation above and > add it as a source comment? Although maybe tone down "doesn't make > sense to add" to something like "doesn't seem worth adding". :) Yes it's worth adding a comment to this effect. Cheers, Wilco
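The grouping Wilco describes (relaxed on its own, acquire kept separate, release/acq_rel/seq_cst sharing one path) can be sketched in C as a three-way dispatch on the memory order. This is only an illustration, not the libatomic assembly: the enum, the function names and the treatment of consume as acquire are assumptions made for the sketch, and the labels simply follow the LSE128 swap mnemonics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical labels for the three code paths discussed above.  */
enum demo_swp_variant
{
  DEMO_SWPP,    /* relaxed: no ordering suffix */
  DEMO_SWPPA,   /* acquire: kept as its own path */
  DEMO_SWPPAL   /* release, acq_rel and seq_cst share one path */
};

/* Pick a code path for a 128-bit exchange.  Treating consume like
   acquire is an assumption of this sketch.  */
enum demo_swp_variant
demo_pick_swp_variant (memory_order order)
{
  switch (order)
    {
    case memory_order_relaxed:
      return DEMO_SWPP;
    case memory_order_consume:
    case memory_order_acquire:
      return DEMO_SWPPA;
    default:
      return DEMO_SWPPAL;
    }
}

/* Usage check: release and seq_cst take the same path, so only two
   branches are needed to separate the three cases.  */
void
demo_check_swp_variants (void)
{
  assert (demo_pick_swp_variant (memory_order_release)
          == demo_pick_swp_variant (memory_order_seq_cst));
}
```

Folding the stronger orderings together trades a possibly stronger barrier than requested for one fewer branch, which is the code-size argument made in the thread.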
Re: breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20
> From: Hans-Peter Nilsson > Date: Mon, 8 Jan 2024 17:24:35 +0100 > For some reason, this (r14-6990-g74a0dab18292be) breaks a > build of (newlib targets) at least cris-elf and arm-eabi: ...aaand, just now fixed in r14-7007-geb846114ed7c49. (Thanks!) brgds, H-P
breakage with: [committed] libstdc++: Implement P2909R4 ("Dude, where's my char?") for C++20
(Sorry, never a bringer of good news...)

> From: Jonathan Wakely
> Date: Mon, 8 Jan 2024 01:15:50 +

> Tested x86_64-linux and aarch64-linux. Pushed to trunk.
>
> -- >8 --
>
> This change ensures that char and wchar_t arguments are formatted
> consistently when using integer presentation types. This avoids
> non-portable std::format output that depends on whether char and wchar_t
> happen to be signed or unsigned on the target. Formatting '\xff' as an
> integer will now always format 255 and not sometimes -1. This was
> approved in Kona 2023 as a DR for C++20 so the change is implemented
> unconditionally.
>
> Also make character formatters check for _Pres_c explicitly and call
> _M_format_character directly. This avoids the overhead of calling format
> and _S_to_character and then calling _M_format_character anyway.
>
> libstdc++-v3/ChangeLog:
>
>     * include/bits/version.def (format_uchar): Define.
>     * include/bits/version.h: Regenerate.
>     * include/std/format (formatter::format): Check for
>     _Pres_c and call _M_format_character directly. Cast C to its
>     unsigned equivalent for formatting as an integer.
>     (formatter::format): Likewise.
>     (basic_format_arg(T&)): Store char arguments as unsigned char
>     for formatting to a wide string.
>     * testsuite/std/format/functions/format.cc: Adjust test. Check
>     formatting of

For some reason, this (r14-6990-g74a0dab18292be) breaks a
build of (newlib targets) at least cris-elf and arm-eabi:

libtool: compile: /obj/./gcc/xgcc -shared-libgcc -B/obj/./gcc -nostdinc++ -L/obj/cris-elf/libstdc++-v3/src -L/obj/cris-elf/libstdc++-v3/src/.libs -L/obj/cris-elf/libstdc++-v3/libsupc++/.libs -nostdinc -B/obj/cris-elf/newlib/ -isystem /obj/cris-elf/newlib/targ-include -isystem /x/gcc/newlib/libc/include -B/obj/cris-elf/libgloss/cris -L/obj/cris-elf/libgloss/libnosys -L/x/gcc/libgloss/cris -B/x/cris-elf/pre/cris-elf/bin/ -B/x/cris-elf/pre/cris-elf/lib/ -isystem /x/cris-elf/pre/cris-elf/include -isystem /x/cris-elf/pre/cris-elf/sys-include -I/x/gcc/libstdc++-v3/../libgcc -I/obj/cris-elf/libstdc++-v3/include/cris-elf -I/obj/cris-elf/libstdc++-v3/include -I/x/gcc/libstdc++-v3/libsupc++ -std=gnu++20 -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi=2 -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=tzdb.lo -fimplicit-templates -g -O2 -I. -c /x/gcc/libstdc++-v3/src/c++20/tzdb.cc -o tzdb.o
In file included from /x/gcc/newlib/libc/include/time.h:11,
                 from /obj/cris-elf/libstdc++-v3/include/ctime:42,
                 from /obj/cris-elf/libstdc++-v3/include/bits/chrono.h:40,
                 from /obj/cris-elf/libstdc++-v3/include/chrono:41,
                 from /x/gcc/libstdc++-v3/src/c++20/tzdb.cc:31:
/obj/cris-elf/libstdc++-v3/include/bits/unicode.h:86:37: error: declaration does not declare anything [-fpermissive]
   86 |   inline constexpr _Null_sentinel_t __null_sentinel;
      |                                     ^~~
make[5]: *** [Makefile:754: tzdb.lo] Error 1

I don't see anything immediately related to that line in the
patch, though, so the actual cause and fix isn't obvious, at
least to me.

brgds, H-P
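The portability problem that the quoted commit message addresses ('\xff' formatting as 255 on some targets and -1 on others) can be reproduced in plain C. The sketch below is not libstdc++ code; the function names are made up for illustration, and it assumes 8-bit char.

```c
#include <assert.h>

/* Plain conversion: yields -1 where char is signed and 255 where it
   is unsigned, so the result depends on the target.  */
int
demo_char_to_int (char c)
{
  return (int) c;
}

/* Convert through unsigned char first, which is the idea the commit
   message describes for integer presentation types.  The result no
   longer depends on the signedness of char.  */
int
demo_char_to_int_portable (char c)
{
  return (int) (unsigned char) c;
}

/* Usage check for the portable variant.  */
void
demo_check_char_conversion (void)
{
  assert (demo_char_to_int_portable ('\xff') == 255);
}
```

On x86_64-linux the first function returns -1 for '\xff', while on a target with unsigned char such as arm-eabi it returns 255; the second returns 255 on both.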
Re: [PATCH] Add a late-combine pass [PR106594]
On 1/8/24 04:52, Richard Sandiford wrote: Jeff Law writes: The other issue that's been in the back of my mind is costing. But I think the model here is combine without regards to cost. No, it does take costing into account. For size, it's the usual "sum up the before and after insn costs and see which one is lower". For speed, the costs are weighted by execution frequency, so e.g. two insns of cost 4 in the same block can be combined into a single instruction of cost 8, but a hoisted invariant can only be combined into a loop body instruction if the loop body instruction's cost doesn't increase significantly. This is done by rtl_ssa::changes_are_worthwhile. You're absolutely correct. My bad. Interesting that's exactly where we do have a notable concern. If you remember, there were a few ports that failed to build newlib/libgcc that we initially ignored. I went back and looked at one (arc-elf). What appears to be happening for arc-elf is we're testing to see if the change is profitable. On arc-elf the costing model is highly dependent on the length of the insns. We've got a very reasonable looking insn: (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) (ashift:SI (reg:SI 27 fp [548]) (const_int 4 [0x4]))) "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 {*ashlsi3_insn} (nil)) We call rtl_ssa::changes_are_profitable -> insn_cost -> arc_insn_cost -> get_attr_length -> get_attr_length_1 -> insn_default_length insn_default_length grubs around looking at the operands via recog_data which appears to be stale: (gdb) p debug_rtx(recog_data.operand[0]) (reg/v:SI 18 r18 [orig:300 inex ] [300]) $4 = void (gdb) p debug_rtx(recog_data.operand[1]) (reg/v:SI 3 r3 [orig:300 inex ] [300]) $5 = void (gdb) p debug_rtx(recog_data.operand[2]) Program received signal SIGSEGV, Segmentation fault. 
0x01432955 in rtx_writer::print_rtx (this=0x7fffe0e0, in_rtx=0xabababababababab) at /home/jlaw/test/gcc/gcc/print-rtl.cc:809 809 else if (GET_CODE (in_rtx) > NUM_RTX_CODE) Note the 0xabab That was accessing operand #2, which should have been (const_int 4). Sure enough if I force re-recognition then look at the recog_data I get the right values. After LRA we have: (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300]) (ashift:SI (reg:SI 27 fp [548]) (const_int 4 [0x4]))) "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 {*ashlsi3_insn} (nil)) (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) (reg/v:SI 3 r3 [orig:300 inex ] [300])) "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 3 {*movsi_insn} (nil)) In the emergency dump in late_combine2 (so cleanup hasn't been done): (insn 753 2434 3674 98 (set (reg/v:SI 3 r3 [orig:300 inex ] [300]) (ashift:SI (reg:SI 27 fp [548]) (const_int 4 [0x4]))) "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 {*ashlsi3_insn} (nil)) (insn 3674 753 2851 98 (set (reg/v:SI 18 r18 [orig:300 inex ] [300]) (ashift:SI (reg:SI 27 fp [548]) (const_int 4 [0x4]))) "../../../..//newlib-cygwin/newlib/libc/stdlib/gdtoa-gdtoa.c":437:10 120 {*ashlsi3_insn} (nil)) Which brings us to the question. If we change the form of an insn, then ask for its cost, don't we need to make sure the insn is re-recognized as the costing function may do things like query the insn's length which would use cached recog_data? jeff
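Richard's description of the speed costing quoted above (sum the insn costs before and after, weighted by execution frequency) can be illustrated with a small C sketch. This is a simplified model written for this note, not the logic of rtl_ssa::changes_are_worthwhile: the struct, the function names and the plain "not more expensive" test are all assumptions, and the real check tolerates small increases that this one rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: one instruction's cost and how often its
   block executes.  */
struct demo_insn
{
  unsigned int cost;
  unsigned int frequency;
};

/* Sum of cost * frequency over a set of instructions.  */
static unsigned long
demo_weighted_cost (const struct demo_insn *insns, size_t n)
{
  unsigned long total = 0;
  for (size_t i = 0; i < n; i++)
    total += (unsigned long) insns[i].cost * insns[i].frequency;
  return total;
}

/* Accept a change only if the weighted cost does not go up.  */
bool
demo_change_is_worthwhile (const struct demo_insn *before, size_t n_before,
                           const struct demo_insn *after, size_t n_after)
{
  return (demo_weighted_cost (after, n_after)
          <= demo_weighted_cost (before, n_before));
}

/* Two cost-4 insns in the same block become one cost-8 insn.  */
bool
demo_same_block_case (void)
{
  const struct demo_insn before[] = { { 4, 10 }, { 4, 10 } };
  const struct demo_insn after[] = { { 8, 10 } };
  return demo_change_is_worthwhile (before, 2, after, 1);
}

/* A hoisted invariant (frequency 1) is folded into a loop body insn
   (frequency 100), doubling the cost of the loop body insn.  */
bool
demo_hoisted_invariant_case (void)
{
  const struct demo_insn before[] = { { 4, 1 }, { 4, 100 } };
  const struct demo_insn after[] = { { 8, 100 } };
  return demo_change_is_worthwhile (before, 2, after, 1);
}

/* Usage check matching the two examples in the quoted text.  */
void
demo_check_costs (void)
{
  assert (demo_same_block_case ());
  assert (!demo_hoisted_invariant_case ());
}
```

Whatever the exact formula, the result is only as good as the per-insn cost it is fed, which is why a cost hook reading stale recog_data, as in the arc-elf case above, is a problem.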
[committed] libstdc++: Remove std::__unicode::__null_sentinel
Tested x86_64-linux, pushed to trunk.

-- >8 --

The name __null_sentinel is defined as a macro by newlib, so we can't
use it as an identifier. That variable is not actually used by
libstdc++, it was added because P2728R6 proposes std::uc::null_sentinel.
Since we don't need it and it breaks bootstrap for newlib targets, just
remove it. A null sentinel can still be used by constructing a
_Null_sentinel_t object as needed, rather than having a named object of
that type predefined.

libstdc++-v3/ChangeLog:

    * include/bits/unicode.h (__null_sentinel): Remove.
    * testsuite/17_intro/names.cc: Add __null_sentinel.
---
 libstdc++-v3/include/bits/unicode.h      | 2 --
 libstdc++-v3/testsuite/17_intro/names.cc | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
index 66f8399fdfb..e49498a0531 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -83,8 +83,6 @@ namespace __unicode
     { return *__it == iter_value_t<_It>{}; }
   };
 
-  inline constexpr _Null_sentinel_t __null_sentinel;
-
   template _Sent = _Iter,
            typename _ErrorHandler = _Repl>
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 5e77e9f2ab0..53c5aff219d 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -140,6 +140,7 @@
 
 // These clash with newlib so don't use them.
 # define __lockable       cannot be used as an identifier
+# define __null_sentinel  cannot be used as an identifier
 # define __packed         cannot be used as an identifier
 # define __unused         cannot be used as an identifier
 # define __used           cannot be used as an identifier
-- 
2.43.0
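Why a macro with the same name turns the declaration into "declaration does not declare anything" can be shown with the preprocessor alone. The sketch below is an illustration under assumptions: demo_null_sentinel stands in for __null_sentinel, and the attribute expansion is a guess at what newlib's header defines, not something stated in this thread.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a system header that claims the name as a macro.
   The expansion used here is an assumption.  */
#define demo_null_sentinel __attribute__ ((__sentinel__))

/* Two-level stringification so the argument is macro-expanded first.  */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x) DEMO_STR_1 (x)

/* Returns the text of the declaration as the compiler would see it
   after macro expansion.  The declared name is gone.  */
const char *
demo_expanded_declaration (void)
{
  return DEMO_STR (inline constexpr T demo_null_sentinel;);
}

/* Usage check: the identifier no longer appears in the expansion.  */
void
demo_check_expansion (void)
{
  assert (strstr (demo_expanded_declaration (), "demo_null_sentinel") == NULL);
}
```

After expansion only a type followed by an attribute remains, with no declarator, which matches the shape of the diagnostic H-P reported. The names.cc entries added by the patch work the other way round: they define the name to a phrase that cannot compile, so any future use in libstdc++ headers fails on every target rather than only on newlib.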
[libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].
Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
has a large number of FAILs in libatomic (regressions since last time I
attempted this). The failure mode is related to IFUNC handling with the
file tas_8_2_.o containing an unresolved reference to the function
libat_test_and_set_1_i2.

Bearing in mind I've no idea what's going on, the following one line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the problem
for me and restores the libatomic testsuite to 44 expected passes and 5
unsupported tests [from 22 unexpected failures and 22 unresolved testcases].

If this looks like the correct fix, I'm not confident with rebuilding
Makefile.in with the correct version of automake, so I'd very much appreciate
it if someone/the reviewer/maintainer could please check this in for me.
Thanks in advance.


2024-01-08  Roger Sayle

libatomic/ChangeLog
    * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
    * Makefile.in: Regenerate.


Roger
--

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index cfad90124f9..e0988a18c9a 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -139,6 +139,7 @@ if ARCH_ARM_LINUX
 IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_LIBADD += $(addsuffix _8_2_.lo,$(SIZEOBJS))
+libatomic_la_LIBADD += $(addsuffix _1_2_.lo,$(SIZEOBJS))
 endif
 if ARCH_I386
 IFUNC_OPTIONS = -march=i586
Re: [RFA] [V3] new pass for sign/zero extension elimination
Jeff Law writes:
>>> +
>>> +/* Initialization of the ext-dce pass.  Primarily this means
>>> +   setting up the various bitmaps we utilize.  */
>>> +
>>> +static void
>>> +ext_dce_init (void)
>>> +{
>>> +
>>
>> Nit: excess blank line.
> Various nits have been fixed. I think those are all mine. For reasons
> I don't understand to this day, my brain thinks there should be vertical
> whitespace between the function comment and the definition. I'm
> constantly having to fix that.

Yeah, I've never known whether a blank line is preferred between
the comment and function definition. When I started (obviously somewhat
later than you :)), "yes" seemed to be much more common, but now it's
pretty mixed. So I just do what surrounding code does. (Personally
I slightly prefer the blank line.)

So I wasn't commenting on that part, although reading it back,
I can see how it looked like that. It was just on the blank line
immediately above, after the opening "{". I.e. there were some
instances of:

  void
  f (void)
  {

    ...foo...;
  }

rather than:

  void
  f (void)
  {
    ...foo...;
  }

Thanks,
Richard
Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
Victor Do Nascimento writes: > On 1/5/24 11:10, Richard Sandiford wrote: >> Victor Do Nascimento writes: >>> The introduction of further architectural-feature dependent ifuncs >>> for AArch64 makes hard-coding ifunc `_i' suffixes to functions >>> cumbersome to work with. It is awkward to remember which ifunc maps >>> onto which arch feature and makes the code harder to maintain when new >>> ifuncs are added and their suffixes possibly altered. >>> >>> This patch uses pre-processor `#define' statements to map each suffix to >>> a descriptive feature name macro, for example: >>> >>>#define LSE2 _i1 >>> >>> and reconstructs function names with the pre-processor's token >>> concatenation feature, such that for `MACRO(_i)', we would >>> now have `MACRO_FEAT(name, feature)' and in the macro definition body >>> we replace `name` with `name##feature`. >> >> FWIW, another way of doing this would be to have: >> >> #define CORE(NAME) NAME >> #define LSE2(NAME) NAME##_i1 >> >> and use feature(name) instead of name##feature. This has the slight >> advantage of not using ## on empty tokens, and the maybe slightly >> better advantage of not needing the extra forwarding step in: >> >> #define ENTRY_FEAT(name, feat) \ >> ENTRY_FEAT1(name, feat) >> >> #define ENTRY_FEAT1(name, feat) \ >> >> WDYT? >> >> Richard >> > > While from a strictly stylistic point of view, I'm not so keen on the > resulting interface and its 'function call within a function call' look, > e.g. > >ENTRY (LSE2 (libat_compare_exchange_16)) > > and > >ALIAS (LSE128 (libat_compare_exchange_16), \ > LSE2 (libat_compare_exchange_16)) > > on the implementation-side of things, I like the benefits this brings > about. Namely allowing the use of the unaltered original > implementations of the ENTRY, END and ALIAS macros with the > aforementioned advantages of not having to use ## on empty tokens and > abolishing the need for the extra forwarding step. > > I'm happy enough to go with this approach. 
I was thinking that the invocations would stay the same. A C example is:

  #define LSE2(NAME) NAME##_i2
  #define ENTRY(NAME, FEAT) void FEAT (NAME) ()
  ENTRY(foo, LSE2) {}

https://godbolt.org/z/rdn5dEMPM

Thanks,
Richard
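Richard's suggestion can be expanded into a slightly larger C sketch that also covers the CORE case. The feature macros take the base name and return the decorated name, and ENTRY applies whichever feature macro it is given, so no ## is used on an empty token and no forwarding macro is needed. The function name demo_exchange and the int return type are made up here; the _i1 suffix follows the LSE2 example quoted earlier in the thread.

```c
#include <assert.h>

/* Feature wrappers: each maps a base name to its ifunc-suffixed name.  */
#define CORE(NAME) NAME
#define LSE2(NAME) NAME##_i1

/* FEAT is the name of a feature macro; it is applied to NAME when the
   expansion of ENTRY is rescanned.  */
#define ENTRY(NAME, FEAT) int FEAT (NAME) (void)

ENTRY (demo_exchange, CORE) { return 0; }  /* defines demo_exchange */
ENTRY (demo_exchange, LSE2) { return 1; }  /* defines demo_exchange_i1 */

/* Usage check: both symbols exist and are distinct functions.  */
void
demo_check_entries (void)
{
  assert (demo_exchange () == 0);
  assert (demo_exchange_i1 () == 1);
}
```

The invocation keeps the two-argument form `ENTRY (name, FEATURE)`, which is the point Richard makes: the call sites stay as they were and only the macro definitions change.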
Re: [PATCH v2] c++/modules: Differentiate extern templates and TYPE_DECL_SUPPRESS_DEBUG [PR112820]
On Mon, 8 Jan 2024, Nathaniel Shead wrote: > On Thu, Jan 04, 2024 at 03:39:15PM -0500, Patrick Palka wrote: > > On Sun, 3 Dec 2023, Nathaniel Shead wrote: > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > > > > > -- >8 -- > > > > > > The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same > > > underlying bit. This is causing confusion when attempting to determine > > > the interface for a streamed-in class type, since the modules code > > > currently assumes that all DECL_EXTERNAL types are extern templates. > > > However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence > > > DECL_EXTERNAL) is marked on various other kinds of declarations, such as > > > vtables, which causes them to never be emitted. > > > > Good catch.. Maybe we should use different bits for these flags? I > > wouldn't be > > surprised if this bit sharing causes issues elsewhere in the compiler. The > > documentation in tree.h / tree-core.h says DECL_EXTERNAL is only valid for > > VAR_DECL and FUNCTION_DECL, so at one point it was safe to share the same > > bit > > but that's not true anymore it seems. > > > > Looking at tree-core.h:tree_decl_common luckily we have plenty of spare > > bits. > > We could also e.g. make TYPE_DECL_SUPPRESS_DEBUG use the decl_not_flexarray > > bit > > which is otherwise only used for FIELD_DECL. > > > > That seems like a good idea, thanks. How does this look? > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > -- >8 -- > > Currently, DECL_EXTERNAL and TYPE_DECL_SUPPRESS_DEBUG share a bit. This > causes issues with module code, which then incorrectly assumes that > anything with suppressed debug info (such as vtables when '-g' is > specified) is an extern template and thus prevents their emission. 
> > This patch splits the two flags up; extern templates continue to use the > DECL_EXTERNAL flag (and the documentation is updated to indicate this), > but TYPE_DECL_SUPPRESS_DEBUG now uses the 'decl_not_flexarray' flag, > which currently is only used by FIELD_DECLs. > > PR c++/112820 > PR c++/102607 > > gcc/cp/ChangeLog: > > * pt.cc (mark_class_instantiated): Set DECL_EXTERNAL explicitly. > > gcc/ChangeLog: > > * tree-core.h (struct tree_decl_common): Update comments. > * tree.h (DECL_EXTERNAL): Update comments. > (TYPE_DECL_SUPPRESS_DEBUG): Use 'decl_not_flexarray' instead. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/debug-2_a.C: New test. > * g++.dg/modules/debug-2_b.C: New test. > * g++.dg/modules/debug-2_c.C: New test. > * g++.dg/modules/debug-3_a.C: New test. > * g++.dg/modules/debug-3_b.C: New test. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/pt.cc | 1 + > gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 + > gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 > gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 + > gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 > gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 + > gcc/tree-core.h | 6 +++--- > gcc/tree.h | 8 > 8 files changed, 51 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index e38e7a773f0..7839745035b 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -26256,6 +26256,7 @@ mark_class_instantiated (tree t, int extern_p) >SET_CLASSTYPE_EXPLICIT_INSTANTIATION (t); >SET_CLASSTYPE_INTERFACE_KNOWN (t); >CLASSTYPE_INTERFACE_ONLY (t) = extern_p; > + DECL_EXTERNAL (TYPE_NAME (t)) = extern_p; >TYPE_DECL_SUPPRESS_DEBUG (TYPE_NAME (t)) = extern_p; >if (! 
extern_p) > { > diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C > b/gcc/testsuite/g++.dg/modules/debug-2_a.C > new file mode 100644 > index 000..eed0905542b > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C > @@ -0,0 +1,9 @@ > +// PR c++/112820 > +// { dg-additional-options "-fmodules-ts -g" } > +// { dg-module-cmi io } > + > +export module io; > + > +export struct error { > + virtual const char* what() const noexcept; > +}; > diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C > b/gcc/testsuite/g++.dg/modules/debug-2_b.C > new file mode 100644 > index 000..fc9afbc02e0 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C > @@ -0,0 +1,8 @@ > +// PR c++/112820 > +// { dg-additional-options "-fmodules-ts -g" } > + > +module io; > + > +const char* error::what() const noexcept { > + return "bla"; > +} > diff --git a/gcc/testsuite/g++.dg/modules/debug-2_c.C > b/gcc/testsuite/g++.dg/modules/debug-2_c.C >
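The bit-sharing problem discussed in this thread can be modelled with two accessor macros that happen to name the same bit-field. The sketch below is not GCC's tree_decl_common; the struct, field and macro names are placeholders chosen to mirror the discussion.

```c
#include <assert.h>

/* Hypothetical, much reduced stand-in for a decl node.  */
struct demo_decl
{
  unsigned int external_flag : 1;       /* the formerly shared bit */
  unsigned int not_flexarray_flag : 1;  /* spare bit reused by the fix */
};

/* Before: both accessors read and write the same bit.  */
#define DEMO_EXTERNAL(D)           ((D)->external_flag)
#define DEMO_SUPPRESS_DEBUG_OLD(D) ((D)->external_flag)

/* After: the debug flag gets a bit of its own.  */
#define DEMO_SUPPRESS_DEBUG_NEW(D) ((D)->not_flexarray_flag)

/* Setting "suppress debug" through the old accessor also makes the
   decl look external.  Returns the resulting "external" value.  */
int
demo_external_after_old_suppress (void)
{
  struct demo_decl d = { 0, 0 };
  DEMO_SUPPRESS_DEBUG_OLD (&d) = 1;
  return DEMO_EXTERNAL (&d);
}

/* With the new accessor the two properties are independent.  */
int
demo_external_after_new_suppress (void)
{
  struct demo_decl d = { 0, 0 };
  DEMO_SUPPRESS_DEBUG_NEW (&d) = 1;
  return DEMO_EXTERNAL (&d);
}

/* Usage check for both behaviours.  */
void
demo_check_flags (void)
{
  assert (demo_external_after_old_suppress () == 1);
  assert (demo_external_after_new_suppress () == 0);
}
```

Once the flags are split, code that relied on one assignment setting both has to set each explicitly, which is what the added DECL_EXTERNAL line in mark_class_instantiated does in the quoted patch.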
[PATCH] match.pd: Convert {I, X}OR of two values ANDed with alien CSTs to PLUS [PR108477]
Instead of converting XOR or PLUS of two values, ANDed with two constants that
have no bits in common, to IOR expression, convert IOR or XOR of said two
ANDed values to PLUS expression.

If we consider the following testcase:

--cut here--
unsigned int foo (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r + p + 2;
}

unsigned int bar (unsigned int a, unsigned int b)
{
  unsigned int r = a & 0x1;
  unsigned int p = b & ~0x3;

  return r | p | 2;
}
--cut here--

the above testcase compiles (x86_64 -O2) to:

foo:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        leal    2(%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        orl     %esi, %edi
        movl    %edi, %eax
        orl     $2, %eax
        ret

There is no further simplification possible in any case, we can't combine
OR with a PLUS in the first case, and we don't have OR instruction with
multiple inputs in the second case.

If we switch around the logic in the conversion and convert from IOR/XOR
to PLUS, then the resulting assembly reads:

foo:
        andl    $-4, %esi
        andl    $1, %edi
        leal    2(%rsi,%rdi), %eax
        ret

bar:
        andl    $1, %edi
        andl    $-4, %esi
        leal    (%rdi,%rsi), %eax
        orl     $2, %eax
        ret

On x86, the conversion can now use LEA instruction, which is much more
usable than OR instruction. In the first case, LEA implements three input
ADD instruction, while in the second case, even though the instruction
can't be combined with a follow-up OR, the non-destructive LEA avoids a move.

    PR target/108477

gcc/ChangeLog:

    * match.pd (A & CST1 | B & CST2 -> A & CST1 + B & CST2):
    Do not convert PLUS of two values, ANDed with two constants
    that have no bits in common to IOR expression, convert
    IOR or XOR of said two ANDed values to PLUS expression.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr108477.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline?

Uros.
diff --git a/gcc/match.pd b/gcc/match.pd
index 7b4b15acc41..deac18a7635 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1830,18 +1830,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && element_precision (type) <= element_precision (TREE_TYPE (@1)))
    (bit_not (rop (convert @0) (convert @1))))))
 
-/* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing
+/* If we are ORing or XORing two BIT_AND_EXPR's, both of which are and'ing
    with a constant, and the two constants have no bits in common,
-   we should treat this as a BIT_IOR_EXPR since this may produce more
+   we should treat this as a PLUS_EXPR since this may produce more
    simplifications.  */
-(for op (bit_xor plus)
+(for op (bit_ior bit_xor)
  (simplify
   (op (convert1? (bit_and@4 @0 INTEGER_CST@1))
       (convert2? (bit_and@5 @2 INTEGER_CST@3)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
        && tree_nop_conversion_p (type, TREE_TYPE (@2))
        && (wi::to_wide (@1) & wi::to_wide (@3)) == 0)
-   (bit_ior (convert @4) (convert @5)))))
+   (plus (convert @4) (convert @5)))))
 
 /* (X | Y) ^ X -> Y & ~ X*/
 (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr108477.c b/gcc/testsuite/gcc.target/i386/pr108477.c
new file mode 100644
index 000..fb320a84c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108477.c
@@ -0,0 +1,13 @@
+/* PR target/108477 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att" } */
+
+unsigned int foo (unsigned int a, unsigned int b)
+{
+  unsigned int r = a & 0x1;
+  unsigned int p = b & ~0x3;
+
+  return r + p + 2;
+}
+
+/* { dg-final { scan-assembler-not "orl" } } */
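The transformation relies on a simple identity: when two masks share no bits, the masked values cannot produce a carry, so IOR, XOR and PLUS all give the same result. The C sketch below checks that identity for the masks used in the testcase (0x1 and ~0x3) and shows that it breaks once the masks overlap. The function names are made up for this note.

```c
#include <assert.h>

/* The three equivalent forms for disjoint masks 0x1 and ~0x3.  */
unsigned int
demo_ior (unsigned int a, unsigned int b)
{
  return (a & 0x1u) | (b & ~0x3u);
}

unsigned int
demo_xor (unsigned int a, unsigned int b)
{
  return (a & 0x1u) ^ (b & ~0x3u);
}

unsigned int
demo_plus (unsigned int a, unsigned int b)
{
  return (a & 0x1u) + (b & ~0x3u);
}

/* Nonzero when all three forms agree for the given inputs.  */
int
demo_forms_agree (unsigned int a, unsigned int b)
{
  return demo_ior (a, b) == demo_xor (a, b)
         && demo_ior (a, b) == demo_plus (a, b);
}

/* Counterexample with overlapping masks (both 0x3): IOR and PLUS
   differ, which is why the pattern checks (CST1 & CST2) == 0.  */
int
demo_overlap_differs (void)
{
  unsigned int a = 3, b = 3;
  return ((a & 0x3u) | (b & 0x3u)) != ((a & 0x3u) + (b & 0x3u));
}

/* Usage check with all bits set in both inputs.  */
void
demo_check_forms (void)
{
  assert (demo_forms_agree (~0u, ~0u));
  assert (demo_overlap_differs ());
}
```

Since the three forms are interchangeable, the choice between them is purely about which one later passes and the target handle best, which is the point of disagreement between the patch and Andrew's reply.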
Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a
Wilco Dijkstra writes: > Hi, > >>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the >>> others. >> >> We started off implementing all possible memory orderings available. >> Wilco saw value in merging less restricted orderings into more >> restricted ones - mainly to reduce codesize in less frequently used atomics. >> >> This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions >> a little smaller. > > Benchmarking showed that LSE and LSE2 RMW atomics have similar performance > once > the atomic is acquire, release or both. Given there is already a significant > overhead due > to the function call, PLT indirection and argument setup, it doesn't make > sense to add > extra taken branches that may mispredict or cause extra fetch cycles... Thanks for the extra context, especially wrt the LSE/LSE2 benchmarking. If there isn't any difference for acquire vs. the rest, is there a justification we can use for keeping the acquire branch, rather than using SWPAL for everything except relaxed? If so, then Victor, could you include that in the explanation above and add it as a source comment? Although maybe tone down "doesn't make sense to add" to something like "doesn't seem worth adding". :) Richard
RE: [PATCH 2/2] arm: Add cortex-m52 doc
> -Original Message- > From: Chung-Ju Wu > Sent: Monday, January 8, 2024 6:17 AM > To: gcc-patches ; Kyrylo Tkachov > ; Richard Earnshaw > Cc: jason...@anshingtek.com.tw > Subject: [PATCH 2/2] arm: Add cortex-m52 doc > > Hi, > > This is the patch to add cortex-m52 in the Arm-related options > sections of the gcc invoke.texi documentation. > > Is it OK for trunk? In the ChangeLog entry: gcc/ChangeLog: * doc/invoke.texi: Update docs. Let's be more specific and specify something like * doc/invoke.texi (Arm Options): Document Cortex-m52 options. Ok with a better ChangeLog entry. Thanks, Kyrill > > Regards, > jasonwucj
RE: [PATCH 1/2] arm: Add cortex-m52 core
Hi jasonwucj, > -Original Message- > From: Chung-Ju Wu > Sent: Monday, January 8, 2024 6:16 AM > To: gcc-patches ; Kyrylo Tkachov > ; Richard Earnshaw > Cc: jason...@anshingtek.com.tw > Subject: [PATCH 1/2] arm: Add cortex-m52 core > > Hi, > > Recently, Arm announced the Cortex-M52, delivering increased performance > in DSP and ML along with a range of other features and benefits. > For the completeness of Arm ecosystem, we hope that cortex-m52 support > could be available in gcc-14. > > Attached is the patch to support cortex-m52 cpu with MVE and PACBTI enabled in > GCC. > Bootstrapped and tested on arm-none-eabi. > > Is it OK for trunk? The patch looks good to me. It should be safe to include it in GCC 14 as it doesn’t add any new logic beyond a new entry in arm-cpus.in. Do you have commit rights to push it? Thanks, Kyrill > > Regards, > jasonwucj
Re: [Patch] GCN: Add pre-initial support for gfx1100
Hi Andrew,

Andrew Stubbs wrote:
>> OK for mainline ?
>
> This looks fine to me. I know there will be things that need fixing for
> both experimental architectures.

Indeed. I tried to be a bit more verbose also to avoid too high
expectations by occasional gcc-patches@ readers.

> P.S. Apologies, but I think my commits today conflict a little; you
> should be able to drop the hunks that patch deleted code.

I did so - but I then realized that I should have also added gfx1100 to
the new chunk.

Committed as r14-7006-g97a52f69d209f6 (see attachment) - as follow up to
the original r14-7005-g52a2c659ae6c21

Tobias

commit 97a52f69d209f69e755ffad6897c7176da9ac686
Author: Tobias Burnus
Date:   Mon Jan 8 15:18:10 2024 +0100

    amdgcn: Add gfx1100 to new XNACK defaults in mkoffload

    Commit r14-6997-g78dff4c25c1b95 added an arch-dependent
    SET_XNACK_OFF vs. SET_XNACK_ANY check; that was added between
    writing and committing the add-gfx1100 commit
    r14-7005-g52a2c659ae6c21 - and I missed to add it there.

    gcc/ChangeLog:

            * config/gcn/mkoffload.cc (main): Handle gfx1100
            when setting the default XNACK.
---
 gcc/config/gcn/mkoffload.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 2cd201d56ca..d4cd509089e 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -1018,6 +1018,7 @@ main (int argc, char **argv)
     case EF_AMDGPU_MACH_AMDGCN_GFX906:
     case EF_AMDGPU_MACH_AMDGCN_GFX908:
     case EF_AMDGPU_MACH_AMDGCN_GFX1030:
+    case EF_AMDGPU_MACH_AMDGCN_GFX1100:
       SET_XNACK_OFF (elf_flags);
       break;
     case EF_AMDGPU_MACH_AMDGCN_GFX90a:
RE: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]
> -Original Message- > From: Richard Biener > Sent: Monday, January 8, 2024 12:48 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: check if target can do extract first for > early breaks > [PR113199] > > On Tue, 2 Jan 2024, Tamar Christina wrote: > > > Hi All, > > > > I was generating the vector reverse mask without checking if the target > > actually supported such an operation. > > > > It also seems like more targets implement VEC_EXTRACT than permute on mask > > registers. > > > > So this adds a check for IFN_VEC_EXTRACT support when required and changes > > the select first code to use it. > > > > This is good for now since masks always come from whilelo. But in the > > future > > when masks can come from other sources we will need the old code back. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no issues with --enable-checking=release --enable-lto > > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra. > > tested on cross cc1 for amdgcn-amdhsa and issue fixed. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/113199 > > * tree-vect-loop.cc (vectorizable_live_operation_1): Use > > IFN_VEC_EXTRACT. > > (vectorizable_live_operation): Check for IFN_VEC_EXTRACT support. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/113199 > > * gcc.target/gcn/pr113199.c: New test. 
> > > > --- inline copy of patch -- > > diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c > b/gcc/testsuite/gcc.target/gcn/pr113199.c > > new file mode 100644 > > index > ..8a641e5536e80e207ca01 > 63cac66c0f4f6ca93f7 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/gcn/pr113199.c > > @@ -0,0 +1,44 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-O2" } */ > > + > > +typedef long unsigned int size_t; > > +typedef int wchar_t; > > +struct tm > > +{ > > + int tm_mon; > > + int tm_year; > > +}; > > +int abs (int); > > +struct lc_time_T { const char *month[12]; }; > > +struct __locale_t * __get_current_locale (void) { } > > +const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { } > > +const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { > return buf; } > > +size_t > > +__strftime (wchar_t *s, size_t maxsize, const wchar_t *format, > > + const struct tm *tim_p, struct __locale_t *locale) > > +{ > > + size_t count = 0; > > + const wchar_t *ctloc; > > + wchar_t ctlocbuf[256]; > > + size_t i, ctloclen; > > + const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale); > > +{ > > + switch (*format) > > + { > > + case L'B': > > + (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], > )); > > + for (i = 0; i < ctloclen; i++) > > + { > > + if (count < maxsize - 1) > > + s[count++] = ctloc[i]; > > + else > > + return 0; > > + { > > + int century = tim_p->tm_year >= 0 > > +? 
tim_p->tm_year / 100 + 1900 / 100 > > +: abs (tim_p->tm_year + 1900) / 100; > > + } > > + } > > + } > > +} > > +} > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index > 37f1be1101ffae779214056a0886411e0683e887..5aa92e67444e7aacf458fffa14 > 28f1983c482374 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -10648,36 +10648,18 @@ vectorizable_live_operation_1 (loop_vec_info > loop_vinfo, > > _VINFO_MASKS (loop_vinfo), > > 1, vectype, 0); > >tree scalar_res; > > + gimple_seq_add_seq (, tem); > > > >/* For an inverted control flow with early breaks we want > > EXTRACT_FIRST > > -instead of EXTRACT_LAST. Emulate by reversing the vector and mask. */ > > +instead of EXTRACT_LAST. For now since the mask always comes from a > > +WHILELO we can get the first element ignoring the mask since CLZ of the > > +mask will always be zero. */ > >if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) > > - { > > - /* First create the permuted mask. */ > > - tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask)); > > - tree perm_dest = copy_ssa_name (mask); > > - gimple *perm_stmt > > - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask, > > - mask, perm_mask); > > - vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt, > > - ); > > - mask = perm_dest; > > - > > - /* Then permute the vector contents. */ > > - tree perm_elem = perm_mask_for_reverse (vectype); > > - perm_dest = copy_ssa_name (vec_lhs_phi); > > - perm_stmt > > - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi, > > - vec_lhs_phi, perm_elem); > > -
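A note on the reasoning in the new comment above: a WHILELO-style mask is a prefix of active lanes, so whenever any lane is active, lane 0 is active too, and "extract first" can read element 0 without consulting the mask. The scalar model below is only a sketch of that property. `whilelo_mask`, `first_active_lane` and `last_active_lane` are hypothetical helper names, not GCC functions, and the fixed lane count of 4 is an assumption made for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;          // assumed vector length, illustration only
using Mask = std::array<bool, kLanes>;

// Model of a WHILELO-style predicate: lane k is active iff base + k < limit.
// The active lanes therefore always form a prefix that starts at lane 0.
Mask whilelo_mask(std::size_t base, std::size_t limit) {
  Mask m{};
  for (std::size_t k = 0; k < kLanes; ++k)
    m[k] = base + k < limit;
  return m;
}

// Index of the first active lane, or -1 if no lane is active.
int first_active_lane(const Mask &m) {
  for (std::size_t k = 0; k < kLanes; ++k)
    if (m[k]) return static_cast<int>(k);
  return -1;
}

// Index of the last active lane, or -1 if no lane is active.
int last_active_lane(const Mask &m) {
  for (std::size_t k = kLanes; k-- > 0;)
    if (m[k]) return static_cast<int>(k);
  return -1;
}
```

For any mask produced by `whilelo_mask` with at least one active lane, `first_active_lane` returns 0, while `last_active_lane` depends on how many lanes are active. A mask from another source need not have this prefix shape, which matches the caveat in the patch description.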
RE: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
> -Original Message- > From: Richard Biener > Sent: Monday, January 8, 2024 12:38 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: maintain LCSSA form when peeled vector > iterations have virtual operands > > On Fri, 29 Dec 2023, Tamar Christina wrote: > > > Hi All, > > > > This patch fixes several interconnected issues. > > > > 1. When picking an exit we wanted to check for niter_desc.may_be_zero not > true. > >i.e. we want to pick an exit which we know will iterate at least once. > >However niter_desc.may_be_zero is not a boolean. It is a tree that > > encodes > >a boolean value. !niter_desc.may_be_zero is just checking if we have > > some > >information, not what the information is. This leads us to pick a more > >difficult to vectorize exit more often than we should. > > > > 2. Because we had this bug, we used to pick an alternative exit much more > > ofthen > >which showed one issue, when the loop accesses memory and we "invert it" > > we > >would corrupt the VUSE chain. This is because on an peeled vector > > iteration > >every exit restarts the loop (i.e. they're all early) BUT since we may > > have > >performed a store, the vUSE would need to be updated. This version > > maintains > >virtual PHIs correctly in these cases. Note that we can't simply > > remove all > >of them and recreate them because we need the PHI nodes still in the > > right > >order for if skip_vector. > > > > 3. Since we're moving the stores to a safe location I don't think we > > actually > >need to analyze whether the store is in range of the memref, because if > > we > >ever get there, we know that the loads must be in range, and if the > > loads are > >in range and we get to the store we know the early breaks were not taken > > and > >so the scalar loop would have done the VF stores too. > > > > 4. Instead of searching for where to move stores to, they should always be > > in > >exit belonging to the latch. 
We can only ever delay stores and even if > > we > >pick a different exit than the latch one as the main one, effects still > >happen in program order when vectorized. If we don't move the stores to > > the > >latch exit but instead to whever we pick as the "main" exit then we can > >perform incorrect memory accesses (luckily these are trapped by > > verify_ssa). > > > > 5. We only used to analyze loads inside the same BB as an early break, and > > also > >we'd never analyze the ones inside the block where we'd be moving memory > >references to. This is obviously bogus and to fix it this patch splits > > apart > >the two constraints. We first validate that all load memory references > > are > >in bounds and only after that do we perform the alias checks for the > > writes. > >This makes the code simpler to understand and more trivially correct. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no issues with --enable-checking=release --enable-lto > > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/113137 > > PR tree-optimization/113136 > > PR tree-optimization/113172 > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): > > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): > > (vect_do_peeling): Maintain virtual PHIs on inverted loops. > > * tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closes to > > latch. > > (vect_create_loop_vinfo): Record all conds instead of only alt ones. > > * tree-vectorizer.h: Fix comment > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/113137 > > PR tree-optimization/113136 > > PR tree-optimization/113172 > > * g++.dg/vect/vect-early-break_4-pr113137.cc: New test. > > * g++.dg/vect/vect-early-break_5-pr113137.cc: New test. > > * gcc.dg/vect/vect-early-break_95-pr113137.c: New test. 
> > * gcc.dg/vect/vect-early-break_96-pr113136.c: New test. > > * gcc.dg/vect/vect-early-break_97-pr113172.c: New test. > > > > --- inline copy of patch -- > > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > > new file mode 100644 > > index > ..f78db8669dcc65f1b45ea7 > 8f4433d175e1138332 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options vect_early_break } */ > > +/* { dg-require-effective-target vect_early_break } */ > > +/* { dg-require-effective-target vect_int } */ > > + > > +int b; > > +void a() __attribute__((__noreturn__)); > > +void c() { > > + char *buf; > > + int bufsz = 64; > > +
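Point 1 of the description above (a value that encodes a boolean being tested as if it were a boolean) can be illustrated with a small stand-alone model. This is a sketch only: `BoolNode` and `NiterDesc` are hypothetical stand-ins and are not GCC's `tree` or its niter descriptor, and treating "no information" as "not known to iterate" is an assumption made here for illustration.

```cpp
// Stand-in for an IR node that encodes a boolean constant.
// This is NOT GCC's `tree`; it only models "pointer to an encoded value".
struct BoolNode {
  bool value;
};

struct NiterDesc {
  // Null means "no information". Otherwise the node says whether the
  // loop may execute zero times.
  const BoolNode *may_be_zero;
};

// Buggy check: only asks whether information is absent, and never looks
// at what the information says.
bool known_to_iterate_buggy(const NiterDesc &d) {
  return !d.may_be_zero;
}

// Intended check (assumption: absent information is treated conservatively):
// information is present and it says the count cannot be zero.
bool known_to_iterate(const NiterDesc &d) {
  return d.may_be_zero != nullptr && !d.may_be_zero->value;
}
```

With a node that encodes "false", the buggy check rejects an exit that is in fact known to iterate at least once, which is the kind of mis-selection the description refers to.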
Re: [PATCH] Clarify -mmovbe documentation
On Mon, Jan 8, 2024 at 10:56 AM Richard Biener wrote: > > It was noticed that -mmovbe doesn't use movbe for __builtin_bswap{32,64} > when not optimizing. The following adjusts the documentation to > say it will be used for optimizing and applies to all byte swaps, > not just those carried out via builtin function calls. > > OK? > > Thanks, > Richard. > > * doc/invoke.texi (-mmovbe): Clarify. OK. Thanks, Uros. > --- > gcc/doc/invoke.texi | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 68d1f364ac0..8cf99f395a5 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -34708,8 +34708,8 @@ see @ref{Other Builtins} for details. > > @opindex mmovbe > @item -mmovbe > -This option enables use of the @code{movbe} instruction to implement > -@code{__builtin_bswap32} and @code{__builtin_bswap64}. > +This option enables use of the @code{movbe} instruction to optimize > +byte swapping of four and eight byte entities. > > @opindex mshstk > @item -mshstk > -- > 2.35.3
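For context, the two shapes of byte swap that the reworded text is meant to cover look roughly like this: one goes through the builtin, the other is an open-coded swap with no builtin call. This is a sketch only. `load_be32` and `bswap32_manual` are hypothetical names, and whether `movbe` is actually emitted for either function depends on the target, the optimization level and the compiler version; no compiler output is claimed here.

```cpp
#include <cstdint>

// Read a 32-bit big-endian value on a little-endian host: a load followed
// by a byte swap, expressed through the GCC builtin.
std::uint32_t load_be32(const std::uint32_t *p) {
  return __builtin_bswap32(*p);
}

// The same four-byte swap written with shifts and masks, without any
// builtin function call.
std::uint32_t bswap32_manual(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

Both functions compute the same result; the documentation change is about describing the option in terms of the operation rather than the builtin names.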
[PATCH 5/5] RISC-V: Document the syntax of -march
--- gcc/doc/invoke.texi | 16 1 file changed, 16 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 68d1f364ac0..81ee7ac758a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -30037,6 +30037,22 @@ Generate code for given RISC-V ISA (e.g.@: @samp{rv64im}). ISA strings must be lower-case. Examples include @samp{rv64i}, @samp{rv32g}, @samp{rv32e}, and @samp{rv32imaf}. +The syntax of the ISA string is defined as follows: + +@table @code +@item The string must start with @samp{rv32} or @samp{rv64}, followed by +@samp{i}, @samp{e}, or @samp{g}, referred to as the base ISA. +@item The subsequent part of the string is a list of extension names. Extension +names can be categorized as multi-letter (e.g.@: @samp{zba}) and single-letter +(e.g.@: @samp{v}). Single-letter extensions can appear consecutively, +but multi-letter extensions must be separated by underscores. +@item An underscore can appear anywhere after the base ISA. It has no specific +effect but is used to improve readability and can act as a separator. +@item Extension names may include an optional version number, following the +syntax @samp{p} or @samp{}, (e.g.@: @samp{m2p1} or +@samp{m2}). +@end table + When @option{-march=} is not specified, use the setting from @option{-mcpu}. If both @option{-march} and @option{-mcpu=} are not specified, the default for -- 2.34.1
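The syntax rules in the new documentation text can be sketched as a small tokenizer. This is an illustration under stated assumptions, not the GCC parser: `IsaParts`, `skip_version` and `split_isa_string` are hypothetical names, the choice of `z`, `s` and `x` as multi-letter prefixes is taken from the parser code quoted elsewhere in this series, and the sketch neither validates extension names nor enforces the underscore between single-letter and multi-letter extensions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct IsaParts {
  bool ok = false;
  int xlen = 0;
  std::string base;                     // "i", "e" or "g", plus optional version
  std::vector<std::string> extensions;  // in the order written
};

static bool is_digit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Skip an optional version suffix: digits, optionally followed by 'p' and
// more digits (as in "2p1" or "2"). Returns the position after it.
static std::size_t skip_version(const std::string &s, std::size_t pos) {
  std::size_t start = pos;
  while (pos < s.size() && is_digit(s[pos])) ++pos;
  if (pos > start && pos + 1 < s.size() && s[pos] == 'p' && is_digit(s[pos + 1])) {
    ++pos;
    while (pos < s.size() && is_digit(s[pos])) ++pos;
  }
  return pos;
}

IsaParts split_isa_string(const std::string &arch) {
  IsaParts out;
  if (arch.compare(0, 4, "rv32") == 0) out.xlen = 32;
  else if (arch.compare(0, 4, "rv64") == 0) out.xlen = 64;
  else return out;

  std::size_t pos = 4;
  if (pos >= arch.size() ||
      (arch[pos] != 'i' && arch[pos] != 'e' && arch[pos] != 'g'))
    return out;                          // base ISA must come first
  std::size_t end = skip_version(arch, pos + 1);
  out.base = arch.substr(pos, end - pos);
  pos = end;

  while (pos < arch.size()) {
    char c = arch[pos];
    if (c == '_') { ++pos; continue; }   // underscores are separators only
    if (c < 'a' || c > 'z') return out;  // ISA strings are lower-case
    if (c == 'z' || c == 's' || c == 'x') {
      // Multi-letter extension: runs to the next underscore or the end.
      end = arch.find('_', pos);
      if (end == std::string::npos) end = arch.size();
    } else {
      // Single-letter extension with an optional version.
      end = skip_version(arch, pos + 1);
    }
    out.extensions.push_back(arch.substr(pos, end - pos));
    pos = end;
  }
  out.ok = true;
  return out;
}
```

For example, `split_isa_string("rv32i_m2p1")` yields base `i` and the single extension token `m2p1`, while a string such as `rv64zba` is rejected because no base ISA letter follows the `rv64` prefix.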
[PATCH 4/5] RISC-V: Update testsuite due to -march string relaxation
We have relaxed the -march string; it no longer requires canonical order, so we need to update some of the test cases. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-23.c: Update test. * gcc.target/riscv/arch-27.c: Ditto. * gcc.target/riscv/arch-28.c: Ditto. * gcc.target/riscv/attribute-10.c: Ditto. --- gcc/testsuite/gcc.target/riscv/arch-23.c | 1 - gcc/testsuite/gcc.target/riscv/arch-27.c | 2 +- gcc/testsuite/gcc.target/riscv/arch-28.c | 2 +- gcc/testsuite/gcc.target/riscv/attribute-10.c | 4 +++- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/arch-23.c b/gcc/testsuite/gcc.target/riscv/arch-23.c index fca5425790c..aacfc451043 100644 --- a/gcc/testsuite/gcc.target/riscv/arch-23.c +++ b/gcc/testsuite/gcc.target/riscv/arch-23.c @@ -4,7 +4,6 @@ int foo() { } -/* { dg-error "ISA string is not in canonical order. 'c'" "" { target *-*-* } 0 } */ /* { dg-error "extension 'w' is unsupported standard single letter extension" "" { target *-*-* } 0 } */ /* { dg-error "extension 'zvl' starts with 'z' but is unsupported standard extension" "" { target *-*-* } 0 } */ /* { dg-error "extension 's123' starts with 's' but is unsupported standard supervisor extension" "" { target *-*-* } 0 } */ diff --git a/gcc/testsuite/gcc.target/riscv/arch-27.c b/gcc/testsuite/gcc.target/riscv/arch-27.c index 70143b2156f..03f07deedd1 100644 --- a/gcc/testsuite/gcc.target/riscv/arch-27.c +++ b/gcc/testsuite/gcc.target/riscv/arch-27.c @@ -4,4 +4,4 @@ int foo() { } -/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 0 } */ +/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target *-*-* } 0 } */ diff --git a/gcc/testsuite/gcc.target/riscv/arch-28.c b/gcc/testsuite/gcc.target/riscv/arch-28.c index 934399a7b3a..0f83c03ad3d 100644 --- a/gcc/testsuite/gcc.target/riscv/arch-28.c +++ b/gcc/testsuite/gcc.target/riscv/arch-28.c @@ -4,4 +4,4 @@ int foo() { } -/* { dg-error "ISA string is not in canonical order. 
'e'" "" { target *-*-* } 0 } */ +/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target *-*-* } 0 } */ diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c b/gcc/testsuite/gcc.target/riscv/attribute-10.c index 868adef6ab7..8a7f0a8ac49 100644 --- a/gcc/testsuite/gcc.target/riscv/attribute-10.c +++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c @@ -3,4 +3,6 @@ int foo() { } -/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 } */ +/* { dg-error "extension 'u' is unsupported standard single letter extension" "" { target { "riscv*-*-*" } } 0 } */ +/* { dg-error "extension 'n' is unsupported standard single letter extension" "" { target { "riscv*-*-*" } } 0 } */ +/* { dg-error "'i', 'e' or 'g' must be the first extension" "" { target { "riscv*-*-*" } } 0 } */ -- 2.34.1
[PATCH 3/5] RISC-V: Remove unused function in riscv_subset_list [NFC]
gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_std_ext): Remove. (riscv_subset_list::parse_multiletter_ext): Remove. * config/riscv/riscv-subset.h (riscv_subset_list::parse_std_ext): Remove. (riscv_subset_list::parse_multiletter_ext): Remove. --- gcc/common/config/riscv/riscv-common.cc | 179 gcc/config/riscv/riscv-subset.h | 4 - 2 files changed, 183 deletions(-) diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 891ecfce464..cf1c82c9f5e 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -1059,73 +1059,6 @@ riscv_subset_list::parse_base_ext (const char *p) return p; } - -/* Parsing function for standard extensions. - - Return Value: - Points to the end of extensions. - - Arguments: - `p`: Current parsing position. */ - -const char * -riscv_subset_list::parse_std_ext (const char *p) -{ - const char *all_std_exts = riscv_supported_std_ext (); - const char *std_exts = all_std_exts; - - unsigned major_version = 0; - unsigned minor_version = 0; - char std_ext = '\0'; - bool explicit_version_p = false; - - while (p != NULL && *p) -{ - char subset[2] = {0, 0}; - - if (*p == 'x' || *p == 's' || *p == 'z') - break; - - if (*p == '_') - { - p++; - continue; - } - - std_ext = *p; - - /* Checking canonical order. */ - const char *prior_std_exts = std_exts; - - while (*std_exts && std_ext != *std_exts) - std_exts++; - - subset[0] = std_ext; - if (std_ext != *std_exts && standard_extensions_p (subset)) - { - error_at (m_loc, - "%<-march=%s%>: ISA string is not in canonical order. " - "%<%c%>", - m_arch, *p); - /* Extension ordering is invalid. Ignore this extension and keep -searching for other issues with remaining extensions. 
*/ - std_exts = prior_std_exts; - p++; - continue; - } - - std_exts++; - - p++; - - p = parsing_subset_version (subset, p, _version, _version, - /* std_ext_p= */ true, _version_p); - - add (subset, major_version, minor_version, explicit_version_p, false); -} - return p; -} - /* Parsing function for one standard extensions. Return Value: @@ -1409,118 +1342,6 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p, } -/* Parsing function for multi-letter extensions. - - Return Value: - Points to the end of extensions. - - Arguments: - `p`: Current parsing position. - `ext_type`: What kind of extensions, 's', 'z' or 'x'. - `ext_type_str`: Full name for kind of extension. */ - -const char * -riscv_subset_list::parse_multiletter_ext (const char *p, - const char *ext_type, - const char *ext_type_str) -{ - unsigned major_version = 0; - unsigned minor_version = 0; - size_t ext_type_len = strlen (ext_type); - - while (*p) -{ - if (*p == '_') - { - p++; - continue; - } - - if (strncmp (p, ext_type, ext_type_len) != 0) - break; - - char *subset = xstrdup (p); - char *q = subset; - const char *end_of_version; - bool explicit_version_p = false; - char *ext; - char backup; - size_t len; - size_t end_of_version_pos, i; - bool found_any_number = false; - bool found_minor_version = false; - - /* Parse until end of this extension including version number. */ - while (*++q != '\0' && *q != '_') - ; - - backup = *q; - *q = '\0'; - len = q - subset; - *q = backup; - - end_of_version_pos = len; - /* Find the begin of version string. */ - for (i = len -1; i > 0; --i) - { - if (ISDIGIT (subset[i])) - { - found_any_number = true; - continue; - } - /* Might be version seperator, but need to check one more char, -we only allow p, so we could stop parsing if found -any more `p`. 
*/ - if (subset[i] == 'p' && - !found_minor_version && - found_any_number && ISDIGIT (subset[i-1])) - { - found_minor_version = true; - continue; - } - - end_of_version_pos = i + 1; - break; - } - - backup = subset[end_of_version_pos]; - subset[end_of_version_pos] = '\0'; - ext = xstrdup (subset); - subset[end_of_version_pos] = backup; - - end_of_version - = parsing_subset_version (ext, subset + end_of_version_pos, _version, _version, - /* std_ext_p= */ false, _version_p); - free (ext); - - if (end_of_version ==
[PATCH 2/5] RISC-V: Relax the -march string for accept any order
-march previously required canonical order. That is not easy for most users now that we have so many extensions, so this patch relaxes the constraint: -march accepts the ISA string in any order, with only a few requirements: 1. It must start with rv[32|64][e|i|g]. 2. Multi-letter and single-letter extensions must be separated by at least one underscore (`_`). gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_single_std_ext): New parameter. (riscv_subset_list::parse_single_multiletter_ext): Ditto. (riscv_subset_list::parse_single_ext): Ditto. (riscv_subset_list::parse): Relax the order for the input of ISA string. * config/riscv/riscv-subset.h (riscv_subset_list::parse_single_std_ext): New parameter. (riscv_subset_list::parse_single_multiletter_ext): Ditto. (riscv_subset_list::parse_single_ext): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-33.c: New. * gcc.target/riscv/arch-34.c: New. --- gcc/common/config/riscv/riscv-common.cc | 91 ++-- gcc/config/riscv/riscv-subset.h | 6 +- gcc/testsuite/gcc.target/riscv/arch-33.c | 5 ++ gcc/testsuite/gcc.target/riscv/arch-34.c | 5 ++ 4 files changed, 67 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/arch-33.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-34.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index f0359380451..891ecfce464 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -1132,10 +1132,12 @@ riscv_subset_list::parse_std_ext (const char *p) Points to the end of extensions. Arguments: - `p`: Current parsing position. */ + `p`: Current parsing position. + `exact_single_p`: True if input string is exactly an extension and end + with '\0'. 
*/ const char * -riscv_subset_list::parse_single_std_ext (const char *p) +riscv_subset_list::parse_single_std_ext (const char *p, bool exact_single_p) { if (*p == 'x' || *p == 's' || *p == 'z') { @@ -1146,6 +1148,11 @@ riscv_subset_list::parse_single_std_ext (const char *p) return nullptr; } + if (exact_single_p && strlen (p) > 1) +{ + return nullptr; +} + unsigned major_version = 0; unsigned minor_version = 0; bool explicit_version_p = false; @@ -1296,13 +1303,16 @@ riscv_subset_list::check_conflict_ext () Arguments: `p`: Current parsing position. `ext_type`: What kind of extensions, 's', 'z' or 'x'. - `ext_type_str`: Full name for kind of extension. */ + `ext_type_str`: Full name for kind of extension. + `exact_single_p`: True if input string is exactly an extension and end + with '\0'. */ const char * riscv_subset_list::parse_single_multiletter_ext (const char *p, const char *ext_type, -const char *ext_type_str) +const char *ext_type_str, +bool exact_single_p) { unsigned major_version = 0; unsigned minor_version = 0; @@ -1314,6 +1324,7 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p, char *subset = xstrdup (p); const char *end_of_version; bool explicit_version_p = false; + char *q = subset; char *ext; char backup; size_t len = strlen (p); @@ -1321,6 +1332,17 @@ riscv_subset_list::parse_single_multiletter_ext (const char *p, bool found_any_number = false; bool found_minor_version = false; + if (!exact_single_p) +{ + /* Extension may not ended with '\0', may come with another extension +which concat by '_' */ + /* Parse until end of this extension including version number. */ + while (*++q != '\0' && *q != '_') + ; + + len = q - subset; +} + end_of_version_pos = len; /* Find the begin of version string. */ for (i = len -1; i > 0; --i) @@ -1505,21 +1527,26 @@ riscv_subset_list::parse_multiletter_ext (const char *p, Points to the end of extensions. Arguments: - `p`: Current parsing position. */ + `p`: Current parsing position. 
+ `exact_single_p`: True if input string is exactly an extension and end + with '\0'. */ const char * -riscv_subset_list::parse_single_ext (const char *p) +riscv_subset_list::parse_single_ext (const char *p, bool exact_single_p) { switch (p[0]) { case 'x': - return parse_single_multiletter_ext (p, "x", "non-standard extension"); + return parse_single_multiletter_ext (p, "x", "non-standard extension", + exact_single_p); case 'z': - return parse_single_multiletter_ext (p, "z", "sub-extension"); + return parse_single_multiletter_ext (p, "z", "sub-extension", +
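The "find the begin of version string" step that this patch keeps in `parse_single_multiletter_ext` scans the extension text backwards. The stand-alone sketch below mirrors the loop visible in the quoted code so the behaviour can be tried in isolation; `version_start` is a hypothetical name, and the function takes exactly one extension token (no trailing `_` or further extensions) as an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

static bool is_digit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Return the index at which the version suffix of `ext` begins, or
// ext.size() if there is no version suffix. A version is "<digits>" or
// "<digits>p<digits>" at the very end of the token.
std::size_t version_start(const std::string &ext) {
  const std::size_t len = ext.size();
  std::size_t end_of_name = len;
  bool found_any_number = false;
  bool found_minor_version = false;

  // Walk from the last character down to index 1; index 0 is the
  // extension-type letter and is never part of a version.
  for (std::size_t i = len; i-- > 1;) {
    const char c = ext[i];
    if (is_digit(c)) {
      found_any_number = true;
      continue;
    }
    // A single 'p' between two digit runs separates major and minor.
    if (c == 'p' && !found_minor_version && found_any_number &&
        is_digit(ext[i - 1])) {
      found_minor_version = true;
      continue;
    }
    end_of_name = i + 1;  // first character that is not part of a version
    break;
  }
  return end_of_name;
}
```

With this sketch, `version_start("zba1p0")` is 3 (name `zba`, version `1p0`), while `zve64d` and `zvl128b` have no version suffix because they end in a letter.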
[PATCH 1/5] RISC-V: Extract part parsing base ISA logic into a standalone function [NFC]
Minor refactor, preparation for further change. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_base_ext): New. (riscv_subset_list::parse): Extract part of logic into riscv_subset_list::parse_base_ext. * config/riscv/riscv-subset.h (riscv_subset_list::parse_base_ext): New. --- gcc/common/config/riscv/riscv-common.cc | 68 - gcc/config/riscv/riscv-subset.h | 2 + 2 files changed, 47 insertions(+), 23 deletions(-) diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 0301d170a41..f0359380451 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -970,25 +970,38 @@ riscv_subset_list::parsing_subset_version (const char *ext, return p; } -/* Parsing function for standard extensions. +/* Parsing function for base extensions, rv[32|64][i|e|g] Return Value: - Points to the end of extensions. + Points to the end of extensions, return NULL if any error. Arguments: `p`: Current parsing position. */ - const char * -riscv_subset_list::parse_std_ext (const char *p) +riscv_subset_list::parse_base_ext (const char *p) { - const char *all_std_exts = riscv_supported_std_ext (); - const char *std_exts = all_std_exts; - unsigned major_version = 0; unsigned minor_version = 0; char std_ext = '\0'; bool explicit_version_p = false; + if (startswith (p, "rv32")) +{ + m_xlen = 32; + p += 4; +} + else if (startswith (p, "rv64")) +{ + m_xlen = 64; + p += 4; +} + else +{ + error_at (m_loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64", + m_arch); + return NULL; +} + /* First letter must start with i, e or g. */ switch (*p) { @@ -1043,6 +1056,28 @@ riscv_subset_list::parse_std_ext (const char *p) "% or %", m_arch); return NULL; } + return p; +} + + +/* Parsing function for standard extensions. + + Return Value: + Points to the end of extensions. + + Arguments: + `p`: Current parsing position. 
*/ + +const char * +riscv_subset_list::parse_std_ext (const char *p) +{ + const char *all_std_exts = riscv_supported_std_ext (); + const char *std_exts = all_std_exts; + + unsigned major_version = 0; + unsigned minor_version = 0; + char std_ext = '\0'; + bool explicit_version_p = false; while (p != NULL && *p) { @@ -1499,22 +1534,9 @@ riscv_subset_list::parse (const char *arch, location_t loc) riscv_subset_list *subset_list = new riscv_subset_list (arch, loc); riscv_subset_t *itr; const char *p = arch; - if (startswith (p, "rv32")) -{ - subset_list->m_xlen = 32; - p += 4; -} - else if (startswith (p, "rv64")) -{ - subset_list->m_xlen = 64; - p += 4; -} - else -{ - error_at (loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64", - arch); - goto fail; -} + p = subset_list->parse_base_ext (p); + if (p == NULL) +goto fail; /* Parsing standard extension. */ p = subset_list->parse_std_ext (p); diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h index 14461838db5..c8117d8daf2 100644 --- a/gcc/config/riscv/riscv-subset.h +++ b/gcc/config/riscv/riscv-subset.h @@ -67,6 +67,8 @@ private: const char *parsing_subset_version (const char *, const char *, unsigned *, unsigned *, bool, bool *); + const char *parse_base_ext (const char *); + const char *parse_std_ext (const char *); const char *parse_single_std_ext (const char *); -- 2.34.1
[PATCH 0/5] RISC-V: Relax the -march string for accept any order
Do you know how to build an ISA string with the following extensions?

- g
- c
- zba
- zbs
- svnapot
- zve64d
- zvl128b

Don't use trial and error with your gcc, and don't read the RISC-V ISA spec!

OK, I believe that's impossible for most people. Even though I have worked on RISC-V for many years and remember most of the rules of the canonical order, it's still hard for me to get the order right quickly...

So I think it's time to relax that for -march string inputs, since we have so many extensions today. We still keep the canonicalization within the compiler, because we need it to handle multilib and because it makes it easier to compare different ISA strings.

This patch series is broken into several parts:

1) Small refactor patch
2) Change the way of parsing the ISA string
3) Remove unused functions
4) Update test cases
5) Update documentation
RE: [PATCH]middle-end: rejects loops with nonlinear inductions and early breaks [PR113163]
> -Original Message- > From: Richard Biener > Sent: Monday, January 8, 2024 12:07 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: Re: [PATCH]middle-end: rejects loops with nonlinear inductions and > early > breaks [PR113163] > > On Fri, 29 Dec 2023, Tamar Christina wrote: > > > Hi All, > > > > We can't support nonlinear inductions other than neg when vectorizing > > early breaks and iteration count is known. > > > > For early break we currently require a peeled epilog but in these cases > > we can't compute the remaining values. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > tested on cross cc1 for amdgcn-amdhsa and issue fixed. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR middle-end/113163 > > * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): > > Misses sth. > > > gcc/testsuite/ChangeLog: > > > > PR middle-end/113163 > > * gcc.target/gcn/pr113163.c: New test. > > > > --- inline copy of patch -- > > diff --git a/gcc/testsuite/gcc.target/gcn/pr113163.c > b/gcc/testsuite/gcc.target/gcn/pr113163.c > > new file mode 100644 > > index > ..99b0fdbaf3a3152ca008b5 > 109abf6e80d8cb3d6a > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/gcn/pr113163.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-O2 -ftree-vectorize" } */ > > + > > +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; }; > > +static const char R64_ARRAY[] = > "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" > ; > > +char * > > +_l64a_r (struct _reent *rptr, > > + long value) > > +{ > > + char *ptr; > > + char *result; > > + int i, index; > > + unsigned long tmp = (unsigned long)value & 0x; > > + result = > > + (( > > + rptr > > + )->_new._reent._l64a_buf) > > + ; > > + ptr = result; > > + for (i = 0; i < 6; ++i) > > +{ > > + if (tmp == 0) > > + { > > + *ptr = '\0'; > > + break; > > + } > > + *ptr++ = 
R64_ARRAY[index]; > > + tmp >>= 6; > > +} > > +} > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > > index > 3810983a80c8b989be9fd9a9993642069fd39b99..f1bf43b3731868e7b053c18 > 6302fbeaf515be8cf 100644 > > --- a/gcc/tree-vect-loop-manip.cc > > +++ b/gcc/tree-vect-loop-manip.cc > > @@ -2075,6 +2075,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info > loop_vinfo, > >return false; > > } > > > > + /* We can't support partial vectors and early breaks with an induction > > + type other than add or neg since we require the epilog and can't > > + perform the peeling. PR113163. */ > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) > > + && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant () > > But why's that only for constant VF? We might never end up here > with variable VF but the check looks odd ... It's mirroring the condition in vect_gen_vector_loop_niters where we create step_vector which is not 1. This is the case which causes niters_vector_mult_vf_var to become a tree var instead. I'll update the comment to say this. Thanks, Tamar > > OK with that clarified and/or the test removed. > > Thanks, > Richard. > > > + && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) > > + && induction_type != vect_step_op_neg) > > +{ > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > +"Peeling for epilogue is not supported" > > +" for nonlinear induction except neg" > > +" when iteration count is known and early breaks.\n"); > > + return false; > > +} > > + > >return true; > > } > > > > > > > > > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
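For readers who have not met the term, the loop in the reduced test case combines an early break with an induction variable that is updated by a shift rather than by adding a constant. The sketch below shows the same loop shape in a complete, runnable form. It is not the test case itself: `encode_radix64` is a hypothetical name, and the `0xffffffff` mask and the `tmp & 63` index are assumptions filled in for illustration, since the reduced test leaves them out.

```cpp
#include <string>

// Encode the low 32 bits of `value` as up to six radix-64 digits, least
// significant digit first, stopping early once the remaining value is zero.
std::string encode_radix64(long value) {
  static const char digits[] =
      "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
  // Assumed mask: keep only the low 32 bits.
  unsigned long tmp = static_cast<unsigned long>(value) & 0xffffffffUL;
  std::string out;
  for (int i = 0; i < 6; ++i) {   // known iteration count
    if (tmp == 0)
      break;                      // early break
    out.push_back(digits[tmp & 63]);
    tmp >>= 6;                    // nonlinear induction: shift, not add
  }
  return out;
}
```

Here `i` is an ordinary linear induction, while `tmp` changes by a shift each iteration, so its value after a given number of iterations cannot be obtained by a simple multiply-and-add of the step.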
RE: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
> -Original Message- > From: Richard Biener > Sent: Monday, January 8, 2024 11:29 AM > To: gcc-patches@gcc.gnu.org > Cc: Tamar Christina > Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases > > The following avoids creating a niter peeling epilog more consistently, > matching what peeling later uses for the skip_vector condition, in > particular when versioning is required which then also ensures the > vector loop is entered unless the epilog is vectorized. This should > ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed > later, some refactoring could make that better matching. > > The patch also makes sure to adjust the upper bound of the epilogues > when we do not have a skip edge around the vector loop. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. Tamar, does > that look OK wrt early-breaks? Yeah, the value looks correct. I did find a few cases where the niters should actually be higher for skip_vector, namely when one of the breaks forces ncopies > 1 and we have a break condition that requires all values to be true to continue. The code is not wrong in that case; it just executes a completely useless vector iteration. But that's unrelated. This looks correct because it means bound_scalar is not set, in which case there's no difference between one and multiple exits. Thanks, Tamar > > Thanks, > Richard. > > PR tree-optimization/113026 > * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): > Avoid an epilog in more cases. > * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the > epilogues niter upper bounds and estimates. > > * gcc.dg/torture/pr113026-1.c: New testcase. > * gcc.dg/torture/pr113026-2.c: Likewise. 
> --- > gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 > gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 + > gcc/tree-vect-loop-manip.cc | 32 +++ > gcc/tree-vect-loop.cc | 6 - > 4 files changed, 66 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c > create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c > b/gcc/testsuite/gcc.dg/torture/pr113026-1.c > new file mode 100644 > index 000..56dfef3b36c > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-Wall" } */ > + > +char dst[16]; > + > +void > +foo (char *src, long n) > +{ > + for (long i = 0; i < n; i++) > +dst[i] = src[i]; /* { dg-bogus "" } */ > +} > diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c > b/gcc/testsuite/gcc.dg/torture/pr113026-2.c > new file mode 100644 > index 000..b9d5857a403 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-Wall" } */ > + > +char dst1[17]; > +void > +foo1 (char *src, long n) > +{ > + for (long i = 0; i < n; i++) > +dst1[i] = src[i]; /* { dg-bogus "" } */ > +} > + > +char dst2[18]; > +void > +foo2 (char *src, long n) > +{ > + for (long i = 0; i < n; i++) > +dst2[i] = src[i]; /* { dg-bogus "" } */ > +} > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index 9330183bfb9..927f76a0947 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count > (); > bb_before_epilog = loop_preheader_edge (epilog)->src; > } > + else > + { > + /* When we do not have a loop-around edge to the epilog we know > + the vector loop covered at least VF scalar iterations unless > + we have early breaks and the 
epilog will cover at most > + VF - 1 + gap peeling iterations. > + Update any known upper bound with this knowledge. */ > + if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) > + { > + if (epilog->any_upper_bound) > + epilog->nb_iterations_upper_bound -= lowest_vf; > + if (epilog->any_likely_upper_bound) > + epilog->nb_iterations_likely_upper_bound -= lowest_vf; > + if (epilog->any_estimate) > + epilog->nb_iterations_estimate -= lowest_vf; > + } > + unsigned HOST_WIDE_INT const_vf; > + if (vf.is_constant (_vf)) > + { > + const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1; > + if (epilog->any_upper_bound) > + epilog->nb_iterations_upper_bound > + = wi::umin (epilog->nb_iterations_upper_bound, const_vf); > + if (epilog->any_likely_upper_bound) > + epilog->nb_iterations_likely_upper_bound > + = wi::umin
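The bound update quoted above can be pictured with plain integers. What follows is only a sketch under stated assumptions: `struct loop_bounds` and `tighten_epilog_bound` are hypothetical names, not GCC's, and `unsigned long` stands in for the wide-int values the vectorizer actually uses. It mirrors the two steps of the quoted hunk (subtract VF when there are no early breaks, then clamp to VF - 1 + gap peeling) and nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the upper bound recorded on the epilogue loop.  */
struct loop_bounds
{
  bool any_upper_bound;
  unsigned long nb_iterations_upper_bound;
};

/* Unsigned minimum, playing the role of wi::umin in the quoted hunk.  */
unsigned long
umin (unsigned long a, unsigned long b)
{
  return a < b ? a : b;
}

/* Tighten EPILOG's bound given the vectorization factor VF, the number of
   iterations peeled for gaps, and whether the loop has early breaks.  */
void
tighten_epilog_bound (struct loop_bounds *epilog, unsigned long vf,
                      unsigned long peeling_for_gaps, bool early_breaks)
{
  if (!epilog->any_upper_bound)
    return;

  /* Without early breaks the vector loop is known to have covered at
     least VF scalar iterations, so they no longer count.  */
  if (!early_breaks)
    {
      /* Assumption of this sketch: the recorded bound is at least VF.  */
      assert (epilog->nb_iterations_upper_bound >= vf);
      epilog->nb_iterations_upper_bound -= vf;
    }

  /* The epilogue covers at most VF - 1 + gap-peeling iterations.  */
  assert (vf + peeling_for_gaps >= 1);
  epilog->nb_iterations_upper_bound
    = umin (epilog->nb_iterations_upper_bound, vf + peeling_for_gaps - 1);
}
```

For instance, a recorded bound of 40 with VF 16, no gap peeling and no early breaks first drops to 24 and is then clamped to 15.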
Re: [PATCH v4] aarch64: SVE/NEON Bridging intrinsics
On Mon, Dec 11, 2023 at 03:13:03PM +, Richard Ball wrote: > ACLE has added intrinsics to bridge between SVE and Neon. > > The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and > SVE vectors. > > This patch adds support to GCC for the following 3 intrinsics: > svset_neonq, svget_neonq and svdup_neonq This broke PCH on aarch64, see https://gcc.gnu.org/PR113270 Given that the tree pointers are no longer GC marked, bet it results in random crashes elsewhere too even when not using PCH. Jakub
[PATCH v5 1/1] RISC-V: Add support for XCVbi extension in CV32E40P
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md Contributors: Mary Bennett Nandni Jamnadas Pietra Ferreira Charlie Keaney Jessica Mills Craig Blackmore Simon Cook Jeremy Bennett Helene Chelin gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Create XCVbi extension support. * config/riscv/riscv.opt: Likewise. * config/riscv/corev.md: Implement cv_branch pattern for cv.beqimm and cv.bneimm. * config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V branch instruction pattern. * config/riscv/constraints.md: Implement constraints cv_bi_s5 - signed 5-bit immediate. * config/riscv/predicates.md: Implement predicate const_int5s_operand - signed 5 bit immediate. * doc/sourcebuild.texi: Add XCVbi documentation. gcc/testsuite/ChangeLog: * gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test. * gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test. * gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test. * gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test. * lib/target-supports.exp: Add proc for XCVbi. 
--- gcc/common/config/riscv/riscv-common.cc | 2 + gcc/config/riscv/constraints.md | 6 +++ gcc/config/riscv/corev.md | 37 ++ gcc/config/riscv/predicates.md| 4 ++ gcc/config/riscv/riscv.md | 2 +- gcc/config/riscv/riscv.opt| 2 + gcc/doc/sourcebuild.texi | 3 ++ .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++ .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++ .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++ .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++ gcc/testsuite/lib/target-supports.exp | 13 + 12 files changed, 198 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 0301d170a41..d61164a42b9 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -355,6 +355,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] = {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0}, {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0}, {"xcvelw", ISA_SPEC_CLASS_NONE, 1, 0}, + {"xcvbi", ISA_SPEC_CLASS_NONE, 1, 0}, {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0}, {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0}, @@ -1730,6 +1731,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = {"xcvmac",_options::x_riscv_xcv_subext, MASK_XCVMAC}, {"xcvalu",_options::x_riscv_xcv_subext, MASK_XCVALU}, {"xcvelw",_options::x_riscv_xcv_subext, MASK_XCVELW}, + {"xcvbi", _options::x_riscv_xcv_subext, MASK_XCVBI}, {"xtheadba", _options::x_riscv_xthead_subext, MASK_XTHEADBA}, {"xtheadbb", _options::x_riscv_xthead_subext, MASK_XTHEADBB}, diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index ee1c12b2e51..e4bfa227a2f 100644 --- 
a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -262,3 +262,9 @@ (and (match_code "const_int") (and (match_test "IN_RANGE (ival, 0, 1073741823)") (match_test "exact_log2 (ival + 1) != -1" + +(define_constraint "CV_bi_sign5" + "@internal + A 5-bit signed immediate for CORE-V Immediate Branch." + (and (match_code "const_int") + (match_test "IN_RANGE (ival, -16, 15)"))) diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md index adad2409fb6..66e0e998e41 100644 --- a/gcc/config/riscv/corev.md +++ b/gcc/config/riscv/corev.md @@ -706,3 +706,40 @@ [(set_attr "type" "load") (set_attr "mode" "SI")]) + +;; XCVBI Instructions +(define_insn "*cv_branch" + [(set (pc) + (if_then_else +(match_operator 1 "equality_operator" +[(match_operand:X 2 "register_operand" "r") + (match_operand:X 3 "const_int5s_operand" "CV_bi_sign5")]) +(label_ref (match_operand 0 "" "")) +(pc)))] + "TARGET_XCVBI" +{ + if (get_attr_length (insn) == 12) +return "cv.b%N1\t%2,%z3,1f; jump\t%l0,ra; 1:"; + + return "cv.b%C1imm\t%2,%3,%0"; +} + [(set_attr "type" "branch") + (set_attr "mode" "none")]) + +(define_insn "*branch" + [(set (pc) +(if_then_else + (match_operator 1 "ordered_comparison_operator" + [(match_operand:X 2 "register_operand" "r") + (match_operand:X 3 "reg_or_0_operand" "rJ")]) +
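The `CV_bi_sign5` constraint in the patch above accepts constants in the range [-16, 15], which is exactly what a 5-bit two's-complement field can hold. A minimal C sketch of that check, assuming nothing about the GCC internals: `fits_signed_imm5` and `fits_signed_field` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the IN_RANGE (ival, -16, 15) test.  */
bool
fits_signed_imm5 (long ival)
{
  return ival >= -16 && ival <= 15;
}

/* General form: does VALUE fit in a signed two's-complement field that is
   BITS wide?  For BITS == 5 this gives the same answer as above.  */
bool
fits_signed_field (long value, unsigned bits)
{
  /* Keep the shift well inside the width of long.  */
  assert (bits >= 1 && bits < 32);
  long lo = -(1L << (bits - 1));
  long hi = (1L << (bits - 1)) - 1;
  return value >= lo && value <= hi;
}
```

The general form makes the origin of the two magic numbers visible: -2^(5-1) and 2^(5-1) - 1.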
[PATCH v5 0/1] RISC-V: Support CORE-V XCVBI extension
Thank you for reviewing my patches and merging XCVelw. This patch series presents the comprehensive implementation of the BI extension for CORE-V. Tested with riscv-gnu-toolchain on binutils, ld, gas and gcc testsuites to ensure its correctness and compatibility with the existing codebase. However, your input, reviews, and suggestions are invaluable in making this extension even more robust. The CORE-V builtins are described in the specification [1] and work can be found in the OpenHW group's Github repository [2]. [1] github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md [2] github.com/openhwgroup/corev-gcc Contributors: Mary Bennett Nandni Jamnadas Pietra Ferreira Charlie Keaney Jessica Mills Craig Blackmore Simon Cook Jeremy Bennett Helene Chelin RISC-V: Add support for XCVbi extension in CV32E40P gcc/common/config/riscv/riscv-common.cc | 4 ++ gcc/config/riscv/constraints.md | 21 +--- gcc/config/riscv/corev.def| 3 ++ gcc/config/riscv/corev.md | 51 ++- gcc/config/riscv/predicates.md| 4 ++ gcc/config/riscv/riscv.md | 2 +- gcc/config/riscv/riscv.opt| 2 + gcc/doc/sourcebuild.texi | 3 ++ .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++ .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++ .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++ .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++ gcc/testsuite/lib/target-supports.exp | 13 + 12 files changed, 198 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c -- 2.34.1
Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a
Hi, >> Is there no benefit to using SWPPL for RELEASE here? Similarly for the >> others. > > We started off implementing all possible memory orderings available. > Wilco saw value in merging less restricted orderings into more > restricted ones - mainly to reduce codesize in less frequently used atomics. > > This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions > a little smaller. Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once the atomic is acquire, release or both. Given there is already a significant overhead due to the function call, PLT indirection and argument setup, it doesn't make sense to add extra taken branches that may mispredict or cause extra fetch cycles... The goal for next GCC is to inline these instructions directly to avoid these overheads. Cheers, Wilco
Re: Add -falign-all-functions
On Thu, 4 Jan 2024, Jan Hubicka wrote:

> Hi,
> this patch adds a new option -falign-all-functions which works like
> -falign-functions, but applies to all functions including those in cold
> regions.  As discussed in the PR log, this is needed for atomically
> patching function entries in the kernel.
>
> An option would be to make -falign-functions mandatory, but I think it is
> not a good idea, since the original purpose of -falign-functions is
> optimization of instruction decode and cache size.  Having
> -falign-all-functions is backwards compatible.  Richi also suggested
> extending the syntax of the -falign-functions parameters (which is already
> non-trivial) but it seems to me that having a separate flag is more
> readable.
>
> Bootstrapped/regtested x86_64-linux, OK for master and later
> backports to release branches?
>
> gcc/ChangeLog:
>
>	PR middle-end/88345
>	* common.opt: Add -falign-all-functions.
>	* doc/invoke.texi: Add -falign-all-functions.
>	(-falign-functions, -falign-labels, -falign-loops): Document
>	that alignment is ignored in cold code.
>	* flags.h (align_loops): Reindent.
>	(align_jumps): Reindent.
>	(align_labels): Reindent.
>	(align_functions): Reindent.
>	(align_all_functions): New macro.
>	* opts.cc (common_handle_option): Handle -falign-all-functions.
>	* toplev.cc (parse_alignment_opts): Likewise.
>	* varasm.cc (assemble_start_function): Likewise.
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d263a959df3..fea2c855fcf 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1033,6 +1033,13 @@ faggressive-loop-optimizations
>  Common Var(flag_aggressive_loop_optimizations) Optimization Init(1)
>  Aggressively optimize loops using language constraints.
>
> +falign-all-functions
> +Common Var(flag_align_all_functions) Optimization
> +Align the start of functions.

all functions or maybe "of every function."?

> + > +falign-all-functions= > +Common RejectNegative Joined Var(str_align_all_functions) Optimization > + > falign-functions > Common Var(flag_align_functions) Optimization > Align the start of functions. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index d272b9228dd..ad3d75d310c 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}. > @xref{Optimize Options,,Options that Control Optimization}. > @gccoptlist{-faggressive-loop-optimizations > -falign-functions[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} > +-falign-all-functions=[@var{n}] > -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} > -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} > -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} > @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two > greater than or > equal to @var{n}, skipping up to @var{m}-1 bytes. This ensures that at > least the first @var{m} bytes of the function can be fetched by the CPU > without crossing an @var{n}-byte alignment boundary. > +This is an optimization of code performance and alignment is ignored for > +functions considered cold. If alignment is required for all functions, > +use @option{-falign-all-functions}. > > If @var{m} is not specified, it defaults to @var{n}. > > @@ -14210,6 +14214,12 @@ overaligning functions. It attempts to instruct the > assembler to align > by the amount specified by @option{-falign-functions}, but not to > skip more bytes than the size of the function. > > +@opindex falign-all-functions=@var{n} > +@item -falign-all-functions > +Specify minimal alignment for function entry. Unlike > @option{-falign-functions} > +this alignment is applied also to all functions (even those considered cold). > +The alignment is also not affected by @option{-flimit-function-alignment} > + For functions with two entries (like on powerpc), which entry does this apply to? 
I suppose the external ABI entry, not the local one? But how does this then help to align the patchable entry (the common local entry should be aligned?). Should we align _both_ entries? > @opindex falign-labels > @item -falign-labels > @itemx -falign-labels=@var{n} > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}. > Align loops to a power-of-two boundary. If the loops are executed > many times, this makes up for any execution of the dummy padding > instructions. > +This is an optimization of code performance and alignment is ignored for > +loops considered cold. > > If @option{-falign-labels} is greater than this value, then its value > is used instead. > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}. > Align branch targets to a power-of-two boundary, for branch targets > where the targets can only be reached by jumping. In this case, > no dummy operations need be executed. > +This is an optimization of code performance and alignment is ignored for > +jumps
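The quoted documentation describes alignment as "the next power-of-two greater than or equal to n, skipping up to m-1 bytes". The arithmetic behind that sentence can be sketched in C as below. This is an illustration only: `next_pow2`, `padding_for` and `place_function` are hypothetical names, and the "leave the address alone if too much padding would be needed" rule is my reading of the max-skip behaviour, not code taken from GCC or the assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is greater than or equal to N.  */
uint64_t
next_pow2 (uint64_t n)
{
  assert (n >= 1 && n <= (UINT64_C (1) << 62));
  uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Number of padding bytes needed to bring ADDR up to the next multiple of
   ALIGN, where ALIGN is a power of two.  Zero if already aligned.  */
uint64_t
padding_for (uint64_t addr, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (align - (addr & (align - 1))) & (align - 1);
}

/* Where a function that would otherwise start at ADDR ends up when
   alignment N is requested and at most MAX_SKIP bytes may be skipped.
   If more padding than MAX_SKIP would be needed, ADDR is left unchanged.  */
uint64_t
place_function (uint64_t addr, uint64_t n, uint64_t max_skip)
{
  uint64_t pad = padding_for (addr, next_pow2 (n));
  return pad <= max_skip ? addr + pad : addr;
}
```

With a request of 24 the effective alignment is 32; a function at 0x1004 asked to align to 16 moves to 0x1010 when up to 15 bytes may be skipped, and stays at 0x1004 when only 7 may be.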
[PATCH][frontend]: don't ice with pragma NOVECTOR if loop in C has no condition [PR113267]
Hi All,

In C you can have loops without a condition.  The original version of the
patch rejected the use of #pragma GCC novector on such loops; during review
it was changed not to do this, on the grounds that we didn't want to give a
compile error in such cases.

However, because annotations seem to be allowed only on conditions (unless
I'm mistaken?), the attached example ICEs because there's no condition.

This will have it ignore the pragma instead of ICEing.  I don't know if this
is the best solution, but as far as I can tell we can't attach the annotation
to anything else.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c/ChangeLog:

	PR c/113267
	* c-parser.cc (c_parser_for_statement): Skip the pragma if no cond.

gcc/testsuite/ChangeLog:

	PR c/113267
	* gcc.dg/pr113267.c: New test.

--- inline copy of patch --
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c3724304580cf54f52655e10d2697c68966b9a17..e8300cea8ef7cedead5871e40c2a9ba5333bf839 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -8442,7 +8442,7 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 		   build_int_cst (integer_type_node,
 				  annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node, unroll));
-  if (novector && cond != error_mark_node)
+  if (novector && cond && cond != error_mark_node)
     cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node,
 				  annot_expr_no_vector_kind),
diff --git a/gcc/testsuite/gcc.dg/pr113267.c b/gcc/testsuite/gcc.dg/pr113267.c
new file mode 100644
index ..8b6fa08324eb12ad6493291cca8e80bd3a072ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113267.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+
+void f (char *a, int i)
+{
+#pragma GCC novector
+  for (;;i++)
+    a[i] *= 2;
+}
Re: [PATCH] lower-bitint: Fix up lowering of huge _BitInt 0 PHI args [PR113120]
On Thu, 4 Jan 2024, Jakub Jelinek wrote: > Hi! > > The PHI argument expansion of INTEGER_CSTs where bitint_min_cst_precision > returns significantly smaller precision than the PHI result precision is > optimized by loading the much smaller constant (if any) from memory and > then either setting the remaining limbs to {} or calling memset with -1. > The case where no constant is loaded (i.e. c == NULL) is when the > INTEGER_CST is 0 or all_ones - in that case we can just set all the limbs > to {} or call memset with -1 on everything. > While for the all ones extension case that is what the code was already > doing, I missed one spot in the zero extension case, where constricting > the offset of the MEM_REF lhs of the = {} store it was using unconditionally > the byte size of c, which obviously doesn't work if c is NULL. In that case > we want to use zero offset. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK. Richard. > 2024-01-04 Jakub Jelinek > > PR tree-optimization/113120 > * gimple-lower-bitint.cc (gimple_lower_bitint): Fix handling of very > large _BitInt zero INTEGER_CST PHI argument. > > * gcc.dg/bitint-62.c: New test. 
> > --- gcc/gimple-lower-bitint.cc.jj 2024-01-03 11:51:27.0 +0100 > +++ gcc/gimple-lower-bitint.cc2024-01-03 13:53:30.699328045 +0100 > @@ -6582,8 +6582,12 @@ gimple_lower_bitint (void) > = build_array_type_nelts (large_huge.m_limb_type, > nelts); > tree ptype = build_pointer_type (TREE_TYPE (v1)); > - tree off = fold_convert (ptype, > -TYPE_SIZE_UNIT (TREE_TYPE (c))); > + tree off; > + if (c) > + off = fold_convert (ptype, > + TYPE_SIZE_UNIT (TREE_TYPE (c))); > + else > + off = build_zero_cst (ptype); > tree vd = build2 (MEM_REF, vtype, > build_fold_addr_expr (v1), off); > g = gimple_build_assign (vd, build_zero_cst (vtype)); > --- gcc/testsuite/gcc.dg/bitint-62.c.jj 2024-01-03 14:11:22.332301884 > +0100 > +++ gcc/testsuite/gcc.dg/bitint-62.c 2024-01-03 14:10:58.219640178 +0100 > @@ -0,0 +1,32 @@ > +/* PR tree-optimization/113120 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-std=c23 -O2" } */ > + > +_BitInt(8) a; > +_BitInt(55) b; > + > +#if __BITINT_MAXWIDTH__ >= 401 > +static __attribute__((noinline, noclone)) void > +foo (unsigned _BitInt(1) c, _BitInt(401) d) > +{ > + c /= d << b; > + a = c; > +} > + > +void > +bar (void) > +{ > + foo (1, 4); > +} > +#endif > + > +#if __BITINT_MAXWIDTH__ >= 6928 > +_BitInt(6928) > +baz (int x, _BitInt(6928) y) > +{ > + if (x) > +return y; > + else > +return 0; > +} > +#endif > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] lower-bitint: Punt .*_OVERFLOW optimization if cast from IMAGPART_EXPR appears before REALPART_EXPR [PR113119]
On Thu, 4 Jan 2024, Jakub Jelinek wrote: > Hi! > > _BitInt lowering for .{ADD,SUB,MUL}_OVERFLOW calls which have both > REALPART_EXPR and IMAGPART_EXPR used and have a cast from the IMAGPART_EXPR > to a boolean or normal integral type lowers them at the point of > the REALPART_EXPR statement (which is especially needed if the lhs of > the call is complex with large/huge _BitInt element type); we emit the > stmt to set the lhs of the cast at the same spot as well. > Normally, the lowering of __builtin_{add,sub,mul}_overflow arranges > the REALPART_EXPR to come before IMAGPART_EXPR, followed by cast from that, > but as the testcase shows, a redundant __builtin_*_overflow call and VN > can reorder those and we then ICE because the def-stmt of the former cast > from IMAGPART_EXPR may appear after its uses. > We already check that all of REALPART_EXPR, IMAGPART_EXPR and the cast > from the latter appear in the same bb as the .{ADD,SUB,MUL}_OVERFLOW call > in the optimization, the following patch just extends it to make sure > cast appears after REALPART_EXPR; if not, we punt on the optimization and > expand it as a store of a complex _BitInt on the location of the ifn call. > Only the testcase in the testsuite is changed by the patch, all other > __builtin_*_overflow* calls in the bitint* tests (and there are quite a few) > have REALPART_EXPR first. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2024-01-04 Jakub Jelinek > > PR tree-optimization/113119 > * gimple-lower-bitint.cc (optimizable_arith_overflow): Punt if > both REALPART_EXPR and cast from IMAGPART_EXPR appear, but cast > is before REALPART_EXPR. > > * gcc.dg/bitint-61.c: New test. 
> > --- gcc/gimple-lower-bitint.cc.jj 2023-12-22 12:27:58.497437164 +0100 > +++ gcc/gimple-lower-bitint.cc2023-12-23 10:44:05.586522553 +0100 > @@ -305,6 +305,7 @@ optimizable_arith_overflow (gimple *stmt >imm_use_iterator ui; >use_operand_p use_p; >int seen = 0; > + gimple *realpart = NULL, *cast = NULL; >FOR_EACH_IMM_USE_FAST (use_p, ui, lhs) > { >gimple *g = USE_STMT (use_p); > @@ -317,6 +318,7 @@ optimizable_arith_overflow (gimple *stmt > if ((seen & 1) != 0) > return 0; > seen |= 1; > + realpart = g; > } >else if (gimple_assign_rhs_code (g) == IMAGPART_EXPR) > { > @@ -338,13 +340,35 @@ optimizable_arith_overflow (gimple *stmt > if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs2)) > || TREE_CODE (TREE_TYPE (lhs2)) == BITINT_TYPE) > return 0; > + cast = use_stmt; > } >else > return 0; > } >if ((seen & 2) == 0) > return 0; > - return seen == 3 ? 2 : 1; > + if (seen == 3) > +{ > + /* Punt if the cast stmt appears before realpart stmt, because > + if both appear, the lowering wants to emit all the code > + at the location of realpart stmt. */ > + gimple_stmt_iterator gsi = gsi_for_stmt (realpart); > + unsigned int cnt = 0; > + do > + { > + gsi_prev_nondebug (); > + if (gsi_end_p (gsi) || gsi_stmt (gsi) == cast) > + return 0; > + if (gsi_stmt (gsi) == stmt) > + return 2; > + /* If realpart is too far from stmt, punt as well. > + Usually it will appear right after it. 
*/ > + if (++cnt == 32) > + return 0; > + } > + while (1); > +} > + return 1; > } > > /* If STMT is some kind of comparison (GIMPLE_COND, comparison assignment) > --- gcc/testsuite/gcc.dg/bitint-61.c.jj 2023-12-23 10:46:17.808658852 > +0100 > +++ gcc/testsuite/gcc.dg/bitint-61.c 2023-12-23 10:46:02.482874865 +0100 > @@ -0,0 +1,17 @@ > +/* PR tree-optimization/113119 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-std=c23 -O2" } */ > + > +_BitInt(8) b; > +_Bool c; > +#if __BITINT_MAXWIDTH__ >= 8445 > +_BitInt(8445) a; > + > +void > +foo (_BitInt(4058) d) > +{ > + c = __builtin_add_overflow (a, 0ULL, ); > + __builtin_add_overflow (a, 0ULL, ); > + b = d; > +} > +#endif > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
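For readers less familiar with the builtins discussed in this thread: `__builtin_add_overflow` produces two results, the wrapped sum and an overflow flag. Inside GCC these correspond to the REALPART_EXPR and IMAGPART_EXPR of the internal call's complex result, and the cast mentioned above is the conversion of the flag to a boolean. A small sketch with ordinary `int` operands instead of `_BitInt`; `struct add_result`, `checked_add` and `add_no_overflow` are hypothetical names used only here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Both outputs of an overflow-checked addition, side by side.  */
struct add_result
{
  int sum;          /* the wrapped result (the "real part")  */
  bool overflowed;  /* the overflow flag (the "imaginary part")  */
};

/* Add A and B, reporting the wrapped sum and whether overflow occurred.  */
struct add_result
checked_add (int a, int b)
{
  struct add_result r;
  r.overflowed = __builtin_add_overflow (a, b, &r.sum);
  return r;
}

/* Add two values that the caller promises cannot overflow.  */
int
add_no_overflow (int a, int b)
{
  struct add_result r = checked_add (a, b);
  assert (!r.overflowed);
  return r.sum;
}
```

A caller that only looks at `overflowed` uses just the flag, one that only looks at `sum` uses just the value; the PR concerns the case where both are used and the flag's cast ends up before the value's extraction.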
Re: [PATCH]middle-end: check if target can do extract first for early breaks [PR113199]
On Tue, 2 Jan 2024, Tamar Christina wrote: > Hi All, > > I was generating the vector reverse mask without checking if the target > actually supported such an operation. > > It also seems like more targets implement VEC_EXTRACT than permute on mask > registers. > > So this adds a check for IFN_VEC_EXTRACT support when required and changes > the select first code to use it. > > This is good for now since masks always come from whilelo. But in the future > when masks can come from other sources we will need the old code back. > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > and no issues with --enable-checking=release --enable-lto > --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra. > tested on cross cc1 for amdgcn-amdhsa and issue fixed. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/113199 > * tree-vect-loop.cc (vectorizable_live_operation_1): Use > IFN_VEC_EXTRACT. > (vectorizable_live_operation): Check for IFN_VEC_EXTRACT support. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/113199 > * gcc.target/gcn/pr113199.c: New test. 
> > --- inline copy of patch -- > diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c > b/gcc/testsuite/gcc.target/gcn/pr113199.c > new file mode 100644 > index > ..8a641e5536e80e207ca0163cac66c0f4f6ca93f7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/gcn/pr113199.c > @@ -0,0 +1,44 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2" } */ > + > +typedef long unsigned int size_t; > +typedef int wchar_t; > +struct tm > +{ > + int tm_mon; > + int tm_year; > +}; > +int abs (int); > +struct lc_time_T { const char *month[12]; }; > +struct __locale_t * __get_current_locale (void) { } > +const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { } > +const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { > return buf; } > +size_t > +__strftime (wchar_t *s, size_t maxsize, const wchar_t *format, > + const struct tm *tim_p, struct __locale_t *locale) > +{ > + size_t count = 0; > + const wchar_t *ctloc; > + wchar_t ctlocbuf[256]; > + size_t i, ctloclen; > + const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale); > +{ > + switch (*format) > + { > + case L'B': > + (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], > )); > + for (i = 0; i < ctloclen; i++) > + { > + if (count < maxsize - 1) > + s[count++] = ctloc[i]; > + else > + return 0; > + { > + int century = tim_p->tm_year >= 0 > +? tim_p->tm_year / 100 + 1900 / 100 > +: abs (tim_p->tm_year + 1900) / 100; > + } > + } > + } > +} > +} > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index > 37f1be1101ffae779214056a0886411e0683e887..5aa92e67444e7aacf458fffa1428f1983c482374 > 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -10648,36 +10648,18 @@ vectorizable_live_operation_1 (loop_vec_info > loop_vinfo, > _VINFO_MASKS (loop_vinfo), > 1, vectype, 0); >tree scalar_res; > + gimple_seq_add_seq (, tem); > >/* For an inverted control flow with early breaks we want EXTRACT_FIRST > - instead of EXTRACT_LAST. 
Emulate by reversing the vector and mask. */ > + instead of EXTRACT_LAST. For now since the mask always comes from a > + WHILELO we can get the first element ignoring the mask since CLZ of the > + mask will always be zero. */ >if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) > - { > - /* First create the permuted mask. */ > - tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask)); > - tree perm_dest = copy_ssa_name (mask); > - gimple *perm_stmt > - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask, > -mask, perm_mask); > - vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt, > -); > - mask = perm_dest; > - > - /* Then permute the vector contents. */ > - tree perm_elem = perm_mask_for_reverse (vectype); > - perm_dest = copy_ssa_name (vec_lhs_phi); > - perm_stmt > - = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi, > -vec_lhs_phi, perm_elem); > - vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt, > -); > - vec_lhs_phi = perm_dest; > - } > - > - gimple_seq_add_seq (, tem); > - > - scalar_res = gimple_build (, CFN_EXTRACT_LAST, scalar_type, > - mask, vec_lhs_phi); > + scalar_res = gimple_build (, CFN_VEC_EXTRACT, TREE_TYPE (vectype), > +vec_lhs_phi, bitstart); So bitstart is always zero? I
Re: [PATCH]middle-end: maintain LCSSA form when peeled vector iterations have virtual operands
On Fri, 29 Dec 2023, Tamar Christina wrote:

> Hi All,
>
> This patch fixes several interconnected issues.
>
> 1. When picking an exit we wanted to check for niter_desc.may_be_zero not
>    being true, i.e. we want to pick an exit which we know will iterate at
>    least once.  However niter_desc.may_be_zero is not a boolean.  It is a
>    tree that encodes a boolean value.  !niter_desc.may_be_zero just checks
>    whether we have some information, not what the information is.  This
>    leads us to pick a more difficult to vectorize exit more often than we
>    should.
>
> 2. Because we had this bug, we used to pick an alternative exit much more
>    often, which exposed one issue: when the loop accesses memory and we
>    "invert it" we would corrupt the VUSE chain.  This is because on a
>    peeled vector iteration every exit restarts the loop (i.e. they're all
>    early) BUT since we may have performed a store, the VUSE would need to
>    be updated.  This version maintains virtual PHIs correctly in these
>    cases.  Note that we can't simply remove all of them and recreate them
>    because we need the PHI nodes still in the right order for if
>    skip_vector.
>
> 3. Since we're moving the stores to a safe location I don't think we
>    actually need to analyze whether the store is in range of the memref,
>    because if we ever get there, we know that the loads must be in range,
>    and if the loads are in range and we get to the store we know the early
>    breaks were not taken and so the scalar loop would have done the VF
>    stores too.
>
> 4. Instead of searching for where to move stores to, they should always be
>    in the exit belonging to the latch.  We can only ever delay stores and
>    even if we pick a different exit than the latch one as the main one,
>    effects still happen in program order when vectorized.  If we don't
>    move the stores to the latch exit but instead to whichever exit we pick
>    as the "main" exit then we can perform incorrect memory accesses
>    (luckily these are trapped by verify_ssa).
>
> 5. We only used to analyze loads inside the same BB as an early break, and
>    also we'd never analyze the ones inside the block where we'd be moving
>    memory references to.  This is obviously bogus and to fix it this patch
>    splits apart the two constraints.  We first validate that all load
>    memory references are in bounds and only after that do we perform the
>    alias checks for the writes.  This makes the code simpler to understand
>    and more trivially correct.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues with --enable-checking=release --enable-lto
> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>	PR tree-optimization/113137
>	PR tree-optimization/113136
>	PR tree-optimization/113172
>	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>	(vect_do_peeling): Maintain virtual PHIs on inverted loops.
>	* tree-vect-loop.cc (vec_init_loop_exit_info): Pick exit closest to
>	latch.
>	(vect_create_loop_vinfo): Record all conds instead of only alt ones.
>	* tree-vectorizer.h: Fix comment.
>
> gcc/testsuite/ChangeLog:
>
>	PR tree-optimization/113137
>	PR tree-optimization/113136
>	PR tree-optimization/113172
>	* g++.dg/vect/vect-early-break_4-pr113137.cc: New test.
>	* g++.dg/vect/vect-early-break_5-pr113137.cc: New test.
>	* gcc.dg/vect/vect-early-break_95-pr113137.c: New test.
>	* gcc.dg/vect/vect-early-break_96-pr113136.c: New test.
>	* gcc.dg/vect/vect-early-break_97-pr113172.c: New test.
> > --- inline copy of patch -- > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > new file mode 100644 > index > ..f78db8669dcc65f1b45ea78f4433d175e1138332 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_4-pr113137.cc > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-add-options vect_early_break } */ > +/* { dg-require-effective-target vect_early_break } */ > +/* { dg-require-effective-target vect_int } */ > + > +int b; > +void a() __attribute__((__noreturn__)); > +void c() { > + char *buf; > + int bufsz = 64; > + while (b) { > +!bufsz ? a(), 0 : *buf++ = bufsz--; > +b -= 4; > + } > +} > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc > b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc > new file mode 100644 > index > ..dcd19fa2d2145e09de18279479b3f20fc27336ba > --- /dev/null > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_5-pr113137.cc > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-add-options
Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
On 1/5/24 11:10, Richard Sandiford wrote:

Victor Do Nascimento writes:

The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to a descriptive feature name macro, for example:

  #define LSE2 _i1

and reconstructs function names with the pre-processor's token concatenation feature, such that for `MACRO(_i)', we would now have `MACRO_FEAT(name, feature)' and in the macro definition body we replace `name` with `name##feature`.

FWIW, another way of doing this would be to have:

  #define CORE(NAME) NAME
  #define LSE2(NAME) NAME##_i1

and use feature(name) instead of name##feature. This has the slight advantage of not using ## on empty tokens, and the maybe slightly better advantage of not needing the extra forwarding step in:

  #define ENTRY_FEAT(name, feat) \
    ENTRY_FEAT1(name, feat)

  #define ENTRY_FEAT1(name, feat) \

WDYT?

Richard

While from a strictly stylistic point of view, I'm not so keen on the resulting interface and its 'function call within a function call' look, e.g.

  ENTRY (LSE2 (libat_compare_exchange_16))

and

  ALIAS (LSE128 (libat_compare_exchange_16), \
         LSE2 (libat_compare_exchange_16))

on the implementation-side of things, I like the benefits this brings about. Namely allowing the use of the unaltered original implementations of the ENTRY, END and ALIAS macros with the aforementioned advantages of not having to use ## on empty tokens and abolishing the need for the extra forwarding step.

I'm happy enough to go with this approach.

Cheers

Consequently, for base functionality, where the ifunc suffix is absent, the macro interface remains the same.
For example, the entry and endpoints of `libat_store_16' remain defined by: - ENTRY (libat_store_16) and - END (libat_store_16) For the LSE2 implementation of the same 16-byte atomic store, we now have: - ENTRY_FEAT (libat_store_16, LSE2) and - END_FEAT (libat_store_16, LSE2) For the alising of ifunc names, we define the following new implementation of the ALIAS macro: - ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX) Defining the base feature name macro to map `CORE' to the empty string, mapping LSE2 to the base implementation, we'd alias the LSE2 `libat_exchange_16' to it base implementation with: - ALIAS (libat_exchange_16, LSE2, CORE) libatomic/ChangeLog: * config/linux/aarch64/atomic_16.S (CORE): New macro. (LSE2): Likewise. (ENTRY_FEAT): Likewise. (END_FEAT): Likewise. (ENTRY_FEAT1): Likewise. (END_FEAT1): Likewise. (ALIAS): Modify macro to take in `arch' arguments. --- libatomic/config/linux/aarch64/atomic_16.S | 83 +- 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S index a099037179b..eb8e749b8a2 100644 --- a/libatomic/config/linux/aarch64/atomic_16.S +++ b/libatomic/config/linux/aarch64/atomic_16.S @@ -40,22 +40,38 @@ .arch armv8-a+lse -#define ENTRY(name) \ - .global name; \ - .hidden name; \ - .type name,%function; \ - .p2align 4; \ -name: \ - .cfi_startproc; \ +#define ENTRY(name) ENTRY_FEAT (name, CORE) + +#define ENTRY_FEAT(name, feat) \ + ENTRY_FEAT1(name, feat) + +#define ENTRY_FEAT1(name, feat)\ + .global name##feat; \ + .hidden name##feat; \ + .type name##feat,%function; \ + .p2align 4; \ +name##feat:\ + .cfi_startproc; \ hint34 // bti c -#define END(name) \ - .cfi_endproc; \ - .size name, .-name; +#define END(name) END_FEAT (name, CORE) -#define ALIAS(alias,name) \ - .global alias; \ - .set alias, name; +#define END_FEAT(name, feat) \ + END_FEAT1(name, feat) + +#define END_FEAT1(name, feat) \ + .cfi_endproc; \ + .size name##feat, .-name##feat; + 
+#define ALIAS(alias, from, to) \ + ALIAS1(alias,from,to) + +#define ALIAS1(alias, from, to)\ + .global alias##from;\ + .set alias##from, alias##to; + +#define CORE +#define LSE2 _i1 #define res0 x0 #define res1 x1 @@ -108,7 +124,7 @@ ENTRY (libat_load_16) END (libat_load_16) -ENTRY (libat_load_16_i1) +ENTRY_FEAT (libat_load_16, LSE2) cbnzw1, 1f /* RELAXED. */ @@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1) ldp res0, res1,
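[Editorial sketch, not part of the thread.] The two macro styles discussed in this thread can be compared in plain C. This is a minimal sketch under stated assumptions, not the atomic_16.S code: STR, STR1, NAME_FEAT, NAME_FEAT1, CORE_FN, LSE2_FN and the style_* functions are hypothetical names introduced only for illustration, and the _FN suffix exists only so both styles can live in one file. It stringifies the generated names so they can be inspected, and it also shows why the posted patch needs the forwarding step: a parameter that is an operand of ## is not macro-expanded.

```c
#include <assert.h>
#include <string.h>

/* Turn a fully macro-expanded token sequence into a string literal.  */
#define STR1(x) #x
#define STR(x) STR1 (x)

/* Style A (the patch as posted): object-like suffix macros plus ##.
   NAME_FEAT forwards to NAME_FEAT1 so that `feat' is expanded before
   it becomes an operand of ##.  */
#define CORE
#define LSE2 _i1
#define NAME_FEAT(name, feat) NAME_FEAT1 (name, feat)
#define NAME_FEAT1(name, feat) name##feat

/* Style B (the review suggestion): function-like feature macros.
   Hypothetical _FN names, see the lead-in.  */
#define CORE_FN(NAME) NAME
#define LSE2_FN(NAME) NAME##_i1

const char *style_a_core (void) { return STR (NAME_FEAT (libat_store_16, CORE)); }
const char *style_a_lse2 (void) { return STR (NAME_FEAT (libat_store_16, LSE2)); }
const char *style_b_core (void) { return STR (CORE_FN (libat_store_16)); }
const char *style_b_lse2 (void) { return STR (LSE2_FN (libat_store_16)); }

/* Skipping the forwarding step pastes the unexpanded token `LSE2'.  */
const char *style_a_no_forward (void) { return STR (NAME_FEAT1 (libat_store_16, LSE2)); }

/* Both styles are expected to yield the same symbol names.  */
void
check_styles_agree (void)
{
  assert (strcmp (style_a_core (), style_b_core ()) == 0);
  assert (strcmp (style_a_lse2 (), style_b_lse2 ()) == 0);
}
```

In style A the CORE case pastes an empty argument, which C99 permits but which is the "## on empty tokens" wrinkle mentioned in the review; style B avoids both that and the extra forwarding macro.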
Re: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]
On Fri, 29 Dec 2023, Tamar Christina wrote: > Hi All, > > Only trying to update certain dominators doesn't seem to work very well > because as the loop gets versioned, peeled, or skip_vector then we end up with > very complicated control flow. This means that the final merge blocks for the > loop exit are not easy to find or update. > > Instead of trying to pick which exits to update, this changes it to update all > the blocks reachable by the new exits. This is because they'll contain common > blocks with e.g. the versioned loop. It's these blocks that need an update > most of the time. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? This makes it quadratic in the number of vectorized early exit loops in a function. The vectorizer CFG manipulation operates in a local enough bubble that programmatic updating of dominators should be possible (after all we manage to produce correct SSA form!), the proposed change gets us too far off to a point where re-computating dominance info is likely cheaper (but no, we shouldn't do this either). Can you instead give manual updating a try again? I think versioning should produce up-to-date dominator info, it's only when you redirect branches during peeling that you'd need adjustments - but IIRC we're never introducing new merges? IIRC we can't wipe dominators during transform since we query them during code generation. We possibly could code generate all CFG manipulations of all vectorized loops, recompute all dominators and then do code generation of all vectorized loops. But then we're doing a loop transform and the exits will ultimatively end up in the same place, so the CFG and dominator update is bound to where the original exits went to. Richard > Thanks, > Tamar > > gcc/ChangeLog: > > PR middle-end/113144 > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): > Update all dominators reachable from exit. 
> > gcc/testsuite/ChangeLog: > > PR middle-end/113144 > * gcc.dg/vect/vect-early-break_94-pr113144.c: New test. > > --- inline copy of patch -- > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c > b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c > new file mode 100644 > index > ..903fe7be6621e81db6f29441e4309fa213d027c5 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c > @@ -0,0 +1,41 @@ > +/* { dg-do compile } */ > +/* { dg-add-options vect_early_break } */ > +/* { dg-require-effective-target vect_early_break } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ > + > +long tar_atol256_max, tar_atol256_size, tar_atosl_min; > +char tar_atol256_s; > +void __errno_location(); > + > + > +inline static long tar_atol256(long min) { > + char c; > + int sign; > + c = tar_atol256_s; > + sign = c; > + while (tar_atol256_size) { > +if (c != sign) > + return sign ? min : tar_atol256_max; > +c = tar_atol256_size--; > + } > + if ((c & 128) != (sign & 128)) > +return sign ? min : tar_atol256_max; > + return 0; > +} > + > +inline static long tar_atol(long min) { > + return tar_atol256(min); > +} > + > +long tar_atosl() { > + long n = tar_atol(-1); > + if (tar_atosl_min) { > +__errno_location(); > +return 0; > + } > + if (n > 0) > +return 0; > + return n; > +} > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index > 1066ea17c5674e03412b3dcd8a62ddf4dd54cf31..3810983a80c8b989be9fd9a9993642069fd39b99 > 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -1716,8 +1716,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop > *loop, edge loop_exit, > /* Now link the alternative exits. 
*/ > if (multiple_exits_p) > { > - set_immediate_dominator (CDI_DOMINATORS, new_preheader, > -main_loop_exit_block); > for (auto gsi_from = gsi_start_phis (loop->header), > gsi_to = gsi_start_phis (new_preheader); > !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to); > @@ -1751,12 +1749,26 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop > *loop, edge loop_exit, > >/* Finally after wiring the new epilogue we need to update its main > exit >to the original function exit we recorded. Other exits are already > - correct. */ > + correct. Because of versioning, skip vectors and others we must update > + the dominators of every node reachable by the new exits. */ >if (multiple_exits_p) > { > update_loop = new_loop; > - for (edge e : get_loop_exit_edges (loop)) > - doms.safe_push (e->dest); > + hash_set visited; > + auto_vec workset; > + edge ev; > + edge_iterator ei; > + workset.safe_splice
Re: [PATCH]middle-end: rejects loops with nonlinear inductions and early breaks [PR113163]
On Fri, 29 Dec 2023, Tamar Christina wrote: > Hi All, > > We can't support nonlinear inductions other than neg when vectorizing > early breaks and iteration count is known. > > For early break we currently require a peeled epilog but in these cases > we can't compute the remaining values. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > tested on cross cc1 for amdgcn-amdhsa and issue fixed. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR middle-end/113163 > * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Misses sth. > gcc/testsuite/ChangeLog: > > PR middle-end/113163 > * gcc.target/gcn/pr113163.c: New test. > > --- inline copy of patch -- > diff --git a/gcc/testsuite/gcc.target/gcn/pr113163.c > b/gcc/testsuite/gcc.target/gcn/pr113163.c > new file mode 100644 > index > ..99b0fdbaf3a3152ca008b5109abf6e80d8cb3d6a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/gcn/pr113163.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -ftree-vectorize" } */ > + > +struct _reent { union { struct { char _l64a_buf[8]; } _reent; } _new; }; > +static const char R64_ARRAY[] = > "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; > +char * > +_l64a_r (struct _reent *rptr, > + long value) > +{ > + char *ptr; > + char *result; > + int i, index; > + unsigned long tmp = (unsigned long)value & 0x; > + result = > + (( > + rptr > + )->_new._reent._l64a_buf) > + ; > + ptr = result; > + for (i = 0; i < 6; ++i) > +{ > + if (tmp == 0) > + { > + *ptr = '\0'; > + break; > + } > + *ptr++ = R64_ARRAY[index]; > + tmp >>= 6; > +} > +} > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index > 3810983a80c8b989be9fd9a9993642069fd39b99..f1bf43b3731868e7b053c186302fbeaf515be8cf > 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -2075,6 +2075,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo, >return false; > } > > + /* We can't support partial 
vectors and early breaks with an induction > + type other than add or neg since we require the epilog and can't > + perform the peeling. PR113163. */ > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) > + && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant () But why's that only for constant VF? We might never end up here with variable VF but the check looks odd ... OK with that clarified and/or the test removed. Thanks, Richard. > + && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) > + && induction_type != vect_step_op_neg) > +{ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "Peeling for epilogue is not supported" > + " for nonlinear induction except neg" > + " when iteration count is known and early breaks.\n"); > + return false; > +} > + >return true; > } > > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH v2] c++/modules: Differentiate extern templates and TYPE_DECL_SUPPRESS_DEBUG [PR112820]
On Mon, Jan 8, 2024 at 10:58 AM Nathaniel Shead wrote: > > On Thu, Jan 04, 2024 at 03:39:15PM -0500, Patrick Palka wrote: > > On Sun, 3 Dec 2023, Nathaniel Shead wrote: > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? > > > > > > -- >8 -- > > > > > > The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same > > > underlying bit. This is causing confusion when attempting to determine > > > the interface for a streamed-in class type, since the modules code > > > currently assumes that all DECL_EXTERNAL types are extern templates. > > > However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence > > > DECL_EXTERNAL) is marked on various other kinds of declarations, such as > > > vtables, which causes them to never be emitted. > > > > Good catch.. Maybe we should use different bits for these flags? I > > wouldn't be > > surprised if this bit sharing causes issues elsewhere in the compiler. The > > documentation in tree.h / tree-core.h says DECL_EXTERNAL is only valid for > > VAR_DECL and FUNCTION_DECL, so at one point it was safe to share the same > > bit > > but that's not true anymore it seems. > > > > Looking at tree-core.h:tree_decl_common luckily we have plenty of spare > > bits. > > We could also e.g. make TYPE_DECL_SUPPRESS_DEBUG use the decl_not_flexarray > > bit > > which is otherwise only used for FIELD_DECL. > > > > That seems like a good idea, thanks. How does this look? > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? OK if C++ folks are fine. Richard. > -- >8 -- > > Currently, DECL_EXTERNAL and TYPE_DECL_SUPPRESS_DEBUG share a bit. This > causes issues with module code, which then incorrectly assumes that > anything with suppressed debug info (such as vtables when '-g' is > specified) is an extern template and thus prevents their emission. 
> > This patch splits the two flags up; extern templates continue to use the > DECL_EXTERNAL flag (and the documentation is updated to indicate this), > but TYPE_DECL_SUPPRESS_DEBUG now uses the 'decl_not_flexarray' flag, > which currently is only used by FIELD_DECLs. > > PR c++/112820 > PR c++/102607 > > gcc/cp/ChangeLog: > > * pt.cc (mark_class_instantiated): Set DECL_EXTERNAL explicitly. > > gcc/ChangeLog: > > * tree-core.h (struct tree_decl_common): Update comments. > * tree.h (DECL_EXTERNAL): Update comments. > (TYPE_DECL_SUPPRESS_DEBUG): Use 'decl_not_flexarray' instead. > > gcc/testsuite/ChangeLog: > > * g++.dg/modules/debug-2_a.C: New test. > * g++.dg/modules/debug-2_b.C: New test. > * g++.dg/modules/debug-2_c.C: New test. > * g++.dg/modules/debug-3_a.C: New test. > * g++.dg/modules/debug-3_b.C: New test. > > Signed-off-by: Nathaniel Shead > --- > gcc/cp/pt.cc | 1 + > gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 + > gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 > gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 + > gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 > gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 + > gcc/tree-core.h | 6 +++--- > gcc/tree.h | 8 > 8 files changed, 51 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C > create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index e38e7a773f0..7839745035b 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -26256,6 +26256,7 @@ mark_class_instantiated (tree t, int extern_p) >SET_CLASSTYPE_EXPLICIT_INSTANTIATION (t); >SET_CLASSTYPE_INTERFACE_KNOWN (t); >CLASSTYPE_INTERFACE_ONLY (t) = extern_p; > + DECL_EXTERNAL (TYPE_NAME (t)) = extern_p; >TYPE_DECL_SUPPRESS_DEBUG (TYPE_NAME (t)) = extern_p; >if (! 
extern_p) > { > diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C > b/gcc/testsuite/g++.dg/modules/debug-2_a.C > new file mode 100644 > index 000..eed0905542b > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C > @@ -0,0 +1,9 @@ > +// PR c++/112820 > +// { dg-additional-options "-fmodules-ts -g" } > +// { dg-module-cmi io } > + > +export module io; > + > +export struct error { > + virtual const char* what() const noexcept; > +}; > diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C > b/gcc/testsuite/g++.dg/modules/debug-2_b.C > new file mode 100644 > index 000..fc9afbc02e0 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C > @@ -0,0 +1,8 @@ > +// PR c++/112820 > +// { dg-additional-options "-fmodules-ts -g" } > + > +module io; > + > +const char* error::what() const noexcept { > + return "bla"; > +} > diff --git
Re: [PATCH] Add a late-combine pass [PR106594]
Jeff Law writes:
> The other issue that's been in the back of my mind is costing. But I
> think the model here is combine without regards to cost.

No, it does take costing into account. For size, it's the usual "sum up the before and after insn costs and see which one is lower". For speed, the costs are weighted by execution frequency, so e.g. two insns of cost 4 in the same block can be combined into a single instruction of cost 8, but a hoisted invariant can only be combined into a loop body instruction if the loop body instruction's cost doesn't increase significantly.

This is done by rtl_ssa::changes_are_worthwhile.

Thanks,
Richard
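[Editorial sketch, not part of the thread.] The size versus speed comparison described above can be modelled with a toy C function. This is a simplified illustration under assumptions, not the implementation of rtl_ssa::changes_are_worthwhile: struct weighted_insn, total_cost and change_is_worthwhile are hypothetical names, frequencies are plain integers, and "doesn't increase significantly" is reduced to a plain "not higher than before" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: the cost of one instruction and the relative
   execution frequency of the block that contains it.  */
struct weighted_insn
{
  unsigned cost;
  unsigned freq;
};

/* Sum the instruction costs.  When optimizing for speed each cost is
   weighted by its block frequency; for size every insn counts once.  */
unsigned long
total_cost (const struct weighted_insn *insns, size_t n, bool for_speed)
{
  assert (insns != NULL || n == 0);
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (unsigned long) insns[i].cost * (for_speed ? insns[i].freq : 1);
  return sum;
}

/* Accept a change when the total cost afterwards is not higher than
   the total cost before (a simplification, see the lead-in).  */
bool
change_is_worthwhile (const struct weighted_insn *before, size_t n_before,
                      const struct weighted_insn *after, size_t n_after,
                      bool for_speed)
{
  return total_cost (after, n_after, for_speed)
         <= total_cost (before, n_before, for_speed);
}
```

With two cost-4 insns at the same frequency, merging them into one cost-8 insn leaves the weighted sum unchanged, so the toy model accepts it. Folding a cost-4 invariant that runs once into a cost-4 loop insn that runs 100 times raises the weighted sum from 404 to 800, so it is rejected for speed even though the unweighted size cost is equal.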
Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]
On Mon, Jan 8, 2024 at 3:35 AM Kewen.Lin wrote: > > Hi, > > As PR113100 shows, the unbiasing introduced by r14-6737 can > cause the scrubbing to overrun and screw some critical data > on stack like saved toc base consequently cause segfault on > Power. > > By checking PR112917, IMHO we should keep this unbiasing > guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 && > TARGET_STACK_BIAS), similar to some existing code special > treating SPARC stack bias. > > Bootstrapped and regtested on x86_64-redhat-linux and > powerpc64{,le}-linux-gnu. All reported failures in > PR113100 are gone. I also expect the culprit commit can > affect those ports with nonzero STACK_POINTER_OFFSET. > > Is it ok for trunk? OK > BR, > Kewen > - > PR middle-end/113100 > > gcc/ChangeLog: > > * builtins.cc (expand_builtin_stack_address): Guard stack point > adjustment with SPARC_STACK_BOUNDARY_HACK. > --- > gcc/builtins.cc | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/gcc/builtins.cc b/gcc/builtins.cc > index 125ea158ebf..9bad1e962b4 100644 > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -5450,6 +5450,7 @@ expand_builtin_stack_address () >rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx), > STACK_UNSIGNED); > > +#ifdef SPARC_STACK_BOUNDARY_HACK >/* Unbias the stack pointer, bringing it to the boundary between the > stack area claimed by the active function calling this builtin, > and stack ranges that could get clobbered if it called another > @@ -5476,7 +5477,9 @@ expand_builtin_stack_address () > (caller) function's active area as well, whereas those pushed or > allocated temporarily for a call are regarded as part of the > callee's stack range, rather than the caller's. */ > - ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET); > + if (SPARC_STACK_BOUNDARY_HACK) > +ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET); > +#endif > >return force_reg (ptr_mode, ret); > } > -- > 2.39.3
Re: [PATCH] sparc: Char arrays are 64-bit aligned on SPARC
On 2024-01-08 10:20, Eric Botcazou wrote:

pr88077 fails on SPARC since char HeaderStr[1] in pr88077_1.c and long HeaderStr in pr88077_0.c differ in alignment.

  warning: alignment 4 of normal symbol `HeaderStr' in c_lto_pr88077_0.o is smaller than 8 used by the common definition in c_lto_pr88077_1.o

I have never seen it though. Is that really a warning issued by GCC?

Hello Eric!

Thank you for reviewing the patches!

No, this warning is not from GCC, it is from binutils ld. I forgot to mention that in the message. I get a similar warning from older versions of ld, so I do not think it is a new warning. It is also there with GCC 10.

For the OK:ed patches (with your changes), can I push them to release/gcc-13 in addition to master?

/Daniel
Re: [PATCH] gimplify: Fix ICE in recalculate_side_effects [PR113228]
On Sat, 6 Jan 2024, Jakub Jelinek wrote:

> Hi!
>
> The following testcase ICEs during regimplification since the addition of
> (convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
> simplification. That simplification is novel in the sense that in
> gimplify_expr it can turn an expression (comparison in particular) into
> a SSA_NAME. Normally when gimplify_expr sees originally a SSA_NAME, it does
>     case SSA_NAME:
>       /* Allow callbacks into the gimplifier during optimization. */
>       ret = GS_ALL_DONE;
>       break;
> and doesn't try to recalculate side effects because of that, but in this
> case gimplify_expr normally enters the:
>     default:
>       switch (TREE_CODE_CLASS (TREE_CODE (*expr_p)))
>         {
>         case tcc_comparison:
> then does
>           *expr_p = gimple_boolify (*expr_p);
> and then
>           *expr_p = fold_convert_loc (input_location,
>                                       org_type, *expr_p);
> with this new match.pd simplification turns that tcc_comparison class
> into SSA_NAME. Unlike the outer SSA_NAME handling though, this falls
> through into
>       recalculate_side_effects (*expr_p);
>
>     dont_recalculate:
>       break;
> but unfortunately recalculate_side_effects doesn't handle SSA_NAME and ICEs
> on it.
> SSA_NAMEs don't ever have TREE_SIDE_EFFECTS set on those, so the following
> patch fixes it by handling it similarly to the tcc_constant case.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-01-06 Jakub Jelinek
>
> PR tree-optimization/113228
> * gimplify.cc (recalculate_side_effects): Do nothing for SSA_NAMEs.
>
> * gcc.c-torture/compile/pr113228.c: New test.
>
> --- gcc/gimplify.cc.jj 2024-01-03 11:51:40.744603324 +0100
> +++ gcc/gimplify.cc 2024-01-05 13:32:34.351336320 +0100
> @@ -3344,6 +3344,9 @@ recalculate_side_effects (tree t)
>        return;
>
>      default:
> +      if (code == SSA_NAME)
> +        /* No side-effects. */
> +        return;
>        gcc_unreachable ();
>      }
>  }
> --- gcc/testsuite/gcc.c-torture/compile/pr113228.c.jj 2024-01-05 13:27:42.876330301 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr113228.c 2024-01-05 13:27:22.503609458 +0100
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/113228 */
> +
> +int a, b, c, d, i;
> +
> +void
> +foo (void)
> +{
> +  int k[3] = {};
> +  int *l =
> +  for (d = 0; c; c--)
> +    for (i = 0; i <= 9; i++)
> +      {
> +        for (b = 1; b <= 4; b++)
> +          k[0] = k[0] == 0;
> +        *l |= k[d];
> +      }
> +}
>
> Jakub
>

--
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
[PATCH] tree-optimization/113026 - avoid vector epilog in more cases
The following avoids creating a niter peeling epilog more consistently, matching what peeling later uses for the skip_vector condition, in particular when versioning is required which then also ensures the vector loop is entered unless the epilog is vectorized. This should ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed later, some refactoring could make that better matching. The patch also makes sure to adjust the upper bound of the epilogues when we do not have a skip edge around the vector loop. Bootstrapped and tested on x86_64-unknown-linux-gnu. Tamar, does that look OK wrt early-breaks? Thanks, Richard. PR tree-optimization/113026 * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Avoid an epilog in more cases. * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the epilogues niter upper bounds and estimates. * gcc.dg/torture/pr113026-1.c: New testcase. * gcc.dg/torture/pr113026-2.c: Likewise. --- gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 + gcc/tree-vect-loop-manip.cc | 32 +++ gcc/tree-vect-loop.cc | 6 - 4 files changed, 66 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c new file mode 100644 index 000..56dfef3b36c --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Wall" } */ + +char dst[16]; + +void +foo (char *src, long n) +{ + for (long i = 0; i < n; i++) +dst[i] = src[i]; /* { dg-bogus "" } */ +} diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c new file mode 100644 index 000..b9d5857a403 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Wall" } */ + +char 
dst1[17]; +void +foo1 (char *src, long n) +{ + for (long i = 0; i < n; i++) +dst1[i] = src[i]; /* { dg-bogus "" } */ +} + +char dst2[18]; +void +foo2 (char *src, long n) +{ + for (long i = 0; i < n; i++) +dst2[i] = src[i]; /* { dg-bogus "" } */ +} diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 9330183bfb9..927f76a0947 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count (); bb_before_epilog = loop_preheader_edge (epilog)->src; } + else + { + /* When we do not have a loop-around edge to the epilog we know +the vector loop covered at least VF scalar iterations unless +we have early breaks and the epilog will cover at most +VF - 1 + gap peeling iterations. +Update any known upper bound with this knowledge. */ + if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + if (epilog->any_upper_bound) + epilog->nb_iterations_upper_bound -= lowest_vf; + if (epilog->any_likely_upper_bound) + epilog->nb_iterations_likely_upper_bound -= lowest_vf; + if (epilog->any_estimate) + epilog->nb_iterations_estimate -= lowest_vf; + } + unsigned HOST_WIDE_INT const_vf; + if (vf.is_constant (_vf)) + { + const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1; + if (epilog->any_upper_bound) + epilog->nb_iterations_upper_bound + = wi::umin (epilog->nb_iterations_upper_bound, const_vf); + if (epilog->any_likely_upper_bound) + epilog->nb_iterations_likely_upper_bound + = wi::umin (epilog->nb_iterations_likely_upper_bound, + const_vf); + if (epilog->any_estimate) + epilog->nb_iterations_estimate + = wi::umin (epilog->nb_iterations_estimate, const_vf); + } + } /* If loop is peeled for non-zero constant times, now niters refers to orig_niters - prolog_peeling, it won't overflow even the orig_niters diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a06771611ac..9dd573ef125 
100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1261,7 +1261,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo) the epilogue is unnecessary. */ && (!LOOP_REQUIRES_VERSIONING (loop_vinfo) || ((unsigned HOST_WIDE_INT) max_niter - > (th / const_vf) * const_vf + /* We'd like to
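[Editorial sketch, not part of the thread.] The epilogue bound adjustment in the vect_do_peeling hunk of this patch can be followed with plain integers. This is a hedged model of the arithmetic only: adjust_epilog_bound is a hypothetical helper, it assumes a constant VF (so lowest_vf and const_vf coincide), it uses unsigned long where the patch uses GCC's wide-integer types, and saturating the subtraction at zero is an assumption of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone model of the bound update: BOUND is a known
   upper bound for the epilogue, VF the vectorization factor, GAP is 1
   when peeling for gaps is needed and 0 otherwise.  */
unsigned long
adjust_epilog_bound (unsigned long bound, unsigned long vf,
                     unsigned long gap, bool early_breaks)
{
  assert (vf >= 1);

  /* Without early breaks the vector loop has already covered at least
     VF scalar iterations, so that many can be subtracted.  Saturating
     at zero is an assumption made here to stay within unsigned math.  */
  if (!early_breaks)
    bound = bound > vf ? bound - vf : 0;

  /* The epilogue covers at most VF - 1 + gap iterations.  */
  unsigned long cap = vf + gap - 1;
  return bound < cap ? bound : cap;
}
```

For example, with a bound of 100, VF 16 and no gap peeling, the subtraction gives 84 and the cap of 15 wins; with a bound of 20 the subtraction already yields 4, which is below the cap.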
[PATCH] btf: print string position as comment for validation and testing purposes.
Hi everyone,

This patch adds a comment to the BTF strings regarding their position within the section. This is useful for assembly inspection purposes.

Regards,
Cupertino

When using -dA, this function was only printing as comment btf_string or btf_aux_string. This patch changes the comment to also include the position of the string within the section in hexadecimal format.

gcc/ChangeLog:

	* btfout.cc (output_btf_strs): Changed.
---
 gcc/btfout.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index db4f1084f85c..04218adc9e66 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1081,17 +1081,20 @@ static void
 output_btf_strs (ctf_container_ref ctfc)
 {
   ctf_string_t * ctf_string = ctfc->ctfc_strtable.ctstab_head;
+  static int str_pos = 0;
 
   while (ctf_string)
     {
-      dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string");
+      dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_string, str_pos = 0x%x", str_pos);
+      str_pos += strlen(ctf_string->cts_str) + 1;
       ctf_string = ctf_string->cts_next;
     }
 
   ctf_string = ctfc->ctfc_aux_strtable.ctstab_head;
   while (ctf_string)
     {
-      dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string");
+      dw2_asm_output_nstring (ctf_string->cts_str, -1, "btf_aux_string, str_pos = 0x%x", str_pos);
+      str_pos += strlen(ctf_string->cts_str) + 1;
       ctf_string = ctf_string->cts_next;
     }
 }
-- 
2.30.2
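[Editorial sketch, not part of the thread.] The position bookkeeping in this patch (each string advances the position by strlen + 1, the extra byte being the terminating NUL) can be tried outside GCC. The helper below is a hypothetical stand-alone illustration, not a function from btfout.cc; string_offsets and its parameters are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store in OFFSETS[i] the byte offset of STRS[i]
   within a table where the strings are laid out back to back, each
   followed by its terminating NUL.  Returns the total table size.  */
size_t
string_offsets (const char *const *strs, size_t n, size_t *offsets)
{
  assert (offsets != NULL || n == 0);
  size_t pos = 0;
  for (size_t i = 0; i < n; i++)
    {
      offsets[i] = pos;
      /* Same step as in the patch: string length plus the NUL byte.  */
      pos += strlen (strs[i]) + 1;
    }
  return pos;
}
```

For the strings "", "int" and "foo_helper" this gives offsets 0, 1 and 5 and a total size of 16 bytes, which are the kind of values the new 0x%x comments would make visible in the -dA assembly output.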
[PATCH] bpf: Correct BTF for kernel_helper attributed decls.
Hi everyone, This patch address the problem reported in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113225 Looking forward to your review. Cheers, Cupertino This patch fix a problem with kernel_helper attribute BTF information, which incorrectly generates BTF_KIND_FUNC entry. This BTF entry although accurate with traditional extern function declarations, once the function is attributed with kernel_helper, it is semantically incompatible of the kernel helpers in BPF infrastructure. gcc/ChangeLog: PR target/113225 * btfout.cc (btf_collect_datasec): Skip creating BTF info for extern and kernel_helper attributed function decls. gcc/testsuite/ChangeLog: * gcc.target/bpf/attr-kernel-helper.c: New test. --- gcc/btfout.cc | 7 +++ gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c | 15 +++ 2 files changed, 22 insertions(+) create mode 100644 gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c diff --git a/gcc/btfout.cc b/gcc/btfout.cc index 04218adc9e66..39e7bec43bfb 100644 --- a/gcc/btfout.cc +++ b/gcc/btfout.cc @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "cgraph.h" #include "varasm.h" +#include "stringpool.h" +#include "attribs.h" #include "dwarf2out.h" /* For lookup_decl_die. */ static int btf_label_num; @@ -429,6 +431,11 @@ btf_collect_datasec (ctf_container_ref ctfc) if (dtd == NULL) continue; + if (DECL_EXTERNAL (func->decl) + && (lookup_attribute ("kernel_helper", + DECL_ATTRIBUTES (func->decl))) != NULL_TREE) + continue; + /* Functions actually get two types: a BTF_KIND_FUNC_PROTO, and also a BTF_KIND_FUNC. But the CTF container only allocates one type per function, which matches closely with BTF_KIND_FUNC_PROTO. 
diff --git a/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c new file mode 100644 index ..7c5a0007c979 --- /dev/null +++ b/gcc/testsuite/gcc.target/bpf/attr-kernel-helper.c @@ -0,0 +1,15 @@ +/* Basic test for kernel_helper attribute BTF information. */ + +/* { dg-do compile } */ +/* { dg-options "-O0 -dA -gbtf" } */ + +extern int foo_helper(int) __attribute((kernel_helper(42))); +extern int foo_nohelper(int); + +int bar (int arg) +{ + return foo_helper (arg) + foo_nohelper (arg); +} + +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_nohelper'" 1 } } */ +/* { dg-final { scan-assembler-times "BTF_KIND_FUNC 'foo_helper'" 0 } } */ -- 2.30.2
Re: [PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option
On Tue, Jan 2, 2024 at 2:37 PM wrote: > > From: Pan Li > > According to the sematics of no-signed-zeros option, the backend > like RISC-V should treat the minus zero -0.0f as plus zero 0.0f. > > Consider below example with option -fno-signed-zeros. > > void > test (float *a) > { > *a = -0.0; > } > > We will generate code as below, which doesn't treat the minus zero > as plus zero. > > test: > lui a5,%hi(.LC0) > flw fa5,%lo(.LC0)(a5) > fsw fa5,0(a0) > ret > > .LC0: > .word -2147483648 // aka -0.0 (0x8000 in hex) > > This patch would like to fix the bug and treat the minus zero -0.0 > as plus zero, aka +0.0. Thus after this patch we will have asm code > as below for the above sampe code. > > test: > sw zero,0(a0) > ret > > This patch also fix the run failure of the test case pr30957-1.c. The > below tests are passed for this patch. We don't really expect targets to do this. The small testcase above is somewhat ill-formed with -fno-signed-zeros. Note there's no -0.0 in pr30957-1.c so why does that one fail for you? Does the -fvariable-expansion-in-unroller code maybe not trigger for riscv? I think we should go to PR30957 and see what that was filed originally for, the testcase doesn't make much sense to me. > * The riscv regression tests. > * The pr30957-1.c run tests. > > gcc/ChangeLog: > > * config/riscv/constraints.md: Leverage func > riscv_float_const_zero_rtx_p > for predicating the rtx is const zero float or not. > * config/riscv/predicates.md: Ditto. > * config/riscv/riscv.cc (riscv_const_insns): Ditto. > (riscv_float_const_zero_rtx_p): New func impl for predicating the rtx > is > const zero float or not. > (riscv_const_zero_rtx_p): New func impl for predicating the rtx > is const zero (both int and fp) or not. > * config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p): > New func decl. > (riscv_const_zero_rtx_p): Ditto. > * config/riscv/riscv.md: Making sure the operand[1] of movfp is > CONST0_RTX when the operand[1] is const zero float. 
> > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/no-signed-zeros-0.c: New test. > * gcc.target/riscv/no-signed-zeros-1.c: New test. > * gcc.target/riscv/no-signed-zeros-2.c: New test. > * gcc.target/riscv/no-signed-zeros-3.c: New test. > * gcc.target/riscv/no-signed-zeros-4.c: New test. > * gcc.target/riscv/no-signed-zeros-5.c: New test. > * gcc.target/riscv/no-signed-zeros-run-0.c: New test. > * gcc.target/riscv/no-signed-zeros-run-1.c: New test. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/constraints.md | 2 +- > gcc/config/riscv/predicates.md| 2 +- > gcc/config/riscv/riscv-protos.h | 2 + > gcc/config/riscv/riscv.cc | 35 - > gcc/config/riscv/riscv.md | 49 --- > .../gcc.target/riscv/no-signed-zeros-0.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-1.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-2.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-3.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-4.c | 26 ++ > .../gcc.target/riscv/no-signed-zeros-5.c | 28 +++ > .../gcc.target/riscv/no-signed-zeros-run-0.c | 36 ++ > .../gcc.target/riscv/no-signed-zeros-run-1.c | 36 ++ > 13 files changed, 314 insertions(+), 10 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-0.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-4.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-5.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-0.c > create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-1.c > > diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md > index de4359af00d..db1d5e1385f 100644 > --- a/gcc/config/riscv/constraints.md > +++ b/gcc/config/riscv/constraints.md > @@ -108,7 +108,7 @@ (define_constraint "DnS" > 
(define_constraint "G" >"@internal" >(and (match_code "const_double") > - (match_test "op == CONST0_RTX (mode)"))) > + (match_test "riscv_float_const_zero_rtx_p (op)"))) > > (define_memory_constraint "A" >"An address that is held in a general-purpose register." > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md > index b87a6900841..b428d842101 100644 > --- a/gcc/config/riscv/predicates.md > +++ b/gcc/config/riscv/predicates.md > @@ -78,7 +78,7 @@ (define_predicate "sleu_operand" > > (define_predicate "const_0_operand" >