[PATCH] tree-optimization/107865 - ICE with outlining of loops
The following makes sure to clear loops number of iterations when outlining them as part of a SESE region as can happen with auto-parallelization. The referenced SSA names become stale otherwise. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. PR tree-optimization/107865 * tree-cfg.cc (move_sese_region_to_fn): Free the number of iterations of moved loops. * gfortran.dg/graphite/pr107865.f90: New testcase. --- .../gfortran.dg/graphite/pr107865.f90 | 18 ++ gcc/tree-cfg.cc| 2 ++ 2 files changed, 20 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr107865.f90 diff --git a/gcc/testsuite/gfortran.dg/graphite/pr107865.f90 b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90 new file mode 100644 index 000..6bddb17a1be --- /dev/null +++ b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90 @@ -0,0 +1,18 @@ +! { dg-do compile } +! { dg-options "-O1 -floop-parallelize-all -ftree-parallelize-loops=2" } + + SUBROUTINE FNC (F) + + IMPLICIT REAL (A-H) + DIMENSION F(N) + + DO I = 1, 6 + DO J = 1, 6 +IF (J .NE. I) THEN + F(I) = F(I) + 1 +END IF + END DO + END DO + + RETURN + END diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 28175312afc..0c409b435fb 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -7859,6 +7859,8 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb, if (bb->loop_father->header == bb) { class loop *this_loop = bb->loop_father; + /* Avoid the need to remap SSA names used in nb_iterations. */ + free_numbers_of_iterations_estimates (this_loop); class loop *outer = loop_outer (this_loop); if (outer == loop /* If the SESE region contains some bbs ending with -- 2.35.3
Re: [PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]
Hi Haochen, Sorry for the late review. on 2022/10/11 15:38, HAO CHEN GUI wrote: > Hi, > This patch modifies the help function which generates permute index for > vector byte reversion and generates permute index directly for little endian > targets. It saves one "xxlnor" instructions on P8 little endian targets as > the original process needs an "xxlnor" to calculate complement for the index. > Nice. > Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-10-11 Haochen Gui > > gcc/ > PR target/100866 > * config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode): > Generate permute index directly for little endian targets. > * config/rs6000/vsx.md (revb_): Call vprem directly with > corresponding permute indexes. > > gcc/testsuite/ > PR target/100866 > * gcc.target/powerpc/pr100866.c: New. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-call.cc > b/gcc/config/rs6000/rs6000-call.cc > index 551968b0995..bad8e9e0e52 100644 > --- a/gcc/config/rs6000/rs6000-call.cc > +++ b/gcc/config/rs6000/rs6000-call.cc > @@ -2839,7 +2839,10 @@ swap_endian_selector_for_mode (machine_mode mode) > } > >for (i = 0; i < 16; ++i) > -perm[i] = GEN_INT (swaparray[i]); > +if (BYTES_BIG_ENDIAN) > + perm[i] = GEN_INT (swaparray[i]); > +else > + perm[i] = GEN_INT (~swaparray[i] & 0x001f); IMHO, it would be good to add a function comment for this function, it's sad that we didn't have it before. With this patch, the selector (perm) is expected to be used with vperm direct as shown below, it would be good to note it explicitly for other potential callers too. > >return force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, >gen_rtvec_v (16, perm))); > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index e226a93bbe5..b68eba48d2c 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -6096,8 +6096,8 @@ (define_expand "revb_" >to the endian mode in use, i.e. in LE mode, put elements >in BE order. */ >rtx sel = swap_endian_selector_for_mode(mode); > - emit_insn (gen_altivec_vperm_ (operands[0], operands[1], > -operands[1], sel)); > + emit_insn (gen_altivec_vperm__direct (operands[0], operands[1], > + operands[1], sel));> } > >DONE; > diff --git a/gcc/testsuite/gcc.target/powerpc/pr100866.c > b/gcc/testsuite/gcc.target/powerpc/pr100866.c > new file mode 100644 > index 000..c708dfd502e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr100866.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-final { scan-assembler-not "xxlnor" } } */ Nit: may be better with {\mxxlnor\M}? The others look good to me. Thanks! BR, Kewen > + > +#include > + > +vector unsigned short revb(vector unsigned short a) > +{ > + return vec_revb(a); > +} >
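As an aside on the arithmetic involved: for selector entries in the 0..31 range, the complement-and-mask that the patch folds into the constant vector is just the reflected index, i.e. the value that the separate xxlnor used to compute at run time. A tiny host-side sketch (plain C, not rs6000 back-end code) checking that identity:

#include <assert.h>

int
main (void)
{
  for (unsigned int x = 0; x < 32; x++)
    /* What the patch stores for little endian: ~x & 0x1f.  */
    assert ((~x & 0x1fu) == 31 - x);
  return 0;
}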
[PATCH V3] [x86] Fix incorrect _mm_cvtsbh_ss.
Update in V3: Remove !flag_signaling_nans since there's already HONOR_NANS (BFmode). Here's the patch: After supporting real __bf16, the implementation of _mm_cvtsbh_ss went wrong. The patch add a builtin to generate pslld for the intrinsic, also extendbfsf2 is supported with pslld when !HONOR_NANS (BFmode). truncsfbf2 is supported with vcvtneps2bf16 when !HONOR_NANS (BFmode) && flag_unsafe_math_optimizations. gcc/ChangeLog: PR target/107748 * config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Refined. * config/i386/i386-builtin-types.def (FLOAT_FTYPE_BFLOAT16): New function type. * config/i386/i386-builtin.def (BDESC): New builtin. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle the builtin. * config/i386/i386.md (extendbfsf2): New expander. (extendbfsf2_1): New define_insn. (truncsfbf2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Scan pslld. * gcc.target/i386/extendbfsf.c: New test. --- gcc/config/i386/avx512bf16intrin.h| 4 +- gcc/config/i386/i386-builtin-types.def| 1 + gcc/config/i386/i386-builtin.def | 2 + gcc/config/i386/i386-expand.cc| 1 + gcc/config/i386/i386.md | 40 ++- .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c | 3 +- gcc/testsuite/gcc.target/i386/extendbfsf.c| 16 7 files changed, 61 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/extendbfsf.c diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h index ea1d0125b3f..75378af5584 100644 --- a/gcc/config/i386/avx512bf16intrin.h +++ b/gcc/config/i386/avx512bf16intrin.h @@ -46,9 +46,7 @@ extern __inline float __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtsbh_ss (__bf16 __A) { - union{ float a; unsigned int b;} __tmp; - __tmp.b = ((unsigned int)(__A)) << 16; - return __tmp.a; + return __builtin_ia32_cvtbf2sf (__A); } /* vcvtne2ps2bf16 */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index d10de32643f..65fe070e37f 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -1281,6 +1281,7 @@ DEF_FUNCTION_TYPE (V4SI, V4SI, V4SI, UHI) DEF_FUNCTION_TYPE (V8SI, V8SI, V8SI, UHI) # BF16 builtins +DEF_FUNCTION_TYPE (FLOAT, BFLOAT16) DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF) DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, V32BF, USI) DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, USI) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 5e0461acc00..d85b1753039 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2838,6 +2838,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPBF16PS_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF) BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPBF16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPBF16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI) +BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, "__builtin_ia32_cvtbf2sf", IX86_BUILTIN_CVTBF2SF, UNKNOWN, (int) FLOAT_FTYPE_BFLOAT16) + /* AVX512FP16. 
*/ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 0373c3614a4..d26e7e41445 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10423,6 +10423,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, return ix86_expand_sse_ptest (d, exp, target); case FLOAT128_FTYPE_FLOAT128: case FLOAT_FTYPE_FLOAT: +case FLOAT_FTYPE_BFLOAT16: case INT_FTYPE_INT: case UINT_FTYPE_UINT: case UINT16_FTYPE_UINT16: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 01faa911b77..9451883396c 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -130,6 +130,7 @@ (define_c_enum "unspec" [ ;; For AVX/AVX512F support UNSPEC_SCALEF UNSPEC_PCMP + UNSPEC_CVTBFSF ;; Generic math support UNSPEC_IEEE_MIN ; not commutative @@ -4961,6 +4962,31 @@ (define_insn "*extendhf2" (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_expand "extendbfsf2" + [(set (match_operand:SF 0 "register_operand") + (unspec:SF + [(match_operand:BF 1 "register_operand")] +UNSPEC_CVTBFS
Re: [PATCH V2] Update block move for struct param or returns
Based on the discussions in previous mails: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607139.html https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607197.html I will update the patch accordingly, and then submit a new version. BR, Jeff (Jiufu) Jiufu Guo writes: > Hi, > > When assigning a parameter to a variable, or assigning a variable to > return value with struct type, "block move" are used to expand > the assignment. It would be better to use the register mode according > to the target/ABI to move the blocks. And then this would raise more > opportunities for other optimization passes(cse/dse/xprop). > > As the example code (like code in PR65421): > > typedef struct SA {double a[3];} A; > A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s) > A ret_arg (A a) {return a;} // just empty fun body > void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s) > > This patch is based on the previous version which supports assignments > from parameter: > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html > This patch also supports returns. > > I also tried to update gimplify/nrv to replace "return D.xxx;" with > "return ;". While there is one issue: "" with > PARALLEL code can not be accessed through address/component_ref. > This issue blocks a few passes (e.g. sra, expand). > > On ppc64, some dead stores are not eliminated. e.g. for ret_arg: > .cfi_startproc > std 4,56(1)//reductant > std 5,64(1)//reductant > std 6,72(1)//reductant > std 4,0(3) > std 5,8(3) > std 6,16(3) > blr > > Bootstraped and regtested on ppc64le and x86_64. > > I'm wondering if this patch could be committed first. > Thanks for the comments and suggestions. > > > BR, > Jeff (Jiufu) > > PR target/65421 > > gcc/ChangeLog: > > * cfgexpand.cc (expand_used_vars): Add collecting return VARs. > (expand_gimple_stmt_1): Call expand_special_struct_assignment. > (pass_expand::execute): Free collections of return VARs. > * expr.cc (expand_special_struct_assignment): New function. > * expr.h (expand_special_struct_assignment): Declare. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr65421-1.c: New test. > * gcc.target/powerpc/pr65421.c: New test. > > --- > gcc/cfgexpand.cc | 37 + > gcc/expr.cc | 43 > gcc/expr.h | 3 ++ > gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++ > gcc/testsuite/gcc.target/powerpc/pr65421.c | 19 + > 5 files changed, 123 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > index dd29c03..f185de39341 100644 > --- a/gcc/cfgexpand.cc > +++ b/gcc/cfgexpand.cc > @@ -341,6 +341,9 @@ static hash_map *decl_to_stack_part; > all of them in one big sweep. */ > static bitmap_obstack stack_var_bitmap_obstack; > > +/* Those VARs on returns. */ > +static bitmap return_vars; > + > /* An array of indices such that stack_vars[stack_vars_sorted[i]].size > is non-decreasing. */ > static size_t *stack_vars_sorted; > @@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars) > frame_phase = off ? align - off : 0; >} > > + /* Collect VARs on returns. 
*/ > + return_vars = NULL; > + if (DECL_RESULT (current_function_decl) > + && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == > BLKmode) > +{ > + return_vars = BITMAP_ALLOC (NULL); > + > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + if (greturn *ret = safe_dyn_cast (last_stmt (e->src))) > + { > + tree val = gimple_return_retval (ret); > + if (val && VAR_P (val)) > + bitmap_set_bit (return_vars, DECL_UID (val)); > + } > +} > + >/* Set TREE_USED on all variables in the local_decls. */ >FOR_EACH_LOCAL_DECL (cfun, i, var) > TREE_USED (var) = 1; > @@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt) > /* This is a clobber to mark the going out of scope for >this LHS. */ > expand_clobber (lhs); > + else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs) > + && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode > + && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL > + || REG_P (DECL_INCOMING_RTL (rhs > + || (VAR_P (lhs) && return_vars > + && DECL_RTL_SET_P (DECL_RESULT (current_function_decl)) > + && GET_CODE ( > + DECL_RTL (DECL_RESULT (current_function_decl))) > + == PARALLEL > + && bitmap_bit_p (return_vars
Re: [PATCH V2] Use subscalar mode to move struct block for parameter
Hi Richard, Thanks a lot for your comments! Richard Biener writes: > On Wed, 23 Nov 2022, Jiufu Guo wrote: > >> Hi Jeff, >> >> Thanks a lot for your comments! > > Sorry for the late response ... > >> Jeff Law writes: >> >> > On 11/20/22 20:07, Jiufu Guo wrote: >> >> Jiufu Guo writes: >> >> >> >>> Hi, >> >>> >> >>> As mentioned in the previous version patch: >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >> >>> The suboptimal code is generated for "assigning from parameter" or >> >>> "assigning to return value". >> >>> This patch enhances the assignment from parameters like the below >> >>> cases: >> >>> /case1.c >> >>> typedef struct SA {double a[3];long l; } A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p = a;} >> >>> >> >>> case2.c >> >>> typedef struct SA {double a[3];} A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p = a;} >> >>> >> >>> For this patch, bootstrap and regtest pass on ppc64{,le} >> >>> and x86_64. >> >>> * Besides asking for help reviewing this patch, I would like to >> >>> consult comments about enhancing for "assigning to returns". >> >> I updated the patch to fix the issue for returns. This patch >> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used >> >> by a return stmt. This patch fix the issue in expand pass only, >> >> so, we would try to update the patch to avoid this flag. >> >> >> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc >> >> index dd29c03..09b8ec64cea 100644 >> >> --- a/gcc/cfgexpand.cc >> >> +++ b/gcc/cfgexpand.cc >> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars) >> >> frame_phase = off ? align - off : 0; >> >> } >> >> + /* Collect VARs on returns. */ >> >> + if (DECL_RESULT (current_function_decl)) >> >> +{ >> >> + edge_iterator ei; >> >> + edge e; >> >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) >> >> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src))) >> >> + { >> >> + tree val = gimple_return_retval (ret); >> >> + if (val && VAR_P (val)) >> >> + DECL_USEDBY_RETURN_P (val) = 1; > > you probably want to check && auto_var_in_fn (val, ...) since val > might be global? Since we are collecting the return vals, it would be built in function gimplify_return_expr. In this function, create_tmp_reg is used and a local temp. So it would not be a global var here. code piece in gimplify_return_expr: if (!result_decl) result = NULL_TREE; else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl))) { result = result_decl; } else if (gimplify_ctxp->return_temp) result = gimplify_ctxp->return_temp; else { result = create_tmp_reg (TREE_TYPE (result_decl)); Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns false for target like ppc64le, because result of "hard_function_value" is a "rtx with PARALLELL code". And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a reg here. it built a DECL_VAR with RECORD type and BLK mode). I also tried the way to use RESULT_DECL for this kind of type, or let aggregate_value_p accept this kind of type. But it seems not easy, because " (RESULT_DECL with PARALLEL)" is not ok for address operations. > >> >> + } >> >> +} >> >> + >> >> /* Set TREE_USED on all variables in the local_decls. 
*/ >> >> FOR_EACH_LOCAL_DECL (cfun, i, var) >> >> TREE_USED (var) = 1; >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> >> index d9407432ea5..20973649963 100644 >> >> --- a/gcc/expr.cc >> >> +++ b/gcc/expr.cc >> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool >> >> nontemporal) >> >> return; >> >> } > > I miss an explanatory comment here on that the following is heuristics > and its reasoning. > >> >> + if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from) >> >> + && TYPE_MODE (TREE_TYPE (from)) == BLKmode > > Why check TYPE_MODE here? Do you want AGGREGATE_TYPE_P on the type > instead? Checking BLK, because I want make sure the param should occur on register and stack (saved from register). Actualy, my intention is checking: GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode For code: GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL || REG_P (DECL_INCOMING_RTL (from))) This checking indicates if the param may be passing via 2 or more registers. Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more readable. I would have a test. > >> >> + && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL >> >> +|| REG_P (DECL_INCOMING_RTL (from >> >> + || (VAR_P (to) && DECL_USEDBY_RETURN_P (to) >> >> + && TYPE_MODE (TREE_TYPE (to)) == BLKmode > > Likewise. > >> >> + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) >> >> +== PARALLEL)) > > Not REG_P here? REG_P with BLK on return would
Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]
Hi Richard, on 2022/11/23 00:08, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi Richard, >> >> Many thanks for your review comments! >> > on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote: >> Hi, >> >> As discussed in PR98125, -fpatchable-function-entry with >> SECTION_LINK_ORDER support doesn't work well on powerpc64 >> ELFv1 because the filled "Symbol" in >> >> .section name,"flags"o,@type,Symbol >> >> sits in .opd section instead of in the function_section >> like .text or named .text*. >> >> Since we already generates one label LPFE* which sits in >> function_section of current_function_decl, this patch is >> to reuse it as the symbol for the linked_to section. It >> avoids the above ABI specific issue when using the symbol >> concluded from current_function_decl. >> >> Besides, with this support some previous workarounds for >> powerpc64 ELFv1 can be reverted. >> >> btw, rs6000_print_patchable_function_entry can be dropped >> but there is another rs6000 patch which needs this rs6000 >> specific hook rs6000_print_patchable_function_entry, not >> sure which one gets landed first, so just leave it here. >> >> Bootstrapped and regtested on below: >> >> 1) powerpc64-linux-gnu P8 with default binutils 2.27 >> and latest binutils 2.39. >> 2) powerpc64le-linux-gnu P9 (default binutils 2.30). >> 3) powerpc64le-linux-gnu P10 (default binutils 2.30). >> 4) x86_64-redhat-linux with default binutils 2.30 >> and latest binutils 2.39. >> 5) aarch64-linux-gnu with default binutils 2.30 >> and latest binutils 2.39. >> >> >> [snip...] >> >> diff --git a/gcc/varasm.cc b/gcc/varasm.cc >> index 4db8506b106..d4de6e164ee 100644 >> --- a/gcc/varasm.cc >> +++ b/gcc/varasm.cc >> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, >> unsigned int flags, >> fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE); >>if (flags & SECTION_LINK_ORDER) >> { >> - tree id = DECL_ASSEMBLER_NAME (decl); >> - ultimate_transparent_alias_target (&id); >> - const char *name = IDENTIFIER_POINTER (id); >> - name = targetm.strip_name_encoding (name); >> - fprintf (asm_out_file, ",%s", name); >> + /* For now, only section "__patchable_function_entries" >> + adopts flag SECTION_LINK_ORDER, internal label LPFE* >> + was emitted in default_print_patchable_function_entry, >> + just place it here for linked_to section. */ >> + gcc_assert (!strcmp (name, "__patchable_function_entries")); >>> >>> I like the idea of removing the rs600 workaround in favour of making the >>> target-independent more robust. But this seems a bit hackish. What >>> would we do if SECTION_LINK_ORDER was used for something else in future? >>> >> >> Good question! I think it depends on how we can get the symbol for the >> linked_to section, if adopting the name of the decl will suffer the >> similar issue which this patch wants to fix, we have to reuse the label >> LPFE* or some kind of new artificial label in the related section; or >> we can just go with the name of the given decl, or something related to >> that decl. Since we can't predict any future uses, I just placed an >> assertion here to ensure that we would revisit and adjust this part at >> that time. Does it sound reasonable to you? > > Yeah, I guess that's good enough. If the old scheme ends up being > correct for some future use, we can make the new behaviour conditional > on __patchable_function_entries. Yes, we can check if the given section name is "__patchable_function_entries". > > So yeah, the patch LGTM to me, thanks. Thanks again! 
I rebased and re-tested it on x86/aarch64/powerpc64{,le}, just committed in r13-4294-gf120196382ac5a. BR, Kewen
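For reference, a schematic of the directives involved on powerpc64 ELFv1 (illustrative only, not literal compiler output; the label names are assumed). It shows why the decl's symbol is the wrong choice there, while the already-emitted LPFE* label sits in the function section:

	# 'foo' names the function descriptor in .opd, not the code, so using
	# it as the linked_to symbol ties __patchable_function_entries to .opd.
	.section	".opd","aw"
foo:
	.quad	.L.foo,.TOC.@tocbase,0
	.text
.L.foo:				# actual code entry point
.LPFE1:				# label emitted for -fpatchable-function-entry
	nop
	...
	# With the patch, the in-function-section label is used as "Symbol":
	.section	__patchable_function_entries,"awo",@progbits,.LPFE1
	.quad	.LPFE1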
[PATCH] [OpenMP] GC unused SIMD clones
This patch is a followup to my not-yet-reviewed patch [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target" https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html In comments on a previous iteration of that patch, I was asked to do something to delete unused SIMD clones to avoid code bloat; this is it. I've implemented something like a simple mark-and-sweep algorithm. Clones that are used are marked at the point where the call is generated in the vectorizer. The loop that iterates over functions to apply the passes after IPA is modified to defer processing of unmarked clones, and anything left over is deleted. OK to commit this along with the above-linked patch? -SandraFrom bfffcea926d4dfb6275346237c61922a95c9e715 Mon Sep 17 00:00:00 2001 From: Sandra Loosemore Date: Wed, 23 Nov 2022 23:14:31 + Subject: [PATCH] [OpenMP] GC unused SIMD clones SIMD clones are created during the IPA phase when it is not known whether or not the vectorizer can use them. Clones for functions with external linkage are part of the ABI, but local clones can be GC'ed if no calls are found in the compilation unit after vectorization. gcc/ChangeLog * cgraph.h (struct cgraph_node): Add gc_candidate bit, modify default constructor to initialize it. * cgraphunit.cc (expand_all_functions): Save gc_candidate functions for last and iterate to handle recursive calls. Delete leftover candidates at the end. * omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit on local clones. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear gc_candidate bit when a clone is used. gcc/testsuite/ChangeLog * testsuite/g++.dg/gomp/target-simd-clone-1.C: Tweak to test that the unused clone is GC'ed. * testsuite/gcc.dg/gomp/target-simd-clone-1.c: Likewise. --- gcc/cgraph.h | 7 ++- gcc/cgraphunit.cc | 49 --- gcc/omp-simd-clone.cc | 5 ++ .../g++.dg/gomp/target-simd-clone-1.C | 7 ++- .../gcc.dg/gomp/target-simd-clone-1.c | 6 ++- gcc/tree-vect-stmts.cc| 3 ++ 6 files changed, 66 insertions(+), 11 deletions(-) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 4be67e3cea9..b065677a8d0 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -891,7 +891,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node versionable (false), can_change_signature (false), redefined_extern_inline (false), tm_may_enter_irr (false), ipcp_clone (false), declare_variant_alt (false), - calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1) + calls_declare_variant_alt (false), gc_candidate (false), + m_uid (uid), m_summary_id (-1) {} /* Remove the node from cgraph and all inline clones inlined into it. @@ -1490,6 +1491,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node unsigned declare_variant_alt : 1; /* True if the function calls declare_variant_alt functions. */ unsigned calls_declare_variant_alt : 1; + /* True if the function should only be emitted if it is used. This flag + is set for local SIMD clones when they are created and cleared if the + vectorizer uses them. */ + unsigned gc_candidate : 1; private: /* Unique id of the node. */ diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc index b05d790bf8d..587daf5674e 100644 --- a/gcc/cgraphunit.cc +++ b/gcc/cgraphunit.cc @@ -1996,19 +1996,52 @@ expand_all_functions (void) /* Output functions in RPO so callees get optimized before callers. This makes ipa-ra and other propagators to work. - FIXME: This is far from optimal code layout. 
*/ - for (i = new_order_pos - 1; i >= 0; i--) -{ - node = order[i]; + FIXME: This is far from optimal code layout. + Make multiple passes over the list to defer processing of gc + candidates until all potential uses are seen. */ + int gc_candidates = 0; + int prev_gc_candidates = 0; - if (node->process) + while (1) +{ + for (i = new_order_pos - 1; i >= 0; i--) { - expanded_func_count++; - node->process = 0; - node->expand (); + node = order[i]; + + if (node->gc_candidate) + gc_candidates++; + else if (node->process) + { + expanded_func_count++; + node->process = 0; + node->expand (); + } } + if (!gc_candidates || gc_candidates == prev_gc_candidates) + break; + prev_gc_candidates = gc_candidates; + gc_candidates = 0; } + /* Free any unused gc_candidate functions. */ + if (gc_candidates) +for (i = new_order_pos - 1; i >= 0; i--) + { + node = order[i]; + if (node->gc_candidate) + { + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + if (symtab->dump_file) + fprintf (symtab->dump_file, + "Deleting unused function %s\n", + IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl))); + n
[committed] libstdc++: Change return type of std::bit_width to int (LWG 3656)
Tested x86_64-linux. Pushed to trunk. -- >8 -- libstdc++-v3/ChangeLog: * doc/html/manual/bugs.html: Regenerate. * doc/xml/manual/intro.xml: Document LWG 3656 change. * include/std/bit (__bit_width, bit_width): Return int. * testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc: New test. --- libstdc++-v3/doc/html/manual/bugs.html| 4 libstdc++-v3/doc/xml/manual/intro.xml | 7 +++ libstdc++-v3/include/std/bit | 6 -- .../26_numerics/bit/bit.pow.two/lwg3656.cc| 15 +++ 4 files changed, 30 insertions(+), 2 deletions(-) create mode 100644 libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc diff --git a/libstdc++-v3/doc/html/manual/bugs.html b/libstdc++-v3/doc/html/manual/bugs.html index 58600cd6ede..c4a2c26ea39 100644 --- a/libstdc++-v3/doc/html/manual/bugs.html +++ b/libstdc++-v3/doc/html/manual/bugs.html @@ -619,4 +619,8 @@ path::lexically_relative is confused by trailing slashes Implement the fix for trailing slashes. +http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#3656"; target="_top">3656: + Inconsistent bit operations returning a count + +Changed bit_width to return int. Prev Up NextLicense Home Chapter 2. Setup \ No newline at end of file diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml index dee01c82159..aee96e37c61 100644 --- a/libstdc++-v3/doc/xml/manual/intro.xml +++ b/libstdc++-v3/doc/xml/manual/intro.xml @@ -1308,6 +1308,13 @@ requirements of the license of GCC. Implement the fix for trailing slashes. +http://www.w3.org/1999/xlink"; xlink:href="&DR;#3656">3656: + Inconsistent bit operations returning a count + + +Changed bit_width to return int. + + diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit index 2fd80187210..3e072ef2113 100644 --- a/libstdc++-v3/include/std/bit +++ b/libstdc++-v3/include/std/bit @@ -361,7 +361,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } template -constexpr _Tp +constexpr int __bit_width(_Tp __x) noexcept { constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits; @@ -448,9 +448,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bit_floor(_Tp __x) noexcept { return std::__bit_floor(__x); } + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3656. Inconsistent bit operations returning a count /// The smallest integer greater than the base-2 logarithm of `x`. template -constexpr _If_is_unsigned_integer<_Tp> +constexpr _If_is_unsigned_integer<_Tp, int> bit_width(_Tp __x) noexcept { return std::__bit_width(__x); } diff --git a/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc new file mode 100644 index 000..4752c3b1d33 --- /dev/null +++ b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc @@ -0,0 +1,15 @@ +// { dg-options "-std=gnu++20" } +// { dg-do compile { target c++20 } } + +#include + +template constexpr bool is_int = false; +template<> constexpr bool is_int = true; + +// LWG 3656. Inconsistent bit operations returning a count +// Rturn type of std::bit_width(T) changed from T to int. +static_assert( is_int ); +static_assert( is_int ); +static_assert( is_int ); +static_assert( is_int ); +static_assert( is_int ); -- 2.38.1
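A small usage sketch (not part of the patch) of why the return type matters: with the LWG 3656 resolution the result is always int, so arithmetic on it no longer silently stays in the unsigned argument type.

#include <bit>

// 'hi' is an int and becomes -1 for x == 0; with the previous return
// type (the unsigned argument type) it would have been a wrapped-around
// unsigned value.
auto top_bit_index (unsigned x)
{
  auto hi = std::bit_width (x) - 1;
  return hi;
}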
[committed] libstdc++: Update tests on trunk [PR106201]
Tested x86_64-linux. Pushed to trunk. -- >8 -- This copies the better tests from gcc-12 to trunk. libstdc++-v3/ChangeLog: PR libstdc++/106201 * testsuite/27_io/filesystem/iterators/106201.cc: Improve test. * testsuite/experimental/filesystem/iterators/106201.cc: New test. --- .../testsuite/27_io/filesystem/iterators/106201.cc | 8 +--- .../experimental/filesystem/iterators/106201.cc| 14 ++ 2 files changed, 19 insertions(+), 3 deletions(-) create mode 100644 libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc index 4a64e675816..c5fefd9ac3f 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc @@ -5,8 +5,10 @@ // PR libstdc++/106201 constraint recursion in path(Source const&) constructor. #include -#include -using I = std::counted_iterator; +#include +#include +namespace fs = std::filesystem; +using I = std::counted_iterator; static_assert( std::swappable ); -using R = std::counted_iterator; +using R = std::counted_iterator; static_assert( std::swappable ); diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc new file mode 100644 index 000..017b72ef5f6 --- /dev/null +++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc @@ -0,0 +1,14 @@ +// { dg-options "-std=gnu++20" } +// { dg-do compile { target c++20 } } +// { dg-require-filesystem-ts "" } + +// PR libstdc++/106201 constraint recursion in path(Source const&) constructor. + +#include +#include +#include +namespace fs = std::experimental::filesystem; +using I = std::counted_iterator; +static_assert( std::swappable ); +using R = std::counted_iterator; +static_assert( std::swappable ); -- 2.38.1
[PATCH] gcc/jit/jit-recording.cc: recording::global::write_to_dump: Avoid crashes when writing pseudo-C for globals with string initializers.
If a char * global was initialized with a rvalue from `gcc_jit_context_new_string_literal` containing a format string, dumping the context causes libgccjit to SIGSEGV due to an improperly constructed call to vasprintf. The following code snippet can reproduce the crash: int main(int argc, char **argv) { gcc_jit_context *ctxt = gcc_jit_context_acquire (); gcc_jit_lvalue *var = gcc_jit_context_new_global( ctxt, NULL, GCC_JIT_GLOBAL_EXPORTED, gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_CONST_CHAR_PTR), "var"); gcc_jit_global_set_initializer_rvalue( var, gcc_jit_context_new_string_literal(ctxt, "%s")); gcc_jit_context_dump_to_file (ctxt, "output", 0); return 0; } The offending line is jit-recording.cc:4922, where a call to d.write passes the initializer rvalue's debug string to `write` without a format specifier. The attached patch fixes this issue. Thanks, Vibhav -- Vibhav Pant vibh...@gmail.com GPG: 7ED1 D48C 513C A024 BE3A 785F E3FB 28CB 6AB5 9598 From e598a4076b2bff72b4a3cc29d1d70db8c53baf45 Mon Sep 17 00:00:00 2001 From: Vibhav Pant Date: Fri, 25 Nov 2022 02:02:09 +0530 Subject: [PATCH] jit-recording.cc: Dump string literal initializers correctly --- gcc/jit/jit-recording.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc index 6ae5a667e90..7bb98ddcb42 100644 --- a/gcc/jit/jit-recording.cc +++ b/gcc/jit/jit-recording.cc @@ -4919,7 +4919,7 @@ recording::global::write_to_dump (dump &d) else if (m_rvalue_init) { d.write (" = "); - d.write (m_rvalue_init->get_debug_string ()); + d.write ("%s", m_rvalue_init->get_debug_string ()); d.write (";\n"); } -- 2.38.1 signature.asc Description: This is a digitally signed message part
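For readers unfamiliar with the failure mode: this is the classic format-string bug. A minimal stand-alone C sketch (not libgccjit code) of the same mistake, and of the shape of the fix the patch applies to d.write:

#include <stdio.h>

int
main (void)
{
  /* Stands in for the rvalue's debug string, e.g. the "%s" literal above.  */
  const char *debug_string = "%s";

  /* Buggy: the string itself becomes the format, so the printf family
     reads a pointer argument that was never passed and may crash.  */
  /* printf (debug_string); */

  /* Fixed: pass an explicit format specifier.  */
  printf ("%s", debug_string);
  return 0;
}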
Re: [PATCH] Fortran: error recovery on associate with bad selector [PR107577]
Hi Harald,

please find attached an obvious patch by Steve for a technical regression that resulted from improvements in error recovery of bad uses of associate.

Regtested on x86_64-pc-linux-gnu. Will commit soon unless there are comments.

Obvious enough, I think. Thanks!

As a sidenote: the testcase shows that we resolve the associate names quite often, likely more often than necessary, resulting in many error messages produced for the same line of code. In the present case, each use of the bad name produces two errors, one where it is used, and one at the associate statement. That is probably not helpful for the user.

We have an "error" flag in gfc_expr, which we use infrequently to avoid repetitions of errors. If an error has already been issued for an expression, then we could set this. We would have to be careful about resetting the error on matching though, so it is probably better to only use it during resolution.

Best regards

Thomas
Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, November 23, 2022 4:18 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830] >> >> Tamar Christina writes: >> > Hi All, >> > >> > SVE has an actual division optab, and when using -Os we don't optimize >> > the division away. This means that we need to distinguish between a >> > div which we can optimize and one we cannot even during expansion. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> >PR target/107830 >> >* config/aarch64/aarch64.cc >> >(aarch64_vectorize_can_special_div_by_constant): Check validity >> during >> >codegen phase as well. >> > >> > gcc/testsuite/ChangeLog: >> > >> >PR target/107830 >> >* gcc.target/aarch64/sve2/pr107830.c: New test. >> > >> > --- inline copy of patch -- >> > diff --git a/gcc/config/aarch64/aarch64.cc >> > b/gcc/config/aarch64/aarch64.cc index >> > >> 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45 >> 702 >> > 961b2d7ab55d 100644 >> > --- a/gcc/config/aarch64/aarch64.cc >> > +++ b/gcc/config/aarch64/aarch64.cc >> > @@ -24322,12 +24322,15 @@ >> aarch64_vectorize_can_special_div_by_constant (enum tree_code code, >> >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) >> > return false; >> > >> > + wide_int val = wi::add (cst, 1); >> > + int pow = wi::exact_log2 (val); >> > + bool valid_p = pow == (int)(element_precision (vectype) / 2); >> > + /* SVE actually has a div operator, we we may have gotten here through >> > + that route. */ >> >if (in0 == NULL_RTX && in1 == NULL_RTX) >> > -{ >> > - wide_int val = wi::add (cst, 1); >> > - int pow = wi::exact_log2 (val); >> > - return pow == (int)(element_precision (vectype) / 2); >> > -} >> > +return valid_p; >> > + else if (!valid_p) >> > +return false; >> >> Is this equivalent to: >> >> int pow = wi::exact_log2 (cst + 1); >> if (pow != (int) (element_precision (vectype) / 2)) >> return false; >> >> /* We can use the optimized pattern. */ >> if (in0 == NULL_RTX && in1 == NULL_RTX) >> return true; >> >> ? If so, I'd find that slightly easier to follow, but I realise it's >> personal taste. >> OK with that change if it works and you agree. >> >> While looking at this, I noticed that we ICE for: >> >> void f(unsigned short *restrict p1, unsigned int *restrict p2) >> { >> for (int i = 0; i < 16; ++i) >> { >> p1[i] /= 0xff; >> p2[i] += 1; >> } >> } >> >> for -march=armv8-a+sve2 -msve-vector-bits=512. I guess we need to filter >> out partial modes or (better) add support for them. Adding support for them >> probably requires changes to the underlying ADDHNB pattern. > > I've prevented the ice by checking if the expansion for the mode exists. I'd > like to > defer adding partial support because when I tried I had to modify some > iterators > as well and need to check that it's safe to do so. Sounds good. > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/107830 > * config/aarch64/aarch64.cc > (aarch64_vectorize_can_special_div_by_constant): Check validity during > codegen phase as well. > > gcc/testsuite/ChangeLog: > > PR target/107830 > * gcc.target/aarch64/sve2/pr107830-1.c: New test. > * gcc.target/aarch64/sve2/pr107830-2.c: New test. 
> > --- inline copy of patch > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum > tree_code code, >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) > return false; > > + wide_int val = wi::add (cst, 1); > + int pow = wi::exact_log2 (val); Does the: int pow = wi::exact_log2 (cst + 1); I suggested above not work? That seems easier to read IMO, since there are no other uses of "val". > + auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE > (vectype)); > + /* SVE actually has a div operator, we may have gotten here through > + that route. */ > + if (pow != (int)(element_precision (vectype) / 2) Formatting nit: should be a space after "(int)". OK with those changes, thanks. Richard > + || insn_code == CODE_FOR_nothing) > +return false; > + > + /* We can use the optimized pattern. */ >if (in0 == NULL_RTX && in1 == NULL_RTX) > -{ > - wide_int val = wi::add (cst, 1); > - int pow = wi::exact_log2 (val); > - ret
RE: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, November 23, 2022 4:18 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; Kyrylo Tkachov > Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830] > > Tamar Christina writes: > > Hi All, > > > > SVE has an actual division optab, and when using -Os we don't optimize > > the division away. This means that we need to distinguish between a > > div which we can optimize and one we cannot even during expansion. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR target/107830 > > * config/aarch64/aarch64.cc > > (aarch64_vectorize_can_special_div_by_constant): Check validity > during > > codegen phase as well. > > > > gcc/testsuite/ChangeLog: > > > > PR target/107830 > > * gcc.target/aarch64/sve2/pr107830.c: New test. > > > > --- inline copy of patch -- > > diff --git a/gcc/config/aarch64/aarch64.cc > > b/gcc/config/aarch64/aarch64.cc index > > > 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45 > 702 > > 961b2d7ab55d 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -24322,12 +24322,15 @@ > aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) > > return false; > > > > + wide_int val = wi::add (cst, 1); > > + int pow = wi::exact_log2 (val); > > + bool valid_p = pow == (int)(element_precision (vectype) / 2); > > + /* SVE actually has a div operator, we we may have gotten here through > > + that route. */ > >if (in0 == NULL_RTX && in1 == NULL_RTX) > > -{ > > - wide_int val = wi::add (cst, 1); > > - int pow = wi::exact_log2 (val); > > - return pow == (int)(element_precision (vectype) / 2); > > -} > > +return valid_p; > > + else if (!valid_p) > > +return false; > > Is this equivalent to: > > int pow = wi::exact_log2 (cst + 1); > if (pow != (int) (element_precision (vectype) / 2)) > return false; > > /* We can use the optimized pattern. */ > if (in0 == NULL_RTX && in1 == NULL_RTX) > return true; > > ? If so, I'd find that slightly easier to follow, but I realise it's > personal taste. > OK with that change if it works and you agree. > > While looking at this, I noticed that we ICE for: > > void f(unsigned short *restrict p1, unsigned int *restrict p2) > { > for (int i = 0; i < 16; ++i) > { > p1[i] /= 0xff; > p2[i] += 1; > } > } > > for -march=armv8-a+sve2 -msve-vector-bits=512. I guess we need to filter > out partial modes or (better) add support for them. Adding support for them > probably requires changes to the underlying ADDHNB pattern. I've prevented the ice by checking if the expansion for the mode exists. I'd like to defer adding partial support because when I tried I had to modify some iterators as well and need to check that it's safe to do so. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR target/107830 * config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): Check validity during codegen phase as well. gcc/testsuite/ChangeLog: PR target/107830 * gcc.target/aarch64/sve2/pr107830-1.c: New test. * gcc.target/aarch64/sve2/pr107830-2.c: New test. 
--- inline copy of patch diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) return false; + wide_int val = wi::add (cst, 1); + int pow = wi::exact_log2 (val); + auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE (vectype)); + /* SVE actually has a div operator, we may have gotten here through + that route. */ + if (pow != (int)(element_precision (vectype) / 2) + || insn_code == CODE_FOR_nothing) +return false; + + /* We can use the optimized pattern. */ if (in0 == NULL_RTX && in1 == NULL_RTX) -{ - wide_int val = wi::add (cst, 1); - int pow = wi::exact_log2 (val); - return pow == (int)(element_precision (vectype) / 2); -} +return true; if (!VECTOR_TYPE_P (vectype)) return false; diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c new file mode 100644 index ..6d8ee3615fdb0083dbde1e45a2826fb681726139 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c @@ -0,0 +1,13 @@ +/* { dg-do c
[Patch] libgomp: Add no-target-region rev offload test + fix plugin-nvptx
The nvptx reverse-offload code mishandled the case that there was a reverse offload function that isn't called inside a target region. In that case, the linker did not include GOMP_target_ext and the global variable it uses. But the plugin-nvptx.c code expected that the latter is present. Found via sollve_vv's tests/5.0/requires/test_requires_reverse_offload.c which is similar to the new testcase. (Albeit the 'if' and comments imply that the sollve_vv author did not intend this.) Solution: Handle it gracefully that the global variable does not exist - and do this check first - and only when successful allocate dev->rev_data. If not, deallocate rev_fn_table to disable reverse offload handling. OK for mainline? Tobias PS: Admittedly, the nvptx code is not yet exercised as I still have to submit the libgomp/target.c code handling the reverse offload (+ enabling requires reverse_offload in plugin-nvptx.c). As it is obvious from this patch, the target.c patch is nearly but not yet completely ready. - That patch passes the three sollve_vv testcases and also the existing libgomp testcases, but some corner cases and more testcases are missing.
libgomp: Add no-target-region rev offload test + fix plugin-nvptx OpenMP permits that a 'target device(ancestor:1)' is called without being enclosed in a target region - using the current device (i.e. the host) in that case. This commit adds a testcase for this. In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal gracefully by disabling reverse offload and assuming that the failure is fine. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR as valid and the code having no reverse-offload code. * testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test. libgomp/plugin/plugin-nvptx.c | 36 ++-- .../libgomp.c-c++-common/reverse-offload-2.c | 49 ++ 2 files changed, 73 insertions(+), 12 deletions(-) diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 0768fca350b..e803f083591 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1390,7 +1390,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, else if (rev_fn_table) { CUdeviceptr var; - size_t bytes, i; + size_t bytes; + unsigned int i; r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module, "$offload_func_table"); if (r != CUDA_SUCCESS) @@ -1413,12 +1414,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL) { - /* cuMemHostAlloc memory is accessible on the device, if unified-shared - address is supported; this is assumed - see comment in - nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */ - CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data, - sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP); - CUdeviceptr dp = (CUdeviceptr) dev->rev_data; + /* Get the on-device GOMP_REV_OFFLOAD_VAR variable. It should be
One reason could be: if the user code + has 'omp target device(ancestor:1)' in pure hostcode, GOMP_target_ext + is not called on the device and, hence, it and GOMP_REV_OFFLOAD_VAR + are not linked in. */ CUdeviceptr device_rev_offload_var; size_t device_rev_offload_size; CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, @@ -1426,11 +1426,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data, &device_rev_offload_size, module, XSTRING (GOMP_REV_OFFLOAD_VAR)); if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuModuleGetGlobal error - GOMP_REV_OFFLOAD_VAR: %s", cuda_error (r)); - r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, device_rev_offload_var, &dp, - sizeof (dp)); - if (r != CUDA_SUCCESS) - GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r)); + { + free (*rev_fn_table); + *rev_fn_table = NULL; + } + else + { + /* cuMemHostAlloc memory is accessible on the device, if + unified-shared address is supported; this is assumed - see comment + in nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */ + CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data, + sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP); + CUdeviceptr dp = (CUdeviceptr) dev->rev
Re: [Patch Arm] Fix PR 92999
On 11/11/2022 21:50, Ramana Radhakrishnan via Gcc-patches wrote: On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan wrote: On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw wrote: On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote: On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote: PR92999 is a case where the VFP calling convention does not allocate enough FP registers for a homogenous aggregate containing FP16 values. I believe this is the complete fix but would appreciate another set of eyes on this. Could I get a hand with a regression test run on an armhf environment while I fix my environment ? gcc/ChangeLog: PR target/92999 * config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle aggregates with elements smaller than SFmode. gcc/testsuite/ChangeLog: * gcc.target/arm/pr92999.c: New test. Thanks, Ramana Signed-off-by: Ramana Radhakrishnan I'm not sure about this. The AAPCS does not mention a base type of a half-precision FP type as an appropriate homogeneous aggregate for using VFP registers for either calling or returning. Ooh interesting, thanks for taking a look and poking at the AAPCS and that's a good catch. BF16 should also have the same behaviour as FP16 , I suspect ? I suspect I got caught out by the definition of the Homogenous aggregate from Section 5.3.5 ((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates) which simply suggests it's an aggregate of fundamental types which lists half precision floating point . A homogeneous aggregate is any aggregate that fits the general definition, but only HAs of specific types are of interest for the VFP PCS rules. The problem we have is that when we added HFmode (and later BF16mode) support we didn't notice that the base types are VFP candidates, but the nested types (eg in records or arrays) are not. The problems started around SVN r236269 (git:1b81a1c1bd53) when we added FP16 support. FTR, ideally I should have read 7.1.2.1 https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling) :) So perhaps the bug is that we try to treat this as a homogeneous aggregate at all. Yep I agree - I'll take a look again tomorrow and see if I can get a fix. (And thanks Alex for the test run, I might trouble you again while I still (slowly) get some of my boards back up) and as promised take 2. I'd really prefer another review on this one to see if I've not missed anything in the cases below. I think I'd prefer to try and fix this at the point where we accept the base types, ie around: case REAL_TYPE: mode = TYPE_MODE (type); if (mode != DFmode && mode != SFmode && mode != HFmode && mode != BFmode) return -1; by changing this to something like /* HFmode and BFmode can be passed in registers, but are not valid base types for an HFA, so only accept these if we are at the top level. */ if (!(mode == DFmode || mode == SFmode || (depth == 0 && (mode == HFmode || mode == BFmode))) return -1; and we then pass depth into the recursion calls as an extra parameter, starting at 0 for the top level and incrementing it by 1 each time aapcs_vfp_sub_candidate recurses. For the test, would it be possible to rewrite it in the style of gcc.target/arm/aapcs/* and put it there? That would ensure that not only are the caller and callee compatible, but that the values are passed in the correct location. R.
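To make the ABI point concrete, a small C illustration (hypothetical declarations, not the PR testcase; assumes __fp16 support is enabled for the target): a lone __fp16 is a VFP candidate as a base type, but an aggregate of __fp16 is not a homogeneous aggregate for the VFP PCS, unlike the equivalent aggregate of float.

struct hfa_float { float  a[4]; };   /* HFA of float: uses VFP registers.  */
struct agg_fp16  { __fp16 a[4]; };   /* aggregate of __fp16: not an HFA.  */

__fp16 scalar (__fp16 x) { return x; }                      /* base type: OK */
struct hfa_float ret_f  (struct hfa_float x) { return x; }
struct agg_fp16  ret_hf (struct agg_fp16  x) { return x; }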
RE: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
> -Original Message- > From: Andrea Corallo > Sent: Thursday, November 24, 2022 2:44 PM > To: Kyrylo Tkachov > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > > Subject: [PATCH 35/35 V2] arm: improve tests for vsetq_lane* > > Kyrylo Tkachov writes: > > [...] > > >> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c > >> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c > >> index e03e9620528..b5c9f4d5eb8 100644 > >> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c > >> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c > >> @@ -1,15 +1,45 @@ > >> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } > >> {""} } > */ > >> /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > >> /* { dg-add-options arm_v8_1m_mve_fp } */ > >> /* { dg-additional-options "-O2" } */ > >> +/* { dg-final { check-function-bodies "**" "" } } */ > >> > >> #include "arm_mve.h" > >> > >> +/* > >> +**foo: > >> +**... > >> +**vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|) > >> +**... > >> +*/ > >> float16x8_t > >> foo (float16_t a, float16x8_t b) > >> { > >> -return vsetq_lane_f16 (a, b, 0); > >> + return vsetq_lane_f16 (a, b, 1); > >> } > >> > > > > Hmm, for these tests we should be able to scan for more specific codegen > as we're setting individual lanes, so we should be able to scan for lane 1 in > the vmov instruction, though it may need to be flipped for big-endian. > > Thanks, > > Kyrill > > Hi Kyrill, > > please find attached the updated version of this patch. > > Big-endian should not be a problem as for my understanding is just not > supported with MVE intrinsics. Huh, that's right. This version is ok. Thanks! Kyrill > > Thanks! > > Andrea
Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]
>> How did you test the patch? If you bootstrapped it and ran the
>> testsuite then it's OK.

Yes, I ran the testsuite and bootstrapped, and everything seemed OK, but I missed the failures of the tests gcc.dg/Warray-bounds-34.c and gcc.dg/Warray-bounds-43.c, so Franz is right. After that I fixed the regexps in the dg directives and now everything seems OK.

> I'm pretty sure the testsuite will have regressions, as I have a very similar
> patch lying around that needs these testsuite changes.

You are right, thank you. I missed this; attaching a corrected version of the patch.

> This also shows nicely why I don't like warnings with levels, what if I want
> -Werror=array-bounds=2 + -Warray-bounds=1?

I completely agree with you, because I also thought about using -Werror=opt=X + -Wopt=Y; this functionality looks useful. As far as I know, gcc, while parsing an option with the same OPT, overwrites the old config of OPT.

> Because I think at least -Wuse-after-free= and Wattributes= have the same
> problem.

Yes, it looks like it; those should probably be fixed too.

> BTW, is the duplicated warning description "Warn if an array is accessed out
> of bounds." needed or not with Alias()?

According to other examples in common.opt, the duplicated description is not necessary, you are right.

> I've attached my patch, feel free to integrate the testsuite changes.

Thanks, but it seems to me that duplicating the existing tests would be redundant for testing the functionality of -Werror=array-bounds=X.

From bf047e36392dab138db10be2ec257d08c376ada5 Mon Sep 17 00:00:00 2001
From: Iskander Shakirzyanov
Date: Thu, 24 Nov 2022 14:26:59 +
Subject: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

According to the documentation, the -Werror= option makes the specified warning into an error and also automatically implies this option. Then it seems that the behavior of the compiler when specifying -Werror=array-bounds=X should be the same as specifying "-Werror=array-bounds -Warray-bounds=X", so we expect to receive array-bounds pass triggers and they must be processed as errors. In practice, we observe that the array-bounds pass is indeed called, but its responses are processed as warnings, not errors. As I understand it, this happens because Warray-bounds and Warray-bounds= are declared as 2 different options in common.opt, so when diagnostic_classify_diagnostic() is called, DK_ERROR is set for the Warray-bounds= option, but the call through warning_at() into diagnostic_report_diagnostic() passes the opt_index of Warray-bounds, so the information about DK_ERROR is lost. Fixed by using Alias() in the declaration of Warray-bounds (similarly as in Wattribute-alias etc.)

	PR driver/107787

Co-authored-by: Franz Sirl

gcc/ChangeLog:

	* common.opt (Warray-bounds): Turn into alias to -Warray-bounds=1.
	* builtins.cc (warn_array_bounds): Use OPT_Warray_bounds_ instead of OPT_Warray_bounds.
	* diagnostic-spec.cc: Likewise.
	* gimple-array-bounds.cc: Likewise.
	* gimple-ssa-warn-restrict.cc: Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-34.c: Correct the regular expression for -Warray-bounds=.
	* gcc.dg/Warray-bounds-43.c: Likewise.
	* gcc.dg/pr107787.c: New test.

gcc/c-family/ChangeLog:

	* c-common.cc (warn_array_bounds): Use OPT_Warray_bounds_ instead of OPT_Warray_bounds.
--- gcc/builtins.cc | 6 ++-- gcc/c-family/c-common.cc| 4 +-- gcc/common.opt | 3 +- gcc/diagnostic-spec.cc | 1 - gcc/gimple-array-bounds.cc | 38 - gcc/gimple-ssa-warn-restrict.cc | 2 +- gcc/testsuite/gcc.dg/Warray-bounds-34.c | 2 +- gcc/testsuite/gcc.dg/Warray-bounds-43.c | 6 ++-- gcc/testsuite/gcc.dg/pr107787.c | 13 + 9 files changed, 43 insertions(+), 32 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr107787.c diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 4dc1ca672b2..02c4fefa86f 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize) { /* Suppress multiple warnings for propagated constant strings. */ if (only_value != 2 - && !warning_suppressed_p (arg, OPT_Warray_bounds) - && warning_at (loc, OPT_Warray_bounds, + && !warning_suppressed_p (arg, OPT_Warray_bounds_) + && warning_at (loc, OPT_Warray_bounds_, "offset %qwi outside bounds of constant string", eltoff)) { if (decl) inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl); - suppress_warning (arg, OPT_Warray_bounds); + suppress_warning (arg, OPT_Warray_bounds_); } return NULL_TREE; } diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 6f1f21bc4c1..b0da6886ccf 100644 --- a/gcc/c-family/c-com
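For illustration, a minimal sketch of the behaviour being fixed (this is not the pr107865/pr107787 testcase from the patch above, just an assumed example): compiled with -O2 -Werror=array-bounds=2, the out-of-bounds access below should now be reported as an error rather than a plain warning, since -Warray-bounds becomes an alias of -Warray-bounds=1 and the -Werror= classification is no longer lost.

int a[4];

int
f (void)
{
  return a[5]; /* out-of-bounds access diagnosed by the array-bounds pass at -O2 */
}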
[PATCH 35/35 V2] arm: improve tests for vsetq_lane*
Kyrylo Tkachov writes: [...] >> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c >> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c >> index e03e9620528..b5c9f4d5eb8 100644 >> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c >> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c >> @@ -1,15 +1,45 @@ >> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } >> {""} } */ >> /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ >> /* { dg-add-options arm_v8_1m_mve_fp } */ >> /* { dg-additional-options "-O2" } */ >> +/* { dg-final { check-function-bodies "**" "" } } */ >> >> #include "arm_mve.h" >> >> +/* >> +**foo: >> +** ... >> +** vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|) >> +** ... >> +*/ >> float16x8_t >> foo (float16_t a, float16x8_t b) >> { >> -return vsetq_lane_f16 (a, b, 0); >> + return vsetq_lane_f16 (a, b, 1); >> } >> > > Hmm, for these tests we should be able to scan for more specific codegen as > we're setting individual lanes, so we should be able to scan for lane 1 in > the vmov instruction, though it may need to be flipped for big-endian. > Thanks, > Kyrill Hi Kyrill, please find attached the updated version of this patch. Big-endian should not be a problem as for my understanding is just not supported with MVE intrinsics. Thanks! Andrea >From 79f2c990553a1f793e08b9a0c4abb7dae8de7120 Mon Sep 17 00:00:00 2001 From: Andrea Corallo Date: Thu, 17 Nov 2022 11:06:29 +0100 Subject: [PATCH] arm: improve tests for vsetq_lane* gcc/testsuite/ChangeLog: * gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test. * gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise. * gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise. --- .../arm/mve/intrinsics/vsetq_lane_f16.c | 36 +++-- .../arm/mve/intrinsics/vsetq_lane_f32.c | 36 +++-- .../arm/mve/intrinsics/vsetq_lane_s16.c | 24 ++-- .../arm/mve/intrinsics/vsetq_lane_s32.c | 24 ++-- .../arm/mve/intrinsics/vsetq_lane_s64.c | 27 ++--- .../arm/mve/intrinsics/vsetq_lane_s8.c| 24 ++-- .../arm/mve/intrinsics/vsetq_lane_u16.c | 36 +++-- .../arm/mve/intrinsics/vsetq_lane_u32.c | 36 +++-- .../arm/mve/intrinsics/vsetq_lane_u64.c | 39 --- .../arm/mve/intrinsics/vsetq_lane_u8.c| 36 +++-- 10 files changed, 284 insertions(+), 34 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c index e03e9620528..6b148a4b03d 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c @@ -1,15 +1,45 @@ -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ /* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ #include "arm_mve.h" +/* +**foo: +** ... +** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?: @.*|) +** ... 
+*/ float16x8_t foo (float16_t a, float16x8_t b) { -return vsetq_lane_f16 (a, b, 0); + return vsetq_lane_f16 (a, b, 1); } -/* { dg-final { scan-assembler "vmov.16" } } */ +/* +**foo1: +** ... +** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +*/ +float16x8_t +foo1 (float16_t a, float16x8_t b) +{ + return vsetq_lane (a, b, 1); +} + +/* +**foo2: +** ... +** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?: @.*|) +** ... +*/ +float16x8_t +foo2 (float16x8_t b) +{ + return vsetq_lane (1.1, b, 1); +} + +/* { dg-final { scan-assembler-not "__ARM_undef" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c index 2b9f1a7e627..e4e7f892e97 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c @@ -1,15 +1,45 @@ -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ /* { dg-additional-options "-O2" } */ +/* { dg-final { check
Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools
Martin Liška writes: > On 11/8/22 14:22, Gaius Mulley wrote: >> Martin Liška writes: >> >> should be good - I'll complete the rst output in the scripts, > > Hi. > Hi Martin, > As you probably noticed, the Sphinx migration didn't go well. Yes, sorry to see this didn't happen. Thank you for your hard work and I hope it can occur in the future. > However, it's still up to you if you want to use it or not for Modula > 2. Once modula-2 is in master I'd like to revisit rst in devel/modula-2 along with analyzer patches and m2 generics. If successful, I'll then submit patches in early stage 1. > We have manuals like libgccjit, or Ada manuals > that use RST natively and provide exported .texi files. OK, thanks for the pointers, I will experiment with these build runes. > Cheers and sorry for the troubles I caused. No problem at all - the modula-2 scripts are now improved and cleaner due to the port. Hopefully rst will happen sometime in the future, regards, Gaius
Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
Hi Jakub, > * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix. Better say what exactly you changed in words. Changed. > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq *pre_p) >struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp; > >if (teams == NULL_TREE) > -num_teams_upper = integer_one_node; > +num_teams_upper = build_int_cst (integer_type_node, -2); >else > for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c)) >{ The function comment above optimize_target_teams contains detailed description on what the values mean and why, so it definitely should document what -2 means and when it is used. I know you have documentation in libgomp for it, but it should be in both places. I updated the comment with an explanation for "-2". > + intptr_t new_teams = orig_teams, new_threads = orig_threads; > + /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1. Two spaces after . Corrected here and at other places. > + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but the > + value could not be specified. No Change. Likewise. lowercase change ? Corrected. > + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause. > + Set device-specific value. > + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause. > +No change. */ > + if (orig_teams == -2) > +new_teams = 1; > + else if (orig_teams == 0) > +{ > + struct gomp_offload_icv_list *item = gomp_get_offload_icv_item (device); > + if (item != NULL) > + new_teams = item->icvs.nteams; > +} > + /* The device-specific teams-thread-limit is only set if (a) an explicit TEAMS > + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already set by > + e.g. a THREAD_LIMIT clause. */ > + if (orig_teams >= -2 && orig_threads == 0) The comment talks about ORIG_TEAMS > -2, but the condition is >= -2. So which one is it? Thanks for the hint. It should be indeed "> -2" since teams_thread_limit "sets the maximum number of OpenMP threads to use in each contention group created by a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit) teams construct, then teams_thread_limit doesn't need to be copied to the device. > + /* This tests a large number of teams and threads. If it is larger than > +2^15+1 then the according argument in the kernels arguments list > +is encoded with two items instead of one. On NVIDIA there is an > +adjustment for too large teams and threads. For AMD such adjustment > +exists only for threads and will cause runtime errors with a two > +large s/two/too/ ? Shouldn't amdgcn adjusts also number of teams? I adjusted now also the number of teams in the amdgcn plugin. As upper bound I chose two times the number of compute units. This seems to be sufficient when one team is executed at one compute unit. This at least avoids the queueing of a large amount of teams and the corresponding memory allocation. The drawback is that a user is probably not aware of the actual number of compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104 for gfx90a) and thus maybe expects different values from omp_get_team_num(). For instance in something like the following: #pragma omp target #pragma omp teams num_teams(220) #pragma omp distribute parallel for for(int i = 0; i < 220; ++i) { #pragma omp critical ... omp_get_team_num () ... 
} On a gfx90a card with 104 compute units, 12 threads are assigned to "reused" teams (instead of having their own teams), which would not be the case without the limit. Alternatively, we could just define some (larger) constant number (though I don't know a reasonable value here). But this does not actually solve the above-mentioned drawback. I think we need to find a compromise between an unnecessarily small upper bound and the risk of memory allocation failures due to a too large number of teams. As for testcases, have you tested this in a native setup where dg-set-target-env-var actually works? Besides remote testing with offloading (which does not yet work with dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one nvptx offload device without issues (using "make check" and verifying that offloading is indeed used). Marcel - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of device-specific ICV values (specified as environment variables or changed on a device). In order t
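As a rough sketch of what the patch enables (illustrative only, not taken from the icv-4.c testcase above): the setters and getters should now operate on the device's own ICVs when called inside a target region, e.g.

#include <omp.h>

int
main (void)
{
  int n = -1;
  #pragma omp target map(from: n)
  {
    omp_set_num_teams (8);             /* device-side nteams-var */
    omp_set_teams_thread_limit (16);   /* device-side teams-thread-limit-var */
    n = omp_get_max_teams ();          /* expected to report 8 here */
  }
  return n != 8;
}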
Re: [Patch Arm] Fix PR 92999
Ping x 2 Ramana On Thu, 17 Nov 2022, 20:15 Ramana Radhakrishnan, wrote: > On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan > wrote: > > > > On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan > > wrote: > > > > > > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw > > > wrote: > > > > > > > > > > > > > > > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote: > > > > > > > > > > > > > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote: > > > > >> PR92999 is a case where the VFP calling convention does not > allocate > > > > >> enough FP registers for a homogenous aggregate containing FP16 > values. > > > > >> I believe this is the complete fix but would appreciate another > set of > > > > >> eyes on this. > > > > >> > > > > >> Could I get a hand with a regression test run on an armhf > environment > > > > >> while I fix my environment ? > > > > >> > > > > >> gcc/ChangeLog: > > > > >> > > > > >> PR target/92999 > > > > >> * config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to > handle > > > > >> aggregates with elements smaller than SFmode. > > > > >> > > > > >> gcc/testsuite/ChangeLog: > > > > >> > > > > >> * gcc.target/arm/pr92999.c: New test. > > > > >> > > > > >> > > > > >> Thanks, > > > > >> Ramana > > > > >> > > > > >> Signed-off-by: Ramana Radhakrishnan > > > > > > > > > > I'm not sure about this. The AAPCS does not mention a base type > of a > > > > > half-precision FP type as an appropriate homogeneous aggregate for > using > > > > > VFP registers for either calling or returning. > > > > > > Ooh interesting, thanks for taking a look and poking at the AAPCS and > > > that's a good catch. BF16 should also have the same behaviour as FP16 > > > , I suspect ? > > > > I suspect I got caught out by the definition of the Homogenous > > aggregate from Section 5.3.5 > > (( > https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates > ) > > which simply suggests it's an aggregate of fundamental types which > > lists half precision floating point . > > > > FTR, ideally I should have read 7.1.2.1 > > > https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling > ) > > :) > > > > > > > > > > > > > > > > > > > So perhaps the bug is that we try to treat this as a homogeneous > > > > > aggregate at all. > > > > > > Yep I agree - I'll take a look again tomorrow and see if I can get a > fix. > > > > > > (And thanks Alex for the test run, I might trouble you again while I > > > still (slowly) get some of my boards back up) > > > > > > and as promised take 2. I'd really prefer another review on this one > > to see if I've not missed anything in the cases below. > > Ping ? > > Ramana > > > > > regards > > Ramana > > > > > > > > > > regards, > > > Ramana > > > > > > > > > > > > > > R. >
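To illustrate the distinction discussed above (my own sketch, not part of the patch): per the cited AAPCS32 sections, an aggregate of single-precision floats is a VFP homogeneous aggregate, while an aggregate of __fp16 members is not one for the VFP calling convention, so it should not be allocated to VFP registers in the first place.

struct sf4 { float a, b, c, d; };   /* homogeneous aggregate: s0-s3 under the VFP PCS */
struct hf4 { __fp16 a, b, c, d; };  /* not a VFP homogeneous aggregate per the discussion above */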
Re: [PATCH][AArch64] Cleanup move immediate code
Sorry for the very long delay in reviewing this. Wilco Dijkstra writes: > Hi Richard, > > Here is the immediate cleanup splitoff from the previous patch: > > Simplify, refactor and improve various move immediate functions. > Allow 32-bit MOVZ/N as a valid 64-bit immediate which removes special > cases in aarch64_internal_mov_immediate. Add new constraint so the movdi > pattern only needs a single alternative for move immediate. Just to make sure I understand: isn't it really just MOVN? I would have expected a 32-bit MOVZ to be equivalent to (and add no capabilities over) a 64-bit MOVZ. > Passes bootstrap and regress, OK for commit? > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type. > (aarch64_zeroextended_move_imm): New function. > (aarch64_move_imm): Refactor, assert mode is SImode or DImode. > (aarch64_internal_mov_immediate): Assert mode is SImode or DImode. > Simplify special cases. > (aarch64_uimm12_shift): Simplify code. > (aarch64_clamp_to_uimm12_shift): Likewise. > (aarch64_movw_imm): Remove. > (aarch64_float_const_rtx_p): Pass either SImode or DImode to > aarch64_internal_mov_immediate. > (aarch64_rtx_costs): Likewise. > * config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M' > constraints into single 'O'. > (mov_aarch64): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned. > (aarch64_bitmask_imm): Likewise. > (aarch64_uimm12_shift): Likewise. > (aarch64_zeroextended_move_imm): New prototype. > * config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates, > limit 'N' to 64-bit only moves. > > --- > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index > 1a71f02284137c64e7115b26e6aa00447596f105..a73bfa20acb9b92ae0475794c3f11c67d22feb97 > 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -755,7 +755,7 @@ void aarch64_post_cfi_startproc (void); > poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned); > int aarch64_get_condition_code (rtx); > bool aarch64_address_valid_for_prefetch_p (rtx, bool); > -bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); > +bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode); > unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); > unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); > bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode > mode); > @@ -792,7 +792,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, > unsigned HOST_WIDE_INT, > unsigned HOST_WIDE_INT, > unsigned HOST_WIDE_INT); > bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx); > -bool aarch64_move_imm (HOST_WIDE_INT, machine_mode); > +bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode); > machine_mode aarch64_sve_int_mode (machine_mode); > opt_machine_mode aarch64_sve_pred_mode (unsigned int); > machine_mode aarch64_sve_pred_mode (machine_mode); > @@ -842,8 +842,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool); > bool aarch64_sve_float_mul_immediate_p (rtx); > bool aarch64_split_dimode_const_store (rtx, rtx); > bool aarch64_symbolic_address_p (rtx); > -bool aarch64_uimm12_shift (HOST_WIDE_INT); > +bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT); > int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &); > +bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT); > bool aarch64_use_return_insn_p (void); > const char *aarch64_output_casesi (rtx *); > > diff 
--git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 5d1ab5aa42b2cda0a655d2bc69c4df19da457ab3..798363bcc449c414de5bbb4f26b8e1c64a0cf71a > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -5558,12 +5558,10 @@ aarch64_bitmask_imm (unsigned HOST_WIDE_INT val) > > /* Return true if VAL is a valid bitmask immediate for MODE. */ > bool > -aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode) > +aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode) > { >if (mode == DImode) > -return aarch64_bitmask_imm (val_in); > - > - unsigned HOST_WIDE_INT val = val_in; > +return aarch64_bitmask_imm (val); > >if (mode == SImode) > return aarch64_bitmask_imm ((val & 0x) | (val << 32)); > @@ -5602,51 +5600,60 @@ aarch64_check_bitmask (unsigned HOST_WIDE_INT val, > } > > > -/* Return true if val is an immediate that can be loaded into a > - register by a MOVZ instruction. */ > -static bool > -aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode) > +/* Return true if immediate VAL can only be created by using a 32-bit > + zero-extended move immediate,
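To make the MOVZ/MOVN point above concrete (my own reading, not from the patch): every value a 32-bit MOVZ can produce is already a valid 64-bit MOVZ immediate, whereas a 32-bit MOVN reaches 64-bit values that neither a 64-bit MOVZ/MOVN nor a bitmask immediate can encode, for example:

unsigned long long
f (void)
{
  return 0xfffffffbULL;   /* zero-extended result of a 32-bit MOVN of 4; not a 64-bit MOVZ/MOVN or bitmask immediate */
}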
RE: [PATCH 2/2]AArch64 Support new tbranch optab.
Hi, I had a question and I figured I'd easier to ask before I spend more time implementing it 😊 I had noticed that one of the other reasons that cbranch and the other optabs like cmov explicitly emit the compare separately and use combine to match up the final form is for ifcvt. In particular by expanding tbranch directly to the final RTL we lose some ifcvt because there are no patterns that can handle the new zero_extract idiom. So the three solutions I can think of are: 1. Don't expand tbranch to its final form immediately, but still use zero_extract. This regresses -O0. (but do we care?) 2. Expand tbranch with vec_extract and provide new zero_extract based rtl sequences for ifcvt. I currently tried this, and while it works, I don't fully trust the RTL. In particular unlike say, combine Ifcvt doesn't allow me to add an extra clobber to say that CC Is clobbered by the pattern. Now tbranch Itself also expands a clobber, so the RTL isn't wrong even after ifcvt, but I'm worried that the pattern can Be idiom recognized and then no clobber could be present. I could modify the recog code in ifcvt to try to Ignore clobbers during matching. 3. I could expand using AND instead of zero_extract instead. We have more patterns handling AND, but I'm not Sure If this will fix the problem entirely, but in principle could expand to what ANDS generates and recog that instead. This shouldn't regress -O0 as we wouldn't put a zero_extract explicitly in RTL (and we already have a pattern for ANDS). What do you think? I personally favor 3.. Thanks, Tamar > -Original Message- > From: Richard Sandiford > Sent: Tuesday, November 22, 2022 2:00 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > ; nd ; Marcus Shawcroft > > Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. > > Tamar Christina writes: > >> -Original Message- > >> From: Richard Sandiford > >> Sent: Tuesday, November 15, 2022 11:34 AM > >> To: Tamar Christina > >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > >> ; nd ; Marcus Shawcroft > >> > >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. > >> > >> Tamar Christina writes: > >> >> -Original Message- > >> >> From: Richard Sandiford > >> >> Sent: Tuesday, November 15, 2022 11:15 AM > >> >> To: Tamar Christina > >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > >> >> ; nd ; Marcus > Shawcroft > >> >> > >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. > >> >> > >> >> Tamar Christina writes: > >> >> >> -Original Message- > >> >> >> From: Richard Sandiford > >> >> >> Sent: Tuesday, November 15, 2022 10:51 AM > >> >> >> To: Tamar Christina > >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > >> >> >> ; nd ; Marcus > >> Shawcroft > >> >> >> > >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. > >> >> >> > >> >> >> Tamar Christina writes: > >> >> >> >> -Original Message- > >> >> >> >> From: Richard Sandiford > >> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM > >> >> >> >> To: Tamar Christina > >> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw > >> >> >> >> ; nd ; Marcus > >> >> Shawcroft > >> >> >> >> > >> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. > >> >> >> >> > >> >> >> >> Tamar Christina writes: > >> >> >> >> > Hello, > >> >> >> >> > > >> >> >> >> > Ping and updated patch. > >> >> >> >> > > >> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no > >> issues. > >> >> >> >> > > >> >> >> >> > Ok for master? 
> >> >> >> >> > > >> >> >> >> > Thanks, > >> >> >> >> > Tamar > >> >> >> >> > > >> >> >> >> > gcc/ChangeLog: > >> >> >> >> > > >> >> >> >> > * config/aarch64/aarch64.md (*tb1): > >> >> >> >> > Rename > >> >> to... > >> >> >> >> > (*tb1): ... this. > >> >> >> >> > (tbranch4): New. > >> >> >> >> > > >> >> >> >> > gcc/testsuite/ChangeLog: > >> >> >> >> > > >> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test. > >> >> >> >> > > >> >> >> >> > --- inline copy of patch --- > >> >> >> >> > > >> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md > >> >> >> >> > b/gcc/config/aarch64/aarch64.md index > >> >> >> >> > > >> >> >> >> > >> >> >> > >> >> > >> > 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd > >> >> >> >> 71 > >> >> >> >> > 2bde55c7c72e 100644 > >> >> >> >> > --- a/gcc/config/aarch64/aarch64.md > >> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md > >> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1" > >> >> >> >> > (const_int 1)))] > >> >> >> >> > ) > >> >> >> >> > > >> >> >> >> > -(define_insn "*tb1" > >> >> >> >> > +(define_expand "tbranch4" > >> >> >> >> >[(set (pc) (if_then_else > >> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0 > >> >> "register_operand" > >> >> >> >> "r") > >> >> >> >> > - (const_int 1) > >> >> >> >>
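A C-level illustration of the interaction being discussed (my own example, not from the patch): the single-bit test is what tbranch expands, and the conditional result is what ifcvt would like to turn into a conditional select afterwards.

int
f (unsigned int x, int a, int b)
{
  if (x & 4)   /* single-bit test: tbz/tbnz candidate via tbranch */
    return a;
  return b;    /* ifcvt would prefer a csel here once the branch is removed */
}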
OpenMP Patch Ping
Updated list as follow up to last ping at https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601162.html Recent patches: Sandra's (Tue Nov 15 04:46:15 GMT 2022) [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target" https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html Julian's patches - I hope I got it right as I lost a bit track: (Tue Nov 8 14:36:17 GMT 2022) [PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++) https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605367.html (Fri Sep 30 13:30:22 GMT 2022) [PATCH v3 06/11] OpenMP: Pointers and member mappings https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602609.html (Tue Oct 18 10:39:01 GMT 2022) [PATCH v5 0/4] OpenMP/OpenACC: Fortran array descriptor mappings https://gcc.gnu.org/pipermail/gcc-patches/2022-October/thread.html#603790 (I think this is partially my task to review those.) Approved but waiting for the Fortran patches (v5) to get approved. [PATCH v3 08/11] OpenMP/OpenACC: Rework clause expansion and nested struct handling https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602010.html Possibly requiring a second look/review despite my initial comment (which might require revisions on the patch side as well): OpenMP: Duplicate checking for map clauses in Fortran (PR107214) https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604033.html Older patches: * [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html * Unified-Shared Memory & Pinned Memory Depending on those: * [PATCH] OpenMP, libgomp: Handle unified shared memory in omp_target_is_accessible. https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594187.html * [PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599703.html (Fortran part, required for ...) * Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html And finally: * [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599332.html (Side remark: some other debugging support like showing the mapping being done as stderr output or ... would be nice as well; might depend on a libgomp-debug.so and/or -f...(sanitize=openmp or ...); the other open-source compiler has something similar.) 
* * * Pending libgomp/nvptx patches: (Wed Sep 21 07:45:36 GMT 2022) [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601922.html (Wed Sep 21 07:45:54 GMT 2022) [PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601925.html Those were pinged 4 times :-( Hopefully, I have not missed any patch Tobias PS: The following list covers pending patches, which have been reviewed but but need to updated before being ready - hopefully, this list is also up to date: * (No pending patch, but wwwdoc's changes-13.html + projects/gomp/ need an update before GCC 13) * [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices Should be re-submitted any time soon (today, next few days) * [Patch] OpenMP/Fortran: Permit end-clause on directive https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600433.html Trivial patch modifications required - mostly LGTM already. * [PATCH] libgomp: fix hang on fatal error https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html (Patch rejected but alternative solutions were suggested.) * Re: [Patch] OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays (Committed but failing occasionally:) https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605854.html * "[PATCH 3/3] vect: inbranch SIMD clones" https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599490.html Review comments to be addressed. * [PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0) https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html * [PATCH] openmp: fix max_vf setting for amdgcn offloading https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598265.html → To be updated for review comments. (Side note: we should at some point find a way to improve target-specific handling; similar to the are-exceptions-supported issue of PR101544 but there are more.) * [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html * Needs to be revised according to review comments * Fortran allocatable components handling (needs to be split into separate pieces and submitted separately) https://gcc.gnu.org/pipermail/gcc-patches/2022-April/59370
Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools
On 11/8/22 14:22, Gaius Mulley wrote: > Martin Liška writes: > >> 1) I would prefer using ' instead of ": >> >> $ flake8 ./gcc/m2/tools-src/tidydates.py >> ... >> ./gcc/m2/tools-src/tidydates.py:124:30: Q000 Double quotes found but single >> quotes preferred >> ./gcc/m2/tools-src/tidydates.py:127:27: Q000 Double quotes found but single >> quotes preferred >> ./gcc/m2/tools-src/tidydates.py:132:27: Q000 Double quotes found but single >> quotes preferred >> ./gcc/m2/tools-src/tidydates.py:133:33: Q000 Double quotes found but single >> quotes preferred >> ./gcc/m2/tools-src/tidydates.py:138:26: Q000 Double quotes found but single >> quotes preferred >> ./gcc/m2/tools-src/tidydates.py:143:28: Q000 Double quotes found but >> single quotes preferred > > ah yes will switch the quotes character. > >> 2) Python-names would be nicer: >> >> def writeTemplate(fo, magic, start, end, dates, contribution, summary, >> lic): >> >> def write_template(...) > > agreed, will change > >> 3) def hasExt(name, ext) - please use Path from pathlib >> >> 4) while (str.find(line, "(*") != -1): >> >> '(*' in line >> ? Similarly elsewhere. >> >> 5) str.find(line, ...) >> >> Use rather directly: line.find(...) >> >> 6) please use flake8: >> https://gcc.gnu.org/codingconventions.html#python > > sure will do all above - I used flake8 but maybe the plugins weren't > enabled. I'll try flake8 on tumbleweed. > >> Thanks, >> Martin >> >> P.S. I'm going to merge Sphinx branch this Wednesday, so then we should port >> your >> conversion scripts to emit .rst instead of .texi. > > should be good - I'll complete the rst output in the scripts, > > regards, > Gaius Hi. As you probably noticed, the Sphinx migration didn't go well. However, it's still up to you if you want to use it or not for Modula 2. We have manuals like libgccjit, or Ada manuals that use RST natively and provide exported .texi files. Cheers and sorry for the troubles I caused. Martin
Re: [PATCH v2] [x86] Fix incorrect _mm_cvtsbh_ss.
On Thu, Nov 24, 2022 at 4:53 PM Jakub Jelinek wrote: > > On Thu, Nov 24, 2022 at 09:22:00AM +0800, liuhongt via Gcc-patches wrote: > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -130,6 +130,7 @@ (define_c_enum "unspec" [ > >;; For AVX/AVX512F support > >UNSPEC_SCALEF > >UNSPEC_PCMP > > + UNSPEC_CVTBFSF > > > >;; Generic math support > >UNSPEC_IEEE_MIN; not commutative > > @@ -4961,6 +4962,31 @@ (define_insn "*extendhf2" > > (set_attr "prefix" "evex") > > (set_attr "mode" "")]) > > > > +(define_expand "extendbfsf2" > > + [(set (match_operand:SF 0 "register_operand") > > + (unspec:SF > > + [(match_operand:BF 1 "register_operand")] > > + UNSPEC_CVTBFSF))] > > + "TARGET_SSE2 && !HONOR_NANS (BFmode) && !flag_signaling_nans") > > I think if !HONOR_NANS (BFmode), then flag_signaling_nans doesn't matter, > the former says that no NaNs may appear in a valid program, > so just testing !HONOR_NANS (BFmode) should be enough. I'll remove flag_signaling_nans. > > What I'm not sure about, my memory is weak, is whether one can > safely use the fast math related tests in define_expand conditions. > I vaguely remember init_all_optabs remembers the conditions, for > changes say in the ISA options optabs are reinited, but not sure if > that happens for optimization option changes like the fast math related > options are. So it would be perhaps safer to use just TARGET_SSE2 > as the expand condition and in the C code body do > if (HONOR_NANS (BFmode) FAIL; > (similarly for truncsfbf2). > On the other side brief look at x86 insn-flags.h shows several fast math > related checks in HAVE_* macros. > PR92791 I found related to this was actually about Oh, good to know that, thanks. > optimize_function_for_{size,speed}_p (cfun) > so maybe fast math related stuff is fine, just not the optimization for > speed or size. I saw many backends(riscv,rs6000,mips,loongarch) already used HONOR_* stuff in the expander conditions. > > Jakub > -- BR, Hongtao
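At the bit level the widening in question is lossless, since __bf16 is the top half of a binary32; a rough scalar sketch of what extendbfsf2 amounts to when NaNs need not be honoured (illustration only, not the intrinsic's actual implementation):

#include <stdint.h>
#include <string.h>

static inline float
bf16_bits_to_float (uint16_t bf)
{
  uint32_t bits = (uint32_t) bf << 16;   /* place the 16 payload bits in the upper half */
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}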
[committed] c++: Further -fcontract* option description fixes
On Tue, Nov 22, 2022 at 10:09:06AM -0500, Jason Merrill wrote: > > Though, shall we have those [on|off] options at all? > > Those are inconsistent with all other boolean options gcc has. > > Every other boolean option is -fwhatever for it being on > > and -fno-whatever for it being off, shouldn't the options be > > without arguments and accept negatives (-fcontract-assumption-mode > > vs. -fno-contract-assumption-mode etc.)? > > True, but I think let's leave them alone for now, they'll probably all be > replaced as the feature specification evolves. If we don't want to support them for too long, another possibility would be to use params for those instead of normal options, params are understood to be more volatile than normal options and can be removed easily (while for normal options we typically keep them but error or them or silently ignore depending on what the option is about). Anyway, during testing I've missed my previous patch just changed: -FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from output: " -fcontract-build-level=[off|default|audit] Specify max contract level to generate runtime checks for" +FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from output: " -fcontract-role=: Specify the semantics for all levels in a role (default, review), or a custom contract role with given semantics (ex: opt:assume,assume,assume)" rather than actually fixed it, the test only reports the first such problem. This patch fixes the remaining ones. Tested with make check-gcc RUNTESTFLAGS=help.exp and committed to trunk as obvious. 2022-11-24 Jakub Jelinek * c.opt (fcontract-role=, fcontract-semantic=): Terminate descriptions with a dot. --- gcc/c-family/c.opt.jj 2022-11-23 09:29:01.083548284 +0100 +++ gcc/c-family/c.opt 2022-11-24 11:42:29.582499720 +0100 @@ -1713,11 +1713,11 @@ C++ Joined RejectNegative fcontract-role= C++ Joined RejectNegative --fcontract-role=: Specify the semantics for all levels in a role (default, review), or a custom contract role with given semantics (ex: opt:assume,assume,assume) +-fcontract-role=: Specify the semantics for all levels in a role (default, review), or a custom contract role with given semantics (ex: opt:assume,assume,assume). fcontract-semantic= C++ Joined RejectNegative --fcontract-semantic=: Specify the concrete semantics for level +-fcontract-semantic=: Specify the concrete semantics for level. fcoroutines C++ LTO Var(flag_coroutines) Jakub
Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]
* Jakub Jelinek: > On Thu, Nov 24, 2022 at 11:01:40AM +0100, Florian Weimer via Gcc-patches > wrote: >> * Joseph Myers: >> >> > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote: >> > >> >> Without this change, finish_declspecs cannot tell that whether there >> >> was an erroneous type specified, or no type at all. This may result >> >> in additional diagnostics for implicit ints, or missing diagnostics >> >> for multiple types. >> >> >> >> PR c/107805 >> >> >> >> gcc/c/ >> >> * c-decl.cc (declspecs_add_type): Propagate error_mark_bode >> >> from type to specs. >> >> >> >> gcc/testsuite/ >> >> * gcc.dg/pr107805-1.c: New test. >> >> * gcc.dg/pr107805-1.c: Likewise. >> > >> > OK. >> >> Thanks. Permission to backport this to GCC 12 after a week or two? > > In this case I'd wait a month, it will take some time until possible > error recovery bugs are discovered. Okay, I have made a note to backport it in the new year. Hopefully any regressions will be flagged on the PR or linked to it. Thanks, Florian
Re: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition
On Mon, Nov 21, 2022 at 4:53 PM Andrew Carlotti wrote: > > On Mon, Nov 14, 2022 at 04:10:22PM +0100, Richard Biener wrote: > > On Fri, Nov 11, 2022 at 7:53 PM Andrew Carlotti via Gcc-patches > > wrote: > > > > > > This recognises patterns of the form: > > > while (n) { n >>= 1 } > > > > > > This patch results in improved (but still suboptimal) codegen: > > > > > > foo (unsigned int b) { > > > int c = 0; > > > > > > while (b) { > > > b >>= 1; > > > c++; > > > } > > > > > > return c; > > > } > > > > > > foo: > > > .LFB11: > > > .cfi_startproc > > > cbz w0, .L3 > > > clz w1, w0 > > > tst x0, 1 > > > mov w0, 32 > > > sub w0, w0, w1 > > > cselw0, w0, wzr, ne > > > ret > > > > > > The conditional is unnecessary. phiopt could recognise a redundant csel > > > (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a > > > clz call, but it cannot recognise the redunancy when the input is (e.g.) > > > (32 - clz). > > > > > > I could perhaps extend this function to recognise this pattern in a later > > > patch, if this is a good place to recognise more patterns. > > > > > > gcc/ChangeLog: > > > > + PR tree-optimization/94793 > > > * tree-scalar-evolution.cc (expression_expensive_p): Add checks > > > for c[lt]z optabs. > > > * tree-ssa-loop-niter.cc (build_cltz_expr): New. > > > (number_of_iterations_cltz_complement): New. > > > (number_of_iterations_bitcount): Add call to the above. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * lib/target-supports.exp (check_effective_target_clz) > > > (check_effective_target_clzl, check_effective_target_clzll) > > > (check_effective_target_ctz, check_effective_target_clzl) > > > (check_effective_target_ctzll): New. > > > * gcc.dg/tree-ssa/cltz-complement-max.c: New test. > > > * gcc.dg/tree-ssa/clz-complement-char.c: New test. > > > * gcc.dg/tree-ssa/clz-complement-int.c: New test. > > > * gcc.dg/tree-ssa/clz-complement-long-long.c: New test. > > > * gcc.dg/tree-ssa/clz-complement-long.c: New test. > > > * gcc.dg/tree-ssa/ctz-complement-char.c: New test. > > > * gcc.dg/tree-ssa/ctz-complement-int.c: New test. > > > * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test. > > > * gcc.dg/tree-ssa/ctz-complement-long.c: New test. > > > > > > > > > -- > > > > > > > > [snip test diffs] > > > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc > > > index > > > 7e2a3e986619de87e4ae9daf16198be1f13b917c..1ac9526c69b5fe80c26022f2fa1176d222e2dfb9 > > > 100644 > > > --- a/gcc/tree-scalar-evolution.cc > > > +++ b/gcc/tree-scalar-evolution.cc > > > @@ -3406,12 +3406,21 @@ expression_expensive_p (tree expr, hash_map > > uint64_t> &cache, > > > library call for popcount when backend does not have an > > > instruction > > > to do so. We consider this to be expensive and generate > > > __builtin_popcount only when backend defines it. */ > > > + optab optab; > > >combined_fn cfn = get_call_combined_fn (expr); > > >switch (cfn) > > > { > > > CASE_CFN_POPCOUNT: > > > + optab = popcount_optab; > > > + goto bitcount_call; > > > + CASE_CFN_CLZ: > > > + optab = clz_optab; > > > + goto bitcount_call; > > > + CASE_CFN_CTZ: > > > + optab = ctz_optab; > > > +bitcount_call: > > > /* Check if opcode for popcount is available in the mode > > > required. */ > > > - if (optab_handler (popcount_optab, > > > + if (optab_handler (optab, > > > TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, > > > 0 > > > == CODE_FOR_nothing) > > > { > > > @@ -3424,7 +3433,7 @@ expression_expensive_p (tree expr, hash_map > > uint64_t> &cache, > > > instructions. 
*/ > > > if (is_a (mode, &int_mode) > > > && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD > > > - && (optab_handler (popcount_optab, word_mode) > > > + && (optab_handler (optab, word_mode) > > > != CODE_FOR_nothing)) > > > break; > > > return true; > > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc > > > index > > > fece876099c1687569d6351e7d2416ea6acae5b5..16e8e25919d808cea27adbd72f0be01ae2f0e547 > > > 100644 > > > --- a/gcc/tree-ssa-loop-niter.cc > > > +++ b/gcc/tree-ssa-loop-niter.cc > > > @@ -2198,6 +2198,195 @@ number_of_iterations_popcount (loop_p loop, edge > > > exit, > > >return true; > > > } > > > > > > +/* Return an expression that counts the leading/trailing zeroes of src. > > > */ > > > > Can you expand the comment on how you handle defined_at_zero and how > > that relates to the C[LT
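Stated as the closed form the idiom recognition is aiming for (a restatement of the foo example above, not code from the patch):

int
bits (unsigned int b)
{
  return b ? 32 - __builtin_clz (b) : 0;   /* same value as the counting loop in foo */
}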
Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]
On Thu, Nov 24, 2022 at 11:01:40AM +0100, Florian Weimer via Gcc-patches wrote: > * Joseph Myers: > > > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote: > > > >> Without this change, finish_declspecs cannot tell that whether there > >> was an erroneous type specified, or no type at all. This may result > >> in additional diagnostics for implicit ints, or missing diagnostics > >> for multiple types. > >> > >>PR c/107805 > >> > >> gcc/c/ > >>* c-decl.cc (declspecs_add_type): Propagate error_mark_bode > >>from type to specs. > >> > >> gcc/testsuite/ > >>* gcc.dg/pr107805-1.c: New test. > >>* gcc.dg/pr107805-1.c: Likewise. > > > > OK. > > Thanks. Permission to backport this to GCC 12 after a week or two? In this case I'd wait a month, it will take some time until possible error recovery bugs are discovered. Jakub
Re: [PATCH] Add a new conversion for conditional ternary set into ifcvt [PR106536]
On Thu, Nov 24, 2022 at 8:25 AM HAO CHEN GUI wrote: > > Hi Richard, > > > On 2022/11/24 4:06, Richard Biener wrote: > > Wouldn't we usually either add an optab or try to recog a canonical > > RTL form instead of adding a new target hook for things like this? > > Thanks so much for your comments. Please let me make sure I understand it correctly. > > Do you mean we should create an optab for the "setb" pattern (the nested > if-then-else insn) and detect candidate insns in the ifcvt pass? Then > generate the insn with the new optab? Yes, that would be one way to do it. Another way would be to generate a (to be defined) canonical form of such an instruction and see whether it can be recognized (whether there's a define_insn for it). Note that these were just things that came to my mind here, I'm not too familiar with how we handle such situations, but at least I'm not aware of dozens of target hooks to handle instruction availability. Richard. > My concern is that some candidate insns are target specific. For > example, different modes cause additional zero_extend or subreg insns > to be generated on different targets. So I put the detection process into a > target hook. > > Looking forward to your advice. > > Thanks again > Gui Haochen
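For reference, the sort of source-level pattern that a nested if-then-else set such as "setb" corresponds to (my own illustrative example; the PR itself may use a different form):

int
three_way_cmp (long a, long b)
{
  return a < b ? -1 : (a > b ? 1 : 0);
}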
Re: [PATCH] asan: Fix up error recovery for too large frames [PR107317]
On Thu, 24 Nov 2022, Jakub Jelinek wrote: > Hi! > > asan_emit_stack_protection and functions it calls have various asserts that > verify sanity of the stack protection instrumentation. But, that > verification can easily fail if we've diagnosed a frame offset overflow. > asan_emit_stack_protection just emits some extra code in the prologue, > if we've reported errors, we aren't producing assembly, so it doesn't > really matter if we don't include the protection code, compilation > is going to fail anyway. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. > 2022-11-24 Jakub Jelinek > > PR middle-end/107317 > * asan.cc: Include diagnostic-core.h. > (asan_emit_stack_protection): Return NULL early if seen_error (). > > * gcc.dg/asan/pr107317.c: New test. > > --- gcc/asan.cc.jj2022-06-28 13:03:30.613693889 +0200 > +++ gcc/asan.cc 2022-11-23 17:47:09.130332461 +0100 > @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. > #include "tree-inline.h" > #include "tree-ssa.h" > #include "tree-eh.h" > +#include "diagnostic-core.h" > > /* AddressSanitizer finds out-of-bounds and use-after-free bugs > with <2x slowdown on average. > @@ -1818,6 +1819,11 @@ asan_emit_stack_protection (rtx base, rt >tree str_cst, decl, id; >int use_after_return_class = -1; > > + /* Don't emit anything when doing error recovery, the assertions > + might fail e.g. if a function had a frame offset overflow. */ > + if (seen_error ()) > +return NULL; > + >if (shadow_ptr_types[0] == NULL_TREE) > asan_init_shadow_ptr_types (); > > --- gcc/testsuite/gcc.dg/asan/pr107317.c.jj 2022-11-23 17:46:09.145219960 > +0100 > +++ gcc/testsuite/gcc.dg/asan/pr107317.c 2022-11-23 17:49:45.148024097 > +0100 > @@ -0,0 +1,13 @@ > +/* PR middle-end/107317 */ > +/* { dg-do compile { target ilp32 } } */ > +/* { dg-options "-fsanitize=address -ffat-lto-objects" } */ > + > +void bar (float *, float *); > + > +void > +foo (void) /* { dg-error "exceeds maximum" } */ > +{ > + float a[4]; > + float b[2]; > + bar (a, b); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
[COMMITTED] ada: Add assertion for the implementation of storage models
From: Eric Botcazou We cannot generate a call to memset for an aggregate with an Others choice when the target of the assignment has a storage model with Copy_To routine. gcc/ada/ * gcc-interface/trans.cc (gnat_to_gnu) : Add assertion that memset is not supposed to be used when the target has a storage model with Copy_To routine. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/gcc-interface/trans.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc index 1cd621a9377..b9d7c015a73 100644 --- a/gcc/ada/gcc-interface/trans.cc +++ b/gcc/ada/gcc-interface/trans.cc @@ -7450,6 +7450,9 @@ gnat_to_gnu (Node_Id gnat_node) else if (Present (gnat_smo) && Present (Storage_Model_Copy_To (gnat_smo))) { + /* We obviously cannot use memset in this case. */ + gcc_assert (!use_memset_p); + tree t = remove_conversions (gnu_rhs, false); /* If a storage model load is present on the RHS then instantiate -- 2.34.1
[COMMITTED] ada: Spurious error on Lock_Free protected type with discriminants
From: Justin Squirek This patch corrects an issue in the compiler whereby unprefixed discriminants appearing in protected subprograms were unable to be properly resolved - leading to spurious resolution errors. gcc/ada/ * sem_ch8.adb (Find_Direct_Name): Remove bypass to reanalyze incorrectly analyzed discriminals. (Set_Entity_Or_Discriminal): Avoid resetting the entity field of a discriminant reference to be the internally generated renaming when we are in strict preanalysis mode. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_ch8.adb | 29 - 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb index ca306663791..841076bbd01 100644 --- a/gcc/ada/sem_ch8.adb +++ b/gcc/ada/sem_ch8.adb @@ -4891,6 +4891,18 @@ package body Sem_Ch8 is then null; + -- Don't replace the discriminant in strict preanalysis mode since + -- it can lead to errors during full analysis when the discriminant + -- gets referenced later. + + -- This can occur in situations where a protected type contains + -- an expression function which references a discriminant. + + elsif Preanalysis_Active + and then Inside_Preanalysis_Without_Freezing = 0 + then +null; + else Set_Entity (N, Discriminal (E)); end if; @@ -6048,21 +6060,6 @@ package body Sem_Ch8 is if Is_Type (Entity (N)) then Set_Etype (N, Entity (N)); - -- The exception to this general rule are constants associated with - -- discriminals of protected types because for each protected op - -- a new set of discriminals is internally created by the frontend - -- (see Exp_Ch9.Set_Discriminals), and the current decoration of the - -- entity pointer may have been set as part of a preanalysis, where - -- discriminals still reference the first subprogram or entry to be - -- expanded (see Expand_Protected_Body_Declarations). - - elsif Full_Analysis - and then Ekind (Entity (N)) = E_Constant - and then Present (Discriminal_Link (Entity (N))) - and then Is_Protected_Type (Scope (Discriminal_Link (Entity (N - then -goto Find_Name; - else declare Entyp : constant Entity_Id := Etype (Entity (N)); @@ -6102,8 +6099,6 @@ package body Sem_Ch8 is return; end if; - <> - -- Preserve relevant elaboration-related attributes of the context which -- are no longer available or very expensive to recompute once analysis, -- resolution, and expansion are over. -- 2.34.1
Re: [PATCH] 8/19 modula2 front end: libgm2 contents
Richard Biener writes: > On Mon, Oct 10, 2022 at 5:35 PM Gaius Mulley via Gcc-patches > wrote: >> >> >> >> This patch set consists of the libgm2 makefile, autoconf sources >> necessary to build the libm2pim, libm2iso, libm2min, libm2cor >> and libm2log. > > This looks OK. I suppose it was also tested building a cross-compiler? > > Can we get some up-to-date status on the build and support status for the > list of primary and secondary platforms we list on > https://gcc.gnu.org/gcc-13/criteria.html? sure, here is a summary of the primary/secondary platforms, added at the end of the page: https://splendidisolation.ddns.net/public/modula2/patchsummary.html regards, Gaius
Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]
* Joseph Myers: > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote: > >> Without this change, finish_declspecs cannot tell that whether there >> was an erroneous type specified, or no type at all. This may result >> in additional diagnostics for implicit ints, or missing diagnostics >> for multiple types. >> >> PR c/107805 >> >> gcc/c/ >> * c-decl.cc (declspecs_add_type): Propagate error_mark_bode >> from type to specs. >> >> gcc/testsuite/ >> * gcc.dg/pr107805-1.c: New test. >> * gcc.dg/pr107805-1.c: Likewise. > > OK. Thanks. Permission to backport this to GCC 12 after a week or two? Florian
Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]
Am 2022-11-23 um 21:11 schrieb Richard Biener via Gcc-patches: On Wed, Nov 23, 2022 at 3:08 PM Iskander Shakirzyanov via Gcc-patches wrote: Hi! Sorry for the initially missing description. The following patch changes the definition of -Warray-bounds to an Alias to -Warray-bounds=1. This is necessary for the correct use of -Werror=array-bounds=X, for more information see bug report 107787 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107787) As I understand, this happens because -Warray-bounds and -Warray-bounds= are declared as 2 different options in common.opt, so when diagnostic_classify_diagnostic() (opts-common.cc:1880) is called, DK_ERROR is set for the -Warray-bounds= option, but in diagnostic_report_diagnostic() (diagnostic.cc:1446) through warning_at() passes opt_index of -Warray-bounds, so information about DK_ERROR is lost. How did you test the patch? If you bootstrapped it and ran the testsuite then it's OK. Hi, I'm pretty sure the testsuite will have regressions, as I have a very similar patch lying around that needs these testsuite changes. I also added some modified tests that check that -Werror=array-bounds=X works as expected (which it didn't before). This also shows nicely why I don't like warnings with levels, what if I want -Werror=array-bounds=2 + -Warray-bounds=1? The reason (besides lack of time) I didn't submit that patch yet, is that I wanted to check if the tool to process common.opt can be changed to detect warnings with duplicated warning variables. Because I think at least -Wuse-after-free= and Wattributes= have the same problem. BTW, is the duplicated warning description "Warn if an array is accessed out of bounds." needed or not with Alias()? There are examples either way in common.opt. I've attached my patch, feel free to integrate the testsuite changes. Franz From 9bfefe2082f55f2ad2cd19beedbfeaf9bd20fb4a Mon Sep 17 00:00:00 2001 From: Franz Sirl Date: Thu, 8 Jul 2021 10:25:00 +0200 Subject: [PATCH 08/11] Unify -Warray-bounds/-Warray-bounds= warning variables. --- gcc/builtins.cc | 6 +- gcc/c-family/c-common.cc| 6 +- gcc/common.opt | 3 +- gcc/diagnostic-spec.cc | 1 - gcc/gimple-array-bounds.cc | 38 +++--- gcc/gimple-ssa-warn-restrict.cc | 2 +- gcc/testsuite/gcc.dg/Warray-bounds-34.c | 2 +- gcc/testsuite/gcc.dg/Warray-bounds-43.c | 6 +- gcc/testsuite/gcc.dg/Warray-bounds-93.c | 103 gcc/testsuite/gcc.dg/Warray-bounds-94.c | 149 10 files changed, 283 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-93.c create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-94.c diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 4dc1ca672b2..02c4fefa86f 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize) { /* Suppress multiple warnings for propagated constant strings. 
*/ if (only_value != 2 - && !warning_suppressed_p (arg, OPT_Warray_bounds) - && warning_at (loc, OPT_Warray_bounds, + && !warning_suppressed_p (arg, OPT_Warray_bounds_) + && warning_at (loc, OPT_Warray_bounds_, "offset %qwi outside bounds of constant string", eltoff)) { if (decl) inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl); - suppress_warning (arg, OPT_Warray_bounds); + suppress_warning (arg, OPT_Warray_bounds_); } return NULL_TREE; } diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 71507d4cb0a..cf423d384f6 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -6811,7 +6811,7 @@ fold_offsetof (tree expr, tree type, enum tree_code ctx) definition thereof. */ if (TREE_CODE (v) == ARRAY_REF || TREE_CODE (v) == COMPONENT_REF) - warning (OPT_Warray_bounds, + warning (OPT_Warray_bounds_, "index %E denotes an offset " "greater than size of %qT", t, TREE_TYPE (TREE_OPERAND (expr, 0))); @@ -8530,9 +8530,9 @@ convert_vector_to_array_for_subscript (location_t loc, index = fold_for_warn (index); if (TREE_CODE (index) == INTEGER_CST) -if (!tree_fits_uhwi_p (index) + if (!tree_fits_uhwi_p (index) || maybe_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type))) - warning_at (loc, OPT_Warray_bounds, "index value is out of bound"); + warning_at (loc, OPT_Warray_bounds_, "index value is out of bound"); /* We are building an ARRAY_REF so mark the vector as addressable to not run into the gimplifiers premature setting of DECL_GIMP
[PATCH V2] Update block move for struct param or returns
Hi, When assigning a parameter to a variable, or assigning a variable to return value with struct type, "block move" are used to expand the assignment. It would be better to use the register mode according to the target/ABI to move the blocks. And then this would raise more opportunities for other optimization passes(cse/dse/xprop). As the example code (like code in PR65421): typedef struct SA {double a[3];} A; A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s) A ret_arg (A a) {return a;} // just empty fun body void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s) This patch is based on the previous version which supports assignments from parameter: https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html This patch also supports returns. I also tried to update gimplify/nrv to replace "return D.xxx;" with "return ;". While there is one issue: "" with PARALLEL code can not be accessed through address/component_ref. This issue blocks a few passes (e.g. sra, expand). On ppc64, some dead stores are not eliminated. e.g. for ret_arg: .cfi_startproc std 4,56(1)//reductant std 5,64(1)//reductant std 6,72(1)//reductant std 4,0(3) std 5,8(3) std 6,16(3) blr Bootstraped and regtested on ppc64le and x86_64. I'm wondering if this patch could be committed first. Thanks for the comments and suggestions. BR, Jeff (Jiufu) PR target/65421 gcc/ChangeLog: * cfgexpand.cc (expand_used_vars): Add collecting return VARs. (expand_gimple_stmt_1): Call expand_special_struct_assignment. (pass_expand::execute): Free collections of return VARs. * expr.cc (expand_special_struct_assignment): New function. * expr.h (expand_special_struct_assignment): Declare. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr65421-1.c: New test. * gcc.target/powerpc/pr65421.c: New test. --- gcc/cfgexpand.cc | 37 + gcc/expr.cc | 43 gcc/expr.h | 3 ++ gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++ gcc/testsuite/gcc.target/powerpc/pr65421.c | 19 + 5 files changed, 123 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index dd29c03..f185de39341 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -341,6 +341,9 @@ static hash_map *decl_to_stack_part; all of them in one big sweep. */ static bitmap_obstack stack_var_bitmap_obstack; +/* Those VARs on returns. */ +static bitmap return_vars; + /* An array of indices such that stack_vars[stack_vars_sorted[i]].size is non-decreasing. */ static size_t *stack_vars_sorted; @@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars) frame_phase = off ? align - off : 0; } + /* Collect VARs on returns. */ + return_vars = NULL; + if (DECL_RESULT (current_function_decl) + && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == BLKmode) +{ + return_vars = BITMAP_ALLOC (NULL); + + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) + if (greturn *ret = safe_dyn_cast (last_stmt (e->src))) + { + tree val = gimple_return_retval (ret); + if (val && VAR_P (val)) + bitmap_set_bit (return_vars, DECL_UID (val)); + } +} + /* Set TREE_USED on all variables in the local_decls. */ FOR_EACH_LOCAL_DECL (cfun, i, var) TREE_USED (var) = 1; @@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt) /* This is a clobber to mark the going out of scope for this LHS. 
*/ expand_clobber (lhs); + else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs) + && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode + && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL + || REG_P (DECL_INCOMING_RTL (rhs +|| (VAR_P (lhs) && return_vars +&& DECL_RTL_SET_P (DECL_RESULT (current_function_decl)) +&& GET_CODE ( + DECL_RTL (DECL_RESULT (current_function_decl))) + == PARALLEL +&& bitmap_bit_p (return_vars, DECL_UID (lhs + expand_special_struct_assignment (lhs, rhs); else expand_assignment (lhs, rhs, gimple_assign_nontemporal_move_p ( @@ -7025,6 +7057,11 @@ pass_expand::execute (function *fun) /* After expanding, the return labels are no longer needed. */ return_label = NULL; naked_return_label = NULL; + if (return_vars) +{ + BITMAP_FREE (return_vars); + return_var
[committed] testsuite: Fix up broken testcase [PR107127]
Hi! On Thu, Nov 24, 2022 at 09:37:54AM +0800, haochen.jiang wrote: > 8a0fce6a51915c29584427fd376b40073c328090 is the first bad commit > commit 8a0fce6a51915c29584427fd376b40073c328090 > Author: Jakub Jelinek > Date: Wed Nov 23 19:09:31 2022 +0100 > > c: Fix compile time hog in c_genericize [PR107127] > > caused > > FAIL: gcc.dg/pr107127.c (test for excess errors) Oops, sorry. I've added { dg-options "" } line manually in the patch but forgot to adjust the number of added lines. Tested on x86_64-linux, committed to trunk as obvious. 2022-11-24 Jakub Jelinek PR c/107127 * gcc.dg/pr107127.c (foo): Add missing closing }. --- gcc/testsuite/gcc.dg/pr107127.c.jj 2022-11-23 19:09:24.670333114 +0100 +++ gcc/testsuite/gcc.dg/pr107127.c 2022-11-24 10:31:48.208143681 +0100 @@ -10,3 +10,4 @@ foo (_Complex double a, double b, double return v[0] / c * (0 - 0 / a + 699.0 + 7.05 - 286.0 - +-4.65 + 1.57 + 0) * 0.1 - 3.28 + 4.22 + 0.1)) * b + 5.06) * 1.23 * 8.0 * 12.0 * 16.0 * 2.0 * 2.0 * 0.25 * 0.125 * 18.2 * -15.25 * 0.0001 * 42.0 * 0.012 - 8.45 + 0 + 88.0 + 6.96 + 867.0 + 9.10 - 7.04 * -1.0); +} Jakub
Re: [PATCH] libstdc++: Workaround buggy printf on Solaris in to_chars/float128_c++23.cc test [PR107815]
On Thu, 24 Nov 2022 at 09:20, Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, Solaris apparently can handle right > printf ("%.0Lf\n", 1e+202L * __DBL_MAX__); > which prints 511 chars long number, but can't handle > printf ("%.0Lf\n", 1e+203L * __DBL_MAX__); > nor > printf ("%.0Lf\n", __LDBL_MAX__); > properly, instead of printing 512 chars long number for the former and > 4933 chars long number for the second, it handles them as > if user asked for "%.0Le\n" in those cases. > > The following patch disables the single problematic value that fails > in the test, and also fixes commented out debugging printouts. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK, thanks > > 2022-11-24 Jakub Jelinek > > PR libstdc++/107815 > * testsuite/20_util/to_chars/float128_c++23.cc (test): Disable > __FLT128_MAX__ test on Solaris. Fix up commented out debugging > printouts. > > --- libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc.jj > 2022-11-08 11:19:22.251768167 +0100 > +++ libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc 2022-11-23 > 17:02:22.380051796 +0100 > @@ -52,14 +52,17 @@ test(std::chars_format fmt = std::chars_ > std::numbers::inv_sqrt3_v, > std::numbers::egamma_v, > std::numbers::phi_v, > +// Solaris has non-conforming printf, see PR98384 and PR107815. > +#if !(defined(__sun__) && defined(__svr4__)) > std::numeric_limits::max() > +#endif >}; >char str1[1], str2[1]; >for (auto u : tests) > { >auto [ptr1, ec1] = std::to_chars(str1, str1 + sizeof(str1), u, fmt); >VERIFY( ec1 == std::errc() ); > -//std::cout << i << ' ' << std::string_view (str1, ptr1) << '\n'; > +//std::cout << u << ' ' << std::string_view (str1, ptr1) << '\n'; >if (fmt == std::chars_format::fixed) > { > auto [ptr2, ec2] = std::to_chars(str2, str2 + (ptr1 - str1), u, > fmt); > @@ -76,7 +79,7 @@ test(std::chars_format fmt = std::chars_ > >auto [ptr5, ec5] = std::to_chars(str1, str1 + sizeof(str1), u, fmt, > 90); >VERIFY( ec5 == std::errc() ); > -//std::cout << i << ' ' << std::string_view (str1, ptr5) << '\n'; > +//std::cout << u << ' ' << std::string_view (str1, ptr5) << '\n'; >v = 4.0f128; >auto [ptr6, ec6] = std::from_chars(str1, ptr5, v, > fmt == std::chars_format{} > > Jakub >
Re: [PATCH] libstdc++: Another merge from fast_float upstream [PR107468]
On Thu, 24 Nov 2022 at 09:23, Jakub Jelinek wrote: > > Hi! > > Upstream fast_float came up with a cheaper test for > fegetround () == FE_TONEAREST using one float addition, one subtraction > and one comparison. If we know we are rounding to nearest, we can use > fast path in more cases as before. > The following patch merges those changes into libstdc++. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK, thanks. > > 2022-11-24 Jakub Jelinek > > PR libstdc++/107468 > * src/c++17/fast_float/MERGE: Adjust for merge from upstream. > * src/c++17/fast_float/fast_float.h: Merge from fast_float > 2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit. > > --- libstdc++-v3/src/c++17/fast_float/MERGE.jj 2022-11-07 15:17:14.035071694 > +0100 > +++ libstdc++-v3/src/c++17/fast_float/MERGE 2022-11-23 17:09:20.940866070 > +0100 > @@ -1,4 +1,4 @@ > -662497742fea7055f0e0ee27e5a7ddc382c2c38e > +2ef9abbcf6a11958b6fa685a89d0150022e82e78 > > The first line of this file holds the git revision number of the > last merge done from the master library sources. > --- libstdc++-v3/src/c++17/fast_float/fast_float.h.jj 2022-11-07 > 15:17:14.066071268 +0100 > +++ libstdc++-v3/src/c++17/fast_float/fast_float.h 2022-11-23 > 17:19:41.735693122 +0100 > @@ -99,11 +99,11 @@ from_chars_result from_chars_advanced(co > || defined(__MINGW64__) \ > || defined(__s390x__)\ > || (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) > || defined(__PPC64LE__)) ) > -#define FASTFLOAT_64BIT > +#define FASTFLOAT_64BIT 1 > #elif (defined(__i386) || defined(__i386__) || defined(_M_IX86) \ > || defined(__arm__) || defined(_M_ARM) \ > || defined(__MINGW32__) || defined(__EMSCRIPTEN__)) > -#define FASTFLOAT_32BIT > +#define FASTFLOAT_32BIT 1 > #else >// Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow. >// We can never tell the register width, but the SIZE_MAX is a good > approximation. > @@ -111,9 +111,9 @@ from_chars_result from_chars_advanced(co >#if SIZE_MAX == 0x > #error Unknown platform (16-bit, unsupported) >#elif SIZE_MAX == 0x > -#define FASTFLOAT_32BIT > +#define FASTFLOAT_32BIT 1 >#elif SIZE_MAX == 0x > -#define FASTFLOAT_64BIT > +#define FASTFLOAT_64BIT 1 >#else > #error Unknown platform (not 32-bit, not 64-bit?) 
>#endif > @@ -359,10 +359,12 @@ template struct binary_form >static inline constexpr int minimum_exponent(); >static inline constexpr int infinite_power(); >static inline constexpr int sign_index(); > + static inline constexpr int min_exponent_fast_path(); // used when > fegetround() == FE_TONEAREST >static inline constexpr int max_exponent_fast_path(); >static inline constexpr int max_exponent_round_to_even(); >static inline constexpr int min_exponent_round_to_even(); >static inline constexpr uint64_t max_mantissa_fast_path(int64_t power); > + static inline constexpr uint64_t max_mantissa_fast_path(); // used when > fegetround() == FE_TONEAREST >static inline constexpr int largest_power_of_ten(); >static inline constexpr int smallest_power_of_ten(); >static inline constexpr T exact_power_of_ten(int64_t power); > @@ -372,6 +374,22 @@ template struct binary_form >static inline constexpr equiv_uint hidden_bit_mask(); > }; > > +template <> inline constexpr int > binary_format::min_exponent_fast_path() { > +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) > + return 0; > +#else > + return -22; > +#endif > +} > + > +template <> inline constexpr int > binary_format::min_exponent_fast_path() { > +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) > + return 0; > +#else > + return -10; > +#endif > +} > + > template <> inline constexpr int > binary_format::mantissa_explicit_bits() { >return 52; > } > @@ -418,13 +436,18 @@ template <> inline constexpr int binary_ > template <> inline constexpr int > binary_format::max_exponent_fast_path() { >return 10; > } > - > +template <> inline constexpr uint64_t > binary_format::max_mantissa_fast_path() { > + return uint64_t(2) << mantissa_explicit_bits(); > +} > template <> inline constexpr uint64_t > binary_format::max_mantissa_fast_path(int64_t power) { >// caller is responsible to ensure that >// power >= 0 && power <= 22 >// >return max_mantissa_double[power]; > } > +template <> inline constexpr uint64_t > binary_format::max_mantissa_fast_path() { > + return uint64_t(2) << mantissa_explicit_bits(); > +} > template <> inline constexpr uint64_t > binary_format::max_mantissa_fast_path(int64_t power) { >// caller is responsible to ensure that >// power >= 0 && power <= 10 > @@ -619,10 +642,6 @@ parsed_number_string parse_number_string > >uint64_t i = 0; // an unsign
[PATCH] asan: Fix up error recovery for too large frames [PR107317]
Hi! asan_emit_stack_protection and functions it calls have various asserts that verify sanity of the stack protection instrumentation. But, that verification can easily fail if we've diagnosed a frame offset overflow. asan_emit_stack_protection just emits some extra code in the prologue, if we've reported errors, we aren't producing assembly, so it doesn't really matter if we don't include the protection code, compilation is going to fail anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-11-24 Jakub Jelinek PR middle-end/107317 * asan.cc: Include diagnostic-core.h. (asan_emit_stack_protection): Return NULL early if seen_error (). * gcc.dg/asan/pr107317.c: New test. --- gcc/asan.cc.jj 2022-06-28 13:03:30.613693889 +0200 +++ gcc/asan.cc 2022-11-23 17:47:09.130332461 +0100 @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. #include "tree-inline.h" #include "tree-ssa.h" #include "tree-eh.h" +#include "diagnostic-core.h" /* AddressSanitizer finds out-of-bounds and use-after-free bugs with <2x slowdown on average. @@ -1818,6 +1819,11 @@ asan_emit_stack_protection (rtx base, rt tree str_cst, decl, id; int use_after_return_class = -1; + /* Don't emit anything when doing error recovery, the assertions + might fail e.g. if a function had a frame offset overflow. */ + if (seen_error ()) +return NULL; + if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); --- gcc/testsuite/gcc.dg/asan/pr107317.c.jj 2022-11-23 17:46:09.145219960 +0100 +++ gcc/testsuite/gcc.dg/asan/pr107317.c2022-11-23 17:49:45.148024097 +0100 @@ -0,0 +1,13 @@ +/* PR middle-end/107317 */ +/* { dg-do compile { target ilp32 } } */ +/* { dg-options "-fsanitize=address -ffat-lto-objects" } */ + +void bar (float *, float *); + +void +foo (void) /* { dg-error "exceeds maximum" } */ +{ + float a[4]; + float b[2]; + bar (a, b); +} Jakub
Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]
"Kewen.Lin" writes: > Hi, > > As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, > STORE} into normal vector load/store if the given length is known > to be equal to the length of the whole vector. It would help to > improve overall cycles as normally the latency of vector access > with length in bytes is bigger than normal vector access, and it > also saves the preparation for length if constant length can not > be encoded into instruction (such as on power). > > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > > Is it ok for trunk? > > BR, > Kewen > - > PR tree-optimization/107412 > > gcc/ChangeLog: > > * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ... > (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter > mask_p indicating it's for mask or length, and add some handlings for > IFN LEN_{LOAD,STORE}. > (gimple_fold_mask_load): Rename to ... > (gimple_fold_partial_load): ... this, add one parameter mask_p. > (gimple_fold_mask_store): Rename to ... > (gimple_fold_partial_store): ... this, add one parameter mask_p. > (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE}, > and adjust calls on gimple_fold_mask_load_store_mem_ref to > gimple_fold_partial_load_store_mem_ref. Sorry to reply to late (still catching up on email), but: > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr107412.c: New test. > * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for > folded LEN_LOAD. > --- > gcc/gimple-fold.cc| 57 ++- > .../gcc.target/powerpc/p9-vec-length-epil-8.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr107412.c | 19 +++ > 3 files changed, 64 insertions(+), 14 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index a1704784bc9..e3a087defa6 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree > type, >return wi::min_precision (wres, sign) > TYPE_PRECISION (type); > } > > -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF > +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF > for the memory it references, otherwise return null. VECTYPE is the > - type of the memory vector. */ > + type of the memory vector. MASK_P indicates it's for MASK if true, > + otherwise it's for LEN. */ > > static tree > -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype) > +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool > mask_p) > { >tree ptr = gimple_call_arg (call, 0); >tree alias_align = gimple_call_arg (call, 1); > - tree mask = gimple_call_arg (call, 2); > - if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask)) > + if (!tree_fits_uhwi_p (alias_align)) > return NULL_TREE; > > + if (mask_p) > +{ > + tree mask = gimple_call_arg (call, 2); > + if (!integer_all_onesp (mask)) > + return NULL_TREE; > +} else { Minor nit: }, else, and { should be on separate lines. But the thing I actually wanted to say was... 
> + tree basic_len = gimple_call_arg (call, 2); > + if (!tree_fits_uhwi_p (basic_len)) > + return NULL_TREE; > + unsigned int nargs = gimple_call_num_args (call); > + tree bias = gimple_call_arg (call, nargs - 1); > + gcc_assert (tree_fits_uhwi_p (bias)); > + tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias); > + unsigned int len = tree_to_uhwi (biased_len); > + unsigned int vect_len > + = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant (); > + if (vect_len != len) > + return NULL_TREE; Using "unsigned int" truncates the value. I realise that's probably safe in this context, since large values have undefined behaviour. But it still seems better to use an untruncated type, so that it looks less like an oversight. (I think this is one case where "auto" can help, since it gets the type right automatically.) It would also be better to avoid the to_constant, since we haven't proven is_constant. How about: tree basic_len = gimple_call_arg (call, 2); if (!poly_int_tree_p (basic_len)) return NULL_TREE; unsigned int nargs = gimple_call_num_args (call); tree bias = gimple_call_arg (call, nargs - 1); gcc_assert (TREE_CODE (bias) == INTEGER_CST); if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias), GET_MODE_SIZE (TYPE_MODE (vectype return NULL_TREE; which also avoids using tree arithmetic for the subtraction? Thanks, Richard > +} > + >unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align); >if (TYPE_ALIGN (vectype) != align) > vectype = build_aligned_t
[PATCH] libstdc++: Another merge from fast_float upstream [PR107468]
Hi! Upstream fast_float came up with a cheaper test for fegetround () == FE_TONEAREST using one float addition, one subtraction and one comparison. If we know we are rounding to nearest, we can use fast path in more cases as before. The following patch merges those changes into libstdc++. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-11-24 Jakub Jelinek PR libstdc++/107468 * src/c++17/fast_float/MERGE: Adjust for merge from upstream. * src/c++17/fast_float/fast_float.h: Merge from fast_float 2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit. --- libstdc++-v3/src/c++17/fast_float/MERGE.jj 2022-11-07 15:17:14.035071694 +0100 +++ libstdc++-v3/src/c++17/fast_float/MERGE 2022-11-23 17:09:20.940866070 +0100 @@ -1,4 +1,4 @@ -662497742fea7055f0e0ee27e5a7ddc382c2c38e +2ef9abbcf6a11958b6fa685a89d0150022e82e78 The first line of this file holds the git revision number of the last merge done from the master library sources. --- libstdc++-v3/src/c++17/fast_float/fast_float.h.jj 2022-11-07 15:17:14.066071268 +0100 +++ libstdc++-v3/src/c++17/fast_float/fast_float.h 2022-11-23 17:19:41.735693122 +0100 @@ -99,11 +99,11 @@ from_chars_result from_chars_advanced(co || defined(__MINGW64__) \ || defined(__s390x__)\ || (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) || defined(__PPC64LE__)) ) -#define FASTFLOAT_64BIT +#define FASTFLOAT_64BIT 1 #elif (defined(__i386) || defined(__i386__) || defined(_M_IX86) \ || defined(__arm__) || defined(_M_ARM) \ || defined(__MINGW32__) || defined(__EMSCRIPTEN__)) -#define FASTFLOAT_32BIT +#define FASTFLOAT_32BIT 1 #else // Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow. // We can never tell the register width, but the SIZE_MAX is a good approximation. @@ -111,9 +111,9 @@ from_chars_result from_chars_advanced(co #if SIZE_MAX == 0x #error Unknown platform (16-bit, unsupported) #elif SIZE_MAX == 0x -#define FASTFLOAT_32BIT +#define FASTFLOAT_32BIT 1 #elif SIZE_MAX == 0x -#define FASTFLOAT_64BIT +#define FASTFLOAT_64BIT 1 #else #error Unknown platform (not 32-bit, not 64-bit?) 
#endif @@ -359,10 +359,12 @@ template struct binary_form static inline constexpr int minimum_exponent(); static inline constexpr int infinite_power(); static inline constexpr int sign_index(); + static inline constexpr int min_exponent_fast_path(); // used when fegetround() == FE_TONEAREST static inline constexpr int max_exponent_fast_path(); static inline constexpr int max_exponent_round_to_even(); static inline constexpr int min_exponent_round_to_even(); static inline constexpr uint64_t max_mantissa_fast_path(int64_t power); + static inline constexpr uint64_t max_mantissa_fast_path(); // used when fegetround() == FE_TONEAREST static inline constexpr int largest_power_of_ten(); static inline constexpr int smallest_power_of_ten(); static inline constexpr T exact_power_of_ten(int64_t power); @@ -372,6 +374,22 @@ template struct binary_form static inline constexpr equiv_uint hidden_bit_mask(); }; +template <> inline constexpr int binary_format::min_exponent_fast_path() { +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) + return 0; +#else + return -22; +#endif +} + +template <> inline constexpr int binary_format::min_exponent_fast_path() { +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0) + return 0; +#else + return -10; +#endif +} + template <> inline constexpr int binary_format::mantissa_explicit_bits() { return 52; } @@ -418,13 +436,18 @@ template <> inline constexpr int binary_ template <> inline constexpr int binary_format::max_exponent_fast_path() { return 10; } - +template <> inline constexpr uint64_t binary_format::max_mantissa_fast_path() { + return uint64_t(2) << mantissa_explicit_bits(); +} template <> inline constexpr uint64_t binary_format::max_mantissa_fast_path(int64_t power) { // caller is responsible to ensure that // power >= 0 && power <= 22 // return max_mantissa_double[power]; } +template <> inline constexpr uint64_t binary_format::max_mantissa_fast_path() { + return uint64_t(2) << mantissa_explicit_bits(); +} template <> inline constexpr uint64_t binary_format::max_mantissa_fast_path(int64_t power) { // caller is responsible to ensure that // power >= 0 && power <= 10 @@ -619,10 +642,6 @@ parsed_number_string parse_number_string uint64_t i = 0; // an unsigned int avoids signed overflows (which are bad) - while ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) { -i = i * 1 + parse_eight_digits_unrolled(p); // in rare cases, this will overflow, but that's ok -p += 8; - } while ((p != pend) && is_integer(*p)) { // a multi
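For illustration, the cheaper round-to-nearest test described above boils down to something along the lines of the following standalone sketch (paraphrased from the upstream idea, not one of the quoted hunks); build with -frounding-math so the compiler does not fold the arithmetic away:

#include <cfenv>
#include <cfloat>
#include <cstdio>

/* One add, one subtract, one compare: only under round-to-nearest do both
   (v + 1.0f) and (1.0f - v) round to exactly 1.0f for a tiny positive v.  */
static bool
rounds_to_nearest ()
{
  static volatile float tiny = FLT_MIN;  /* volatile blocks compile-time folding */
  float v = tiny;
  return (v + 1.0f) == (1.0f - v);
}

int
main ()
{
  const int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
  for (int m : modes)
    {
      std::fesetround (m);
      std::printf ("mode %d -> rounds_to_nearest = %d\n", m,
		   (int) rounds_to_nearest ());
    }
  std::fesetround (FE_TONEAREST);
  return 0;
}

Only FE_TONEAREST makes the check true, which is exactly the condition under which the new min_exponent_fast_path/max_mantissa_fast_path overloads in the merged header are meant to be used.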
[PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
The current handlings in rs6000_emit_vector_compare is a bit complicated to me, especially after we emit vector float comparison insn with the given code directly. So it's better to refactor the handlings of vector integer comparison here. This is part 4, it's to rework the handlings on GE/GEU/LE/LEU, also make the function not recursive any more. This patch doesn't introduce any functionality change. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the handlings for operators GE/GEU/LE/LEU. --- gcc/config/rs6000/rs6000.cc | 87 - 1 file changed, 17 insertions(+), 70 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index b4ca7b3d1b1..a3645e321a7 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15640,7 +15640,7 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn) } /* Emit vector compare for operands OP0 and OP1 using code RCODE. - DMODE is expected destination mode. This is a recursive function. */ + DMODE is expected destination mode. */ static rtx rs6000_emit_vector_compare (enum rtx_code rcode, @@ -15649,7 +15649,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode, { gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode)); gcc_assert (GET_MODE (op0) == GET_MODE (op1)); - rtx mask; + rtx mask = gen_reg_rtx (dmode); /* In vector.md, we support all kinds of vector float point comparison operators in a comparison rtl pattern, we can @@ -15658,7 +15658,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode, of raising invalid exception. */ if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT) { - mask = gen_reg_rtx (dmode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); return mask; } @@ -15667,11 +15666,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode, have direct hardware instructions, just emit it directly here. */ if (rcode == EQ || rcode == GT || rcode == GTU) -{ - mask = gen_reg_rtx (dmode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); - return mask; -} +emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); else if (rcode == LT || rcode == LTU) { /* lt{,u}(a,b) = gt{,u}(b,a) */ @@ -15679,76 +15674,28 @@ rs6000_emit_vector_compare (enum rtx_code rcode, std::swap (op0, op1); mask = gen_reg_rtx (dmode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); - return mask; } - else if (rcode == NE) + else if (rcode == NE || rcode == LE || rcode == LEU) { - /* ne(a,b) = ~eq(a,b) */ + /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b) */ + enum rtx_code code = reverse_condition (rcode); mask = gen_reg_rtx (dmode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1))); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); + enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode); + gcc_assert (nor_code != CODE_FOR_nothing); + emit_insn (GEN_FCN (nor_code) (mask, mask)); +} else { + /* ge{,u}(a,b) = ~gt{,u}(b,a) */ + gcc_assert (rcode == GE || rcode == GEU); + enum rtx_code code = rcode == GE ? 
GT : GTU; + mask = gen_reg_rtx (dmode); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode); gcc_assert (nor_code != CODE_FOR_nothing); emit_insn (GEN_FCN (nor_code) (mask, mask)); - return mask; -} - - switch (rcode) -{ -case GE: -case GEU: -case LE: -case LEU: - /* Try GT/GTU/LT/LTU OR EQ */ - { - rtx c_rtx, eq_rtx; - enum insn_code ior_code; - enum rtx_code new_code; - - switch (rcode) - { - case GE: - new_code = GT; - break; - - case GEU: - new_code = GTU; - break; - - case LE: - new_code = LT; - break; - - case LEU: - new_code = LTU; - break; - - default: - gcc_unreachable (); - } - - ior_code = optab_handler (ior_optab, dmode); - if (ior_code == CODE_FOR_nothing) - return NULL_RTX; - - c_rtx = rs6000_emit_vector_compare (new_code, op0, op1, dmode); - if (!c_rtx) - return NULL_RTX; - - eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode); - if (!eq_rtx) - return NULL_RTX; - - mask = gen_reg_rtx (dmode); - emit_insn (GEN_FCN (ior_code) (mask, c_rtx, eq_rtx)); - return mask; - } - break; -default: - return NULL_RTX; } - /* You only get two chances. */ - return NULL_RTX; + return mask; } /* Emit vector conditional expression. DEST is destination.
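The rewrite above leans entirely on the usual complement/swap identities for integer comparisons; a small standalone check of them on scalars (not part of the patch; the vector forms apply the same identities elementwise) might look like:

#include <cassert>
#include <cstdint>

int
main ()
{
  for (int a = -8; a <= 8; a++)
    for (int b = -8; b <= 8; b++)
      {
	/* lt(a,b) = gt(b,a); ne(a,b) = ~eq(a,b); le(a,b) = ~gt(a,b);
	   ge(a,b) = ~gt(b,a).  */
	assert ((a < b) == (b > a));
	assert ((a != b) == !(a == b));
	assert ((a <= b) == !(a > b));
	assert ((a >= b) == !(b > a));
	/* The unsigned variants ltu/leu/geu follow the same pattern.  */
	uint8_t ua = (uint8_t) a, ub = (uint8_t) b;
	assert ((ua < ub) == (ub > ua));
	assert ((ua <= ub) == !(ua > ub));
	assert ((ua >= ub) == !(ub > ua));
      }
  return 0;
}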
[PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
The current handlings in rs6000_emit_vector_compare is a bit complicated to me, especially after we emit vector float comparison insn with the given code directly. So it's better to refactor the handlings of vector integer comparison here. This is part 1, it's to remove the helper function rs6000_emit_vector_compare_inner and move the logics into rs6000_emit_vector_compare. This patch doesn't introduce any functionality change. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove. (rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/ GT/GTU directly. --- gcc/config/rs6000/rs6000.cc | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 94e039649f5..0a5826800c5 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15639,30 +15639,6 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn) return string; } -/* Return insn for VSX or Altivec comparisons. */ - -static rtx -rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1) -{ - machine_mode mode = GET_MODE (op0); - gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT); - - switch (code) -{ -default: - break; - -case EQ: -case GT: -case GTU: - rtx mask = gen_reg_rtx (mode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1))); - return mask; -} - - return NULL_RTX; -} - /* Emit vector compare for operands OP0 and OP1 using code RCODE. DMODE is expected destination mode. This is a recursive function. */ @@ -15687,10 +15663,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode, return mask; } - /* See if the comparison works as is. */ - mask = rs6000_emit_vector_compare_inner (rcode, op0, op1); - if (mask) -return mask; + /* For any of vector integer comparison operators for which we + have direct hardware instructions, just emit it directly + here. */ + if (rcode == EQ || rcode == GT || rcode == GTU) +{ + mask = gen_reg_rtx (dmode); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); + return mask; +} bool swap_operands = false; bool try_again = false; -- 2.27.0
[PATCH] libstdc++: Workaround buggy printf on Solaris in to_chars/float128_c++23.cc test [PR107815]
Hi! As mentioned in the PR, Solaris apparently can handle right printf ("%.0Lf\n", 1e+202L * __DBL_MAX__); which prints 511 chars long number, but can't handle printf ("%.0Lf\n", 1e+203L * __DBL_MAX__); nor printf ("%.0Lf\n", __LDBL_MAX__); properly, instead of printing 512 chars long number for the former and 4933 chars long number for the second, it handles them as if user asked for "%.0Le\n" in those cases. The following patch disables the single problematic value that fails in the test, and also fixes commented out debugging printouts. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-11-24 Jakub Jelinek PR libstdc++/107815 * testsuite/20_util/to_chars/float128_c++23.cc (test): Disable __FLT128_MAX__ test on Solaris. Fix up commented out debugging printouts. --- libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc.jj 2022-11-08 11:19:22.251768167 +0100 +++ libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc 2022-11-23 17:02:22.380051796 +0100 @@ -52,14 +52,17 @@ test(std::chars_format fmt = std::chars_ std::numbers::inv_sqrt3_v, std::numbers::egamma_v, std::numbers::phi_v, +// Solaris has non-conforming printf, see PR98384 and PR107815. +#if !(defined(__sun__) && defined(__svr4__)) std::numeric_limits::max() +#endif }; char str1[1], str2[1]; for (auto u : tests) { auto [ptr1, ec1] = std::to_chars(str1, str1 + sizeof(str1), u, fmt); VERIFY( ec1 == std::errc() ); -//std::cout << i << ' ' << std::string_view (str1, ptr1) << '\n'; +//std::cout << u << ' ' << std::string_view (str1, ptr1) << '\n'; if (fmt == std::chars_format::fixed) { auto [ptr2, ec2] = std::to_chars(str2, str2 + (ptr1 - str1), u, fmt); @@ -76,7 +79,7 @@ test(std::chars_format fmt = std::chars_ auto [ptr5, ec5] = std::to_chars(str1, str1 + sizeof(str1), u, fmt, 90); VERIFY( ec5 == std::errc() ); -//std::cout << i << ' ' << std::string_view (str1, ptr5) << '\n'; +//std::cout << u << ' ' << std::string_view (str1, ptr5) << '\n'; v = 4.0f128; auto [ptr6, ec6] = std::from_chars(str1, ptr5, v, fmt == std::chars_format{} Jakub
[PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
All kinds of vector float comparison operators have been supported in a rtl comparison pattern as vector.md, we can just emit an rtx comparison insn with the given comparison operator in function rs6000_emit_vector_compare instead of checking and handling the reverse condition cases. This is part 4, it further checks for comparison opeators LT/UNGE. In rs6000_emit_vector_compare, for the handling of LT, it switches to use code GT, swaps operands and try again, it's exactly the same as what we have in vector.md: ; lt(a,b) = gt(b,a) As to UNGE, in rs6000_emit_vector_compare, it uses reversed code LT and further operates on the result with one_cmpl, it's also the same as what's in vector.md: ; unge(a,b) = ~lt(a,b) This patch should not have any functionality change too. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Emit rtx comparison for operators LT/UNGE of MODE_VECTOR_FLOAT directly. (rs6000_emit_vector_compare): Move assertion of no MODE_VECTOR_FLOAT to function beginning. --- gcc/config/rs6000/rs6000.cc | 24 1 file changed, 4 insertions(+), 20 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 98754805bd2..94e039649f5 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15645,6 +15645,7 @@ static rtx rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1) { machine_mode mode = GET_MODE (op0); + gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT); switch (code) { @@ -15654,7 +15655,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1) case EQ: case GT: case GTU: - gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT); rtx mask = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1))); return mask; @@ -15679,18 +15679,8 @@ rs6000_emit_vector_compare (enum rtx_code rcode, comparison operators in a comparison rtl pattern, we can just emit the comparison rtx insn directly here. Besides, we should have a centralized place to handle the possibility - of raising invalid exception. For EQ/GT/GE/UNORDERED/ - ORDERED/LTGT/UNEQ, they are handled equivalently as before; - for NE/UNLE/UNLT, they are handled with reversed code - and inverting, it's the same as before; for LE/UNGT, they - are handled with LE ior EQ previously, emitting directly - here will make use of GE later, it's slightly better; - - FIXME: Handle the remaining vector float comparison operators - here. */ - if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT - && rcode != LT - && rcode != UNGE) + of raising invalid exception. */ + if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT) { mask = gen_reg_rtx (dmode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); @@ -15718,23 +15708,17 @@ rs6000_emit_vector_compare (enum rtx_code rcode, try_again = true; break; case NE: -case UNGE: /* Invert condition and try again. e.g., A != B becomes ~(A==B). */ { - enum rtx_code rev_code; enum insn_code nor_code; rtx mask2; - rev_code = reverse_condition_maybe_unordered (rcode); - if (rev_code == UNKNOWN) - return NULL_RTX; - nor_code = optab_handler (one_cmpl_optab, dmode); if (nor_code == CODE_FOR_nothing) return NULL_RTX; - mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode); + mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode); if (!mask2) return NULL_RTX; -- 2.27.0
[PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
All kinds of vector float comparison operators have been supported in a rtl comparison pattern as vector.md, we can just emit an rtx comparison insn with the given comparison operator in function rs6000_emit_vector_compare instead of checking and handling the reverse condition cases. This is part 2, it further checks for comparison opeators NE/UNLE/UNLT. In rs6000_emit_vector_compare, they are handled with reversed code which is queried from function reverse_condition_maybe_unordered and inverting with one_cmpl_optab. It's the same as what we have in vector.md: ; ne(a,b) = ~eq(a,b) ; unle(a,b) = ~gt(a,b) ; unlt(a,b) = ~ge(a,b) The operators on the right side have been supported in part 1. This patch should not have any functionality change too. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx comparison for operators NE/UNLE/UNLT of MODE_VECTOR_FLOAT directly. --- gcc/config/rs6000/rs6000.cc | 20 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 5a8f7ff3bf8..09299bef6a2 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15679,20 +15679,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode, comparison operators in a comparison rtl pattern, we can just emit the comparison rtx insn directly here. Besides, we should have a centralized place to handle the possibility - of raising invalid exception. As the first step, only check - operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they - are handled equivalently as before. + of raising invalid exception. For EQ/GT/GE/UNORDERED/ + ORDERED/LTGT/UNEQ, they are handled equivalently as before; + for NE/UNLE/UNLT, they are handled with reversed code + and inverting, it's the same as before. FIXME: Handle the remaining vector float comparison operators here. */ if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT - && (rcode == EQ - || rcode == GT - || rcode == GE - || rcode == UNORDERED - || rcode == ORDERED - || rcode == LTGT - || rcode == UNEQ)) + && rcode != LT + && rcode != LE + && rcode != UNGE + && rcode != UNGT) { mask = gen_reg_rtx (dmode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); @@ -15720,8 +15718,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode, try_again = true; break; case NE: -case UNLE: -case UNLT: case UNGE: case UNGT: /* Invert condition and try again. -- 2.27.0
[PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
The current handlings in rs6000_emit_vector_compare is a bit complicated to me, especially after we emit vector float comparison insn with the given code directly. So it's better to refactor the handlings of vector integer comparison here. This is part 3, it's to refactor the handlings on NE. This patch doesn't introduce any functionality change. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the handlings for operator NE. --- gcc/config/rs6000/rs6000.cc | 30 ++ 1 file changed, 10 insertions(+), 20 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index c1aebbb5c03..b4ca7b3d1b1 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15681,29 +15681,19 @@ rs6000_emit_vector_compare (enum rtx_code rcode, emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); return mask; } + else if (rcode == NE) +{ + /* ne(a,b) = ~eq(a,b) */ + mask = gen_reg_rtx (dmode); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1))); + enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode); + gcc_assert (nor_code != CODE_FOR_nothing); + emit_insn (GEN_FCN (nor_code) (mask, mask)); + return mask; +} switch (rcode) { -case NE: - /* Invert condition and try again. -e.g., A != B becomes ~(A==B). */ - { - enum insn_code nor_code; - rtx mask2; - - nor_code = optab_handler (one_cmpl_optab, dmode); - if (nor_code == CODE_FOR_nothing) - return NULL_RTX; - - mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode); - if (!mask2) - return NULL_RTX; - - mask = gen_reg_rtx (dmode); - emit_insn (GEN_FCN (nor_code) (mask, mask2)); - return mask; - } - break; case GE: case GEU: case LE: -- 2.27.0
[PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
All kinds of vector float comparison operators have been supported in a rtl comparison pattern as vector.md, we can just emit an rtx comparison insn with the given comparison operator in function rs6000_emit_vector_compare instead of checking and handling the reverse condition cases. This is part 1, it only handles the operators which are already emitted with an rtx comparison previously in function rs6000_emit_vector_compare_inner, they are EQ/GT/GE/ORDERED/ UNORDERED/UNEQ/LTGT. There is no functionality change. With this change, rs6000_emit_vector_compare_inner would only work for vector integer comparison handling, it would be cleaned up later in vector integer comparison rework. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move MODE_VECTOR_FLOAT handlings out. (rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/GT/ GE/UNORDERED/ORDERED/UNEQ/LTGT of MODE_VECTOR_FLOAT directly, and adjust one call site of rs6000_emit_vector_compare_inner to rs6000_emit_vector_compare. --- gcc/config/rs6000/rs6000.cc | 47 - 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index d2743f7bce6..5a8f7ff3bf8 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15644,7 +15644,6 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn) static rtx rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1) { - rtx mask; machine_mode mode = GET_MODE (op0); switch (code) @@ -15652,19 +15651,11 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1) default: break; -case GE: - if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) - return NULL_RTX; - /* FALLTHRU */ - case EQ: case GT: case GTU: -case ORDERED: -case UNORDERED: -case UNEQ: -case LTGT: - mask = gen_reg_rtx (mode); + gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT); + rtx mask = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1))); return mask; } @@ -15680,18 +15671,42 @@ rs6000_emit_vector_compare (enum rtx_code rcode, rtx op0, rtx op1, machine_mode dmode) { - rtx mask; - bool swap_operands = false; - bool try_again = false; - gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode)); gcc_assert (GET_MODE (op0) == GET_MODE (op1)); + rtx mask; + + /* In vector.md, we support all kinds of vector float point + comparison operators in a comparison rtl pattern, we can + just emit the comparison rtx insn directly here. Besides, + we should have a centralized place to handle the possibility + of raising invalid exception. As the first step, only check + operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they + are handled equivalently as before. + + FIXME: Handle the remaining vector float comparison operators + here. */ + if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT + && (rcode == EQ + || rcode == GT + || rcode == GE + || rcode == UNORDERED + || rcode == ORDERED + || rcode == LTGT + || rcode == UNEQ)) +{ + mask = gen_reg_rtx (dmode); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); + return mask; +} /* See if the comparison works as is. 
*/ mask = rs6000_emit_vector_compare_inner (rcode, op0, op1); if (mask) return mask; + bool swap_operands = false; + bool try_again = false; + switch (rcode) { case LT: @@ -15791,7 +15806,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode, if (swap_operands) std::swap (op0, op1); - mask = rs6000_emit_vector_compare_inner (rcode, op0, op1); + mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode); if (mask) return mask; } -- 2.27.0
[PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
The current handlings in rs6000_emit_vector_compare is a bit complicated to me, especially after we emit vector float comparison insn with the given code directly. So it's better to refactor the handlings of vector integer comparison here. This is part 5, it's to refactor all the handlings of vector integer comparison to make it neat. This patch doesn't introduce any functionality change. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the handlings of vector integer comparison. --- gcc/config/rs6000/rs6000.cc | 68 - 1 file changed, 44 insertions(+), 24 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index a3645e321a7..b694080840a 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15662,34 +15662,54 @@ rs6000_emit_vector_compare (enum rtx_code rcode, return mask; } - /* For any of vector integer comparison operators for which we - have direct hardware instructions, just emit it directly - here. */ - if (rcode == EQ || rcode == GT || rcode == GTU) -emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); - else if (rcode == LT || rcode == LTU) + bool swap_operands = false; + bool need_invert = false; + enum rtx_code code = rcode; + + switch (rcode) { +case EQ: +case GT: +case GTU: + /* Emit directly with native hardware insn. */ + break; +case LT: +case LTU: /* lt{,u}(a,b) = gt{,u}(b,a) */ - enum rtx_code code = swap_condition (rcode); - std::swap (op0, op1); - mask = gen_reg_rtx (dmode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); + code = swap_condition (rcode); + swap_operands = true; + break; +case NE: +case LE: +case LEU: + /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b) */ + code = reverse_condition (rcode); + need_invert = true; + break; +case GE: + /* ge(a,b) = ~gt(b,a) */ + code = GT; + swap_operands = true; + need_invert = true; + break; +case GEU: + /* geu(a,b) = ~gtu(b,a) */ + code = GTU; + swap_operands = true; + need_invert = true; + break; +default: + gcc_unreachable (); + break; } - else if (rcode == NE || rcode == LE || rcode == LEU) + + if (swap_operands) +std::swap (op0, op1); + + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); + + if (need_invert) { - /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b) */ - enum rtx_code code = reverse_condition (rcode); - mask = gen_reg_rtx (dmode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); - enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode); - gcc_assert (nor_code != CODE_FOR_nothing); - emit_insn (GEN_FCN (nor_code) (mask, mask)); -} else { - /* ge{,u}(a,b) = ~gt{,u}(b,a) */ - gcc_assert (rcode == GE || rcode == GEU); - enum rtx_code code = rcode == GE ? GT : GTU; - mask = gen_reg_rtx (dmode); - emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode); gcc_assert (nor_code != CODE_FOR_nothing); emit_insn (GEN_FCN (nor_code) (mask, mask)); -- 2.27.0
[PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
The current handlings in rs6000_emit_vector_compare is a bit complicated to me, especially after we emit vector float comparison insn with the given code directly. So it's better to refactor the handlings of vector integer comparison here. This is part 2, it's to refactor the handlings on LT and LTU. This patch doesn't introduce any functionality change. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the handlings for operators LT and LTU. --- gcc/config/rs6000/rs6000.cc | 32 +--- 1 file changed, 9 insertions(+), 23 deletions(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 0a5826800c5..c1aebbb5c03 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15672,22 +15672,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode, emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); return mask; } - - bool swap_operands = false; - bool try_again = false; + else if (rcode == LT || rcode == LTU) +{ + /* lt{,u}(a,b) = gt{,u}(b,a) */ + enum rtx_code code = swap_condition (rcode); + std::swap (op0, op1); + mask = gen_reg_rtx (dmode); + emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1))); + return mask; +} switch (rcode) { -case LT: - rcode = GT; - swap_operands = true; - try_again = true; - break; -case LTU: - rcode = GTU; - swap_operands = true; - try_again = true; - break; case NE: /* Invert condition and try again. e.g., A != B becomes ~(A==B). */ @@ -15761,16 +15757,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode, return NULL_RTX; } - if (try_again) -{ - if (swap_operands) - std::swap (op0, op1); - - mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode); - if (mask) - return mask; -} - /* You only get two chances. */ return NULL_RTX; } -- 2.27.0
[PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
All kinds of vector float comparison operators have been supported in a rtl comparison pattern as vector.md, we can just emit an rtx comparison insn with the given comparison operator in function rs6000_emit_vector_compare instead of checking and handling the reverse condition cases. This is part 3, it further checks for comparison opeators LE/UNGT. In rs6000_emit_vector_compare, UNGT is handled with reversed code LE and inverting with one_cmpl_optab, LE is handled with LT ior EQ, while in vector.md, we have the support: ; le(a,b) = ge(b,a) ; ungt(a,b) = ~le(a,b) The associated test case shows it's an improvement. gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx comparison for operators LE/UNGT of MODE_VECTOR_FLOAT directly. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vcond-fp.c: New test. --- gcc/config/rs6000/rs6000.cc | 9 gcc/testsuite/gcc.target/powerpc/vcond-fp.c | 25 + 2 files changed, 29 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 09299bef6a2..98754805bd2 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -15682,15 +15682,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode, of raising invalid exception. For EQ/GT/GE/UNORDERED/ ORDERED/LTGT/UNEQ, they are handled equivalently as before; for NE/UNLE/UNLT, they are handled with reversed code - and inverting, it's the same as before. + and inverting, it's the same as before; for LE/UNGT, they + are handled with LE ior EQ previously, emitting directly + here will make use of GE later, it's slightly better; FIXME: Handle the remaining vector float comparison operators here. */ if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT && rcode != LT - && rcode != LE - && rcode != UNGE - && rcode != UNGT) + && rcode != UNGE) { mask = gen_reg_rtx (dmode); emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1))); @@ -15719,7 +15719,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode, break; case NE: case UNGE: -case UNGT: /* Invert condition and try again. e.g., A != B becomes ~(A==B). */ { diff --git a/gcc/testsuite/gcc.target/powerpc/vcond-fp.c b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c new file mode 100644 index 000..b71861d3588 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mpower8-vector" } */ + +/* Test we use xvcmpge[sd]p rather than xvcmpeq[sd]p and xvcmpgt[sd]p + for UNGT and LE handlings. */ + +#define UNGT(a, b) (!__builtin_islessequal ((a), (b))) +#define LE(a, b) (((a) <= (b))) + +#define TEST_VECT(NAME, TYPE) \ + __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y, \ +int *res, int n) \ + { \ +for (int i = 0; i < n; i++) \ + res[i] = NAME (x[i], y[i]); \ + } + +#define TEST(TYPE) \ + TEST_VECT (UNGT, TYPE) \ + TEST_VECT (LE, TYPE) + +TEST (float) +TEST (double) + +/* { dg-final { scan-assembler-not {\mxvcmp(gt|eq)[sd]p\M} } } */ -- 2.27.0
[PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare
Hi, Following Segher's suggestion, this patch series reworks function rs6000_emit_vector_compare for vector float and int in multiple steps; it's based on the previous attempts [1][2]. As mentioned in [1], the need to rework this for float is to have one centralized place for vector float comparison handling instead of supporting it dispersedly with swapping ops, reversing codes etc. It's also groundwork for a subsequent patch to handle comparison operators with or without trapping math (PR105480). With the handling of vector float reworked, the handling of vector int can be simplified further as shown. For Segher's concern about whether this rework causes any assembly change, I constructed two testcases for vector float [3] and int [4] respectively before; they showed that most sequences are unchanged, except for the difference on LE and UNGT, which is demonstrated to be an improvement since it uses GE instead of GT ior EQ. The associated test case in patch 3/9 is a good example. Besides, with and without the whole patch series, I built the whole SPEC2017 at options -O3 and -Ofast separately and checked the differences in the object assembly. The result showed that most objects are unchanged, except for: * at -O3, 521.wrf_r has 9 object files and 526.blender_r has 9 object files with differences. * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has one and 527.cam4_r has 4 object files with differences. Looking into these differences, all significant ones are caused by the known improvement mentioned above transforming GT ior EQ to GE, which can also affect unrolling decisions due to insn count. Some other trivial differences are branch target offset differences, nop differences for alignment, vsx register number differences etc. I also evaluated the runtime performance for these changed benchmarks; the result is neutral. These patches are bootstrapped and regress-tested incrementally on powerpc64-linux-gnu P7 & P8, and powerpc64le-linux-gnu P9 & P10. Is it ok for trunk? BR, Kewen - [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html Kewen Lin (9): rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1 rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2 rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3 rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4 rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1 rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2 rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3 rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4 rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5 gcc/config/rs6000/rs6000.cc | 180 ++-- gcc/testsuite/gcc.target/powerpc/vcond-fp.c | 25 +++ 2 files changed, 74 insertions(+), 131 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c -- 2.27.0
[PATCH] c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558]
Hi! The following testcase ICEs, because OpenMP lowering for shared clause on l variable with REFERENCE_TYPE creates POINTER_TYPE to REFERENCE_TYPE. The reason is that the automatic variable has non-trivial construction (reference to a lambda) and -fmerge-all-constants is on and so TREE_READONLY isn't set - omp-low will handle automatic TREE_READONLY vars in shared specially and only copy to the construct and not back, while !TREE_READONLY are assumed to be changeable. The PR91529 change rationale was that the gimplification can change some non-addressable automatic variables to TREE_STATIC with -fmerge-all-constants and therefore TREE_READONLY on them is undesirable. But, the gimplifier does that only for aggregate variables: switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: case ARRAY_TYPE: and not for anything else. So, I think clearing TREE_READONLY for automatic integral or reference or pointer etc. vars for -fmerge-all-constants only is unnecessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-11-24 Jakub Jelinek PR c++/107558 * decl.cc (cp_finish_decl): Don't clear TREE_READONLY on automatic non-aggregate variables just because of -fmerge-all-constants. * g++.dg/gomp/pr107558.C: New test. --- gcc/cp/decl.cc.jj 2022-11-19 09:21:14.662439877 +0100 +++ gcc/cp/decl.cc 2022-11-23 13:12:31.866553152 +0100 @@ -8679,8 +8679,10 @@ cp_finish_decl (tree decl, tree init, bo if (var_definition_p /* With -fmerge-all-constants, gimplify_init_constructor -might add TREE_STATIC to the variable. */ - && (TREE_STATIC (decl) || flag_merge_constants >= 2)) +might add TREE_STATIC to aggregate variables. */ + && (TREE_STATIC (decl) + || (flag_merge_constants >= 2 + && AGGREGATE_TYPE_P (type { /* If a TREE_READONLY variable needs initialization at runtime, it is no longer readonly and we need to --- gcc/testsuite/g++.dg/gomp/pr107558.C.jj 2022-11-23 13:13:27.260736525 +0100 +++ gcc/testsuite/g++.dg/gomp/pr107558.C2022-11-23 13:15:22.271041005 +0100 @@ -0,0 +1,14 @@ +// PR c++/107558 +// { dg-do compile { target c++11 } } +// { dg-additional-options "-fmerge-all-constants" } +// { dg-additional-options "-flto" { target lto } } + +int a = 15; + +void +foo () +{ + auto &&l = [&]() { return a; }; +#pragma omp target parallel + l (); +} Jakub
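For context, the aggregate-only promotion the patch relies on can be seen on a small hypothetical example (not the PR testcase): under -fmerge-all-constants the gimplifier may turn the const array below into a static constant, while the scalar and reference locals are not promoted by that path, so keeping TREE_READONLY on them is safe:

int g;

static int
use (const int *p, int n)
{
  return p[n & 3];
}

int
f (int n)
{
  const int table[4] = { 1, 2, 3, 5 };	/* aggregate: may become TREE_STATIC */
  const int k = n + 1;			/* scalar: stays automatic */
  const int &r = g;			/* reference: likewise stays automatic */
  return use (table, n) + k + r;
}

int
main ()
{
  return f (1) != 4;
}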
[r13-4272 Regression] FAIL: gcc.dg/guality/loop-1.c -O3 -g -DPREVENT_OPTIMIZATION line 20 i == 1 on Linux/x86_64
On Linux/x86_64, 8caf155a3d6e23e47bf55068ad23c23d4655a054 is the first bad commit commit 8caf155a3d6e23e47bf55068ad23c23d4655a054 Author: Hongyu Wang Date: Sat Nov 19 09:38:00 2022 +0800 i386: Only enable small loop unrolling in backend [PR 107692] caused FAIL: gcc.dg/guality/loop-1.c -O2 -DPREVENT_OPTIMIZATION line 20 i == 1 FAIL: gcc.dg/guality/loop-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION line 20 i == 1 FAIL: gcc.dg/guality/loop-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 20 i == 1 FAIL: gcc.dg/guality/loop-1.c -O3 -g -DPREVENT_OPTIMIZATION line 20 i == 1 with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-4272/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/loop-1.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/loop-1.c --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at haochen dot jiang at intel.com)
Re: [PATCH v2] [x86] Fix incorrect _mm_cvtsbh_ss.
On Thu, Nov 24, 2022 at 09:22:00AM +0800, liuhongt via Gcc-patches wrote: > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -130,6 +130,7 @@ (define_c_enum "unspec" [ >;; For AVX/AVX512F support >UNSPEC_SCALEF >UNSPEC_PCMP > + UNSPEC_CVTBFSF > >;; Generic math support >UNSPEC_IEEE_MIN; not commutative > @@ -4961,6 +4962,31 @@ (define_insn "*extendhf2" > (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > +(define_expand "extendbfsf2" > + [(set (match_operand:SF 0 "register_operand") > + (unspec:SF > + [(match_operand:BF 1 "register_operand")] > + UNSPEC_CVTBFSF))] > + "TARGET_SSE2 && !HONOR_NANS (BFmode) && !flag_signaling_nans") I think if !HONOR_NANS (BFmode), then flag_signaling_nans doesn't matter, the former says that no NaNs may appear in a valid program, so just testing !HONOR_NANS (BFmode) should be enough. What I'm not sure about, my memory is weak, is whether one can safely use the fast math related tests in define_expand conditions. I vaguely remember init_all_optabs remembers the conditions, for changes say in the ISA options optabs are reinited, but not sure if that happens for optimization option changes like the fast math related options are. So it would be perhaps safer to use just TARGET_SSE2 as the expand condition and in the C code body do if (HONOR_NANS (BFmode) FAIL; (similarly for truncsfbf2). On the other side brief look at x86 insn-flags.h shows several fast math related checks in HAVE_* macros. PR92791 I found related to this was actually about optimize_function_for_{size,speed}_p (cfun) so maybe fast math related stuff is fine, just not the optimization for speed or size. Jakub
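As background for why the expander is gated on NaN handling at all: bfloat16 is the upper half of an IEEE binary32, so widening it to float amounts to a 16-bit shift of the bit pattern, which neither quiets signaling NaNs nor raises any exception. A standalone sketch of that conversion (not the i386.md pattern itself):

#include <cstdint>
#include <cstring>
#include <cstdio>

static float
bf16_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;	/* bf16 occupies the high half */
  float f;
  std::memcpy (&f, &wide, sizeof f);
  return f;
}

int
main ()
{
  /* 0x3f80 is 1.0 and 0x4049 is about 3.140625 in bfloat16.  */
  std::printf ("%f %f\n", bf16_to_float (0x3f80), bf16_to_float (0x4049));
  return 0;
}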
Re: [PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.
>> The motivation of this patch is to correct the wrong estimation of
>> the number of instructions needed for loading a 64bit constant in
>> rv32 in the current cost model (riscv_integer_cost). According to
>> the current implementation, if a constant requires more than 3
>> instructions (riscv_const_insn and riscv_legitimate_constant_p),
>> then the constant will be put into the constant pool when expanding
>> gimple to rtl (legitimate_constant_p hook and emit_move_insn).
>> So the inaccurate cost model leads to suboptimal codegen in rv32,
>> and the wrong estimation can be corrected through this fix.
>>
>> e.g. the current codegen for loading 0x839290001 in rv32
>>
>>         lui     a5,%hi(.LC0)
>>         lw      a0,%lo(.LC0)(a5)
>>         lw      a1,%lo(.LC0+4)(a5)
>> .LC0:
>>         .word   958988289
>>         .word   8
>>
>> output after this patch
>>
>>         li      a0,958988288
>>         addi    a0,a0,1
>>         li      a1,8
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_build_integer): Handle the case of
>> loading a 64bit constant in rv32.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rv32-load-64bit-constant.c: New test.
>>
>> Signed-off-by: Lin Sinan
>> ---
>> gcc/config/riscv/riscv.cc                     | 23 +++
>> .../riscv/rv32-load-64bit-constant.c          | 38 +++
>> 2 files changed, 61 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 32f9ef9ade9..9dffabdc5e3 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
>>      }
>>    }
>>
>> +  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
>
> Nit.  It's common practice to have the TARGET test first in a series of
> tests.  It may also be advisable to break this into two lines.
> Something like this:
>
>   if ((!TARGET_64BIT)
>       || value > INT32_MAX || value < INT32_MIN)
>
> That's the style most GCC folks are more accustomed to reading.

Thanks for the tips and I will change it then.

>> +    {
>> +      unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> +      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> +      struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
>> +        hicode[RISCV_MAX_INTEGER_OPS];
>> +      int hi_cost, lo_cost;
>> +
>> +      hi_cost = riscv_build_integer_1 (hicode, hival, mode);
>> +      if (hi_cost < cost)
>> +        {
>> +          lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
>> +          if (lo_cost + hi_cost < cost)
>
> Just so I'm sure.  "cost" here refers strictly to other synthesized
> forms?  If so, then ISTM that we'd want to generate the new style when
> lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading
> the constant from memory -- which is almost certainly more than "3"
> since the sequence from memory will be at least 3 instructions, two of
> which will hit memory.
>
> Jeff

Yes, almost right. The basic idea of this patch is to improve the cost
calculation for loading a 64bit constant in rv32, instead of adding a new
way to load constants.

gcc now loads 0x739290001LL in rv32gc with three instructions,

        li      a0,958988288
        addi    a0,a0,1
        li      a1,7

However, when it loads 0x839290001LL, the output assembly becomes

        lui     a5,%hi(.LC0)
        lw      a0,%lo(.LC0)(a5)
        lw      a1,%lo(.LC0+4)(a5)
.LC0:
        .word   958988289
        .word   8

The cost calculation is inaccurate in such cases, since loading these two
constants should be no different in rv32 (just change `li a1,7` to
`li a1,8` to load the hi part). This patch takes these cases into
consideration.
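To make the lo/hi split concrete, here is a small standalone C sketch (not
part of the patch; sext_hwi is modeled with plain casts) showing how
0x839290001 decomposes:

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  int64_t value = 0x839290001LL;
  /* Sign-extended low 32 bits -- the loval passed to riscv_build_integer_1.  */
  int64_t loval = (int32_t) value;                     /* 958988289 */
  /* Remaining high part after subtracting loval.  */
  int64_t hival = (int32_t) ((value - loval) >> 32);   /* 8 */
  printf ("lo = %lld, hi = %lld\n", (long long) loval, (long long) hival);
  return 0;
}

The low half costs two instructions (the li plus addi in the asm above) and
the high half one li, i.e. the same three instructions as the 0x739290001LL
case.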
BR, Sinan