Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization
On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 17:11, Richard Biener wrote:
> >
> > On Fri, 18 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford wrote:
> > > > >
> > > > > Richard Biener writes:
> > > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener wrote:
> > > > > >> > It doesn't seem to make a difference for x86.  That said, the
> > > > > >> > "fix" is probably sticking the correct target on the
> > > > > >> > dump-check, it seems that vect_fold_extract_last is no longer
> > > > > >> > correct here.
> > > > > >> Um sorry, I did go through various checks in target-supports.exp,
> > > > > >> but I am not sure which one will be appropriate for this case,
> > > > > >> and am stuck here :/ Could you please suggest how to proceed?
> > > > > >
> > > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > > implemented the direct conversion support.  I suggest to implement
> > > > > > such dg-checks if they are not present (I can't find them),
> > > > > > possibly quite specific to the modes involved (like we have
> > > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > > just _float).
> > > > >
> > > > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > > > most (all?) of the tests were AArch64-specific.
> > > > Hi,
> > > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > > it has support for direct conversion between vectors while x86
> > > > doesn't.  IIUC this is because supportable_convert_operation returns
> > > > true for V4HI -> V4SI on AArch64 since it can use extend_v4hiv4si2
> > > > for doing the conversion?
> > > >
> > > > In the attached patch, I added a new target check vect_extend which
> > > > (currently) returns 1 only for aarch64*-*-*, which makes the test
> > > > PASS on both the targets, although I am not sure if this is entirely
> > > > correct.
> > > > Does the patch look OK?
> > >
> > > Can you make vect_extend more specific, say vect_extend_hi_si or
> > > what is specifically needed here?  Note I'll have to investigate
> > > why x86 cannot vectorize here since in fact it does have
> > > the extend operation ... it might be also worth splitting the
> > > sign/zero extend case, so - vect_sign_extend_hi_si or
> > > vect_extend_short_int?
> >
> > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > why we get this vectorized with NEON which is because
> >
> > static opt_machine_mode
> > aarch64_vectorize_related_mode (machine_mode vector_mode,
> >                                 scalar_mode element_mode,
> >                                 poly_uint64 nunits)
> > {
> > ...
> >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> >   if (TARGET_SIMD
> >       && (vec_flags & VEC_ADVSIMD)
> >       && known_eq (nunits, 0U)
> >       && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> >       && maybe_ge (GET_MODE_BITSIZE (element_mode)
> >                    * GET_MODE_NUNITS (vector_mode), 128U))
> >     {
> >       machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> >       if (VECTOR_MODE_P (res))
> >         return res;
> >
> > which makes us get a V4SImode vector for a V4HImode loop vector_mode.
> Thanks for the explanation!
> >
> > So I think the appropriate effective dejagnu target is
> > aarch64-*-* (there's none specifically to advsimd, not sure if one
> > can disable that?)
> The attached patch uses aarch64*-*-* target check, and additionally
> for SVE (and other targets supporting vect_fold_extract_last) it checks
> if the condition reduction was carried out using FOLD_EXTRACT_LAST.
> Does that look OK?

Works for me.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
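
As a reading aid for the condition quoted in this thread, here is a minimal standalone sketch in plain C. It is not GCC code: `prefers_128bit_container` and its integer parameters are hypothetical stand-ins for the mode queries (`GET_MODE_BITSIZE`, `GET_MODE_NUNITS`), and it ignores `TARGET_SIMD`, `vec_flags` and poly_int arithmetic.

```c
#include <stdbool.h>

/* Hypothetical model of the quoted check: with no explicit unit count
   requested (nunits == 0) and a 64-bit loop vector mode, prefer one
   128-bit vector when the requested element size times the lane count
   of the loop vector reaches 128 bits.  */
static bool
prefers_128bit_container (unsigned vector_bits, unsigned vector_nunits,
                          unsigned element_bits, unsigned nunits)
{
  return nunits == 0
         && vector_bits == 64
         && element_bits * vector_nunits >= 128;
}
```

Under these assumptions, a V4HI loop vector (64 bits, 4 lanes) asked for SI elements gives 32 * 4 = 128, so the 128-bit container (V4SI) is chosen; asked for HI elements it gives 64 and the preference does not apply.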
Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm
On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 14:52, Richard Biener wrote:
> >
> > On Fri, 18 Aug 2023, Richard Sandiford wrote:
> >
> > > Richard Biener writes:
> > > > The following avoids running into somehow flawed logic in fold_vec_perm
> > > > for non-VLA vectors.
> > > >
> > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > Richard.
> > > >
> > > >     PR tree-optimization/111048
> > > >     * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > > >     vectors first.
> > > >
> > > >     * gcc.dg/torture/pr111048.c: New testcase.
> > >
> > > Please don't do this as a permanent thing.  It was a deliberate choice
> > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > > logic gets more coverage.  Like you say, if something is wrong for VLS
> > > then the chances are that it's also wrong for VLA.
> >
> > Sure, feel free to undo this change together with the fix for the
> > VLA case.
> Hi,
> The attached patch reverts the workaround, and fixes the issue.
> Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
> x86_64-linux-gnu.
> OK to commit?

OK.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > > ---
> > > >  gcc/fold-const.cc                       | 12 ++--
> > > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24
> > > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 5c51c9d91be..144fd7481b3 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >    unsigned res_npatterns, res_nelts_per_pattern;
> > > >    unsigned HOST_WIDE_INT res_nelts;
> > > >
> > > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > +    {
> > > > +      res_npatterns = res_nelts;
> > > > +      res_nelts_per_pattern = 1;
> > > > +    }
> > > >    /* (1) If SEL is a suitable mask as determined by
> > > >       valid_mask_for_fold_vec_perm_cst_p, then:
> > > >       res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >       res_npatterns = nelts in result vector.
> > > >       res_nelts_per_pattern = 1.
> > > >       This exception is made so that VLS ARG0, ARG1 and SEL work as
> > > >       before.  */
> > > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > >      {
> > > >        res_npatterns
> > > >          = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > > >
> > > >        res_nelts = res_npatterns * res_nelts_per_pattern;
> > > >      }
> > > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > -    {
> > > > -      res_npatterns = res_nelts;
> > > > -      res_nelts_per_pattern = 1;
> > > > -    }
> > > >    else
> > > >      return NULL_TREE;
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > new file mode 100644
> > > > index 000..475978aae2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > > +
> > > > +typedef unsigned char u8;
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void check(const u8 * v) {
> > > > +    if (*v != 15) __builtin_trap();
> > > > +}
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void bug(void) {
> > > > +    u8 in_lanes[32];
> > > > +    for (unsigned i = 0; i < 32; i += 2) {
> > > > +      in_lanes[i + 0] = 0;
> > > > +      in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > > +    }
> > > > +
> > > > +    check(&in_lanes[13]);
> > > > +  }
> > > > +
> > > > +int main() {
> > > > +    bug();
> > > > +}

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
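
For readers unfamiliar with the `res_npatterns` / `res_nelts_per_pattern` pair discussed in this thread, the following is a rough standalone sketch in plain C of how a pattern-encoded constant vector can be expanded, based on my understanding of GCC's VECTOR_CST encoding. It is not GCC code: `decode_elt` and the `long` element type are hypothetical simplifications.

```c
#include <stddef.h>

/* Hypothetical decoder for a pattern-encoded constant vector.
   ENCODED holds npatterns * nelts_per_pattern values, interleaved
   by pattern.  Returns element I of the full vector.  */
static long
decode_elt (const long *encoded, size_t npatterns,
            size_t nelts_per_pattern, size_t i)
{
  size_t pattern = i % npatterns;
  size_t count = i / npatterns;   /* position within the pattern */

  if (count < nelts_per_pattern)
    return encoded[count * npatterns + pattern];

  /* Past the encoded elements: repeat the last one, or (with three
     elements per pattern) continue the series with a constant step.  */
  long last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
  if (nelts_per_pattern < 3)
    return last;
  long prev = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
  return last + (long) (count - (nelts_per_pattern - 1)) * (last - prev);
}
```

With `npatterns` equal to the number of elements and `nelts_per_pattern` equal to 1, which is what the fixed-length (VLS) branch in the patch sets, every element is stored explicitly and the decoder is a plain array lookup.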
[PING^4] [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.
Ping!

Forwarded Message
Subject: [PING^3] [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.
Date: Tue, 1 Aug 2023 13:48:58 +0530
From: Ajit Agarwal
To: gcc-patches, Jeff Law, Richard Biener, Peter Bergner, Segher Boessenkool, rashmi.srid...@ibm.com

Ping!

Forwarded Message
Subject: [PING^2] [PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.
Date: Tue, 18 Jul 2023 13:28:08 +0530
From: Ajit Agarwal
To: gcc-patches
CC: Jeff Law, Richard Biener, Segher Boessenkool, Peter Bergner

Ping^2.

Please review.

Thanks & Regards
Ajit

This new version of patch 4 improves the ree pass for the rs6000 target
using defined ABI interfaces.  Bootstrapped and regtested on
powerpc64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit

Improve ree pass for rs6000 target using defined ABI interfaces

For the rs6000 target we see redundant zero and sign extensions.  This
patch improves the ree pass to eliminate such redundant zero and sign
extensions using defined ABI interfaces.

2023-06-01  Ajit Kumar Agarwal

gcc/ChangeLog:

    * ree.cc (combine_reaching_defs): Use of zero_extend and sign_extend
    defined abi interfaces.
    (add_removable_extension): Use of defined abi interfaces for no
    reaching defs.
    (abi_extension_candidate_return_reg_p): New function.
    (abi_extension_candidate_p): New function.
    (abi_extension_candidate_argno_p): New function.
    (abi_handle_regs_without_defs_p): New function.
    (abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

    * g++.target/powerpc/zext-elim-3.C
---
 gcc/ree.cc                                    | 199 +++---
 .../g++.target/powerpc/zext-elim-3.C          |  13 ++
 2 files changed, 183 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..2025a7c43da 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
     if (REGNO (DF_REF_REG (def)) == REGNO (reg))
       break;

-  gcc_assert (def != NULL);
+  if (def == NULL)
+    return NULL;

   ref_chain = DF_REF_CHAIN (def);

@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }

+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+    targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+                                         NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+    return true;
+  else
+    return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an return registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
+    return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
+    return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+    = (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+      && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+      && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code != ZERO_EXTEND)
+    return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+    return false;
+
+  struct df_link *uses
+    = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+    return false;
+
+  for (df_link *use = uses; use; use = use->next)
+    {
+      if (!use->ref)
+        return false;
+
+      if (BLOCK_FOR_INSN (insn)
+          != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+        return false;
+
+      rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+      if (GET_CODE (PATTERN (use_insn)) == SET)
+        {
+          rtx_code code = GET_CO
[PING^4] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.
Ping!

Forwarded Message
Subject: [PING^3] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.
Date: Tue, 1 Aug 2023 13:50:21 +0530
From: Ajit Agarwal
To: gcc-patches, Jeff Law, Richard Biener, Peter Bergner, Segher Boessenkool, rashmi.srid...@ibm.com

Ping!

Forwarded Message
Subject: [PING^2] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.
Date: Tue, 18 Jul 2023 13:31:27 +0530
From: Ajit Agarwal
To: gcc-patches
CC: Jeff Law, Richard Biener, Segher Boessenkool, Peter Bergner

Ping^2.

Please review.

Thanks & Regards
Ajit

This patch improves the ree pass for the rs6000 target.  It eliminates
redundant sign_extend/zero_extend/AND with varying constants.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

ree: Improve ree pass for rs6000 target

For the rs6000 target we see redundant zero and sign extensions.  This
patch improves the ree pass to eliminate such redundant zero and sign
extensions.  It supports zero_extend/sign_extend/AND, including AND used
as an extension with constants other than 1.

2023-06-07  Ajit Kumar Agarwal

gcc/ChangeLog:

    * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
    elimination across and within basic blocks.
    (def_arith_p): New function to check definition has arithmetic
    operation.
    (combine_set_extension): Modification to incorporate AND
    and current zero_extend and sign_extend instruction.
    (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
    zero_extend sign_extend and AND instruction.
    (rtx_is_zext_p): New function.
    (feasible_cfg): New function.
    * rtl.h (reg_used_set_between_p): Add prototype.
    * rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

    * g++.target/powerpc/zext-elim.C: New testcase.
    * g++.target/powerpc/zext-elim-1.C: New testcase.
    * g++.target/powerpc/zext-elim-2.C: New testcase.
    * g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/ree.cc                                    | 476 --
 gcc/rtl.h                                     |   1 +
 gcc/rtlanal.cc                                |  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
 .../g++.target/powerpc/zext-elim-1.C          |  19 +
 .../g++.target/powerpc/zext-elim-2.C          |  11 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
 7 files changed, 524 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..dc6da21ec16 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,66 @@ struct ext_cand

 static int max_insn_uid;

+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+    {
+      rtx set = XEXP (insn, 0);
+      if (REG_P (set))
+        {
+          rtx src = XEXP (insn, 1);
+
+          if (CONST_INT_P (src)
+              && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+            return true;
+        }
+      else
+        return false;
+    }
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+    {
+      rtx set = XEXP (SET_SRC (body), 0);
+
+      if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+        {
+          rtx src = XEXP (SET_SRC (body), 1);
+
+          if (CONST_INT_P (src)
+              && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+            return true;
+        }
+      else
+        return false;
+    }
+
+  return false;
+}
+
 /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */

 static bool
@@ -319,7 +379,7 @@ combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
   machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
-  rtx new_set;
+  rtx new_set = NULL_RTX;
   rtx cand_pat = single_set (cand->insn);

   /* If the extension's source/destination registers are not the same
@@ -359,27 +419,41 @@ combine_set_extension (ext_cand
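
The comment on `rtx_is_zext_p` says that `(and:DI (reg) (const_int X))` can act as a zero extension from QI, HI or SI depending on X. As a plain-C illustration of that idea only, here is a small sketch that maps an AND mask to the width it zero-extends from. The helper name `zext_width_of_mask` is made up, and this is a different test from the `exact_log2 (UINTVAL (src))` range check in the hunk.

```c
#include <stdint.h>

/* Return the source width in bits (8, 16 or 32) if AND-ing a 64-bit
   value with MASK behaves like a zero extension from that width,
   else 0.  Hypothetical helper, not part of the patch.  */
static unsigned
zext_width_of_mask (uint64_t mask)
{
  switch (mask)
    {
    case 0xff:
      return 8;    /* like QImode -> DImode */
    case 0xffff:
      return 16;   /* like HImode -> DImode */
    case 0xffffffffu:
      return 32;   /* like SImode -> DImode */
    default:
      return 0;
    }
}
```

For instance, `x & 0xffff` keeps the low 16 bits and clears the rest, which is what a zero extension from a 16-bit value produces; a single-bit mask such as `0x80` does not correspond to any such extension.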
[PING^2] [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Ping!

Forwarded Message
Subject: [PING^1] [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 1 Aug 2023 13:47:10 +0530
From: Ajit Agarwal
To: gcc-patches
CC: Richard Biener, Jeff Law, Peter Bergner, Segher Boessenkool, rashmi.srid...@ibm.com

Ping!

Forwarded Message
Subject: [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 18 Jul 2023 19:03:37 +0530
From: Ajit Agarwal
To: gcc-patches
CC: Richard Biener, Jeff Law, Segher Boessenkool, Peter Bergner

Hello All:

This patch improves the code sinking pass to sink statements before calls
to reduce register pressure.  Review comments are incorporated.

For example:

    void bar();
    int j;
    void foo(int a, int b, int c, int d, int e, int f)
    {
      int l;
      l = a + b + c + d +e + f;
      if (a != 5)
        {
          bar();
          j = l;
        }
    }

Code sinking does the following:

    void bar();
    int j;
    void foo(int a, int b, int c, int d, int e, int f)
    {
      int l;
      if (a != 5)
        {
          l = a + b + c + d +e + f;
          bar();
          j = l;
        }
    }

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This
increases register pressure for callee-saved registers.  The following
patch improves code sinking by placing the sunk code before calls in the
use block or in the immediate dominator of the use blocks.

2023-07-18  Ajit Kumar Agarwal

gcc/ChangeLog:

    PR tree-optimization/81953
    * tree-ssa-sink.cc (statement_sink_location): Move statements before
    calls.
    (def_use_same_block): New function.
    (select_best_block): Add heuristics to select the best blocks in the
    immediate post dominator.

gcc/testsuite/ChangeLog:

    PR tree-optimization/81953
    * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
    * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 +++
 gcc/tree-ssa-sink.cc                        | 59 -
 3 files changed, 67 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      if (b != 3)
+        x = 3;
+      else
+        x = 5;
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..e7190323abe 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -173,7 +173,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)

 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
    tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements. The best basic block should be an immediate dominator of
+   best basic block if the use stmt is after the call.

    We want the most control dependent block in the shallowest loop nest.

@@ -190,11 +191,22 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
                    basic_block late_bb,
-                   gimple *stmt)
+                   gimple *stmt,
+                   gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+     operands, then increase the threshold by 7% as those are even more
+     profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+    {
+      threshold += 7;
+      if (threshold > 100)
+        threshold = 100;
+    }

   while (temp_bb != early_bb)
     {
@@ -203,34 +
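
The threshold adjustment in the `select_best_block` hunk can be read in isolation. The following is a minimal standalone sketch in plain C of just that arithmetic; `sink_threshold` and its parameters are hypothetical names standing in for `param_sink_frequency_threshold` and the `gimple_vuse`/`gimple_vdef` test, and nothing else from the pass is modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the threshold logic in the hunk: statements
   with memory operands get a 7-point higher threshold, clamped at 100.  */
static int
sink_threshold (int base_threshold, bool has_memory_operands)
{
  int threshold = base_threshold;
  if (has_memory_operands)
    {
      threshold += 7;
      if (threshold > 100)
        threshold = 100;
    }
  return threshold;
}
```

With an illustrative base of 75, a statement without memory operands keeps 75 and one with memory operands gets 82; a base of 97 with memory operands is clamped to 100.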
Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API
Why does this patch not have HAS_FRM?

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-17 16:05
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API
From: Pan Li

This patch adds support for the rounding mode API of VFWREDUSUM.VS, as
in the samples below:

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li

gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins-bases.cc
    (vfwredusum_frm_obj): New declaration.
    (BASE): Ditto.
    * config/riscv/riscv-vector-builtins-bases.h: Ditto.
    * config/riscv/riscv-vector-builtins-functions.def
    (vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc      |  2 ++
 .../riscv/riscv-vector-builtins-bases.h       |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  1 +
 .../riscv/rvv/base/float-point-wredusum.c     | 33 +++
 4 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index abf03bab0da..5ee7d3119db 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2548,6 +2548,7 @@ static CONSTEXPR const freducop vfredosum_frm_obj;
 static CONSTEXPR const reducop vfredmax_obj;
 static CONSTEXPR const reducop vfredmin_obj;
 static CONSTEXPR const widen_freducop vfwredusum_obj;
+static CONSTEXPR const widen_freducop vfwredusum_frm_obj;
 static CONSTEXPR const widen_freducop vfwredosum_obj;
 static CONSTEXPR const widen_freducop vfwredosum_frm_obj;
 static CONSTEXPR const vmv vmv_x_obj;
@@ -2810,6 +2811,7 @@ BASE (vfredmin)
 BASE (vfwredosum)
 BASE (vfwredosum_frm)
 BASE (vfwredusum)
+BASE (vfwredusum_frm)
 BASE (vmv_x)
 BASE (vmv_s)
 BASE (vfmv_f)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index c1bb164a712..69d4562091f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -247,6 +247,7 @@ extern const function_base *const vfredmin;
 extern const function_base *const vfwredosum;
 extern const function_base *const vfwredosum_frm;
 extern const function_base *const vfwredusum;
+extern const function_base *const vfwredusum_frm;
 extern const function_base *const vmv_x;
 extern const function_base *const vmv_s;
 extern const function_base *const vfmv_f;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index da1157f5a56..3ce06dc60b7 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -508,6 +508,7 @@ DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, wf_vs_ops)
 DEF_RVV_FUNCTION (vfwredusum, reduc_alu, no_mu_preds, wf_vs_ops)
 DEF_RVV_FUNCTION (vfwredosum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
+DEF_RVV_FUNCTION (vfwredusum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)

 /* 15. Vector Mask Instructions.  */

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
new file mode 100644
index 000..6c888c10c0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1_rm (vfloat32m1_t op1, vfloat64m1_t op2,
+                                         size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm (op1, op2, 0, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_rm_m (vbool32_t mask, vfloat32m1_t op1,
+                                     vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1 (vfloat32m1_t op1, vfloat64m1_t op2,
+                                      size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1 (op1, op2, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_m (vbool32_t mask, vfloat32m1_t op1,
+                                  vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_m (mask, op1, op2, vl);
+}
+
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrmi\s+[01234]} 2 } } */
-- 
2.34.1
[PATCH] MATCH: [PR111002] Sink view_convert for vec_cond
Like convert, we can sink view_convert into vec_cond, but only if the
conversion between the element types is a nop_conversion.  This allows
conversions between signed and unsigned types only, and not between
integer and float types, which would mess up the vec_cond so that isel
no longer recognizes `a?-1:0` as such.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

    PR tree-optimization/111002

gcc/ChangeLog:

    * match.pd (view_convert(vec_cond(a,b,c))): New pattern.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/cond_convert_8.c: New test.
---
 gcc/match.pd                                  |  9 
 .../gcc.target/aarch64/sve/cond_convert_8.c   | 22 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 851f1af6eac..81666f28465 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4718,6 +4718,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && types_match (TREE_TYPE (@0), truth_type_for (type)))
   (vec_cond @0 (convert! @1) (convert! @2))))

+/* Likewise for view_convert of nop_conversions. */
+(simplify
+ (view_convert (vec_cond:s @0 @1 @2))
+ (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@1))
+      && known_eq (TYPE_VECTOR_SUBPARTS (type),
+                   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
+      && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@1))))
+  (vec_cond @0 (view_convert! @1) (view_convert! @2))))
+
 /* Sink binary operation to branches, but only if we can fold it.  */
 (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
          lshift rshift rdiv trunc_div ceil_div floor_div round_div
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
new file mode 100644
index 000..d8b96e5fcfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 -fdump-tree-optimized" } */
+/* PR tree-optimization/111002 */
+
+/* We should be able to remove the neg. */
+
+void __attribute__ ((noipa))
+f (int *__restrict r,
+   int *__restrict a,
+   short *__restrict pred)
+{
+  for (int i = 0; i < 1024; ++i)
+    r[i] = pred[i] != 0 ? -1 : 0;
+}
+
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } */
+
+/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
-- 
2.31.1
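
The restriction to signed/unsigned element types can be illustrated on a single lane. The sketch below is plain C, not GCC code; it assumes 32-bit IEEE 754 `float`, and the helper names are made up. It reinterprets the bits of the `-1` / `0` mask values the way a view_convert would.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit signed lane as unsigned.  */
static uint32_t
view_as_u32 (int32_t v)
{
  uint32_t r;
  memcpy (&r, &v, sizeof r);
  return r;
}

/* Reinterpret the bits of a 32-bit signed lane as float
   (assumes float is 32-bit IEEE 754 binary32).  */
static float
view_as_f32 (int32_t v)
{
  float r;
  memcpy (&r, &v, sizeof r);
  return r;
}
```

Viewed as unsigned, `-1` is still the all-ones value and `0` is still zero, so the selected constants remain a recognizable mask. Viewed as float, the all-ones pattern is a NaN, which is no longer an "all ones or zero" integer constant.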
[PATCH] Mention Intel -march=gracemont for Alderlake-N.
---
 htdocs/gcc-14/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index eae25f1a..2c888660 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -151,6 +151,10 @@ a work-in-progress.
 -march=lunarlake.
 Lunar Lake is based on Arrow Lake S.

+ GCC now supports the Intel CPU named Alderlake-N through
+ -march=gracemont.
+ Alderlake-N is E-core only, not hybrid architecture.
+
-- 
2.31.1
[PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
This patch refactors the Phase 3 (Demand fusion) and rename it into Earliest fusion. I do the refactor for the following reasons: 1. Current implementation of phase 3 is doing too many things which makes the code quality quite messy and not easy to maintain. 2. The demand fusion I do previously is we explicitly make the fusion including how to fuse VSETVLs, where to make the VSETVL fusion happens, check the VSETVL fusion point (location) whether it is correct and optimal...etc. We are dong these things too much so I added these following functions: enum fusion_type get_backward_fusion_type (const bb_info *, const vector_insn_info &); bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const; bool backward_demand_fusion (void); bool forward_demand_fusion (void); bool cleanup_illegal_dirty_blocks (void); to make sure the VSETV fusion is optimal and correct. I found in may downstream testing it is not the reliable and optimal approach. Instead, this patch is to use 'compute_earliest' which is the function of LCM to fuse multiple 'compatible' VSETVL demand info if they are having same earliest edge. We let LCM decide almost everything of demand fusion for us. The only thing we do (Not the LCM do) is just checking the VSETVLs demand info are compatible or not. That's all we need to do. I belive such approach is much more reliable and optimal than before (We have many testcases already to check this refactor patch). 3. Using LCM approach to do the demand fusion is more reliable and better CFG than before. ... Here is the basics of this patch approach: Consider this following case: for for for ... for if (...) VSETVL 1 demand: RATIO = 32 and TU policy. else if (...) VSETVL 2 demand: SEW = 16. else VSETVL 3 demand: MU policy. - 'compute_earliest' which output the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3. They are having same earliest edge which is outside the 1th inner-most loop. 
- Then we check that these 3 VSETVL demand infos are compatible, and fuse them into a single VSETVL info: demand SEW = 16, LMUL = MF2, TU, MU.

- Then the later phase (Phase 4), LCM PRE (partial redundancy elimination), will hoist this VSETVL to the outer-most loop, so that we get optimal codegen.

This patch depends on: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627948.html

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New function.
	(find_reg_killed_by): Delete.
	(after_or_same_p): New function.
	(has_vsetvl_killed_avl_p): Delete.
	(anticipatable_occurrence_p): Adapt function.
	(get_same_bb_set): Delete.
	(any_set_in_bb_p): Ditto.
	(change_insn): Format.
	(ge_sew_ratio_unavailable_p): Fix bug.
	(backward_propagate_worthwhile_p): Delete.
	(vector_insn_info::parse_insn): Adapt function.
	(vector_insn_info::merge): Ditto.
	(vector_insn_info::dump): Ditto.
	(vector_infos_manager::vector_infos_manager): Refactor Phase 3.
	(vector_infos_manager::all_empty_predecessor_p): Delete.
	(vector_infos_manager::all_same_ratio_p): Refactor Phase 3.
	(vector_infos_manager::all_same_avl_p): Ditto.
	(vector_infos_manager::create_bitmap_vectors): Ditto.
	(vector_infos_manager::free_bitmap_vectors): Ditto.
	(vector_infos_manager::dump): Ditto.
	(pass_vsetvl::update_block_info): New function.
	(enum fusion_type): Refactor Phase 3.
	(pass_vsetvl::get_backward_fusion_type): Delete.
	(demands_can_be_fused_p): New function.
	(pass_vsetvl::hard_empty_block_p): Delete.
	(earliest_pred_can_be_fused_p): New function.
	(pass_vsetvl::backward_demand_fusion): Delete.
	(pass_vsetvl::earliest_fusion): New function.
	(pass_vsetvl::forward_demand_fusion): Delete.
	(pass_vsetvl::demand_fusion): Ditto.
	(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
	(pass_vsetvl::compute_local_properties): Adapt function.
	(pass_vsetvl::refine_vsetvls): Ditto.
	(pass_vsetvl::cleanup_vsetvls): Ditto.
	(pass_vsetvl::commit_vsetvls): Ditto.
	(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
	(get_first_vsetvl_before_rvv_insns): Ditto.
	(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
	(pass_vsetvl::cleanup_earliest_vsetvls): New function.
	(pass_vsetvl::df_post_optimization): Adapt function.
	(pass_vsetvl::compute_probabilities): Ditto.
	(pass_vsetvl::lazy_vsetvl): Ditto.
	* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
	* config/riscv/riscv-vsetvl.h: Refactor Phase 3.
	* config/riscv/t-riscv:
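
To make the fusion arithmetic in the example above concrete, here is a minimal sketch. It is not the code in riscv-vsetvl.cc: the Demand struct, the Policy enum and the functions fuse and lmul_in_eighths are hypothetical names invented for illustration, and the compatibility rule is deliberately simplified to "a field that only one side cares about is taken from that side; a field both sides care about must agree". The real pass has a richer rule table (see DEF_SEW_LMUL_FUSE_RULE).

```cpp
#include <cassert>

// Hypothetical, simplified model of a VSETVL demand.  A zero or ANY field
// means "this instruction does not care".
enum Policy { ANY, AGNOSTIC, UNDISTURBED };

struct Demand {
  int sew;      // element width in bits, 0 = don't care
  int ratio;    // SEW / LMUL, 0 = don't care
  Policy tail;  // UNDISTURBED models the TU policy
  Policy mask;  // UNDISTURBED models the MU policy
};

// Merge one integer field; fail if both sides demand different values.
static bool merge_int(int &dst, int src) {
  if (src == 0) return true;
  if (dst != 0 && dst != src) return false;
  dst = src;
  return true;
}

// Merge one policy field with the same "agree or don't care" rule.
static bool merge_policy(Policy &dst, Policy src) {
  if (src == ANY) return true;
  if (dst != ANY && dst != src) return false;
  dst = src;
  return true;
}

// Fuse two demands.  Returns false (leaving OUT untouched) when they are
// incompatible under this simplified rule set.
bool fuse(const Demand &a, const Demand &b, Demand &out) {
  Demand r = a;
  if (!merge_int(r.sew, b.sew) || !merge_int(r.ratio, b.ratio)
      || !merge_policy(r.tail, b.tail) || !merge_policy(r.mask, b.mask))
    return false;
  out = r;
  return true;
}

// LMUL in eighths so that fractional LMUL fits in an int:
// 4 = MF2, 8 = M1, 16 = M2.  Returns 0 when SEW or RATIO is unknown.
int lmul_in_eighths(const Demand &d) {
  return (d.sew != 0 && d.ratio != 0) ? d.sew * 8 / d.ratio : 0;
}
```

Fed with the three demands from the example (RATIO = 32 and TU; SEW = 16; MU), this gives SEW = 16 and RATIO = 32, hence LMUL = 16 / 32 = MF2, with both TU and MU set, which matches the fused info quoted above.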
Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]
Jeff,

Thanks a lot for your comments. The shift mode is widened on i1/i2 before they are combined with i3 into newpat. The newpat matches the rotate/mask pattern; i1/i2 themselves don't match the rotate/mask pattern.

I did an experiment that disables widening the shift mode for lshiftrt and tested it on powerpc/x86/aarch64. No regressions occurred. I thought that widening the shift mode helped newpat matching, but it seems not to, or at least it has no impact on powerpc/x86/aarch64.

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 4bf867d74b0..0b9b115f9bb 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10479,11 +10479,6 @@ try_widen_shift_mode (enum rtx_code code, rtx op, int count,
       return orig_mode;

     case LSHIFTRT:
-      /* Similarly here but with zero bits.  */
-      if (HWI_COMPUTABLE_MODE_P (mode)
-	  && (nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0)
-	return mode;
-
       /* We can also widen if the bits brought in will be masked off.
	 This operation is performed in ORIG_MODE.  */
       if (outer_code == AND)

Segher,

Could you tell me what the purpose of widening the shift mode in simplify_shift_const is? Does it definitely reduce the rtx cost, or does it help match patterns?

Thanks a lot.

Thanks
Gui Haochen

On 2023/8/5 7:32, Jeff Law wrote:
>
>
> On 7/20/23 18:59, HAO CHEN GUI wrote:
>> Hi Jeff,
>>
>> On 2023/7/21 5:27, Jeff Law wrote:
>>> Wouldn't it make more sense to just try rotate/mask in the original mode
>>> before trying a shift in a widened mode?  I'm not sure why we need a target
>>> hook here.
>>
>> There is no change to try rotate/mask with the original mode when
>> expensive_optimizations is set. The subst widens the shift mode.
> But we can add it before the attempt in the wider mode.
>
>>
>>       if (flag_expensive_optimizations)
>>         {
>>           /* Pass pc_rtx so no substitutions are done, just
>>              simplifications.  */
>>           if (i1)
>>             {
>>               subst_low_luid = DF_INSN_LUID (i1);
>>               i1src = subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);
>>             }
>>
>>           subst_low_luid = DF_INSN_LUID (i2);
>>           i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
>>         }
>>
>> I don't know if the wider mode is helpful to other targets, so
>> I added the target hook.
> In this scenario we're often better off relying on rtx_costs (even with all
> its warts) rather than adding yet another target hook.
>
> I'd love to hear from Segher here to see if he's got other ideas.
>
> jeff
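
For readers unfamiliar with the condition being removed in the diff above, the following scalar sketch shows why a logical right shift may be done in a wider mode only when the bits above the original mode are known to be zero. It models an 8-bit original mode and a 32-bit wide mode with plain integers; the function names are made up for illustration and nothing here is combine.cc code.

```cpp
#include <cassert>
#include <cstdint>

// Logical shift right performed in the original (8-bit) mode.
uint8_t lshiftrt_narrow(uint8_t x, unsigned count) {
  return static_cast<uint8_t>(x >> count);
}

// The same shift performed in a wider (32-bit) mode, truncated afterwards.
uint8_t lshiftrt_wide_then_trunc(uint32_t wide, unsigned count) {
  return static_cast<uint8_t>(wide >> count);
}

// Scalar analogue of
//   (nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0
// for an 8-bit orig_mode: no bit outside the low byte may be set.
bool upper_bits_known_zero(uint32_t wide) {
  return (wide & ~UINT32_C(0xFF)) == 0;
}
```

When upper_bits_known_zero holds, both functions agree for every shift count below 8. When it does not (for example 0x1F0 shifted by 4), the wide shift drags a set bit into the low byte and the results differ, which is why the widening was guarded by that check.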
[PATCH] LoongArch: initial ada support on linux
gcc/ChangeLog: * ada/Makefile.rtl: Add LoongArch support. * ada/libgnarl/s-linux__loongarch.ads: New. * ada/libgnat/system-linux-loongarch.ads: New. * config/loongarch/loongarch.h: mark normalized options passed from driver to gnat1 as explicit for multilib. --- gcc/ada/Makefile.rtl | 49 +++ gcc/ada/libgnarl/s-linux__loongarch.ads| 134 +++ gcc/ada/libgnat/system-linux-loongarch.ads | 145 + gcc/config/loongarch/loongarch.h | 4 +- 4 files changed, 330 insertions(+), 2 deletions(-) create mode 100644 gcc/ada/libgnarl/s-linux__loongarch.ads create mode 100644 gcc/ada/libgnat/system-linux-loongarch.ads diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl index b94caa45b10..8908a5acf38 100644 --- a/gcc/ada/Makefile.rtl +++ b/gcc/ada/Makefile.rtl @@ -2118,6 +2118,55 @@ ifeq ($(strip $(filter-out cygwin% mingw32% pe,$(target_os))),) LIBRARY_VERSION := $(LIB_VERSION) endif +# LoongArch Linux +ifeq ($(strip $(filter-out loongarch% linux%,$(target_cpu) $(target_os))),) + LIBGNAT_TARGET_PAIRS = \ + a-exetim.adbhttp://www.gnu.org/licenses/>. -- +-- -- +-- + +-- This is the LoongArch version of this package + +-- This package encapsulates cpu specific differences between implementations +-- of GNU/Linux, in order to share s-osinte-linux.ads. + +-- PLEASE DO NOT add any with-clauses to this package or remove the pragma +-- Preelaborate. This package is designed to be a bottom-level (leaf) package + +with Interfaces.C; +with System.Parameters; + +package System.Linux is + pragma Preelaborate; + + -- + -- Time -- + -- + + subtype int is Interfaces.C.int; + subtype longis Interfaces.C.long; + subtype suseconds_t is Interfaces.C.long; + type time_t is range -2 ** (System.Parameters.time_t_bits - 1) + .. 
2 ** (System.Parameters.time_t_bits - 1) - 1; + subtype clockid_t is Interfaces.C.int; + + type timespec is record + tv_sec : time_t; + tv_nsec : long; + end record; + pragma Convention (C, timespec); + + type timeval is record + tv_sec : time_t; + tv_usec : suseconds_t; + end record; + pragma Convention (C, timeval); + + --- + -- Errno -- + --- + + EAGAIN: constant := 11; + EINTR : constant := 4; + EINVAL: constant := 22; + ENOMEM: constant := 12; + EPERM : constant := 1; + ETIMEDOUT : constant := 110; + + - + -- Signals -- + - + + SIGHUP : constant := 1; -- hangup + SIGINT : constant := 2; -- interrupt (rubout) + SIGQUIT: constant := 3; -- quit (ASCD FS) + SIGILL : constant := 4; -- illegal instruction (not reset) + SIGTRAP: constant := 5; -- trace trap (not reset) + SIGIOT : constant := 6; -- IOT instruction + SIGABRT: constant := 6; -- used by abort, replace SIGIOT in the future + SIGBUS : constant := 7; -- bus error + SIGFPE : constant := 8; -- floating point exception + SIGKILL: constant := 9; -- kill (cannot be caught or ignored) + SIGUSR1: constant := 10; -- user defined signal 1 + SIGSEGV: constant := 11; -- segmentation violation + SIGUSR2: constant := 12; -- user defined signal 2 + SIGPIPE: constant := 13; -- write on a pipe with no one to read it + SIGALRM: constant := 14; -- alarm clock + SIGTERM: constant := 15; -- software termination signal from kill + SIGSTKFLT : constant := 16; -- coprocessor stack fault (Linux) + SIGCLD : constant := 17; -- alias for SIGCHLD + SIGCHLD: constant := 17; -- child status change + SIGCONT: constant := 18; -- stopped process has been continued + SIGSTOP: constant := 19; -- stop (cannot be caught or ignored) + SIGTSTP: constant := 20; -- user stop requested from tty + SIGTTIN: constant := 21; -- background tty read attempted + SIGTTOU: constant := 22; -- background tty write attempted + SIGURG : constant := 23; -- urgent condition on IO channel + SIGXCPU: constant := 24; -- CPU time limit exceeded + SIGXFSZ: constant := 
25; -- filesize limit exceeded + SIGVTALRM : constant := 26; -- virtual timer expired + SIGPROF: constant := 27; -- profiling timer expired + SIGWINCH : constant := 28; -- window size change + SIGPOLL: constant := 29; -- pollable event occurred + SIGIO : constant := 29; -- I/O now possible (4.2 BSD) + SIGPWR : constant := 30; -- power-fail restart + SIGSYS : constant := 31; -- bad system call + SIG32 : constant := 32; -- glibc internal signal + SIG33 : constant := 33; -- glibc internal signal + SIG34 : constant :=
Re: Intel AVX10.1 Compiler Design and Support
On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches wrote:
>
> Hi,
>
> With the proposed design of these switches, how would I restrict AVX10.1
> to particular AVX-512 subsets?
We can't; avx10.1 is treated as an indivisible ISA which contains all AVX512-related instructions.

> We’ve been taking these cases as bugs (but yes, intrinsics are still allowed,
> so in some cases it might prove difficult to guarantee this).
Intel SDE supports an avx10.1-256 target, which can be used to validate the binary (i.e. to check whether an invalid 512-bit vector register or 64-bit kmask register is used).

> I don’t see any other way of doing what you want within the constraints of
> this design.
It looks like the requirement is that we want a -mavx10-vector-width=256 option (or maybe reuse -mprefer-vector-width=256) that acts on the original -mavx512XXX options to produce an avx10.1-256 compatible binary. We can't use -mavx10.1-256, since it may include avx512fp16 directives and thus not be backward compatible with SKX/CLX/ICX.

>
> For example, usage of the |_mm256_rol_epi32| intrinsic should be
> compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> without AVX10 implementation (e.g. Skylake-X). But how do I signal that
> I want compatibility with both these targets?
>
> * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
>   with 256-bit AVX10.
> * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
>   from 512-bit registers, but I don't think it guarantees it.
> * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
>   features at 256-bit wide (so in theory, it could choose to compile
>   it with |vpshldd|) -> incompatible with Skylake-X.
> * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
>   and ignore the attempts at disabling AVX-512 subsets.
> * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
>   the /intersection./
>
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> misunderstanding the situation?
>
> Thanks!

--
BR,
Hongtao
Re: [PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]
I am so sorry for sending the wrong, duplicate patch. Please disregard this patch.
[PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend
This patch exports 'compute_antinout_edge' and 'compute_earliest' with global scope; they are going to be used in the VSETVL pass of the RISC-V backend.

Demand fusion is the fusion of VSETVL information so that we emit a VSETVL which dominates and pre-configures most of the RVV instructions, in order to elide redundant VSETVLs.

For example:

for
  for
    for
      if (cond)
        VSETVL demand 1: SEW/LMUL = 16 and TU policy
      else
        VSETVL demand 2: SEW = 32

The VSETVL pass should be able to fuse demand 1 and demand 2 into a new demand: SEW = 32, LMUL = M2, TU policy, and then emit this VSETVL outside the outermost for loop to get optimal codegen and run-time execution.

Currently the VSETVL pass Phase 3 (demand fusion) is really messy, unreliable and unmaintainable. I recently re-read the Dragon Book and Morgan's book, and found that "earliest" allows us to do the demand fusion in a very reliable and optimal way.

So this patch exports these 2 functions, which are very helpful for the VSETVL pass.

gcc/ChangeLog:

	* lcm.cc (compute_antinout_edge): Export as global use.
	(compute_earliest): Ditto.
	(compute_rev_insert_delete): Ditto.
	* lcm.h (compute_antinout_edge): Ditto.
	(compute_earliest): Ditto.

---
 gcc/lcm.cc | 7 ++-
 gcc/lcm.h  | 3 +++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/lcm.cc b/gcc/lcm.cc
index 94a3ed43aea..03421e490e4 100644
--- a/gcc/lcm.cc
+++ b/gcc/lcm.cc
@@ -56,9 +56,6 @@ along with GCC; see the file COPYING3. If not see
 #include "lcm.h"

 /* Edge based LCM routines.  */
-static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
-static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
-			      sbitmap *, sbitmap *, sbitmap *);
 static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *,
			     sbitmap *, sbitmap *);
 static void compute_insert_delete (struct edge_list *edge_list, sbitmap *,
@@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list *edge_list, sbitmap *,
    This is done based on the flow graph, and not on the pred-succ lists.
    Other than that, its pretty much identical to compute_antinout.  */

-static void
+void
 compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
		       sbitmap *antout)
 {
@@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,

 /* Compute the earliest vector for edge based lcm.  */

-static void
+void
 compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin,
		  sbitmap *antout, sbitmap *avout, sbitmap *kill,
		  sbitmap *earliest)
diff --git a/gcc/lcm.h b/gcc/lcm.h
index e08339352e0..7145d6fc46d 100644
--- a/gcc/lcm.h
+++ b/gcc/lcm.h
@@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *, sbitmap *,
					    sbitmap *, sbitmap *,
					    sbitmap **, sbitmap **);
+extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
+extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
+			      sbitmap *, sbitmap *, sbitmap *);
 #endif /* GCC_LCM_H */
--
2.36.3
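
As background for what "earliest" means here, the sketch below evaluates the textbook per-edge formula, EARLIEST(pred, succ) = ANTIN(succ) & ~AVOUT(pred) & (KILL(pred) | ~ANTOUT(pred)), on small bit masks. It is an illustration only: ExprSet and earliest_on_edge are made-up names, each bit stands for one expression (here, one VSETVL demand), and as far as I can tell the real compute_earliest additionally special-cases edges leaving the entry block and edges entering the exit block, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// One bit per expression; a hypothetical stand-in for GCC's sbitmap.
typedef uint32_t ExprSet;

// An expression is "earliest" on the edge pred->succ when it is anticipated
// at the start of succ, not already available at the end of pred, and cannot
// be moved any further up: either pred kills it or it is not anticipated at
// the end of pred.
ExprSet earliest_on_edge(ExprSet antin_succ, ExprSet avout_pred,
                         ExprSet kill_pred, ExprSet antout_pred) {
  ExprSet difference = antin_succ & ~avout_pred;
  return difference & (kill_pred | ~antout_pred);
}
```

In the demand-fusion setting, several VSETVL demands whose bits come out set on the same edge are the candidates that the pass then checks for compatibility.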
[PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]
void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}

zve64f:
foo:
	vsetivli	zero,1,e16,mf4,ta,ma
	vle64.v	v1,0(a0)
	vfmv.s.f	v2,fa0
	vsetvli	zero,zero,e64,m1,ta,ma
	vadd.vv	v1,v1,v1

zve64d:
foo:
	vsetivli	zero,1,e64,m1,ta,ma
	vle64.v	v1,0(a0)
	vfmv.s.f	v2,fa0
	vadd.vv	v1,v1,v1

	PR target/111037

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (float_insn_valid_sew_p): New function.
	(second_sew_less_than_first_sew_p): Fix bug.
	(first_sew_less_than_second_sew_p): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr111037-1.c: New test.
	* gcc.target/riscv/rvv/base/pr111037-2.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc                     | 22 +--
 .../gcc.target/riscv/rvv/base/pr111037-1.c           | 15 +
 .../gcc.target/riscv/rvv/base/pr111037-2.c           |  8 +++
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 08c487d82c0..79cbac01047 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1183,18 +1183,36 @@ second_ratio_invalid_for_first_lmul_p (const vector_insn_info &info1,
   return calculate_sew (info1.get_vlmul (), info2.get_ratio ()) == 0;
 }

+static bool
+float_insn_valid_sew_p (const vector_insn_info &info, unsigned int sew)
+{
+  if (info.get_insn () && info.get_insn ()->is_real ()
+      && get_attr_type (info.get_insn ()->rtl ()) == TYPE_VFMOVFV)
+    {
+      if (sew == 16)
+	return TARGET_VECTOR_ELEN_FP_16;
+      else if (sew == 32)
+	return TARGET_VECTOR_ELEN_FP_32;
+      else if (sew == 64)
+	return TARGET_VECTOR_ELEN_FP_64;
+    }
+  return true;
+}
+
 static bool
 second_sew_less_than_first_sew_p (const vector_insn_info &info1,
				   const vector_insn_info &info2)
 {
-  return info2.get_sew () < info1.get_sew ();
+  return info2.get_sew () < info1.get_sew ()
+	 || !float_insn_valid_sew_p (info1, info2.get_sew ());
 }

 static bool
 first_sew_less_than_second_sew_p (const vector_insn_info &info1,
				   const vector_insn_info &info2)
 {
-  return info1.get_sew () < info2.get_sew ();
+  return info1.get_sew () < info2.get_sew ()
+	 || !float_insn_valid_sew_p (info2, info1.get_sew ());
 }

 /* return 0 if LMUL1 == LMUL2.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
new file mode 100644
index 000..0b7b32fc3e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64f_zvfh -mabi=ilp32d -O3" } */
+
+#include "riscv_vector.h"
+
+void foo(_Float16 y, int64_t *i64p)
+{
+  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
+  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
+  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
+  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
+}
+
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
new file mode 100644
index 000..ac50da71726
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64d_zvfh -mabi=ilp32d -O3" } */
+
+#include "pr111037-1.c"
+
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
--
2.36.3
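
The effect of the new check can be modelled in isolation. In the sketch below, the FpElen struct is a hypothetical stand-in for the TARGET_VECTOR_ELEN_FP_16/32/64 macros and can_adopt_sew is a made-up helper that expresses the negation of second_sew_less_than_first_sew_p for a floating-point scalar move. The flag values for the two configurations are my assumption about what -march=..._zve64f_zvfh and ..._zve64d_zvfh enable; they are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for TARGET_VECTOR_ELEN_FP_16/32/64.
struct FpElen {
  bool fp16;
  bool fp32;
  bool fp64;
};

// Mirrors the shape of float_insn_valid_sew_p for an insn that is already
// known to be a floating-point scalar move: a SEW is usable only if the
// target has floating-point elements of that width.
bool float_sew_valid(const FpElen &t, unsigned sew) {
  switch (sew) {
    case 16: return t.fp16;
    case 32: return t.fp32;
    case 64: return t.fp64;
    default: return true;  // like the patch, other widths are not rejected
  }
}

// A float scalar move that demands OWN_SEW may share a VSETVL that uses
// OTHER_SEW only if OTHER_SEW is not smaller and is a valid FP width.
bool can_adopt_sew(const FpElen &t, unsigned own_sew, unsigned other_sew) {
  return other_sew >= own_sew && float_sew_valid(t, other_sew);
}
```

Under those assumptions, the e16 vfmv.s.f cannot adopt e64 on zve64f (no 64-bit FP elements), so two vsetvl instructions remain, while on zve64d it can, which is consistent with the two assembly listings in the patch description.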
[committed] Testsuite, darwin: account for macOS 13 and 14
Committed as obvious, making gcc.dg/darwin-minversion-link.c pass on macOS 13 and 14 (darwin22 and darwin23, respectively). FX 0001-Testsuite-darwin-account-for-macOS-13-and-14.patch Description: Binary data
[PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin
Hi, testsuite/libgomp.c/simd-math-1.c calls nonstandard functions that are not available on darwin (and possibly other systems?). Because I did not want to disable their testing completely, I suggest we simply use preprocessor macros to avoid them on darwin. This fixes the test failure on aarch64-apple-darwin. OK to commit? FX 0001-libgomp-testsuite-Do-not-call-nonstandard-functions-.patch Description: Binary data
Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message
Hello Thiago, On 18.08.23 23:24, Thiago Jung Bauermann wrote: Tobias Burnus writes: the patch looks good to me. Thanks! Can you commit the patch yourself or do you need someone to do this for you? Thank you! I don't have commit access, so I would need someone to do this for me. Done now in commit r14-3344-g40a6803c6d8ca2. Thanks, Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit 40a6803c6d8ca244a7bdda8e4ec986c418362b24 Author: Thiago Jung Bauermann Date: Sun Aug 20 20:46:05 2023 +0200 testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls" changed the compiler error message in this testcase from : In instantiation of 'void foo() [with T = int]': :14:11: required from here :8:22: error: 'int' is not a class, struct, or union type :8:22: error: 'int' is not a class, struct, or union type :8:22: error: 'int' is not a class, struct, or union type :8:3: error: expected iteration declaration or initialization compiler exited with status 1 to: : In instantiation of 'void foo() [with T = int]': :14:11: required from here :8:22: error: 'int' is not a class, struct, or union type :8:3: error: invalid type for iteration variable 'i' compiler exited with status 1 Excess errors: :8:3: error: invalid type for iteration variable 'i' Andrew Pinski analysed the issue in PR 110756 and considered that it was a testsuite issue in that the error message changed slightly. Also, it's a better error message. Therefore, we only need to adjust the testcase to expect the new message. gcc/testsuite/ChangeLog: PR testsuite/110756 * g++.dg/gomp/pr58567.C: Adjust to new compiler error message. 
--- gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C b/gcc/testsuite/g++.dg/gomp/pr58567.C index 35a5bb027ff..866d831c65e 100644 --- a/gcc/testsuite/g++.dg/gomp/pr58567.C +++ b/gcc/testsuite/g++.dg/gomp/pr58567.C @@ -5,7 +5,7 @@ template void foo() { #pragma omp parallel for - for (typename T::X i = 0; i < 100; ++i) /* { dg-error "'int' is not a class, struct, or union type|expected iteration declaration or initialization" } */ + for (typename T::X i = 0; i < 100; ++i) /* { dg-error "'int' is not a class, struct, or union type|invalid type for iteration variable 'i'" } */ ; }
[PATCH, committed] Testsuite, darwin: Fix analyzer testcases
Committed as obvious, fixing three more darwin-specific failures in analyzer testsuite. This fixes: FAIL: gcc.dg/plugin/taint-CVE-2011-0521-5.c -fplugin=./analyzer_kernel_plugin.so (test for warnings, line 39) FAIL: gcc.dg/plugin/taint-CVE-2011-0521-6.c -fplugin=./analyzer_kernel_plugin.so (test for warnings, line 36) XPASS: gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c -fplugin=./analyzer_kernel_plugin.so (test for bogus messages, line 39) Committed to trunk, FX 0001-Testsuite-darwin-Fix-analyzer-testcases.patch Description: Binary data
Re: [PATCH] Testsuite: mark IPA test as requiring alias support
Hi, > IMO, changes like this qualify as obvious, so I’d go ahead (thanks for this > test fail triage) Makes sense. I’ve committed, as well as another one, attached, slightly amending the expected pattern of a sarif plugin test. FX 0001-Testsuite-plugin-make-testcase-pattern-more-flexible.patch Description: Binary data
[committed] i386: Micro-optimize ix86_expand_sse_extend
Partial vector src is forced to a register as ops[1], we can use it instead of SRC in the call to ix86_expand_sse_cmp. This change avoids forcing operand[1] to a register in sign/zero-extend expanders. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1] instead of src in the call to ix86_expand_sse_cmp. * config/i386/sse.md (v8qiv8hi2): Do not force operands[1] to a register. (v4hiv4si2): Ditto. (v2siv2di2): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 460d496ef22..031e2f72d15 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -5667,7 +5667,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool unsigned_p) ops[2] = force_reg (imode, CONST0_RTX (imode)); else ops[2] = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode), - src, pc_rtx, pc_rtx); + ops[1], pc_rtx, pc_rtx); ix86_split_mmx_punpck (ops, false); emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode)); diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 87c3bf07020..da85223a9b4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -22923,8 +22923,7 @@ (define_expand "v8qiv8hi2" { if (!TARGET_SSE4_1) { - rtx op1 = force_reg (V8QImode, operands[1]); - ix86_expand_sse_extend (operands[0], op1, ); + ix86_expand_sse_extend (operands[0], operands[1], ); DONE; } @@ -23240,8 +23239,7 @@ (define_expand "v4hiv4si2" { if (!TARGET_SSE4_1) { - rtx op1 = force_reg (V4HImode, operands[1]); - ix86_expand_sse_extend (operands[0], op1, ); + ix86_expand_sse_extend (operands[0], operands[1], ); DONE; } @@ -23846,8 +23844,7 @@ (define_expand "v2siv2di2" { if (!TARGET_SSE4_1) { - rtx op1 = force_reg (V2SImode, operands[1]); - ix86_expand_sse_extend (operands[0], op1, ); + ix86_expand_sse_extend (operands[0], operands[1], ); DONE; }
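
For context on what ix86_expand_sse_extend does when SSE4.1 is unavailable, here is a one-lane scalar model of the technique visible in the diff: the high half is either zero (zero extension) or the result of a "0 > src" comparison (sign extension), and the two halves are then interleaved, which is what the punpck step does for whole vectors. The function extend_lane is a made-up name and works on a single 16-bit lane rather than on vector registers.

```cpp
#include <cassert>
#include <cstdint>

// Extend one 16-bit lane to 32 bits by pairing it with a computed high half.
uint32_t extend_lane(uint16_t lane, bool unsigned_p) {
  // Zero extension: the high half is simply zero.
  uint16_t high = 0;
  // Sign extension: the high half is the mask produced by comparing
  // zero > lane (all ones when the lane is negative, zero otherwise).
  if (!unsigned_p && 0 > static_cast<int16_t>(lane))
    high = 0xFFFF;
  // Interleave: the source lane becomes the low half, the mask the high half.
  return (static_cast<uint32_t>(high) << 16) | lane;
}
```

For example, the lane 0xFFFE becomes 0xFFFFFFFE when sign-extended and 0x0000FFFE when zero-extended.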
Re: [PATCH] Testsuite: mark IPA test as requiring alias support
Hi FX, > On 20 Aug 2023, at 13:15, FX Coudert via Gcc-patches > wrote: > The fact that this test needs alias support was indicated in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85656 but never committed. > Without it, the test fails on darwin. > > OK to commit? IMO, changes like this qualify as obvious, so I’d go ahead (thanks for this test fail triage) Iain > > FX > > <0001-Testsuite-mark-IPA-test-as-requiring-alias-support.patch>
[PATCH] Testsuite: mark IPA test as requiring alias support
Hi, The fact that this test needs alias support was indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85656 but never committed. Without it, the test fails on darwin. OK to commit? FX 0001-Testsuite-mark-IPA-test-as-requiring-alias-support.patch Description: Binary data
[PATCH] Testsuite, DWARF2: adjust regexp to match darwin output
Hi,

This was a painful one to fix, because I hate regexps, especially when they are quoted. On darwin, we have this failure:

FAIL: gcc.dg/debug/dwarf2/inline4.c scan-assembler DW_TAG_inlined_subroutine[^(]*([^)]*)[^(]*(DIE (0x[0-9a-f]*) DW_TAG_formal_parameter[^(]*(DIE (0x[0-9a-f]*) DW_TAG_variable

That hideous regexp is trying to match (generated on Linux):

> .uleb128 0x4	# (DIE (0x5c) DW_TAG_inlined_subroutine)
> .long 0xa0	# DW_AT_abstract_origin
> .quad .LBI4	# DW_AT_entry_pc
> .byte .LVU2	# DW_AT_GNU_entry_view
> .quad .LBB4	# DW_AT_low_pc
> .quad .LBE4-.LBB4	# DW_AT_high_pc
> .byte 0x1	# DW_AT_call_file (u.c)
> .byte 0xf	# DW_AT_call_line
> .byte 0x14	# DW_AT_call_column
> .uleb128 0x5	# (DIE (0x7d) DW_TAG_formal_parameter)
> .long 0xad	# DW_AT_abstract_origin
> .long .LLST0	# DW_AT_location
> .long .LVUS0	# DW_AT_GNU_locviews
> .uleb128 0x6	# (DIE (0x8a) DW_TAG_variable)

It is using the parentheses to check what is between DW_TAG_inlined_subroutine, DW_TAG_formal_parameter and DW_TAG_variable. There’s only one block of parentheses in the middle, the "(u.c)". However, on darwin, the generated output is more compact:

> .uleb128 0x4	; (DIE (0x188) DW_TAG_inlined_subroutine)
> .long 0x1b8	; DW_AT_abstract_origin
> .quad LBB4	; DW_AT_low_pc
> .quad LBE4	; DW_AT_high_pc
> .uleb128 0x5	; (DIE (0x19d) DW_TAG_formal_parameter)
> .long 0x1c6	; DW_AT_abstract_origin
> .uleb128 0x6	; (DIE (0x1a2) DW_TAG_variable)

I think that’s valid as well, and the test should pass (what the test really wants to check is that there is no DW_TAG_lexical_block emitted there, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37801 for its origin). It could be achieved in two ways:

1. making darwin emit the DW_AT_call_file
2. adjusting the regexp to match, making the internal block of parentheses optional

I chose the second approach. It makes the test pass on darwin. If someone can test it on linux, it’d be appreciated :) I don’t have ready access to such a system right now.

Once that passes, OK to commit?

FX

0001-Testsuite-DWARF2-adjust-regexp-to-match-darwin-outpu.patch
Description: Binary data
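
Since the attached patch is not visible in this archive, the following is only a guess at what "making the internal block of parentheses optional" could look like, written with std::regex instead of the Tcl regexp used by the testsuite. The function name matches_inline4 and the sample strings used to exercise it are made up; the pattern is the one from the FAIL line with the parentheses escaped and the middle "(...)" group wrapped in an optional group.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the assembly text contains an inlined-subroutine DIE
// followed by a formal-parameter DIE and a variable DIE, with at most one
// extra parenthesised block (such as "(u.c)") before the formal parameter.
bool matches_inline4(const std::string &asm_text) {
  static const std::regex re(
      // Optional middle block: present on Linux, absent on darwin.
      R"re(DW_TAG_inlined_subroutine[^(]*(\([^)]*\)[^(]*)?)re"
      R"re(\(DIE \(0x[0-9a-f]*\) DW_TAG_formal_parameter[^(]*)re"
      R"re(\(DIE \(0x[0-9a-f]*\) DW_TAG_variable)re");
  return std::regex_search(asm_text, re);
}
```

With this shape, both the Linux and the darwin listings above match, while a DW_TAG_lexical_block DIE placed between the formal parameter and the variable still prevents a match, because no parenthesised block is allowed there.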
Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp
Hi Kito This issue is due to zcmp and shrink-wrap-separate conflict, which has been addressed by an under-review patch. [PATCH 0/2] resolve confilct between RISC-V zcmp and shrink-wrap-separate https://patchwork.sourceware.org/project/gcc/list/?series=21577 https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311487.html I'm making [PATCH 1/4][V5][RISC-V] support cm.push cm.pop cm.popret in zcmp for the 1st issue you catched. Please let me know if you want me to merge https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311486.html into [PATCH 1/4][V5][RISC-V]. BR, Fei On 2023-08-16 16:38 Kito Cheng wrote: > >Another fail case for CFI: > >$ riscv64-unknown-elf-gcc _mulhc3.i >-march=rv64imafd_zicsr_zifencei_zca_zcmp -mabi=lp64d -g -O2 -o >_mulhc3.s > >typedef float a __attribute__((mode(HF))); >b, c; >f() { > a a, d, e = a + d; > if (g() && e) > c = b; >} > > >0x10e508a maybe_record_trace_start > ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2584 >0x10e58fb scan_trace > ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2784 >0x10e5fab create_cfi_notes > ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2938 >0x10e6ee4 execute_dwarf2_frame > ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3309 >0x10e7c5a execute > ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3797 > >On Wed, Aug 16, 2023 at 4:33 PM Kito Cheng wrote: >> >> Hi Fei: >> >> Tried to use Jiawei's patch to test this patch and found some issue: >> >> >> > @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void) >> > /* Save the registers. 
*/ >> > if ((frame->mask | frame->fmask) != 0) >> > { >> > - HOST_WIDE_INT step1 = riscv_first_stack_step (frame, >> > remaining_size); >> > - >> > - insn = gen_add3_insn (stack_pointer_rtx, >> > - stack_pointer_rtx, >> > - GEN_INT (-step1)); >> > - RTX_FRAME_RELATED_P (emit_insn (insn)) = 1; >> > - remaining_size -= step1; >> > + if (known_gt (remaining_size, frame->frame_pointer_offset)) >> > + { >> > + HOST_WIDE_INT step1 = riscv_first_stack_step (frame, >> > remaining_size); >> > + remaining_size -= step1; >> > + insn = gen_add3_insn (stack_pointer_rtx, >> > + stack_pointer_rtx, >> > + GEN_INT (-step1)); >> > + RTX_FRAME_RELATED_P (emit_insn (insn)) = 1; >> > + } >> > riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, >> >false); >> > } >> > >> >> I hit some issue here during building libgcc, I use >> riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp >> >> And the error message is: >> >> In file included from >> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471: >> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In >> function '_Unwind_Backtrace': >> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1: >> internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176 >> 330 | } >> | ^ >> 0x83753a gen_reg_rtx(machine_mode) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176 >> 0xf5566f maybe_legitimize_operand >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047 >> 0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned >> int, expand_operand*) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191 >> 0xf511d9 maybe_gen_insn(insn_code, unsigned int, expand_operand*) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8210 >> 0xf58539 expand_binop_directly >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1452 >> 0xf5 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, >> rtx_def*, int, optab_methods) >> 
../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1539 >> 0xcbfdd0 force_operand(rtx_def*, rtx_def*) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:8231 >> 0xc8fca1 force_reg(machine_mode, rtx_def*) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/explow.cc:687 >> 0x144b8cd riscv_force_temporary >> >>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1531 >> 0x144b8cd riscv_force_address >> >>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1528 >> 0x144b8cd riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*) >> >>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:2387 >> 0x1af063e gen_movdf(rtx_def*, rtx_def*) >> >>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.md:2107 >> 0xcba503 rtx_insn* insn_gen_fn::operator()> rtx_def*>(rtx_def*, rtx_def*) const >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/recog.h:411 >> 0xcba503 emit_move_insn_1(rtx_def*, rtx_def*) >> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:4164 >> 0x143d6c4 riscv_emit_move(rtx_def*, rtx_def*) >
[PATCH] Testsuite, LTO: silence warning to make test pass on Darwin
Hi,

On darwin (both x86_64-apple-darwin and aarch64-apple-darwin) we see the following test failure:

FAIL: gcc.dg/lto/20091013-1 c_lto_20091013-1_2.o assemble, -fPIC -r -nostdlib -O2 -flto

which is due to this extra warning:

In function 'fontcmp',
    inlined from 'find_in_cache' at /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:140:13,
    inlined from 'WineEngCreateFontInstance' at /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:160:15:
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:107:8: warning: 'memcmp' specified bound 4 exceeds source size 0 [-Wstringop-overread]
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c: In function 'WineEngCreateFontInstance':
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:66:20: note: source object allocated here

Now, the main file for the test has:

/* { dg-extra-ld-options "-flinker-output=nolto-rel -Wno-stringop-overread" } */

and I believe the intent of -Wno-stringop-overread is to silence this warning, but that only applies to the linker, and the warning on darwin is produced by the compiler (in addition to the linker). Adding the flag to the compilation of the source file makes the test pass on darwin.

OK to commit?
FX

0001-Testsuite-LTO-silence-warning-to-make-test-pass-on-D.patch
Re: [PATCH] core: Support heap-based trampolines
Hi,

A gentle ping on the revised patch, for Richard or another global reviewer.

Thanks,
FX

> On 5 Aug 2023, at 16:20, FX Coudert wrote:
>
> Hi Richard,
>
> Thanks for your feedback. Here is an amended version of the patch, taking
> into consideration your requests and the following discussion. There is no
> configure option for the libgcc part, and the documentation is amended. The
> patch is split into three commits for core, target and libgcc.
>
> Currently regtesting on x86_64 linux and darwin (it was fine before I split
> up into three commits, so I'm re-testing to make sure I didn't screw anything
> up).
>
> OK to commit?
> FX

0001-core-Support-heap-based-trampolines.patch
0002-target-Support-heap-based-trampolines.patch
0003-libgcc-support-heap-based-trampolines.patch
[committed] d: Merge upstream dmd, druntime 26f049fb26, phobos 330d6a4fd.
Hi,

This patch merges the D front-end and run-time library with upstream dmd 26f049fb26, and standard library with phobos 330d6a4fd.

Synchronizing with the latest bug fixes in the v2.105.0-beta.1 release.

D front-end changes:

    - Import dmd v2.105.0-beta.1.
    - Added predefined version identifier VisionOS (ignored by GDC).
    - Functions can no longer have `enum` storage class.
    - The deprecation of the `body` keyword has been reverted, it is
      now an obsolete feature.
    - The error for `scope class` has been reverted, it is now an
      obsolete feature.

D runtime changes:

    - Import druntime v2.105.0-beta.1.

Phobos changes:

    - Import phobos v2.105.0-beta.1.
    - AliasSeq has been removed from std.math.
    - extern(C) getdelim and getline have been removed from std.stdio.

gcc/d/ChangeLog:

    * dmd/MERGE: Merge upstream dmd 26f049fb26.
    * dmd/VERSION: Bump version to v2.105.0-beta.1.
    * d-codegen.cc (get_frameinfo): Check useGC in condition.
    * d-lang.cc (d_handle_option): Set obsolete parameter when compiling
    with -Wall.
    (d_post_options): Set useGC to false when compiling with
    -fno-druntime.  Propagate obsolete flag to compileEnv.
    * expr.cc (ExprVisitor::visit (CatExp *)): Check useGC in condition.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

    * libdruntime/MERGE: Merge upstream druntime 26f049fb26.
    * src/MERGE: Merge upstream phobos 330d6a4fd.
---
 gcc/d/d-codegen.cc | 2 +-
 gcc/d/d-lang.cc | 3 +
 gcc/d/dmd/MERGE | 2 +-
 gcc/d/dmd/VERSION | 2 +-
 gcc/d/dmd/clone.d | 2 +-
 gcc/d/dmd/common/string.d | 2 +-
 gcc/d/dmd/cond.d | 1 +
 gcc/d/dmd/cparse.d | 10 +-
 gcc/d/dmd/dsymbolsem.d | 194 ++
 gcc/d/dmd/errors.d | 34 +--
 gcc/d/dmd/expression.d | 24 ++-
 gcc/d/dmd/expression.h | 6 +-
 gcc/d/dmd/expressionsem.d | 4 +-
 gcc/d/dmd/func.d | 18 +-
 gcc/d/dmd/globals.d | 10 +-
 gcc/d/dmd/globals.h | 11 +-
 gcc/d/dmd/initsem.d | 25 ++-
 gcc/d/dmd/lexer.d | 1 +
 gcc/d/dmd/nogc.d | 2 +-
 gcc/d/dmd/parse.d | 86 +---
 gcc/d/dmd/semantic3.d | 3 +-
 gcc/d/dmd/target.d | 4 +-
 gcc/d/dmd/target.h | 2 +-
 gcc/d/dmd/traits.d | 23 ++-
 gcc/d/expr.cc | 2 +-
 gcc/testsuite/gdc.test/compilable/cppmangle.d | 1 -
 .../gdc.test/compilable/deprecate14283.d | 8 +-
 .../gdc.test/compilable/emptystatement.d | 19 ++
 .../gdc.test/compilable/imports/imp24022.c | 5 +
 .../gdc.test/compilable/parens_inc.d | 23 +++
 gcc/testsuite/gdc.test/compilable/test23951.d | 10 +
 gcc/testsuite/gdc.test/compilable/test23966.d | 19 ++
 gcc/testsuite/gdc.test/compilable/test24022.d | 30 +++
 gcc/testsuite/gdc.test/compilable/test7172.d | 6 +-
 .../gdc.test/fail_compilation/biterrors3.d | 2 +-
 .../gdc.test/fail_compilation/body.d | 11 +
 .../gdc.test/fail_compilation/ccast.d | 21 +-
 .../gdc.test/fail_compilation/diag4596.d | 4 +-
 .../gdc.test/fail_compilation/enum_function.d | 13 ++
 .../gdc.test/fail_compilation/fail10285.d | 12 +-
 .../gdc.test/fail_compilation/fail13116.d | 2 +-
 .../gdc.test/fail_compilation/fail15896.d | 1 +
 .../gdc.test/fail_compilation/fail22729.d | 2 +-
 .../gdc.test/fail_compilation/fail22780.d | 2 +-
 .../gdc.test/fail_compilation/fail4559.d | 22 --
 .../gdc.test/fail_compilation/format.d | 21 +-
 .../fail_compilation/reserved_version.d | 2 +
 .../gdc.test/fail_compilation/scope_class.d | 2 +-
 .../gdc.test/fail_compilation/scope_type.d | 16 --
 .../gdc.test/fail_compilation/test23279.d | 14 ++
 .../gdc.test/fail_compilation/typeerrors.d | 2 +-
 gcc/testsuite/gdc.test/runnable/betterc.d | 11 +
 gcc/testsuite/gdc.test/runnable/sctor2.d | 5 -
 gcc/testsuite/gdc.test/runnable/test24029.c | 23 +++
 .../gdc.test/runnable/testcontracts.d | 16 --
 libphobos/libdruntime/MERGE | 2 +-
 libphobos/libdruntime/core/int128.d | 8 +-
 .../core/internal/array/comparison.d | 25 ++-
 libphobos/libdruntime/core/lifetime.d | 6 +-
 libphobos/src/MERGE | 2 +-
 libpho
Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> Seems ARMv8-A only guarantees to preserve low 64-bit value of
> NEON/floating-point register value. I'm not sure that I modify the
> testcase in the right way and maybe we need more investigations. Any
> ideas or suggestion?

Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my test case is not valid:

"As with global register variables, it is recommended that you choose a register that is normally saved and restored by function calls on your machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee that it will ever work like a normal variable after a function call. But I still need to verify that the compiler correctly understands that only the low 64 bits of the vector register are saved. I'll try to make another test case...

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[committed] fix misleading indentation breaking bootstrap
Committed as obvious.

fix misleading indentation breaking bootstrap

Fix indentation issue introduced by 966f3c13 "Fix format attribute for printf".

gcc/c-family/ChangeLog:

    * c-format.cc: Fix indentation.

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 122ff9bd1cd..b3ef2d44ce9 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -1214,8 +1214,8 @@ check_function_format (const_tree fn, tree attrs, int nargs,
 skipped_default_format = true;
 break;
 }
- if (skipped_default_format)
- continue;
+ if (skipped_default_format)
+continue;
 }
 
 if (warn_format)
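
For readers wondering how a whitespace-only change can break bootstrap: the later bootstrap stages build with -Werror, so a warning such as -Wmisleading-indentation becomes fatal. The archive has lost the leading whitespace of the hunk above, so the following is only a sketch of the same control-flow shape (set a flag, break out of the inner loop, then continue the outer loop), with the check indented correctly. The function name and data layout are hypothetical and are not taken from c-format.cc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: count the rows that contain no
   negative entry.  */
static int
count_clean_rows (const int rows[][3], size_t nrows)
{
  int clean = 0;
  for (size_t r = 0; r < nrows; r++)
    {
      bool skipped = false;
      for (size_t c = 0; c < 3; c++)
        if (rows[r][c] < 0)
          {
            skipped = true;
            break;
          }
      /* This check belongs to the outer loop body, so it sits at the
         same column as the inner `for`.  Indenting it deeper makes it
         look guarded by the inner loop, which is the kind of layout
         -Wmisleading-indentation is meant to flag.  */
      if (skipped)
        continue;
      clean++;
    }
  return clean;
}
```

With three rows of which one contains a negative value, the helper returns 2; the behaviour does not depend on indentation, only the diagnostic does.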
[PATCHv2/COMMITTED] MATCH: Sink convert for vec_cond
Convert can be sunk into a vec_cond if both sides fold. Unlike other unary operations, we need to check that the vec_cond can still be handled afterwards, that is, that the type of its first operand is the same as the new truth type.

I tried a few different versions of this patch:
  - view_convert to the new truth_type, but that does not work as we always support all vec_cond afterwards.
  - using expand_vec_cond_expr_p; but that would allow too much.

I also tried to see if view_convert can be handled here but we end up with:

  _3 = VEC_COND_EXPR <_2, { Nan(-1), Nan(-1), Nan(-1), Nan(-1) }, { 0.0, 0.0, 0.0, 0.0 }>;

Which isel does not know how to handle as just being a view_convert from `vector(4) ` to `vector(4) float` and causes a regression with `g++.target/i386/pr88152.C`.

Note, in the case of the SVE testcase, we will sink negate after the convert and be able to remove a few extra instructions in the end. Also with this change gcc.target/aarch64/sve/cond_unary_5.c will now pass.

Committed as approved after bootstrapping and testing on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

    PR tree-optimization/111006
    PR tree-optimization/110986
    * match.pd: (op(vec_cond(a,b,c))): Handle convert for op.

gcc/testsuite/ChangeLog:

    PR tree-optimization/111006
    * gcc.target/aarch64/sve/cond_convert_7.c: New test.
---
 gcc/match.pd | 8 +++
 .../gcc.target/aarch64/sve/cond_convert_7.c | 23 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6b2d3a11776..851f1af6eac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4710,6 +4710,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op (vec_cond:s @0 @1 @2))
   (vec_cond @0 (op! @1) (op! @2))))
 
+/* Sink unary conversions to branches, but only if we do fold both
+   and the target's truth type is the same as we already have.  */
+(simplify
+ (convert (vec_cond:s @0 @1 @2))
+ (if (VECTOR_TYPE_P (type)
+      && types_match (TREE_TYPE (@0), truth_type_for (type)))
+  (vec_cond @0 (convert! @1) (convert! @2))))
+
 /* Sink binary operation to branches, but only if we can fold it.  */
 (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
 	 lshift rshift rdiv trunc_div ceil_div floor_div round_div
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
new file mode 100644
index 000..4bb95b92195
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 -fdump-tree-optimized" } */
+
+/* This is a modified reduced version of cond_unary_5.c */
+
+void __attribute__ ((noipa))
+f0 (unsigned short *__restrict r,
+    int *__restrict a,
+    int *__restrict pred)
+{
+  for (int i = 0; i < 1024; ++i)
+  {
+    int p = pred[i]?-1:0;
+    r[i] = p ;
+  }
+}
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } */
+
+/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
-- 
2.31.1
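
As a rough scalar illustration of what the new pattern does to the loop body of f0 (an editorial sketch, not part of the patch): selecting and then converting gives the same value as converting each arm and then selecting, and in the second form both arms fold to constants. The function names below are hypothetical. The scalar model cannot show the vector-only condition, namely that the mask type of the vec_cond has to match truth_type_for of the converted type.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative scalar model only; these names are hypothetical.  */

/* Before the transform: select in the wide type, then convert.  */
static void
select_then_convert (uint16_t *restrict r, const int *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int p = pred[i] ? -1 : 0;
      r[i] = (uint16_t) p;
    }
}

/* After the transform: the conversion is applied to each arm, where it
   folds to a constant, and the select happens in the narrow type.  */
static void
convert_then_select (uint16_t *restrict r, const int *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? (uint16_t) -1 : (uint16_t) 0;
}
```

Both functions store 0xFFFF where pred[i] is nonzero and 0 elsewhere, which is what lets the SVE code in the testcase use a single predicated mov of #-1 on .h elements.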