Re: GCC: v850-elf
On 3/16/2021 5:32 AM, Nick Clifton via Gcc-patches wrote:
> Hi Jan-Benedict,
>
>> With my re-started testing efforts, I've got another one for you I guess
>> (`./configure --target=v850-elf && make all-gcc`):
>>
>>   ../.././gcc/config/v850/v850.c: In function ‘char* construct_restore_jr(rtx)’:
>>   ../.././gcc/config/v850/v850.c:2260:22: error: ‘%s’ directive writing up to 39 bytes into a region of size between 32 and 71 [-Werror=format-overflow=]
>>    2260 |   sprintf (buff, "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp r6",
>>         |                  ^~~~
>>    2261 |            name, name);
>
> I could not reproduce these in my local environment, but I suspect that
> you are using a more recent version of gcc than me.  The fix looks
> obvious however, so please could you try out the attached patch and let
> me know if it resolves the problem?  I suspect that construct_save_jarl
> has the same problem and the same patch should fix it.

Jeff
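The warning fires because the fixed buffer cannot be proven large enough for the fixed text plus two worst-case `%s` expansions. A minimal sketch of the usual fix, with `construct_jump_seq` as a hypothetical stand-in for the v850.c helpers (the actual attached patch may differ):

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for construct_restore_jr/construct_save_jarl.
// The common -Wformat-overflow fix: size the buffer for the fixed text
// plus the worst-case %s expansions (here bounded at 39 bytes each by the
// diagnostic), or use snprintf so truncation is well-defined and the
// compiler can see the bound.
char *construct_jump_seq (const char *name)
{
  static char buff[128];  // large enough for fixed text + 2 * 39-byte names
  snprintf (buff, sizeof buff,
            "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp r6",
            name, name);
  return buff;
}
```

The key point is that the compiler reasons from the declared buffer size, not from the names actually passed at run time.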
Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]
Thanks Jakub & Segher,

On 2021/3/17 06:47, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote:
>> On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote:
>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>> index ec068c58aa5..48eb91132a9 100644
>>> --- a/gcc/config/rs6000/rs6000.c
>>> +++ b/gcc/config/rs6000/rs6000.c
>>> @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>>>
>>>    gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
>>>
>>> -  gcc_assert (GET_MODE (idx) == E_SImode);
>>> -
>>>    machine_mode inner_mode = GET_MODE (val);
>>>
>>> -  rtx tmp = gen_reg_rtx (GET_MODE (idx));
>>> +  machine_mode idx_mode = GET_MODE (idx);
>>> +  rtx tmp = gen_reg_rtx (DImode);
>>> +  if (idx_mode != DImode)
>>> +    tmp = convert_modes (DImode, idx_mode, idx, 0);
>>> +  else
>>> +    tmp = idx;
>>> +
>>>    int width = GET_MODE_SIZE (inner_mode);
>>>
>>>    gcc_assert (width >= 1 && width <= 8);
>>> @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>>>    int shift = exact_log2 (width);
>>>    /* Generate the IDX for permute shift, width is the vector element size.
>>>       idx = idx * width.  */
>>> -  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
>>> -
>>> -  tmp = convert_modes (DImode, SImode, tmp, 1);
>>> +  emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift)));
>>
>> This looks broken.
>
> Yup.
>
>> The gen_reg_rtx (DImode); call result is completely ignored,
>> so it wastes one pseudo, and I'm not convinced that idx nor tmp
>> is guaranteed to be usable as output of shift.
>> So, shouldn't it be instead:
>>   rtx tmp = gen_reg_rtx (DImode);
>>   if (idx_mode != DImode)
>>     idx = convert_modes (DImode, idx_mode, idx, 0);
>> ...
>>   emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
>> ?
>
> Yeah, something like that.  Just
>   idx = convert_modes (DImode, idx_mode, idx, 1);
> ...
>   rtx tmp = gen_reg_rtx (DImode);
>   emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift));
> works just as well.
>
>> Also, dunno what is less and what is more expensive on
>> rs6000, whether convert_modes with unsigned = 0 or 1.
> Unsigned is never more expensive.  Sometimes it is cheaper.  It depends
> more on the situation what is best.  Just look at the generated code?

Changed it to "idx = convert_modes (DImode, idx_mode, idx, 1);".

For the case with "-mvsx -Og -mcpu=power9":

vector char
f2 (vector char v, char k, char val)
{
  v[k] = val;
  return v;
}

gimple:

  _1 = (int) k_2(D);
  _7 = v;
  _8 = .VEC_SET (_7, val_4(D), _1);
  v = _8;
  _6 = v;

the expanded RTL is:

    6: NOTE_INSN_BASIC_BLOCK 2
    2: r121:V16QI=%2:V16QI
    3: r122:DI=%5:DI
    4: r123:DI=%6:DI
    5: NOTE_INSN_FUNCTION_BEG
    8: r117:DI=sign_extend(r122:DI#0)
    9: r124:V16QI=r121:V16QI
   10: r126:QI=r123:DI#0
   11: r127:DI=zero_extend(r117:DI#0)
   12: r128:DI=r127:DI<<0
   13: r129:V16QI=unspec[r128:DI] 152
   14: r130:V16QI=unspec[r128:DI] 151
   15: r124:V16QI=unspec[r124:V16QI,r124:V16QI,r129:V16QI] 236
   16: r124:V16QI=unspec[r124:V16QI,r126:QI,0] 118
   17: r124:V16QI=unspec[r124:V16QI,r124:V16QI,r130:V16QI] 236
   18: r119:V16QI=r124:V16QI
   19: r121:V16QI=r119:V16QI
   20: r120:V16QI=r121:V16QI
   24: %2:V16QI=r120:V16QI
   25: use %2:V16QI

The sign_extend in insn #8 is generated by "_1 = (int) k_2(D);"; the
zero_extend in insn #11 is the conversion added by this patch.  Both will
be optimized out in the final asm.  Since k is the index of a vector
element, it can only be a non-negative value, which means zero_extend
should be used here as well?

PS: if the type of k is "unsigned int", no zero_extend is generated in the
expander, and if k is "int", a zero_extend is generated in the expander.

Attached the updated patch.  Thanks.

Xionghu

From 06bff74a35ad7641dffc87d1eab9c5872851ae79 Mon Sep 17 00:00:00 2001
From: Xionghu Luo
Date: Tue, 16 Mar 2021 21:23:14 -0500
Subject: [PATCH] rs6000: Convert the vector set variable idx to DImode [PR98914]

vec_insert defines the element argument type to be signed int by the ELFv2
ABI.  When expanding a vector with a variable rtx, convert the rtx type to
DImode to support both intrinsic usage and other callers from
rs6000_expand_vector_init produced by v[k] = val when k is of long type.
gcc/ChangeLog:

2021-03-17  Xionghu Luo

	PR target/98914
	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var_p9): Convert
	idx to DImode.
	(rs6000_expand_vector_set_var_p8): Likewise.

gcc/testsuite/ChangeLog:

2021-03-17  Xionghu Luo

	PR target/98914
	* gcc.target/powerpc/pr98914.c: New test.
---
 gcc/config/rs6000/rs6000.c                 | 39 ++
 gcc/testsuite/gcc.target/powerpc/pr98914.c | 11 ++
 2 files changed, 29 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98914.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..bb7fcdca876 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7000,21 +7000,22 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
   gcc_assert
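The sign-vs-zero-extension question above has a simple answer for in-bounds indices: both extensions agree. A sketch outside GCC (these helper names are illustrative, not GCC code) of what `convert_modes` with `unsignedp` 0 or 1 does to a QImode index widened to DImode:

```cpp
#include <cassert>
#include <cstdint>

// For an in-bounds V16QI index (0 <= k < 16), sign- and zero-extension to
// 64 bits produce the same value, which is why either `unsignedp` argument
// to convert_modes is acceptable; they only diverge for negative (i.e.
// out-of-bounds, undefined-behavior) indices.
uint64_t zext_index (int8_t k) { return static_cast<uint8_t> (k); }
uint64_t sext_index (int8_t k) { return static_cast<uint64_t> (static_cast<int64_t> (k)); }
```

For `k = 5` both return 5; only for a negative `k`, which is already out of bounds for any vector mode, do the two differ.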
Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]
Segher,

The const_anchor should work on both 64 and 32 bit.  I think the constant
loading is cheap on the 32-bit platform, so I only enabled it on 64 bit.
I will add a test case and verify the patch on Darwin and AIX.  Thanks.

On 17/3/2021 12:35 AM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Mar 15, 2021 at 11:11:32AM +0800, HAO CHEN GUI via Gcc-patches wrote:
>> This patch adds const_anchor for rs6000.  The const_anchor is used in
>> the cse pass.
>
> 1) This isn't suitable for stage 4.
> 2) Please add a test case, which shows what it does, that it is useful.
> 3) Does this work on other OSes than Linux?  What about Darwin and AIX?
>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index ec068c58aa5..2b2350c53ae 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p)
>>       warning (0, "%qs is deprecated and not recommended in any circumstances",
>>                "-mno-speculate-indirect-jumps");
>>
>> +  if (TARGET_64BIT)
>> +    {
>> +      targetm.min_anchor_offset = -32768;
>> +      targetm.max_anchor_offset = 32767;
>> +      targetm.const_anchor = 0x8000;
>> +    }
>
> Why only on 64 bit?  Why these choices?
>
> Segher
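For readers unfamiliar with const_anchor: the idea is that once one constant is in a register, CSE can derive nearby constants from it with a single add of a signed 16-bit offset (the anchor range set above) instead of a fresh multi-instruction load. A hedged model of the arithmetic, not GCC's implementation:

```cpp
#include <cassert>
#include <cstdint>

// Model of what const_anchor enables: if `anchor_reg` already holds an
// anchored constant, any constant within [-32768, 32767] of it costs one
// add (addi on rs6000) rather than a new li/lis/ori sequence.
uint64_t derive_nearby (uint64_t anchor_reg, int16_t offset)
{
  return anchor_reg + static_cast<int64_t> (offset);
}
```

E.g. with `0x12345000` live in a register, `0x12345678` is reachable as anchor + `0x678`; this is the kind of saving a test case for the patch would want to demonstrate.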
Re: [PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]
On 3/16/21 11:56 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

Honza has fairly recently changed operand_equal_p to compare
DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.
As the first testcase in this patch shows, while that is very nice for
optimizations, for the -Wduplicated-branches warning it causes
regressions.  Pedantically a union in both C and C++ has only one active
member at a time, so using some other union member even if it has the
same type is UB, so I think the warning shouldn't warn when it sees
access to different fields that happen to have the same offset and
should consider them different.

In my first attempt to fix this I've keyed the old behavior on
OEP_LEXICOGRAPHIC, but unfortunately that has various problems: the
warning has a quick non-lexicographic compare in build_conditional_expr*
and another, more expensive, lexicographic one later during
genericization, and turning the first one into lexicographic would mean
wasting compile time on large conditionals.

So, this patch instead introduces a new OEP_ flag and makes sure to pass
it to operand_equal_p in all -Wduplicated-branches cases.

The cvt.c changes are because on the other testcase we were warning with
UNKNOWN_LOCATION, so the user wouldn't really know where the
questionable code is.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-16  Jakub Jelinek

	PR c++/99565
	* tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES.
	* fold-const.c (operand_compare::operand_equal_p): Don't compare
	field offsets if OEP_DUPLICATED_BRANCHES.

	* c-warn.c (do_warn_duplicated_branches): Pass also
	OEP_DUPLICATED_BRANCHES to operand_equal_p.

	* c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES
	to operand_equal_p.

	* call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES
	to operand_equal_p.
	* cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
	COMPOUND_EXPR.

	* g++.dg/warn/Wduplicated-branches6.C: New test.
	* g++.dg/warn/Wduplicated-branches7.C: New test.

--- gcc/tree-core.h.jj	2021-01-16 22:52:33.654413400 +0100
+++ gcc/tree-core.h	2021-03-16 16:24:06.136829900 +0100
@@ -896,7 +896,10 @@ enum operand_equal_flag {
   OEP_HASH_CHECK = 32,
   /* Makes operand_equal_p handle more expressions:  */
   OEP_LEXICOGRAPHIC = 64,
-  OEP_BITWISE = 128
+  OEP_BITWISE = 128,
+  /* Considers different fields different even when they have
+     the same offset.  */
+  OEP_DUPLICATED_BRANCHES = 256

It seems sort of "inverted": I'd expect OEP_LEXICOGRAPHIC on its own to do
a lexicographical comparison, without having to set an additional bit to
ask for it.  If a more refined form of a lexicographical comparison is
needed that considers lexicographically distinct names equal, then adding
a new bit for that would seem like the right design.

I'd also expect the name of the new bit to be independent of the context
where it's used (like -Wduplicated-branches), in case it also needs to be
used in an unrelated warning.  (E.g., the new -Wvla-parameter also uses
OEP_LEXICOGRAPHIC, and if it did need to set a new bit it would look odd
if that bit had the name of an unrelated warning in it.)

Martin

 };

 /* Enum and arrays used for tree allocation stats.
--- gcc/fold-const.c.jj	2021-02-24 12:10:17.571212194 +0100
+++ gcc/fold-const.c	2021-03-16 16:19:16.327018956 +0100
@@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_
	  flags &= ~OEP_ADDRESS_OF;
	  if (!OP_SAME (1))
	    {
-	      if (compare_address)
+	      if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0)
		{
		  if (TREE_OPERAND (arg0, 2)
		      || TREE_OPERAND (arg1, 2))
--- gcc/c-family/c-warn.c.jj	2021-02-12 23:57:30.459142340 +0100
+++ gcc/c-family/c-warn.c	2021-03-16 16:24:19.224685926 +0100
@@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr)

   /* Compare the hashes.  */
   if (h0 == h1
-      && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC)
+      && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC
+					| OEP_DUPLICATED_BRANCHES)
       /* Don't warn if any of the branches or their subexpressions comes
	 from a macro.  */
      && !walk_tree_without_duplicates (&expr, expr_from_macro_expansion_r,
--- gcc/c/c-typeck.c.jj	2021-02-18 22:17:44.184657291 +0100
+++ gcc/c/c-typeck.c	2021-03-16 16:23:58.335915719 +0100
@@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon
      warn here, because the COND_EXPR will be turned into OP1.  */
   if (warn_duplicated_branches
       && TREE_CODE (ret) == COND_EXPR
-      && (op1 == op2 || operand_equal_p (op1, op2, 0)))
+      && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES)))
     warning_at (EXPR_LOCATION (ret),
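The union case the patch is about can be sketched in a few lines; this is an illustrative example in the spirit of the new testcases, not the testcases themselves:

```cpp
#include <cassert>

// Two distinct members with the same type and offset.  Per the
// active-member rule, reading u.b after writing u.a is UB, so branches
// touching different members should not be reported as duplicates even
// though they lower to identical address arithmetic.
union U { int a; int b; };

int pick (U &u, bool cond)
{
  // With OEP_DUPLICATED_BRANCHES, -Wduplicated-branches stays silent here.
  return cond ? u.a : u.b;
}
```

Before the patch, the DECL_FIELD_OFFSET-based comparison treated `u.a` and `u.b` as equal and warned about "identical branches" that are in fact semantically distinct.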
[WIP] 'walk_gimple_seq' backward
Hi!

On 2021-03-17T00:24:55+0100, I wrote:
> Now, walking each function backwards (!), [...]
> I've now got a simple 'callback_op', which for '!is_lhs' looks at
> 'get_base_address ([op])', and if that 'var' is contained in the set of
> current candidates (initialized per containing 'bind's, which we enter
> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
> candidate for such optimization.  (Plus some "details", of course.)  This
> seems to work fine, as far as I can tell.  :-)

Is something like the attached "[WIP] 'walk_gimple_seq' backward" OK
conceptually?  (For next development stage 1 and with all the TODOs
resolved, of course.)

The 'backward' flag cannot simply be an argument to 'walk_gimple_seq'
etc.: it needs to also be passed to 'walk_gimple_seq_mod' calls triggered
from inside 'walk_gimple_stmt'.  Hence, I've put it into the state
'struct walk_stmt_info'.

Grüße
 Thomas

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

From 48036d65cea6bb47c7d4eca3b4fea77e058f29e3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Mon, 15 Mar 2021 17:31:00 +0100
Subject: [PATCH] [WIP] 'walk_gimple_seq' backward

This seems to work -- for the one case where I'm using it...
---
 gcc/doc/gimple.texi |  2 ++
 gcc/gimple-walk.c   | 15 ---
 gcc/gimple-walk.h   |  4 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 5e0fc2e0dc5..51fe0f2d715 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns
 non-@code{NULL}, the walk is stopped and the value returned.  Otherwise,
 all the statements are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn

diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index 9a761f32578..dfc2f0b4dbc 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
    on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
    If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
    value is stored in WI->CALLBACK_RESULT.  Also, the statement that
    produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
		     walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
     {
       tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
       if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
	}

       if (!wi->removed_stmt)
-	gsi_next (&gsi);
+	{
+	  if (forward)
+	    gsi_next (&gsi);
+	  else //TODO Correct?
+	    gsi_prev (&gsi);
+	  //TODO This could do with some unit testing, to make sure all the
+	  //corner cases (removing first/last, for example) work correctly.
+	}
     }

   if (wi)
diff --git a/gcc/gimple-walk.h b/gcc/gimple-walk.h
index bdc7351f190..9590c63ba18 100644
--- a/gcc/gimple-walk.h
+++ b/gcc/gimple-walk.h
@@ -71,6 +71,10 @@ struct walk_stmt_info

   /* True if we've removed the statement that was processed.  */
   BOOL_BITFIELD removed_stmt : 1;
+
+  /*TODO */
+  //TODO Only applicable for 'walk_gimple_seq'.
+  BOOL_BITFIELD backward : 1;
 };

 /* Callback for walk_gimple_stmt.  Called for every statement found
-- 
2.17.1
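The corner cases the patch's TODO worries about (removing the first or last statement while walking backwards) can be modeled outside GCC with any doubly-linked sequence. A minimal sketch, not GCC code, of the backward-walk-with-removal invariant:

```cpp
#include <cassert>
#include <list>

// Model of the gsi_prev / removed-statement interplay: walk a sequence
// backwards and delete matching elements.  After erase(), the returned
// iterator points at the successor, so the next --it lands exactly on the
// element preceding the removed one -- the analogue of not calling
// gsi_prev when wi->removed_stmt is set.
int drop_even_backward (std::list<int> &seq)
{
  int removed = 0;
  for (auto it = seq.end (); it != seq.begin (); )
    {
      --it;                     // analogous to gsi_prev
      if (*it % 2 == 0)
        {
          it = seq.erase (it);  // iterator now at the old successor
          ++removed;
        }
    }
  return removed;
}
```

Removing the last element leaves the iterator at `end()`, and removing the first leaves it at the new `begin()`; both terminate the loop correctly, which is the sort of unit test the TODO asks for.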
Re: [PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]
On Tue, Mar 16, 2021 at 06:56:47PM +0100, Jakub Jelinek wrote:
> Hi!
>
> Honza has fairly recently changed operand_equal_p to compare
> DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses.
> As the first testcase in this patch shows, while that is very nice
> for optimizations, for the -Wduplicated-branches warning it causes
> regressions.  Pedantically a union in both C and C++ has only one
> active member at a time, so using some other union member even if it has the
> same type is UB, so I think the warning shouldn't warn when it sees access
> to different fields that happen to have the same offset and should consider
> them different.
> In my first attempt to fix this I've keyed the old behavior on
> OEP_LEXICOGRAPHIC, but unfortunately that has various problems, the warning
> has a quick non-lexicographic compare in build_conditional_expr* and another
> lexicographic more expensive one later during genericization and turning the
> first one into lexicographic would mean wasting compile time on large
> conditionals.
> So, this patch instead introduces a new OEP_ flag and makes sure to pass it
> to operand_equal_p in all -Wduplicated-branches cases.
>
> The cvt.c changes are because on the other testcase we were warning with
> UNKNOWN_LOCATION, so the user wouldn't really know where the questionable
> code is.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-03-16  Jakub Jelinek
>
> 	PR c++/99565
> 	* tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES.
> 	* fold-const.c (operand_compare::operand_equal_p): Don't compare
> 	field offsets if OEP_DUPLICATED_BRANCHES.
>
> 	* c-warn.c (do_warn_duplicated_branches): Pass also
> 	OEP_DUPLICATED_BRANCHES to operand_equal_p.
>
> 	* c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES
> 	to operand_equal_p.
>
> 	* call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES
> 	to operand_equal_p.
> 	* cvt.c (convert_to_void): Preserve location_t on COND_EXPR or
> 	COMPOUND_EXPR.
>
> 	* g++.dg/warn/Wduplicated-branches6.C: New test.
> 	* g++.dg/warn/Wduplicated-branches7.C: New test.
>
> --- gcc/tree-core.h.jj	2021-01-16 22:52:33.654413400 +0100
> +++ gcc/tree-core.h	2021-03-16 16:24:06.136829900 +0100
> @@ -896,7 +896,10 @@ enum operand_equal_flag {
>    OEP_HASH_CHECK = 32,
>    /* Makes operand_equal_p handle more expressions:  */
>    OEP_LEXICOGRAPHIC = 64,
> -  OEP_BITWISE = 128
> +  OEP_BITWISE = 128,
> +  /* Considers different fields different even when they have
> +     the same offset.  */
> +  OEP_DUPLICATED_BRANCHES = 256
> };
>
> /* Enum and arrays used for tree allocation stats.
> --- gcc/fold-const.c.jj	2021-02-24 12:10:17.571212194 +0100
> +++ gcc/fold-const.c	2021-03-16 16:19:16.327018956 +0100
> @@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_
> 	  flags &= ~OEP_ADDRESS_OF;
> 	  if (!OP_SAME (1))
> 	    {
> -	      if (compare_address)
> +	      if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0)
> 		{
> 		  if (TREE_OPERAND (arg0, 2)
> 		      || TREE_OPERAND (arg1, 2))
> --- gcc/c-family/c-warn.c.jj	2021-02-12 23:57:30.459142340 +0100
> +++ gcc/c-family/c-warn.c	2021-03-16 16:24:19.224685926 +0100
> @@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr)
>
>    /* Compare the hashes.  */
>    if (h0 == h1
> -      && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC)
> +      && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC
> +					| OEP_DUPLICATED_BRANCHES)

This change isn't strictly necessary, OEP_DUPLICATED_BRANCHES is needed
mainly in build_conditional_expr*, right?  (It makes sense to me though,
so let's leave it in.)

>        /* Don't warn if any of the branches or their subexpressions comes
> 	 from a macro.  */
>       && !walk_tree_without_duplicates (&expr, expr_from_macro_expansion_r,
> --- gcc/c/c-typeck.c.jj	2021-02-18 22:17:44.184657291 +0100
> +++ gcc/c/c-typeck.c	2021-03-16 16:23:58.335915719 +0100
> @@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon
>       warn here, because the COND_EXPR will be turned into OP1.  */
>    if (warn_duplicated_branches
>        && TREE_CODE (ret) == COND_EXPR
> -      && (op1 == op2 || operand_equal_p (op1, op2, 0)))
> +      && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES)))
>      warning_at (EXPR_LOCATION (ret), OPT_Wduplicated_branches,
> 		 "this condition has identical branches");

The c/ and c-family/ parts are ok, thanks.

Marek
[PATCH] rs6000: Fix disassembling a vector pair in gcc-10 in little-endian mode
In gcc-10, we don't handle disassembling a vector pair in little-endian
mode correctly.  The solution is to make use of the disassemble
accumulator code that is endian friendly.  Trunk does not have this bug,
as the use of opaque modes for the MMA types "fixed" this issue there.

This passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Ok for the GCC 10 release branch?

Peter

gcc/
2021-03-16  Peter Bergner

	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_mma_builtin): Handle
	disassembling a vector pair vector by vector in little-endian mode.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 538a57adceb..a112593878a 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10850,10 +10850,12 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
	  tree src = make_ssa_name (TREE_TYPE (src_type));
	  gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);

-	  /* If we are not disassembling an accumulator or our destination is
-	     another accumulator, then just copy the entire thing as is.  */
-	  if (fncode != MMA_BUILTIN_DISASSEMBLE_ACC
-	      || TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+	  /* If we are disassembling an accumulator/pair and our destination is
+	     another accumulator/pair, then just copy the entire thing as is.  */
+	  if ((fncode == MMA_BUILTIN_DISASSEMBLE_ACC
+	       && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+	      || (fncode == VSX_BUILTIN_DISASSEMBLE_PAIR
+		  && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_pair_type_node))
	    {
	      tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
						       src_type, dst_ptr));
@@ -10865,21 +10867,25 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
	  /* We're disassembling an accumulator into a different type, so we
	     need to emit a xxmfacc instruction now, since we cannot do it
	     later.  */
-	  new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
-	  new_call = gimple_build_call (new_decl, 1, src);
-	  src = make_ssa_name (vector_quad_type_node);
-	  gimple_call_set_lhs (new_call, src);
-	  gimple_seq_add_stmt (&new_seq, new_call);
+	  if (fncode == MMA_BUILTIN_DISASSEMBLE_ACC)
+	    {
+	      new_decl = rs6000_builtin_decls[MMA_BUILTIN_XXMFACC_INTERNAL];
+	      new_call = gimple_build_call (new_decl, 1, src);
+	      src = make_ssa_name (vector_quad_type_node);
+	      gimple_call_set_lhs (new_call, src);
+	      gimple_seq_add_stmt (&new_seq, new_call);
+	    }

-	  /* Copy the accumulator vector by vector.  */
+	  /* Copy the accumulator/pair vector by vector.  */
+	  unsigned nvecs = (fncode == MMA_BUILTIN_DISASSEMBLE_ACC) ? 4 : 2;
	  tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
						       ptr_mode, true);
	  tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
-	  tree array_type = build_array_type_nelts (unsigned_V16QI_type_node, 4);
+	  tree array_type = build_array_type_nelts (unsigned_V16QI_type_node,
+						    nvecs);
	  tree src_array = build1 (VIEW_CONVERT_EXPR, array_type, src);
-	  for (unsigned i = 0; i < 4; i++)
+	  for (unsigned i = 0; i < nvecs; i++)
	    {
-	      unsigned index = WORDS_BIG_ENDIAN ? i : 3 - i;
+	      unsigned index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i;
	      tree ref = build4 (ARRAY_REF, unsigned_V16QI_type_node, src_array,
				 build_int_cst (size_type_node, i),
				 NULL_TREE, NULL_TREE);
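The endian fix boils down to the index remapping in the last hunk. A standalone model of that mapping (illustrative, not the GCC code itself):

```cpp
#include <cassert>

// Vector i of a disassembled accumulator (nvecs == 4) or pair (nvecs == 2)
// lives at array position i on big-endian and at nvecs - 1 - i on
// little-endian; the old code hard-coded "3 - i", which is only right for
// accumulators and gave the wrong reversal for pairs.
unsigned source_index (unsigned i, unsigned nvecs, bool big_endian)
{
  return big_endian ? i : nvecs - 1 - i;
}
```

With the hard-coded `3 - i`, a little-endian pair (`nvecs == 2`) would have mapped element 0 to position 3, which does not exist; the generalized form keeps indices in range for both builtin kinds.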
Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]
Hi! On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote: > On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote: > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index ec068c58aa5..48eb91132a9 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx > > val, rtx idx) > > > >gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)); > > > > - gcc_assert (GET_MODE (idx) == E_SImode); > > - > >machine_mode inner_mode = GET_MODE (val); > > > > - rtx tmp = gen_reg_rtx (GET_MODE (idx)); > > + machine_mode idx_mode = GET_MODE (idx); > > + rtx tmp = gen_reg_rtx (DImode); > > + if (idx_mode != DImode) > > +tmp = convert_modes (DImode, idx_mode, idx, 0); > > + else > > +tmp = idx; > > + > >int width = GET_MODE_SIZE (inner_mode); > > > >gcc_assert (width >= 1 && width <= 8); > > @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, > > rtx idx) > >int shift = exact_log2 (width); > >/* Generate the IDX for permute shift, width is the vector element size. > > idx = idx * width. */ > > - emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift))); > > - > > - tmp = convert_modes (DImode, SImode, tmp, 1); > > + emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift))); > > This looks broken. Yup. > The gen_reg_rtx (DImode); call result is completely ignored, > so it wastes one pseudo, and I'm not convinced that idx nor tmp > is guaranteed to be usable as output of shift. > So, shouldn't it be instead: > rtx tmp = gen_reg_rtx (DImode); > if (idx_mode != DImode) > idx = convert_modes (DImode, idx_mode, idx, 0); > > ... > emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift)); > ? Yeah, something like that. Just idx = convert_modes (DImode, idx_mode, idx, 1); ... rtx tmp = gen_reg_rtx (DImode); emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift)); works just as well. 
> Also, dunno what is less and what is more expensive on > rs6000, whether convert_modes with unsigned = 0 or 1. Unsigned is never more expensive. Sometimes it is cheaper. It depends more on the situation what is best. Just look at the generated code? > I think rs6000 only supports very narrow vectors, so even for The only hardware vectors are 128 bits. There are modes for two vectors concatenated as well, but that is just for RTL, you should not have insns making such a pair. > say QImode idx anything with MSB set will be out of out of bounds and UB, > so you can sign or zero extend, whatever is more efficient. Yup. > > + machine_mode idx_mode = GET_MODE (idx); > > + rtx tmp = gen_reg_rtx (DImode); > > + if (idx_mode != DImode) > > +tmp = convert_modes (DImode, idx_mode, idx, 0); > > + else > > +tmp = idx; > > + > >gcc_assert (width >= 1 && width <= 4); > > > >if (!BYTES_BIG_ENDIAN) > > { > >/* idx = idx * width. */ > > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > > + emit_insn (gen_muldi3 (tmp, tmp, GEN_INT (width))); > >/* idx = idx + 8. */ > > - emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8))); > > + emit_insn (gen_adddi3 (tmp, tmp, GEN_INT (8))); > > } > >else > > { > > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > > - emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp)); > > + emit_insn (gen_muldi3 (tmp, idx, GEN_INT (width))); > > + emit_insn (gen_subdi3 (tmp, GEN_INT (24 - width), tmp)); > > Ditto. But the above was even inconsistent, sometimes it > used tmp (for LE) and otherwise idx in whatever mode it has. Thanks for the review Jakub, I kinda dropped this :-( Segher
[pushed] c++: Fix NaN as C++20 template argument
C++20 allows floating-point types for non-type template parameters;
floating-point values are considered to be equivalent template arguments
if they are "identical", which conveniently seems to map onto an existing
GCC predicate.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

	* tree.c (cp_tree_equal): Use real_identical.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-float1.C: New test.
---
 gcc/cp/tree.c                               |  2 +-
 gcc/testsuite/g++.dg/cpp2a/nontype-float1.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-float1.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 3c469750e9d..3acb6433b64 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3740,7 +3740,7 @@ cp_tree_equal (tree t1, tree t2)
       return tree_int_cst_equal (t1, t2);

     case REAL_CST:
-      return real_equal (&TREE_REAL_CST (t1), &TREE_REAL_CST (t2));
+      return real_identical (&TREE_REAL_CST (t1), &TREE_REAL_CST (t2));

     case STRING_CST:
       return TREE_STRING_LENGTH (t1) == TREE_STRING_LENGTH (t2)
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C b/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C
new file mode 100644
index 000..4fafac13793
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-float1.C
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++20 } }
+
+#include <cmath>
+
+template class MyClass { };
+
+static_assert(__is_same(MyClass, MyClass));
+
+constexpr auto mynan = NAN;
+static_assert(__is_same(MyClass, MyClass));
+
+static_assert(__is_same(MyClass, MyClass));

base-commit: 0251051db64f13c9a31a05c8133c31dc50b2b235
-- 
2.27.0
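Why `real_equal` is the wrong predicate for template-argument identity: under IEEE semantics `NAN == NAN` is false, yet two NaNs with the same representation must name the same specialization. A bitwise comparison models the "identical" notion (this is an analogy, not GCC's `real_identical` implementation, which works on its internal REAL_VALUE_TYPE):

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// "Identical" in the template-equivalence sense: same object representation,
// so NaN is identical to itself even though operator== says otherwise, and
// +0.0 is distinguished from -0.0.
bool identical (double a, double b)
{
  return std::memcmp (&a, &b, sizeof a) == 0;
}
```

This is exactly the distinction the patch relies on: `MyClass<mynan>` and `MyClass<mynan>` are the same type, while equality-based comparison would have treated any NaN argument as unequal to itself.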
[PATCH] Complete __gnu_debug::basic_string
Following: https://gcc.gnu.org/pipermail/libstdc++/2021-March/052158.html Here is the patch to complete __gnu_debug::basic_string support. Contrarily to what I thought code in std::basic_string to generate a basic_string_view works just fine for __gnu_debug::basic_string. libstdc++: [_GLIBCXX_DEBUG] Add __gnu_debug u8string/u16string/u32string Complete __gnu_debug::basic_string support so that it can be used as a transparent replacement of std::basic_string. libstdc++-v3/ChangeLog: * include/debug/string (basic_string(const _CharT*, const _Allocator&)): Remove assign call. (basic_string<>::insert(const_iterator, _InputIte, _InputIte)): Try to remove iterator debug layer even if !_GLIBCXX_USE_CXX11_ABI. [_GLIBCXX_USE_CHAR8_T] (__gnu_debug::u8string): New. (__gnu_debug::u16string, __gnu_debug::u32string): New. [!_GLIBCXX_COMPATIBILITY_CXX0X](std::hash<__gnu_debug::string>): New. [!_GLIBCXX_COMPATIBILITY_CXX0X][_GLIBCXX_USE_WCHAR_T](std::hash<__gnu_debug::wstring>): New. [_GLIBCXX_USE_CHAR8_T](std::hash<__gnu_debug::u8string>): New. (std::hash<__gnu_debug::u16string>): New. (std::hash<__gnu_debug::u32string>): New. * testsuite/21_strings/basic_string/hash/hash_char8_t.cc: Adapt for __gnu_debug basic_string. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string index d6eb5280ade..dec23f6277b 100644 --- a/libstdc++-v3/include/debug/string +++ b/libstdc++-v3/include/debug/string @@ -41,6 +41,14 @@ __gnu_debug::_Error_formatter::_S_at(_File, _Line, _Func) \ ._M_message(#_Cond)._M_error() +#if _GLIBCXX_USE_CXX11_ABI && __cplusplus >= 201103 +# define _GLIBCXX_CPP11_AND_CXX11_ABI 1 +# define _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(Statement) Statement +#else +# define _GLIBCXX_CPP11_AND_CXX11_ABI 0 +# define _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(Statement) +#endif + namespace __gnu_debug { /** Checks that __s is non-NULL or __n == 0, and then returns __s. 
*/ @@ -180,7 +188,7 @@ namespace __gnu_debug basic_string(const _CharT* __s, const _Allocator& __a = _Allocator()) : _Base(__glibcxx_check_string_constructor(__s), __a) - { this->assign(__s); } + { } basic_string(size_type __n, _CharT __c, const _Allocator& __a = _Allocator()) @@ -637,15 +645,22 @@ namespace __gnu_debug __glibcxx_check_insert_range(__p, __first, __last, __dist); typename _Base::iterator __res; -#if _GLIBCXX_USE_CXX11_ABI && __cplusplus >= 201103 +#if ! _GLIBCXX_CPP11_AND_CXX11_ABI + const size_type __offset = __p.base() - _Base::begin(); +#endif if (__dist.second >= __dp_sign) - __res = _Base::insert(__p.base(), __gnu_debug::__unsafe(__first), - __gnu_debug::__unsafe(__last)); + { + _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(__res =) + _Base::insert(__p.base(), __gnu_debug::__unsafe(__first), + __gnu_debug::__unsafe(__last)); + } else - __res = _Base::insert(__p.base(), __first, __last); -#else - const size_type __offset = __p.base() - _Base::begin(); - _Base::insert(__p.base(), __first, __last); + { + _GLIBCXX_CPP11_AND_CXX11_ABI_ONLY(__res =) + _Base::insert(__p.base(), __first, __last); + } + +#if ! _GLIBCXX_CPP11_AND_CXX11_ABI __res = _Base::begin() + __offset; #endif this->_M_invalidate_all(); @@ -1287,6 +1302,19 @@ namespace __gnu_debug typedef basic_string wstring; #endif +#ifdef _GLIBCXX_USE_CHAR8_T + /// A string of @c char8_t + typedef basic_string u8string; +#endif + +#if __cplusplus >= 201103L + /// A string of @c char16_t + typedef basic_string u16string; + + /// A string of @c char32_t + typedef basic_string u32string; +#endif + template struct _Insert_range_from_self_is_safe< __gnu_debug::basic_string<_CharT, _Traits, _Allocator> > @@ -1294,4 +1322,96 @@ namespace __gnu_debug } // namespace __gnu_debug +#if __cplusplus >= 201103L + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + // DR 1182. + +#ifndef _GLIBCXX_COMPATIBILITY_CXX0X + /// std::hash specialization for string. 
+ template<> +struct hash<__gnu_debug::string> +: public __hash_base<size_t, __gnu_debug::string> +{ + size_t + operator()(const __gnu_debug::string& __s) const noexcept + { return std::hash<std::string>()(__s); } +}; + + template<> +struct __is_fast_hash<hash<__gnu_debug::string>> : std::false_type +{ }; + +#ifdef _GLIBCXX_USE_WCHAR_T + /// std::hash specialization for wstring. + template<> +struct hash<__gnu_debug::wstring> +: public __hash_base<size_t, __gnu_debug::wstring> +{ + size_t + operator()(const __gnu_debug::wstring& __s) const noexcept + { return std::hash<std::wstring>()(__s); } +}; + + template<> +struct __is_fast_hash<hash<__gnu_debug::wstring>> : std::false_type +{ }; +#endif +#endif /* _GLIBCXX_COMPATIBILITY_CXX0X */ + +#ifdef _GLIBCXX_USE_CHAR8_T +
Aw: Re: [Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated
Hi Tobias, > Shouldn't there be also a testcase which triggers this run-time error? The testcase is there, it simply has the wrong dg-foo: ! { dg-do compile } which should be ! { dg-do run } *-*-* There are certainly more cases which should insert checks but currently don't, like: subroutine p (mm) implicit none type :: m_t end type m_t type, extends (m_t) :: m2_t end type m2_t class(m_t), pointer :: mm select type (mm) type is (m2_t) print *, "m2_t" end select end We've still two or three weeks to solve this before 11-release! :-) Harald
Re: [PATCH] c++: Fix up calls to immediate functions returning reference [PR99507]
On 3/11/21 3:52 AM, Jakub Jelinek wrote: Hi! build_cxx_call calls convert_from_reference at the end, so if an immediate function returns a reference, we were constant evaluating not just that call, but that call wrapped in an INDIRECT_REF. That unfortunately means it can constant evaluate to something non-addressable, so if code later needs to take its address it will fail. The following patch fixes that by undoing the convert_from_reference wrapping for the cxx_constant_value evaluation and re-adding it at the end. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2021-03-11 Jakub Jelinek PR c++/99507 * call.c (build_over_call): For immediate evaluation of functions that return references, undo convert_from_reference effects before calling cxx_constant_value and call convert_from_reference afterwards. * g++.dg/cpp2a/consteval19.C: New test. --- gcc/cp/call.c.jj2021-03-06 10:17:01.687304578 +0100 +++ gcc/cp/call.c 2021-03-10 13:51:29.121445195 +0100 @@ -9504,6 +9504,8 @@ build_over_call (struct z_candidate *can if (immediate_invocation_p (fndecl, nargs)) { tree obj_arg = NULL_TREE; + if (REFERENCE_REF_P (call)) + call = TREE_OPERAND (call, 0); if (DECL_CONSTRUCTOR_P (fndecl)) obj_arg = cand->first_arg ? cand->first_arg : (*args)[0]; if (obj_arg && is_dummy_object (obj_arg)) @@ -9527,6 +9529,7 @@ build_over_call (struct z_candidate *can call = cxx_constant_value (call, obj_arg); if (obj_arg && !error_operand_p (call)) call = build2 (INIT_EXPR, void_type_node, obj_arg, call); + call = convert_from_reference (call); } } return call; --- gcc/testsuite/g++.dg/cpp2a/consteval19.C.jj 2021-03-10 14:20:58.018835190 +0100 +++ gcc/testsuite/g++.dg/cpp2a/consteval19.C2021-03-10 14:20:16.642294167 +0100 @@ -0,0 +1,6 @@ +// PR c++/99507 +// { dg-do compile { target c++20 } } + +constexpr int i{0}; +consteval const int () { return i; } +const int *a{ ()}; Jakub
Re: [PATCH] tighten up checking for virtual bases (PR 99502)
On 3/11/21 1:06 PM, Martin Sebor wrote: More extensive testing of the patch I just committed in r11-7563 to avoid the false positive -Warray-bounds on accesses to members of virtual bases has exposed a couple of problems that cause many false negatives for even basic bugs like: typedef struct A { int i; } A; void* g (void) { A *p = (A*)malloc (3); p->i = 0; // missing warning in C++ return p; } as well as a number of others, including more complex ones involving virtual base classes that can, with some additional work, be reliably detected. Most of them were successfully diagnosed before the fix for PR 98266. Unfortunately, the tests that exercise this are all C so the C++ only regressions due to the divergence went unnoticed. The root cause is twofold: a) the simplistic check for TYPE_BINFO in the new inbounds_vbase_memaccess_p() function means we exclude from checking any member accesses whose leading offset is in bounds, and b) the check for the leading offset misses cases where just the ending offset of an access is out of bounds. The attached patch adds code to restrict the original solution to just the narrow subset of accesses to members of classes with virtual bases, and also adds a check for the ending offset. Is it enough to just fix the ending offset check, and maybe remove "vbase" from the function name? The shortcut should be valid for classes without vbases, as well. Is the patch okay for trunk? (The code for has_virtual_base() is adapted from polymorphic_type_binfo_p() in ipa-utils.h. It seems like a handy utility that could go in tree.h.) Martin PS There is another regression I discovered while working on this. It's a false positive but unrelated to the r11-7563 fix so I'll post a patch for separately.
Re: [PATCH] c++: Ensure correct destruction order of local statics [PR99613]
On 3/16/21 2:08 PM, Jakub Jelinek wrote: Hi! As mentioned in the PR, if the end of two constructions of local statics is strongly ordered, their destructors should be run in the reverse order. As we run __cxa_guard_release before calling __cxa_atexit, it is possible that we have two threads that access two local statics in the same order for the first time, one thread wins the __cxa_guard_acquire on the first one but is rescheduled in between the __cxa_guard_release and __cxa_atexit calls, then the other thread is scheduled and wins __cxa_guard_acquire on the second one and calls __cxa_guard_release and __cxa_atexit and only afterwards the first thread calls its __cxa_atexit. This means a variable whose completion of the constructor strongly happened after the completion of the other one will be destructed after the other variable is destructed. The following patch fixes that by swapping the __cxa_guard_release and __cxa_atexit calls. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2021-03-16 Jakub Jelinek PR c++/99613 * decl.c (expand_static_init): For thread guards, call __cxa_atexit before calling __cxa_guard_release rather than after it. Formatting fixes. --- gcc/cp/decl.c.jj2021-03-16 00:21:29.517232582 +0100 +++ gcc/cp/decl.c 2021-03-16 15:16:40.803278426 +0100 @@ -9246,17 +9246,25 @@ expand_static_init (tree decl, tree init /* Do the initialization itself. */ init = add_stmt_to_compound (begin, init); - init = add_stmt_to_compound - (init, build2 (MODIFY_EXPR, void_type_node, flag, boolean_true_node)); - init = add_stmt_to_compound - (init, build_call_n (release_fn, 1, guard_addr)); + init = add_stmt_to_compound (init, + build2 (MODIFY_EXPR, void_type_node, + flag, boolean_true_node)); + + /* Use atexit to register a function for destroying this static +variable. Do this before calling __cxa_guard_release.
*/ + init = add_stmt_to_compound (init, register_dtor_fn (decl)); + + init = add_stmt_to_compound (init, build_call_n (release_fn, 1, + guard_addr)); } else - init = add_stmt_to_compound (init, set_guard (guard)); + { + init = add_stmt_to_compound (init, set_guard (guard)); - /* Use atexit to register a function for destroying this static -variable. */ - init = add_stmt_to_compound (init, register_dtor_fn (decl)); + /* Use atexit to register a function for destroying this static +variable. */ + init = add_stmt_to_compound (init, register_dtor_fn (decl)); + } finish_expr_stmt (init); Jakub
Re: [PATCH] C++: target attribute - local decl
On 3/8/21 4:33 AM, Martin Liška wrote: On 3/4/21 9:54 PM, Jason Merrill wrote: On 3/4/21 10:52 AM, Martin Liška wrote: On 3/4/21 4:45 PM, Jason Merrill wrote: Sure, I guess you do need to set that flag for the local decls, but that should be all. Jason Doing that also fails :/ This time likely due to how we set RECORD argument of maybe_version_functions function: gcc/cp/decl.c: && maybe_version_functions (newdecl, olddecl, gcc/cp/decl.c- (!DECL_FUNCTION_VERSIONED (newdecl) gcc/cp/decl.c- || !DECL_FUNCTION_VERSIONED (olddecl That is odd. The other problem is that DECL_LOCAL_DECL_ALIAS isn't always set before we get to maybe_version_functions; it's set further down in do_pushdecl. We need it to be set or things break. Oh, I see. This seems to work for me, what do you think? Seems good to me, please apply the patch. Done. Jason
[PATCH] rs6000: Workaround for PR98092
The bcdinvalid_ RTL instruction uses the "unordered" comparison, which cannot be used if we have -ffinite-math-only. We really need CCMODEs that describe what bits in a CR field are set by other insns than just comparisons, but that is a lot more surgery, and it is stage 4 now. This patch does a simple workaround. 2021-03-16 Segher Boessenkool PR target/98092 * config/rs6000/predicates.md (branch_comparison_operator): Allow ordered and unordered for CCFPmode, if flag_finite_math_only. gcc/testsuite/ PR target/98092 * gcc.target/powerpc/pr98092.c: New. --- gcc/config/rs6000/predicates.md| 9 + gcc/testsuite/gcc.target/powerpc/pr98092.c | 7 +++ 2 files changed, 12 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98092.c diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 69f3c709abbc..859af75dfbd1 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1208,10 +1208,11 @@ (define_special_predicate "equality_operator" (define_predicate "branch_comparison_operator" (and (match_operand 0 "comparison_operator") (match_test "GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC") - (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode - && !flag_finite_math_only") - (match_code "lt,gt,eq,unordered,unge,unle,ne,ordered") - (match_code "lt,ltu,le,leu,gt,gtu,ge,geu,eq,ne")) + (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode") + (if_then_else (match_test "flag_finite_math_only") + (match_code "lt,le,gt,ge,eq,ne,unordered,ordered") + (match_code "lt,gt,eq,unordered,unge,unle,ne,ordered")) + (match_code "lt,ltu,le,leu,gt,gtu,ge,geu,eq,ne")) (match_test "validate_condition_mode (GET_CODE (op), GET_MODE (XEXP (op, 0))), 1"))) diff --git a/gcc/testsuite/gcc.target/powerpc/pr98092.c b/gcc/testsuite/gcc.target/powerpc/pr98092.c new file mode 100644 index ..03eab5a24ef7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr98092.c @@ -0,0 +1,7 @@ +/* { dg-options 
"-mdejagnu-cpu=power9 -ffinite-math-only" } */ + +int +h9 (__attribute__ ((altivec (vector__))) char un) +{ + return (__builtin_vec_bcdinvalid (un)); +} -- 1.8.3.1
Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]
On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote: > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index ec068c58aa5..48eb91132a9 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, > rtx idx) > >gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)); > > - gcc_assert (GET_MODE (idx) == E_SImode); > - >machine_mode inner_mode = GET_MODE (val); > > - rtx tmp = gen_reg_rtx (GET_MODE (idx)); > + machine_mode idx_mode = GET_MODE (idx); > + rtx tmp = gen_reg_rtx (DImode); > + if (idx_mode != DImode) > +tmp = convert_modes (DImode, idx_mode, idx, 0); > + else > +tmp = idx; > + >int width = GET_MODE_SIZE (inner_mode); > >gcc_assert (width >= 1 && width <= 8); > @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, > rtx idx) >int shift = exact_log2 (width); >/* Generate the IDX for permute shift, width is the vector element size. > idx = idx * width. */ > - emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift))); > - > - tmp = convert_modes (DImode, SImode, tmp, 1); > + emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift))); This looks broken. The gen_reg_rtx (DImode); call result is completely ignored, so it wastes one pseudo, and I'm not convinced that either idx or tmp is guaranteed to be usable as output of shift. So, shouldn't it be instead: rtx tmp = gen_reg_rtx (DImode); if (idx_mode != DImode) idx = convert_modes (DImode, idx_mode, idx, 0); ... emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift))); ? Also, dunno what is less and what is more expensive on rs6000, whether convert_modes with unsigned = 0 or 1. I think rs6000 only supports very narrow vectors, so even for say QImode idx anything with MSB set will be out of bounds and UB, so you can sign or zero extend, whatever is more efficient.
> + machine_mode idx_mode = GET_MODE (idx); > + rtx tmp = gen_reg_rtx (DImode); > + if (idx_mode != DImode) > +tmp = convert_modes (DImode, idx_mode, idx, 0); > + else > +tmp = idx; > + >gcc_assert (width >= 1 && width <= 4); > >if (!BYTES_BIG_ENDIAN) > { >/* idx = idx * width. */ > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > + emit_insn (gen_muldi3 (tmp, tmp, GEN_INT (width))); >/* idx = idx + 8. */ > - emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8))); > + emit_insn (gen_adddi3 (tmp, tmp, GEN_INT (8))); > } >else > { > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > - emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp)); > + emit_insn (gen_muldi3 (tmp, idx, GEN_INT (width))); > + emit_insn (gen_subdi3 (tmp, GEN_INT (24 - width), tmp)); Ditto. But the above was even inconsistent, sometimes it used tmp (for LE) and otherwise idx in whatever mode it has. Jakub
RE: [PATCH] Ada: hashed container Cursor type predefined equality non-conformance
Just a note that I do not have write access, so I will need someone who does to commit this patch, if approved. Richard > -Original Message- > From: Richard Wai > Sent: March 11, 2021 9:13 AM > To: 'Arnaud Charlet' > Cc: 'gcc-patches@gcc.gnu.org' ; 'Bob Duff' > > Subject: RE: [PATCH] Ada: hashed container Cursor type predefined equality > non-conformance > > Here is the amended commit log: > > -- > > Ada: Ensure correct predefined equality behavior for Cursor objects of > hashed containers. > > 2021-03-11 Richard Wai > > gcc/ada/ > * libgnat/a-cohase.ads (Cursor): Synchronize comments for the > Cursor > type definition to be consistent with identical definitions in other > container packages. Add additional comments regarding the > importance of > maintaining the "Position" component for predefined equality. > * libgnat/a-cohama.ads (Cursor): Likewise. > * libgnat/a-cihama.ads (Cursor): Likewise. > * libgnat/a-cohase.adb (Find, Insert): Ensure that Cursor objects > always have their "Position" component set to ensure predefined > equality works as required. > * libgnat/a-cohama.adb (Find, Insert): Likewise. > * libgnat/a-cihama.adb (Find, Insert): Likewise. > > gcc/testsuite/ > * gnat.dg/containers2.adb: New test. > > -- > > Thanks! > > Richard > > > -Original Message- > > From: Arnaud Charlet > > Sent: March 10, 2021 11:27 AM > > To: Richard Wai > > Cc: gcc-patches@gcc.gnu.org; 'Bob Duff' ; Arnaud > > Charlet > > Subject: Re: [PATCH] Ada: hashed container Cursor type predefined > > equality non-conformance > > > > > I'm not sure I correctly understand you here, but my interpretation > > > is that I should no longer submit ChangeLog entries, rather just the > > > patch, and then > > > > Right. > > > > > a commit message (a-la git), and then presumably the ChangeLog > > > entries will be generated automatically. From what I can see, gcc's > > > website does not talk about that, so I'm guessing this format based > > > on what I see from git-log, generally.
> > > > > > So assuming that, attached is the "correct" patch, and here is the > > > commit > > > message: > > > > > > --- > > > > > > Author: Richard Wai > > > > > > Ada: Ensure correct Cursor predefined equality behavior for hashed > > > containers. > > > > > > -- > > > > > > And for the record, the change log entries I've come up with as per > > > the previous email: > > > > And the commit log will look like: > > > > 2021-03-09 Richard Wai > > > > gcc/ada/ > > * libgnat/... > > > > gcc/testsuite/ > > * gnat.dg/containers2.adb: ... > > > > Your patch is OK with these changes, thanks.
[PATCH] c++: Ensure correct destruction order of local statics [PR99613]
Hi! As mentioned in the PR, if the end of two constructions of local statics is strongly ordered, their destructors should be run in the reverse order. As we run __cxa_guard_release before calling __cxa_atexit, it is possible that we have two threads that access two local statics in the same order for the first time, one thread wins the __cxa_guard_acquire on the first one but is rescheduled in between the __cxa_guard_release and __cxa_atexit calls, then the other thread is scheduled and wins __cxa_guard_acquire on the second one and calls __cxa_guard_release and __cxa_atexit and only afterwards the first thread calls its __cxa_atexit. This means a variable whose completion of the constructor strongly happened after the completion of the other one will be destructed after the other variable is destructed. The following patch fixes that by swapping the __cxa_guard_release and __cxa_atexit calls. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-03-16 Jakub Jelinek PR c++/99613 * decl.c (expand_static_init): For thread guards, call __cxa_atexit before calling __cxa_guard_release rather than after it. Formatting fixes. --- gcc/cp/decl.c.jj2021-03-16 00:21:29.517232582 +0100 +++ gcc/cp/decl.c 2021-03-16 15:16:40.803278426 +0100 @@ -9246,17 +9246,25 @@ expand_static_init (tree decl, tree init /* Do the initialization itself. */ init = add_stmt_to_compound (begin, init); - init = add_stmt_to_compound - (init, build2 (MODIFY_EXPR, void_type_node, flag, boolean_true_node)); - init = add_stmt_to_compound - (init, build_call_n (release_fn, 1, guard_addr)); + init = add_stmt_to_compound (init, + build2 (MODIFY_EXPR, void_type_node, + flag, boolean_true_node)); + + /* Use atexit to register a function for destroying this static +variable. Do this before calling __cxa_guard_release.
*/ + init = add_stmt_to_compound (init, register_dtor_fn (decl)); + + init = add_stmt_to_compound (init, build_call_n (release_fn, 1, + guard_addr)); } else - init = add_stmt_to_compound (init, set_guard (guard)); + { + init = add_stmt_to_compound (init, set_guard (guard)); - /* Use atexit to register a function for destroying this static -variable. */ - init = add_stmt_to_compound (init, register_dtor_fn (decl)); + /* Use atexit to register a function for destroying this static +variable. */ + init = add_stmt_to_compound (init, register_dtor_fn (decl)); + } finish_expr_stmt (init); Jakub
[PATCH] c-family: Fix up -Wduplicated-branches for union members [PR99565]
Hi! Honza has fairly recently changed operand_equal_p to compare DECL_FIELD_OFFSET for COMPONENT_REFs when comparing addresses. As the first testcase in this patch shows, while that is very nice for optimizations, for the -Wduplicated-branches warning it causes regressions. Pedantically a union in both C and C++ has only one active member at a time, so using some other union member even if it has the same type is UB, so I think the warning shouldn't warn when it sees access to different fields that happen to have the same offset and should consider them different. In my first attempt to fix this I've keyed the old behavior on OEP_LEXICOGRAPHIC, but unfortunately that has various problems, the warning has a quick non-lexicographic compare in build_conditional_expr* and another lexicographic more expensive one later during genericization and turning the first one into lexicographic would mean wasting compile time on large conditionals. So, this patch instead introduces a new OEP_ flag and makes sure to pass it to operand_equal_p in all -Wduplicated-branches cases. The cvt.c changes are because on the other testcase we were warning with UNKNOWN_LOCATION, so the user wouldn't really know where the questionable code is. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-03-16 Jakub Jelinek PR c++/99565 * tree-core.h (enum operand_equal_flag): Add OEP_DUPLICATED_BRANCHES. * fold-const.c (operand_compare::operand_equal_p): Don't compare field offsets if OEP_DUPLICATED_BRANCHES. * c-warn.c (do_warn_duplicated_branches): Pass also OEP_DUPLICATED_BRANCHES to operand_equal_p. * c-typeck.c (build_conditional_expr): Pass OEP_DUPLICATED_BRANCHES to operand_equal_p. * call.c (build_conditional_expr_1): Pass OEP_DUPLICATED_BRANCHES to operand_equal_p. * cvt.c (convert_to_void): Preserve location_t on COND_EXPR or or COMPOUND_EXPR. * g++.dg/warn/Wduplicated-branches6.C: New test. * g++.dg/warn/Wduplicated-branches7.C: New test. 
--- gcc/tree-core.h.jj 2021-01-16 22:52:33.654413400 +0100 +++ gcc/tree-core.h 2021-03-16 16:24:06.136829900 +0100 @@ -896,7 +896,10 @@ enum operand_equal_flag { OEP_HASH_CHECK = 32, /* Makes operand_equal_p handle more expressions: */ OEP_LEXICOGRAPHIC = 64, - OEP_BITWISE = 128 + OEP_BITWISE = 128, + /* Considers different fields different even when they have + the same offset. */ + OEP_DUPLICATED_BRANCHES = 256 }; /* Enum and arrays used for tree allocation stats. --- gcc/fold-const.c.jj 2021-02-24 12:10:17.571212194 +0100 +++ gcc/fold-const.c2021-03-16 16:19:16.327018956 +0100 @@ -3317,7 +3317,7 @@ operand_compare::operand_equal_p (const_ flags &= ~OEP_ADDRESS_OF; if (!OP_SAME (1)) { - if (compare_address) + if (compare_address && (flags & OEP_DUPLICATED_BRANCHES) == 0) { if (TREE_OPERAND (arg0, 2) || TREE_OPERAND (arg1, 2)) --- gcc/c-family/c-warn.c.jj2021-02-12 23:57:30.459142340 +0100 +++ gcc/c-family/c-warn.c 2021-03-16 16:24:19.224685926 +0100 @@ -2779,7 +2779,8 @@ do_warn_duplicated_branches (tree expr) /* Compare the hashes. */ if (h0 == h1 - && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC) + && operand_equal_p (thenb, elseb, OEP_LEXICOGRAPHIC + | OEP_DUPLICATED_BRANCHES) /* Don't warn if any of the branches or their subexpressions comes from a macro. */ && !walk_tree_without_duplicates (, expr_from_macro_expansion_r, --- gcc/c/c-typeck.c.jj 2021-02-18 22:17:44.184657291 +0100 +++ gcc/c/c-typeck.c2021-03-16 16:23:58.335915719 +0100 @@ -5490,7 +5490,7 @@ build_conditional_expr (location_t colon warn here, because the COND_EXPR will be turned into OP1. 
*/ if (warn_duplicated_branches && TREE_CODE (ret) == COND_EXPR - && (op1 == op2 || operand_equal_p (op1, op2, 0))) + && (op1 == op2 || operand_equal_p (op1, op2, OEP_DUPLICATED_BRANCHES))) warning_at (EXPR_LOCATION (ret), OPT_Wduplicated_branches, "this condition has identical branches"); --- gcc/cp/call.c.jj2021-03-12 10:11:17.642122302 +0100 +++ gcc/cp/call.c 2021-03-16 16:23:49.647011304 +0100 @@ -5798,7 +5798,8 @@ build_conditional_expr_1 (const op_locat warn here, because the COND_EXPR will be turned into ARG2. */ if (warn_duplicated_branches && (complain & tf_warning) - && (arg2 == arg3 || operand_equal_p (arg2, arg3, 0))) + && (arg2 == arg3 || operand_equal_p (arg2, arg3, + OEP_DUPLICATED_BRANCHES))) warning_at (EXPR_LOCATION (result), OPT_Wduplicated_branches, "this condition has identical branches"); --- gcc/cp/cvt.c.jj 2021-03-04 16:03:40.089323550 +0100 +++ gcc/cp/cvt.c
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, 16 Mar 2021 at 18:38, Jakub Jelinek wrote: > > On Tue, Mar 16, 2021 at 06:34:34PM +0100, Christophe Lyon wrote: > > On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches > > Unfortunately, the last new warning does not happen when running the > > tests with -mabi=ilp32. > > Is that small patch OK? > > With a proper ChangeLog entry, sure. > Though perhaps it would be easier to do && lp64. Your choice. Indeed, I'll do that. Thanks > > > diff --git a/gcc/testsuite/gcc.dg/declare-simd.c > > b/gcc/testsuite/gcc.dg/declare-simd.c > > index 52796f6..894fb64 100644 > > --- a/gcc/testsuite/gcc.dg/declare-simd.c > > +++ b/gcc/testsuite/gcc.dg/declare-simd.c > > @@ -3,7 +3,7 @@ > > > > #pragma omp declare simd linear (p2, p3) > > extern void fn2 (float p1, float *p2, float *p3); > > -/* { dg-warning "GCC does not currently support mixed size types for > > 'simd' functions" "" { target aarch64*-*-* } .-1 } */ > > +/* { dg-warning "GCC does not currently support mixed size types for > > 'simd' functions" "" { target { { aarch64*-*-* } && { ! ilp32 } } } > > .-1 } */ > > > > float *a, *b; > > void fn1 (float *p1) > > Jakub >
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 06:34:34PM +0100, Christophe Lyon wrote: > On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches > Unfortunately, the last new warning does not happen when running the > tests with -mabi=ilp32. > Is that small patch OK? With a proper ChangeLog entry, sure. Though perhaps it would be easier to do && lp64. Your choice. > diff --git a/gcc/testsuite/gcc.dg/declare-simd.c > b/gcc/testsuite/gcc.dg/declare-simd.c > index 52796f6..894fb64 100644 > --- a/gcc/testsuite/gcc.dg/declare-simd.c > +++ b/gcc/testsuite/gcc.dg/declare-simd.c > @@ -3,7 +3,7 @@ > > #pragma omp declare simd linear (p2, p3) > extern void fn2 (float p1, float *p2, float *p3); > -/* { dg-warning "GCC does not currently support mixed size types for > 'simd' functions" "" { target aarch64*-*-* } .-1 } */ > +/* { dg-warning "GCC does not currently support mixed size types for > 'simd' functions" "" { target { { aarch64*-*-* } && { ! ilp32 } } } > .-1 } */ > > float *a, *b; > void fn1 (float *p1) Jakub
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Sat, 13 Mar 2021 at 21:34, Jakub Jelinek via Gcc-patches wrote: > > Hi! > > As the patch shows, there are several bugs in > aarch64_simd_clone_compute_vecsize_and_simdlen. > One is that unlike for function declarations that aren't definitions > it completely ignores argument types. Such decls don't have DECL_ARGUMENTS, > but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like > the simd cloning code in the middle end does too. > > Another problem is that it checks types of uniform arguments. That is > unnecessary, uniform arguments are passed the way it normally is, it is > a scalar argument rather than vector, so there is no reason not to support > uniform argument of different size, or long double, structure etc. > > Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk? > > 2021-03-13 Jakub Jelinek > > PR target/99542 > * config/aarch64/aarch64.c > (aarch64_simd_clone_compute_vecsize_and_simdlen): If not a function > definition, walk TYPE_ARG_TYPES list if non-NULL for argument types > instead of DECL_ARGUMENTS. Ignore types for uniform arguments. > > * gcc.dg/gomp/pr99542.c: New test. > * gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on aarch64. > * gcc.dg/gomp/simd-clones-2.c (setArray): Likewise. > * g++.dg/vect/simd-clone-7.cc (bar): Likewise. > * g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning > on aarch64. > * gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64. > > --- gcc/config/aarch64/aarch64.c.jj 2021-03-12 19:01:48.672296344 +0100 > +++ gcc/config/aarch64/aarch64.c2021-03-13 09:16:42.585045538 +0100 > @@ -23412,11 +23412,17 @@ aarch64_simd_clone_compute_vecsize_and_s >return 0; > } > > - for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t)) > + int i; > + tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); > + bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); > + > + for (t = (decl_arg_p ? 
DECL_ARGUMENTS (node->decl) : type_arg_types), i = > 0; > + t && t != void_list_node; t = TREE_CHAIN (t), i++) > { > - arg_type = TREE_TYPE (t); > + tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); > > - if (!currently_supported_simd_type (arg_type, base_type)) > + if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM > + && !currently_supported_simd_type (arg_type, base_type)) > { > if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type)) > warning_at (DECL_SOURCE_LOCATION (node->decl), 0, > --- gcc/testsuite/gcc.dg/gomp/pr99542.c.jj 2021-03-13 09:16:42.586045527 > +0100 > +++ gcc/testsuite/gcc.dg/gomp/pr99542.c 2021-03-13 09:16:42.586045527 +0100 > @@ -0,0 +1,17 @@ > +/* PR middle-end/89246 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-O0 -fopenmp-simd" } */ > + > +#pragma omp declare simd > +extern int foo (__int128 x); /* { dg-warning "GCC does not currently > support mixed size types for 'simd' function" "" { target aarch64*-*-* } } */ > +/* { dg-warning "unsupported argument type '__int128' for simd" "" { target > i?86-*-* x86_64-*-* } .-1 } */ > + > +#pragma omp declare simd uniform (x) > +extern int baz (__int128 x); > + > +#pragma omp declare simd > +int > +bar (int x) > +{ > + return x + foo (0) + baz (0); > +} > --- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.jj2020-01-12 11:54:37.430398065 > +0100 > +++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c 2021-03-13 09:35:07.100807877 > +0100 > @@ -7,4 +7,3 @@ void > bar (int *a) > { > } > -/* { dg-warning "GCC does not currently support mixed size types for 'simd' > functions" "" { target aarch64*-*-* } .-3 } */ > --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c.jj2020-01-12 > 11:54:37.431398050 +0100 > +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c 2021-03-13 09:36:21.143988256 > +0100 > @@ -15,7 +15,6 @@ float setArray(float *a, float x, int k) >return a[k]; > } > > -/* { dg-warning "GCC does not currently support mixed size types for 'simd' > functions" "" { target aarch64*-*-* } .-6 } */ 
> /* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" { target > i?86-*-* x86_64-*-* } } } */ > /* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target > i?86-*-* x86_64-*-* } } } */ > /* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target > i?86-*-* x86_64-*-* } } } */ > --- gcc/testsuite/g++.dg/vect/simd-clone-7.cc.jj2020-01-12 > 11:54:37.280400328 +0100 > +++ gcc/testsuite/g++.dg/vect/simd-clone-7.cc 2021-03-13 09:23:58.005214442 > +0100 > @@ -8,5 +8,3 @@ bar (float x, float *y, int) > { >return y[0] + y[1] * x; > } > -// { dg-warning "GCC does not currently support mixed size types for 'simd' > functions" "" { target { { aarch64*-*-* } && lp64 } } .-4 } > - > --- gcc/testsuite/g++.dg/gomp/declare-simd-1.C.jj
Re: [RFC] ldist: Recognize rawmemchr loop patterns
[snip] Please find attached a new version of the patch. A major change compared to the previous patch is that I created a separate pass which hopefully also makes reviewing easier since it is almost self-contained. After realizing that detecting loops which mimic the behavior of rawmemchr/strlen functions does not really fit into the topic of loop distribution, I created a separate pass. Due to this I was also able to play around a bit and schedule the pass at different times. Currently it is scheduled right before loop distribution where loop header copying already took place which leads to the following effect. Running this setup over char *t (char *p) { for (; *p; ++p); return p; } the new pass transforms char * t (char * p) { char _1; char _7; [local count: 118111600]: _7 = *p_3(D); if (_7 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: [local count: 955630225]: # p_8 = PHI p_6 = p_8 + 1; _1 = *p_6; if (_1 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: # p_2 = PHI goto ; [100.00%] [local count: 850510901]: goto ; [100.00%] [local count: 12992276]: [local count: 118111600]: # p_9 = PHI return p_9; } into char * t (char * p) { char * _5; char _7; [local count: 118111600]: _7 = *p_3(D); if (_7 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _5 = p_3(D) + 1; p_10 = .RAWMEMCHR (_5, 0); [local count: 118111600]: # p_9 = PHI return p_9; } which is fine so far. However, I haven't made up my mind so far whether it is worthwhile to spend more time in order to also eliminate the "first unrolling" of the loop. I gave it a shot by scheduling the pass prior to the loop header copying pass and ended up with: char * t (char * p) { [local count: 118111600]: p_5 = .RAWMEMCHR (p_3(D), 0); return p_5; } which seems optimal to me. The downside of this is that I have to initialize scalar evolution analysis which might be undesired that early. All this brings me to the question where do you see this piece of code running?
If in a separate pass when would you schedule it? If in an existing pass, which one would you choose? Another topic which came up is whether there exists a more elegant solution to my current implementation in order to deal with stores (I'm speaking of the `if (store_dr)` statement inside of function transform_loop_1). For example, extern char *p; char *t () { for (; *p; ++p); return p; } ends up as char * t () { char * _1; char * _2; char _3; char * p.1_8; char _9; char * p.1_10; char * p.1_11; [local count: 118111600]: p.1_8 = p; _9 = *p.1_8; if (_9 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: [local count: 955630225]: # p.1_10 = PHI <_1(6), p.1_8(5)> _1 = p.1_10 + 1; p = _1; _3 = *_1; if (_3 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: # _2 = PHI <_1(3)> goto ; [100.00%] [local count: 850510901]: goto ; [100.00%] [local count: 12992276]: [local count: 118111600]: # p.1_11 = PHI <_2(8), p.1_8(7)> return p.1_11; } where inside the loop a load and store occurs. For a rawmemchr like loop I have to show that we never load from a memory location to which we write. Currently I solve this by hard coding those facts which are not generic at all. I gave compute_data_dependences_for_loop a try which failed to determine the fact that stores only happen to p[0] and loads from p[i] where i>0. Maybe there are more generic solutions to express this in contrast to my current one? Thanks again for your input so far. Really appreciated. 
Cheers, Stefan diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 8a5fb3fd99c..7b2d7405277 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1608,6 +1608,7 @@ OBJS = \ tree-into-ssa.o \ tree-iterator.o \ tree-loop-distribution.o \ + tree-loop-pattern.o \ tree-nested.o \ tree-nrv.o \ tree-object-size.o \ diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index dd7173126fb..957e96a46a4 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2917,6 +2917,33 @@ expand_VEC_CONVERT (internal_fn, gcall *) gcc_unreachable (); } +void +expand_RAWMEMCHR (internal_fn, gcall *stmt) +{ + expand_operand ops[3]; + + tree lhs = gimple_call_lhs (stmt); + if (!lhs) +return; + tree lhs_type = TREE_TYPE (lhs); + rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type)); + + for (unsigned int i = 0; i < 2; ++i) +{ + tree rhs = gimple_call_arg (stmt, i); + tree rhs_type = TREE_TYPE (rhs); + rtx rhs_rtx = expand_normal (rhs); + create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type)); +} + + insn_code icode = direct_optab_handler (rawmemchr_optab, ops[2].mode); + + expand_insn (icode,
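As background for the thread above, the semantics assumed here for the new .RAWMEMCHR internal function can be sketched in plain C. This is only an illustrative model (my_rawmemchr is a made-up name, not part of the patch): unlike memchr there is no length bound, so a match must be known to exist — for a search value of 0 that is the terminating NUL of a C string.

```c
#include <assert.h>

/* Illustrative model of the .RAWMEMCHR semantics assumed in the thread:
   scan forward from S until a byte equal to C is found and return a
   pointer to it.  There is no length bound, so a match must exist.  */
static char *
my_rawmemchr (const char *s, int c)
{
  while (*s != (char) c)
    ++s;
  return (char *) s;
}

/* The loop shape the proposed pass recognizes; it computes the same
   pointer as my_rawmemchr (p, 0).  */
static char *
t (char *p)
{
  for (; *p; ++p)
    ;
  return p;
}
```

The pass's job is then to prove that a loop has exactly this shape (a single exit guarded by a byte comparison, pointer advancing by the element size) before replacing it with one .RAWMEMCHR call.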
Re: [Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated
Hi Paul, On 16.03.21 17:42, Paul Richard Thomas via Gcc-patches wrote: Fortran: Fix runtime errors for class actual arguments [PR99602]. * trans-array.c (gfc_conv_procedure_call): For class formal arguments, use the _data field attributes for runtime errors. * gfortran.dg/pr99602.f90: New test. Shouldn't there be also a testcase which triggers this run-time error? I might have messed up my testcase, but I think it should trigger? (Attached is an attempt to pass the nullified pointer as actual argument to a non-pointer argument; otherwise it is the same testcase as before.) Otherwise, at a glance, it looked sensible. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf ! { dg-do compile } ! { dg-options "-fcheck=pointer -fdump-tree-original" } ! ! Test fix of PR99602, where a spurious runtime error was introduced ! by PR99112. This is the testcase in comment #6 of the PR. ! ! Contributed by Jeurgen Reuter ! module m implicit none private public :: m_t type :: m_t integer :: ii(100) end type m_t end module m module m2_testbed use m implicit none private public :: prepare_m2 procedure (prepare_m2_proc), pointer :: prepare_m2 => null () abstract interface subroutine prepare_m2_proc (m2) import class(m_t), intent(inout) :: m2 end subroutine prepare_m2_proc end interface end module m2_testbed module a use m use m2_testbed, only: prepare_m2 implicit none private public :: a_1 contains subroutine a_1 () class(m_t), pointer :: mm mm => null () call prepare_m2 (mm) ! 
Runtime error triggered here end subroutine a_1 end module a module m2 use m implicit none private public :: m2_t type, extends (m_t) :: m2_t private contains procedure :: read => m2_read end type m2_t contains subroutine m2_read (mm) class(m2_t), intent(out), target :: mm end subroutine m2_read end module m2 program main use m2_testbed use a, only: a_1 implicit none prepare_m2 => prepare_whizard_m2 call a_1 () contains subroutine prepare_whizard_m2 (mm) use m use m2 class(m_t), intent(inout) :: mm !if (.not. associated (mm)) allocate (m2_t :: mm) mm%ii = 100 select type (mm) type is (m2_t) call mm%read () end select end subroutine prepare_whizard_m2 end program main ! { dg-final { scan-tree-dump-times "_gfortran_runtime_error_at" 0 "original" } } ! { dg-final { scan-tree-dump-times "Pointer actual argument" 0 "original" } }
Re: [PATCH v8] Practical improvement to libgcc complex divide
Ping... On 2/22/2021 5:08 PM, Patrick McGehearty via Gcc-patches wrote: Changes in this version from Version 7: gcc/c-family/c-cppbuiltin.c Changed modename to float_h_prefix Removed (mode == TYPE_MODE...) code left over from earlier approach. libgcc/libgcc2.c Removed conditional use of XF constants in TF case. Also left over from earlier approach. Tested _Float64x case on x86. Works as expected. Correctness and performance test programs used during development of this project may be found in the attachment to: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html Summary of Purpose This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain important improvements to the quality of answers for the default complex divide routine (half, float, double, extended, long double precisions) when dealing with very large or very small exponents. The current code correctly implements Smith's method (1962) [2] further modified by c99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially different from the answers provided by quad precision more than 1% of the time. This error rate may be unacceptable for many applications that cannot a priori restrict their computations to the safe range. The proposed method reduces the frequency of "substantially different" answers by more than 99% for double precision at a modest cost of performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed. Background This project started with an investigation related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe[1] provided an overview of past and recent practice for computing complex divide. The current glibc implementation is based on Robert Smith's algorithm [2] from 1962. 
A Google search found the paper by Baudin and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's proposed patch [4] is based on that paper. I developed two sets of test data by randomly distributing values over a restricted range and the full range of input values. The current complex divide handled the restricted range well enough, but failed on the full range more than 1% of the time. Baudin and Smith's primary test, checking whether "ratio" equals zero, reduced the cases with 16 or more error bits by a factor of 5, but still left too many flawed answers. Adding debug printouts to cases with substantial errors allowed me to see the intermediate calculations for test values that failed. I noted that for many of the failures, "ratio" was a subnormal. Changing the "ratio" test from a check for zero to a check for subnormal reduced the 16-bit error rate by another factor of 12. This single modified test provides the greatest benefit for the least cost, but the percentage of cases with greater than 16 bit errors (double precision data) is still greater than 0.027% (2.7 in 10,000). Continued examination of the remaining errors and their intermediate computations led to the various input value tests and scaling used to avoid under/overflow. The current patch does not handle some of the rare and most extreme combinations of input values, but the random test data shows only 1 case in 10 million with an error of greater than 12 bits. That case has 18 bits of error and is due to subtraction cancellation. These results are significantly better than the results reported by Baudin and Smith. Support for half, float, double, extended, and long double precision is included, as all are handled with suitable preprocessor symbols in a single source routine. Since half precision is computed with float precision as per current libgcc practice, the enhanced algorithm provides no benefit for half precision and would cost performance. 
Further investigation showed that changing the half precision algorithm to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no loss of precision and a modest improvement in performance. The existing constants for each precision: float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX, LDBL_MIN are used for avoiding the more common overflow/underflow cases. This use is made generic by defining appropriate __LIBGCC2_* macros in c-cppbuiltin.c. Tests are added for when both parts of the denominator have exponents small enough to allow shifting any subnormal values to normal values; then all input values can be scaled up without risking overflow. That gained a clear improvement in accuracy. Similarly, when either numerator was subnormal and the other numerator and both denominator values were not too large, scaling could be used to reduce the risk of computing with subnormals. The test and scaling values used all fit within the allowed exponent range for each precision required by the C standard.
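For readers unfamiliar with Smith's method, here is a minimal sketch of the 1962 algorithm that the current libgcc routine is based on. This shows only the core idea — the subnormal "ratio" test and the rescaling of extreme inputs that the patch adds are not included, and smith_div is an illustrative name:

```c
#include <assert.h>
#include <math.h>

/* Sketch of Smith's (1962) method for (a + b*i) / (c + d*i).  Branching
   on |c| vs |d| keeps the intermediate ratio R at most 1 in magnitude,
   which avoids the overflow/underflow of the textbook formula
   (a*c + b*d, b*c - a*d) / (c*c + d*d) for moderately large inputs.  */
static void
smith_div (double a, double b, double c, double d,
           double *re, double *im)
{
  if (fabs (c) >= fabs (d))
    {
      double r = d / c;                 /* |r| <= 1 */
      double t = 1.0 / (c + d * r);
      *re = (a + b * r) * t;
      *im = (b - a * r) * t;
    }
  else
    {
      double r = c / d;                 /* |r| < 1 */
      double t = 1.0 / (c * r + d);
      *re = (a * r + b) * t;
      *im = (b * r - a) * t;
    }
}
```

The failures discussed above arise exactly when r or t lands in the subnormal range, which is why replacing the zero test on the ratio with a subnormal test gains so much.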
[Patch, fortran] PR99602 - [11 regression] runtime error: pointer actual argument not associated
Hi Everybody, Although this is 'obvious' I thought that I should post it because I believe that it was triggered by the fix for PR99602 but I just do not have the bandwidth at the moment to test that. The ChangeLog together with the patch is more than sufficient explanation. Regtests OK on FC33/x86_64. OK for 11-branch? Paul Fortran: Fix runtime errors for class actual arguments [PR99602]. 2021-03-16 Paul Thomas gcc/fortran PR fortran/99602 * trans-array.c (gfc_conv_procedure_call): For class formal arguments, use the _data field attributes for runtime errors. gcc/testsuite/ PR fortran/99602 * gfortran.dg/pr99602.f90: New test. ! { dg-do compile } ! { dg-options "-fcheck=pointer -fdump-tree-original" } ! ! Test fix of PR99602, where a spurious runtime error was introduced ! by PR99112. This is the testcase in comment #6 of the PR. ! ! Contributed by Jeurgen Reuter ! module m implicit none private public :: m_t type :: m_t private end type m_t end module m module m2_testbed use m implicit none private public :: prepare_m2 procedure (prepare_m2_proc), pointer :: prepare_m2 => null () abstract interface subroutine prepare_m2_proc (m2) import class(m_t), intent(inout), pointer :: m2 end subroutine prepare_m2_proc end interface end module m2_testbed module a use m use m2_testbed, only: prepare_m2 implicit none private public :: a_1 contains subroutine a_1 () class(m_t), pointer :: mm mm => null () call prepare_m2 (mm) ! Runtime error triggered here end subroutine a_1 end module a module m2 use m implicit none private public :: m2_t type, extends (m_t) :: m2_t private contains procedure :: read => m2_read end type m2_t contains subroutine m2_read (mm) class(m2_t), intent(out), target :: mm end subroutine m2_read end module m2 program main use m2_testbed use a, only: a_1 implicit none prepare_m2 => prepare_whizard_m2 call a_1 () contains subroutine prepare_whizard_m2 (mm) use m use m2 class(m_t), intent(inout), pointer :: mm if (.not. 
associated (mm)) allocate (m2_t :: mm) select type (mm) type is (m2_t) call mm%read () end select end subroutine prepare_whizard_m2 end program main ! { dg-final { scan-tree-dump-times "_gfortran_runtime_error_at" 0 "original" } } ! { dg-final { scan-tree-dump-times "Pointer actual argument" 0 "original" } } diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index bffe0808dff..0cf17008b05 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6663,6 +6663,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, char *msg; tree cond; tree tmp; + symbol_attribute fsym_attr; + + if (fsym) + { + if (fsym->ts.type == BT_CLASS && !UNLIMITED_POLY (fsym)) + fsym_attr = CLASS_DATA (fsym)->attr; + else + fsym_attr = fsym->attr; + } if (e->expr_type == EXPR_VARIABLE || e->expr_type == EXPR_FUNCTION) attr = gfc_expr_attr (e); @@ -6685,17 +6694,17 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, tree present, null_ptr, type; if (attr.allocatable - && (fsym == NULL || !fsym->attr.allocatable)) + && (fsym == NULL || !fsym_attr.allocatable)) msg = xasprintf ("Allocatable actual argument '%s' is not " "allocated or not present", e->symtree->n.sym->name); else if (attr.pointer - && (fsym == NULL || !fsym->attr.pointer)) + && (fsym == NULL || !fsym_attr.pointer)) msg = xasprintf ("Pointer actual argument '%s' is not " "associated or not present", e->symtree->n.sym->name); else if (attr.proc_pointer - && (fsym == NULL || !fsym->attr.proc_pointer)) + && (fsym == NULL || !fsym_attr.proc_pointer)) msg = xasprintf ("Proc-pointer actual argument '%s' is not " "associated or not present", e->symtree->n.sym->name); @@ -6719,15 +6728,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, else { if (attr.allocatable - && (fsym == NULL || !fsym->attr.allocatable)) + && (fsym == NULL || !fsym_attr.allocatable)) msg = xasprintf ("Allocatable actual argument '%s' is not " "allocated", e->symtree->n.sym->name); else if (attr.pointer - && (fsym 
== NULL || !fsym->attr.pointer)) + && (fsym == NULL || !fsym_attr.pointer)) msg = xasprintf ("Pointer actual argument '%s' is not " "associated", e->symtree->n.sym->name); else if (attr.proc_pointer - && (fsym == NULL || !fsym->attr.proc_pointer)) + && (fsym == NULL || !fsym_attr.proc_pointer)) msg = xasprintf ("Proc-pointer actual argument '%s' is not " "associated", e->symtree->n.sym->name); else
Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]
Hi! On Mon, Mar 15, 2021 at 11:11:32AM +0800, HAO CHEN GUI via Gcc-patches wrote: > This patch adds const_anchor for rs6000. The const_anchor is used > in cse pass. 1) This isn't suitable for stage 4. 2) Please add a test case, which shows what it does, that it is useful. 3) Does this work on other OSes than Linux? What about Darwin and AIX? > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index ec068c58aa5..2b2350c53ae 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p) > warning (0, "%qs is deprecated and not recommended in any circumstances", >"-mno-speculate-indirect-jumps"); > > + if (TARGET_64BIT) > +{ > + targetm.min_anchor_offset = -32768; > + targetm.max_anchor_offset = 32767; > + targetm.const_anchor = 0x8000; > +} Why only on 64 bit? Why these choices? Segher
[PATCH v4] x86: Update 'P' operand modifier for -fno-plt
On Sun, Mar 14, 2021 at 1:31 PM H.J. Lu wrote: > > On Sun, Mar 14, 2021 at 12:43 PM Uros Bizjak wrote: > > > > On Sun, Mar 14, 2021 at 8:14 PM H.J. Lu wrote: > > > > > Done. Here is the updated patch. Tested on Linux/x86-64. OK for > > > > > master? > > > > > > > > I don't understand the purpose of the current_output_insn check and I > > > > don't know if the usage of current_output_insn is correct. The > > > > comments are not helpful either, and no other target uses this > > > > variable in the way you propose. Can you please elaborate the reason > > > > and the purpose of the check a bit more? > > > > > > > > Uros. > > > > > > Originally, ix86_force_load_from_GOT_p is only for non-PIC. My patch > > > extended > > > it to inline assembly statements where current_output_insn == NULL and > > > PIC is > > > allowed in 64-bit. > > > > I can see this from the patch, but this explanation didn't answer my > > question. > > > > The purpose of current_output_insn == NULL is to allow PIC for inline > asm statements in 64-bit mode. Is there a better way to check if > ix86_print_operand () is called on inline asm statements? > Here is the v4 patch to check this_is_asm_operands for inline asm statements. OK for master? Thanks. -- H.J.
Re: testsuite: automatically checking for param documentation
On 3/16/21 4:09 PM, David Malcolm wrote: [updating subject for greater visibility] On Tue, 2021-03-16 at 08:51 -0600, Martin Sebor wrote: On 3/16/21 3:08 AM, Martin Liška wrote: On 3/15/21 9:57 PM, Martin Sebor wrote: Any plans to integrate it into the testsuite? (That way we presumably wouldn't need to remember to run it by hand.) Likely not, I'm not so big friend with DejaGNU. Are you willing to help me with that? I'm not a fan either but that's what we've got. And sure, I'll help if/when I can. I think it should be "straightforward" to rewrite the script in Expect (as much as anything is straightforward in Expect). Or, it could stay as is and be invoked from some .exp file in testsuite. Although is Python a required prerequisite for running the testsuite? If not, it might be better to either rewrite the script in something that is (e.g., AWK) or in Expect. I find it painful writing non-trivial logic in Tcl. Me too! We already have run-gcov-pytest in gcov.exp which calls out to a python3 script, checking if python3 is supported, and bailing with UNSUPPORTED otherwise. FWIW I think it's reasonable to have something similar for this case. Yep, let's run the existing Python script. Martin Dave
testsuite: automatically checking for param documentation
[updating subject for greater visibility] On Tue, 2021-03-16 at 08:51 -0600, Martin Sebor wrote: > On 3/16/21 3:08 AM, Martin Liška wrote: > > On 3/15/21 9:57 PM, Martin Sebor wrote: > > > Any plans to integrate it into the testsuite? (That way we presumably > > > wouldn't need to remember to run it by hand.) > > > > Likely not, I'm not so big friend with DejaGNU. > > Are you willing to help me with that? > > I'm not a fan either but that's what we've got. And sure, I'll help > if/when I can. I think it should be "straightforward" to rewrite > the script in Expect (as much as anything is straightforward in > Expect). Or, it could stay as is and be invoked from some .exp > file in testsuite. Although is Python a required prerequisite > for running the testsuite? If not, it might be better to either > rewrite the script in something that is (e.g., AWK) or in Expect. I find it painful writing non-trivial logic in Tcl. We already have run-gcov-pytest in gcov.exp which calls out to a python3 script, checking if python3 is supported, and bailing with UNSUPPORTED otherwise. FWIW I think it's reasonable to have something similar for this case. Dave
Re: [PATCH][pushed] analyzer: document new param
On 3/16/21 3:08 AM, Martin Liška wrote: On 3/15/21 9:57 PM, Martin Sebor wrote: Any plans to integrate it into the testsuite? (That way we presumably wouldn't need to remember to run it by hand.) Likely not, I'm not so big friend with DejaGNU. Are you willing to help me with that? I'm not a fan either but that's what we've got. And sure, I'll help if/when I can. I think it should be "straightforward" to rewrite the script in Expect (as much as anything is straightforward in Expect). Or, it could stay as is and be invoked from some .exp file in testsuite. Although is Python a required prerequisite for running the testsuite? If not, it might be better to either rewrite the script in something that is (e.g., AWK) or in Expect. Martin Thanks, Martin
Re: RFA: libiberty: silence static analyzer warning
On Tue, Mar 16, 2021, 4:48 AM Nick Clifton via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > Hi Ian, > > One of the static analyzers we use is throwing up an error report for > one of the libiberty source files: > > Error: BUFFER_SIZE (CWE-474): > libiberty/sha1.c:261: overlapping_buffer: The source buffer > "&ctx->buffer[16]" potentially overlaps with the destination buffer > "ctx->buffer", which results in undefined behavior for "memcpy". > libiberty/sha1.c:261: remediation: Use memmove instead of "memcpy". > # 259| sha1_process_block (ctx->buffer, 64, ctx); > # 260| left_over -= 64; > # 261|-> memcpy (ctx->buffer, &ctx->buffer[16], left_over); > # 262| } > # 263| ctx->buflen = left_over; > > Looking at the source code I am not sure if the problem can actually > be triggered in reality, but there seems to be no harm in being > cautious, so I would like to ask for permission to apply the following > patch: > > diff --git a/libiberty/sha1.c b/libiberty/sha1.c > index e3d7f86e351..7d15d48d11d 100644 > --- a/libiberty/sha1.c > +++ b/libiberty/sha1.c > @@ -258,7 +258,7 @@ sha1_process_bytes (const void *buffer, size_t len, > struct sha1_ctx *ctx) > { > sha1_process_block (ctx->buffer, 64, ctx); > left_over -= 64; > - memcpy (ctx->buffer, &ctx->buffer[16], left_over); > + memmove (ctx->buffer, &ctx->buffer[16], left_over); > } >ctx->buflen = left_over; > } That is ok. Thanks. Ian
Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]
I suspect that this is incorrect. Did you look at the discussion of the choice of anchor limits when first implemented? We specifically chose 32 bit range. Thanks, David On Tue, Mar 16, 2021 at 10:21 AM will schmidt wrote: > > On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote: > > Hi, > > > > This patch adds const_anchor for rs6000. The const_anchor is used > > in cse pass. > > > > The attachment are the patch diff and change log file. > > > > Bootstrapped and tested on powerpc64le with no regressions. Is this > > okay for trunk? Any recommendations? Thanks a lot. > > > Be sure to CC David and Segher to help ensure they see the arch > specific patch. :-) > > > > > * config/rs6000/rs6000.c (rs6000_option_override_internal): Set > > targetm.const_anchor, targetm.min_anchor_offset > > and targetm.max_anchor_offset. > > > lgtm. > > > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index ec068c58aa5..2b2350c53ae 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p) > > warning (0, "%qs is deprecated and not recommended in any > > circumstances", > >"-mno-speculate-indirect-jumps"); > > > > + if (TARGET_64BIT) > > +{ > > + targetm.min_anchor_offset = -32768; > > + targetm.max_anchor_offset = 32767; > > + targetm.const_anchor = 0x8000; > > +} > > + > > > The mix of decimal and hexadecimal notation catches my eye, but > this matches the style I see for other architectures, mips in > particular. > > Do we want/need to explicitly set the values for !TARGET_64BIT ? (I > can't immediately tell what the default values are). > > lgtm. > > > >return ret; > > } > > >
Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]
On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote: > Hi, > > This patch adds const_anchor for rs6000. The const_anchor is > used > in cse pass. > > The attachment are the patch diff and change log file. > > Bootstrapped and tested on powerpc64le with no regressions. Is > this > okay for trunk? Any recommendations? Thanks a lot. > > * config/rs6000/rs6000.c (rs6000_option_override_internal): Set > targetm.const_anchor, targetm.min_anchor_offset > and targetm.max_anchor_offset. Part two of my review (I missed this the first time). :-) Some variation of "PR target/33699" should be included as part of the changelog blurb. Thanks -Will
Re: [PATCH 1/2, rs6000] Add const_anchor for rs6000 [PR33699]
On Mon, 2021-03-15 at 11:11 +0800, HAO CHEN GUI via Gcc-patches wrote: > Hi, > > This patch adds const_anchor for rs6000. The const_anchor is used > in cse pass. > > The attachment are the patch diff and change log file. > > Bootstrapped and tested on powerpc64le with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. Be sure to CC David and Segher to help ensure they see the arch specific patch. :-) > > * config/rs6000/rs6000.c (rs6000_option_override_internal): Set > targetm.const_anchor, targetm.min_anchor_offset > and targetm.max_anchor_offset. lgtm. > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index ec068c58aa5..2b2350c53ae 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4911,6 +4911,13 @@ rs6000_option_override_internal (bool global_init_p) > warning (0, "%qs is deprecated and not recommended in any circumstances", >"-mno-speculate-indirect-jumps"); > > + if (TARGET_64BIT) > +{ > + targetm.min_anchor_offset = -32768; > + targetm.max_anchor_offset = 32767; > + targetm.const_anchor = 0x8000; > +} > + The mix of decimal and hexadecimal notation catches my eye, but this matches the style I see for other architectures, mips in particular. Do we want/need to explicitly set the values for !TARGET_64BIT ? (I can't immediately tell what the default values are). lgtm. >return ret; > } >
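As background for the review comments in this thread, here is a rough model of how cse derives the two candidate anchors for a constant. This is an illustrative sketch of the logic in cse.c's compute_const_anchors, not the real implementation; my_anchors is a made-up name, and the 0x8000/-32768/32767 values are the ones proposed in the patch for 64-bit:

```c
#include <assert.h>

/* Rough model of cse.c's compute_const_anchors: a constant N is
   bracketed by the nearest multiples of targetm.const_anchor below and
   above it.  If a register is already known to hold one of those
   anchors, N can be materialized as "anchor-register + offset" with a
   single add, provided the offset fits in
   [min_anchor_offset, max_anchor_offset] -- hence pairing a 0x8000
   anchor with the [-32768, 32767] addi range on rs6000.
   Assumes CONST_ANCHOR is a power of two and N is nonnegative.  */
static void
my_anchors (long n, long const_anchor,
            long *lower, long *lower_off, long *upper, long *upper_off)
{
  *lower = n & ~(const_anchor - 1);                      /* round down */
  *lower_off = n - *lower;
  *upper = (n + (const_anchor - 1)) & ~(const_anchor - 1); /* round up */
  *upper_off = n - *upper;
}
```

For example, with a 0x8000 anchor, 0x12345678 is reachable from either 0x12340000 (offset 0x5678) or 0x12348000 (offset -0x2988), both of which fit the signed 16-bit immediate of addi.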
[PATCH][pushed] options: ignore flag_ipa_ra in cl_optimization_compare
One more exception for cl_optimization_compare. Pushed to master, thanks, Martin gcc/ChangeLog: PR target/99592 * optc-save-gen.awk: Add flag_ipa_ra to exceptions for cl_optimization_compare function. gcc/testsuite/ChangeLog: PR target/99592 * gcc.target/arm/pr99592.c: New test. --- gcc/optc-save-gen.awk | 1 + gcc/testsuite/gcc.target/arm/pr99592.c | 7 +++ 2 files changed, 8 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/pr99592.c diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk index 14b8d03888e..19afa895930 100644 --- a/gcc/optc-save-gen.awk +++ b/gcc/optc-save-gen.awk @@ -1445,6 +1445,7 @@ checked_options["TARGET_CASE_VECTOR_PC_RELATIVE"]++ checked_options["arc_size_opt_level"]++ # arm exceptions checked_options["arm_fp16_format"]++ +checked_options["flag_ipa_ra"]++ # s390 exceptions checked_options["param_max_completely_peel_times"]++ checked_options["param_max_completely_peeled_insns"]++ diff --git a/gcc/testsuite/gcc.target/arm/pr99592.c b/gcc/testsuite/gcc.target/arm/pr99592.c new file mode 100644 index 000..23d6591ba16 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr99592.c @@ -0,0 +1,7 @@ +/* PR target/99592 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -pg" } */ + +#pragma GCC push_options +#pragma GCC target "" +#pragma GCC pop_options -- 2.30.2
Re: [PATCH 2/2] libstdc++: Remove symbols for new std::call_once implementation [PR 99341]
On 12/03/21 17:46 +, Jonathan Wakely wrote: This removes the new symbols added for the new futex-based std::call_once implementation. These symbols were new on trunk, so not in any released version. However, they are already present in some beta distro releases (Fedora Linux 34) and in Fedora Linux rawhide. This change can be locally reverted by distros that need to keep the symbols present until affected packages have been rebuilt. libstdc++-v3/ChangeLog: PR libstdc++/99341 * config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Remove std::once_flag symbols. * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/pre/gnu.ver: Likewise. * src/c++11/mutex.cc [_GLIBCXX_HAVE_LINUX_FUTEX] (struct __once_flag_compat): Remove. (_ZNSt9once_flag11_M_activateEv): Remove. (_ZNSt9once_flag9_M_finishEb): Remove. Tested x86_64-linux, powerpc64-linux and powerpc64le-linux, but not yet committed (it's not really appropriate for a last-minute Friday change!) I've now pushed [PATCH 1/2] and this one.
Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]
On Tue, Mar 16, 2021 at 12:52 PM Jakub Jelinek wrote: > > On Tue, Mar 16, 2021 at 11:55:01AM +0100, Uros Bizjak wrote: > > Maybe we could simply emit a special form of a ASHIFT pattern, tagged > > with some unspec (similar to e.g. divmod4_1), and teach > > ix86_split_lea_for_addr to emit it instead? Peephole pass is so late > > in the pass sequence that we won't lose anything. We only need one > > additional SWI48mode ASHIFT pattern with a const123_operand immediate. > > Ok. Any reason not to use just MULT for that and split it back into > ASHIFT during the split pass that follows shortly after peephole2? Works for me. Uros. > > 2021-03-16 Jakub Jelinek > > PR target/99600 > * config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT > rather than ASHIFT. > * config/i386/i386.md (mult by 1248 into ashift): New splitter. > > * gcc.target/i386/pr99600.c: New test. > > --- gcc/config/i386/i386-expand.c.jj2021-03-16 11:16:08.487860451 +0100 > +++ gcc/config/i386/i386-expand.c 2021-03-16 12:26:20.331083409 +0100 > @@ -1348,9 +1348,10 @@ ix86_split_lea_for_addr (rtx_insn *insn, > if (regno0 != regno2) > emit_insn (gen_rtx_SET (target, parts.index)); > > - /* Use shift for scaling. */ > - ix86_emit_binop (ASHIFT, mode, target, > - GEN_INT (exact_log2 (parts.scale))); > + /* Use shift for scaling, but emit it as MULT instead > +to avoid it being immediately peephole2 optimized back > +into lea. */ > + ix86_emit_binop (MULT, mode, target, GEN_INT (parts.scale)); > > if (parts.base) > ix86_emit_binop (PLUS, mode, target, parts.base); > --- gcc/config/i386/i386.md.jj 2021-03-16 00:21:12.192422264 +0100 > +++ gcc/config/i386/i386.md 2021-03-16 12:41:27.384022356 +0100 > @@ -5219,6 +5219,18 @@ (define_peephole2 > >DONE; > }) > + > +;; ix86_split_lea_for_addr emits the shifts as MULT to avoid it from being > +;; peephole2 optimized back into a lea. Split that into the shift during > +;; the following split pass. 
> +(define_split > + [(set (match_operand:SWI48 0 "general_reg_operand") > + (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 > "const1248_operand"))) > + (clobber (reg:CC FLAGS_REG))] > + "reload_completed" > + [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1))) > + (clobber (reg:CC FLAGS_REG))])] > + "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));") > > ;; Add instructions > > --- gcc/testsuite/gcc.target/i386/pr99600.c.jj 2021-03-16 12:26:50.610747515 > +0100 > +++ gcc/testsuite/gcc.target/i386/pr99600.c 2021-03-16 12:26:50.610747515 > +0100 > @@ -0,0 +1,16 @@ > +/* PR target/99600 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=atom" } */ > + > +char a, b; > +long c; > + > +long > +foo (void) > +{ > + if (a) > +c = b == 1 ? 1 << 3 : 1 << 2; > + else > +c = 0; > + return 0; > +} > > > Jakub >
Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]
On Tue, Mar 16, 2021 at 11:55:01AM +0100, Uros Bizjak wrote:
> Maybe we could simply emit a special form of a ASHIFT pattern, tagged
> with some unspec (similar to e.g. divmod4_1), and teach
> ix86_split_lea_for_addr to emit it instead? Peephole pass is so late
> in the pass sequence that we won't lose anything. We only need one
> additional SWI48mode ASHIFT pattern with a const123_operand immediate.

Ok.  Any reason not to use just MULT for that and split it back into
ASHIFT during the split pass that follows shortly after peephole2?

2021-03-16  Jakub Jelinek

	PR target/99600
	* config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT
	rather than ASHIFT.
	* config/i386/i386.md (mult by 1248 into ashift): New splitter.

	* gcc.target/i386/pr99600.c: New test.

--- gcc/config/i386/i386-expand.c.jj	2021-03-16 11:16:08.487860451 +0100
+++ gcc/config/i386/i386-expand.c	2021-03-16 12:26:20.331083409 +0100
@@ -1348,9 +1348,10 @@ ix86_split_lea_for_addr (rtx_insn *insn,
       if (regno0 != regno2)
	emit_insn (gen_rtx_SET (target, parts.index));

-      /* Use shift for scaling.  */
-      ix86_emit_binop (ASHIFT, mode, target,
-		       GEN_INT (exact_log2 (parts.scale)));
+      /* Use shift for scaling, but emit it as MULT instead
+	 to avoid it being immediately peephole2 optimized back
+	 into lea.  */
+      ix86_emit_binop (MULT, mode, target, GEN_INT (parts.scale));

       if (parts.base)
	ix86_emit_binop (PLUS, mode, target, parts.base);
--- gcc/config/i386/i386.md.jj	2021-03-16 00:21:12.192422264 +0100
+++ gcc/config/i386/i386.md	2021-03-16 12:41:27.384022356 +0100
@@ -5219,6 +5219,18 @@ (define_peephole2
  DONE;
})
+
+;; ix86_split_lea_for_addr emits the shifts as MULT to avoid it from being
+;; peephole2 optimized back into a lea.  Split that into the shift during
+;; the following split pass.
+(define_split
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(mult:SWI48 (match_dup 0) (match_operand:SWI48 1 "const1248_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")

;; Add instructions

--- gcc/testsuite/gcc.target/i386/pr99600.c.jj	2021-03-16 12:26:50.610747515 +0100
+++ gcc/testsuite/gcc.target/i386/pr99600.c	2021-03-16 12:26:50.610747515 +0100
@@ -0,0 +1,16 @@
+/* PR target/99600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom" } */
+
+char a, b;
+long c;
+
+long
+foo (void)
+{
+  if (a)
+    c = b == 1 ? 1 << 3 : 1 << 2;
+  else
+    c = 0;
+  return 0;
+}

	Jakub
RFA: libiberty: silence static analyzer warning
Hi Ian,

One of the static analyzers we use is throwing up an error report for
one of the libiberty source files:

Error: BUFFER_SIZE (CWE-474):
libiberty/sha1.c:261: overlapping_buffer: The source buffer
"&ctx->buffer[16]" potentially overlaps with the destination buffer
"ctx->buffer", which results in undefined behavior for "memcpy".
libiberty/sha1.c:261: remediation: Use memmove instead of "memcpy".
#  259|           sha1_process_block (ctx->buffer, 64, ctx);
#  260|           left_over -= 64;
#  261|->         memcpy (ctx->buffer, &ctx->buffer[16], left_over);
#  262|         }
#  263|       ctx->buflen = left_over;

Looking at the source code I am not sure if the problem can actually be
triggered in reality, but there seems to be no harm in being cautious,
so I would like to ask for permission to apply the following patch:

diff --git a/libiberty/sha1.c b/libiberty/sha1.c
index e3d7f86e351..7d15d48d11d 100644
--- a/libiberty/sha1.c
+++ b/libiberty/sha1.c
@@ -258,7 +258,7 @@ sha1_process_bytes (const void *buffer, size_t len, struct sha1_ctx *ctx)
	{
	  sha1_process_block (ctx->buffer, 64, ctx);
	  left_over -= 64;
-	  memcpy (ctx->buffer, &ctx->buffer[16], left_over);
+	  memmove (ctx->buffer, &ctx->buffer[16], left_over);
	}
      ctx->buflen = left_over;
    }

Cheers
Nick
Re: c++: Fix 2 testcases [PR 99601]
On 3/15/21 7:29 PM, Jakub Jelinek wrote: On Mon, Mar 15, 2021 at 03:28:06PM -0400, Nathan Sidwell wrote: I'd failed to correctly restrict some checks to lp64 x86 targets. PR c++/99601 gcc/testsuite/ * g++.dg/modules/builtin-3_a.C: Fix lp64 x86 detection. * g++.dg/modules/builtin-3_b.C: Fix lp64 x86 detection. ERROR: tcl error sourcing /home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp. ERROR: unmatched open brace in list Sigh, I always fall for the trap of 'check the fail went away', not 'verify it now passes'. Thanks for fixing. nathan -- Nathan Sidwell
Re: GCC: v850-elf
Hi Jan-Benedict,

With my re-started testing efforts, I've got another one for you I guess
(`./configure --target=v850-elf && make all-gcc`):

../.././gcc/config/v850/v850.c: In function ‘char* construct_restore_jr(rtx)’:
../.././gcc/config/v850/v850.c:2260:22: error: ‘%s’ directive writing up to 39 bytes into a region of size between 32 and 71 [-Werror=format-overflow=]
 2260 |   sprintf (buff, "movhi hi(%s), r0, r6\n\tmovea lo(%s), r6, r6\n\tjmp r6",
      |   ^~~~
 2261 |            name, name);
      |

I could not reproduce these in my local environment, but I suspect that
you are using a more recent version of gcc than me.  The fix looks
obvious however, so please could you try out the attached patch and let
me know if it resolves the problem ?

Cheers
Nick

diff --git a/gcc/config/v850/v850.c b/gcc/config/v850/v850.c
index 249cb400177..db3002a2cfb 100644
--- a/gcc/config/v850/v850.c
+++ b/gcc/config/v850/v850.c
@@ -2181,7 +2181,7 @@ construct_restore_jr (rtx op)
   unsigned long int first;
   unsigned long int last;
   int i;
-  static char buff [100]; /* XXX */
+  static char buff [256]; /* XXX */

   if (count <= 2)
     {
GCC 11.0.1 Status Report (2021-03-16)
Status
======

GCC trunk which eventually will become GCC 11 is still in regression
and documentation fixes only mode (Stage 4).  If history should repeat
itself then a first release candidate of GCC 11 will be released mid
April.

For this to happen we need to resolve the remaining 17 P1 regressions
- 8 of which are classified as target or bootstrap issues.  Please
have a look at the set of P1 (and preferably also P2) regressions
which can be conveniently accessed via the 'Serious regressions' link
on https://gcc.gnu.org

Quality Data
============

Priority        #     Change from last report
--------      ---     -----------------------
P1             17     -  45
P2            288     -  46
P3             37     +   1
P4            192     +   2
P5             24
--------      ---     -----------------------
Total P1-P3   342     -  90
Total         558     -  88

Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2021-January/234703.html
Re: [PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]
On Tue, Mar 16, 2021 at 11:10 AM Jakub Jelinek wrote: > > Hi! > > As the testcase shows, the compiler hangs and eats all memory when compiling > it. This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54 > I have changed the ix86_avoid_lea_for_addr splitting from a splitter > into a peephole2 (because during splitting passes we don't have guaranteed > df, while during peephole2 we do). > The problem is we have another peephole2 that works in the opposite way, > when seeing split lea (in particular ASHIFT followed by PLUS) it attempts > to turn it back into a lea. > In the past, they were fighting against each other, but as they were in > different passes, simply the last one won. So, split after reload > split the lea into shift left and plus, peephole2 reverted that (but, note > not perfectly, the peephole2 doesn't understand that something can be placed > into lea disp; to be fixed for GCC12) and then another split pass split the > lea apart again. > But my changes and the way peephole2 works means that we endlessly iterate > over those two, the first peephole2 splits the lea, the second one reverts > it, the first peephole2 splits the new lea back into new 2 insns and so > forth forever. > So, we need to break the cycle somehow. Best would be to call > ix86_avoid_lea_for_addr in the second peephole2 and if it says it would be > split, undo, but unfortunately calling that function is very non-trivial, > because it needs to walk insns forwards and backwards from the given insn > and uses df in those walks. Furthermore, one of the registers in the new > lea is set by the newly added insn. So I'd be afraid we'd need to > temporarily register the new insns with df and replace in the IL the old > insns with new insns (and undo their df), then call the function and then > undo everything we did. 
> So, this patch instead breaks the cycle by remembering INSN_UID of the last > ASHIFT ix86_split_lea_for_addr emitted and punting on the ASHIFT + PLUS > peephole2 when it is that uid. We need to reset it somewhere, so I've > added clearing of the var to ix86_expand_prologue which is guaranteed to be > a few passes before peephole2. Maybe we could simply emit a special form of a ASHIFT pattern, tagged with some unspec (similar to e.g. divmod4_1), and teach ix86_split_lea_for_addr to emit it instead? Peephole pass is so late in the pass sequence that we won't lose anything. We only need one additional SWI48mode ASHIFT pattern with a const123_operand immediate. Uros. > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Or would you e.g. prefer if I don't use a global var but stick it into > cfun->machine->last_lea_split_shift_uid ? That way it would be cleared > automatically. > > 2021-03-16 Jakub Jelinek > > PR target/99600 > * config/i386/i386-protos.h (ix86_last_lea_split_shift_uid): Declare. > * config/i386/i386-expand.c (ix86_last_lea_split_shift_uid): New > variable. > (ix86_split_lea_for_addr): Set it to INSN_UID of the emitted shift > insn. > * config/i386/i386.md (ashift + add peephole2 into lea): Punt if > INSN_UID of ashift is ix86_last_lea_split_shift_uid. > * config/i386/i386.c (ix86_expand_prologue): Clear > ix86_last_lea_split_shift_uid. > > * gcc.target/i386/pr99600.c: New test. 
> > --- gcc/config/i386/i386-protos.h.jj2021-01-04 10:25:45.436159056 +0100 > +++ gcc/config/i386/i386-protos.h 2021-03-15 20:06:36.267515383 +0100 > @@ -109,6 +109,7 @@ extern bool ix86_binary_operator_ok (enu > extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]); > extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]); > extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]); > +extern int ix86_last_lea_split_shift_uid; > extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode); > extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]); > extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool > high); > --- gcc/config/i386/i386-expand.c.jj2021-03-15 12:34:26.549901726 +0100 > +++ gcc/config/i386/i386-expand.c 2021-03-15 20:08:52.992016204 +0100 > @@ -1291,6 +1291,10 @@ find_nearest_reg_def (rtx_insn *insn, in >return false; > } > > +/* INSN_UID of the last ASHIFT insn emitted by ix86_split_lea_for_addr > + or zero if none has been emitted yet. */ > +int ix86_last_lea_split_shift_uid; > + > /* Split lea instructions into a sequence of instructions > which are executed on ALU to avoid AGU stalls. > It is assumed that it is allowed to clobber flags register > @@ -1352,6 +1356,9 @@ ix86_split_lea_for_addr (rtx_insn *insn, > ix86_emit_binop (ASHIFT, mode, target, >GEN_INT (exact_log2 (parts.scale))); > > + rtx_insn *ashift_insn = get_last_insn (); > + ix86_last_lea_split_shift_uid = INSN_UID (ashift_insn); > + > if (parts.base) > ix86_emit_binop
[PATCH][pushed] gcc-changelog: skip broken commit in git_update_version.py.
Hi.

For an unknown reason the following commit modified a ChangeLog file
together with other changes:

$ git show --stat c2be82058fb40f3ae891c68d185ff53e07f14f45
...
 libstdc++-v3/ChangeLog       | 9 +
 libstdc++-v3/src/Makefile.am | 4 +++-
 libstdc++-v3/src/Makefile.in | 3 ++-

Let's ignore the commit in git_update_version.py.

Martin

contrib/ChangeLog:

	* gcc-changelog/git_update_version.py: Skip one problematic commit.
---
 contrib/gcc-changelog/git_update_version.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index d2cadb8811c..1e2b22b008b 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -26,6 +26,9 @@ from git_repository import parse_git_revisions

 current_timestamp = datetime.datetime.now().strftime('%Y%m%d\n')

+# Skip the following commits, they cannot be correctly processed
+IGNORED_COMMITS = ('c2be82058fb40f3ae891c68d185ff53e07f14f45')
+

 def read_timestamp(path):
     with open(path) as f:
@@ -98,6 +101,7 @@ def update_current_branch():
                 head = head.parents[1]
             commits = parse_git_revisions(args.git_path, '%s..%s'
                                           % (commit.hexsha, head.hexsha))
+            commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
             for git_commit in reversed(commits):
                 prepend_to_changelog_files(repo, args.git_path, git_commit,
                                            not args.dry_mode)
--
2.30.2
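A general Python detail worth noting next to a pattern like the IGNORED_COMMITS line above: parentheses alone do not create a one-element tuple — a trailing comma does — and `in`/`not in` against a plain string performs substring matching. With full 40-character hexshas the two forms happen to give the same answer, so the patch works either way, but the distinction is easy to trip over (illustrative sketch, not part of the patch):

```python
# A parenthesized string is NOT a one-element tuple; the trailing comma
# is what makes a tuple.  (The hash below is the one from the patch.)
without_comma = ('c2be82058fb40f3ae891c68d185ff53e07f14f45')
with_comma = ('c2be82058fb40f3ae891c68d185ff53e07f14f45',)

assert isinstance(without_comma, str)
assert isinstance(with_comma, tuple)

# 'in' / 'not in' against the plain string does substring matching ...
assert 'c2be8205' in without_comma
# ... while against the real tuple it tests element equality.
assert 'c2be8205' not in with_comma
assert 'c2be82058fb40f3ae891c68d185ff53e07f14f45' in with_comma
```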
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 11:19 AM Jakub Jelinek wrote: > > On Tue, Mar 16, 2021 at 11:16:54AM +0100, Richard Biener wrote: > > > Not varargs semantics, but unprototyped function semantics, i.e. the same > > > as for > > > int foo (); > > > Which means callers can pass anything, but it is not ..., i.e. callee > > > can't > > > use va_start/va_end/va_arg and the ... calling conventions are not in > > > place > > > (e.g. the passing of arguments in two places on some targets, or the > > > extra %rax on x86_64 etc.). > > > > Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well > > then, the actual argument list is what matters for the used ABI. > > > > Anyway, I suppose the hook may just give up for calls to unprototyped > > functions? > > It can't. We can punt at trying to optimize them that way in the > vectorizer, that is purely an optimization, but on the function definition, > it is an ABI thing and perhaps some caller in some other TU could have correct > prototype and call clones that don't exist. I suppose we could simply reject OMP simd on K&R style definitions, OTOH they seem to be supported even in C18 ... > Jakub >
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 11:16:54AM +0100, Richard Biener wrote: > > Not varargs semantics, but unprototyped function semantics, i.e. the same > > as for > > int foo (); > > Which means callers can pass anything, but it is not ..., i.e. callee can't > > use va_start/va_end/va_arg and the ... calling conventions are not in place > > (e.g. the passing of arguments in two places on some targets, or the > > extra %rax on x86_64 etc.). > > Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well > then, the actual argument list is what matters for the used ABI. > > Anyway, I suppose the hook may just give up for calls to unprototyped > functions? It can't. We can punt at trying to optimize them that way in the vectorizer, that is purely an optimization, but on the function definition, it is an ABI thing and perhaps some caller in some other TU could have correct prototype and call clones that don't exist. Jakub
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 10:40 AM Jakub Jelinek wrote: > > On Tue, Mar 16, 2021 at 10:36:35AM +0100, Richard Biener wrote: > > On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek wrote: > > > > > > On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote: > > > > > shows we have an interface problem here -- we shouldn't need > > > > > something this > > > > > convoluted to walk through an argument list. But that's not your > > > > > problem > > > > > or a problem with the patch. > > > > > > > > The caller side should IMHO always look at TYPE_ARG_TYPES since > > > > that's what determines the ABI. But IIRC there were problems in the > > > > past > > > > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?! > > > > > > Yes, for the K&R definitions > > > #pragma omp declare simd > > > int > > > foo (a, b, c) > > > int a, b, c; > > > { > > > return a + b + c; > > > } > > > > The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC > > K&R definitions work like unprototyped functions on the caller side, thus > > having varargs argument passing semantics? > > Not varargs semantics, but unprototyped function semantics, i.e. the same > as for > int foo (); > Which means callers can pass anything, but it is not ..., i.e. callee can't > use va_start/va_end/va_arg and the ... calling conventions are not in place > (e.g. the passing of arguments in two places on some targets, or the > extra %rax on x86_64 etc.). Right, so for the caller side looking at DECL_ARGUMENTS is wrong as well then, the actual argument list is what matters for the used ABI. Anyway, I suppose the hook may just give up for calls to unprototyped functions? Richard. > > Jakub >
[PATCH] i386: Avoid mutual recursion between two peephole2s [PR99600]
Hi! As the testcase shows, the compiler hangs and eats all memory when compiling it. This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54 I have changed the ix86_avoid_lea_for_addr splitting from a splitter into a peephole2 (because during splitting passes we don't have guaranteed df, while during peephole2 we do). The problem is we have another peephole2 that works in the opposite way, when seeing split lea (in particular ASHIFT followed by PLUS) it attempts to turn it back into a lea. In the past, they were fighting against each other, but as they were in different passes, simply the last one won. So, split after reload split the lea into shift left and plus, peephole2 reverted that (but, note not perfectly, the peephole2 doesn't understand that something can be placed into lea disp; to be fixed for GCC12) and then another split pass split the lea apart again. But my changes and the way peephole2 works means that we endlessly iterate over those two, the first peephole2 splits the lea, the second one reverts it, the first peephole2 splits the new lea back into new 2 insns and so forth forever. So, we need to break the cycle somehow. Best would be to call ix86_avoid_lea_for_addr in the second peephole2 and if it says it would be split, undo, but unfortunately calling that function is very non-trivial, because it needs to walk insns forwards and backwards from the given insn and uses df in those walks. Furthermore, one of the registers in the new lea is set by the newly added insn. So I'd be afraid we'd need to temporarily register the new insns with df and replace in the IL the old insns with new insns (and undo their df), then call the function and then undo everything we did. So, this patch instead breaks the cycle by remembering INSN_UID of the last ASHIFT ix86_split_lea_for_addr emitted and punting on the ASHIFT + PLUS peephole2 when it is that uid. 
We need to reset it somewhere, so I've added clearing of the var to ix86_expand_prologue which is guaranteed to be a few passes before peephole2. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Or would you e.g. prefer if I don't use a global var but stick it into cfun->machine->last_lea_split_shift_uid ? That way it would be cleared automatically. 2021-03-16 Jakub Jelinek PR target/99600 * config/i386/i386-protos.h (ix86_last_lea_split_shift_uid): Declare. * config/i386/i386-expand.c (ix86_last_lea_split_shift_uid): New variable. (ix86_split_lea_for_addr): Set it to INSN_UID of the emitted shift insn. * config/i386/i386.md (ashift + add peephole2 into lea): Punt if INSN_UID of ashift is ix86_last_lea_split_shift_uid. * config/i386/i386.c (ix86_expand_prologue): Clear ix86_last_lea_split_shift_uid. * gcc.target/i386/pr99600.c: New test. --- gcc/config/i386/i386-protos.h.jj2021-01-04 10:25:45.436159056 +0100 +++ gcc/config/i386/i386-protos.h 2021-03-15 20:06:36.267515383 +0100 @@ -109,6 +109,7 @@ extern bool ix86_binary_operator_ok (enu extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]); extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]); extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]); +extern int ix86_last_lea_split_shift_uid; extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode); extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]); extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high); --- gcc/config/i386/i386-expand.c.jj2021-03-15 12:34:26.549901726 +0100 +++ gcc/config/i386/i386-expand.c 2021-03-15 20:08:52.992016204 +0100 @@ -1291,6 +1291,10 @@ find_nearest_reg_def (rtx_insn *insn, in return false; } +/* INSN_UID of the last ASHIFT insn emitted by ix86_split_lea_for_addr + or zero if none has been emitted yet. */ +int ix86_last_lea_split_shift_uid; + /* Split lea instructions into a sequence of instructions which are executed on ALU to avoid AGU stalls. 
It is assumed that it is allowed to clobber flags register @@ -1352,6 +1356,9 @@ ix86_split_lea_for_addr (rtx_insn *insn, ix86_emit_binop (ASHIFT, mode, target, GEN_INT (exact_log2 (parts.scale))); + rtx_insn *ashift_insn = get_last_insn (); + ix86_last_lea_split_shift_uid = INSN_UID (ashift_insn); + if (parts.base) ix86_emit_binop (PLUS, mode, target, parts.base); --- gcc/config/i386/i386.md.jj 2021-03-05 21:51:33.675350463 +0100 +++ gcc/config/i386/i386.md 2021-03-15 20:16:03.796292449 +0100 @@ -20680,6 +20680,11 @@ (define_peephole2 (match_operand 4 "x86_64_general_operand"))) (clobber (reg:CC FLAGS_REG))])] "IN_RANGE (INTVAL (operands[2]), 1, 3) + /* Avoid the ix86_avoid_lea_for_addr peephole2 from splitting + lea and this peephole2 to undo that into another lea that + would be split again. Break that cycle by ignoring
Re: [PATCH] i386: Fix up _mm256_vzeroupper() handling [PR99563]
On Tue, Mar 16, 2021 at 10:51 AM Jakub Jelinek wrote: > > Hi! > > My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for > vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling > when the implicit vzeroupper handling is disabled. > The epilogue_completed splitter for vzeroupper now adds clobbers for all > registers which don't have explicit sets in the pattern and the sets are > added during vzeroupper pass. Before my changes, for explicit user > vzeroupper, we just weren't modelling its effects at all, it was just > unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16 > registers. But now the splitter will even for those add clobbers and as > it has no sets, it will add clobbers for all registers, which means > we optimize away anything that lived across that vzeroupper. > > The vzeroupper pass has two parts, one is the mode switching that computes > where to put the implicit vzeroupper calls and puts them there, and then > another that uses df to figure out what sets to add to all the vzeroupper. > The former part should be done only under the conditions we have in the > gate, but the latter as this PR shows needs to happen either if we perform > the implicit vzeroupper additions, or if there are (or could be) any > explicit vzeroupper instructions. As that function does df_analyze and > walks the whole IL, I think it would be too expensive to run it always > whenever TARGET_AVX, so this patch remembers if we've expanded at least > one __builtin_ia32_vzeroupper in the function and runs that part of the > vzeroupper pass both when the old condition is true or when this new > flag is set. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-03-16 Jakub Jelinek > > PR target/99563 > * config/i386/i386.h (struct machine_function): Add > has_explicit_vzeroupper bitfield. 
> * config/i386/i386-expand.c (ix86_expand_builtin): Set > cfun->machine->has_explicit_vzeroupper when expanding > IX86_BUILTIN_VZEROUPPER. > * config/i386/i386-features.c (rest_of_handle_insert_vzeroupper): > Do the mode switching only when TARGET_VZEROUPPER, expensive > optimizations turned on and not optimizing for size. > (pass_insert_vzeroupper::gate): Enable even when > cfun->machine->has_explicit_vzeroupper is set. > > * gcc.target/i386/avx-pr99563.c: New test. OK. Thanks, Uros. > > --- gcc/config/i386/i386.h.jj 2021-02-22 17:54:05.617799002 +0100 > +++ gcc/config/i386/i386.h 2021-03-15 12:30:00.814841624 +0100 > @@ -2941,6 +2941,10 @@ struct GTY(()) machine_function { >/* True if the function needs a stack frame. */ >BOOL_BITFIELD stack_frame_required : 1; > > + /* True if __builtin_ia32_vzeroupper () has been expanded in current > + function. */ > + BOOL_BITFIELD has_explicit_vzeroupper : 1; > + >/* The largest alignment, in bytes, of stack slot actually used. */ >unsigned int max_used_stack_alignment; > > --- gcc/config/i386/i386-expand.c.jj2021-02-09 12:28:14.069323264 +0100 > +++ gcc/config/i386/i386-expand.c 2021-03-15 12:34:26.549901726 +0100 > @@ -13210,6 +13210,10 @@ rdseed_step: > >return 0; > > +case IX86_BUILTIN_VZEROUPPER: > + cfun->machine->has_explicit_vzeroupper = true; > + break; > + > default: >break; > } > --- gcc/config/i386/i386-features.c.jj 2021-02-01 09:55:45.953519272 +0100 > +++ gcc/config/i386/i386-features.c 2021-03-15 12:37:07.886116827 +0100 > @@ -1837,19 +1837,22 @@ ix86_add_reg_usage_to_vzerouppers (void) > static unsigned int > rest_of_handle_insert_vzeroupper (void) > { > - int i; > - > - /* vzeroupper instructions are inserted immediately after reload to > - account for possible spills from 256bit or 512bit registers. The pass > - reuses mode switching infrastructure by re-running mode insertion > - pass, so disable entities that have already been processed. 
*/ > - for (i = 0; i < MAX_386_ENTITIES; i++) > -ix86_optimize_mode_switching[i] = 0; > + if (TARGET_VZEROUPPER > + && flag_expensive_optimizations > + && !optimize_size) > +{ > + /* vzeroupper instructions are inserted immediately after reload to > +account for possible spills from 256bit or 512bit registers. The > pass > +reuses mode switching infrastructure by re-running mode insertion > +pass, so disable entities that have already been processed. */ > + for (int i = 0; i < MAX_386_ENTITIES; i++) > + ix86_optimize_mode_switching[i] = 0; > > - ix86_optimize_mode_switching[AVX_U128] = 1; > + ix86_optimize_mode_switching[AVX_U128] = 1; > > - /* Call optimize_mode_switching. */ > - g->get_passes ()->execute_pass_mode_switching (); > + /* Call optimize_mode_switching. */ > + g->get_passes ()->execute_pass_mode_switching (); > +} >ix86_add_reg_usage_to_vzerouppers (); >
[PATCH] i386: Fix up _mm256_vzeroupper() handling [PR99563]
Hi! My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling when the implicit vzeroupper handling is disabled. The epilogue_completed splitter for vzeroupper now adds clobbers for all registers which don't have explicit sets in the pattern and the sets are added during vzeroupper pass. Before my changes, for explicit user vzeroupper, we just weren't modelling its effects at all, it was just unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16 registers. But now the splitter will even for those add clobbers and as it has no sets, it will add clobbers for all registers, which means we optimize away anything that lived across that vzeroupper. The vzeroupper pass has two parts, one is the mode switching that computes where to put the implicit vzeroupper calls and puts them there, and then another that uses df to figure out what sets to add to all the vzeroupper. The former part should be done only under the conditions we have in the gate, but the latter as this PR shows needs to happen either if we perform the implicit vzeroupper additions, or if there are (or could be) any explicit vzeroupper instructions. As that function does df_analyze and walks the whole IL, I think it would be too expensive to run it always whenever TARGET_AVX, so this patch remembers if we've expanded at least one __builtin_ia32_vzeroupper in the function and runs that part of the vzeroupper pass both when the old condition is true or when this new flag is set. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-03-16 Jakub Jelinek PR target/99563 * config/i386/i386.h (struct machine_function): Add has_explicit_vzeroupper bitfield. * config/i386/i386-expand.c (ix86_expand_builtin): Set cfun->machine->has_explicit_vzeroupper when expanding IX86_BUILTIN_VZEROUPPER. 
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper): Do the mode switching only when TARGET_VZEROUPPER, expensive optimizations turned on and not optimizing for size. (pass_insert_vzeroupper::gate): Enable even when cfun->machine->has_explicit_vzeroupper is set. * gcc.target/i386/avx-pr99563.c: New test. --- gcc/config/i386/i386.h.jj 2021-02-22 17:54:05.617799002 +0100 +++ gcc/config/i386/i386.h 2021-03-15 12:30:00.814841624 +0100 @@ -2941,6 +2941,10 @@ struct GTY(()) machine_function { /* True if the function needs a stack frame. */ BOOL_BITFIELD stack_frame_required : 1; + /* True if __builtin_ia32_vzeroupper () has been expanded in current + function. */ + BOOL_BITFIELD has_explicit_vzeroupper : 1; + /* The largest alignment, in bytes, of stack slot actually used. */ unsigned int max_used_stack_alignment; --- gcc/config/i386/i386-expand.c.jj2021-02-09 12:28:14.069323264 +0100 +++ gcc/config/i386/i386-expand.c 2021-03-15 12:34:26.549901726 +0100 @@ -13210,6 +13210,10 @@ rdseed_step: return 0; +case IX86_BUILTIN_VZEROUPPER: + cfun->machine->has_explicit_vzeroupper = true; + break; + default: break; } --- gcc/config/i386/i386-features.c.jj 2021-02-01 09:55:45.953519272 +0100 +++ gcc/config/i386/i386-features.c 2021-03-15 12:37:07.886116827 +0100 @@ -1837,19 +1837,22 @@ ix86_add_reg_usage_to_vzerouppers (void) static unsigned int rest_of_handle_insert_vzeroupper (void) { - int i; - - /* vzeroupper instructions are inserted immediately after reload to - account for possible spills from 256bit or 512bit registers. The pass - reuses mode switching infrastructure by re-running mode insertion - pass, so disable entities that have already been processed. */ - for (i = 0; i < MAX_386_ENTITIES; i++) -ix86_optimize_mode_switching[i] = 0; + if (TARGET_VZEROUPPER + && flag_expensive_optimizations + && !optimize_size) +{ + /* vzeroupper instructions are inserted immediately after reload to +account for possible spills from 256bit or 512bit registers. 
The pass +reuses mode switching infrastructure by re-running mode insertion +pass, so disable entities that have already been processed. */ + for (int i = 0; i < MAX_386_ENTITIES; i++) + ix86_optimize_mode_switching[i] = 0; - ix86_optimize_mode_switching[AVX_U128] = 1; + ix86_optimize_mode_switching[AVX_U128] = 1; - /* Call optimize_mode_switching. */ - g->get_passes ()->execute_pass_mode_switching (); + /* Call optimize_mode_switching. */ + g->get_passes ()->execute_pass_mode_switching (); +} ix86_add_reg_usage_to_vzerouppers (); return 0; } @@ -1880,8 +1883,10 @@ public: virtual bool gate (function *) { return TARGET_AVX -&& TARGET_VZEROUPPER && flag_expensive_optimizations -&& !optimize_size; +&& ((TARGET_VZEROUPPER + &&
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 10:36:35AM +0100, Richard Biener wrote: > On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek wrote: > > > > On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote: > > > > shows we have an interface problem here -- we shouldn't need something > > > > this > > > > convoluted to walk through an argument list. But that's not your > > > > problem > > > > or a problem with the patch. > > > > > > The caller side should IMHO always look at TYPE_ARG_TYPES since > > > that's what determines the ABI. But IIRC there were problems in the past > > > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?! > > > > Yes, for the K&R definitions > > #pragma omp declare simd > > int > > foo (a, b, c) > > int a, b, c; > > { > > return a + b + c; > > } > > The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC > K&R definitions work like unprototyped functions on the caller side, thus > having varargs argument passing semantics? Not varargs semantics, but unprototyped function semantics, i.e. the same as for int foo (); Which means callers can pass anything, but it is not ..., i.e. callee can't use va_start/va_end/va_arg and the ... calling conventions are not in place (e.g. the passing of arguments in two places on some targets, or the extra %rax on x86_64 etc.). Jakub
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 10:29 AM Jakub Jelinek wrote: > > On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote: > > > shows we have an interface problem here -- we shouldn't need something > > > this > > > convoluted to walk through an argument list. But that's not your problem > > > or a problem with the patch. > > > > The caller side should IMHO always look at TYPE_ARG_TYPES since > > that's what determines the ABI. But IIRC there were problems in the past > > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?! > > Yes, for the K&R definitions > #pragma omp declare simd > int > foo (a, b, c) > int a, b, c; > { > return a + b + c; > } The C frontend could still populate TYPE_ARG_TYPES here, OTOH IIRC K&R definitions work like unprototyped functions on the caller side, thus having varargs argument passing semantics? > Jakub >
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 09:46:33AM +0100, Richard Biener wrote: > > shows we have an interface problem here -- we shouldn't need something this > > convoluted to walk through an argument list. But that's not your problem > > or a problem with the patch. > > The caller side should IMHO always look at TYPE_ARG_TYPES since > that's what determines the ABI. But IIRC there were problems in the past > that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?! Yes, for the K&R definitions #pragma omp declare simd int foo (a, b, c) int a, b, c; { return a + b + c; } Jakub
Re: [PATCH][pushed] analyzer: document new param
On 3/15/21 9:57 PM, Martin Sebor wrote: Any plans to integrate it into the testsuite? (That way we presumably wouldn't need to remember to run it by hand.) Likely not, I'm not such a big friend of DejaGnu. Are you willing to help me with that? Thanks, Martin
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
On Tue, Mar 16, 2021 at 9:43 AM Richard Sandiford via Gcc-patches wrote: > > Kyrylo Tkachov writes: > > Hi Jakub, > > > >> -Original Message- > >> From: Jakub Jelinek > >> Sent: 13 March 2021 20:34 > >> To: Richard Sandiford ; Richard Earnshaw > >> ; Kyrylo Tkachov > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: [PATCH] aarch64: Fix up > >> aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542] > >> > >> Hi! > >> > >> As the patch shows, there are several bugs in > >> aarch64_simd_clone_compute_vecsize_and_simdlen. > >> One is that unlike for function declarations that aren't definitions > >> it completely ignores argument types. Such decls don't have > >> DECL_ARGUMENTS, > >> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like > >> the simd cloning code in the middle end does too. > >> > >> Another problem is that it checks types of uniform arguments. That is > >> unnecessary, uniform arguments are passed the way it normally is, it is > >> a scalar argument rather than vector, so there is no reason not to support > >> uniform argument of different size, or long double, structure etc. > >> > >> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk? > > > > I don't have much experience in this part of GCC and I think you're best > > placed to make a judgement on the implementation of this hook. > > It's okay from my perspective (it doesn't look like it would break any SVE > > logic), but if you'd like to wait for a review from Richard I won't object. > > Yeah, it looks good to me too, sorry for the slow response. > > IMO: > > tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); > bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); > > for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0; >t && t != void_list_node; t = TREE_CHAIN (t), i++) > { > tree arg_type = decl_arg_p ? 
TREE_TYPE (t) : TREE_VALUE (t); > > shows we have an interface problem here -- we shouldn't need something this > convoluted to walk through an argument list. But that's not your problem > or a problem with the patch. The caller side should IMHO always look at TYPE_ARG_TYPES since that's what determines the ABI. But IIRC there were problems in the past that TYPE_ARG_TYPES might be NULL but DECL_ARGUMENTS not?! > > Thanks, > Richard
Re: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
Kyrylo Tkachov writes: > Hi Jakub, > >> -Original Message- >> From: Jakub Jelinek >> Sent: 13 March 2021 20:34 >> To: Richard Sandiford ; Richard Earnshaw >> ; Kyrylo Tkachov >> Cc: gcc-patches@gcc.gnu.org >> Subject: [PATCH] aarch64: Fix up >> aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542] >> >> Hi! >> >> As the patch shows, there are several bugs in >> aarch64_simd_clone_compute_vecsize_and_simdlen. >> One is that unlike for function declarations that aren't definitions >> it completely ignores argument types. Such decls don't have >> DECL_ARGUMENTS, >> but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like >> the simd cloning code in the middle end does too. >> >> Another problem is that it checks types of uniform arguments. That is >> unnecessary, uniform arguments are passed the way it normally is, it is >> a scalar argument rather than vector, so there is no reason not to support >> uniform argument of different size, or long double, structure etc. >> >> Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk? > > I don't have much experience in this part of GCC and I think you're best > placed to make a judgement on the implementation of this hook. > It's okay from my perspective (it doesn't look like it would break any SVE > logic), but if you'd like to wait for a review from Richard I won't object. Yeah, it looks good to me too, sorry for the slow response. IMO: tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0; t && t != void_list_node; t = TREE_CHAIN (t), i++) { tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); shows we have an interface problem here -- we shouldn't need something this convoluted to walk through an argument list. But that's not your problem or a problem with the patch. Thanks, Richard
RE: [PATCH] aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]
Hi Jakub, > -Original Message- > From: Jakub Jelinek > Sent: 13 March 2021 20:34 > To: Richard Sandiford ; Richard Earnshaw > ; Kyrylo Tkachov > Cc: gcc-patches@gcc.gnu.org > Subject: [PATCH] aarch64: Fix up > aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542] > > Hi! > > As the patch shows, there are several bugs in > aarch64_simd_clone_compute_vecsize_and_simdlen. > One is that unlike for function declarations that aren't definitions > it completely ignores argument types. Such decls don't have > DECL_ARGUMENTS, > but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like > the simd cloning code in the middle end does too. > > Another problem is that it checks types of uniform arguments. That is > unnecessary, uniform arguments are passed the way it normally is, it is > a scalar argument rather than vector, so there is no reason not to support > uniform argument of different size, or long double, structure etc. > > Fixed thusly, bootstrapped/regtested on aarch64-linux, ok for trunk? I don't have much experience in this part of GCC and I think you're best placed to make a judgement on the implementation of this hook. It's okay from my perspective (it doesn't look like it would break any SVE logic), but if you'd like to wait for a review from Richard I won't object. Thanks, Kyrill > > 2021-03-13 Jakub Jelinek > > PR target/99542 > * config/aarch64/aarch64.c > (aarch64_simd_clone_compute_vecsize_and_simdlen): If not a > function > definition, walk TYPE_ARG_TYPES list if non-NULL for argument > types > instead of DECL_ARGUMENTS. Ignore types for uniform arguments. > > * gcc.dg/gomp/pr99542.c: New test. > * gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on > aarch64. > * gcc.dg/gomp/simd-clones-2.c (setArray): Likewise. > * g++.dg/vect/simd-clone-7.cc (bar): Likewise. > * g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning > on aarch64. > * gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64. 
> > --- gcc/config/aarch64/aarch64.c.jj 2021-03-12 19:01:48.672296344 > +0100 > +++ gcc/config/aarch64/aarch64.c 2021-03-13 09:16:42.585045538 > +0100 > @@ -23412,11 +23412,17 @@ > aarch64_simd_clone_compute_vecsize_and_s >return 0; > } > > - for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t)) > + int i; > + tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); > + bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); > + > + for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i > = 0; > + t && t != void_list_node; t = TREE_CHAIN (t), i++) > { > - arg_type = TREE_TYPE (t); > + tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); > > - if (!currently_supported_simd_type (arg_type, base_type)) > + if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM > + && !currently_supported_simd_type (arg_type, base_type)) > { > if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type)) > warning_at (DECL_SOURCE_LOCATION (node->decl), 0, > --- gcc/testsuite/gcc.dg/gomp/pr99542.c.jj2021-03-13 > 09:16:42.586045527 +0100 > +++ gcc/testsuite/gcc.dg/gomp/pr99542.c 2021-03-13 > 09:16:42.586045527 +0100 > @@ -0,0 +1,17 @@ > +/* PR middle-end/89246 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-O0 -fopenmp-simd" } */ > + > +#pragma omp declare simd > +extern int foo (__int128 x); /* { dg-warning "GCC does not currently > support mixed size types for 'simd' function" "" { target aarch64*-*-* } } */ > +/* { dg-warning "unsupported argument type '__int128' for simd" "" { target > i?86-*-* x86_64-*-* } .-1 } */ > + > +#pragma omp declare simd uniform (x) > +extern int baz (__int128 x); > + > +#pragma omp declare simd > +int > +bar (int x) > +{ > + return x + foo (0) + baz (0); > +} > --- gcc/testsuite/gcc.dg/gomp/pr59669-2.c.jj 2020-01-12 > 11:54:37.430398065 +0100 > +++ gcc/testsuite/gcc.dg/gomp/pr59669-2.c 2021-03-13 > 09:35:07.100807877 +0100 > @@ -7,4 +7,3 @@ void > bar (int *a) > { > } > -/* { dg-warning "GCC 
does not currently support mixed size types for 'simd' > functions" "" { target aarch64*-*-* } .-3 } */ > --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c.jj 2020-01-12 > 11:54:37.431398050 +0100 > +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c 2021-03-13 > 09:36:21.143988256 +0100 > @@ -15,7 +15,6 @@ float setArray(float *a, float x, int k) >return a[k]; > } > > -/* { dg-warning "GCC does not currently support mixed size types for 'simd' > functions" "" { target aarch64*-*-* } .-6 } */ > /* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" > { target i?86-*-* x86_64-*-* } } } */ > /* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target > i?86-*-* x86_64-*-* } } } */ > /* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target > i?86-*-* x86_64-*-* }
Re: [PATCH] IBM Z: Fix "+fvm" constraint with long doubles
On 3/16/21 1:16 AM, Ilya Leoshkevich wrote: > Bootstrapped and regtested on s390x-redhat-linux. Ok for master? > > > > When a long double is passed to an asm statement with a "+fvm" > constraint, a LRA loop occurs. This happens, because LRA chooses the > widest register class in this case (VEC_REGS), but the code generated > by s390_md_asm_adjust() always wants FP_REGS. Mismatching register > classes cause infinite reloading. > > Fix by treating "fv" constraints as "v" in s390_md_asm_adjust(). > > gcc/ChangeLog: > > * config/s390/s390.c (f_constraint_p): Treat "fv" constraints > as "v". > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test. Ok. Thanks! Andreas > --- > gcc/config/s390/s390.c | 12 ++-- > .../s390/vector/long-double-asm-fprvrmem.c | 11 +++ > 2 files changed, 21 insertions(+), 2 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c > > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c > index 151136bedbc..f7b1c03561e 100644 > --- a/gcc/config/s390/s390.c > +++ b/gcc/config/s390/s390.c > @@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode) > static bool > f_constraint_p (const char *constraint) > { > + bool seen_f_p = false; > + bool seen_v_p = false; > + >for (size_t i = 0, c_len = strlen (constraint); i < c_len; > i += CONSTRAINT_LEN (constraint[i], constraint + i)) > { >if (constraint[i] == 'f') > - return true; > + seen_f_p = true; > + if (constraint[i] == 'v') > + seen_v_p = true; > } > - return false; > + > + /* Treat "fv" constraints as "v", because LRA will choose the widest > register > + * class. 
*/ > + return seen_f_p && !seen_v_p; > } > > /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" > diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c > b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c > new file mode 100644 > index 000..f95656c5723 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -march=z14 -mzarch" } */ > + > +long double > +foo (long double x) > +{ > + x = x * x; > + asm("# %0" : "+fvm"(x)); > + x = x + x; > + return x; > +} >