Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On 6/29/23 6:55 PM, Richard Biener wrote, replying in the thread below.

On Mon, Jun 26, 2023 at 8:24 AM, Tejas Belagod via Gcc-patches wrote:

Hi,

Packed Boolean Vectors
----------------------

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV). This has been discussed in the past here [1] and a
variant has been implemented in Clang recently [2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature to have to model predication on targets. This could
find its use in intrinsics, or be used as-is as a GNU vector extension mapped
to underlying target features. For example, the packed boolean vector could
directly map to a predicate register on SVE. Also, this new packed boolean
type GNU extension can be used with SVE ACLE intrinsics to replace a
fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.

On Monday, June 26, 2023 at 2:23 PM, Richard Biener wrote:

The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

It gets you a vector type that's the appropriate (dependent on the target)
vector mask type for the vector data type (v8si in this case).

On Tue, Jun 27, 2023 at 8:30 AM, Tejas Belagod wrote:

Thanks Richard. Having had a quick look at the implementation, it does seem to
tick the boxes. I must admit I haven't dug deep, but if the target hook allows
the mask to be defined in a way that is target-friendly (and I don't know how
much effort it will be to migrate the attribute to more front-ends), it should
do the job nicely.
Let me go back and dig a bit deeper, and get back with questions if any.

Let me add that the advantage of this is that the compiler doesn't need to
support weird explicitly laid out packed boolean vectors that do not match
what the target supports, and the user doesn't need to know what the target
supports (and thus have an #ifdef maze around explicitly specified layouts).

Sorry for the delayed response -- I spent a day experimenting with
vector_mask. Yeah, this is what option 4 in the RFC is trying to achieve --
be portable enough to avoid having to sprinkle the code with ifdefs.

It does remove some flexibility though. For example, with -mavx512f
-mavx512vl you'll get AVX512-style masks for V4SImode data vectors, but of
course the target still supports SSE2/AVX2-style masks as well; those would
not be available as "packed boolean vectors", though they are of course in
fact equal to V4SImode data vectors with -1 or 0 values, so in this
particular case it might not matter. That said, the vector_mask attribute
will get you V4SImode vectors with signed boolean elements of 32 bits for
V4SImode data vectors with SSE2/AVX2.

This sounds very much like what the scenario would be with NEON vs SVE.
Come to think of it, vector_mask resembles option 4 in the proposal, with 'n'
implied by the 'base' vector type and a 'w' specified for the type. Given its
current implementation, if vector_mask is exposed to the CFE, would there be
any major challenges wrt implementation or defining behaviour semantics? I
played around with a few examples from the testsuite and wrote some new ones.
I mostly tried operations that the new type would have to support (unary,
binary bitwise, initializations, etc.) -- with a couple of exceptions, most
of the ops seem to be supported. I also triggered a couple of ICEs in some
tests involving implicit conversions to wider/narrower vector_mask types
(will raise reports for these).
Correct me if I'm wrong here, but we'd probably have to support a couple of
new ops if vector_mask is exposed to the CFE -- initialization and subscript
operations?

Yes, either that or restrict how the mask vectors can be used, thus properly
diagnose improper uses.

Indeed.

A question would be, for example, how to write common mask test operations
like if (any (mask)) or if (all (mask)).

I see two options here. New builtins could support the new types -- they'd
provide a target-independent way to test the any and all conditions. Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered down to anyway).

Likewise writing merge operations -- do those as a = a | (mask ? b : 0);
thus use ternary ?: for this?

Yes, like now, the ternary could just translate to
{mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }

One thing to flesh out is the semantics. Should we allow this operation as long as the num
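The merge and any/all operations discussed above can already be modelled with ordinary GNU integer vectors whose lanes hold 0 or -1 (the SSE2/AVX2-style mask representation mentioned earlier in the thread). The sketch below is illustrative only; it does not use the proposed packed boolean type, and the scalar any/all loops stand in for what builtins or target intrinsics would do:

```c
/* Sketch: mask operations on ordinary GNU integer vectors with 0/-1
   lanes, i.e. the SSE2/AVX2-style mask representation discussed above.
   Not the proposed packed boolean type -- illustration only.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* a = a | (mask ? b : 0), written with bitwise ops so it works for
   0/-1 masks without relying on the vector ?: extension.  */
static v4si
merge_or (v4si a, v4si mask, v4si b)
{
  return a | (mask & b);
}

/* Scalar fallbacks for the any()/all() tests discussed above; a real
   implementation would lower these to a target predicate test.  */
static int
mask_any (v4si m)
{
  for (int i = 0; i < 4; i++)
    if (m[i])
      return 1;
  return 0;
}

static int
mask_all (v4si m)
{
  for (int i = 0; i < 4; i++)
    if (!m[i])
      return 0;
  return 1;
}
```

The bitwise form sidesteps the question of whether the vector ?: operator is available in the C front end; with 0/-1 lanes, `mask & b` is exactly the element-wise `mask[i] ? b[i] : 0` expansion quoted above.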
[PATCH, rs6000] Extract the element in dword0 by mfvsrd and shift/mask [PR110331]
Hi,

This patch implements vector element extraction by mfvsrd and shift/mask when
the element is in dword0 of the vector. Originally, it generates
vsplat/mfvsrd on P8 and li/vextract on P9. Since mfvsrd has lower latency
than vextract, and rldicl has lower latency than vsplat, the new sequence is
beneficial. In particular, the shift/mask is not needed when the element is
the first element of dword0, so it saves another rldicl when a sign-extended
value is returned.

This patch is based on the previous one:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Extract the element in dword0 by mfvsrd and shift/mask

gcc/
	PR target/110331
	* config/rs6000/rs6000-protos.h (rs6000_vsx_element_in_dword0_p):
	Declare.
	(rs6000_vsx_extract_element_from_dword0): Declare.
	* config/rs6000/rs6000.cc (rs6000_vsx_element_in_dword0_p): New
	function to judge if an element is in dword0 of a vector.
	(rs6000_vsx_extract_element_from_dword0): Extract an element from
	dword0 by mfvsrd and lshiftrt and mask.
	* config/rs6000/rs6000.md (*rotl<mode>3_mask): Rename to...
	(rotl<mode>3_mask): ...this.
	* config/rs6000/vsx.md (vsx_extract_<mode>): Add a comment.
	(split pattern for p9 vector extract): Call
	rs6000_vsx_extract_element_from_dword0 if the element is in dword0.
	(*vsx_extract_<mode>_di_p9): Exclude the elements in dword0 which
	are processed by *vsx_extract_<mode>_zero_extend for both p8 and p9.
	(*vsx_extract_<mode>_zero_extend): Zero-extend pattern for vector
	extract on the element of dword0.
	(*vsx_extract_<mode>_p8): Call
	rs6000_vsx_extract_element_from_dword0 when the extracted element
	is in dword0. Refine the pattern and remove reload_completed from
	the split condition.

gcc/testsuite/
	PR target/110331
	* gcc.target/powerpc/fold-vec-extract-char.p8.c: Set the extracted
	elements in dword1.
	* gcc.target/powerpc/fold-vec-extract-char.p9.c: Likewise.
	* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
	* gcc.target/powerpc/fold-vec-extract-int.p9.c: Likewise.
	* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
	* gcc.target/powerpc/fold-vec-extract-short.p9.c: Likewise.
	* gcc.target/powerpc/p9-extract-1.c: Likewise.
	* gcc.target/powerpc/pr110331-p8.c: New.
	* gcc.target/powerpc/pr110331-p9.c: New.
	* gcc.target/powerpc/pr110331.h: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..ccef280122b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -161,6 +161,8 @@ extern bool rs6000_function_pcrel_p (struct function *);
 extern bool rs6000_pcrel_p (void);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 extern void rs6000_output_addr_vec_elt (FILE *, int);
+extern bool rs6000_vsx_element_in_dword0_p (rtx, enum machine_mode);
+extern void rs6000_vsx_extract_element_from_dword0 (rtx, rtx, rtx, bool);

 /* Different PowerPC instruction formats that are used by GCC.  There are
    various other instruction formats used by the PowerPC hardware, but these
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..fad01d6b5dd 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -29098,6 +29098,74 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt)
   return false;
 }

+/* Return true when the element is in dword0 of a vector.  Exclude word
+   element 1 of V4SI as the word can be extracted by mfvsrwz directly.  */
+
+bool
+rs6000_vsx_element_in_dword0_p (rtx op, enum machine_mode mode)
+{
+  gcc_assert (CONST_INT_P (op));
+  gcc_assert (mode == V16QImode || mode == V8HImode || mode == V4SImode);
+
+  int units = GET_MODE_NUNITS (mode);
+  int elt = INTVAL (op);
+  elt = BYTES_BIG_ENDIAN ? units - 1 - elt : elt;
+
+  if (elt > units / 2
+      || (elt == units / 2 && mode != V4SImode))
+    return true;
+  else
+    return false;
+}
+
+/* Extract element from dword0 by mfvsrd and lshiftrt and mask.
+   Extend_p indicates if zero extend is needed or not.  */
+
+void
+rs6000_vsx_extract_element_from_dword0 (rtx dest, rtx src, rtx element,
+					bool extend_p)
+{
+  enum machine_mode mode = GET_MODE (src);
+  gcc_assert (rs6000_vsx_element_in_dword0_p (element, mode));
+
+  enum machine_mode dest_mode = GET_MODE (dest);
+  enum machine_mode inner_mode = GET_MODE_INNER (mode);
+  int units = GET_MODE_NUNITS (mode);
+  int elt = INTVAL (element);
+  elt = BYTES_BIG_ENDIAN ? units - 1 - elt : elt;
+  int value, shift;
+  unsigned int mask;
+
+  rtx vec_tmp = gen_lowpart (V2DImode, src);
+  rtx tmp1 = can_create_pseudo_p ()
+	       ? gen_reg_rtx (DImode)
+	       : simplify_gen_subreg (DImode, dest, dest_mode,
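Once mfvsrd has moved dword0 into a GPR, the extraction above amounts to a right shift followed by a mask, which is what rldicl implements. A scalar model of that arithmetic, with the lane-numbering convention stated as an assumption (this helper is illustrative, not part of the patch):

```c
/* Scalar model of extracting a byte element from a 64-bit doubleword
   after mfvsrd: a right shift followed by a mask, as rldicl does.
   The lane numbering here is an assumption for illustration: on
   little-endian, byte element ELT sits at bit offset 8 * ELT; on
   big-endian, at 8 * (7 - ELT).  */
#include <stdint.h>

static uint8_t
extract_u8_from_dword (uint64_t dword, int elt, int big_endian_p)
{
  int shift = big_endian_p ? 8 * (7 - elt) : 8 * elt;
  return (uint8_t) ((dword >> shift) & 0xff);
}
```

When the element is the first one of dword0, the shift amount is zero and the mask is absorbed by the destination width, which is why the patch can drop the extra rldicl in that case.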
[PATCH] Middle-end: Change order of LEN_MASK_LOAD/LEN_MASK_STORE arguments
From: Ju-Zhe Zhong

Hi, Richard and Richi.

According to Richard's review comments:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
the current {len, mask, bias} order is not reasonable. Change
{len, mask, bias} into {len, bias, mask}.

This patch is a simple fix; OK for trunk?

gcc/ChangeLog:

	* config/riscv/autovec.md: Change order of LEN_MASK_LOAD/
	LEN_MASK_STORE arguments.
	* config/riscv/riscv-v.cc (expand_load_store): Ditto.
	* doc/md.texi: Ditto.
	* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Ditto.
	* internal-fn.cc (len_maskload_direct): Ditto.
	(len_maskstore_direct): Ditto.
	(add_len_bias_and_mask_args): Ditto.
	(expand_partial_load_optab_fn): Ditto.
	(expand_partial_store_optab_fn): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_bias_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	(internal_len_load_store_bias): Ditto.
	* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
	* tree-vect-stmts.cc (vectorizable_store): Ditto.
	(vectorizable_load): Ditto.
---
 gcc/config/riscv/autovec.md |   8 +-
 gcc/config/riscv/riscv-v.cc |   2 +-
 gcc/doc/md.texi             |  16 ++--
 gcc/gimple-fold.cc          |   7 +-
 gcc/internal-fn.cc          | 176 +---
 gcc/tree-ssa-dse.cc         |  13 +--
 gcc/tree-vect-stmts.cc      |   6 +-
 7 files changed, 90 insertions(+), 138 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1488f2be1be..4ab0e9f99eb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -26,8 +26,8 @@
   [(match_operand:V 0 "register_operand")
    (match_operand:V 1 "memory_operand")
    (match_operand 2 "autovec_length_operand")
-   (match_operand:<VM> 3 "vector_mask_operand")
-   (match_operand 4 "const_0_operand")]
+   (match_operand 3 "const_0_operand")
+   (match_operand:<VM> 4 "vector_mask_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_load_store (operands, true);
@@ -38,8 +38,8 @@
   [(match_operand:V 0 "memory_operand")
    (match_operand:V 1 "register_operand")
    (match_operand 2 "autovec_length_operand")
-   (match_operand:<VM> 3 "vector_mask_operand")
-   (match_operand 4 "const_0_operand")]
+   (match_operand 3 "const_0_operand")
+   (match_operand:<VM> 4 "vector_mask_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_load_store (operands, false);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index adb8d7d36a5..8d5bed7ebe4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2777,7 +2777,7 @@ expand_load_store (rtx *ops, bool is_load)
 {
   poly_int64 value;
   rtx len = ops[2];
-  rtx mask = ops[3];
+  rtx mask = ops[4];
   machine_mode mode = GET_MODE (ops[0]);
   if (poly_int_rtx_p (len, &value)
       && known_eq (value, GET_MODE_NUNITS (mode)))
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index b44d1ba3af9..f14dd32b2dc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5319,15 +5319,15 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{len_maskload@var{m}@var{n}} instruction pattern
 @item @samp{len_maskload@var{m}@var{n}}
 Perform a masked load from the memory location pointed to by operand 1
-into register operand 0.  (operand 2 + operand 4) elements are loaded from
+into register operand 0.  (operand 2 + operand 3) elements are loaded from
 memory and other elements in operand 0 are set to undefined values.
 This is a combination of len_load and maskload.
 Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
 has whichever integer mode the target prefers.  A mask is specified in
-operand 3 which must be of type @var{n}.  The mask has lower precedence than
+operand 4 which must be of type @var{n}.  The mask has lower precedence than
 the length and is itself subject to length masking,
-i.e. only mask indices < (operand 2 + operand 4) are used.
-Operand 4 conceptually has mode @code{QI}.
+i.e. only mask indices < (operand 2 + operand 3) are used.
+Operand 3 conceptually has mode @code{QI}.

 Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
 constant bias: it is either a constant 0 or a constant -1.  The predicate on
@@ -5346,14 +5346,14 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
 @item @samp{len_maskstore@var{m}@var{n}}
 Perform a masked store from vector register operand 1 into memory operand 0.
-(operand 2 + operand 4) elements are stored to memory
+(operand 2 + operand 3) elements are stored to memory
 and leave the other elements of operand 0 unchanged.
 This is a combination of len_store and maskstore.
 Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 has whichever
-integer mode the target
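The "(operand 2 + operand 3)" wording above can be restated as a scalar reference model: element i is loaded only when i < len + bias and mask[i] is set. A sketch under that reading (the md.texi text leaves unloaded lanes undefined; this model zeroes them purely for determinism):

```c
/* Reference model of len_maskload semantics after this patch, with the
   new operand order {len, bias, mask}: element i is loaded iff
   i < len + bias and mask[i] is set.  The actual pattern leaves the
   other elements undefined; zeroing them here is a modelling choice,
   not part of the spec.  */
static void
len_maskload_model (int *dst, const int *src, int nunits,
		    int len, int bias, const unsigned char *mask)
{
  for (int i = 0; i < nunits; i++)
    dst[i] = (i < len + bias && mask[i]) ? src[i] : 0;
}
```

This also shows why the mask has lower precedence than the length: a set mask bit at an index beyond len + bias has no effect.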
Re: [PATCH 0/9] vect: Move costing next to the transform for vect load
Hi Richi,

Thanks for your review comments on this and some others!

on 2023/6/30 19:37, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggests structuring vectorizable_load to make costing
>> next to the transform, in order to make it easier to keep
>> costing and the transform in sync.  For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, firstly it follows the handlings in the function
>> vect_model_load_cost to avoid any missing, then refines
>> further by referring to the transform code; I also checked
>> them with some typical test cases to verify.  Hope the
>> subjects of the patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> By considering the current vector test buckets are mainly
>> tested without cost model, I also verified the whole patch
>> series was neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
>
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should

At first glance, the simple parts look to be the handlings for
VMAT_LOAD_STORE_LANES and VMAT_GATHER_SCATTER (with ifn and emulated).
It seems fairly straightforward if it's fine to duplicate the nested loop,
but we may need to take care to remove some useless code.

> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.

I've tried a bit to merge them where possible, like the place to check
VMAT_CONTIGUOUS, VMAT_CONTIGUOUS_REVERSE and VMAT_CONTIGUOUS_PERMUTE.
But I will keep this in mind for the following updates.

> So what
> this series doesn't improve is overall readability of the code (indent and
> our 80 char line limit).

Sorry about that.

> The change also makes it more difficult(?) to separate analysis and
> transform though in the end I hope that analysis will actually "code
> generate" to a (SLP) data structure so the target will have a chance to
> see the actual flow of insns.
>
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.
>
> Are you willing to followup with doing the same re-structuring to
> vectorizable_store?

Yes, vectorizable_store was also pointed out in your original suggestion [1];
I planned to update it once this series meets your expectations and gets
landed.

> OK from my side with the few comments addressed.  The patch likely needs
> refresh after the RVV changes in this area?

Thanks! Yes, I've updated 2/9 and 3/9 according to your comments, and updated
5/9 and 9/9 as they had some conflicts when rebasing.  Re-testing is ongoing;
do the updated versions look good to you?  Is this series OK for trunk if all
the test runs go well again as before?

BR,
Kewen
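The restructuring under review follows one shape throughout the series: a single loop nest serves both the analysis and transform phases, and costing_p decides between recording a cost and emitting a statement. A stripped-down sketch of that shape (all names here are illustrative, not the actual GCC ones):

```c
/* Stripped-down sketch of the "costing next to the transform" shape:
   one loop nest serves both phases, and costing_p picks between
   recording a cost and emitting the statement.  Names are
   illustrative, not the actual GCC ones.  */
static int
process_copies (int costing_p, int ncopies, int load_cost_per_copy,
		int *emitted /* out: number of stmts "emitted" */)
{
  int inside_cost = 0;
  *emitted = 0;
  for (int j = 0; j < ncopies; j++)
    {
      if (costing_p)
	{
	  /* Analysis phase: record the cost right where the transform
	     would emit the load, so the two cannot drift apart.  */
	  inside_cost += load_cost_per_copy;
	  continue;
	}
      /* Transform phase: emit the load (modelled as a counter).  */
      (*emitted)++;
    }
  return inside_cost;
}
```

The if (!costing_p) checks being discussed above are the per-step guards this shape accumulates; duplicating the loop nest per VMAT_* case would trade those guards for code duplication.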
Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
Hi Richard,

on 2023/7/2 16:58, Richard Sandiford wrote:
> Kewen Lin writes:
>> @@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo,
>>        unsigned int n_groups = 0;
>>        for (j = 0; j < ncopies; j++)
>>  	{
>> -	  if (nloads > 1)
>> +	  if (nloads > 1 && !costing_p)
>>  	    vec_alloc (v, nloads);
>>  	  gimple *new_stmt = NULL;
>>  	  for (i = 0; i < nloads; i++)
>>  	    {
>> +	      if (costing_p)
>> +		{
>> +		  if (VECTOR_TYPE_P (ltype))
>> +		    vect_get_load_cost (vinfo, stmt_info, 1,
>> +					alignment_support_scheme, misalignment,
>> +					false, &inside_cost, nullptr, cost_vec,
>> +					cost_vec, true);
>> +		  else
>> +		    inside_cost += record_stmt_cost (cost_vec, 1, scalar_load,
>> +						     stmt_info, 0, vect_body);
>> +		  continue;
>> +		}
>
> Just a note that this might make life harder for AArch64 costing.
> Strided SLP loads are still equivalent to vector loads for AArch64,
> since they happen on the FPR/vector side even if they have integral
> modes.
>
> But I agree this is more accurate from a general target-independent POV,
> especially given the relatively coarse-grain costing enum.  So I think
> that's just something AArch64 will need to account for.

Sorry for the inconvenience. Hope accounting for it with a target hook on
vect costing isn't very complicated.

BR,
Kewen
[PATCH 9/9 v2] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS
This is version v2, rebased on the latest trunk.

=====

This patch adjusts the cost handling on VMAT_CONTIGUOUS in function
vectorizable_load.  We don't call function vect_model_load_cost for it any
more.  It removes function vect_model_load_cost, which becomes useless and
unreachable now.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_model_load_cost): Remove.
	(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS
	without calling vect_model_load_cost.
---
 gcc/tree-vect-stmts.cc | 94 --
 1 file changed, 9 insertions(+), 85 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0a9a75ce3c7..9dfe1903181 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1118,75 +1118,6 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
     }
 }

-
-/* Function vect_model_load_cost
-
-   Models cost for loads.  In the case of grouped accesses, one access has
-   the overhead of the grouped access attributed to it.  Since unaligned
-   accesses are supported for loads, we also account for the costs of the
-   access scheme chosen.  */
-
-static void
-vect_model_load_cost (vec_info *vinfo,
-		      stmt_vec_info stmt_info, unsigned ncopies, poly_uint64 vf,
-		      vect_memory_access_type memory_access_type,
-		      dr_alignment_support alignment_support_scheme,
-		      int misalignment,
-		      slp_tree slp_node,
-		      stmt_vector_for_cost *cost_vec)
-{
-  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
-
-  unsigned int inside_cost = 0, prologue_cost = 0;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
-
-  gcc_assert (cost_vec);
-
-  /* ??? Somehow we need to fix this at the callers.  */
-  if (slp_node)
-    ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  if (slp_node && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
-    {
-      /* If the load is permuted then the alignment is determined by
-	 the first group element not by the first scalar stmt DR.  */
-      stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-      if (!first_stmt_info)
-	first_stmt_info = stmt_info;
-      /* Record the cost for the permutation.  */
-      unsigned n_perms, n_loads;
-      vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL,
-				    vf, true, &n_perms, &n_loads);
-      inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
-				       first_stmt_info, 0, vect_body);
-
-      /* And adjust the number of loads performed.  This handles
-	 redundancies as well as loads that are later dead.  */
-      ncopies = n_loads;
-    }
-
-  /* Grouped loads read all elements in the group at once,
-     so we want the DR for the first statement.  */
-  stmt_vec_info first_stmt_info = stmt_info;
-  if (!slp_node && grouped_access_p)
-    first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
-     the cost of the statement itself.  For SLP we only get called
-     once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-		      misalignment, first_stmt_p, &inside_cost, &prologue_cost,
-		      cost_vec, cost_vec, true);
-
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-		     "vect_model_load_cost: inside_cost = %d, "
-		     "prologue_cost = %d .\n", inside_cost, prologue_cost);
-}
-
-
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
@@ -10830,7 +10761,8 @@ vectorizable_load (vec_info *vinfo,
 		     we only need to count it once for the whole group.  */
 		  bool first_stmt_info_p = first_stmt_info == stmt_info;
 		  bool add_realign_cost = first_stmt_info_p && i == 0;
-		  if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
+		  if (memory_access_type == VMAT_CONTIGUOUS
+		      || memory_access_type == VMAT_CONTIGUOUS_REVERSE
 		      || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
 			  && (!grouped_load || first_stmt_info_p)))
 		    vect_get_load_cost (vinfo, stmt_info, 1,
@@ -10954,15 +10886,14 @@ vectorizable_load (vec_info *vinfo,
 	 direct vect_transform_slp_perm_load to DCE the unused parts.
 	 ??? This is a hack to prevent compile-time issues as seen
 	 in PR101120 and friends.  */
-      if (costing_p
-	  && memory_access_type != VMAT_CONTIGUOUS)
+      if (costing_p)
 	{
 	  vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
[PATCH 5/9 v2] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
This is version v2, rebased on the latest trunk.

=====

This patch adjusts the cost handling on VMAT_GATHER_SCATTER in function
vectorizable_load.  We don't call function vect_model_load_cost for it any
more.  It's mainly for gather loads with IFN or emulated gather loads; it
follows the handlings in function vect_model_load_cost.  This patch
shouldn't have any functional changes.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
	on VMAT_GATHER_SCATTER without calling vect_model_load_cost.
	(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
	remove VMAT_GATHER_SCATTER related handlings and the related
	parameter gs_info.
---
 gcc/tree-vect-stmts.cc | 124 +
 1 file changed, 76 insertions(+), 48 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7d8e72bda67..1ae917db627 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1132,11 +1132,10 @@ vect_model_load_cost (vec_info *vinfo,
 		      vect_memory_access_type memory_access_type,
 		      dr_alignment_support alignment_support_scheme,
 		      int misalignment,
-		      gather_scatter_info *gs_info,
 		      slp_tree slp_node,
 		      stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
 	      && memory_access_type != VMAT_INVARIANT
 	      && memory_access_type != VMAT_ELEMENTWISE
 	      && memory_access_type != VMAT_STRIDED_SLP);
@@ -1225,35 +1224,9 @@ vect_model_load_cost (vec_info *vinfo,
 					 group_size);
     }

-  /* The loads themselves.  */
-  if (memory_access_type == VMAT_GATHER_SCATTER)
-    {
-      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-      if (memory_access_type == VMAT_GATHER_SCATTER
-	  && gs_info->ifn == IFN_LAST && !gs_info->decl)
-	/* For emulated gathers N offset vector element extracts
-	   (we assume the scalar scaling and ptr + offset add is consumed by
-	   the load).  */
-	inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-					 vec_to_scalar, stmt_info, 0,
-					 vect_body);
-      /* N scalar loads plus gathering them into a vector.  */
-      inside_cost += record_stmt_cost (cost_vec,
-				       ncopies * assumed_nunits,
-				       scalar_load, stmt_info, 0, vect_body);
-    }
-  else
-    vect_get_load_cost (vinfo, stmt_info, ncopies,
-			alignment_support_scheme, misalignment, first_stmt_p,
-			&inside_cost, &prologue_cost,
-			cost_vec, cost_vec, true);
-
-  if (memory_access_type == VMAT_GATHER_SCATTER
-      && gs_info->ifn == IFN_LAST
-      && !gs_info->decl)
-    inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
-				     stmt_info, 0, vect_body);
+  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+		      misalignment, first_stmt_p, &inside_cost, &prologue_cost,
+		      cost_vec, cost_vec, true);

   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -10283,6 +10256,7 @@ vectorizable_load (vec_info *vinfo,
     }
   tree vec_mask = NULL_TREE;
   poly_uint64 group_elt = 0;
+  unsigned int inside_cost = 0;
   for (j = 0; j < ncopies; j++)
     {
       /* 1. Create the vector or array pointer update chain.  */
@@ -10414,23 +10388,26 @@ vectorizable_load (vec_info *vinfo,
 	  /* Record that VEC_ARRAY is now dead.  */
 	  vect_clobber_variable (vinfo, stmt_info, gsi, vec_array);
 	}
-      else if (!costing_p)
+      else
 	{
 	  for (i = 0; i < vec_num; i++)
 	    {
 	      tree final_mask = NULL_TREE;
-	      if (loop_masks
-		  && memory_access_type != VMAT_INVARIANT)
-		final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-						 vec_num * ncopies,
-						 vectype, vec_num * j + i);
-	      if (vec_mask)
-		final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
-					       final_mask, vec_mask, gsi);
-
-	      if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-		dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
-					       gsi, stmt_info, bump);
+	      if (!costing_p)
+		{
+		  if (loop_masks && memory_access_type != VMAT_INVARIANT)
+		    final_mask
[PATCH 3/9 v2] vect: Adjust vectorizable_load costing on VMAT_INVARIANT
Hi Richi,

on 2023/6/30 19:18, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin wrote:
>>
>> This patch adjusts the cost handling on VMAT_INVARIANT in
>> function vectorizable_load.  We don't call function
>> vect_model_load_cost for it any more.
>>
>> To make the costing on VMAT_INVARIANT better, this patch is
>> to query hoist_defs_of_uses for hoisting decision, and add
>> costs for different "where" based on it.  Currently function
>> hoist_defs_of_uses would always hoist the defs of all SSA
>> uses, adding one argument HOIST_P aims to avoid the actual
>> hoisting during costing phase.
>>
>> gcc/ChangeLog:
>>
>> 	* tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
>> 	(vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
>> 	hoisting decision and without calling vect_model_load_cost.
>> 	(vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
>> 	and remove VMAT_INVARIANT related handlings.
>> ---
>>  gcc/tree-vect-stmts.cc | 61 +++---
>>  1 file changed, 39 insertions(+), 22 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 744cdf40e26..19c61d703c8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>>  		      slp_tree slp_node,
>>  		      stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
>> +  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
>> +	      && memory_access_type != VMAT_INVARIANT);
>>
>>    unsigned int inside_cost = 0, prologue_cost = 0;
>>    bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
>>  				       ncopies * assumed_nunits,
>>  				       scalar_load, stmt_info, 0, vect_body);
>>      }
>> -  else if (memory_access_type == VMAT_INVARIANT)
>> -    {
>> -      /* Invariant loads will ideally be hoisted and splat to a vector.  */
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -					 scalar_load, stmt_info, 0,
>> -					 vect_prologue);
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -					 scalar_to_vec, stmt_info, 0,
>> -					 vect_prologue);
>> -    }
>>    else
>>      vect_get_load_cost (vinfo, stmt_info, ncopies,
>>  			alignment_support_scheme, misalignment, first_stmt_p,
>> @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
>>  /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
>>     inserting them on the loops preheader edge.  Returns true if we
>>     were successful in doing so (and thus STMT_INFO can be moved then),
>> -   otherwise returns false.  */
>> +   otherwise returns false.  HOIST_P indicates if we want to hoist the
>> +   definitions of all SSA uses, it would be false when we are costing.  */
>>
>>  static bool
>> -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>> +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
>>  {
>>    ssa_op_iter i;
>>    tree op;
>> @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>>    if (!any)
>>      return true;
>>
>> +  if (!hoist_p)
>> +    return true;
>> +
>>    FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
>>      {
>>        gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>> @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
>>
>>    if (memory_access_type == VMAT_INVARIANT)
>>      {
>> -      if (costing_p)
>> -	{
>> -	  vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
>> -				memory_access_type, alignment_support_scheme,
>> -				misalignment, &gs_info, slp_node, cost_vec);
>> -	  return true;
>> -	}
>> -
>>        gcc_assert (!grouped_load && !mask && !bb_vinfo);
>>        /* If we have versioned for aliasing or the loop doesn't
>>  	 have any data dependencies that would preclude this,
>> @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
>>  	 thus we can insert it on the preheader edge.  */
>>        bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
>>  		      && !nested_in_vect_loop
>> -		      && hoist_defs_of_uses (stmt_info, loop));
>> +		      && hoist_defs_of_uses (stmt_info, loop, !costing_p));
>
> 'hoist_defs_of_uses' should ideally be computed once at analysis time and
> the result remembered.  It's not so easy in this case so maybe just
> add a comment for this here.

Ok, updated with:

      /* If we have versioned for aliasing or t
[PATCH 2/9 v2] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
Hi Richi, on 2023/6/30 19:11, Richard Biener wrote: > On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin wrote: >> >> This patch adds one extra argument cost_vec to function >> vect_build_gather_load_calls, so that we can do costing >> next to the tranform in vect_build_gather_load_calls. >> For now, the implementation just follows the handlings in >> vect_model_load_cost, it isn't so good, so placing one >> FIXME for any further improvement. This patch should not >> cause any functional changes. >> >> gcc/ChangeLog: >> >> * tree-vect-stmts.cc (vect_build_gather_load_calls): Add the >> handlings >> on costing with one extra argument cost_vec. >> (vectorizable_load): Adjust the call to vect_build_gather_load_calls. >> (vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with >> gs_info.decl set any more. >> --- >> gcc/tree-vect-stmts.cc | 31 +++ >> 1 file changed, 23 insertions(+), 8 deletions(-) >> >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index 44514658be3..744cdf40e26 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -1135,6 +1135,8 @@ vect_model_load_cost (vec_info *vinfo, >> slp_tree slp_node, >> stmt_vector_for_cost *cost_vec) >> { >> + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl); >> + >>unsigned int inside_cost = 0, prologue_cost = 0; >>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); >> >> @@ -2819,7 +2821,8 @@ vect_build_gather_load_calls (vec_info *vinfo, >> stmt_vec_info stmt_info, >> gimple_stmt_iterator *gsi, >> gimple **vec_stmt, >> gather_scatter_info *gs_info, >> - tree mask) >> + tree mask, >> + stmt_vector_for_cost *cost_vec) >> { >>loop_vec_info loop_vinfo = dyn_cast (vinfo); >>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >> @@ -2831,6 +2834,23 @@ vect_build_gather_load_calls (vec_info *vinfo, >> stmt_vec_info stmt_info, >>poly_uint64 gather_off_nunits >> = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); >> >> + /* FIXME: Keep the previous costing 
way in vect_model_load_cost by costing >> + N scalar loads, but it should be tweaked to use target specific costs >> + on related gather load calls. */ >> + if (!vec_stmt) > > going over the series now, I'm collecting comments but wanted to get > this one out here: > I'd rather see if (cost_vec) here, that 'vec_stmt' argument is quite > legacy (I think it can > be completely purged everywhere) > OK, updated with the below new version v2. I guessed you meant we would like to use cost_vec for checking costing or not and use vec_stmts in stmt_info to store the vectorized stmt. diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index ec177dcad82..dd8f5421d4e 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1136,6 +1136,8 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl); + unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -2846,7 +2848,8 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, gimple **vec_stmt, gather_scatter_info *gs_info, - tree mask) + tree mask, + stmt_vector_for_cost *cost_vec) { loop_vec_info loop_vinfo = dyn_cast (vinfo); class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); @@ -2858,6 +2861,23 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, poly_uint64 gather_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); + /* FIXME: Keep the previous costing way in vect_model_load_cost by costing + N scalar loads, but it should be tweaked to use target specific costs + on related gather load calls. 
*/ + if (cost_vec) +{ + unsigned int assumed_nunits = vect_nunits_for_cost (vectype); + unsigned int inside_cost; + inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits, + scalar_load, stmt_info, 0, vect_body); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +"vect_model_load_cost: inside_cost = %d, " +"prologue_cost = 0 .\n", +inside_cost); + return; +} + tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl)); tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl)); tree srctype = T
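[Editor's note] The control flow being converged on in this patch (one entry point that either records costs during analysis or emits code during transform, keyed off whether a cost vector is supplied) can be sketched in isolation. This is a simplified model with stand-in types and a stand-in `record_stmt_cost`, not the real GCC vectorizer API:

```cpp
#include <vector>

// Simplified stand-ins for GCC's stmt_vector_for_cost machinery.
struct stmt_cost { int count; const char *kind; };
using stmt_vector_for_cost = std::vector<stmt_cost>;

// Record COUNT statements of KIND into COST_VEC; return the count recorded.
static int
record_stmt_cost (stmt_vector_for_cost *cost_vec, int count, const char *kind)
{
  cost_vec->push_back ({count, kind});
  return count;
}

// Model of vect_build_gather_load_calls after the change: a non-null
// COST_VEC means we are in the analysis phase, so only record costs
// (here NCOPIES * NUNITS scalar loads, mirroring the FIXME'd legacy
// costing) and return early; a null COST_VEC means transform.
static int
build_gather_load_calls (int ncopies, int nunits,
                         stmt_vector_for_cost *cost_vec)
{
  if (cost_vec)
    return record_stmt_cost (cost_vec, ncopies * nunits, "scalar_load");
  // ... the transform phase would emit the gather-load calls here ...
  return 0;
}
```

The point of the split is that costing stays textually next to the transform it models, which is what moving the code out of vect_model_load_cost achieves.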
Re: [PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
Oh, sorry. I forgot to update the ChangeLog. We hope RVV can vectorize such a case into the following IR: loop_len = SELECT_VL control_mask = comparison v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias) LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias) This change log is incorrect and misleading. Following Richard's comments, I have changed the order of the arguments to: loop_len = SELECT_VL control_mask = comparison v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask) LEN_SCATTER_STORE (... v, ..., loop_len, bias, control_mask) Sorry about that. juzhe.zh...@rivai.ai From: juzhe.zhong Date: 2023-07-03 10:15 To: gcc-patches CC: richard.sandiford; rguenther; pan2.li; Ju-Zhe Zhong Subject: [PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern From: Ju-Zhe Zhong Hi, Richi and Richard. Based on the review comments from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html I add the helper function: /* Add len, bias and mask arguments according to the STMT. */ static unsigned int add_len_bias_and_mask_args (expand_operand *ops, unsigned int opno, gcall *stmt) { internal_fn ifn = gimple_call_internal_fn (stmt); int len_index = internal_fn_len_index (ifn); int bias_index = internal_fn_bias_index (ifn); int mask_index = internal_fn_mask_index (ifn); /* The order of the arguments is always {len,bias,mask}.
*/ if (len_index >= 0) { tree len = gimple_call_arg (stmt, len_index); rtx len_rtx = expand_normal (len); create_convert_operand_from (&ops[opno++], len_rtx, TYPE_MODE (TREE_TYPE (len)), TYPE_UNSIGNED (TREE_TYPE (len))); } if (bias_index >= 0) { tree biast = gimple_call_arg (stmt, bias_index); rtx bias = expand_normal (biast); create_input_operand (&ops[opno++], bias, QImode); } if (mask_index >= 0) { tree mask = gimple_call_arg (stmt, mask_index); rtx mask_rtx = expand_normal (mask); create_input_operand (&ops[opno++], mask_rtx, TYPE_MODE (TREE_TYPE (mask))); } return opno; } I think the current code looks more reasonable and is easier to maintain now. This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to handle flow control by mask and loop control by length on gather/scatter memory operations. Consider the following case: #include void f (uint8_t *restrict a, uint8_t *restrict b, int n, int base, int step, int *restrict cond) { for (int i = 0; i < n; ++i) { if (cond[i]) a[i * step + base] = b[i * step + base]; } } We hope RVV can vectorize such a case into the following IR: loop_len = SELECT_VL control_mask = comparison v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias) LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias) This patch doesn't apply such patterns in the vectorizer; it just adds the patterns and updates the documentation. I will send a patch that applies these patterns in the vectorizer soon after this patch is approved. OK for trunk? gcc/ChangeLog: * doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store. * internal-fn.cc (add_len_bias_and_mask_args): New function. (expand_scatter_store_optab_fn): Add len_mask_gather_load/len_mask_scatter_store. (expand_gather_load_optab_fn): Ditto. (internal_load_fn_p): Ditto. (internal_store_fn_p): Ditto. (internal_gather_scatter_fn_p): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_len_index): New function. (internal_fn_bias_index): New function.
(internal_fn_stored_value_index): Add len_mask_gather_load/len_mask_scatter_store. * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto. (LEN_MASK_SCATTER_STORE): Ditto. * internal-fn.h (internal_fn_len_index): New function. (internal_fn_bias_index): Ditto. * optabs.def (OPTAB_CD): Add len_mask_gather_load/len_mask_scatter_store. --- gcc/doc/md.texi | 17 +++ gcc/internal-fn.cc | 108 ++-- gcc/internal-fn.def | 8 +++- gcc/internal-fn.h | 2 + gcc/optabs.def | 2 + 5 files changed, 120 insertions(+), 17 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index cefdee84821..b44d1ba3af9 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5040,6 +5040,15 @@ operand 5. Bit @var{i} of the mask is set if element @var{i} of the result should be loaded from memory and clear if element @var{i} of the result should be set to zero. +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern +@item @samp{len_mask_gather_load@var{m}@var{n}} +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5), +a bias operand (operand 6) as well as a mask operand (operand 7). Similar to len_maskload, +the instruction loads at most (operand 5 + operand 6) elements from memory. +Bit @var{i} of the mask is set if element @var{i} of the result should +be loaded from
[PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
From: Ju-Zhe Zhong Hi, Richi and Richard. Based on the review comments from Richard: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html I add the helper function: /* Add len, bias and mask arguments according to the STMT. */ static unsigned int add_len_bias_and_mask_args (expand_operand *ops, unsigned int opno, gcall *stmt) { internal_fn ifn = gimple_call_internal_fn (stmt); int len_index = internal_fn_len_index (ifn); int bias_index = internal_fn_bias_index (ifn); int mask_index = internal_fn_mask_index (ifn); /* The order of the arguments is always {len,bias,mask}. */ if (len_index >= 0) { tree len = gimple_call_arg (stmt, len_index); rtx len_rtx = expand_normal (len); create_convert_operand_from (&ops[opno++], len_rtx, TYPE_MODE (TREE_TYPE (len)), TYPE_UNSIGNED (TREE_TYPE (len))); } if (bias_index >= 0) { tree biast = gimple_call_arg (stmt, bias_index); rtx bias = expand_normal (biast); create_input_operand (&ops[opno++], bias, QImode); } if (mask_index >= 0) { tree mask = gimple_call_arg (stmt, mask_index); rtx mask_rtx = expand_normal (mask); create_input_operand (&ops[opno++], mask_rtx, TYPE_MODE (TREE_TYPE (mask))); } return opno; } I think the current code looks more reasonable and is easier to maintain now. This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to handle flow control by mask and loop control by length on gather/scatter memory operations. Consider the following case: #include void f (uint8_t *restrict a, uint8_t *restrict b, int n, int base, int step, int *restrict cond) { for (int i = 0; i < n; ++i) { if (cond[i]) a[i * step + base] = b[i * step + base]; } } We hope RVV can vectorize such a case into the following IR: loop_len = SELECT_VL control_mask = comparison v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias) LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias) This patch doesn't apply such patterns in the vectorizer; it just adds the patterns and updates the documentation.
I will send a patch that applies these patterns in the vectorizer soon after this patch is approved. OK for trunk? gcc/ChangeLog: * doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store. * internal-fn.cc (add_len_bias_and_mask_args): New function. (expand_scatter_store_optab_fn): Add len_mask_gather_load/len_mask_scatter_store. (expand_gather_load_optab_fn): Ditto. (internal_load_fn_p): Ditto. (internal_store_fn_p): Ditto. (internal_gather_scatter_fn_p): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_len_index): New function. (internal_fn_bias_index): New function. (internal_fn_stored_value_index): Add len_mask_gather_load/len_mask_scatter_store. * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto. (LEN_MASK_SCATTER_STORE): Ditto. * internal-fn.h (internal_fn_len_index): New function. (internal_fn_bias_index): Ditto. * optabs.def (OPTAB_CD): Add len_mask_gather_load/len_mask_scatter_store. --- gcc/doc/md.texi | 17 +++ gcc/internal-fn.cc | 108 ++-- gcc/internal-fn.def | 8 +++- gcc/internal-fn.h | 2 + gcc/optabs.def | 2 + 5 files changed, 120 insertions(+), 17 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index cefdee84821..b44d1ba3af9 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5040,6 +5040,15 @@ operand 5. Bit @var{i} of the mask is set if element @var{i} of the result should be loaded from memory and clear if element @var{i} of the result should be set to zero. +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern +@item @samp{len_mask_gather_load@var{m}@var{n}} +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5), +a bias operand (operand 6) as well as a mask operand (operand 7). Similar to len_maskload, +the instruction loads at most (operand 5 + operand 6) elements from memory. +Bit @var{i} of the mask is set if element @var{i} of the result should +be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored. + @cindex @code{scatter_store@var{m}@var{n}} instruction pattern @item @samp{scatter_store@var{m}@var{n}} Store a vector of mode @var{m} into several distinct memory locations. @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as operand 5. Bit @var{i} of the mask is set if element @var{i} of the result should be stored to memory. +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern +@item @samp{len_mask_scatter_store@var{m}@var{n}} +Like @samp{scatter_store@var{m}@var{n}}, but takes an ext
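[Editor's note] The operational semantics in the md.texi hunks above can be modelled with scalar code. This is a sketch of one reading of the documented behaviour, with "undefined" result elements modelled as left untouched; it is illustrative only, not the RVV implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of len_mask_gather_load: gather at most LEN + BIAS
// elements through OFFSETS, honouring the mask.  Elements whose mask
// bit is clear, or whose index is >= LEN + BIAS, are undefined in the
// real pattern; here we model that by leaving them untouched.
static void
len_mask_gather_load (std::vector<int32_t> &result,
                      const int32_t *base,
                      const std::vector<std::ptrdiff_t> &offsets,
                      std::size_t len, std::size_t bias,
                      const std::vector<bool> &mask)
{
  for (std::size_t i = 0; i < result.size (); ++i)
    if (i < len + bias && mask[i])
      result[i] = base[offsets[i]];
}
```

Note how the length bound and the mask compose: the mask handles flow control (the `if (cond[i])`), while len + bias handles loop control (the final partial iteration).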
[PATCH] libstdc++: use __is_enum built-in trait
This patch replaces is_enum::value with __is_enum built-in trait in the type_traits header. libstdc++-v3/ChangeLog: * include/std/type_traits (__make_unsigned_selector): Use __is_enum built-in trait. (__make_signed_selector): Likewise. (__underlying_type_impl): Likewise. Signed-off-by: Ken Matsui --- libstdc++-v3/include/std/type_traits | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index 0e7a9c9c7f3..9f086992ebc 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -1740,7 +1740,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Select between integral and enum: not possible to be both. template::value, - bool _IsEnum = is_enum<_Tp>::value> + bool _IsEnum = __is_enum(_Tp)> class __make_unsigned_selector; template @@ -1900,7 +1900,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Select between integral and enum: not possible to be both. template::value, - bool _IsEnum = is_enum<_Tp>::value> + bool _IsEnum = __is_enum(_Tp)> class __make_signed_selector; template @@ -2353,7 +2353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct __common_type_fold<_CTp, _Rp, void> { }; - template::value> + template struct __underlying_type_impl { using type = __underlying_type(_Tp); -- 2.41.0
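[Editor's note] For reference, `__is_enum` is a compiler built-in (supported by GCC and Clang) that yields the same answer as the standard trait without instantiating the `std::is_enum` class template, which is the compile-time saving the patch is after. A quick sanity-check sketch:

```cpp
#include <type_traits>

enum Color { Red, Green };
enum class Scoped { A };
struct NotAnEnum {};

// The built-in agrees with the library trait for every type, but skips
// the template instantiation that is_enum<_Tp>::value requires.
template <typename T>
constexpr bool
builtin_matches_trait ()
{
  return __is_enum (T) == std::is_enum<T>::value;
}
```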
[committed] d: Fix testcase failure of gdc.dg/Wbuiltin_declaration_mismatch2.d.
Hi, Seen at least on aarch64-*-darwin, the parameters used to instantiate the shufflevector intrinsic meant the return type was __vector(int[1]), which resulted in the error: vector type '__vector(int[1])' is not supported on this platform. All instantiations have now been fixed so the expected warning/error is now given by the compiler. Regression tested on x86_64-linux-gnu/-m32, committed to mainline, and backported to releases/gcc-13. Regards, Iain. --- gcc/testsuite/ChangeLog: * gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests. --- .../gdc.dg/Wbuiltin_declaration_mismatch2.d | 44 +-- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d b/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d index 7b83fffae58..0d12bcb8b07 100644 --- a/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d +++ b/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d @@ -77,32 +77,32 @@ void test_shuffle() void test_shufflevector() { -shufflevector!(int, int4, int)(0, 0, 0); // { dg-warning "mismatch in argument 1" } -shufflevector!(double, int4, int)(0, 0, 0); // { dg-warning "mismatch in argument 1" } -shufflevector!(fake4, int4, int)(f, 0, 0); // { dg-warning "mismatch in argument 1" } - -shufflevector!(int4, int, int)(0, 0, 0); // { dg-warning "mismatch in argument 2" } -shufflevector!(int4, double, int)(0, 0, 0); // { dg-warning "mismatch in argument 2" } -shufflevector!(int4, int4, int)(0, 0, 0); -shufflevector!(int4, short8, int)(0, 0, 0); // { dg-error "mismatch in argument 2" } -shufflevector!(int4, float4, int)(0, 0, 0); // { dg-error "mismatch in argument 2" } -shufflevector!(int4, byte16, int)(0, 0, 0); // { dg-error "mismatch in argument 2" } -shufflevector!(int4, fake4, int)(0, f, 0); // { dg-warning "mismatch in argument 2" } - -shufflevector!(int4, int4, double)(0, 0, 0); // { dg-warning "mismatch in argument 3" } -shufflevector!(int4, int4, int4)(0, 0, 0); // { dg-warning "mismatch in argument 
3" } -shufflevector!(int4, int4, short8)(0, 0, 0); // { dg-warning "mismatch in argument 3" } -shufflevector!(int4, int4, float4)(0, 0, 0); // { dg-warning "mismatch in argument 3" } -shufflevector!(int4, int4, byte16)(0, 0, 0); // { dg-warning "mismatch in argument 3" } - -shufflevector!(int4, int4, int, double)(0, 0, 0, 0); // { dg-warning "mismatch in argument 4" } +shufflevector!(int, int4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 1" } +shufflevector!(double, int4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 1" } +shufflevector!(fake4, int4, int, int, int, int)(f, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 1" } + +shufflevector!(int4, int, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 2" } +shufflevector!(int4, double, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 2" } +shufflevector!(int4, int4, int, int, int, int)(0, 0, 0, 0, 0, 0); +shufflevector!(int4, short8, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-error "mismatch in argument 2" } +shufflevector!(int4, float4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-error "mismatch in argument 2" } +shufflevector!(int4, byte16, int, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-error "mismatch in argument 2" } +shufflevector!(int4, fake4, int, int, int, int)(0, f, 0, 0, 0, 0); // { dg-warning "mismatch in argument 2" } + +shufflevector!(int4, int4, double, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 3" } +shufflevector!(int4, int4, int4, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 3" } +shufflevector!(int4, int4, short8, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 3" } +shufflevector!(int4, int4, float4, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 3" } +shufflevector!(int4, int4, byte16, int, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 3" } 
+ +shufflevector!(int4, int4, int, double, int, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 4" } shufflevector!(int4, int4, int, int, double, int)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 5" } shufflevector!(int4, int4, int, int, int, double)(0, 0, 0, 0, 0, 0); // { dg-warning "mismatch in argument 6" } int i; -shufflevector!(int4, int4, int)(0, 0, i); // { dg-error "argument .i. cannot be read at compile time" } -shufflevector!(int4, int4, int)(0, 0, -1u); // { dg-error "element index .-1. is out of bounds" } -shufflevector!(int4, int4, int)(0, 0, 8); // { dg-error "element index .8. is out of bounds" } +shufflevector!(int4, int4, int, int, int, int)(0, 0, i, 0, 0, 0); // { dg-error "argument .i. cannot be read at
[PATCH] Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]
Dear all, the attached patch fixes a long-standing issue with the order of evaluation of procedure argument expressions and deallocation of allocatable actual arguments passed to allocatable dummies with intent(out) attribute. It is based on an initial patch by Steve, handles issues pointed out by Tobias, and includes a suggestion by Tobias to scan the procedure arguments first to decide whether the creation of temporaries is needed. There is one unresolved issue left that might be more general: it appears to affect character arguments (only) in that quite often there still is no temporary generated. I haven't found the reason why and would like to defer this, unless someone has a good suggestion. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald From 609ba636927811cddc74fb815cb18809c7d33565 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Sun, 2 Jul 2023 22:14:19 +0200 Subject: [PATCH] Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178] gcc/fortran/ChangeLog: PR fortran/92178 * trans-expr.cc (gfc_conv_procedure_call): Check procedures for allocatable dummy arguments with INTENT(OUT) and move deallocation of actual arguments after evaluation of argument expressions before the procedure is executed. gcc/testsuite/ChangeLog: PR fortran/92178 * gfortran.dg/pr92178.f90: New test. * gfortran.dg/pr92178_2.f90: New test. Co-authored-by: Steven G. 
Kargl --- gcc/fortran/trans-expr.cc | 52 ++-- gcc/testsuite/gfortran.dg/pr92178.f90 | 83 + gcc/testsuite/gfortran.dg/pr92178_2.f90 | 46 ++ 3 files changed, 177 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/pr92178.f90 create mode 100644 gcc/testsuite/gfortran.dg/pr92178_2.f90 diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 30946ba3f63..16e8f037cfc 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -6085,9 +6085,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, else info = NULL; - stmtblock_t post, clobbers; + stmtblock_t post, clobbers, dealloc_blk; gfc_init_block (&post); gfc_init_block (&clobbers); + gfc_init_block (&dealloc_blk); gfc_init_interface_mapping (&mapping); if (!comp) { @@ -6117,6 +6118,33 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, && UNLIMITED_POLY (sym) && comp && (strcmp ("_copy", comp->name) == 0); + /* First scan argument list for allocatable actual arguments passed to + allocatable dummy arguments with INTENT(OUT). As the corresponding + actual arguments are deallocated before execution of the procedure, we + evaluate actual argument expressions to avoid problems with possible + dependencies. */ + bool force_eval_args = false; + gfc_formal_arglist *tmp_formal; + for (arg = args, tmp_formal = formal; arg != NULL; + arg = arg->next, tmp_formal = tmp_formal ? tmp_formal->next : NULL) +{ + e = arg->expr; + fsym = tmp_formal ? tmp_formal->sym : NULL; + if (e && fsym + && e->expr_type == EXPR_VARIABLE + && fsym->attr.intent == INTENT_OUT + && (fsym->ts.type == BT_CLASS && fsym->attr.class_ok + ? CLASS_DATA (fsym)->attr.allocatable + : fsym->attr.allocatable) + && e->symtree + && e->symtree->n.sym + && gfc_variable_attr (e, NULL).allocatable) + { + force_eval_args = true; + break; + } +} + /* Evaluate the arguments. */ for (arg = args, argc = 0; arg != NULL; arg = arg->next, formal = formal ? 
formal->next : NULL, ++argc) @@ -6680,7 +6708,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, else tmp = gfc_finish_block (&block); - gfc_add_expr_to_block (&se->pre, tmp); + gfc_add_expr_to_block (&dealloc_blk, tmp); } /* A class array element needs converting back to be a @@ -6980,7 +7008,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, build_empty_stmt (input_location)); } if (tmp != NULL_TREE) - gfc_add_expr_to_block (&se->pre, tmp); + gfc_add_expr_to_block (&dealloc_blk, tmp); } tmp = parmse.expr; @@ -7004,7 +7032,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, void_type_node, gfc_conv_expr_present (e->symtree->n.sym), tmp, build_empty_stmt (input_location)); - gfc_add_expr_to_block (&se->pre, tmp); + gfc_add_expr_to_block (&dealloc_blk, tmp); } } } @@ -7101,6 +7129,21 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, } } + /* If any actual argument of the procedure is allocatable and passed + to an allocatable dummy with INTENT(OUT), we conservatively + evaluate all actual argument expressions before deallocations are + performed and the procedure is executed. This ensures we conform + to F2023:15.5.3, 15.5.4. Create temporaries except for constants, + variables, and functions returnin
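[Editor's note] The hazard being fixed can be seen in a small illustrative example (not taken from the new testcases): both actual arguments refer to `x`, and because the first dummy is allocatable with INTENT(OUT), `x` is deallocated on entry to the procedure. `size (x)` must therefore be evaluated before that deallocation, which is what forcing argument expressions into temporaries first achieves:

```fortran
program intent_out_hazard
  implicit none
  integer, allocatable :: x(:)
  allocate (x(5))
  ! 'size (x)' must be evaluated before the compiler-inserted
  ! deallocation of 'x' for the intent(out) dummy; otherwise the
  ! second argument would see an unallocated array.
  call resize (x, size (x))
  if (size (x) /= 5) stop 1
contains
  subroutine resize (a, n)
    integer, allocatable, intent(out) :: a(:)
    integer, intent(in) :: n
    allocate (a(n))
  end subroutine resize
end program intent_out_hazard
```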
[PATCH v7 2/4] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdeps-target=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well; however, this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously representable in UTF-8" case).
- figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. libcpp/ * include/cpplib.h: Add cpp_fdeps_format enum. (cpp_options): Add fdeps_format field (cpp_finish): Add structured dependency fdeps_stream parameter. * include/mkdeps.h (deps_add_module_target): Add flag for whether a module is exported or not. (fdeps_add_target): Add function. (deps_write_p1689r5): Add function. * init.cc (cpp_finish): Add new preprocessor parameter used for C++ module tracking. * mkdeps.cc (mkdeps): Implement P1689R5 output. gcc/ * doc/invoke.texi: Document -fdeps-format=, -fdeps-file=, and -fdeps-target= flags. * gcc.cc: add defaults for -fdeps-target= and -fdeps-file= when only -fdeps-format= is specified. * json.h: Add a TODO item to refactor out to share with `libcpp/mkdeps.cc`. gcc/c-family/ * c-opts.cc (c_common_handle_option): Add fdeps_file variable and -fdeps-format=, -fdeps-file=, and -fdeps-target= parsing. * c.opt: Add -fdeps-format=, -fdeps-file=, and -fdeps-target= flags. gcc/cp/ * module.cc (preprocessed_module): Pass whether the module is exported to dependency tracking. gcc/testsuite/ * g++.dg/modules/depflags-f-MD.C: New test. * g++.dg/modules/depflags-f.C: New test. * g++.dg/modules/depflags-fi.C: New test. * g++.dg/modules/depflags-fj-MD.C: New test. * g++.dg/modules/depflags-fj.C: New test. * g++.dg/modules/depflags-fjo-MD.C: New test. * g++.dg/modules/depflags-fjo.C: New test. * g++.dg/modules/depflags-fo-MD.C: New test. * g++.dg/modules/depflags-fo.C: New test. * g++.dg/modules/depflags-j-MD.C: New test. * g++.dg/modules/depflags-j.C: New test. * g++.dg/modules/depflags-jo-MD.C: New test. * g++.dg/modules/depflags-jo.C: New test. * g++.dg/modules/depflags-o-MD.C: New test. * g++.dg/modules/depflags-o.C: New test. 
* g++.dg/modules/p1689-1.C: New test. * g++.dg/modules/p1689-1.exp.ddi: New test expectation. * g++.dg/modules/p1689-2.C: New test. * g++.dg/modules/p1689-2.exp.ddi: New test expectation. * g++.dg/modules/p1689-3.C: New test. * g++.dg/modules/p1689-3.exp.ddi: New test expectation. * g++.dg/modules/p1689-4.C: New test. * g++.dg/modules/p1689-4.exp.ddi: New test expectation. * g++.dg/modules/p1689-5.C: New test. * g++.dg/modules/p1689-5.exp.ddi: New test expectation. * g++.dg/modules/modules.exp: Load new P1689 library routines. * g++.dg/modules/test-p1689.py: New tool for validating P1689 output. * lib/modules.exp: Support for validati
[PATCH v7 4/4] c++modules: report module mapper files as a dependency
It affects the build, and if used as a static file, can reliably be tracked using the `-MF` mechanism. gcc/cp/: * mapper-client.cc, mapper-client.h (open_module_client): Accept dependency tracking and track module mapper files as dependencies. * module.cc (make_mapper, get_mapper): Pass the dependency tracking class down. gcc/testsuite/: * g++.dg/modules/depreport-2.modmap: New test. * g++.dg/modules/depreport-2_a.C: New test. * g++.dg/modules/depreport-2_b.C: New test. * g++.dg/modules/test-depfile.py: Support `:|` syntax output when generating modules. Signed-off-by: Ben Boeckel --- gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 18 - .../g++.dg/modules/depreport-2.modmap | 2 ++ gcc/testsuite/g++.dg/modules/depreport-2_a.C | 15 ++ gcc/testsuite/g++.dg/modules/depreport-2_b.C | 14 + gcc/testsuite/g++.dg/modules/test-depfile.py | 20 +++ 7 files changed, 66 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 39e80df2d25..92727195246 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "mapper-client.h" #include "intl.h" +#include "mkdeps.h" #include "../../c++tools/resolver.h" @@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name, module_client * module_client::open_module_client (location_t loc, const char *o, + class mkdeps *deps, void (*set_repo) (const char *), char const *full_program_name) { @@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o, errmsg = "opening"; else { + /* Add the mapper file to the dependency tracking. 
*/ + if (deps) + deps_add_dep (deps, name.c_str ()); if (int l = r->read_tuple_file (fd, ident, false)) { if (l > 0) diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h index b32723ce296..a3b0b8adc51 100644 --- a/gcc/cp/mapper-client.h +++ b/gcc/cp/mapper-client.h @@ -55,6 +55,7 @@ public: public: static module_client *open_module_client (location_t loc, const char *option, + class mkdeps *, void (*set_repo) (const char *), char const *); static void close_module_client (location_t loc, module_client *); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index f3acc4e02fe..77c9edcbc04 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations; /* Our module mapper (created lazily). */ module_client *mapper; -static module_client *make_mapper (location_t loc); -inline module_client *get_mapper (location_t loc) +static module_client *make_mapper (location_t loc, class mkdeps *deps); +inline module_client *get_mapper (location_t loc, class mkdeps *deps) { auto *res = mapper; if (!res) -res = make_mapper (loc); +res = make_mapper (loc, deps); return res; } @@ -14033,7 +14033,7 @@ get_module (const char *ptr) /* Create a new mapper connecting to OPTION. */ module_client * -make_mapper (location_t loc) +make_mapper (location_t loc, class mkdeps *deps) { timevar_start (TV_MODULE_MAPPER); const char *option = module_mapper_name; @@ -14041,7 +14041,7 @@ make_mapper (location_t loc) option = getenv ("CXX_MODULE_MAPPER"); mapper = module_client::open_module_client -(loc, option, &set_cmi_repo, +(loc, option, deps, &set_cmi_repo, (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name) && save_decoded_options[0].arg != progname ? 
save_decoded_options[0].arg : nullptr); @@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc, dump.push (NULL); dump () && dump ("Checking include translation '%s'", path); - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); size_t len = strlen (path); path = canonicalize_header_name (NULL, loc, true, path, len); @@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, static void name_pending_imports (cpp_reader *reader) { - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp
[PATCH v7 1/4] spec: add a spec function to join arguments
When passing `-o` flags to other options, the typical `-o foo` spelling leaves a leading whitespace when replacing elsewhere. This ends up creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't parse properly. When attempting to make a spec function to just remove the leading whitespace, the argument splitting ends up masking the whitespace. However, the intended extension *also* ends up being its own argument. To perform the desired behavior, the arguments need to be concatenated together. gcc/: * gcc.cc (join_spec_func): Add a spec function to join all arguments. Signed-off-by: Ben Boeckel --- gcc/gcc.cc | 15 +++ 1 file changed, 15 insertions(+) diff --git a/gcc/gcc.cc b/gcc/gcc.cc index fdfac0b4fe4..44433b80d61 100644 --- a/gcc/gcc.cc +++ b/gcc/gcc.cc @@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **); static const char *debug_level_greater_than_spec_func (int, const char **); static const char *dwarf_version_greater_than_spec_func (int, const char **); static const char *find_fortran_preinclude_file (int, const char **); +static const char *join_spec_func (int, const char **); static char *convert_white_space (char *); static char *quote_spec (char *); static char *quote_spec_arg (char *); @@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] = { "debug-level-gt", debug_level_greater_than_spec_func }, { "dwarf-version-gt",dwarf_version_greater_than_spec_func }, { "fortran-preinclude-file", find_fortran_preinclude_file}, + { "join",join_spec_func}, #ifdef EXTRA_SPEC_FUNCTIONS EXTRA_SPEC_FUNCTIONS #endif @@ -10975,6 +10977,19 @@ find_fortran_preinclude_file (int argc, const char **argv) return result; } +/* The function takes any number of arguments and joins them together. */ + +static const char * +join_spec_func (int argc, const char **argv) +{ + char *result = NULL; + + for (int i = 0; i < argc; ++i) +result = reconcat(result, result ? 
result : "", argv[i], NULL); + + return result; +} + /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string so as to precede every one of them with a backslash. Return the original string or the reallocated one. */ -- 2.40.1
[PATCH v7 0/4] P1689R5 support
Hi, This patch series adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and needs to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that any TU with `export module some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I've also added patches to include imported module CMI files and the module mapper file as dependencies of the compilation. I briefly looked into adding dependencies on response files as well, but that appeared to need some code contortions to have a `class mkdeps` available before parsing the command line or to keep the information around until one was made. I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. FWIW, Clang has taken an alternate approach with its `clang-scan-deps` tool rather than using the compiler directly. Thanks, --Ben --- v6 -> v7: - rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad discarded if result is unused, 2023-07-02)) - add test cases for patches 3 and 4 (new dependency reporting in `-MF`) - add a Python script to test aspects of generated dependency files - a new `join` spec function to support `-fdeps-*` defaults based on the `-o` flag (needed to strip the leading space that appears otherwise) - note that JSON writing support should be factored out for use by `libcpp` and `gcc` (libiberty?) 
- use `.ddi` for the extension of `-fdeps-*` output files by default - support defaults for `-fdeps-file=` and `-fdeps-target=` when only `-fdeps-format=` is provided (with tests) - error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`) file as their formats are incompatible - expand the documentation on how the `-fdeps-*` flags should be used v5 -> v6: - rebase onto `master` (585c660f041 (reload1: Change return type of predicate function from int to bool, 2023-06-06)) - fix crash related to reporting imported CMI files as dependencies - rework utf-8 validity to patch the new `cpp_valid_utf8_p` function instead of the core utf-8 decoding routine to reject invalid codepoints (preserves higher-level error detection of invalid utf-8) - harmonize `fdeps` spelling in flags, variables, comments, etc. - rename `-fdeps-output=` to `-fdeps-target=` v4 -> v5: - add dependency tracking for imported modules to `-MF` - add dependency tracking for static module mapper files given to `-fmodule-mapper=` v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where needed - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (4): spec: add a spec function to join arguments p1689r5: initial support c++modules: report imported CMI files as dependencies c++modules: report module mapper files as a dependency gcc/c-family/c-opts.cc| 44 +++- gcc/c-family/c.opt| 12 + gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 24 +- gcc/doc/invoke.texi | 27 +++ gcc/gcc.cc| 19 +- gcc/json.h| 3 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 3 + 
gcc/testsuite/g++.dg/modules/depflags-fi.C| 4 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + .../g++.dg/modules/depflags-fj-MF-share.C | 6 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depfl
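The ordering constraint described in the cover letter — a TU that provides a module must be compiled before any TU that imports it — is exactly what a build system computes from the scanner's output. A minimal sketch of that step follows; the dict shapes here are simplified stand-ins for illustration, not the actual per-rule `provides`/`requires` arrays of the P1689 schema:

```python
from graphlib import TopologicalSorter  # Python 3.9+

def compile_order(scan_results):
    """Derive a valid compilation order from per-TU scan results that
    map a source file to the module names it provides and requires."""
    provider = {}  # module logical name -> source file providing it
    for src, info in scan_results.items():
        for mod in info.get("provides", []):
            provider[mod] = src
    # Each TU depends on the providers of the modules it requires.
    deps = {
        src: {provider[m] for m in info.get("requires", []) if m in provider}
        for src, info in scan_results.items()
    }
    return list(TopologicalSorter(deps).static_order())

order = compile_order({
    "a.cpp": {"provides": ["some_module"]},
    "b.cpp": {"requires": ["some_module"]},
})
print(order)  # 'a.cpp' is ordered before 'b.cpp'
```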
[PATCH v7 3/4] c++modules: report imported CMI files as dependencies
They affect the build, so report them via `-MF` mechanisms. gcc/cp/ * module.cc (do_import): Report imported CMI files as dependencies. gcc/testsuite/ * g++.dg/modules/depreport-1_a.C: New test. * g++.dg/modules/depreport-1_b.C: New test. * g++.dg/modules/test-depfile.py: New tool for validating depfile information. * lib/modules.exp: Support for validating depfile contents. Signed-off-by: Ben Boeckel --- gcc/cp/module.cc | 3 + gcc/testsuite/g++.dg/modules/depreport-1_a.C | 10 + gcc/testsuite/g++.dg/modules/depreport-1_b.C | 12 ++ gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++ gcc/testsuite/lib/modules.exp| 29 +++ 5 files changed, 241 insertions(+) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 9df60d695b1..f3acc4e02fe 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost) dump () && dump ("CMI is %s", file); if (note_module_cmi_yes || inform_cmi_p) inform (loc, "reading CMI %qs", file); + /* Add the CMI file to the dependency tracking. 
*/ + if (cpp_get_deps (reader)) + deps_add_dep (cpp_get_deps (reader), file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C new file mode 100644 index 000..241701728a2 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C @@ -0,0 +1,10 @@ +// { dg-additional-options -fmodules-ts } + +export module Foo; +// { dg-module-cmi Foo } + +export class Base +{ +public: + int m; +}; diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C new file mode 100644 index 000..b6e317c6703 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C @@ -0,0 +1,12 @@ +// { dg-additional-options -fmodules-ts } +// { dg-additional-options -MD } +// { dg-additional-options "-MF depreport-1.d" } + +import Foo; + +void foo () +{ + Base b; +} + +// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } } diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py new file mode 100644 index 000..ea4edb61434 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/test-depfile.py @@ -0,0 +1,187 @@ +import json + + +# Parameters. +ALL_ERRORS = False + + +def _report_error(msg): +'''Report an error.''' +full_msg = 'ERROR: ' + msg +if ALL_ERRORS: +print(full_msg) +else: +raise RuntimeError(full_msg) + + +class Token(object): +pass + + +class Output(Token): +def __init__(self, path): +self.path = path + + +class Input(Token): +def __init__(self, path): +self.path = path + + +class Colon(Token): +pass + + +class Append(Token): +pass + + +class Variable(Token): +def __init__(self, name): +self.name = name + + +class Word(Token): +def __init__(self, name): +self.name = name + + +def validate_depfile(depfile, expect_input=None): +'''Validate a depfile contains some information + +Returns `False` if the information is not found. 
+''' +with open(depfile, 'r') as fin: +depfile_content = fin.read() + +real_lines = [] +join_line = False +for line in depfile_content.split('\n'): +# Join the line if needed. +if join_line: +line = real_lines.pop() + line + +# Detect line continuations. +join_line = line.endswith('\\') +# Strip line continuation characters. +if join_line: +line = line[:-1] + +# Add to the real line set. +real_lines.append(line) + +# Perform tokenization. +tokenized_lines = [] +for line in real_lines: +tokenized = [] +join_word = False +for word in line.split(' '): +if join_word: +word = tokenized.pop() + ' ' + word + +# Detect word joins. +join_word = word.endswith('\\') +# Strip escape character. +if join_word: +word = word[:-1] + +# Detect `:` at the end of a word. +if word.endswith(':'): +tokenized.append(word[:-1]) +word = word[-1] + +# Add word to the tokenized set. +tokenized.append(word) + +tokenized_lines.append(tokenized) + +# Parse. +ast = [] +for line in tokenized_lines: +kind = None +for token in line: +if token == ':': +kind = 'dependency' +el
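The first pass of the depfile validator above — joining backslash-continued lines into logical lines — can be isolated into a small standalone sketch (illustrative only, not the testsuite script itself):

```python
def join_continuations(text):
    """Join Makefile-style backslash-continued lines into logical lines,
    mirroring the first pass of the depfile validator."""
    real_lines = []
    join_line = False
    for line in text.split('\n'):
        # Join onto the previous line if it ended with a continuation.
        if join_line:
            line = real_lines.pop() + line
        join_line = line.endswith('\\')
        if join_line:
            line = line[:-1]  # strip the continuation backslash
        real_lines.append(line)
    return real_lines

print(join_continuations("out: a \\\n b\nother: c"))
# ['out: a  b', 'other: c']
```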
Re: [PATCH] tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]
> Am 02.07.2023 um 16:54 schrieb Jakub Jelinek via Gcc-patches > : > > Hi! > > The match_uaddc_usubc matching doesn't require that the second > .{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is > at most one. So, in the weird case where the REALPART_EXPR of it isn't > present, we shouldn't ICE trying to replace that REALPART_EXPR with > REALPART_EXPR of .U{ADD,SUB}C result. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok Richard > 2023-07-02 Jakub Jelinek > >PR tree-optimization/110508 >* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with >REALPART_EXPR opf nlhs if re2 is non-NULL. > >* gcc.dg/pr110508.c: New test. > > --- gcc/tree-ssa-math-opts.cc.jj2023-06-20 11:22:26.726887276 +0200 > +++ gcc/tree-ssa-math-opts.cc2023-07-01 00:29:48.554230914 +0200 > @@ -4856,11 +4856,14 @@ match_uaddc_usubc (gimple_stmt_iterator > gsi_remove (&gsi2, true); > /* Replace the re2 statement with __real__ of the newly added > .UADDC/.USUBC call. */ > - gsi2 = gsi_for_stmt (re2); > - tree rlhs = gimple_assign_lhs (re2); > - g = gimple_build_assign (rlhs, REALPART_EXPR, > - build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); > - gsi_replace (&gsi2, g, true); > + if (re2) > +{ > + gsi2 = gsi_for_stmt (re2); > + tree rlhs = gimple_assign_lhs (re2); > + g = gimple_build_assign (rlhs, REALPART_EXPR, > + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); > + gsi_replace (&gsi2, g, true); > +} > if (rhs[2]) > { > /* If this is the arg1 + arg2 + (ovf1 + ovf2) or > --- gcc/testsuite/gcc.dg/pr110508.c.jj2023-07-01 00:33:12.494405901 +0200 > +++ gcc/testsuite/gcc.dg/pr110508.c2023-07-01 00:32:24.115075870 +0200 > @@ -0,0 +1,9 @@ > +/* PR tree-optimization/110508 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +void > +foo (unsigned a, unsigned b, unsigned *c, _Bool d) > +{ > + __builtin_addc (a, b, d, c); > +} > >Jakub >
[PATCH] tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]
Hi! The match_uaddc_usubc matching doesn't require that the second .{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is at most one. So, in the weird case where the REALPART_EXPR of it isn't present, we shouldn't ICE trying to replace that REALPART_EXPR with REALPART_EXPR of .U{ADD,SUB}C result. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2023-07-02 Jakub Jelinek PR tree-optimization/110508 * tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with REALPART_EXPR opf nlhs if re2 is non-NULL. * gcc.dg/pr110508.c: New test. --- gcc/tree-ssa-math-opts.cc.jj2023-06-20 11:22:26.726887276 +0200 +++ gcc/tree-ssa-math-opts.cc 2023-07-01 00:29:48.554230914 +0200 @@ -4856,11 +4856,14 @@ match_uaddc_usubc (gimple_stmt_iterator gsi_remove (&gsi2, true); /* Replace the re2 statement with __real__ of the newly added .UADDC/.USUBC call. */ - gsi2 = gsi_for_stmt (re2); - tree rlhs = gimple_assign_lhs (re2); - g = gimple_build_assign (rlhs, REALPART_EXPR, - build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); - gsi_replace (&gsi2, g, true); + if (re2) +{ + gsi2 = gsi_for_stmt (re2); + tree rlhs = gimple_assign_lhs (re2); + g = gimple_build_assign (rlhs, REALPART_EXPR, + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs)); + gsi_replace (&gsi2, g, true); +} if (rhs[2]) { /* If this is the arg1 + arg2 + (ovf1 + ovf2) or --- gcc/testsuite/gcc.dg/pr110508.c.jj 2023-07-01 00:33:12.494405901 +0200 +++ gcc/testsuite/gcc.dg/pr110508.c 2023-07-01 00:32:24.115075870 +0200 @@ -0,0 +1,9 @@ +/* PR tree-optimization/110508 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void +foo (unsigned a, unsigned b, unsigned *c, _Bool d) +{ + __builtin_addc (a, b, d, c); +} Jakub
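For reference, the `__builtin_addc` call in the new test computes an add-with-carry, which GCC lowers through the .ADD_OVERFLOW pair that match_uaddc_usubc recognizes. A hedged sketch of its semantics for 32-bit unsigned operands (the real builtin stores the carry-out through a pointer; this model simply returns both values):

```python
def addc32(a, b, carry_in):
    """Model of 32-bit add-with-carry: returns (result, carry_out) of
    a + b + carry_in with wraparound at 2**32."""
    total = a + b + carry_in
    return total & 0xFFFFFFFF, total >> 32

print(addc32(0xFFFFFFFF, 1, 0))  # (0, 1): the sum wraps and carries out
```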
Re: [PATCH 1/2] xtensa: Fix missing mode warning in "*eqne_INT_MIN"
On Sat, Jul 1, 2023 at 10:21 AM Takayuki 'January June' Suwa wrote: > > gcc/ChangeLog: > > * config/xtensa/xtensa.md (*eqne_INT_MIN): > Add missing ":SI" to the match_operator. > --- > gcc/config/xtensa/xtensa.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master. -- Thanks. -- Max
Re: [PATCH 2/2] xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS
On Sat, Jul 1, 2023 at 10:21 AM Takayuki 'January June' Suwa wrote: > > Because both smin and smax requiring TARGET_MINMAX are essential to the > RTL representation. > > gcc/ChangeLog: > > * config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p): > Simplify. > * config/xtensa/xtensa.md (*xtensa_clamps): > Add TARGET_MINMAX to the condition. > --- > gcc/config/xtensa/xtensa.cc | 7 ++- > gcc/config/xtensa/xtensa.md | 4 ++-- > 2 files changed, 4 insertions(+), 7 deletions(-) Regtested for target=xtensa-linux-uclibc, no new regressions. Committed to master. -- Thanks. -- Max
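The smin/smax pair the patch calls essential can be written out as a reference model. This assumes the usual reading of CLAMPS — clamp the operand to the signed range of an (i+1)-bit value, i.e. [-2^i, 2^i - 1] — which is worth checking against the Xtensa ISA manual:

```python
def clamps(x, i):
    """Model CLAMPS as the smax(smin(...)) composition the RTL pattern
    uses: clamp x to the signed range [-2**i, 2**i - 1]."""
    lo, hi = -(1 << i), (1 << i) - 1
    return max(lo, min(hi, x))  # smax (smin (x, hi), lo)

print(clamps(300, 7), clamps(-300, 7))  # 127 -128
```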
[pushed] Darwin, Objective-C: Support -fconstant-cfstrings [PR108743].
Tested on x86_64-darwin and with make pdf and an examination of the revised output. Pushed to trunk, thanks Iain --- 8< --- This support the -fconstant-cfstrings option as used by clang (and expect by some build scripts) as an alias to the target-specific -mconstant-cfstrings. The documentation is also updated to reflect that the 'f' option is only available on Darwin, and to add the 'm' option to the Darwin section of the invocation text. Signed-off-by: Iain Sandoe PR target/108743 gcc/ChangeLog: * config/darwin.opt: Add fconstant-cfstrings alias to mconstant-cfstrings. * doc/invoke.texi: Amend invocation descriptions to reflect that the fconstant-cfstrings is a target-option alias and to add the missing mconstant-cfstrings option description to the Darwin section. --- gcc/config/darwin.opt | 4 gcc/doc/invoke.texi | 27 --- 2 files changed, 24 insertions(+), 7 deletions(-) diff --git a/gcc/config/darwin.opt b/gcc/config/darwin.opt index feaa95867b5..d655aaef2fb 100644 --- a/gcc/config/darwin.opt +++ b/gcc/config/darwin.opt @@ -33,6 +33,10 @@ fapple-kext Target C++ Var(flag_apple_kext) Generate code for darwin loadable kernel extensions. +fconstant-cfstrings +ObjC ObjC++ Alias(mconstant-cfstrings) +Generate compile-time CFString objects. + iframework Target RejectNegative C ObjC C++ ObjC++ Joined Separate -iframework Add to the end of the system framework include path. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index efcf3bfb3d6..26bcbe26c6c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -4830,10 +4830,12 @@ and Objective-C++ programs: Use @var{class-name} as the name of the class to instantiate for each literal string specified with the syntax @code{@@"@dots{}"}. The default class name is @code{NXConstantString} if the GNU runtime is being used, and -@code{NSConstantString} if the NeXT runtime is being used (see below). 
The -@option{-fconstant-cfstrings} option, if also present, overrides the -@option{-fconstant-string-class} setting and cause @code{@@"@dots{}"} literals -to be laid out as constant CoreFoundation strings. +@code{NSConstantString} if the NeXT runtime is being used (see below). On +Darwin (macOS, MacOS X) platforms, the @option{-fconstant-cfstrings} option, if +also present, overrides the @option{-fconstant-string-class} setting and cause +@code{@@"@dots{}"} literals to be laid out as constant CoreFoundation strings. +Note that @option{-fconstant-cfstrings} is an alias for the target-specific +@option{-mconstant-cfstrings} equivalent. @opindex fgnu-runtime @item -fgnu-runtime @@ -24118,10 +24120,21 @@ This is by default ON@. @item -gfull Emit debugging information for all symbols and types. +@opindex fconstant-cfstrings +@item -fconstant-cfstrings +The @option{-fconstant-cfstrings} is an alias for @option{-mconstant-cfstrings}. + +@opindex mconstant-cfstrings +@item -mconstant-cfstrings +When the NeXT runtime is being used (the default on these systems), override +any @option{-fconstant-string-class} setting and cause @code{@@"@dots{}"} +literals to be laid out as constant CoreFoundation strings. + +@opindex mmacosx-version-min @item -mmacosx-version-min=@var{version} -The earliest version of MacOS X that this executable will run on -is @var{version}. Typical values of @var{version} include @code{10.1}, -@code{10.2}, and @code{10.3.9}. +The earliest version of MacOS X that this executable will run on is +@var{version}. Typical values supported for @var{version} include @code{12}, +@code{10.12}, and @code{10.5.8}. If the compiler was built to use the system's headers by default, then the default for this option is the system version on which the -- 2.39.2 (Apple Git-143)
[committed] d: Add testcase from PR108962
Hi, This adds testcase from PR108962 into the gdc testsuite. The issue was fixed in r14-2232 and backported to gcc-13. Regtested, committed to mainline and gcc-13 branches. Regards, Iain. --- PR d/108962 gcc/testsuite/ChangeLog: * gdc.dg/pr108962.d: New test. --- gcc/testsuite/gdc.dg/pr108962.d | 13 + 1 file changed, 13 insertions(+) create mode 100644 gcc/testsuite/gdc.dg/pr108962.d diff --git a/gcc/testsuite/gdc.dg/pr108962.d b/gcc/testsuite/gdc.dg/pr108962.d new file mode 100644 index 000..0fefa126b54 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr108962.d @@ -0,0 +1,13 @@ +// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108962 +// { dg-do compile } +// { dg-options "-fno-exceptions -fdump-tree-original" } +extern(C) void main() +{ +final switch (0) +{ +case 1: +return; +} +} +// { dg-final { scan-tree-dump-times "_d_assert_msg" 1 "original" } } +// { dg-final { scan-tree-dump-not "_d_throw" "original" } } -- 2.39.2
Re: [PATCH] libphobos: Handle Darwin Arm and AArch64 in fibre context asm.
Excerpts from Iain Sandoe's message of July 2, 2023 12:22 pm: > Tested on AArch64 (Arm64) Darwin on 11.x, 13.x and master, > OK for trunk? > and backports? > thanks > Iain > > --- 8< --- > > This code currently fails to build because it contains ELF-specific directives. This patch excludes those directives when the platform is Darwin. > > We do not expect switching fibres between threads to be safe here either owing to the possible caching of TLS pointers. > > Signed-off-by: Iain Sandoe > OK. Thanks! Iain.
[PATCH] libphobos: Handle Darwin Arm and AArch64 in fibre context asm.
Tested on AArch64 (Arm64) Darwin on 11.x, 13.x and master, OK for trunk? and backports? thanks Iain --- 8< --- This code currently fails to build because it contains ELF- specific directives. This patch excludes those directives when the platform is Darwin. We do not expect switching fibres between threads to be safe here either owing to the possible caching of TLS pointers. Signed-off-by: Iain Sandoe libphobos/ChangeLog: * libdruntime/config/aarch64/switchcontext.S: Exclude ELF- specific constructs for Darwin. * libdruntime/config/arm/switchcontext.S: Likewise. * libdruntime/core/thread/fiber.d: Disable switching fibres between threads. --- libphobos/libdruntime/config/aarch64/switchcontext.S | 9 - libphobos/libdruntime/config/arm/switchcontext.S | 8 libphobos/libdruntime/core/thread/fiber.d| 1 + 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/libphobos/libdruntime/config/aarch64/switchcontext.S b/libphobos/libdruntime/config/aarch64/switchcontext.S index 5cfa2f698e8..d3bd646bc56 100644 --- a/libphobos/libdruntime/config/aarch64/switchcontext.S +++ b/libphobos/libdruntime/config/aarch64/switchcontext.S @@ -44,7 +44,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see */ .text .global CSYM(fiber_switchContext) +#ifndef __APPLE__ .type CSYM(fiber_switchContext), %function +#endif .align 4 CSYM(fiber_switchContext): .cfi_startproc @@ -77,8 +79,9 @@ CSYM(fiber_switchContext): ldp d15, d14, [sp], #20*8 ret .cfi_endproc +#ifndef __APPLE__ .size CSYM(fiber_switchContext),.-CSYM(fiber_switchContext) - +#endif /** * When generating any kind of backtrace (gdb, exception handling) for * a function called in a Fiber, we need to tell the unwinder to stop @@ -93,11 +96,15 @@ CSYM(fiber_switchContext): .text .global CSYM(fiber_trampoline) .p2align 2 +#ifndef __APPLE__ .type CSYM(fiber_trampoline), %function +#endif CSYM(fiber_trampoline): .cfi_startproc .cfi_undefined x30 // fiber_entryPoint never returns bl CSYM(fiber_entryPoint) .cfi_endproc +#ifndef __APPLE__ .size CSYM(fiber_trampoline),.-CSYM(fiber_trampoline) +#endif diff --git a/libphobos/libdruntime/config/arm/switchcontext.S b/libphobos/libdruntime/config/arm/switchcontext.S index 3f9b35e7334..f1f2060fd97 100644 --- a/libphobos/libdruntime/config/arm/switchcontext.S +++ b/libphobos/libdruntime/config/arm/switchcontext.S @@ -60,11 +60,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see */ .text +#ifndef __APPLE__ #if defined(__ARM_PCS_VFP) || (defined(__ARM_PCS) && !defined(__SOFTFP__)) .fpu vfp #endif .global CSYM(fiber_switchContext) .type CSYM(fiber_switchContext), %function +#else +.global CSYM(fiber_switchContext) +#endif .align 4 CSYM(fiber_switchContext): .cfi_sections .debug_frame @@ -111,8 +115,12 @@ CSYM(fiber_switchContext): mov lr, #0 // return by writing lr into pc mov pc, r1 +#ifndef __APPLE__ .fnend .cfi_endproc .size CSYM(fiber_switchContext),.-CSYM(fiber_switchContext) +#else +.cfi_endproc +#endif #endif diff --git a/libphobos/libdruntime/core/thread/fiber.d b/libphobos/libdruntime/core/thread/fiber.d index 4590ff1c052..66fb9dad89d 100644 --- a/libphobos/libdruntime/core/thread/fiber.d +++ b/libphobos/libdruntime/core/thread/fiber.d @@ -1785,6 +1785,7 @@ version (OSX) { version (X86)version = UnsafeFiberMigration; version (X86_64) version = UnsafeFiberMigration; +version (AArch64) version = UnsafeFiberMigration; } version (UnsafeFiberMigration) -- 2.39.2 (Apple Git-143)
Re: [PATCH V5] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
juzhe.zh...@rivai.ai writes: > From: Ju-Zhe Zhong > > Hi, Richi and Richard. > > This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets > handle flow control by mask and loop control by length on gather/scatter > memory > operations. Consider this following case: > > #include > void > f (uint8_t *restrict a, >uint8_t *restrict b, int n, >int base, int step, >int *restrict cond) > { > for (int i = 0; i < n; ++i) > { > if (cond[i]) > a[i * step + base] = b[i * step + base]; > } > } > > We hope RVV can vectorize such case into following IR: > > loop_len = SELECT_VL > control_mask = comparison > v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias) > LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias) > > This patch doesn't apply such patterns into vectorizer, just add patterns > and update the documents. > > Will send patch which apply such patterns into vectorizer soon after this > patch is approved. > > Thanks. > > --- > gcc/doc/md.texi | 17 > gcc/internal-fn.cc | 67 +++-- > gcc/internal-fn.def | 8 -- > gcc/internal-fn.h | 1 + > gcc/optabs.def | 2 ++ > 5 files changed, 90 insertions(+), 5 deletions(-) > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 9648fdc846a..df41b5251d4 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -5040,6 +5040,15 @@ operand 5. Bit @var{i} of the mask is set if element > @var{i} > of the result should be loaded from memory and clear if element @var{i} > of the result should be set to zero. > > +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern > +@item @samp{len_mask_gather_load@var{m}@var{n}} > +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand > (operand 5), > +a mask operand (operand 6) as well as a bias operand (operand 7). Similar > to len_maskload, > +the instruction loads at most (operand 5 + operand 7) elements from memory. 
> +Bit @var{i} of the mask is set if element @var{i} of the result should > +be loaded from memory and clear if element @var{i} of the result should be > undefined. > +Mask elements @var{i} with i > (operand 5) are ignored. Nit: second i should be @var{i} too. > + > @cindex @code{scatter_store@var{m}@var{n}} instruction pattern > @item @samp{scatter_store@var{m}@var{n}} > Store a vector of mode @var{m} into several distinct memory locations. > @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an > extra mask operand as > operand 5. Bit @var{i} of the mask is set if element @var{i} > of the result should be stored to memory. > > +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern > +@item @samp{len_mask_scatter_store@var{m}@var{n}} > +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand > (operand 5), > +a mask operand (operand 6) as well as a bias operand (operand 7). The > instruction stores > +at most (operand 5 + operand 7) elements of (operand 4) to memory. > +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be > stored. > +Mask elements @var{i} with i > (operand 5) are ignored. Same here. > + > @cindex @code{vec_set@var{m}} instruction pattern > @item @samp{vec_set@var{m}} > Set given field in the vector value. 
Operand 0 is the vector to modify, > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 9017176dc7a..da3827481e9 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall > *stmt, direct_optab optab) >HOST_WIDE_INT scale_int = tree_to_shwi (scale); >rtx rhs_rtx = expand_normal (rhs); > > - class expand_operand ops[6]; > + class expand_operand ops[8]; >int i = 0; >create_address_operand (&ops[i++], base_rtx); >create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE > (offset))); > @@ -3546,9 +3546,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall > *stmt, direct_optab optab) >create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs))); >if (mask_index >= 0) > { > + if (optab == len_mask_scatter_store_optab) > + { > + tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn)); > + rtx len_rtx = expand_normal (len); > + create_convert_operand_from (&ops[i++], len_rtx, > +TYPE_MODE (TREE_TYPE (len)), > +TYPE_UNSIGNED (TREE_TYPE (len))); > + } >tree mask = gimple_call_arg (stmt, mask_index); >rtx mask_rtx = expand_normal (mask); >create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE > (mask))); > + if (optab == len_mask_scatter_store_optab) > + { > + tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1); > + rtx bias = expand_normal (biast); > + create_input_operand (&ops[i++], bias, QI
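The md.texi wording being reviewed above — load at most (length + bias) elements, with mask-clear and out-of-length lanes undefined — can be captured in a reference model. This is an illustrative sketch of the documented semantics, not the optab expansion; undefined lanes are represented as None:

```python
def len_mask_gather_load(memory, base, offsets, scale, length, mask, bias):
    """Reference semantics for len_mask_gather_load: lane i is loaded
    from memory[base + offsets[i] * scale] only when i < length + bias
    and mask[i] is set; all other lanes are undefined (None here)."""
    result = []
    for i, off in enumerate(offsets):
        if i < length + bias and mask[i]:
            result.append(memory[base + off * scale])
        else:
            result.append(None)
    return result

mem = {10: 'a', 12: 'b', 14: 'c', 16: 'd'}
print(len_mask_gather_load(mem, 10, [0, 1, 2, 3], 2, 3, [1, 0, 1, 1], 0))
# ['a', None, 'c', None]: lane 1 is masked off, lane 3 exceeds the length
```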
Re: [PATCH 0/9] vect: Move costing next to the transform for vect load
Richard Biener writes: > On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin wrote: >> >> This patch series follows Richi's suggestion at the link [1], >> which suggest structuring vectorizable_load to make costing >> next to the transform, in order to make it easier to keep >> costing and the transform in sync. FTR, I was keeping quiet given that this was following an agreed plan :) Thanks for organising the series this way. It made it easier to review. >> For now, it's a known >> issue that what we cost can be inconsistent with what we >> transform, as the case in PR82255 and some other associated >> test cases in the patches of this series show. >> >> Basically this patch series makes costing not call function >> vect_model_load_cost any more. To make the review and >> bisection easy, I organized the changes according to the >> memory access types of vector load. For each memory access >> type, firstly it follows the handlings in the function >> vect_model_load_costto avoid any missing, then refines >> further by referring to the transform code, I also checked >> them with some typical test cases to verify. Hope the >> subjects of patches are clear enough. >> >> The whole series can be bootstrapped and regtested >> incrementally on: >> - x86_64-redhat-linux >> - aarch64-linux-gnu >> - powerpc64-linux-gnu P7, P8 and P9 >> - powerpc64le-linux-gnu P8, P9 and P10 >> >> By considering the current vector test buckets are mainly >> tested without cost model, I also verified the whole patch >> series was neutral for SPEC2017 int/fp on Power9 at O2, >> O3 and Ofast separately. > > I went through the series now and I like it overall (well, I suggested > the change). > Looking at the changes I think we want some followup to reduce the > mess in the final loop nest. We already have some VMAT_* cases handled > separately, maybe we can split out some more cases. Maybe we should > bite the bullet and duplicate that loop nest for the different VMAT_* cases. 
> Maybe we can merge some of the if (!costing_p) checks by clever > re-ordering. So what > this series doesn't improve is overall readability of the code (indent and our > 80 char line limit). > > The change also makes it more difficult(?) to separate analysis and transform > though in the end I hope that analysis will actually "code generate" to a > (SLP) > data structure so the target will have a chance to see the actual flow of > insns. > > That said, I'd like to hear from Richard whether he thinks this is a step > in the right direction. Yeah, agree that it's probably better on balance. It's going to need a bit of discipline to make sure that we don't accidentally change the IR during the analysis phase, but I guess that already exists to a lesser extent with the “before !vec_stmt”/“after !vec_stmt” split. Thanks, Richard
Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
Kewen Lin writes: > @@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo, >unsigned int n_groups = 0; >for (j = 0; j < ncopies; j++) > { > - if (nloads > 1) > + if (nloads > 1 && !costing_p) > vec_alloc (v, nloads); > gimple *new_stmt = NULL; > for (i = 0; i < nloads; i++) > { > + if (costing_p) > + { > + if (VECTOR_TYPE_P (ltype)) > + vect_get_load_cost (vinfo, stmt_info, 1, > + alignment_support_scheme, misalignment, > + false, &inside_cost, nullptr, cost_vec, > + cost_vec, true); > + else > + inside_cost += record_stmt_cost (cost_vec, 1, scalar_load, > + stmt_info, 0, vect_body); > + continue; > + } Just a note that this might make life harder for AArch64 costing. Strided SLP loads are still equivalent to vector loads for AArch64, since they happen on the FPR/vector side even if they have integral modes. But I agree this is more accurate from a general target-independent POV, especially given the relatively coarse-grain costing enum. So I think that's just something AArch64 will need to account for. Thanks, Richard