Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-02 Thread Tejas Belagod via Gcc-patches

On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:


From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:


From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to the GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a variant
has been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is useful to have a way to model predication on targets.  This could find
use in intrinsics, or be used as-is as a GNU vector extension mapped to
underlying target features.  For example, the packed boolean vector could
directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that is the appropriate (target-dependent) vector
mask type for the vector data type (v8si in this case).
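For context, plain GNU vector comparisons already yield mask-like vectors whose
lanes are -1 (true) or 0 (false); vector_mask gives such a vector its
target-appropriate layout.  A minimal sketch in plain C (without vector_mask,
which is currently GIMPLE-frontend only):

```c
/* Comparing two GNU vectors produces a signed-boolean vector of the
   same shape: each lane is -1 when the predicate holds, 0 otherwise.  */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

v4si make_mask(v4si a, v4si b)
{
  return a > b;   /* lane-wise compare, -1/0 result */
}
```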



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.
I must admit I haven't dug deep, but if the target hook allows the mask to be
defined in a way that is target-friendly (and I don't know how much effort it
will be to migrate the attribute to more front-ends), it should do the job
nicely.  Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is that the compiler doesn't need
to support weird explicitly laid-out packed boolean vectors that do
not match what the target supports, and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.



Yeah, this is what option 4 in the RFC is trying to achieve – be portable
enough to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though; for example, with -mavx512f -mavx512vl
you'll get AVX512-style masks for V4SImode data vectors, but of course the
target still supports SSE2/AVX2-style masks as well.  Those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like the scenario with NEON vs SVE.  Come to think of it,
vector_mask resembles option 4 in the proposal, with ‘n’ implied by the ‘base’
vector type and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would
there be any major challenges wrt implementation or defining behaviour
semantics?  I played around with a few examples from the testsuite and wrote
some new ones.  I mostly tried operations that the new type would have to
support (unary, binary bitwise, initializations etc.) – with a couple of
exceptions, most of the ops seem to be supported.  I also triggered a couple
of ICEs in some tests involving implicit conversions to wider/narrower
vector_mask types (will raise reports for these).  Correct me if I’m wrong
here, but we’d probably have to support a couple of new ops if vector_mask
is exposed to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, and properly
diagnose improper uses.


Indeed.

A question would be, for example, how to write common mask test operations
like if (any (mask)) or if (all (mask)).


I see two options here.  New builtins could support the new types – they'd
provide a target-independent way to test any and all conditions.  Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered to anyway).
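Until such builtins exist, any/all can be open-coded on a -1/0 mask vector.
A hedged sketch (the helper names are made up for illustration; a target
builtin could lower each to a single test instruction):

```c
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* OR-reduce: true if any lane of the -1/0 mask is set.  */
static int any_mask(v4si m)
{
  return (m[0] | m[1] | m[2] | m[3]) != 0;
}

/* AND-reduce: true only if every lane of the mask is set.  */
static int all_mask(v4si m)
{
  return (m[0] & m[1] & m[2] & m[3]) != 0;
}
```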



Likewise writing merge operations – do those as

  a = a | (mask ? b : 0);

thus use ternary ?: for this?


Yes, like now, the ternary could just translate to

  {mask[0] ? b[0] : 0, mask[1] ? b[1] : 0, ... }

One thing to flesh out is the semantics. Should we allow this operation 
as long as the num

[PATCH, rs6000] Extract the element in dword0 by mfvsrd and shift/mask [PR110331]

2023-07-02 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch implements vector element extraction by mfvsrd and
shift/mask when the element is in dword0 of the vector.  Originally,
it generates vsplat/mfvsrd on P8 and li/vextract on P9.  Since mfvsrd
has lower latency than vextract and rldicl has lower latency than
vsplat, the new sequence is beneficial.  In particular, the shift/mask
is not needed when the element is the first element of dword0, so it
saves another rldicl when a sign-extended value is returned.

  This patch is based on the previous one.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Extract the element in dword0 by mfvsrd and shift/mask

gcc/
PR target/110331
* config/rs6000/rs6000-protos.h (rs6000_vsx_element_in_dword0_p):
Declare.
(rs6000_vsx_extract_element_from_dword0): Declare.
* config/rs6000/rs6000.cc (rs6000_vsx_element_in_dword0_p): New
function to judge if an element is in dword0 of a vector.
(rs6000_vsx_extract_element_from_dword0): Extract an element from
dword0 by mfvsrd and lshiftrt and mask.
* config/rs6000/rs6000.md (*rotl3_mask): Rename to...
(rotl3_mask): ...this.
* config/rs6000/vsx.md (vsx_extract_): Add a comment.
(split pattern for p9 vector extract): Call
rs6000_vsx_extract_element_from_dword0 if the element is in dword0.
(*vsx_extract__di_p9): Exclude the elements in dword0 which
are processed by *vsx_extract__zero_extend for both p8 and p9.
(*vsx_extract__zero_extend): Zero extend pattern for vector
extract on the element of dword0.
(*vsx_extract__p8): Call rs6000_vsx_extract_element_from_dword0
when the extracted element is in dword0.  Refine the pattern and
remove reload_completed from the split condition.

gcc/testsuite/
PR target/110331
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Set the extracted
elements in dword1.
* gcc.target/powerpc/fold-vec-extract-char.p9.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p9.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p9.c: Likewise.
* gcc.target/powerpc/p9-extract-1.c: Likewise.
* gcc.target/powerpc/pr110331-p8.c: New.
* gcc.target/powerpc/pr110331-p9.c: New.
* gcc.target/powerpc/pr110331.h: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..ccef280122b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -161,6 +161,8 @@ extern bool rs6000_function_pcrel_p (struct function *);
 extern bool rs6000_pcrel_p (void);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 extern void rs6000_output_addr_vec_elt (FILE *, int);
+extern bool rs6000_vsx_element_in_dword0_p (rtx, enum machine_mode);
+extern void rs6000_vsx_extract_element_from_dword0 (rtx, rtx, rtx, bool);

 /* Different PowerPC instruction formats that are used by GCC.  There are
various other instruction formats used by the PowerPC hardware, but these
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..fad01d6b5dd 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -29098,6 +29098,74 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt)
   return false;
 }

+/* Return true when the element is in dword0 of a vector.  Exclude word
+   element 1 of V4SImode as the word can be extracted by mfvsrwz directly.  */
+
+bool
+rs6000_vsx_element_in_dword0_p (rtx op, enum machine_mode mode)
+{
+  gcc_assert (CONST_INT_P (op));
+  gcc_assert (mode == V16QImode || mode == V8HImode || mode == V4SImode);
+
+  int units = GET_MODE_NUNITS (mode);
+  int elt = INTVAL (op);
+  elt = BYTES_BIG_ENDIAN ? units - 1 - elt : elt;
+
+  if (elt > units / 2
+  || (elt == units / 2 && mode != V4SImode))
+return true;
+  else
+return false;
+}
+
+/* Extract element from dword0 by mfvsrd and lshiftrt and mask.  Extend_p
+   indicates if zero extend is needed or not.  */
+
+void
+rs6000_vsx_extract_element_from_dword0 (rtx dest, rtx src, rtx element,
+   bool extend_p)
+{
+  enum machine_mode mode = GET_MODE (src);
+  gcc_assert (rs6000_vsx_element_in_dword0_p (element, mode));
+
+  enum machine_mode dest_mode = GET_MODE (dest);
+  enum machine_mode inner_mode = GET_MODE_INNER (mode);
+  int units = GET_MODE_NUNITS (mode);
+  int elt = INTVAL (element);
+  elt = BYTES_BIG_ENDIAN ? units - 1 - elt : elt;
+  int value, shift;
+  unsigned int mask;
+
+  rtx vec_tmp = gen_lowpart (V2DImode, src);
+  rtx tmp1 = can_create_pseudo_p ()
+? gen_reg_rtx (DImode)
+: simplify_gen_subreg (DImode, dest, dest_mode,

[PATCH] Middle-end: Change order of LEN_MASK_LOAD/LEN_MASK_STORE arguments

2023-07-02 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

According to Richard's review comments:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html

the current len, bias and mask order is not reasonable.

Change {len,mask,bias} into {len,bias,mask}.

This patch is a simple fix; is it OK for trunk?
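For reference, a scalar model of the semantics the reordered operands carry:
element i participates iff i < len + bias and the mask selects it (the names
here are illustrative, not GCC internals):

```c
#include <stdint.h>

/* Model of len_maskload with the {len, bias, mask} order: load element i
   iff i < len + bias and mask[i] is set.  Other elements are undefined in
   the real optab; they are zeroed here for determinism.  Bias is a
   constant 0 or -1.  */
static void len_maskload_model(int32_t *dst, const int32_t *src,
                               uint64_t len, int64_t bias,
                               const uint8_t *mask, uint64_t nunits)
{
  for (uint64_t i = 0; i < nunits; i++)
    dst[i] = (i < len + bias && mask[i]) ? src[i] : 0;
}
```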

gcc/ChangeLog:

* config/riscv/autovec.md: Change order of LEN_MASK_LOAD/LEN_MASK_STORE 
arguments.
* config/riscv/riscv-v.cc (expand_load_store): Ditto.
* doc/md.texi: Ditto.
* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(add_len_bias_and_mask_args): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_bias_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.

---
 gcc/config/riscv/autovec.md |   8 +-
 gcc/config/riscv/riscv-v.cc |   2 +-
 gcc/doc/md.texi |  16 ++--
 gcc/gimple-fold.cc  |   7 +-
 gcc/internal-fn.cc  | 176 +---
 gcc/tree-ssa-dse.cc |  13 +--
 gcc/tree-vect-stmts.cc  |   6 +-
 7 files changed, 90 insertions(+), 138 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1488f2be1be..4ab0e9f99eb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -26,8 +26,8 @@
   [(match_operand:V 0 "register_operand")
(match_operand:V 1 "memory_operand")
(match_operand 2 "autovec_length_operand")
-   (match_operand: 3 "vector_mask_operand")
-   (match_operand 4 "const_0_operand")]
+   (match_operand 3 "const_0_operand")
+   (match_operand: 4 "vector_mask_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_load_store (operands, true);
@@ -38,8 +38,8 @@
   [(match_operand:V 0 "memory_operand")
(match_operand:V 1 "register_operand")
(match_operand 2 "autovec_length_operand")
-   (match_operand: 3 "vector_mask_operand")
-   (match_operand 4 "const_0_operand")]
+   (match_operand 3 "const_0_operand")
+   (match_operand: 4 "vector_mask_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_load_store (operands, false);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index adb8d7d36a5..8d5bed7ebe4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2777,7 +2777,7 @@ expand_load_store (rtx *ops, bool is_load)
 {
   poly_int64 value;
   rtx len = ops[2];
-  rtx mask = ops[3];
+  rtx mask = ops[4];
   machine_mode mode = GET_MODE (ops[0]);
 
   if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index b44d1ba3af9..f14dd32b2dc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5319,15 +5319,15 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{len_maskload@var{m}@var{n}} instruction pattern
 @item @samp{len_maskload@var{m}@var{n}}
 Perform a masked load from the memory location pointed to by operand 1
-into register operand 0.  (operand 2 + operand 4) elements are loaded from
+into register operand 0.  (operand 2 + operand 3) elements are loaded from
 memory and other elements in operand 0 are set to undefined values.
 This is a combination of len_load and maskload.
 Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
 has whichever integer mode the target prefers.  A mask is specified in
-operand 3 which must be of type @var{n}.  The mask has lower precedence than
+operand 4 which must be of type @var{n}.  The mask has lower precedence than
 the length and is itself subject to length masking,
-i.e. only mask indices < (operand 2 + operand 4) are used.
-Operand 4 conceptually has mode @code{QI}.
+i.e. only mask indices < (operand 2 + operand 3) are used.
+Operand 3 conceptually has mode @code{QI}.
 
 Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
 constant bias: it is either a constant 0 or a constant -1.  The predicate on
@@ -5346,14 +5346,14 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
 @item @samp{len_maskstore@var{m}@var{n}}
 Perform a masked store from vector register operand 1 into memory operand 0.
-(operand 2 + operand 4) elements are stored to memory
+(operand 2 + operand 3) elements are stored to memory
 and leave the other elements of operand 0 unchanged.
 This is a combination of len_store and maskstore.
 Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 
has whichever
-integer mode the target prefers.  A mask is specified in operand 3 which must 
be
+integer mode the target 

Re: [PATCH 0/9] vect: Move costing next to the transform for vect load

2023-07-02 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for your review comments on this and some others!

on 2023/6/30 19:37, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin  wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggest structuring vectorizable_load to make costing
>> next to the transform, in order to make it easier to keep
>> costing and the transform in sync.  For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, firstly it follows the handlings in the function
>> vect_model_load_cost to avoid missing anything, then refines
>> further by referring to the transform code, I also checked
>> them with some typical test cases to verify.  Hope the
>> subjects of patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> By considering the current vector test buckets are mainly
>> tested without cost model, I also verified the whole patch
>> series was neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
> 
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should

At first glance, the simple parts look to be the handlings for
VMAT_LOAD_STORE_LANES and VMAT_GATHER_SCATTER (with ifn and emulated).
It seems fairly straightforward if it's fine to duplicate the nested loop,
but we may need to take care to remove some useless code.

> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.

I've tried a bit to merge them if possible, like the place to check
VMAT_CONTIGUOUS, VMAT_CONTIGUOUS_REVERSE and VMAT_CONTIGUOUS_PERMUTE.
But will keep in mind for the following updates.

> So what
> this series doesn't improve is overall readability of the code (indent and our
> 80 char line limit).

Sorry about that.

> 
> The change also makes it more difficult(?) to separate analysis and transform
> though in the end I hope that analysis will actually "code generate" to a 
> (SLP)
> data structure so the target will have a chance to see the actual flow of 
> insns.
> 
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.
> 
> Are you willing to followup with doing the same re-structuring to
> vectorizable_store?

Yes, vectorizable_store was also pointed out in your original suggestion [1];
I planned to update it once this series meets your expectations and gets
landed.

> 
> OK from my side with the few comments addressed.  The patch likely needs 
> refresh
> after the RVV changes in this area?

Thanks!  Yes, I've updated 2/9 and 3/9 according to your comments, and updated
5/9 and 9/9 as they had some conflicts when rebasing.  Re-testing is ongoing;
do the updated versions look good to you?  Is this series OK for trunk if all
the test runs go well again as before?

BR,
Kewen


Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-07-02 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2023/7/2 16:58, Richard Sandiford wrote:
> Kewen Lin  writes:
>> @@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo,
>>unsigned int n_groups = 0;
>>for (j = 0; j < ncopies; j++)
>>  {
>> -  if (nloads > 1)
>> +  if (nloads > 1 && !costing_p)
>>  vec_alloc (v, nloads);
>>gimple *new_stmt = NULL;
>>for (i = 0; i < nloads; i++)
>>  {
>> +  if (costing_p)
>> +{
>> +  if (VECTOR_TYPE_P (ltype))
>> +vect_get_load_cost (vinfo, stmt_info, 1,
>> +alignment_support_scheme, misalignment,
>> +false, &inside_cost, nullptr, cost_vec,
>> +cost_vec, true);
>> +  else
>> +inside_cost += record_stmt_cost (cost_vec, 1, scalar_load,
>> + stmt_info, 0, vect_body);
>> +  continue;
>> +}
> 
> Just a note that this might make life harder for AArch64 costing.
> Strided SLP loads are still equivalent to vector loads for AArch64,
> since they happen on the FPR/vector side even if they have integral
> modes.
> 
> But I agree this is more accurate from a general target-independent POV,
> especially given the relatively coarse-grain costing enum.  So I think
> that's just something AArch64 will need to account for.

Sorry for the inconvenience.  I hope accounting for it with the target hook
on vect costing isn't very complicated.

BR,
Kewen


[PATCH 9/9 v2] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS

2023-07-02 Thread Kewen.Lin via Gcc-patches
This is version v2 rebasing from latest trunk.

=

This patch adjusts the cost handling on VMAT_CONTIGUOUS in
function vectorizable_load.  We don't call function
vect_model_load_cost for it any more.  It removes function
vect_model_load_cost which becomes useless and unreachable
now.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Remove.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without
calling vect_model_load_cost.
---
 gcc/tree-vect-stmts.cc | 94 --
 1 file changed, 9 insertions(+), 85 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0a9a75ce3c7..9dfe1903181 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1118,75 +1118,6 @@ vect_get_store_cost (vec_info *, stmt_vec_info 
stmt_info, int ncopies,
 }
 }

-
-/* Function vect_model_load_cost
-
-   Models cost for loads.  In the case of grouped accesses, one access has
-   the overhead of the grouped access attributed to it.  Since unaligned
-   accesses are supported for loads, we also account for the costs of the
-   access scheme chosen.  */
-
-static void
-vect_model_load_cost (vec_info *vinfo,
- stmt_vec_info stmt_info, unsigned ncopies, poly_uint64 vf,
- vect_memory_access_type memory_access_type,
- dr_alignment_support alignment_support_scheme,
- int misalignment,
- slp_tree slp_node,
- stmt_vector_for_cost *cost_vec)
-{
-  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
-
-  unsigned int inside_cost = 0, prologue_cost = 0;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
-
-  gcc_assert (cost_vec);
-
-  /* ???  Somehow we need to fix this at the callers.  */
-  if (slp_node)
-ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  if (slp_node && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
-{
-  /* If the load is permuted then the alignment is determined by
-the first group element not by the first scalar stmt DR.  */
-  stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-  if (!first_stmt_info)
-   first_stmt_info = stmt_info;
-  /* Record the cost for the permutation.  */
-  unsigned n_perms, n_loads;
-  vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL,
-   vf, true, &n_perms, &n_loads);
-  inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
-  first_stmt_info, 0, vect_body);
-
-  /* And adjust the number of loads performed.  This handles
-redundancies as well as loads that are later dead.  */
-  ncopies = n_loads;
-}
-
-  /* Grouped loads read all elements in the group at once,
- so we want the DR for the first statement.  */
-  stmt_vec_info first_stmt_info = stmt_info;
-  if (!slp_node && grouped_access_p)
-first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
- the cost of the statement itself.  For SLP we only get called
- once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
- misalignment, first_stmt_p, &inside_cost, &prologue_cost,
- cost_vec, cost_vec, true);
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_load_cost: inside_cost = %d, "
- "prologue_cost = %d .\n", inside_cost, prologue_cost);
-}
-
-
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
@@ -10830,7 +10761,8 @@ vectorizable_load (vec_info *vinfo,
 we only need to count it once for the whole group.  */
  bool first_stmt_info_p = first_stmt_info == stmt_info;
  bool add_realign_cost = first_stmt_info_p && i == 0;
- if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
+ if (memory_access_type == VMAT_CONTIGUOUS
+ || memory_access_type == VMAT_CONTIGUOUS_REVERSE
  || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
  && (!grouped_load || first_stmt_info_p)))
vect_get_load_cost (vinfo, stmt_info, 1,
@@ -10954,15 +10886,14 @@ vectorizable_load (vec_info *vinfo,
 direct vect_transform_slp_perm_load to DCE the unused parts.
 ???  This is a hack to prevent compile-time issues as seen
 in PR101120 and friends.  */
- if (costing_p
- && memory_access_type != VMAT_CONTIGUOUS)
+ if (costing_p)
{
  vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
   

[PATCH 5/9 v2] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER

2023-07-02 Thread Kewen.Lin via Gcc-patches
This is version v2 rebasing from latest trunk.

=

This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_load.  We don't call function
vect_model_load_cost for it any more.

It's mainly for gather loads with IFN or emulated gather
loads; it follows the handlings in function
vect_model_load_cost.  This patch shouldn't have any
functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_GATHER_SCATTER without calling vect_model_load_cost.
(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
remove VMAT_GATHER_SCATTER related handlings and the related parameter
gs_info.
---
 gcc/tree-vect-stmts.cc | 124 +
 1 file changed, 76 insertions(+), 48 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7d8e72bda67..1ae917db627 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1132,11 +1132,10 @@ vect_model_load_cost (vec_info *vinfo,
  vect_memory_access_type memory_access_type,
  dr_alignment_support alignment_support_scheme,
  int misalignment,
- gather_scatter_info *gs_info,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_INVARIANT
  && memory_access_type != VMAT_ELEMENTWISE
  && memory_access_type != VMAT_STRIDED_SLP);
@@ -1225,35 +1224,9 @@ vect_model_load_cost (vec_info *vinfo,
  group_size);
 }

-  /* The loads themselves.  */
-  if (memory_access_type == VMAT_GATHER_SCATTER)
-{
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  if (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl)
-   /* For emulated gathers N offset vector element extracts
-  (we assume the scalar scaling and ptr + offset add is consumed by
-  the load).  */
-   inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-vec_to_scalar, stmt_info, 0,
-vect_body);
-  /* N scalar loads plus gathering them into a vector.  */
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  scalar_load, stmt_info, 0, vect_body);
-}
-  else
-vect_get_load_cost (vinfo, stmt_info, ncopies,
-   alignment_support_scheme, misalignment, first_stmt_p,
-   &inside_cost, &prologue_cost,
-   cost_vec, cost_vec, true);
-
-  if (memory_access_type == VMAT_GATHER_SCATTER
-  && gs_info->ifn == IFN_LAST
-  && !gs_info->decl)
-inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
-stmt_info, 0, vect_body);
+  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+ misalignment, first_stmt_p, &inside_cost, &prologue_cost,
+ cost_vec, cost_vec, true);

   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -10283,6 +10256,7 @@ vectorizable_load (vec_info *vinfo,
 }
   tree vec_mask = NULL_TREE;
   poly_uint64 group_elt = 0;
+  unsigned int inside_cost = 0;
   for (j = 0; j < ncopies; j++)
 {
   /* 1. Create the vector or array pointer update chain.  */
@@ -10414,23 +10388,26 @@ vectorizable_load (vec_info *vinfo,
  /* Record that VEC_ARRAY is now dead.  */
  vect_clobber_variable (vinfo, stmt_info, gsi, vec_array);
}
-  else if (!costing_p)
+  else
{
  for (i = 0; i < vec_num; i++)
{
  tree final_mask = NULL_TREE;
- if (loop_masks
- && memory_access_type != VMAT_INVARIANT)
-   final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-vec_num * ncopies,
-vectype, vec_num * j + i);
- if (vec_mask)
-   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
-  final_mask, vec_mask, gsi);
-
- if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
-  gsi, stmt_info, bump);
+ if (!costing_p)
+   {
+ if (loop_masks && memory_access_type != VMAT_INVARIANT)
+   final_mask
+ 

[PATCH 3/9 v2] vect: Adjust vectorizable_load costing on VMAT_INVARIANT

2023-07-02 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2023/6/30 19:18, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin  wrote:
>>
>> This patch adjusts the cost handling on VMAT_INVARIANT in
>> function vectorizable_load.  We don't call function
>> vect_model_load_cost for it any more.
>>
>> To make the costing on VMAT_INVARIANT better, this patch is
>> to query hoist_defs_of_uses for hoisting decision, and add
>> costs for different "where" based on it.  Currently function
>> hoist_defs_of_uses would always hoist the defs of all SSA
>> uses, adding one argument HOIST_P aims to avoid the actual
>> hoisting during costing phase.
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
>> (vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
>> hoisting decision and without calling vect_model_load_cost.
>> (vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
>> and remove VMAT_INVARIANT related handlings.
>> ---
>>  gcc/tree-vect-stmts.cc | 61 +++---
>>  1 file changed, 39 insertions(+), 22 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 744cdf40e26..19c61d703c8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>>   slp_tree slp_node,
>>   stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
>> +  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
>> + && memory_access_type != VMAT_INVARIANT);
>>
>>unsigned int inside_cost = 0, prologue_cost = 0;
>>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
>>ncopies * assumed_nunits,
>>scalar_load, stmt_info, 0, vect_body);
>>  }
>> -  else if (memory_access_type == VMAT_INVARIANT)
>> -{
>> -  /* Invariant loads will ideally be hoisted and splat to a vector.  */
>> -  prologue_cost += record_stmt_cost (cost_vec, 1,
>> -scalar_load, stmt_info, 0,
>> -vect_prologue);
>> -  prologue_cost += record_stmt_cost (cost_vec, 1,
>> -scalar_to_vec, stmt_info, 0,
>> -vect_prologue);
>> -}
>>else
>>  vect_get_load_cost (vinfo, stmt_info, ncopies,
>> alignment_support_scheme, misalignment, first_stmt_p,
>> @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
>>  /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
>> inserting them on the loops preheader edge.  Returns true if we
>> were successful in doing so (and thus STMT_INFO can be moved then),
>> -   otherwise returns false.  */
>> +   otherwise returns false.  HOIST_P indicates if we want to hoist the
>> +   definitions of all SSA uses, it would be false when we are costing.  */
>>
>>  static bool
>> -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>> +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
>>  {
>>ssa_op_iter i;
>>tree op;
>> @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class 
>> loop *loop)
>>if (!any)
>>  return true;
>>
>> +  if (!hoist_p)
>> +return true;
>> +
>>FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
>>  {
>>gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>> @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
>>
>>if (memory_access_type == VMAT_INVARIANT)
>>  {
>> -  if (costing_p)
>> -   {
>> - vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
>> -   memory_access_type, alignment_support_scheme,
>> -   misalignment, &gs_info, slp_node, cost_vec);
>> - return true;
>> -   }
>> -
>>gcc_assert (!grouped_load && !mask && !bb_vinfo);
>>/* If we have versioned for aliasing or the loop doesn't
>>  have any data dependencies that would preclude this,
>> @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
>>  thus we can insert it on the preheader edge.  */
>>bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
>>   && !nested_in_vect_loop
>> - && hoist_defs_of_uses (stmt_info, loop));
>> + && hoist_defs_of_uses (stmt_info, loop, !costing_p));
> 
> 'hoist_defs_of_uses' should ideally be computed once at analysis time and
> the result remembered.  It's not so easy in this case so maybe just
> add a comment
> for this here.

Ok, updated with: 

   /* If we have versioned for aliasing or t
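The query/transform split that the new hoist_p parameter introduces can be sketched as a toy model (plain vectors standing in for GCC's loop and statement data structures, which this is not):

```cpp
#include <cassert>
#include <vector>

// Toy model of the hoist_p idiom: one routine both answers "can the defs
// be hoisted?" and, only when hoist_p is true, performs the hoist.  During
// costing the caller passes hoist_p = false and only wants the answer.
static bool
hoist_defs_of_uses (std::vector<int> &loop_body, std::vector<int> &preheader,
                    bool hoist_p)
{
  // Feasibility check: pretend negative entries cannot be hoisted.
  for (int v : loop_body)
    if (v < 0)
      return false;

  if (!hoist_p)
    return true;  // Costing: report feasibility without transforming.

  // Transform: move the definitions to the preheader.
  preheader.insert (preheader.end (), loop_body.begin (), loop_body.end ());
  loop_body.clear ();
  return true;
}
```

As the review comment notes, the feasibility check then runs twice (once while costing, once while transforming); remembering the analysis-time result would avoid the repeat.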

[PATCH 2/9 v2] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl

2023-07-02 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2023/6/30 19:11, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin  wrote:
>>
>> This patch adds one extra argument cost_vec to function
>> vect_build_gather_load_calls, so that we can do costing
>> next to the tranform in vect_build_gather_load_calls.
>> For now, the implementation just follows the handlings in
>> vect_model_load_cost, it isn't so good, so placing one
>> FIXME for any further improvement.  This patch should not
>> cause any functional changes.
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-stmts.cc (vect_build_gather_load_calls): Add the 
>> handlings
>> on costing with one extra argument cost_vec.
>> (vectorizable_load): Adjust the call to vect_build_gather_load_calls.
>> (vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
>> gs_info.decl set any more.
>> ---
>>  gcc/tree-vect-stmts.cc | 31 +++
>>  1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 44514658be3..744cdf40e26 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1135,6 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>>   slp_tree slp_node,
>>   stmt_vector_for_cost *cost_vec)
>>  {
>> +  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
>> +
>>unsigned int inside_cost = 0, prologue_cost = 0;
>>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>>
>> @@ -2819,7 +2821,8 @@ vect_build_gather_load_calls (vec_info *vinfo, 
>> stmt_vec_info stmt_info,
>>   gimple_stmt_iterator *gsi,
>>   gimple **vec_stmt,
>>   gather_scatter_info *gs_info,
>> - tree mask)
>> + tree mask,
>> + stmt_vector_for_cost *cost_vec)
>>  {
>>loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> @@ -2831,6 +2834,23 @@ vect_build_gather_load_calls (vec_info *vinfo, 
>> stmt_vec_info stmt_info,
>>poly_uint64 gather_off_nunits
>>  = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
>>
>> +  /* FIXME: Keep the previous costing way in vect_model_load_cost by costing
>> + N scalar loads, but it should be tweaked to use target specific costs
>> + on related gather load calls.  */
>> +  if (!vec_stmt)
> 
> going over the series now, I'm collecting comments but wanted to get
> this one out here:
> I'd rather see if (cost_vec) here, that 'vec_stmt' argument is quite
> legacy (I think it can
> be completely purged everywhere)
> 

OK, updated with the below new version v2.  I guessed you meant we would like
to use cost_vec to check whether we are costing, and to use the vec_stmts in
stmt_info to store the vectorized stmts.
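The convention being converged on here can be sketched in isolation (hypothetical types, not the real stmt_info/cost machinery): a non-null cost_vec means "record costs and return early", a null one means "transform":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct stmt_cost { int count; const char *kind; };

// Toy model of the "if (cost_vec) { record costs; return; }" pattern: the
// same routine serves the costing pass (cost_vec non-null) and the
// transform pass (cost_vec null, vectorized stmts stored on the side).
static void
build_gather_load (std::vector<stmt_cost> *cost_vec,
                   std::vector<const char *> &vec_stmts,
                   int ncopies, int assumed_nunits)
{
  if (cost_vec)
    {
      // Costing only: mimic the FIXME'd N-scalar-loads approximation.
      cost_vec->push_back ({ncopies * assumed_nunits, "scalar_load"});
      return;
    }
  // Transform path: emit the gather call.
  vec_stmts.push_back ("gather_load_call");
}
```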

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ec177dcad82..dd8f5421d4e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1136,6 +1136,8 @@ vect_model_load_cost (vec_info *vinfo,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);

@@ -2846,7 +2848,8 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
  gimple_stmt_iterator *gsi,
  gimple **vec_stmt,
  gather_scatter_info *gs_info,
- tree mask)
+ tree mask,
+ stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -2858,6 +2861,23 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
   poly_uint64 gather_off_nunits
 = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);

+  /* FIXME: Keep the previous costing way in vect_model_load_cost by costing
+ N scalar loads, but it should be tweaked to use target specific costs
+ on related gather load calls.  */
+  if (cost_vec)
+{
+  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+  unsigned int inside_cost;
+  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
+ scalar_load, stmt_info, 0, vect_body);
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: inside_cost = %d, "
+"prologue_cost = 0 .\n",
+inside_cost);
+  return;
+}
+
   tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
   tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
   tree srctype = T

Re: [PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-07-02 Thread juzhe.zh...@rivai.ai
Oh, sorry, I forgot to update the ChangeLog.
We hope RVV can vectorize such a case into the following IR:
loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)

This change log is incorrect and misleading.

Following Richard's comments, I have changed the order of the arguments to:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, bias, control_mask)
LEN_SCATTER_STORE (... v, ..., loop_len, bias, control_mask)
Sorry about that.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-07-03 10:15
To: gcc-patches
CC: richard.sandiford; rguenther; pan2.li; Ju-Zhe Zhong
Subject: [PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, 
SCATTER_STORE} pattern
From: Ju-Zhe Zhong 
 
Hi, Richi and Richard.
 
Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
 
I add the helper function:
/* Add len, bias and mask arguments according to the STMT.  */
 
static unsigned int
add_len_bias_and_mask_args (expand_operand *ops, unsigned int opno, gcall *stmt)
{
  internal_fn ifn = gimple_call_internal_fn (stmt);
  int len_index = internal_fn_len_index (ifn);
  int bias_index = internal_fn_bias_index (ifn);
  int mask_index = internal_fn_mask_index (ifn);
  /* The order of the arguments is always {len,bias,mask}.  */
  if (len_index >= 0)
{
  tree len = gimple_call_arg (stmt, len_index);
  rtx len_rtx = expand_normal (len);
  create_convert_operand_from (&ops[opno++], len_rtx,
   TYPE_MODE (TREE_TYPE (len)),
   TYPE_UNSIGNED (TREE_TYPE (len)));
}
  if (bias_index >= 0)
{
  tree biast = gimple_call_arg (stmt, bias_index);
  rtx bias = expand_normal (biast);
  create_input_operand (&ops[opno++], bias, QImode);
}
  if (mask_index >= 0)
{
  tree mask = gimple_call_arg (stmt, mask_index);
  rtx mask_rtx = expand_normal (mask);
  create_input_operand (&ops[opno++], mask_rtx,
TYPE_MODE (TREE_TYPE (mask)));
}
  return opno;
}
 
I think the current code looks more reasonable and is easier to maintain now.
 
This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations.  Consider the following case:
 
#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
a[i * step + base] = b[i * step + base];
}
}
 
We hope RVV can vectorize such a case into the following IR:
 
loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)
 
This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.
 
I will send a patch applying these patterns in the vectorizer soon after this
patch is approved.
 
Ok for trunk?
 
gcc/ChangeLog:
 
* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (add_len_bias_and_mask_args): New function.
(expand_scatter_store_optab_fn): Add 
len_mask_gather_load/len_mask_scatter_store.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_len_index): New function.
(internal_fn_bias_index): New function.
(internal_fn_stored_value_index): Add 
len_mask_gather_load/len_mask_scatter_store.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* internal-fn.h (internal_fn_len_index): New function.
(internal_fn_bias_index): Ditto.
* optabs.def (OPTAB_CD): Add 
len_mask_gather_load/len_mask_scatter_store.
 
---
gcc/doc/md.texi |  17 +++
gcc/internal-fn.cc  | 108 ++--
gcc/internal-fn.def |   8 +++-
gcc/internal-fn.h   |   2 +
gcc/optabs.def  |   2 +
5 files changed, 120 insertions(+), 17 deletions(-)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cefdee84821..b44d1ba3af9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
of the result should be loaded from memory and clear if element @var{i}
of the result should be set to zero.
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a bias operand (operand 6) as well as a mask operand (operand 7).  Similar to 
len_maskload,
+the instruction loads at most (operand 5 + operand 6) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from 

[PATCH V6] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-07-02 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi and Richard.

Based on the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html

I add the helper function:
/* Add len, bias and mask arguments according to the STMT.  */

static unsigned int
add_len_bias_and_mask_args (expand_operand *ops, unsigned int opno, gcall *stmt)
{
  internal_fn ifn = gimple_call_internal_fn (stmt);
  int len_index = internal_fn_len_index (ifn);
  int bias_index = internal_fn_bias_index (ifn);
  int mask_index = internal_fn_mask_index (ifn);
  /* The order of the arguments is always {len,bias,mask}.  */
  if (len_index >= 0)
{
  tree len = gimple_call_arg (stmt, len_index);
  rtx len_rtx = expand_normal (len);
  create_convert_operand_from (&ops[opno++], len_rtx,
   TYPE_MODE (TREE_TYPE (len)),
   TYPE_UNSIGNED (TREE_TYPE (len)));
}
  if (bias_index >= 0)
{
  tree biast = gimple_call_arg (stmt, bias_index);
  rtx bias = expand_normal (biast);
  create_input_operand (&ops[opno++], bias, QImode);
}
  if (mask_index >= 0)
{
  tree mask = gimple_call_arg (stmt, mask_index);
  rtx mask_rtx = expand_normal (mask);
  create_input_operand (&ops[opno++], mask_rtx,
TYPE_MODE (TREE_TYPE (mask)));
}
  return opno;
}

I think the current code looks more reasonable and is easier to maintain now.
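The invariant the helper relies on can also be sketched in isolation (hypothetical types, not GCC's expand_operand machinery): whichever subset of {len, bias, mask} an internal function carries, the operands are appended in exactly that order:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model of add_len_bias_and_mask_args: absent arguments are skipped,
// but present ones are always appended in {len, bias, mask} order.
struct call_args
{
  std::optional<int> len, bias, mask;
};

static std::vector<std::string>
add_len_bias_and_mask_args (const call_args &c)
{
  std::vector<std::string> ops;
  if (c.len)
    ops.push_back ("len=" + std::to_string (*c.len));
  if (c.bias)
    ops.push_back ("bias=" + std::to_string (*c.bias));
  if (c.mask)
    ops.push_back ("mask=" + std::to_string (*c.mask));
  return ops;
}
```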

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations.  Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
a[i * step + base] = b[i * step + base];
}
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)
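A scalar reference model of the gather semantics (an illustration of the documented behaviour, not the optab implementation): at most len + bias leading lanes are active, within those only mask-set lanes load, and inactive lanes are left undefined (zeroed here for determinism):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of len_mask_gather_load: result lane i is loaded from
// mem[offsets[i]] iff i < len + bias and mask[i] is set.
static std::vector<int32_t>
len_mask_gather_load (const std::vector<int32_t> &mem,
                      const std::vector<size_t> &offsets,
                      size_t len, size_t bias,
                      const std::vector<bool> &mask)
{
  std::vector<int32_t> result (offsets.size (), 0);  // "undefined" lanes
  size_t limit = std::min (offsets.size (), len + bias);
  for (size_t i = 0; i < limit; ++i)
    if (mask[i])
      result[i] = mem[offsets[i]];
  return result;
}
```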

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch applying these patterns in the vectorizer soon after this
patch is approved.

Ok for trunk?

gcc/ChangeLog:

* doc/md.texi: Add len_mask_gather_load/len_mask_scatter_store.
* internal-fn.cc (add_len_bias_and_mask_args): New function.
(expand_scatter_store_optab_fn): Add 
len_mask_gather_load/len_mask_scatter_store.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_len_index): New function.
(internal_fn_bias_index): New function.
(internal_fn_stored_value_index): Add 
len_mask_gather_load/len_mask_scatter_store.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* internal-fn.h (internal_fn_len_index): New function.
(internal_fn_bias_index): Ditto.
* optabs.def (OPTAB_CD): Add 
len_mask_gather_load/len_mask_scatter_store.

---
 gcc/doc/md.texi |  17 +++
 gcc/internal-fn.cc  | 108 ++--
 gcc/internal-fn.def |   8 +++-
 gcc/internal-fn.h   |   2 +
 gcc/optabs.def  |   2 +
 5 files changed, 120 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cefdee84821..b44d1ba3af9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a bias operand (operand 6) as well as a mask operand (operand 7).  Similar to 
len_maskload,
+the instruction loads at most (operand 5 + operand 6) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an ext

[PATCH] libstdc++: use __is_enum built-in trait

2023-07-02 Thread Ken Matsui via Gcc-patches
This patch replaces is_enum::value with __is_enum built-in trait in
the type_traits header.
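The substitution can be checked in isolation: __is_enum is a compiler builtin (supported by GCC and Clang) that gives the same answer as the std::is_enum trait without instantiating a class template, which is the compile-time saving this patch is after:

```cpp
#include <type_traits>

enum class color { red };
struct not_an_enum { };

// The builtin and the library trait must agree.
static_assert (__is_enum (color), "enum detected by builtin");
static_assert (!__is_enum (not_an_enum), "non-enum rejected by builtin");
static_assert (__is_enum (color) == std::is_enum<color>::value,
               "builtin matches std::is_enum");
```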

libstdc++-v3/ChangeLog:

* include/std/type_traits (__make_unsigned_selector): Use
__is_enum built-in trait.
(__make_signed_selector): Likewise.
(__underlying_type_impl): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..9f086992ebc 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1740,7 +1740,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Select between integral and enum: not possible to be both.
   template<typename _Tp,
	   bool _IsInt = is_integral<_Tp>::value,
-	   bool _IsEnum = is_enum<_Tp>::value>
+	   bool _IsEnum = __is_enum(_Tp)>
 class __make_unsigned_selector;
 
   template
@@ -1900,7 +1900,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Select between integral and enum: not possible to be both.
   template<typename _Tp,
	   bool _IsInt = is_integral<_Tp>::value,
-	   bool _IsEnum = is_enum<_Tp>::value>
+	   bool _IsEnum = __is_enum(_Tp)>
 class __make_signed_selector;
 
   template
@@ -2353,7 +2353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __common_type_fold<_CTp, _Rp, void>
 { };
 
-  template<typename _Tp, bool = is_enum<_Tp>::value>
+  template<typename _Tp, bool = __is_enum(_Tp)>
 struct __underlying_type_impl
 {
   using type = __underlying_type(_Tp);
-- 
2.41.0



[committed] d: Fix testcase failure of gdc.dg/Wbuiltin_declaration_mismatch2.d.

2023-07-02 Thread Iain Buclaw via Gcc-patches
Hi,

Seen at least on aarch64-*-darwin, the parameters used to instantiate
the shufflevector intrinsic meant the return type was __vector(int[1]),
which resulted in the error:

vector type '__vector(int[1])' is not supported on this platform.

All instantiations have now been fixed so the expected warning/error is
now given by the compiler.

Regression tested on x86_64-linux-gnu/-m32, committed to mainline, and
backported to releases/gcc-13.

Regards,
Iain.

---
gcc/testsuite/ChangeLog:

* gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests.
---
 .../gdc.dg/Wbuiltin_declaration_mismatch2.d   | 44 +--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d 
b/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d
index 7b83fffae58..0d12bcb8b07 100644
--- a/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d
+++ b/gcc/testsuite/gdc.dg/Wbuiltin_declaration_mismatch2.d
@@ -77,32 +77,32 @@ void test_shuffle()
 
 void test_shufflevector()
 {
-shufflevector!(int, int4, int)(0, 0, 0); // { dg-warning "mismatch in 
argument 1" }
-shufflevector!(double, int4, int)(0, 0, 0); // { dg-warning "mismatch in 
argument 1" }
-shufflevector!(fake4, int4, int)(f, 0, 0); // { dg-warning "mismatch in 
argument 1" }
-
-shufflevector!(int4, int, int)(0, 0, 0); // { dg-warning "mismatch in 
argument 2" }
-shufflevector!(int4, double, int)(0, 0, 0); // { dg-warning "mismatch in 
argument 2" }
-shufflevector!(int4, int4, int)(0, 0, 0);
-shufflevector!(int4, short8, int)(0, 0, 0); // { dg-error "mismatch in 
argument 2" }
-shufflevector!(int4, float4, int)(0, 0, 0); // { dg-error "mismatch in 
argument 2" }
-shufflevector!(int4, byte16, int)(0, 0, 0); // { dg-error "mismatch in 
argument 2" }
-shufflevector!(int4, fake4, int)(0, f, 0); // { dg-warning "mismatch in 
argument 2" }
-
-shufflevector!(int4, int4, double)(0, 0, 0); // { dg-warning "mismatch in 
argument 3" }
-shufflevector!(int4, int4, int4)(0, 0, 0); // { dg-warning "mismatch in 
argument 3" }
-shufflevector!(int4, int4, short8)(0, 0, 0); // { dg-warning "mismatch in 
argument 3" }
-shufflevector!(int4, int4, float4)(0, 0, 0); // { dg-warning "mismatch in 
argument 3" }
-shufflevector!(int4, int4, byte16)(0, 0, 0); // { dg-warning "mismatch in 
argument 3" }
-
-shufflevector!(int4, int4, int, double)(0, 0, 0, 0); // { dg-warning 
"mismatch in argument 4" }
+shufflevector!(int, int4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 1" }
+shufflevector!(double, int4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 1" }
+shufflevector!(fake4, int4, int, int, int, int)(f, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 1" }
+
+shufflevector!(int4, int, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 2" }
+shufflevector!(int4, double, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 2" }
+shufflevector!(int4, int4, int, int, int, int)(0, 0, 0, 0, 0, 0);
+shufflevector!(int4, short8, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-error "mismatch in argument 2" }
+shufflevector!(int4, float4, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-error "mismatch in argument 2" }
+shufflevector!(int4, byte16, int, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-error "mismatch in argument 2" }
+shufflevector!(int4, fake4, int, int, int, int)(0, f, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 2" }
+
+shufflevector!(int4, int4, double, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 3" }
+shufflevector!(int4, int4, int4, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 3" }
+shufflevector!(int4, int4, short8, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 3" }
+shufflevector!(int4, int4, float4, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 3" }
+shufflevector!(int4, int4, byte16, int, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 3" }
+
+shufflevector!(int4, int4, int, double, int, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 4" }
 shufflevector!(int4, int4, int, int, double, int)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 5" }
 shufflevector!(int4, int4, int, int, int, double)(0, 0, 0, 0, 0, 0); // { 
dg-warning "mismatch in argument 6" }
 
 int i;
-shufflevector!(int4, int4, int)(0, 0, i); // { dg-error "argument .i. 
cannot be read at compile time" }
-shufflevector!(int4, int4, int)(0, 0, -1u); // { dg-error "element index 
.-1. is out of bounds" }
-shufflevector!(int4, int4, int)(0, 0, 8); // { dg-error "element index .8. 
is out of bounds" }
+shufflevector!(int4, int4, int, int, int, int)(0, 0, i, 0, 0, 0); // { 
dg-error "argument .i. cannot be read at 

[PATCH] Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]

2023-07-02 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch fixes a long-standing issue with the
order of evaluation of procedure argument expressions and
deallocation of allocatable actual arguments passed to
allocatable dummies with intent(out) attribute.

It is based on an initial patch by Steve, handles issues
pointed out by Tobias, and includes a suggestion by Tobias
to scan the procedure arguments first to decide whether the
creation of temporaries is needed.
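The hazard being fixed can be illustrated with a C++ analogy (not Fortran semantics, and not gfortran code): if a callee conceptually deallocates its INTENT(OUT) argument on entry, a sibling argument expression that reads the same object is only safe if it was evaluated before that deallocation, which is exactly what the scan-then-pre-evaluate scheme guarantees:

```cpp
#include <cassert>
#include <memory>

// Analogy for ALLOCATABLE, INTENT(OUT): the callee "deallocates" x on
// entry, so a sibling argument derived from *x must already have been
// evaluated by the caller.
static int
call_with_intent_out (std::unique_ptr<int> &x, int derived)
{
  x.reset ();       // deallocation on procedure entry
  return derived;   // safe: evaluated before the call
}
```

In C++ the language already forces arguments to be evaluated before the call; the patch makes gfortran do the analogous thing by deferring the deallocations (collected in dealloc_blk) until after all argument expressions have been evaluated.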

There is one unresolved issue left that might be more
general: it appears to affect character arguments (only)
in that quite often there still is no temporary generated.
I haven't found the reason why and would like to defer this,
unless someone has a good suggestion.

Regtested on x86_64-pc-linux-gnu. OK for mainline?

Thanks,
Harald

From 609ba636927811cddc74fb815cb18809c7d33565 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 2 Jul 2023 22:14:19 +0200
Subject: [PATCH] Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT)
 arguments [PR92178]

gcc/fortran/ChangeLog:

	PR fortran/92178
	* trans-expr.cc (gfc_conv_procedure_call): Check procedures for
	allocatable dummy arguments with INTENT(OUT) and move deallocation
	of actual arguments after evaluation of argument expressions before
	the procedure is executed.

gcc/testsuite/ChangeLog:

	PR fortran/92178
	* gfortran.dg/pr92178.f90: New test.
	* gfortran.dg/pr92178_2.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/trans-expr.cc   | 52 ++--
 gcc/testsuite/gfortran.dg/pr92178.f90   | 83 +
 gcc/testsuite/gfortran.dg/pr92178_2.f90 | 46 ++
 3 files changed, 177 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr92178.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr92178_2.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 30946ba3f63..16e8f037cfc 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6085,9 +6085,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   else
 info = NULL;

-  stmtblock_t post, clobbers;
+  stmtblock_t post, clobbers, dealloc_blk;
   gfc_init_block (&post);
   gfc_init_block (&clobbers);
+  gfc_init_block (&dealloc_blk);
   gfc_init_interface_mapping (&mapping);
   if (!comp)
 {
@@ -6117,6 +6118,33 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	   && UNLIMITED_POLY (sym)
 	   && comp && (strcmp ("_copy", comp->name) == 0);

+  /* First scan argument list for allocatable actual arguments passed to
+ allocatable dummy arguments with INTENT(OUT).  As the corresponding
+ actual arguments are deallocated before execution of the procedure, we
+ evaluate actual argument expressions to avoid problems with possible
+ dependencies.  */
+  bool force_eval_args = false;
+  gfc_formal_arglist *tmp_formal;
+  for (arg = args, tmp_formal = formal; arg != NULL;
+   arg = arg->next, tmp_formal = tmp_formal ? tmp_formal->next : NULL)
+{
+  e = arg->expr;
+  fsym = tmp_formal ? tmp_formal->sym : NULL;
+  if (e && fsym
+	  && e->expr_type == EXPR_VARIABLE
+	  && fsym->attr.intent == INTENT_OUT
+	  && (fsym->ts.type == BT_CLASS && fsym->attr.class_ok
+	  ? CLASS_DATA (fsym)->attr.allocatable
+	  : fsym->attr.allocatable)
+	  && e->symtree
+	  && e->symtree->n.sym
+	  && gfc_variable_attr (e, NULL).allocatable)
+	{
+	  force_eval_args = true;
+	  break;
+	}
+}
+
   /* Evaluate the arguments.  */
   for (arg = args, argc = 0; arg != NULL;
arg = arg->next, formal = formal ? formal->next : NULL, ++argc)
@@ -6680,7 +6708,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		  else
 			tmp = gfc_finish_block (&block);

-		  gfc_add_expr_to_block (&se->pre, tmp);
+		  gfc_add_expr_to_block (&dealloc_blk, tmp);
 		}

 		  /* A class array element needs converting back to be a
@@ -6980,7 +7008,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	build_empty_stmt (input_location));
 		  }
 		if (tmp != NULL_TREE)
-		  gfc_add_expr_to_block (&se->pre, tmp);
+		  gfc_add_expr_to_block (&dealloc_blk, tmp);
 		  }

 		  tmp = parmse.expr;
@@ -7004,7 +7032,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  void_type_node,
  gfc_conv_expr_present (e->symtree->n.sym),
    tmp, build_empty_stmt (input_location));
-		  gfc_add_expr_to_block (&se->pre, tmp);
+		  gfc_add_expr_to_block (&dealloc_blk, tmp);
 		}
 	}
 	}
@@ -7101,6 +7129,21 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	}
 	}

+  /* If any actual argument of the procedure is allocatable and passed
+	 to an allocatable dummy with INTENT(OUT), we conservatively
+	 evaluate all actual argument expressions before deallocations are
+	 performed and the procedure is executed.  This ensures we conform
+	 to F2023:15.5.3, 15.5.4.  Create temporaries except for constants,
+	 variables, and functions returnin

[PATCH v7 2/4] p1689r5: initial support

2023-07-02 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdeps-target=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.
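For reference, a scan result in this format looks roughly like the following (field names per P1689R5; the file and module names are invented for illustration):

```json
{
  "version": 1,
  "revision": 0,
  "rules": [
    {
      "primary-output": "use.o",
      "provides": [
        { "logical-name": "mymod", "is-interface": true }
      ],
      "requires": [
        { "logical-name": "othermod" }
      ]
    }
  ]
}
```

Each rule ties the scanned TU's object file to the module names it provides and requires, which is what lets the build system order the `.gcm` compilations.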

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_fdeps_format enum.
(cpp_options): Add fdeps_format field
(cpp_finish): Add structured dependency fdeps_stream parameter.
* include/mkdeps.h (deps_add_module_target): Add flag for
whether a module is exported or not.
(fdeps_add_target): Add function.
(deps_write_p1689r5): Add function.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdeps-file=, and
-fdeps-target= flags.
* gcc.cc: Add defaults for -fdeps-target= and -fdeps-file= when
only -fdeps-format= is specified.
* json.h: Add a TODO item to refactor out to share with
`libcpp/mkdeps.cc`.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdeps-file=, and -fdeps-target= parsing.
* c.opt: Add -fdeps-format=, -fdeps-file=, and -fdeps-target=
flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.ddi: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.ddi: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.ddi: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.ddi: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.ddi: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validati

[PATCH v7 4/4] c++modules: report module mapper files as a dependency

2023-07-02 Thread Ben Boeckel via Gcc-patches
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

gcc/testsuite/:

* g++.dg/modules/depreport-2.modmap: New test.
* g++.dg/modules/depreport-2_a.C: New test.
* g++.dg/modules/depreport-2_b.C: New test.
* g++.dg/modules/test-depfile.py: Support `:|` syntax output
when generating modules.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc   |  5 +
 gcc/cp/mapper-client.h|  1 +
 gcc/cp/module.cc  | 18 -
 .../g++.dg/modules/depreport-2.modmap |  2 ++
 gcc/testsuite/g++.dg/modules/depreport-2_a.C  | 15 ++
 gcc/testsuite/g++.dg/modules/depreport-2_b.C  | 14 +
 gcc/testsuite/g++.dg/modules/test-depfile.py  | 20 +++
 7 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..92727195246 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   if (deps)
+ deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f3acc4e02fe..77c9edcbc04 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14033,7 +14033,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14041,7 +14041,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps,
 static void
 name_pending_imports (cpp_reader *reader)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));

[PATCH v7 1/4] spec: add a spec function to join arguments

2023-07-02 Thread Ben Boeckel via Gcc-patches
When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.
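A minimal Python model of that behaviour may help (the real implementation is the C `join_spec_func` added to gcc.cc below; the argument values here are illustrative):

```python
def join_spec_func(args):
    # Mirror the new spec function: concatenate every argument with no
    # separator, so a name and an extension split apart by spec-level
    # argument splitting become one flag value again.
    result = ""
    for arg in args:
        result += arg
    return result

# After splitting, the basename and the intended extension arrive as
# separate arguments; joining yields the single intended value.
assert join_spec_func(["foo", ".ddi"]) == "foo.ddi"
```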

gcc/:

* gcc.cc (join_spec_func): Add a spec function to join all
arguments.

Signed-off-by: Ben Boeckel 
---
 gcc/gcc.cc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..44433b80d61 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] =
   { "debug-level-gt",  debug_level_greater_than_spec_func },
   { "dwarf-version-gt",dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file", find_fortran_preinclude_file},
+  { "join",join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,19 @@ find_fortran_preinclude_file (int argc, const char **argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.  */
+
+static const char *
+join_spec_func (int argc, const char **argv)
+{
+  char *result = NULL;
+
+  for (int i = 0; i < argc; ++i)
+result = reconcat(result, result ? result : "", argv[i], NULL);
+
+  return result;
+}
+
 /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string
so as to precede every one of them with a backslash.  Return the
original string or the reallocated one.  */
-- 
2.40.1



[PATCH v7 0/4] P1689R5 support

2023-07-02 Thread Ben Boeckel via Gcc-patches
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel and needs to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
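To make the ordering constraint concrete, here is a small Python sketch over a P1689-style structure (the output and module names are made up for illustration; the field names follow the paper's rules/provides/requires/logical-name scheme):

```python
import json

# Hypothetical dependency info for two TUs: one provides a module,
# the other imports it.
ddi = {
    "version": 1,
    "revision": 0,
    "rules": [
        {"primary-output": "use.o",
         "provides": [],
         "requires": [{"logical-name": "some_module"}]},
        {"primary-output": "some_module.o",
         "provides": [{"logical-name": "some_module", "is-interface": True}],
         "requires": []},
    ],
}

def must_precede(ddi, a, b):
    """True if the TU producing `a` provides a module that the TU
    producing `b` requires, i.e. `a` must be compiled first."""
    provided = {p["logical-name"]
                for r in ddi["rules"] if r["primary-output"] == a
                for p in r["provides"]}
    required = {q["logical-name"]
                for r in ddi["rules"] if r["primary-output"] == b
                for q in r["requires"]}
    return bool(provided & required)

assert must_precede(ddi, "some_module.o", "use.o")
```

A build tool consuming the `.ddi` files (e.g. via `json.load`) can use exactly this relation to schedule module interface compilations ahead of their importers.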

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<header>;` is to search for an appropriate `.gcm` file, which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang has taken an alternative approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v6 -> v7:

- rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad
  discarded if result is unused, 2023-07-02))
- add test cases for patches 3 and 4 (new dependency reporting in `-MF`)
- add a Python script to test aspects of generated dependency files
- a new `join` spec function to support `-fdeps-*` defaults based on the
  `-o` flag (needed to strip the leading space that appears otherwise)
- note that JSON writing support should be factored out for use by
  `libcpp` and `gcc` (libiberty?)
- use `.ddi` for the extension of `-fdeps-*` output files by default
- support defaults for `-fdeps-file=` and `-fdeps-target=` when only
  `-fdeps-format=` is provided (with tests)
- error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`)
  file as their formats are incompatible
- expand the documentation on how the `-fdeps-*` flags should be used

v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize of `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  spec: add a spec function to join arguments
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   5 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|  19 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depfl

[PATCH v7 3/4] c++modules: report imported CMI files as dependencies

2023-07-02 Thread Ben Boeckel via Gcc-patches
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

gcc/testsuite/

* g++.dg/modules/depreport-1_a.C: New test.
* g++.dg/modules/depreport-1_b.C: New test.
* g++.dg/modules/test-depfile.py: New tool for validating depfile
information.
* lib/modules.exp: Support for validating depfile contents.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc |   3 +
 gcc/testsuite/g++.dg/modules/depreport-1_a.C |  10 +
 gcc/testsuite/g++.dg/modules/depreport-1_b.C |  12 ++
 gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++
 gcc/testsuite/lib/modules.exp|  29 +++
 5 files changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9df60d695b1..f3acc4e02fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
new file mode 100644
index 000..241701728a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export class Base
+{
+public:
+  int m;
+};
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
new file mode 100644
index 000..b6e317c6703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -MD }
+// { dg-additional-options "-MF depreport-1.d" }
+
+import Foo;
+
+void foo ()
+{
+  Base b;
+}
+
+// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } }
diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py
new file mode 100644
index 000..ea4edb61434
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-depfile.py
@@ -0,0 +1,187 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+
+
+def _report_error(msg):
+'''Report an error.'''
+full_msg = 'ERROR: ' + msg
+if ALL_ERRORS:
+print(full_msg)
+else:
+raise RuntimeError(full_msg)
+
+
+class Token(object):
+pass
+
+
+class Output(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Input(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Colon(Token):
+pass
+
+
+class Append(Token):
+pass
+
+
+class Variable(Token):
+def __init__(self, name):
+self.name = name
+
+
+class Word(Token):
+def __init__(self, name):
+self.name = name
+
+
+def validate_depfile(depfile, expect_input=None):
+'''Validate a depfile contains some information
+
+Returns `False` if the information is not found.
+'''
+with open(depfile, 'r') as fin:
+depfile_content = fin.read()
+
+real_lines = []
+join_line = False
+for line in depfile_content.split('\n'):
+# Join the line if needed.
+if join_line:
+line = real_lines.pop() + line
+
+# Detect line continuations.
+join_line = line.endswith('\\')
+# Strip line continuation characters.
+if join_line:
+line = line[:-1]
+
+# Add to the real line set.
+real_lines.append(line)
+
+# Perform tokenization.
+tokenized_lines = []
+for line in real_lines:
+tokenized = []
+join_word = False
+for word in line.split(' '):
+if join_word:
+word = tokenized.pop() + ' ' + word
+
+# Detect word joins.
+join_word = word.endswith('\\')
+# Strip escape character.
+if join_word:
+word = word[:-1]
+
+# Detect `:` at the end of a word.
+if word.endswith(':'):
+tokenized.append(word[:-1])
+word = word[-1]
+
+# Add word to the tokenized set.
+tokenized.append(word)
+
+tokenized_lines.append(tokenized)
+
+# Parse.
+ast = []
+for line in tokenized_lines:
+kind = None
+for token in line:
+if token == ':':
+kind = 'dependency'
+el

Re: [PATCH] tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]

2023-07-02 Thread Richard Biener via Gcc-patches



> On 02.07.2023 at 16:54, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> The match_uaddc_usubc matching doesn't require that the second
> .{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is
> at most one.  So, in the weird case where the REALPART_EXPR of it isn't
> present, we shouldn't ICE trying to replace that REALPART_EXPR with
> REALPART_EXPR of .U{ADD,SUB}C result.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2023-07-02  Jakub Jelinek  
> 
>PR tree-optimization/110508
>* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with
>REALPART_EXPR of nlhs if re2 is non-NULL.
> 
>* gcc.dg/pr110508.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj	2023-06-20 11:22:26.726887276 +0200
> +++ gcc/tree-ssa-math-opts.cc	2023-07-01 00:29:48.554230914 +0200
> @@ -4856,11 +4856,14 @@ match_uaddc_usubc (gimple_stmt_iterator
>   gsi_remove (&gsi2, true);
>   /* Replace the re2 statement with __real__ of the newly added
>  .UADDC/.USUBC call.  */
> -  gsi2 = gsi_for_stmt (re2);
> -  tree rlhs = gimple_assign_lhs (re2);
> -  g = gimple_build_assign (rlhs, REALPART_EXPR,
> -   build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> -  gsi_replace (&gsi2, g, true);
> +  if (re2)
> +{
> +  gsi2 = gsi_for_stmt (re2);
> +  tree rlhs = gimple_assign_lhs (re2);
> +  g = gimple_build_assign (rlhs, REALPART_EXPR,
> +   build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> +  gsi_replace (&gsi2, g, true);
> +}
>   if (rhs[2])
> {
>   /* If this is the arg1 + arg2 + (ovf1 + ovf2) or
> --- gcc/testsuite/gcc.dg/pr110508.c.jj	2023-07-01 00:33:12.494405901 +0200
> +++ gcc/testsuite/gcc.dg/pr110508.c	2023-07-01 00:32:24.115075870 +0200
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/110508 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void
> +foo (unsigned a, unsigned b, unsigned *c, _Bool d)
> +{
> +  __builtin_addc (a, b, d, c);
> +}
> 
>Jakub
> 


[PATCH] tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508]

2023-07-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The match_uaddc_usubc matching doesn't require that the second
.{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is
at most one.  So, in the weird case where the REALPART_EXPR of it isn't
present, we shouldn't ICE trying to replace that REALPART_EXPR with
REALPART_EXPR of .U{ADD,SUB}C result.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-07-02  Jakub Jelinek  

PR tree-optimization/110508
* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with
REALPART_EXPR of nlhs if re2 is non-NULL.

* gcc.dg/pr110508.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj	2023-06-20 11:22:26.726887276 +0200
+++ gcc/tree-ssa-math-opts.cc   2023-07-01 00:29:48.554230914 +0200
@@ -4856,11 +4856,14 @@ match_uaddc_usubc (gimple_stmt_iterator
   gsi_remove (&gsi2, true);
   /* Replace the re2 statement with __real__ of the newly added
  .UADDC/.USUBC call.  */
-  gsi2 = gsi_for_stmt (re2);
-  tree rlhs = gimple_assign_lhs (re2);
-  g = gimple_build_assign (rlhs, REALPART_EXPR,
-  build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
-  gsi_replace (&gsi2, g, true);
+  if (re2)
+{
+  gsi2 = gsi_for_stmt (re2);
+  tree rlhs = gimple_assign_lhs (re2);
+  g = gimple_build_assign (rlhs, REALPART_EXPR,
+  build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
+  gsi_replace (&gsi2, g, true);
+}
   if (rhs[2])
 {
   /* If this is the arg1 + arg2 + (ovf1 + ovf2) or
--- gcc/testsuite/gcc.dg/pr110508.c.jj  2023-07-01 00:33:12.494405901 +0200
+++ gcc/testsuite/gcc.dg/pr110508.c 2023-07-01 00:32:24.115075870 +0200
@@ -0,0 +1,9 @@
+/* PR tree-optimization/110508 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+foo (unsigned a, unsigned b, unsigned *c, _Bool d)
+{
+  __builtin_addc (a, b, d, c);
+}

Jakub



Re: [PATCH 1/2] xtensa: Fix missing mode warning in "*eqne_INT_MIN"

2023-07-02 Thread Max Filippov via Gcc-patches
On Sat, Jul 1, 2023 at 10:21 AM Takayuki 'January June' Suwa wrote:
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md (*eqne_INT_MIN):
> Add missing ":SI" to the match_operator.
> ---
>  gcc/config/xtensa/xtensa.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 2/2] xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS

2023-07-02 Thread Max Filippov via Gcc-patches
On Sat, Jul 1, 2023 at 10:21 AM Takayuki 'January June' Suwa wrote:
>
> Because both smin and smax requiring TARGET_MINMAX are essential to the
> RTL representation.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p):
> Simplify.
> * config/xtensa/xtensa.md (*xtensa_clamps):
> Add TARGET_MINMAX to the condition.
> ---
>  gcc/config/xtensa/xtensa.cc | 7 ++-
>  gcc/config/xtensa/xtensa.md | 4 ++--
>  2 files changed, 4 insertions(+), 7 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


[pushed] Darwin, Objective-C: Support -fconstant-cfstrings [PR108743].

2023-07-02 Thread Iain Sandoe via Gcc-patches
Tested on x86_64-darwin and with make pdf and an examination of the revised
output.  Pushed to trunk, thanks
Iain

--- 8< ---

This supports the -fconstant-cfstrings option as used by clang (and
expected by some build scripts) as an alias to the target-specific
-mconstant-cfstrings.

The documentation is also updated to reflect that the 'f' option is
only available on Darwin, and to add the 'm' option to the Darwin
section of the invocation text.

Signed-off-by: Iain Sandoe 

PR target/108743

gcc/ChangeLog:

* config/darwin.opt: Add fconstant-cfstrings alias to
mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect
that the fconstant-cfstrings is a target-option alias and to
add the missing mconstant-cfstrings option description to the
Darwin section.
---
 gcc/config/darwin.opt |  4 
 gcc/doc/invoke.texi   | 27 ---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/gcc/config/darwin.opt b/gcc/config/darwin.opt
index feaa95867b5..d655aaef2fb 100644
--- a/gcc/config/darwin.opt
+++ b/gcc/config/darwin.opt
@@ -33,6 +33,10 @@ fapple-kext
 Target C++ Var(flag_apple_kext)
 Generate code for darwin loadable kernel extensions.
 
+fconstant-cfstrings
+ObjC ObjC++ Alias(mconstant-cfstrings)
+Generate compile-time CFString objects.
+
 iframework
 Target RejectNegative C ObjC C++ ObjC++ Joined Separate
 -iframework <dir>	Add <dir> to the end of the system framework include path.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index efcf3bfb3d6..26bcbe26c6c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4830,10 +4830,12 @@ and Objective-C++ programs:
 Use @var{class-name} as the name of the class to instantiate for each
 literal string specified with the syntax @code{@@"@dots{}"}.  The default
 class name is @code{NXConstantString} if the GNU runtime is being used, and
-@code{NSConstantString} if the NeXT runtime is being used (see below).  The
-@option{-fconstant-cfstrings} option, if also present, overrides the
-@option{-fconstant-string-class} setting and cause @code{@@"@dots{}"} literals
-to be laid out as constant CoreFoundation strings.
+@code{NSConstantString} if the NeXT runtime is being used (see below).  On
+Darwin (macOS, MacOS X) platforms, the @option{-fconstant-cfstrings} option, if
+also present, overrides the @option{-fconstant-string-class} setting and cause
+@code{@@"@dots{}"} literals to be laid out as constant CoreFoundation strings.
+Note that @option{-fconstant-cfstrings} is an alias for the target-specific
+@option{-mconstant-cfstrings} equivalent.
 
 @opindex fgnu-runtime
 @item -fgnu-runtime
@@ -24118,10 +24120,21 @@ This is by default ON@.
 @item -gfull
 Emit debugging information for all symbols and types.
 
+@opindex fconstant-cfstrings
+@item -fconstant-cfstrings
+The @option{-fconstant-cfstrings} is an alias for @option{-mconstant-cfstrings}.
+
+@opindex mconstant-cfstrings
+@item -mconstant-cfstrings
+When the NeXT runtime is being used (the default on these systems), override
+any @option{-fconstant-string-class} setting and cause @code{@@"@dots{}"}
+literals to be laid out as constant CoreFoundation strings.
+
+@opindex mmacosx-version-min
 @item -mmacosx-version-min=@var{version}
-The earliest version of MacOS X that this executable will run on
-is @var{version}.  Typical values of @var{version} include @code{10.1},
-@code{10.2}, and @code{10.3.9}.
+The earliest version of MacOS X that this executable will run on is
+@var{version}.  Typical values supported for @var{version} include @code{12},
+@code{10.12}, and @code{10.5.8}.
 
 If the compiler was built to use the system's headers by default,
 then the default for this option is the system version on which the
-- 
2.39.2 (Apple Git-143)



[committed] d: Add testcase from PR108962

2023-07-02 Thread Iain Buclaw via Gcc-patches
Hi,

This adds testcase from PR108962 into the gdc testsuite.

The issue was fixed in r14-2232 and backported to gcc-13.

Regtested, committed to mainline and gcc-13 branches.

Regards,
Iain.

---
PR d/108962

gcc/testsuite/ChangeLog:

* gdc.dg/pr108962.d: New test.
---
 gcc/testsuite/gdc.dg/pr108962.d | 13 +
 1 file changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/pr108962.d

diff --git a/gcc/testsuite/gdc.dg/pr108962.d b/gcc/testsuite/gdc.dg/pr108962.d
new file mode 100644
index 000..0fefa126b54
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr108962.d
@@ -0,0 +1,13 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108962
+// { dg-do compile }
+// { dg-options "-fno-exceptions -fdump-tree-original" }
+extern(C) void main()
+{
+final switch (0)
+{
+case 1:
+return;
+}
+}
+// { dg-final { scan-tree-dump-times "_d_assert_msg" 1 "original" } }
+// { dg-final { scan-tree-dump-not "_d_throw" "original" } }
-- 
2.39.2



Re: [PATCH] libphobos: Handle Darwin Arm and AArch64 in fibre context asm.

2023-07-02 Thread Iain Buclaw via Gcc-patches
Excerpts from Iain Sandoe's message of July 2, 2023 12:22 pm:
> Tested on AArch64 (Arm64) Darwin on 11.x, 13.x and master,
> OK for trunk?
> and backports?
> thanks
> Iain
> 
> --- 8< ---
> 
> This code currently fails to build because it contains ELF-
> specific directives.  This patch excludes those directives when
> the platform is Darwin.
> 
> We do not expect switching fibres between threads to be safe here
> either owing to the possible caching of TLS pointers.
> 
> Signed-off-by: Iain Sandoe 
> 

OK.

Thanks!
Iain.


[PATCH] libphobos: Handle Darwin Arm and AArch64 in fibre context asm.

2023-07-02 Thread Iain Sandoe via Gcc-patches
Tested on AArch64 (Arm64) Darwin on 11.x, 13.x and master,
OK for trunk?
and backports?
thanks
Iain

--- 8< ---

This code currently fails to build because it contains ELF-
specific directives.  This patch excludes those directives when
the platform is Darwin.

We do not expect switching fibres between threads to be safe here
either owing to the possible caching of TLS pointers.

Signed-off-by: Iain Sandoe 

libphobos/ChangeLog:

* libdruntime/config/aarch64/switchcontext.S: Exclude ELF-
specific constructs for Darwin.
* libdruntime/config/arm/switchcontext.S: Likewise.
* libdruntime/core/thread/fiber.d: Disable switching fibres
between threads.
---
 libphobos/libdruntime/config/aarch64/switchcontext.S | 9 -
 libphobos/libdruntime/config/arm/switchcontext.S | 8 
 libphobos/libdruntime/core/thread/fiber.d| 1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/libphobos/libdruntime/config/aarch64/switchcontext.S b/libphobos/libdruntime/config/aarch64/switchcontext.S
index 5cfa2f698e8..d3bd646bc56 100644
--- a/libphobos/libdruntime/config/aarch64/switchcontext.S
+++ b/libphobos/libdruntime/config/aarch64/switchcontext.S
@@ -44,7 +44,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  */
 .text
 .global CSYM(fiber_switchContext)
+#ifndef __APPLE__
 .type CSYM(fiber_switchContext), %function
+#endif
 .align 4
 CSYM(fiber_switchContext):
 .cfi_startproc
@@ -77,8 +79,9 @@ CSYM(fiber_switchContext):
 ldp d15, d14, [sp], #20*8
 ret
 .cfi_endproc
+#ifndef __APPLE__
 .size CSYM(fiber_switchContext),.-CSYM(fiber_switchContext)
-
+#endif
 /**
  * When generating any kind of backtrace (gdb, exception handling) for
  * a function called in a Fiber, we need to tell the unwinder to stop
@@ -93,11 +96,15 @@ CSYM(fiber_switchContext):
 .text
 .global CSYM(fiber_trampoline)
 .p2align  2
+#ifndef __APPLE__
 .type CSYM(fiber_trampoline), %function
+#endif
 CSYM(fiber_trampoline):
 .cfi_startproc
 .cfi_undefined x30
 // fiber_entryPoint never returns
 bl CSYM(fiber_entryPoint)
 .cfi_endproc
+#ifndef __APPLE__
 .size CSYM(fiber_trampoline),.-CSYM(fiber_trampoline)
+#endif
diff --git a/libphobos/libdruntime/config/arm/switchcontext.S b/libphobos/libdruntime/config/arm/switchcontext.S
index 3f9b35e7334..f1f2060fd97 100644
--- a/libphobos/libdruntime/config/arm/switchcontext.S
+++ b/libphobos/libdruntime/config/arm/switchcontext.S
@@ -60,11 +60,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  */
 
 .text
+#ifndef __APPLE__
 #if defined(__ARM_PCS_VFP) || (defined(__ARM_PCS) && !defined(__SOFTFP__))
 .fpu vfp
 #endif
 .global CSYM(fiber_switchContext)
 .type CSYM(fiber_switchContext), %function
+#else
+.global CSYM(fiber_switchContext)
+#endif
 .align 4
 CSYM(fiber_switchContext):
 .cfi_sections .debug_frame
@@ -111,8 +115,12 @@ CSYM(fiber_switchContext):
 mov lr, #0
 // return by writing lr into pc
 mov pc, r1
+#ifndef __APPLE__
 .fnend
 .cfi_endproc
 .size CSYM(fiber_switchContext),.-CSYM(fiber_switchContext)
+#else
+.cfi_endproc
+#endif
 
 #endif
diff --git a/libphobos/libdruntime/core/thread/fiber.d b/libphobos/libdruntime/core/thread/fiber.d
index 4590ff1c052..66fb9dad89d 100644
--- a/libphobos/libdruntime/core/thread/fiber.d
+++ b/libphobos/libdruntime/core/thread/fiber.d
@@ -1785,6 +1785,7 @@ version (OSX)
 {
 version (X86)version = UnsafeFiberMigration;
 version (X86_64) version = UnsafeFiberMigration;
+version (AArch64) version = UnsafeFiberMigration;
 }
 
 version (UnsafeFiberMigration)
-- 
2.39.2 (Apple Git-143)



Re: [PATCH V5] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-07-02 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richi and Richard.
>
> This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
> handle flow control by mask and loop control by length on gather/scatter
> memory operations.  Consider the following case:
>
> #include <stdint.h>
> void
> f (uint8_t *restrict a,
>    uint8_t *restrict b, int n,
>    int base, int step,
>    int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       if (cond[i])
>         a[i * step + base] = b[i * step + base];
>     }
> }
>
> We hope RVV can vectorize such case into following IR:
>
> loop_len = SELECT_VL
> control_mask = comparison
> v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
> LEN_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)
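As a cross-check of the quoted semantics, here is a scalar Python model of the two new operations (purely illustrative: the bias is folded into the effective length, and undefined result lanes are shown as None):

```python
def len_mask_gather_load(memory, base, offsets, mask, length):
    """Load element i from memory[base + offsets[i]] when i < length
    and mask[i] is set; other result elements are undefined (None)."""
    return [memory[base + offsets[i]] if i < length and mask[i] else None
            for i in range(len(offsets))]

def len_mask_scatter_store(memory, base, offsets, values, mask, length):
    """Store values[i] to memory[base + offsets[i]] when i < length
    and mask[i] is set; masked-off or beyond-length lanes are skipped."""
    for i in range(len(offsets)):
        if i < length and mask[i]:
            memory[base + offsets[i]] = values[i]

mem = [0] * 8
# Lane 1 is masked off, lane 3 is beyond the length of 3.
len_mask_scatter_store(mem, 0, [0, 2, 4, 6], [1, 2, 3, 4],
                       [True, False, True, True], 3)
assert mem == [1, 0, 0, 0, 3, 0, 0, 0]
assert len_mask_gather_load(mem, 0, [0, 4], [True, True], 2) == [1, 3]
```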
>
> This patch doesn't apply such patterns into vectorizer, just add patterns
> and update the documents.
>
> Will send patch which apply such patterns into vectorizer soon after this
> patch is approved.
>
> Thanks.
>
> ---
>  gcc/doc/md.texi | 17 
>  gcc/internal-fn.cc  | 67 +++--
>  gcc/internal-fn.def |  8 --
>  gcc/internal-fn.h   |  1 +
>  gcc/optabs.def  |  2 ++
>  5 files changed, 90 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9648fdc846a..df41b5251d4 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>  
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a mask operand (operand 6) as well as a bias operand (operand 7).  Similar to len_maskload,
> +the instruction loads at most (operand 5 + operand 7) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with i > (operand 5) are ignored.

Nit: second i should be @var{i} too.

> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>  
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
> +a mask operand (operand 6) as well as a bias operand (operand 7).  The instruction stores
> +at most (operand 5 + operand 7) elements of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with i > (operand 5) are ignored.

Same here.

> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 9017176dc7a..da3827481e9 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>rtx rhs_rtx = expand_normal (rhs);
>  
> -  class expand_operand ops[6];
> +  class expand_operand ops[8];
>int i = 0;
>create_address_operand (&ops[i++], base_rtx);
>create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
> @@ -3546,9 +3546,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
>if (mask_index >= 0)
>  {
> +  if (optab == len_mask_scatter_store_optab)
> + {
> +   tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
> +   rtx len_rtx = expand_normal (len);
> +   create_convert_operand_from (&ops[i++], len_rtx,
> +TYPE_MODE (TREE_TYPE (len)),
> +TYPE_UNSIGNED (TREE_TYPE (len)));
> + }
>tree mask = gimple_call_arg (stmt, mask_index);
>rtx mask_rtx = expand_normal (mask);
>create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> +  if (optab == len_mask_scatter_store_optab)
> + {
> +   tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1);
> +   rtx bias = expand_normal (biast);
> +   create_input_operand (&ops[i++], bias, QI

Re: [PATCH 0/9] vect: Move costing next to the transform for vect load

2023-07-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin  wrote:
>>
>> This patch series follows Richi's suggestion at the link [1],
>> which suggests structuring vectorizable_load so that costing
>> sits next to the transform, in order to make it easier to keep
>> costing and the transform in sync.

FTR, I was keeping quiet given that this was following an agreed plan :)

Thanks for organising the series this way.  It made it easier to review.

>> For now, it's a known
>> issue that what we cost can be inconsistent with what we
>> transform, as the case in PR82255 and some other associated
>> test cases in the patches of this series show.
>>
>> Basically this patch series makes costing not call function
>> vect_model_load_cost any more.  To make the review and
>> bisection easy, I organized the changes according to the
>> memory access types of vector load.  For each memory access
>> type, it first follows the handling in the function
>> vect_model_load_cost to avoid missing anything, then refines
>> further by referring to the transform code.  I also checked
>> the changes with some typical test cases to verify them.  Hope the
>> subjects of the patches are clear enough.
>>
>> The whole series can be bootstrapped and regtested
>> incrementally on:
>>   - x86_64-redhat-linux
>>   - aarch64-linux-gnu
>>   - powerpc64-linux-gnu P7, P8 and P9
>>   - powerpc64le-linux-gnu P8, P9 and P10
>>
>> Since the current vector test buckets are mainly
>> run without a cost model, I also verified that the whole patch
>> series is neutral for SPEC2017 int/fp on Power9 at O2,
>> O3 and Ofast separately.
>
> I went through the series now and I like it overall (well, I suggested
> the change).
> Looking at the changes I think we want some followup to reduce the
> mess in the final loop nest.  We already have some VMAT_* cases handled
> separately, maybe we can split out some more cases.  Maybe we should
> bite the bullet and duplicate that loop nest for the different VMAT_* cases.
> Maybe we can merge some of the if (!costing_p) checks by clever
> re-ordering.  What this series doesn't improve is the overall readability
> of the code (indentation and our 80-char line limit).
>
> The change also makes it more difficult(?) to separate analysis and transform,
> though in the end I hope that analysis will actually "code generate" to a (SLP)
> data structure so the target will have a chance to see the actual flow of insns.
>
> That said, I'd like to hear from Richard whether he thinks this is a step
> in the right direction.

Yeah, agree that it's probably better on balance.  It's going to need a
bit of discipline to make sure that we don't accidentally change the IR
during the analysis phase, but I guess that already exists to a lesser
extent with the “before !vec_stmt”/“after !vec_stmt” split.

Thanks,
Richard


Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-07-02 Thread Richard Sandiford via Gcc-patches
Kewen Lin  writes:
> @@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo,
>unsigned int n_groups = 0;
>for (j = 0; j < ncopies; j++)
>   {
> -   if (nloads > 1)
> +   if (nloads > 1 && !costing_p)
>   vec_alloc (v, nloads);
> gimple *new_stmt = NULL;
> for (i = 0; i < nloads; i++)
>   {
> +   if (costing_p)
> + {
> +   if (VECTOR_TYPE_P (ltype))
> + vect_get_load_cost (vinfo, stmt_info, 1,
> + alignment_support_scheme, misalignment,
> + false, &inside_cost, nullptr, cost_vec,
> + cost_vec, true);
> +   else
> + inside_cost += record_stmt_cost (cost_vec, 1, scalar_load,
> +  stmt_info, 0, vect_body);
> +   continue;
> + }

Just a note that this might make life harder for AArch64 costing.
Strided SLP loads are still equivalent to vector loads for AArch64,
since they happen on the FPR/vector side even if they have integral
modes.

But I agree this is more accurate from a general target-independent POV,
especially given the relatively coarse-grain costing enum.  So I think
that's just something AArch64 will need to account for.

Thanks,
Richard