[Patch, fortran] PR112316 - [13 Regression] Fix for PR87477 rejects valid code with a bogus error...

2023-11-02 Thread Paul Richard Thomas
Hi All,

I have pushed as 'obvious' a fix for this regression to both 13-branch and
mainline. The patch itself looks substantial but it consists entirely of
the removal of a condition and re-indentation of the corresponding block.
Please see below for part of my first comment on the PR for an explanation.

Paul

A temporary workaround is to invert the order of the contained procedures.

The problem is caused by a stupid (on my part :-( ) oversight:
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index e103ebee557..f88f9be3be8 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -5196,7 +5196,7 @@ parse_associate (void)
}
}

-  if (target->rank)
+  if (1)
{
  int rank = 0;
  rank = target->rank;

fixes the problem and regtests OK.


[tree-optimization/111721 V2] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-02 Thread Juzhe-Zhong
This patch fixes the following FAILs for RVV:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"

Bootstrap and regtest on x86 passed.

OK for trunk?

PR tree-optimization/111721

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for dummy 
mask -1.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

---
 gcc/tree-vect-slp.cc   | 5 ++---
 gcc/tree-vect-stmts.cc | 5 +++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 43d742e3c92..6b8a7b628b6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -759,9 +759,8 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char 
swap,
  if ((dt == vect_constant_def
   || dt == vect_external_def)
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
- && (TREE_CODE (type) == BOOLEAN_TYPE
- || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
- type)))
+ && TREE_CODE (type) != BOOLEAN_TYPE
+ && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6ce4868d3e1..8c92bd5d931 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9825,6 +9825,7 @@ vectorizable_load (vec_info *vinfo,
 
   tree mask = NULL_TREE, mask_vectype = NULL_TREE;
   int mask_index = -1;
+  slp_tree slp_op = NULL;
   if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
 {
   scalar_dest = gimple_assign_lhs (assign);
@@ -9861,7 +9862,7 @@ vectorizable_load (vec_info *vinfo,
mask_index = vect_slp_child_index_for_operand (call, mask_index);
   if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &slp_op, &mask_dt, &mask_vectype))
return false;
 }
 
@@ -10046,7 +10047,7 @@ vectorizable_load (vec_info *vinfo,
 {
   if (slp_node
  && mask
- && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
+ && !vect_maybe_update_slp_op_vectype (slp_op,
mask_vectype))
{
  if (dump_enabled_p ())
-- 
2.36.3
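
The tree-vect-slp.cc hunk in this patch changes when a constant or external SLP operand is rejected for variable-length vectors. A minimal sketch of just the boolean logic follows. The function and parameter names are illustrative stand-ins, not GCC API, and the sketch assumes the operand is already known to be a constant or external definition.

```c
#include <stdbool.h>

/* Simplified model of the rejection test in vect_get_and_check_slp_defs.
   The parameters are hypothetical stand-ins for the real queries:
     variable_length ~ !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
     is_boolean      ~ TREE_CODE (type) == BOOLEAN_TYPE
     can_dup         ~ can_duplicate_and_interleave_p (...)  */

/* Condition before the patch: boolean operands are always rejected
   when the vector length is not a compile-time constant.  */
bool rejected_before(bool variable_length, bool is_boolean, bool can_dup)
{
    return variable_length && (is_boolean || !can_dup);
}

/* Condition after the patch: boolean operands (such as the dummy
   mask -1) are never rejected by this test.  */
bool rejected_after(bool variable_length, bool is_boolean, bool can_dup)
{
    return variable_length && !is_boolean && !can_dup;
}
```

In this model the two conditions differ only for boolean operands on variable-length targets; non-boolean operands are treated the same before and after.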



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Martin Uecker
Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> > Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > > > 
> > > > Thanks a lot for raising these issues.
> > > > 
> > > > If I understand correctly,  the major question we need to answer is:
> > > > 
> > > > For the following example: (Jakub mentioned this  in an early message)
> > > > 
> > > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > >   2 struct S s;
> > > >   3 s.a = 5;
> > > >   4 char *p = &s.b[2];
> > > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > > >   6 s.a = 3;
> > > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > > 
> > > > Should the 2nd __bdos call (line 7) get
> > > > A. the latest value of s.a (line 6) for its size?
> > > > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > > > 
> > > I personally think it should be (A). The user is specifically
> > > indicating that the size has somehow changed, and the compiler should
> > > behave accordingly.
> > 
> > 
> > One potential problem for A apart from the potential impact on
> > optimization is that the information may get lost more
> > easily. Consider:
> > 
> > char *p = &s.b[2];
> > f(&s);
> > int i = __bdos(p, 0);
> > 
> > If the compiler can not see into 'f', the information is lost
> > because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>    or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>    POV we can describe it with more detail that it doesn't modify anything
>    in the pointed structure, just escapes the pointer; __bdos can stay
>    leaf I believe; and when expanding __bdos later on, it would just
>    dereference the associated pointer at that point (note, __bdos is
>    pure, so it has vuse but not vdef and can load from memory); if
>    f changes s.a, no problem, __bdos will load the changed value in there

Ah, right. Because of the reload it doesn't matter.
Thank you for the explanation!

Martin

> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>    point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>    __bdos later will use s.a value from the &s.b[2] spot



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> > On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > > 
> > > Thanks a lot for raising these issues.
> > > 
> > > If I understand correctly,  the major question we need to answer is:
> > > 
> > > For the following example: (Jakub mentioned this  in an early message)
> > > 
> > >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> > >   2 struct S s;
> > >   3 s.a = 5;
> > >   4 char *p = &s.b[2];
> > >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> > >   6 s.a = 3;
> > >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > > 
> > > Should the 2nd __bdos call (line 7) get
> > > A. the latest value of s.a (line 6) for its size?
> > > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > > 
> > I personally think it should be (A). The user is specifically
> > indicating that the size has somehow changed, and the compiler should
> > behave accordingly.
> 
> 
> One potential problem for A apart from the potential impact on
> optimization is that the information may get lost more
> easily. Consider:
> 
> char *p = &s.b[2];
> f(&s);
> int i = __bdos(p, 0);
> 
> If the compiler can not see into 'f', the information is lost
> because f may have changed the size.

Why?  It doesn't really matter.  The options are
A. p is at &s.b[2] associated with &s.a and int type (or size of int
   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
   POV we can describe it with more detail that it doesn't modify anything
   in the pointed structure, just escapes the pointer; __bdos can stay
   leaf I believe; and when expanding __bdos later on, it would just
   dereference the associated pointer at that point (note, __bdos is
   pure, so it has vuse but not vdef and can load from memory); if
   f changes s.a, no problem, __bdos will load the changed value in there
B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
   __bdos later will use s.a value from the &s.b[2] spot

Jakub
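
The difference between options A and B can be modelled in plain C without the attribute or the builtin. This is only a sketch under stated assumptions: the struct names, `bdos_a`, `bdos_b` and the fixed-size array are hypothetical, and "object size" is simplified to the number of counted elements remaining after the pointer.

```c
#include <stddef.h>

/* Fixed-size stand-in for the flexible array member in the example.  */
struct S { int a; char b[8]; };

/* Option A: remember where the count lives and reload it on each query.  */
struct access_by_ref { const char *ptr; const char *base; const int *count; };

/* Option B: snapshot the count when the pointer is formed.  */
struct access_by_val { const char *ptr; const char *base; int count; };

/* Elements left between ptr and the end of the counted array.  */
static long remaining(const char *ptr, const char *base, int count)
{
    long used = (long) (ptr - base);
    return count > used ? count - used : 0;
}

long bdos_a(const struct access_by_ref *acc)
{
    return remaining(acc->ptr, acc->base, *acc->count);
}

long bdos_b(const struct access_by_val *acc)
{
    return remaining(acc->ptr, acc->base, acc->count);
}

/* Plays the role of the opaque call f(&s) that changes the count.  */
void f(struct S *s)
{
    s->a = 3;
}
```

With `s.a = 5` and a pointer to `s.b[2]`, both queries report 3 elements. After `f(&s)` sets the count to 3, the by-reference query reports 1 while the by-value query still reports 3.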



[PATCH] g++: Add require-effective-target to multi-input file testcase pr95401.cc

2023-11-02 Thread Patrick O'Neill
On non-vector targets dejagnu attempts dg-do compile for pr95401.cc.
This produces a command like this:
g++ pr95401.cc pr95401a.cc -S -o pr95401.s

which isn't valid (gcc does not accept multiple input files when using
-S with -o).

This patch adds require-effective-target vect_int to avoid the case
where the testcase is invoked with dg-do compile.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr95401.cc: Add require-effective-target vect_int.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc & rv64gcv to make sure the testcase runs/doesn't
compile as expected.

Somewhat related/similar problem (running vector test on non-vector
target):
https://inbox.sourceware.org/gcc-patches/20231102190911.66763-1-patr...@rivosinc.com/T/#u
https://inbox.sourceware.org/gcc-patches/20231102234527.77231-1-patr...@rivosinc.com/T/#u

Ideally we would have a way to ban a dejagnu dg-do or an equivalent to
check_vect_support_and_set_flags that returns true/false based on
if it emits run/compile as the default dg-do command.
require-effective-target seems to be a reasonable alternative.
pr95401.cc and pr95401a.cc only use int variables and arrays.
---
 gcc/testsuite/g++.dg/vect/pr95401.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/vect/pr95401.cc 
b/gcc/testsuite/g++.dg/vect/pr95401.cc
index 6a56dab0957..6a1b65ff0e7 100644
--- a/gcc/testsuite/g++.dg/vect/pr95401.cc
+++ b/gcc/testsuite/g++.dg/vect/pr95401.cc
@@ -1,5 +1,6 @@
 // { dg-additional-options "-mavx2 -O3" { target avx2_runtime } }
 // { dg-additional-sources pr95401a.cc }
+// { dg-require-effective-target vect_int }
 
 extern int var_9;
 extern unsigned var_14;
-- 
2.34.1



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Martin Uecker
Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> > 
> > Thanks a lot for raising these issues.
> > 
> > If I understand correctly,  the major question we need to answer is:
> > 
> > For the following example: (Jakub mentioned this  in an early message)
> > 
> >   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
> >   2 struct S s;
> >   3 s.a = 5;
> >   4 char *p = &s.b[2];
> >   5 int i1 = __builtin_dynamic_object_size (p, 0);
> >   6 s.a = 3;
> >   7 int i2 = __builtin_dynamic_object_size (p, 0);
> > 
> > Should the 2nd __bdos call (line 7) get
> > A. the latest value of s.a (line 6) for its size?
> > Or  B. the value when the s.b was referenced (line 3, line 4)?
> > 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.


One potential problem for A apart from the potential impact on
optimization is that the information may get lost more
easily. Consider:

char *p = &s.b[2];
f(&s);
int i = __bdos(p, 0);

If the compiler can not see into 'f', the information is lost
because f may have changed the size.

And if I understand it correctly, if the pointer escapes
with .ACCESS_WITH_SIZE, then this is already true for:

char *p = &s.b[2];
g();
int i = __bdos(p, 0);


If we make it UB to change the size, then I guess we could
also delay this choice.  Or we implement B but have a UBSan
option based on A that only verifies at run-time that the size 
did not change.
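
A rough sketch of that last idea, assuming a hypothetical record that keeps both the snapshot (option B) and the address of the count field (used only for the run-time check). None of these names exist in GCC or UBSan; they are for illustration only.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one counted access.  */
struct counted_access {
    int snapshot;      /* count captured when the pointer was formed */
    const int *live;   /* address of the count field, for checking only */
};

/* Returns the snapshot (option B semantics) and reports through
   *changed whether the live count differs from it, which is what a
   sanitizer-style check would diagnose.  */
int checked_count(const struct counted_access *acc, bool *changed)
{
    *changed = (*acc->live != acc->snapshot);
    return acc->snapshot;
}
```

The size used for bounds decisions never changes in this model; the live value only feeds the diagnostic.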


Martin


> 
> > A should be more convenient for the user to use the dynamic array feature.
> > With B, the user has to modify the source code (to add code to “re-obtain”
> > the pointer after the size was adjusted at line 6) as mentioned by Richard.
> > 
> > This depends on how we design the new internal function .ACCESS_WITH_SIZE
> > 
> > 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
> > 
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > 
> > 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
> > 
> > PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
> > 
> > With 1, We can only provide B, the user needs to modify the source code to 
> > get the full feature of dynamic array;
> > With 2, We can provide  A, the user will get full support to the dynamic 
> > array without restrictions in the source code.
> > 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
> > However, We have to pay additional cost for supporting A by using 2, which 
> > includes:
> > 
> > 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
> > the IPA optimizations, more runtime overhead.
> > Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be 
> > PURE?
> > 
> > 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
> > impact some IPA optimizations, more runtime overhead.
> > 
> > I think the following are the factors that make the decision:
> > 
> > 1. How big the performance impact?
> > 2. How important the dynamic array feature? Is adding some user 
> > restrictions as Richard mentioned feasible to support this feature?
> > 
> > Maybe we can implement 1 first, if the full support to the dynamic array is 
> > needed, we can add 2 then?
> > Or, we can implement both, and compare the performance difference, then 
> > decide?
> > 
> > Qing
> > 



Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-02 Thread waffl3x
> > That leaves 2, 4, and 5.
> > 
> > 2. I am pretty sure xobj functions should have the struct they are a
> > part of recorded as the method basetype member. I have already checked
> > that function_type and method_type are the same node type under the
> > hood and it does appear to be, so it should be trivial to set it.
> > However I do have to wonder why static member functions don't set it,
> > it seems to me that it would be convenient to use the same field. Can
> > you provide any insight into that?
> 
> 
> method basetype is only for METHOD_TYPE; if you want the containing type
> of the function, that's DECL_CONTEXT.

-- gcc/tree.h:530
#define FUNC_OR_METHOD_CHECK(T) TREE_CHECK2 (T, FUNCTION_TYPE, METHOD_TYPE)
-- gcc/tree.h:2518
#define TYPE_METHOD_BASETYPE(NODE)  \
  (FUNC_OR_METHOD_CHECK (NODE)->type_non_common.maxval)

The code doesn't seem to reflect that, perhaps since the underlying
node type is the same (as far as I can tell, they both inherit from
tree_type_non_common) it wasn't believed to be necessary.
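
To illustrate the observation, here is a toy model of a two-code checked accessor. It is not GCC code: the enum, struct and function names are made up, and the check is modelled with `assert`, whereas the real TREE_CHECK2 reports an internal compiler error in checking builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node codes; the real ones live in GCC's tree.def.  */
enum node_code { CODE_FUNCTION_TYPE, CODE_METHOD_TYPE, CODE_RECORD_TYPE };

struct node {
    enum node_code code;
    struct node *maxval;   /* slot reused to hold the method basetype */
};

/* Models FUNC_OR_METHOD_CHECK: both codes pass the check.  */
bool func_or_method_check(const struct node *t)
{
    return t->code == CODE_FUNCTION_TYPE || t->code == CODE_METHOD_TYPE;
}

/* Models TYPE_METHOD_BASETYPE: usable on either code, even though only
   a method type is expected to carry a meaningful value.  */
struct node *method_basetype(struct node *t)
{
    assert(func_or_method_check(t));
    return t->maxval;
}
```

In this model the accessor is legal on a function type and simply yields whatever the slot holds, which matches the NULL basetype seen for xobj and static member functions.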

Upon looking at DECL_CONTEXT though I see it seems you were thinking of
FUNCTION_DECL. I haven't observed DECL_CONTEXT being set for
FUNCTION_DECL nodes though, perhaps it should be? Although it's more
likely that it is being set and I just haven't noticed, but if that's
the case I'll have to make a note to make sure it is being set for xobj
member functions.

I was going to say that this seems like a redundancy, but I realized
that the type of a member function pointer is tied to the struct, so it
actually ends up relevant for METHOD_TYPE nodes. I would hazard a guess
that when forming member function pointers the FUNCTION_DECL node was
not as easily accessed. If I remember correctly that is not the case
right now so it might be worthwhile to refactor away from
TYPE_METHOD_BASETYPE and replace uses of it with DECL_CONTEXT.

I'm getting ahead of myself though, I'll stop here and avoid going on
too much of a refactoring tangent. I do want this patch to make it into
GCC14 after all.

> > 4. I have no comment here, but it does concern me since I don't
> > understand it at all.
> 
> 
> In the list near the top of cp-tree.h, DECL_LANG_FLAG_6 for a
> FUNCTION_DECL is documented to be DECL_THIS_STATIC, which should only be
> set on the static member.

Right, I'll try to remember to check this area in the future, but yeah
that tracks because I did remove that flag. Removing that flag just so
happened to be the start of this saga of bug fixes but alas, it had to
be done.

> > 5. I am pretty sure this is fine for now, but if xobj member functions
> > ever were to support virtual/override capabilities, then it would be a
> > problem. Is my understanding correct, or is there some other reason
> > that iobj member functions have a different value here?
> 
> 
> It is surprising that an iobj memfn would have a different DECL_ALIGN,
> but it shouldn't be a problem; the vtables only rely on alignment being
> at least 2.

I'll put a note for myself to look into it in the future, it's an
oddity at minimum and oddities interest me :^).

> > There are also some differences in the arg param in
> > cp_build_addr_expr_1 that concern me, but most of them are reflected
> > in the differences I have already noted. I had wanted to include these
> > differences as well but I have been spending too much time staring at
> > it, it's no longer productive. In short, the indirect_ref node for xobj
> > member functions has reference_to_this set, while iobj member functions
> > do not.
> 
> 
> That's a result of difference 3.

Okay, makes sense, I'm mildly concerned about any possible side effects
this might have since we have a function_type node suddenly going
through execution paths that only method_type went through before. The
whole "reference_to_this" "pointer_to_this" thing is a little confusing
because I'm pretty sure that doesn't refer to the actual `this` object
argument or parameter since I've seen it all over the place. Hopefully
it's benign.

> > The baselink binfo field has the private flag set for xobj
> > member functions, iobj member functions do not.
> 
> 
> TREE_PRIVATE on a binfo is part of BINFO_ACCESS, which isn't a fixed
> value, but gets updated during member search. Perhaps the differences
> in consideration of conversion to a base led to it being set or cleared
> differently? I wouldn't worry too much about it unless you see
> differences in access control.

Unfortunately I don't have any tests for private/public access yet,
it's one of the areas I've neglected. Unfortunately I probably won't
put too much effort into writing TOO many more right now as it takes up
a lot of my time. I still have to clean up the ones I currently have
and I have a few I wanted to write that are not yet written.

> > I've spent too much time on this write up, so I am calling it here, it
> > wasn't all a waste of time because half of what I was doing here are
> > things I was going to need to do any

Re: [PATCH] recog/reload: Remove old UNARY_P operand support

2023-11-02 Thread Hans-Peter Nilsson
> From: Richard Sandiford 
> Date: Tue, 24 Oct 2023 11:14:20 +0100

> reload and constrain_operands had some old code to look through unary
> operators.  E.g. an operand could be (sign_extend (reg X)), and the
> constraints would match the reg rather than the sign_extend.
> 
> This was previously used by the MIPS port.  But relying on it was a
> recurring source of problems, so Eric and I removed it in the MIPS
> rewrite from ~20 years back.  I don't know of any other port that used it.

The SH did.  I remember this being one of the ugliest warts
of reload.  IIRC, there was a bit of a discourse involving
me and Joern way-back (also IIRC some 20 years ago, at least
before IRA and LRA).  The conclusion was that removing this
misfeature would be ok, as already at that time, there was
no de-facto beneficial effect for sh, likely due to code
rot.  However, no action was taken; no code changed.

Thanks for removing the last(?) bits!

brgds, H-P



Re: [PATCH v2] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-03 11:26
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refactor prefix [I/L/LL] rounding API autovec 
iterator

[PATCH v2] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread pan2 . li
From: Pan Li 

Update in v2:

* Add mode size equal check to disable different mode size when expand,
  because the underlying codegen is not implemented yet.

Original log:

The previous rounding APIs prefixed with i/l/ll only work when the source
and destination modes have the same size, as in the examples below, and
the iterator was arranged like the one for fcvt.

* SF => SI
* DF => DI

Now that this limitation has been lifted in the middle-end, these APIs can
also be vectorized across different type sizes, namely:

* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI

A single iterator cannot handle all of these simply, so this patch
splits it into two:

* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI

The related mode_attrs are updated to match the new iterators.
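
As a rough model of the v2 guard, the sketch below compares element sizes for each source/destination pair. It is only an illustration: the enum and function names are hypothetical, and comparing element sizes stands in for the `known_eq (GET_MODE_SIZE ...)` test on vector modes under the assumption that both vectors have the same number of elements.

```c
#include <stdbool.h>

/* Hypothetical element modes used only for this sketch.  */
enum fp_elem { ELEM_HF, ELEM_SF, ELEM_DF };
enum int_elem { ELEM_SI, ELEM_DI };

static int fp_bytes(enum fp_elem m)
{
    return m == ELEM_HF ? 2 : m == ELEM_SF ? 4 : 8;
}

static int int_bytes(enum int_elem m)
{
    return m == ELEM_SI ? 4 : 8;
}

/* Models the v2 expander condition: a conversion is expanded only
   when the source and destination sizes are equal.  */
bool expands_in_v2(enum fp_elem src, enum int_elem dst)
{
    return fp_bytes(src) == int_bytes(dst);
}
```

Under this model only SF => SI and DF => DI are expanded for now; the other four pairs match the new iterators but stay disabled until the codegen is implemented.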

gcc/ChangeLog:

* config/riscv/autovec.md (lrint2): Remove.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
(lrint2): New pattern for cvt from
FP to SI.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
(lrint2): New pattern for cvt from
FP to DI.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md  |  90 +---
 gcc/config/riscv/vector-iterators.md | 199 ---
 2 files changed, 251 insertions(+), 38 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f5e3e347ace..cc4c9596bbf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2395,42 +2395,92 @@ (define_expand "roundeven2"
   }
 )
 
-(define_expand "lrint2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+;; Add mode_size equal check as we opened the modes for different sizes.
+;; The check will be removed soon after related codegen implemented
+(define_expand "lrint2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
   {
-riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lround2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+(define_expand "lrint2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
   {
-riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lceil2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+(define_expand "lround2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
   {
-riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lfloor2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+(define_expand "lround2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
+  {
+riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lceil2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
+  {
+riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lceil2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
+&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-02 Thread Jason Merrill

On 10/28/23 00:07, waffl3x wrote:

I wanted to change DECL_NONSTATIC_MEMBER_FUNCTION_P to include explicit
object member functions, but it had some problems when I made the
modification. I also noticed that it's used in cp-objcp-common.cc so
would making changes to it be a bad idea?

-- cp-tree.h
```
/* Nonzero for FUNCTION_DECL means that this decl is a non-static
member function.  */
#define DECL_NONSTATIC_MEMBER_FUNCTION_P(NODE) \
   (TREE_CODE (TREE_TYPE (NODE)) == METHOD_TYPE)
```
I didn't want to investigate the problems as I was knee deep in
investigating the addressof bug. So I instead modified
DECL_FUNCTION_MEMBER_P to include explicit object member functions and
moved on.

-- cp-tree.h
```
/* Nonzero for FUNCTION_DECL means that this decl is a member function
(static or non-static).  */
#define DECL_FUNCTION_MEMBER_P(NODE) \
   (DECL_NONSTATIC_MEMBER_FUNCTION_P (NODE) || DECL_STATIC_FUNCTION_P (NODE) \
   || DECL_IS_XOBJ_MEMBER_FUNC (NODE))
```
I am mostly just mentioning this here in case it becomes more relevant
later. Looking at how much DECL_NONSTATIC_MEMBER_FUNCTION_P is used
throughout the code I now suspect that adding explicit object member
functions to it might cause xobj member functions to be treated as
regular member functions when they should not be.

If this change were to stick it would cause a discrepancy in the
behavior of DECL_NONSTATIC_MEMBER_FUNCTION_P and its name. If we were
to do this, I think it's important we document the discrepancy and why
it exists, and in the future, it should possibly be refactored. One
option would be to simply rename it to DECL_IOBJ_MEMBER_FUNCTION_P.
After all, I suspect that it's unlikely that the current macro
(DECL_NONSTATIC_MEMBER_FUNCTION_P) is being used in places that concern
explicit object member functions. So just adding explicit object member
functions to it will most likely just result in headaches.

It seems to me that would be the best solution, so when and if it comes
up again, I think that route should be considered.


Agreed, it sounds good to rename the current macro and then add a new 
macro that includes both implicit and explicit, assuming that's a useful 
category.
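
The categories being discussed can be sketched as a small standalone toy model. This is not written against GCC's tree API: the enum and the predicate names below are hypothetical stand-ins, and only the split into implicit object (iobj), explicit object (xobj), and static member functions is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kinds of function declaration discussed above.
   These names are illustrative only; they are not GCC identifiers.  */
enum fn_kind
{
  FN_NON_MEMBER,
  FN_STATIC_MEMBER, /* struct S { static void f(S&); };  */
  FN_IOBJ_MEMBER,   /* struct S { void f(); };           */
  FN_XOBJ_MEMBER    /* struct S { void f(this S&); };    */
};

/* What the renamed macro would answer: implicit object members only.  */
bool
is_iobj_member (enum fn_kind k)
{
  return k == FN_IOBJ_MEMBER;
}

/* The possible new category: implicit or explicit object members.  */
bool
is_object_member (enum fn_kind k)
{
  return k == FN_IOBJ_MEMBER || k == FN_XOBJ_MEMBER;
}

/* Any member function, static or not.  */
bool
is_member (enum fn_kind k)
{
  return k != FN_NON_MEMBER;
}

/* Sanity check of the intended relationships between the categories.  */
void
check_categories (void)
{
  assert (is_object_member (FN_XOBJ_MEMBER));
  assert (!is_iobj_member (FN_XOBJ_MEMBER));
  assert (is_member (FN_STATIC_MEMBER));
  assert (!is_object_member (FN_STATIC_MEMBER));
}
```

In this model an xobj function counts as an object member and as a member, but not as an iobj member, which is the distinction the rename is meant to make visible.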



Secondly, there are some differences in the nodes describing an
explicit object member function from those describing static member
functions and implicit object member functions that I am not sure
should be present.

I did my best to summarize the differences; if you want the debug_tree
logs that I derived them from, I can provide them. Most of my
understanding of the layout of the nodes is from reading print-tree.cc
and looking at debug_tree outputs, so it's possible I made a mistake.

I am opting to use the names of members as they are output by
debug_tree. I recognize this is not always the actual name of the
member in the actual tree_node structures.

Additionally, some of the differences listed are to be expected and are
most likely the correct values for each node. However, I wanted to be
exhaustive when listing them just in case I am mistaken in my opinion
on whether the differences should or should not occur.

The following declarations were used as input to the compiler.
iobj decl:
struct S { void f() {} };
xobj decl:
struct S { void f(this S&) {} };
static decl:
struct S { static void f(S&) {} };

These differences can be observed in the return values of
grokdeclarator for each declaration.

1. function_decl::type::tree_code
iobj: method_type
xobj: function_type
stat: function_type
2. function_decl::type::method basetype
iobj: 
xobj: NULL/no output
stat: NULL/no output
3. function_decl::type::arg-types::tree_list[0]::value
iobj: 
xobj: 
stat: 
4. function_decl::decl_6
iobj: false/no output
xobj: false/no output
stat: true
5. function_decl::align
iobj: 16
xobj: 8
stat: 8
6. function_decl::result::uid
iobj: D.2513
xobj: D.2513
stat: D.2512
7. function_decl::full-name
iobj: "void S::f()"
xobj: "void S::f(this S&)"

Differences 1, 3, and 7 seem obviously correct to me for all 3
declarations. 6 is a little bizarre to me, but since it's just a UID
it's merely an oddity; I doubt it is concerning.


Agreed.


That leaves 2, 4, and 5.

2. I am pretty sure xobj functions should have the struct they are a
part of recorded as the method basetype member. I have already checked
whether function_type and method_type are the same node type under the
hood, and they do appear to be, so it should be trivial to set it.
However, I do have to wonder why static member functions don't set it;
it seems to me that it would be convenient to use the same field. Can
you provide any insight into that?


method basetype is only for METHOD_TYPE; if you want the containing type 
of the function, that's DECL_CONTEXT.



4. I have no comment here, but it does concern me since I don't
understand it at all.


In the list near the top of cp-tree.h, DECL_LANG_FLAG_6 for a 
FUNCTION_DECL is documented to be DECL_THIS_STATIC, which should only be 
set on the static member.



5. I am 

libstdc++ patch RFA: Fix dl_iterate_phdr configury for libbacktrace

2023-11-02 Thread Ian Lance Taylor
The libbacktrace sources, as used by libstdc++-v3, fail to correctly
determine whether the system supports dl_iterate_phdr.  The issue is
that the libbacktrace configure assumes that _GNU_SOURCE is defined
during compilation, but the libstdc++-v3 configure does not do that.
This configury failure is the cause of PR 112263.

This patch fixes the problem.  OK for mainline?
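
As an illustration of the dependency described above, here is a minimal sketch assuming Linux with glibc, where `<link.h>` only declares `dl_iterate_phdr` when `_GNU_SOURCE` is defined before the first system header. The helper names `count_object` and `count_loaded_objects` are made up for this example and are not part of libbacktrace.

```c
/* glibc only declares dl_iterate_phdr in <link.h> when _GNU_SOURCE is
   defined before the first system header is included.  */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Callback invoked once per loaded object; DATA points to an int counter.  */
static int
count_object (struct dl_phdr_info *info, size_t size, void *data)
{
  (void) info;
  (void) size;
  ++*(int *) data;
  return 0; /* zero means "keep iterating" */
}

/* Return the number of loaded objects reported by dl_iterate_phdr.  */
int
count_loaded_objects (void)
{
  int count = 0;
  int rc = dl_iterate_phdr (count_object, &count);
  assert (rc == 0); /* our callback never stops the walk early */
  (void) rc;
  return count;
}
```

Without the `_GNU_SOURCE` definition at the top, a plain C compile of this file would not see the declaration, which is the same reason a grep of the preprocessed `link.h` can miss it.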

Ian

PR libbacktrace/112263
* acinclude.m4: Set -D_GNU_SOURCE in BACKTRACE_CPPFLAGS and when
grepping link.h for dl_iterate_phdr.
* configure: Regenerate.
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index d8f0ba1c3e2..41446c2c3d6 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5443,7 +5443,7 @@ AC_DEFUN([GLIBCXX_ENABLE_BACKTRACE], [
 
   # Most of this is adapted from libsanitizer/configure.ac
 
-  BACKTRACE_CPPFLAGS=
+  BACKTRACE_CPPFLAGS="-D_GNU_SOURCE"
 
   # libbacktrace only needs atomics for int, which we've already tested
   if test "$glibcxx_cv_atomic_int" = "yes"; then
@@ -5471,8 +5471,11 @@ AC_DEFUN([GLIBCXX_ENABLE_BACKTRACE], [
 have_dl_iterate_phdr=no
   else
 # When built as a GCC target library, we can't do a link test.
+ac_save_CPPFLAGS="$CPPFLAGS"
+CPPFLAGS="$CPPFLAGS -D_GNU_SOURCE"
 AC_EGREP_HEADER([dl_iterate_phdr], [link.h], [have_dl_iterate_phdr=yes],
[have_dl_iterate_phdr=no])
+CPPFLAGS="$ac_save_CPPFLAGS"
   fi
   if test "$have_dl_iterate_phdr" = "yes"; then
 BACKTRACE_CPPFLAGS="$BACKTRACE_CPPFLAGS -DHAVE_DL_ITERATE_PHDR=1"
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 9f12c5baa3f..693564d3c7e 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -73299,7 +73299,7 @@ fi
 
   # Most of this is adapted from libsanitizer/configure.ac
 
-  BACKTRACE_CPPFLAGS=
+  BACKTRACE_CPPFLAGS="-D_GNU_SOURCE"
 
   # libbacktrace only needs atomics for int, which we've already tested
   if test "$glibcxx_cv_atomic_int" = "yes"; then
@@ -73382,6 +73382,8 @@ done
 have_dl_iterate_phdr=no
   else
 # When built as a GCC target library, we can't do a link test.
+ac_save_CPPFLAGS="$CPPFLAGS"
+CPPFLAGS="$CPPFLAGS -D_GNU_SOURCE"
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <link.h>
@@ -73395,6 +73397,7 @@ else
 fi
 rm -f conftest*
 
+CPPFLAGS="$ac_save_CPPFLAGS"
   fi
   if test "$have_dl_iterate_phdr" = "yes"; then
 BACKTRACE_CPPFLAGS="$BACKTRACE_CPPFLAGS -DHAVE_DL_ITERATE_PHDR=1"


Re: [PATCH] c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]

2023-11-02 Thread Jason Merrill

On 11/2/23 03:53, Jakub Jelinek wrote:

On Fri, Oct 27, 2023 at 07:05:34PM -0400, Jason Merrill wrote:

--- gcc/testsuite/g++.dg/cpp26/literals1.C.jj   2023-08-25 17:23:06.662878355 +0200
+++ gcc/testsuite/g++.dg/cpp26/literals1.C  2023-08-25 17:37:03.085132304 +0200
@@ -0,0 +1,65 @@
+// C++26 P1854R4 - Making non-encodable string literals ill-formed
+// { dg-do compile { target c++11 } }
+// { dg-require-effective-target int32 }
+// { dg-options "-pedantic-errors -finput-charset=UTF-8 -fexec-charset=UTF-8" }
+
+int d = '😁';   // { dg-error "character too large for character literal type" }

...

+char16_t m = u'😁'; // { dg-error "character constant too long for its type" }


Why are these different diagnostics?  Why doesn't the first line already hit
the existing diagnostic that the second gets?

Both could be clearer that the problem is that the single source character
can't be encoded as a single execution character.
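
To make the encoding point concrete, here is a small sketch that counts how many code units one Unicode scalar value needs in UTF-8 and UTF-16. It assumes, as the test options do, that the execution character set is UTF-8; the function names are made up for this example and are not taken from libcpp.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if CP is a Unicode scalar value (in range, not a surrogate).  */
static int
is_scalar_value (uint32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Number of UTF-8 code units (bytes) needed to encode CP.  */
int
utf8_code_units (uint32_t cp)
{
  assert (is_scalar_value (cp));
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;
}

/* Number of UTF-16 code units needed to encode CP.  */
int
utf16_code_units (uint32_t cp)
{
  assert (is_scalar_value (cp));
  return cp < 0x10000 ? 1 : 2;
}
```

For U+1F601 (the emoji in the test) this gives 4 UTF-8 code units and 2 UTF-16 code units, so the character fits neither a single `char` nor a single `char16_t`, while an ASCII character such as U+0041 needs one code unit in both.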


The first diagnostic is the one newly added in the patch, which takes precedence
over the existing diagnostics (and wouldn't actually trigger without the
patch).  Sure, I could make that new diagnostic more specific, but all
I generally know is that (str2.len / nbwc) c-chars are encodable in str.len
execution character set code units.
So, would you like 2 different messages, one for str2.len / nbwc == 1
"single character not encodable in a single execution character code unit"


Sounds good, but let's drop "single".


and otherwise
"%d characters need %d execution character code units"
or
"at least one character not encodable in a single execution character code unit"
or something different?


The latter sounds good.  Maybe adding "in multicharacter literal"?


Everything else (i.e. the u8 case in narrow_str_to_charconst and the L, u and U
cases in wide_str_to_charconst) is already covered by the existing diagnostic,
which has the "character constant too long for its type"
wording and covers, for both C and C++, both the case where there is more
than one c-char in the literal (allowed in the L case for < C++23) and
the case where one c-char encodes to more than one code unit (but this time
it isn't the execution character set, but the UTF-8 character set for u8,
the wide execution character set for L, the UTF-16 character set for u and
UTF-32 for U).
Plus the same "character constant too long for its type" diagnostic
is emitted if a normal narrow literal has several c-chars, all encodable as
single execution character code units, but more than can fit into int.

So, do you want to change just the new diagnostics (and what is your
preferred wording), or use the old diagnostics wording also for the
new one, or do you want to change the preexisting diagnostics as well
and e.g. differentiate there between the single c-char cases which need
more than one code unit and different wording for more than one c-char?
Note, if we differentiate between those, we'd need to count how many
c-chars we have even for the u8, L, u and U cases if we see more than
one code unit, similarly how the patch does that (and also the follow-up
patch tweaks).


Under the existing diagnostic I'd like to distinguish the different 
cases more, e.g.


"multicharacter literal with %d characters exceeds 'int' size of %d bytes"
"multicharacter literal cannot have an encoding prefix"

Jason



Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-11-02 Thread Nathaniel Shead
Oh, this also fixes PR102284 and its other linked PRs (apart from
fields); I forgot to note that in the commit.

On Fri, Nov 03, 2023 at 12:18:29PM +1100, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86-64_pc_linux_gnu.
> 
> I'm not entirely sure if the change I made to have destructors clobber with
> CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to 
> have
> broken by doing this and I wasn't able to find anything else that really
> depended on this distinction other than a warning pass. Otherwise I could
> experiment with a new clobber kind for destructor calls.
> 
> -- >8 --
> 
> This patch adds checks for using objects after they've been manually
> destroyed via explicit destructor call. Currently this is only
> implemented for 'top-level' objects; FIELD_DECLs and individual elements
> of arrays will need a lot more work to track correctly and are left for
> a future patch.
> 
> The other limitation is that destruction of parameter objects is checked
> too 'early', happening at the end of the function call rather than the
> end of the owning full-expression as they should be for consistency;
> see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
> good way to link the constructed parameter declarations with the
> variable declarations that are actually destroyed later on to propagate
> their lifetime status, so I'm leaving this for a later patch.
> 
>   PR c++/71093
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (build_trivial_dtor_call): Mark pseudo-destructors as
>   ending lifetime.
>   * constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
>   return NULL_TREE for objects we're initializing.
>   (constexpr_global_ctx::destroy_value): Rename from remove_value.
>   Only mark real variables as outside lifetime.
>   (constexpr_global_ctx::clear_value): New function.
>   (destroy_value_checked): New function.
>   (cxx_eval_call_expression): Defer complaining about non-constant
>   arg0 for operator delete. Use remove_value_safe.
>   (cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
>   (outside_lifetime_error): Include name of object we're
>   accessing.
>   (cxx_eval_store_expression): Handle clobbers. Improve error
>   messages.
>   (cxx_eval_constant_expression): Use remove_value_safe. Clear
> bind variables before entering body.
>   * decl.cc (build_clobber_this): Mark destructors as ending
>   lifetime.
>   (start_preparsed_function): Pass false to build_clobber_this.
>   (begin_destructor_body): Pass true to build_clobber_this.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
>   * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
>   * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
>   * g++.dg/cpp2a/bitfield2.C: Likewise.
>   * g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
>   * g++.dg/cpp1y/constexpr-lifetime7.C: New test.
>   * g++.dg/cpp2a/constexpr-lifetime1.C: New test.
>   * g++.dg/cpp2a/constexpr-lifetime2.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/call.cc|   2 +-
>  gcc/cp/constexpr.cc   | 149 +++---
>  gcc/cp/decl.cc|  10 +-
>  .../g++.dg/cpp1y/constexpr-lifetime1.C|   2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime2.C|   2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime3.C|   2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime4.C|   2 +-
>  .../g++.dg/cpp1y/constexpr-lifetime7.C|  93 +++
>  gcc/testsuite/g++.dg/cpp2a/bitfield2.C|   2 +-
>  .../g++.dg/cpp2a/constexpr-lifetime1.C|  21 +++
>  .../g++.dg/cpp2a/constexpr-lifetime2.C|  23 +++
>  gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  17 +-
>  12 files changed, 292 insertions(+), 33 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime7.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime2.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 2eb54b5b6ed..e5e9c6c44f8 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -9682,7 +9682,7 @@ build_trivial_dtor_call (tree instance, bool 
> no_ptr_deref)
>  }
>  
>/* A trivial destructor should still clobber the object.  */
> -  tree clobber = build_clobber (TREE_TYPE (instance));
> +  tree clobber = build_clobber (TREE_TYPE (instance), CLOBBER_EOL);
>return build2 (MODIFY_EXPR, void_type_node,
>instance, clobber);
>  }
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index c05760e6789..4f0f590c38a 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -1193,13 +1193,20 @@ public:
>   return *p;
>  return NULL_TREE;
>}
> -  tree *get_val

[PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-11-02 Thread Nathaniel Shead
Bootstrapped and regtested on x86-64_pc_linux_gnu.

I'm not entirely sure if the change I made to have destructors clobber with
CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to have
broken by doing this and I wasn't able to find anything else that really
depended on this distinction other than a warning pass. Otherwise I could
experiment with a new clobber kind for destructor calls.

-- >8 --

This patch adds checks for using objects after they've been manually
destroyed via explicit destructor call. Currently this is only
implemented for 'top-level' objects; FIELD_DECLs and individual elements
of arrays will need a lot more work to track correctly and are left for
a future patch.

The other limitation is that destruction of parameter objects is checked
too 'early', happening at the end of the function call rather than the
end of the owning full-expression as they should be for consistency;
see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
good way to link the constructed parameter declarations with the
variable declarations that are actually destroyed later on to propagate
their lifetime status, so I'm leaving this for a later patch.

PR c++/71093

gcc/cp/ChangeLog:

* call.cc (build_trivial_dtor_call): Mark pseudo-destructors as
ending lifetime.
* constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
return NULL_TREE for objects we're initializing.
(constexpr_global_ctx::destroy_value): Rename from remove_value.
Only mark real variables as outside lifetime.
(constexpr_global_ctx::clear_value): New function.
(destroy_value_checked): New function.
(cxx_eval_call_expression): Defer complaining about non-constant
arg0 for operator delete. Use remove_value_safe.
(cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
(outside_lifetime_error): Include name of object we're
accessing.
(cxx_eval_store_expression): Handle clobbers. Improve error
messages.
(cxx_eval_constant_expression): Use remove_value_safe. Clear
bind variables before entering body.
* decl.cc (build_clobber_this): Mark destructors as ending
lifetime.
(start_preparsed_function): Pass false to build_clobber_this.
(begin_destructor_body): Pass true to build_clobber_this.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp2a/bitfield2.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
* g++.dg/cpp1y/constexpr-lifetime7.C: New test.
* g++.dg/cpp2a/constexpr-lifetime1.C: New test.
* g++.dg/cpp2a/constexpr-lifetime2.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/call.cc|   2 +-
 gcc/cp/constexpr.cc   | 149 +++---
 gcc/cp/decl.cc|  10 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime2.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime3.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime4.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime7.C|  93 +++
 gcc/testsuite/g++.dg/cpp2a/bitfield2.C|   2 +-
 .../g++.dg/cpp2a/constexpr-lifetime1.C|  21 +++
 .../g++.dg/cpp2a/constexpr-lifetime2.C|  23 +++
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  17 +-
 12 files changed, 292 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime2.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 2eb54b5b6ed..e5e9c6c44f8 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -9682,7 +9682,7 @@ build_trivial_dtor_call (tree instance, bool no_ptr_deref)
 }
 
   /* A trivial destructor should still clobber the object.  */
-  tree clobber = build_clobber (TREE_TYPE (instance));
+  tree clobber = build_clobber (TREE_TYPE (instance), CLOBBER_EOL);
   return build2 (MODIFY_EXPR, void_type_node,
 instance, clobber);
 }
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index c05760e6789..4f0f590c38a 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1193,13 +1193,20 @@ public:
return *p;
 return NULL_TREE;
   }
-  tree *get_value_ptr (tree t)
+  tree *get_value_ptr (tree t, bool initializing)
   {
 if (modifiable && !modifiable->contains (t))
   return nullptr;
 if (tree *p = values.get (t))
-  if (*p != void_node)
-   return p;
+  {
+   if (*p != void_node)
+ return p;
+   else if (initializing)
+ {

[Committed V3] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-02 Thread Juzhe-Zhong
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:

vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivli zero,4,e32,m1,ta,ma -> redundant
slli a2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

The root cause is that we simplify the AVL into an immediate AVL too early
in the FIXED-VLMAX situation. The later avlprop pass then fails to propagate the AVL
generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV instructions.

So we need to remove the immediate AVL simplification in the 'expand' stage.

After this patch:

vsetvli a5,a3,e32,m1,ta,ma
slli a2,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

With the simplification removed from 'expand', the following case needs to be fixed:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));

__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
  vnx2qi v = {a, b};
  *(vnx2qi *) out = v;
}

We should use vsetivli zero, 2 instead of vsetvl a5,zero.
This simplification is now done in the avlprop pass, which is also included in this patch
to avoid regressing such cases.
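
The arithmetic behind the immediate AVL can be sketched as follows. This is a standalone illustration, not code from riscv-avlprop.cc: the function names are hypothetical, and VLEN = 128 bits is an assumption chosen to match the "GET_MODE_NUNITS (RVVM1SImode) == 4" case mentioned in the patch. The formula VLMAX = LMUL * VLEN / SEW and the 5-bit immediate of vsetivli come from the RVV specification.

```c
#include <assert.h>
#include <stdbool.h>

/* VLMAX as defined by the RVV specification: LMUL * VLEN / SEW.
   LMUL is passed as a fraction (lmul_num / lmul_den) so that
   fractional settings such as mf4 can be expressed.  */
unsigned
rvv_vlmax (unsigned vlen_bits, unsigned sew_bits,
           unsigned lmul_num, unsigned lmul_den)
{
  assert (sew_bits != 0 && lmul_den != 0);
  return vlen_bits * lmul_num / (sew_bits * lmul_den);
}

/* vsetivli encodes its AVL as a 5-bit unsigned immediate (0..31), so a
   constant VLMAX can only be used as an immediate if it is in range.  */
bool
fits_vsetivli_imm (unsigned avl)
{
  return avl <= 31;
}
```

With VLEN = 128, both e32,m1 and e8,mf4 give a VLMAX of 4, since they share the same SEW/LMUL ratio, and 4 fits the vsetivli immediate. A larger configuration such as e8,m8 gives 128 elements, which does not fit and would still need a register AVL.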

PR target/112326

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112326.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc | 95 ++-
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 28 ++
 gcc/config/riscv/vector.md| 29 +-
 .../gcc.target/riscv/rvv/autovec/pr112326.c   | 16 
 5 files changed, 124 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112326.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index bcd77a3047a..1dfaa8742da 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -109,6 +109,48 @@ vlmax_ta_p (rtx_insn *rinsn)
   return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
 }
 
+static machine_mode
+get_insn_vtype_mode (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int mode_idx = get_attr_mode_idx (rinsn);
+  gcc_assert (mode_idx != INVALID_ATTRIBUTE);
+  return GET_MODE (recog_data.operand[mode_idx]);
+}
+
+static void
+simplify_replace_vlmax_avl (rtx_insn *rinsn, rtx new_avl)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\nPropagating AVL: ");
+  print_rtl_single (dump_file, new_avl);
+  fprintf (dump_file, "into: ");
+  print_rtl_single (dump_file, rinsn);
+}
+  /* Replace AVL operand.  */
+  extract_insn_cached (rinsn);
+  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
+  int count = count_regno_occurrences (rinsn, REGNO (avl));
+  gcc_assert (count == 1);
+  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, new_avl);
+  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+  if (vlmax_avl_type_p (rinsn))
+{
+  int index = get_attr_avl_type_idx (rinsn);
+  gcc_assert (index != INVALID_ATTRIBUTE);
+  validate_change_or_fail (rinsn, recog_data.operand_loc[index],
+  get_avl_type_rtx (avl_type::NONVLMAX), false);
+}
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Successfully to match this instruction: ");
+  print_rtl_single (dump_file, rinsn);
+}
+}
+
 const pass_data pass_data_avlprop = {
   RTL_PASS, /* type */
   "avlprop",/* name */
@@ -384,34 +426,35 @@ pass_avlprop::execute (function *fn)
   for (const auto prop : *m_avl_propagations)
 {
   rtx_insn *rinsn = prop.first->rtl ();
+  simplify_replace_vlmax_avl (rinsn, prop.second);
+}
+
+  if (riscv_autovec_preference == RVV_FIXED_VLMAX)
+{
+  /* Simplify VLMAX AVL into immediate AVL.
+E.g. Simplify this following case:
+
+ vsetvl a5, zero, e32, m1
+ vadd.vv
+
+   into:
+
+ vsetvl zero, 4, e32, m1
+ vadd.vv
+if GET_MODE_NUNITS (RVVM1SImode) == 4.

RE: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread Li, Pan2
Thanks Patrick.

It is caused by the expand modes being enabled while the underlying codegen is not yet implemented.
I will revert it first to unblock others and will fix it ASAP.

Pan

From: Patrick O'Neill 
Sent: Friday, November 3, 2023 6:57 AM
To: Li, Pan2 ; juzhe.zhong 
Cc: gcc-patches@gcc.gnu.org; Wang, Yanzhang ; 
kito.ch...@gmail.com; gnu-toolchain 
Subject: Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec 
iterator


Hi Pan,

This patch is causing new failures (ICEs) on trunk:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/110

Pre-commit CI run:
https://github.com/ewlu/gcc-precommit-ci/issues/553#issuecomment-1790688172

New rv32gcv failures:

FAIL: gcc.dg/vect/fast-math-bb-slp-call-2.c (internal compiler error: in 
expand_vec_lrint, at config/riscv/riscv-v.cc:4134)

FAIL: gcc.dg/vect/fast-math-bb-slp-call-2.c (test for excess errors)

FAIL: gcc.dg/vect/fast-math-vect-call-2.c (internal compiler error: in 
expand_vec_lrint, at config/riscv/riscv-v.cc:4134)

FAIL: gcc.dg/vect/fast-math-vect-call-2.c (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O0  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O0  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O1  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O1  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O2  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O2  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O3 -g  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O3 -g  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -Os  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -Os  (test for excess errors)

New rv64gcv failures:

FAIL: gfortran.dg/pr32533.f90   -O0  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O0  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O1  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O1  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O2  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O2  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -O3 -g  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -O3 -g  (test for excess errors)

FAIL: gfortran.dg/pr32533.f90   -Os  (internal compiler error: in 
expand_vec_lround, at config/riscv/riscv-v.cc:4144)

FAIL: gfortran.dg/pr32533.f90   -Os  (test for excess errors)

Please let me know if you need any additional information.

Thanks,
Patrick
On 11/2/23 05:13, Li, Pan2 wrote:
Committed, thanks Juzhe.

Pan

From: juzhe.zhong 
Sent: Thursday, November 2, 2023 8:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Li, Pan2; Wang, Yanzhang; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec 
iterator

lgtm
Replied Message
From: pan2...@intel.com
Date: 11/02/2023 19:48
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Bill Wendling
On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>
> Thanks a lot for raising these issues.
>
> If I understand correctly,  the major question we need to answer is:
>
> For the following example: (Jakub mentioned this  in an early message)
>
>   1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>   2 struct S s;
>   3 s.a = 5;
>   4 char *p = &s.b[2];
>   5 int i1 = __builtin_dynamic_object_size (p, 0);
>   6 s.a = 3;
>   7 int i2 = __builtin_dynamic_object_size (p, 0);
>
> Should the 2nd __bdos call (line 7) get
> A. the latest value of s.a (line 6) for its size?
> Or  B. the value when the s.b was referenced (line 3, line 4)?
>
I personally think it should be (A). The user is specifically
indicating that the size has somehow changed, and the compiler should
behave accordingly.

> A should be more convenient for the user to use the dynamic array feature.
> With B, the user has to modify the source code (to add code to “re-obtain”
> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>
> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>
> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>
> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>
> PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
>
> With 1, We can only provide B, the user needs to modify the source code to 
> get the full feature of dynamic array;
> With 2, We can provide  A, the user will get full support to the dynamic 
> array without restrictions in the source code.
>
My understanding of ACCESS_WITH_SIZE is that it's there to add an
explicit reference to SIZE so that the optimizers won't reorder the
code incorrectly. If that's the case, then it should act as if
ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
dereference into the FAM). We get that with (2) it appears. It would
be a major headache to make the user go throughout their code base to
ensure that SIZE was either unmodified, or if it was that extra code
must be added to ensure the expected behavior.
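
The difference between the two designs can be sketched with a toy model in plain C. Nothing here uses the counted_by attribute or the real .ACCESS_WITH_SIZE internal function; the struct and function names are hypothetical and only model "size captured by value" versus "size read through a reference".

```c
#include <assert.h>
#include <stddef.h>

/* Toy object standing in for "struct S" above: COUNT plays the role of
   s.a and DATA the role of the counted array s.b.  */
struct counted
{
  size_t count;
  char data[16];
};

/* Design 1: the size is copied when the access is formed.  */
struct access_by_value
{
  size_t offset;
  size_t size;
};

/* Design 2: the access remembers where the size lives.  */
struct access_by_ref
{
  size_t offset;
  const size_t *size;
};

struct access_by_value
form_by_value (const struct counted *s, size_t offset)
{
  assert (offset < sizeof s->data);
  struct access_by_value a = { offset, s->count };
  return a;
}

struct access_by_ref
form_by_ref (const struct counted *s, size_t offset)
{
  assert (offset < sizeof s->data);
  struct access_by_ref a = { offset, &s->count };
  return a;
}

/* Elements left from the access point to the end of the counted array.  */
size_t
remaining_by_value (struct access_by_value a)
{
  return a.size > a.offset ? a.size - a.offset : 0;
}

size_t
remaining_by_ref (struct access_by_ref a)
{
  return *a.size > a.offset ? *a.size - a.offset : 0;
}
```

In this model, forming both accesses at offset 2 while the count is 5 and then lowering the count to 3 leaves the by-value access still reporting 3 remaining elements (behavior B), while the by-reference access reports 1 (behavior A).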

> However, We have to pay additional cost for supporting A by using 2, which 
> includes:
>
> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
> the IPA optimizations, more runtime overhead.
> Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be PURE?
>
> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
> impact some IPA optimizations, more runtime overhead.
>
> I think the following are the factors that make the decision:
>
> 1. How big is the performance impact?
> 2. How important is the dynamic array feature? Is adding some user restrictions 
> as Richard mentioned feasible to support this feature?
>
> Maybe we can implement 1 first, if the full support to the dynamic array is 
> needed, we can add 2 then?
> Or, we can implement both, and compare the performance difference, then 
> decide?
>
> Qing
>


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Bill Wendling
On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
 wrote:
>
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> >
> >
> >
> > > On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> > >
> > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > >
> > >> 2.3 A new semantic requirement in the user documentation of "counted_by"
> > >>
> > >> For the following structure including a FAM with a counted_by attribute:
> > >>
> > >>  struct A
> > >>  {
> > >>   size_t size;
> > >>   char buf[] __attribute__((counted_by(size)));
> > >>  };
> > >>
> > >> for any object with such type:
> > >>
> > >>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > >>
> > >> The setting to the size field should be done before the first reference
> > >> to the FAM field.
> > >>
> > >> Such requirement to the user will guarantee that the first reference to
> > >> the FAM knows the size of the FAM.
> > >>
> > >> We need to add this additional requirement to the user document.
> > >
> > > Make sure the manual is very specific about exactly when size is
> > > considered to be an accurate representation of the space available for buf
> > > (given that, after malloc or realloc, it's going to be temporarily
> > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > undefined behavior, say so explicitly.
> >
> > Yes, good point. We need to define this clearly in the beginning.
> > We need to explicitly say that
> >
> > the size of the FAM is defined by the latest “counted_by” value, and it is 
> > undefined behavior if the size field has not been set when the FAM is 
> > referenced.
> >
> > Is the above good enough?
> >
> >
> > >
> > >> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > >>
> > >> In C FE:
> > >>
> > >> for every reference to a FAM, for example, "obj->buf" in the small 
> > >> example,
> > >>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >>  if YES, replace the reference to "obj->buf" with a call to
> > >>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > >
> > > This seems plausible - but you should also consider the case of static
> > > initializers - remember the GNU extension for statically allocated objects
> > > with flexible array members (unless you're not allowing it with
> > > counted_by).
> > >
> > > static struct A x = { sizeof "hello", "hello" };
> > > static char *y = &x.buf;
> > >
> > > I'd expect that to be valid - and unless you say such a usage is invalid,
> >
> > At this moment, I think that this should be valid.
> >
> > I,e, the following:
> >
> > struct A
> > {
> >  size_t size;
> >  char buf[] __attribute__((counted_by(size)));
> > };
> >
> > static struct A x = {sizeof "hello", "hello"};
> >
> > Should be valid, and x.size represents the number of elements of x.buf.
> > Both x.size and x.buf are initialized statically.
> >
> > > you should avoid the replacement in such a static initializer context when
> > > the FAM reference is to an object with a constant address (if
> > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > expression; if it works fine as a constant-address lvalue, then the
> > > replacement would be OK).
> >
> > Then if such usage for the “counted_by” is valid, we need to replace the FAM
> > reference by a call to  .ACCESS_WITH_SIZE as well.
> > Otherwise the “counted_by” relationship will be lost to the Middle end.
> >
> > With the current definition of .ACCESS_WITH_SIZE
> >
> > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >
> > Isn’t the PTR (return value of the call) a LVALUE?
>
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.  That is,
>
> size_t *s = &a.size;
> *s = 1;
>
> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
>
I believe Qing's original documentation for counted_by makes the
relationship between 'size' and the FAM very clear, and also that it'll
result in undefined behavior if the two don't agree. Though it might be
best to state that in a stronger way.

-bw
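
For reference, a minimal sketch of the user-side rule being discussed (set
the count before the first reference to the FAM).  The COUNTED_BY macro and
the alloc_a/a_store helpers are made up for illustration; the macro expands
to nothing on compilers that do not know the attribute, so this only shows
the ordering, not any checking the compiler might add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: fall back to a no-op where counted_by is not available.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY(size);
};

/* Hypothetical helper: allocate room for N chars and publish the count
   before anything refers to obj->buf.  */
static struct A *
alloc_a (size_t n)
{
  struct A *obj = malloc (sizeof (struct A) + n * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = n;             /* set the count first ...  */
  memset (obj->buf, 0, n);   /* ... then touch the FAM.  */
  return obj;
}

/* Hypothetical helper: store through the FAM only within the count.  */
static void
a_store (struct A *obj, size_t i, char c)
{
  assert (i < obj->size);
  obj->buf[i] = c;
}
```

With this ordering the first reference to buf always sees a defined size,
which is the condition the documentation would need to spell out.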


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Richard Sandiford
Robin Dapp  writes:
>> Looks reasonable overall.  The new match patterns are 1:1 the
>> same as the COND_ ones.  That's a bit awkward, but I don't see
>> a good way to "macroize" stuff further there.  Can you at least
>> interleave the COND_LEN_* ones with the other ones instead of
>> putting them all at the end?
>
> Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
> as I didn't manage with the "stripping _LEN" way on the first few tries.
> Still on the todo list but unlikely to be done before stage 1 closes.
>
> I believe Richard "kind of" LGTM'ed the rest minus the spurious
> pattern (which is gone now) but there is still the direct optab change
> that he didn't comment on so I think we should wait for his remarks
> still.

Could you explain why a special expansion is needed?  (Sorry if you already
have and I missed it, bit overloaded ATM.)  What does it do that is
different from what expand_fn_using_insn would do?

Thanks,
Richard



[PATCH] g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targets

2023-11-02 Thread Patrick O'Neill
Testcases in g++.dg/vect rely on check_vect_support_and_set_flags
to set dg-do-what-default and avoid running vector tests on non-vector
targets. The three testcases in this patch overwrite the default with
dg-do run.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr102788.cc: Remove dg-do run directive.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc & rv64gcv to make sure the testcases compile/run
as expected.

Similar to 
https://inbox.sourceware.org/gcc-patches/20231102190911.66763-1-patr...@rivosinc.com/T/#u
---
 gcc/testsuite/g++.dg/vect/pr102788.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr102788.cc 
b/gcc/testsuite/g++.dg/vect/pr102788.cc
index fa9c366fe56..032fa29fc72 100644
--- a/gcc/testsuite/g++.dg/vect/pr102788.cc
+++ b/gcc/testsuite/g++.dg/vect/pr102788.cc
@@ -1,4 +1,3 @@
-// { dg-do run }
 // { dg-additional-options "-O3" }

 unsigned long long int var_4 = 235;
--
2.34.1



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-11-02 Thread Andrew Pinski
On Wed, Sep 20, 2023 at 6:52 AM Robin Dapp  wrote:
>
> Hi,
>
> as described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> Related question/change: We only allow PLUS_EXPR in fold_left_reduction_fn
> but have code to handle MINUS_EXPR in vectorize_fold_left_reduction.  I
> suppose that's intentional but it "just works" on riscv and the testsuite
> doesn't change when allowing MINUS_EXPR so I went ahead and did that.
>
> Bootstrapped and regtested on x86 and aarch64.

This caused the gcc.target/i386/avx512f-reduce-op-1.c testcase to start
failing when tested on an x86_64 machine that has avx512f (in my case an
`Intel(R) Xeon(R) D-2166NT CPU @ 2.00GHz`).  I also reverted the commit to
double-check that it is the cause.

The difference in optimized I see is:
  if (_40 != 3.5e+1) // working
vs
  if (_40 != 6.4e+1) // not working

It is test_epi32_ps which is failing with TEST_PS macro and the plus
operand that uses TESTOP:
TESTOP (add, +, float, ps, 0.0f);   \

I have not reduced the testcase any further though.

Thanks,
Andrew Pinski
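
As background for the signed-zero part of the quoted description (the
neutral element being -0 when signed zeros are honored), here is a small
scalar model.  It is only a sketch: masked_sum is a made-up name, and it
models an in-order masked add in plain C rather than anything the
vectorizer emits.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical scalar model of a masked in-order add reduction: inactive
   lanes contribute NEUTRAL instead of a[i].  */
static double
masked_sum (double init, const double *a, const unsigned char *mask,
            size_t n, double neutral)
{
  assert (n == 0 || (a != NULL && mask != NULL));
  double sum = init;
  for (size_t i = 0; i < n; i++)
    sum += mask[i] ? a[i] : neutral;
  return sum;
}
```

In the default rounding mode x + (-0.0) is x for every x, including
x == -0.0, while (-0.0) + (+0.0) is +0.0.  So a reduction that starts at
-0.0 and has every lane masked off keeps its sign only when the inactive
lanes add -0.0.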


>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR middle-end/111401
> * internal-fn.cc (cond_fn_p): New function.
> * internal-fn.h (cond_fn_p): Define.
> * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
> if supported.
> (predicate_scalar_phi): Add whitespace.
> * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
> (neutral_op_for_reduction): Return -0 for PLUS.
> (vect_is_simple_reduction): Don't count else operand in
> COND_ADD.
> (vectorize_fold_left_reduction): Add COND_ADD handling.
> (vectorizable_reduction): Don't count else operand in COND_ADD.
> (vect_transform_reduction): Add COND_ADD handling.
> * tree-vectorizer.h (neutral_op_for_reduction): Add default
> parameter.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  gcc/internal-fn.cc|  38 +
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 ++
>  .../riscv/rvv/autovec/cond/pr111401.c |  61 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 130 
>  gcc/tree-vectorizer.h |   2 +-
>  7 files changed, 394 insertions(+), 42 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..77939890f5a 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4241,6 +4241,44 @@ first_commutative_argument (internal_fn fn)
>  }
>  }
>
> +/* Return true if this CODE describes a conditional (masked) internal_fn.  */
> +
> +bool
> +cond_fn_p (code_helper code)
> +{
> +  if (!code.is_fn_code ())
> +return false;
> +
> +  if (!internal_fn_p ((combined_fn) code))
> +return false;
> +
> +  internal_fn fn = as_internal_fn ((combined_fn) code);
> +  switch (fn)
> +{
> +#undef DEF_INTERNAL_COND_FN
> +#define DEF_INTERNAL_COND_FN(NAME, F, O, T)  \
> +case IFN_COND_##NAME:\
> +case IFN_COND_LEN_##NAME:\
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_COND_FN
> +
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +#define DEF_INTERNAL_SIGNED_COND_FN(NAME, F, S, SO, UO, T)   \
> +case IFN_COND_##NAME:\
> +case IFN_COND_LEN_##NAME:\
> +  return true;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_SIGNED_COND_FN
> +
> +default:
> +  return false;
> +}
> +
> +  return false;
> +}
> +
> +
>  /* Return true if this CODE describes an internal_fn that returns a vector 
> with
> elements twice as wide as the element size of the input vectors.  */
>
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..f1cc9db29c0 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -219,6 +219,7 @@ extern bool commutative_ternary_fn_p (internal_fn);
>  extern int first_commutative_argument (internal_fn);
>  extern bool associative_binary_fn_p (internal_fn);
>  extern bool widening_fn_p (code_helper);
> +exte
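
The cond_fn_p hunk quoted above builds its switch cases by redefining a
macro and re-including internal-fn.def.  A self-contained sketch of the
same X-macro idea follows; the COND_OPS list, the fn_code enum and
is_cond_fn are invented for the example and stand in for the real .def
file and internal_fn codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation list standing in for internal-fn.def.  */
#define COND_OPS(X) X (ADD) X (SUB) X (MUL)

enum fn_code
{
  FN_NONE,
  /* Each listed operation yields a COND_ and a COND_LEN_ variant.  */
#define X(NAME) FN_COND_##NAME, FN_COND_LEN_##NAME,
  COND_OPS (X)
#undef X
  FN_SQRT
};

/* Return true if FN is one of the generated conditional codes.  */
static bool
is_cond_fn (enum fn_code fn)
{
  switch (fn)
    {
      /* Expand the same list into case labels that share one return.  */
#define X(NAME) case FN_COND_##NAME: case FN_COND_LEN_##NAME:
      COND_OPS (X)
#undef X
      return true;
    default:
      return false;
    }
}
```

The point of the pattern is that adding an operation to the list updates
the enum and the predicate together.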

Re: [Committed] RISC-V: Add check for types without insn reservations

2023-11-02 Thread Edwin Lu

On 11/1/2023 11:53 AM, Jeff Law wrote:
> On 11/1/23 12:17, Edwin Lu wrote:
> > Now that all insns are guaranteed to have a type, ensure every insn
> > is associated with a cpu unit/insn reservation.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_sched_variable_issue): add disabled
> > assert
> OK.  Really interested to see how often this trips in practice.  I
> suspect often right now ;-)
>
> jeff

Committed! On a local test with just rv64gc, actually not that many
types were tripped. I think there were around 13 that weren't part of
any reservation.

Edwin


Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread Patrick O'Neill

Hi Pan,

This patch is causing new failures (ICEs) on trunk:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/110

Pre-commit CI run:
https://github.com/ewlu/gcc-precommit-ci/issues/553#issuecomment-1790688172

New rv32gcv failures:

FAIL: gcc.dg/vect/fast-math-bb-slp-call-2.c (internal compiler error: in expand_vec_lrint, at config/riscv/riscv-v.cc:4134)
FAIL: gcc.dg/vect/fast-math-bb-slp-call-2.c (test for excess errors)
FAIL: gcc.dg/vect/fast-math-vect-call-2.c (internal compiler error: in expand_vec_lrint, at config/riscv/riscv-v.cc:4134)
FAIL: gcc.dg/vect/fast-math-vect-call-2.c (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O0 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O0 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O1 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O1 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O2 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O3 -g (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -Os (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -Os (test for excess errors)


New rv64gcv failures:

FAIL: gfortran.dg/pr32533.f90 -O0 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O0 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O1 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O1 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O2 (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -O3 -g (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/pr32533.f90 -Os (internal compiler error: in expand_vec_lround, at config/riscv/riscv-v.cc:4144)
FAIL: gfortran.dg/pr32533.f90 -Os (test for excess errors)


Please let me know if you need any additional information.

Thanks,
Patrick

On 11/2/23 05:13, Li, Pan2 wrote:


Committed, thanks Juzhe.

Pan

From: juzhe.zhong
Sent: Thursday, November 2, 2023 8:04 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; Li, Pan2; Wang, Yanzhang; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

lgtm

Replied Message
From: pan2...@intel.com
Date: 11/02/2023 19:48
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator


Re: Re: [PATCH V2] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-02 Thread 钟居哲
Thanks Robin.

Committed with nuints changed to nunits and mode_idx changed to 0 for
vnshift and vnclip.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-02 23:18
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized 
codes[PR112326]
Hi Juzhe,
 
in principle this LGTM.  It could use some function comments, though ;)
> +imm_avl_p (machine_mode mode)
>  {
>poly_uint64 nuints = GET_MODE_NUNITS (mode);
>  
>return nuints.is_constant ()
> -/* The vsetivli can only hold register 0~31.  */
> -? (IN_RANGE (nuints.to_constant (), 0, 31))
> -/* Only allowed in VLS-VLMAX mode.  */
> -: false;
> +/* The vsetivli can only hold register 0~31.  */
> +? (IN_RANGE (nuints.to_constant (), 0, 31))
> +/* Only allowed in VLS-VLMAX mode.  */
> +: false;
>  }
 
Please replace nuints (or untis) with nunits here everywhere.
 
> +;; The index of operand[] represents the machine mode of the instruction.
> +(define_attr "mode_idx" ""
> + (cond [(eq_attr "type" 
> "vlde,vste,vldm,vstm,vlds,vsts,vldux,vldox,vldff,vldr,vstr,\
> + vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,vialu,vext,vicalu,\
> + vshift,vicmp,viminmax,vimul,vidiv,vimuladd,vimerge,vimov,\
> + vsalu,vaalu,vsmul,vsshift,vfalu,vfmul,vfdiv,vfmuladd,vfsqrt,vfrecp,\
> + vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\
> + vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\
> + 
> vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
> + vgather,vcompress,vmov")
> +(const_int 0)
> +
> +(eq_attr "type" "vimovvx,vfmovvf")
> +(const_int 1)
> +
> +(eq_attr "type" "vssegte,vnshift,vmpop,vmffs")
> +(const_int 2)   
 
I'm not that fond of the growing number of necessary indices even though I
realize that it's the most painless way for now.  Why is vnshift "2" and
not "0", though?
 
"4" for vnclip also looks dubious.  I didn't go through all of them.
 
Regards
Robin
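
A rough standalone model of the imm_avl_p check quoted above, for readers
not familiar with poly_uint64.  struct unit_count and imm_avl_ok are
invented for this sketch; the only fact taken from the ISA is that
vsetivli carries the AVL as a 5-bit unsigned immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for poly_uint64: a unit count that is either a
   compile-time constant or scales with the runtime vector length.  */
struct unit_count
{
  bool is_constant;
  uint64_t value;   /* meaningful only when is_constant is true */
};

/* Return true if the unit count can be used directly as a vsetivli
   immediate.  */
static bool
imm_avl_ok (struct unit_count nunits)
{
  /* vsetivli encodes AVL as a 5-bit unsigned immediate, so 0..31.  */
  return nunits.is_constant && nunits.value <= 31;
}
```

A non-constant count is rejected outright, matching the "Only allowed in
VLS-VLMAX mode" branch of the quoted code.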
 
 


Re: [PATCH 2/4] maintainer-scripts/gcc_release: create index between snapshots <-> commits

2023-11-02 Thread rep . dot . nop
On 2 November 2023 11:25:47 CET, Jonathan Wakely  wrote:
>On Thu, 2 Nov 2023 at 10:23, Andreas Schwab wrote:
>>
>> On Nov 02 2023, Jonathan Wakely wrote:
>>
>> > Git tags are cheap, but I can imagine a concern about hundreds of new
>> > tags "littering" the output of 'git tag -l'. I don't _think_ you can
>> > put tags under an alternative ref that isn't fetched by default (as we
>> > do with refs/users and refs/vendor). I think tags have to go under
>> > refs/tags. But grep -v could be used to filter out snapshot tags
>> > easily.
>>
>> There is no inherent limitation on publishing tags outside of refs/tags,
>> to make them invisible by git tag.  There are already existing examples
>> of tags residing under various refs/users and refs/vendors namespaces.
>
>
>Ah, good to know, thanks.
>
>So then there's no reason that snapshots would have to clutter up the
>list of default tags for anybody who isn't interested in them.
>

Thanks Andreas. Exactly. So, just to emphasise the obvious:
Let's please use refs/snapshot ?


Re: [PATCH] libstdc++: avoid uninitialized read in basic_string constructor

2023-11-02 Thread Jonathan Wakely
On Thu, 2 Nov 2023 at 19:58, Ben Sherman  wrote:
>
> Tested on x86_64-pc-linux-gnu, please let me know if there's anything
> else needed. I haven't contributed before and don't have write access, so
> apologies if I've missed anything.

This was https://gcc.gnu.org/PR109703 (and several duplicates) and
should already be fixed in all affected branches. Where are you seeing
this?

> The basic_string input iterator constructor incrementally reads data and
> allocates the internal buffer as-needed. When _M_dispose() is called, there
> is a check for whether the local buffer is being used - if it is, there is
> an additional check guarding __builtin_unreachable() for the value of
> _M_string_length. The constructor does not initialize _M_string_length
> until all data has been read, so the first re-allocation out of the local
> buffer will have an uninitialized read.
>
> This updates the basic_string input iterator constructor to properly set
> _M_string_length as data is being read.  It additionally introduces a new
> _M_assign_terminator() function to assign the null-terminator based on the
> currently-stored _M_string_length.

Adding new member functions to std::string requires exporting them
from the shared library, which requires bumping the shared library
version, which is an ABI change that isn't suitable for backporting to
release branches. But it doesn't matter if we don't need to make this
change (and I don't think we do need to).


> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h (_M_assign_terminator()): New
>   function.
>   (_M_set_length()): Use _M_assign_terminator().
> * include/bits/basic_string.tcc (_M_construct(InIter, InIter,
>   input_iterator_tag)): Set length incrementally, use
>   _M_assign_terminator().
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h 
> b/libstdc++-v3/include/bits/basic_string.h
> index 0fa32afeb..ba02d8f0f 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -258,12 +258,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>_M_capacity(size_type __capacity)
>{ _M_allocated_capacity = __capacity; }
>
> +  _GLIBCXX20_CONSTEXPR
> +  void
> +  _M_assign_terminator()
> +  { traits_type::assign(_M_data()[_M_string_length], _CharT()); }
> +
>_GLIBCXX20_CONSTEXPR
>void
>_M_set_length(size_type __n)
>{
> _M_length(__n);
> -   traits_type::assign(_M_data()[__n], _CharT());
> +   _M_assign_terminator();
>}
>
>_GLIBCXX20_CONSTEXPR
> diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
> b/libstdc++-v3/include/bits/basic_string.tcc
> index f0a44e5e8..84366a44a 100644
> --- a/libstdc++-v3/include/bits/basic_string.tcc
> +++ b/libstdc++-v3/include/bits/basic_string.tcc
> @@ -182,6 +182,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> ++__beg;
>   }
>
> +   _M_length(__len);
> +
> struct _Guard
> {
>   _GLIBCXX20_CONSTEXPR
> @@ -206,12 +208,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> _M_capacity(__capacity);
>   }
> traits_type::assign(_M_data()[__len++], *__beg);
> +   _M_length(__len);
> ++__beg;
>   }
>
> __guard._M_guarded = 0;
>
> -   _M_set_length(__len);
> +   _M_assign_terminator();
>}
>
>template
> --
> 2.21.0
>



[PATCH] tree-optimization: Add register pressure heuristics

2023-11-02 Thread Ajit Agarwal
Hello All:

Currently code sinking heuristics are based on profile data like
basic block count and sink frequency threshold. We have removed
such heuristics and added register pressure heuristics based on
live-in and live-out of early blocks and immediate dominator of
use blocks of the same loop nesting depth.

Such heuristics reduce register pressure when code sinking is
done within the same loop nesting depth.

A high register pressure region is a region where live-in variables of
the early block are modified by the early block, or where variables
modified in the best block are live-in to the early block and live-out
of the best block.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit
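
As background on the terms used above, the textbook per-block liveness
equation is live_in = use | (live_out & ~def).  The sketch below models it
with bitmasks; varset, live_in and pressure are names invented here, and
this is not the code in the patch, which works on tree_live_info_p and
GCC bitmaps.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t varset;   /* bit i set <=> variable i is in the set */

/* Variables live on entry to a block: those it uses, plus those live on
   exit that it does not define itself.  */
static varset
live_in (varset use, varset def, varset live_out)
{
  return use | (live_out & ~def);
}

/* Crude pressure estimate: how many variables are live at once.  */
static int
pressure (varset live)
{
  int n = 0;
  for (; live != 0; live &= live - 1)   /* clear the lowest set bit */
    n++;
  return n;
}
```

For example, a block that uses variables 0 and 1 and defines variable 2,
with only variable 2 live on exit, has two variables live on entry.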

tree-optimization: Add register pressure heuristics

Currently code sinking heuristics are based on profile data like
basic block count and sink frequency threshold. We have removed
such heuristics to add register pressure heuristics based on
live-in and live-out of early blocks and immediate dominator of
use blocks.

A high register pressure region is a region where live-in variables of
the early block are modified by the early block, or where variables
modified in the best block are live-in to the early block and live-out
of the best block.

2023-11-03  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
as parameters.
(sink_code_in_bb): Ditto.
(select_best_block): Add register pressure heuristics to select
the best blocks in the immediate dominator for same loop nest depth.
(execute): Add live range analysis.
(additional_var_map): New function.
* tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
tests on ssa_names.
(verify_live_on_entry): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +
 gcc/tree-ssa-live.cc| 11 ++-
 gcc/tree-ssa-sink.cc| 93 ++---
 4 files changed, 104 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index f06daf23035..998fe588278 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p 
live)
 def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
 
   /* An undefined local variable does not need to be very alive.  */
-  if (ssa_undefined_value_p (ssa_name, false))
+  if (virtual_operand_p (ssa_name)
+  || ssa_undefined_value_p (ssa_name, false))
 return;
 
   /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
@@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
 
 
 /* Verify that the info in LIVE matches the current cfg.  */
-
 static void
 verify_live_on_entry (tree_live_info_p live)
 {
@@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
  tree d = NULL_TREE;
  bitmap loe;
  var = partition_to_var (map, i);
+ if (var == NULL_TREE)
+   continue;
  stmt = SSA_NAME_DEF_STMT (var);
  tmp = gimple_bb (stmt);
+
  if (SSA_NAME_VAR (var))
d = ssa_default_def (cfun, SSA_NAME_VAR (var));
-
  loe = live_on_entry (live, e->dest);
  if (loe && bitmap_bit_p (loe, i))
{
@@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
  {
 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao



> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).

counted_by currently is limited to INTEGER type. This should resolve this 
issue, right?

Qing
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 7:52 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek  wrote:
>> 
>> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
>>>> Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
>>>> the arguments not the size value, but its address.  Then at __bdos time
>>>> we would dereference that pointer to get the size.
>>>> So,
>>>> struct S { int a; char b __attribute__((counted_by (a))) []; };
>>>> struct S s;
>>>> s.a = 5;
>>>> char *p = &s.b[2];
>>>> int i1 = __builtin_dynamic_object_size (p, 0);
>>>> s.a = 3;
>>>> int i2 = __builtin_dynamic_object_size (p, 0);
>>>> would then yield 3 and 1 rather than 3 and 3.
>>> 
>>> I fail to see how we can get the __builtin_dynamic_object_size call
>>> data dependent on s.a, thus avoid re-ordering or even DSE of the
>>> store.
>> 
>> If &s.b[2] is lowered as
>> sz_1 = s.a;
>> tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
>> p_3 = &tmp_2[2];
>> then sure, there is no way, you get the size from that point.
>> tree-object-size.cc tracking then determines that in a particular
>> case the pointer is size associated with sz_1 and use that value
>> as the size (with the usual adjustments for pointer arithmetics and the
>> like).
>> 
>> What I meant is to emit
>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>> p_5 = &tmp_4[2];
>> i.e. don't associate the pointer with a value of the size, but with
>> an address where to find the size (plus how large it is), basically escape
>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>> pure,
>> so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO) and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).
> 
>> We'd see that a particular pointer is size associated with &s.a address
>> and would use that address cast to the type of the third argument (to
>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>> VN CSE it anyway if one has say
>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>> []; } t; };
>> and
>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>> ...
>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>> ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
>> It would mean though that counted_by wouldn't be allowed to be a
>> bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.
Could you explain a little bit more here? Then the .ACCESS_WITH_SIZE will become

PTR = .ACCESS_WITH_SIZE (PTR, &PTR’s Container, SIZE, ACCESS_MODE)

??

> 
>>Jakub
>> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao
Thanks a lot for raising these issues. 

If I understand correctly,  the major question we need to answer is:

For the following example: (Jakub mentioned this  in an early message)

  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = &s.b[2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);

Should the 2nd __bdos call (line 7) get
A. the latest value of s.a (line 6) for its size?
Or B. the value when s.b was referenced (line 3, line 4)?

A should be more convenient for users of the dynamic array feature.
With B, the user has to modify the source code (adding code to “re-obtain”
the pointer after the size was adjusted at line 6), as mentioned by Richard.

This depends on how we design the new internal function .ACCESS_WITH_SIZE

1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed. 

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.

PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)

With 1, we can only provide B; the user needs to modify the source code to get
the full dynamic array feature.
With 2, we can provide A; the user gets full support for dynamic arrays
without restrictions in the source code.
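
To make the difference between 1 and 2 concrete, here is a small C model of the two designs. It is only a sketch: the struct and helper names are hypothetical, not GCC internals, and the array uses fixed storage so the code does not depend on counted_by support.

```c
#include <stddef.h>

/* Hypothetical model: none of these names are GCC internals.  The array has
   fixed storage so the sketch does not depend on counted_by support.  */
struct S { int a; char b[8]; };

/* Design 1: the size is captured by value when the pointer is formed.  */
struct ref_by_value { char *ptr; size_t index; int size; };

/* Design 2: only the address of the counter is captured.  */
struct ref_by_addr { char *ptr; size_t index; const int *size_addr; };

static struct ref_by_value
access_by_value (struct S *s, size_t index)
{
  struct ref_by_value r = { &s->b[index], index, s->a };
  return r;
}

static struct ref_by_addr
access_by_addr (struct S *s, size_t index)
{
  struct ref_by_addr r = { &s->b[index], index, &s->a };
  return r;
}

/* Bytes remaining from the pointer to the end of the counted array,
   using the size recorded at access time.  */
static long
remaining_by_value (struct ref_by_value r)
{
  return (long) r.size - (long) r.index;
}

/* Same query, but the counter is read when the query is made.  */
static long
remaining_by_addr (struct ref_by_addr r)
{
  return (long) *r.size_addr - (long) r.index;
}
```

With s.a = 5, an access at index 2, and a later s.a = 3, the by-value model reports 3 before and after the store (answer B), while the by-address model reports 3 and then 1 (answer A).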

However, we have to pay an additional cost for supporting A by using 2, which
includes:

1. .ACCESS_WITH_SIZE will become an escape point, which will further impact the
IPA optimizations and add runtime overhead.
Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be PURE?

2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also
impact some IPA optimizations and add runtime overhead.
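
For reference, the CONST/PURE distinction asked about above is the same one the user-level GCC function attributes express. A minimal sketch, assuming a GNU-compatible compiler; counter, read_counter and scale are made-up names used only for illustration:

```c
/* Stand-in for the counter field; hypothetical name.  */
int counter = 5;

/* pure: no side effects, but the result may depend on memory, so the
   compiler has to call it again after a store that may change counter.  */
__attribute__ ((pure)) int
read_counter (void)
{
  return counter;
}

/* const: the result depends only on the argument values, so repeated
   calls with the same argument may be merged freely.  */
__attribute__ ((const)) int
scale (int n)
{
  return 2 * n;
}
```

A function that must observe an updated size through memory can be at most pure, which is why passing the size by reference rules out CONST.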

I think the following factors drive the decision:

1. How big is the performance impact?
2. How important is the dynamic array feature? Is adding some user restrictions, as
Richard mentioned, feasible to support this feature?

Maybe we can implement 1 first and, if full support for dynamic arrays is
needed, add 2 later?
Or we can implement both, compare the performance difference, and then decide?

Qing




> On Nov 2, 2023, at 8:09 AM, Jakub Jelinek  wrote:
> 
> On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
>>> What I meant is to emit
>>> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
>>> p_5 = &tmp_4[2];
>>> i.e. don't associate the pointer with a value of the size, but with
>>> an address where to find the size (plus how large it is), basically escape
>>> pointer to the size at that point.  And __builtin_dynamic_object_size is 
>>> pure,
>>> so supposedly it can depend on what the escaped pointer points to.
>> 
>> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
>> escape point (quite bad IMHO)
> 
> That is why I've said we need to decide what cost we want to suffer because
> of that.
> 
>> and __builtin_dynamic_object_size being
>> non-const (that's probably not too bad).
> 
> It is already pure,leaf,nothrow (unlike __builtin_object_size which is 
> obviously
> const,leaf,nothrow).  Because under the hood, it can read memory when
> expanded.
> 
>>> We'd see that a particular pointer is size associated with &s.a address
>>> and would use that address cast to the type of the third argument (to
>>> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
>>> VN CSE it anyway if one has say
>>> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>>>  struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
>>> []; } t; };
>>> and
>>> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
>>> ...
>>> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
>>> ?
>> 
>> We'd probably CSE that - the usual issue of address-with-same-value.
>> 
>>> It would mean though that counted_by wouldn't be allowed to be a
>>> bit-field...
>> 
>> Yup.  We could also pass a pointer to the container though, that's good 
>> enough
>> for the escape, and pass the size by value in addition to that.
> 
> I was wondering about stuff like _BitInt.  But sure, counted_by is just an
> extension, we can just refuse counting by _BitInt in addition to counting by
> floating point, pointers, aggregates, bit-fields, or we could somehow encode
> all the needed type's properties numerically into an integral constant.
> Similarly for alias set (unless it uses 0 for reads).
> 
>   Jakub
> 
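
The "address-with-same-value" point in the quoted union example can be checked directly. A minimal sketch, assuming a compiler that accepts flexible array members inside a union (GCC does); the counted_by attributes are left out because support for them varies, and counters_share_address is a hypothetical helper:

```c
#include <stddef.h>

/* Same layout as the quoted example, without the counted_by attributes.  */
struct S { int a; char b[]; };
struct T { char c, d, e, f; char g[]; };
union U { struct S s; struct T t; };

/* Both counters are the first member of a union alternative, so they
   start at the same address even though their types differ.  */
static int
counters_share_address (const union U *v)
{
  return (const void *) &v->s.a == (const void *) &v->t.c;
}
```

Because the two addresses compare equal, only the type of the third argument tells the two .ACCESS_WITH_SIZE calls apart, which is the CSE concern raised in the thread.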



Re: [Patch, fortran] PR98498 - Interp request: defined operators and unlimited polymorphic

2023-11-02 Thread Harald Anlauf

Hi Paul,

On 02.11.23 at 19:18, Paul Richard Thomas wrote:
> Hi Harald,
>
> I was overthinking the problem. The rejected cases led me to a fix that can
> only be described as a considerable simplification compared with the first
> patch!

this patch is *much* simpler, makes more sense, and works here. :-)

> The testcase now reflects the requirements of the standard and
> regtests without failures.
>
> OK for mainline?

Yes, OK for mainline.

Thanks,
Harald

> Thanks
>
> Paul
>
> Fortran: Defined operators with unlimited polymorphic args [PR98498]
>
> 2023-11-02  Paul Thomas  
>
> gcc/fortran
> PR fortran/98498
> * interface.cc (upoly_ok): Defined operators using unlimited
> polymorphic formal arguments must not override the intrinsic
> operator use.
>
> gcc/testsuite/
> PR fortran/98498
> * gfortran.dg/interface_50.f90: New test.


> On Wed, 1 Nov 2023 at 20:12, Harald Anlauf  wrote:
>
>> Hi Paul,
>>
>> On 01.11.23 at 19:02, Paul Richard Thomas wrote:
>>> The interpretation request came in a long time ago but I only just got
>>> around to implementing it.
>>>
>>> The updated text from the standard is in the comment. Now I am writing
>>> this, I think that I should perhaps use switch(op)/case rather than using
>>> if/else if and depending on the order of the gfc_intrinsic_op enum being
>>> maintained. Thoughts?
>>
>> the logic is likely harder to parse with if/else than with
>> switch(op)/case.  However, I do not think that the order of
>> the enum will ever be changed, as the module format relies
>> on that very order.
>>
>>> The testcase runs fine with both mainline and nagfor. I think that
>>> compile-only with counts of star-eq and star_not should suffice.
>>
>> I found other cases that are rejected even with your patch,
>> but which are accepted by nagfor.  Example:
>>
>>  print *, ('a' == c)
>>
>> Nagfor prints F at runtime as expected, as it correctly resolves
>> this to star_eq.  Further examples can be easily constructed.
>>
>> Can you have a look?
>>
>> Thanks,
>> Harald
>>
>>> Regtests with no regressions. OK for mainline?
>>>
>>> Paul
>>>
>>> Fortran: Defined operators with unlimited polymorphic args [PR98498]
>>>
>>> 2023-11-01  Paul Thomas  
>>>
>>> gcc/fortran
>>> PR fortran/98498
>>> * interface.cc (upoly_ok): New function.
>>> (gfc_extend_expr): Use new function to ensure that defined
>>> operators using unlimited polymorphic formal arguments do not
>>> override their intrinsic uses.
>>>
>>> gcc/testsuite/
>>> PR fortran/98498
>>> * gfortran.dg/interface_50.f90: New test.











[pushed] c++: use hash_set in nrv_data

2023-11-02 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I noticed we were using a hash_table directly here instead of the simpler
hash_set interface.  Also, let's check for the variable itself and repeats
earlier, since they should happen more often than any of the other cases.
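
As an illustration of the idiom the patch switches to (an add operation that inserts and reports whether the key was already present, as in the new `dp->visited.add (*tp)` test), here is a small C model. It is a sketch only: visited_set and visited_add are hypothetical names, and the table is fixed-size and assumes fewer than VISITED_SLOTS distinct non-null nodes, otherwise the probe loop would not terminate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VISITED_SLOTS 64   /* power of two; hypothetical fixed capacity */

struct visited_set
{
  const void *slot[VISITED_SLOTS];
  size_t count;
};

/* Insert NODE if it is not there yet.  Return true if NODE was already in
   the set, false if this call inserted it.  Assumes NODE is non-null and
   that fewer than VISITED_SLOTS distinct nodes are ever added.  */
static bool
visited_add (struct visited_set *set, const void *node)
{
  size_t i = ((uintptr_t) node >> 3) & (VISITED_SLOTS - 1);
  while (set->slot[i] != NULL)
    {
      if (set->slot[i] == node)
        return true;
      i = (i + 1) & (VISITED_SLOTS - 1);   /* linear probing */
    }
  set->slot[i] = node;
  set->count++;
  return false;
}
```

One call does both the lookup and the insertion, so the caller needs no slot pointer and no separate "was it empty" check.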

gcc/cp/ChangeLog:

* semantics.cc (nrv_data): Change visited to hash_set.
(finalize_nrv_r): Reorganize.
---
 gcc/cp/semantics.cc | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index a0f2edcf117..37bffca8e55 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4980,7 +4980,7 @@ public:
 
   tree var;
   tree result;
-  hash_table<nofree_ptr_hash <tree_node> > visited;
+  hash_set<tree> visited;
   bool simple;
   bool in_nrv_cleanup;
 };
@@ -4991,12 +4991,22 @@ static tree
 finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
 {
   class nrv_data *dp = (class nrv_data *)data;
-  tree_node **slot;
 
   /* No need to walk into types.  There wouldn't be any need to walk into
  non-statements, except that we have to consider STMT_EXPRs.  */
   if (TYPE_P (*tp))
 *walk_subtrees = 0;
+
+  /* Replace all uses of the NRV with the RESULT_DECL.  */
+  else if (*tp == dp->var)
+*tp = dp->result;
+
+  /* Avoid walking into the same tree more than once.  Unfortunately, we
+ can't just use walk_tree_without duplicates because it would only call
+ us for the first occurrence of dp->var in the function body.  */
+  else if (dp->visited.add (*tp))
+*walk_subtrees = 0;
+
   /* If there's a label, we might need to destroy the NRV on goto (92407).  */
   else if (TREE_CODE (*tp) == LABEL_EXPR && !dp->in_nrv_cleanup)
 dp->simple = false;
@@ -5086,18 +5096,6 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   SET_EXPR_LOCATION (init, EXPR_LOCATION (*tp));
   *tp = init;
 }
-  /* And replace all uses of the NRV with the RESULT_DECL.  */
-  else if (*tp == dp->var)
-*tp = dp->result;
-
-  /* Avoid walking into the same tree more than once.  Unfortunately, we
- can't just use walk_tree_without duplicates because it would only call
- us for the first occurrence of dp->var in the function body.  */
-  slot = dp->visited.find_slot (*tp, INSERT);
-  if (*slot)
-*walk_subtrees = 0;
-  else
-*slot = *tp;
 
   /* Keep iterating.  */
   return NULL_TREE;

base-commit: 36a26298ec7dfca615d4ba411a3508d1287d6ce5
-- 
2.39.3



[pushed] c++: retval dtor on rethrow [PR112301]

2023-11-02 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In r12-6333 for PR33799, I fixed the example in [except.ctor]/2.  In that
testcase, the exception is caught and the function returns again,
successfully.

In this testcase, however, the exception is rethrown, and hits two separate
cleanups: one in the try block and the other in the function body.  So we
destroy twice an object that was only constructed once.

Fortunately, the fix for the normal case is easy: we just need to clear the
"return value constructed by return" flag when we do it the first time.

This gets more complicated with the named return value optimization, since
we don't want to destroy the return value while the NRV variable is still in
scope.
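
The double destruction and the "clear the flag the first time" fix can be modelled in plain C. This is a sketch of the idea only, with hypothetical names; it is not the code the front end generates.

```c
#include <stdbool.h>

/* Hypothetical model: the sentinel says the return value has been
   constructed, destroy_count records how often it gets destroyed.  */
struct retval_state
{
  bool sentinel;
  int destroy_count;
};

/* Cleanup as before the fix: every cleanup that sees the sentinel set
   destroys the return value.  */
static void
cleanup_keep_flag (struct retval_state *st)
{
  if (st->sentinel)
    st->destroy_count++;
}

/* Cleanup with the fix: clear the sentinel before destroying, so a second
   cleanup reached by the rethrown exception does nothing.  */
static void
cleanup_clear_flag (struct retval_state *st)
{
  if (st->sentinel)
    {
      st->sentinel = false;
      st->destroy_count++;
    }
}
```

Running either cleanup twice stands in for the two cleanups hit by the rethrown exception: the first variant destroys twice, the second only once.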

PR c++/112301
PR c++/102191
PR c++/33799

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Clear
current_retval_sentinel when destroying retval.
* semantics.cc (nrv_data): Add in_nrv_cleanup.
(finalize_nrv): Set it.
(finalize_nrv_r): Fix handling of throwing cleanups.

gcc/testsuite/ChangeLog:

* g++.dg/eh/return1.C: Add more cases.
---
 gcc/cp/except.cc  | 18 ++-
 gcc/cp/semantics.cc   | 47 +-
 gcc/testsuite/g++.dg/eh/return1.C | 81 ++-
 3 files changed, 142 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index e32efb30457..d966725db9b 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -1284,7 +1284,15 @@ build_noexcept_spec (tree expr, tsubst_flags_t complain)
current_retval_sentinel so that we know that the return value needs to be
destroyed on throw.  Do the same if the current function might use the
named return value optimization, so we don't destroy it on return.
-   Otherwise, returns NULL_TREE.  */
+   Otherwise, returns NULL_TREE.
+
+   The sentinel is set to indicate that we're in the process of returning, and
+   therefore should destroy a normal return value on throw, and shouldn't
+   destroy a named return value variable on normal scope exit.  It is set on
+   return, and cleared either by maybe_splice_retval_cleanup, or when an
+   exception reaches the NRV scope (finalize_nrv_r).  Note that once return
+   passes the NRV scope, it's effectively a normal return value, so cleanup
+   past that point is handled by maybe_splice_retval_cleanup. */
 
 tree
 maybe_set_retval_sentinel ()
@@ -1361,6 +1369,14 @@ maybe_splice_retval_cleanup (tree compound_stmt, bool 
is_try)
  tsi_delink (&iter);
}
   tree dtor = build_cleanup (retval);
+  if (!function_body)
+   {
+ /* Clear the sentinel so we don't try to destroy the retval again on
+rethrow (c++/112301).  */
+ tree clear = build2 (MODIFY_EXPR, boolean_type_node,
+  current_retval_sentinel, boolean_false_node);
+ dtor = build2 (COMPOUND_EXPR, void_type_node, clear, dtor);
+   }
   tree cond = build3 (COND_EXPR, void_type_node, current_retval_sentinel,
  dtor, void_node);
   tree cleanup = build_stmt (loc, CLEANUP_STMT,
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 52044be7af8..a0f2edcf117 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4982,6 +4982,7 @@ public:
   tree result;
   hash_table<nofree_ptr_hash <tree_node> > visited;
   bool simple;
+  bool in_nrv_cleanup;
 };
 
 /* Helper function for walk_tree, used by finalize_nrv below.  */
@@ -4997,7 +4998,7 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* data)
   if (TYPE_P (*tp))
 *walk_subtrees = 0;
   /* If there's a label, we might need to destroy the NRV on goto (92407).  */
-  else if (TREE_CODE (*tp) == LABEL_EXPR)
+  else if (TREE_CODE (*tp) == LABEL_EXPR && !dp->in_nrv_cleanup)
 dp->simple = false;
   /* Change NRV returns to just refer to the RESULT_DECL; this is a nop,
  but differs from using NULL_TREE in that it indicates that we care
@@ -5016,16 +5017,59 @@ finalize_nrv_r (tree* tp, int* walk_subtrees, void* 
data)
   else if (TREE_CODE (*tp) == CLEANUP_STMT
   && CLEANUP_DECL (*tp) == dp->var)
 {
+  dp->in_nrv_cleanup = true;
+  cp_walk_tree (&CLEANUP_BODY (*tp), finalize_nrv_r, data, 0);
+  dp->in_nrv_cleanup = false;
+  cp_walk_tree (&CLEANUP_EXPR (*tp), finalize_nrv_r, data, 0);
+  *walk_subtrees = 0;
+
   if (dp->simple)
+   /* For a simple NRV, just run it on the EH path.  */
CLEANUP_EH_ONLY (*tp) = true;
   else
{
+ /* Not simple, we need to check current_retval_sentinel to decide
+whether to run it.  If it's set, we're returning normally and
+don't want to destroy the NRV.  If the sentinel is not set, we're
+leaving scope some other way, either by flowing off the end of its
+scope or throwing an exception.  */
  tree cond = build3 (COND_EXPR, void_type_node,
 

[PATCH] libstdc++: avoid uninitialized read in basic_string constructor

2023-11-02 Thread Ben Sherman
Tested on x86_64-pc-linux-gnu, please let me know if there's anything
else needed. I haven't contributed before and don't have write access, so
apologies if I've missed anything.

-- >8 --

The basic_string input iterator constructor incrementally reads data and
allocates the internal buffer as-needed. When _M_dispose() is called, there
is a check for whether the local buffer is being used - if it is, there is
an additional check guarding __builtin_unreachable() for the value of
_M_string_length. The constructor does not initialize _M_string_length
until all data has been read, so the first re-allocation out of the local
buffer will have an uninitialized read.

This updates the basic_string input iterator constructor to properly set
_M_string_length as data is being read.  It additionally introduces a new
_M_assign_terminator() function to assign the null-terminator based on the
currently-stored _M_string_length.
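
The approach can be pictured with a small C model of a string with a local buffer. This is a sketch with hypothetical names (str_buf, str_append_each and so on), not libstdc++ code: the stored length is updated after every element, so the grow step always sees a valid length, and the terminator is written from the stored length at the end.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C model of a small-buffer string; not libstdc++ code.  */
struct str_buf
{
  char *data;
  size_t length;     /* number of characters stored so far */
  size_t capacity;   /* characters that fit, excluding the terminator */
  char local[16];
};

static void
str_init (struct str_buf *s)
{
  s->data = s->local;
  s->length = 0;
  s->capacity = sizeof s->local - 1;
}

/* Write the terminator based on the stored length.  */
static void
str_assign_terminator (struct str_buf *s)
{
  s->data[s->length] = '\0';
}

/* Append N characters one at a time, as an input iterator would deliver
   them.  Returns 0 on success, -1 if allocation fails.  */
static int
str_append_each (struct str_buf *s, const char *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (s->length == s->capacity)
        {
          size_t new_cap = 2 * s->capacity;
          char *p = malloc (new_cap + 1);
          if (p == NULL)
            return -1;
          /* The grow step relies on length being valid at this point.  */
          memcpy (p, s->data, s->length);
          if (s->data != s->local)
            free (s->data);
          s->data = p;
          s->capacity = new_cap;
        }
      s->data[s->length] = src[i];
      s->length++;   /* keep the stored length current after every element */
    }
  str_assign_terminator (s);
  return 0;
}

static void
str_free (struct str_buf *s)
{
  if (s->data != s->local)
    free (s->data);
}
```

Call str_init first, then str_append_each, and str_free when done; once the input outgrows the 15-character local buffer, data points to heap storage.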

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (_M_assign_terminator()): New
  function.
  (_M_set_length()): Use _M_assign_terminator().
* include/bits/basic_string.tcc (_M_construct(InIter, InIter,
  input_iterator_tag)): Set length incrementally, use
  _M_assign_terminator().

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 0fa32afeb..ba02d8f0f 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -258,12 +258,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_capacity(size_type __capacity)
   { _M_allocated_capacity = __capacity; }

+  _GLIBCXX20_CONSTEXPR
+  void
+  _M_assign_terminator()
+  { traits_type::assign(_M_data()[_M_string_length], _CharT()); }
+
   _GLIBCXX20_CONSTEXPR
   void
   _M_set_length(size_type __n)
   {
_M_length(__n);
-   traits_type::assign(_M_data()[__n], _CharT());
+   _M_assign_terminator();
   }

   _GLIBCXX20_CONSTEXPR
diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
b/libstdc++-v3/include/bits/basic_string.tcc
index f0a44e5e8..84366a44a 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -182,6 +182,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
++__beg;
  }

+   _M_length(__len);
+
struct _Guard
{
  _GLIBCXX20_CONSTEXPR
@@ -206,12 +208,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_capacity(__capacity);
  }
traits_type::assign(_M_data()[__len++], *__beg);
+   _M_length(__len);
++__beg;
  }

__guard._M_guarded = 0;

-   _M_set_length(__len);
+   _M_assign_terminator();
   }

   template
--
2.21.0










[PATCH] gfortran: Rely on dg-do-what-default to avoid running pr85853.f90, pr107254.f90 and vect-alias-check-1.F90 on non-vector targets

2023-11-02 Thread Patrick O'Neill
Testcases in gfortran.dg/vect/vect.exp rely on
check_vect_support_and_set_flags to set dg-do-what-default and avoid
running vector tests on non-vector targets. The three testcases in this
patch overwrite the default with dg-do run which causes issues
for non-vector targets.

Removing the dg-do run directive resolves this issue for non-vector
targets (while still running the tests on vector targets).

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/pr107254.f90: Remove dg-do run directive.
* gfortran.dg/vect/pr85853.f90: Ditto.
* gfortran.dg/vect/vect-alias-check-1.F90: Ditto.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc & rv64gcv to make sure the testcases compile/run
as expected.

These files haven't been changed in a long time so I'm not sure why (or
if) this hasn't been run into by other people before.
---
 gcc/testsuite/gfortran.dg/vect/pr107254.f90   | 2 --
 gcc/testsuite/gfortran.dg/vect/pr85853.f90| 1 -
 gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90 | 1 -
 3 files changed, 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/pr107254.f90 
b/gcc/testsuite/gfortran.dg/vect/pr107254.f90
index 85bcb5f3fa2..adce6bedc30 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr107254.f90
+++ b/gcc/testsuite/gfortran.dg/vect/pr107254.f90
@@ -1,5 +1,3 @@
-! { dg-do run }
-
 subroutine dlartg( f, g, s, r )
   implicit none
   double precision :: f, g, r, s
diff --git a/gcc/testsuite/gfortran.dg/vect/pr85853.f90 
b/gcc/testsuite/gfortran.dg/vect/pr85853.f90
index 68f4a004324..4c0e3b81a09 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr85853.f90
+++ b/gcc/testsuite/gfortran.dg/vect/pr85853.f90
@@ -1,5 +1,4 @@
 ! Taken from execute/where_2.f90, but with special flags.
-! { dg-do run }
 ! { dg-additional-options "-fno-tree-loop-vectorize" }
 
 ! Program to test the WHERE constructs
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90 
b/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
index 3014ff9f3b6..85ae9b151e3 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-alias-check-1.F90
@@ -1,4 +1,3 @@
-! { dg-do run }
 ! { dg-additional-options "-fno-inline" }
 
 #define N 200
-- 
2.34.1



Re: [PATCH 4/4] maintainer-scripts/gcc_release: cleanup whitespace

2023-11-02 Thread Joseph Myers
On Thu, 2 Nov 2023, Sam James wrote:

> maintainer-scripts/
>   * gcc_release: Cleanup whitespace.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 3/4] maintainer-scripts/gcc_release: use HTTPS for links

2023-11-02 Thread Joseph Myers
On Thu, 2 Nov 2023, Sam James wrote:

> maintainer-scripts/
>   * gcc_release: Use HTTPS for links.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread rep . dot . nop
On 2 November 2023 18:06:54 CET, Maxim Kuvyrkov  
wrote:
>> On Nov 2, 2023, at 21:02, rep.dot@gmail.com wrote:
>> 
>> Hi Maxim!
>> 
>> Many thanks for the patch! Quick question below..
>> 
>> On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov  
>> wrote:
>>> ... to restore compatibility with validate_failures.py .
>>> The testsuite script validate_failures.py expects
>>> "Running  ..." to extract  values,
>>> and gotools.sum provided "Running ".
>>> 
>>> Note that libgo.sum, which also uses Makefile logic to generate
>>> DejaGnu-like output, already has "..." suffix.
>>> 
>>> gotools/ChangeLog:
>>> 
>>> * Makefile.am: Update "Running  ..." output
>>> * Makefile.in: Regenerate.
>>> ---
>>> gotools/Makefile.am | 4 ++--
>>> gotools/Makefile.in | 5 +++--
>>> 2 files changed, 5 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/gotools/Makefile.am b/gotools/Makefile.am
>>> index 7b5302990f8..d2376b9c25b 100644
>>> --- a/gotools/Makefile.am
>>> +++ b/gotools/Makefile.am
>>> @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
>>> check-cgo-test check-carchive-test
>>> @cp gotools.sum gotools.log
>>> @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog 
>>> cmd_vet-testlog embed-testlog; do \
>>>   testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
>>> -   echo "Running $${testname}" >> gotools.sum; \
>>> -   echo "Running $${testname}" >> gotools.log; \
>>> +   echo "Running $${testname} ..." >> gotools.sum; \
>>> +   echo "Running $${testname} ..." >> gotools.log; \
>>>   sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
>>>   grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
>>> 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
>>> done
>>> diff --git a/gotools/Makefile.in b/gotools/Makefile.in
>>> index 2783b91ef4b..9cc238e748d 100644
>>> --- a/gotools/Makefile.in
>>> +++ b/gotools/Makefile.in
>>> @@ -317,6 +317,7 @@ pdfdir = @pdfdir@
>>> prefix = @prefix@
>>> program_transform_name = @program_transform_name@
>>> psdir = @psdir@
>>> +runstatedir = @runstatedir@
>> 
>> Are you sure you used the correct version of automake?
>
>I used automake 1.15.1 (from Ubuntu 20.04 automake-1.15 package), and I 
>double-checked after getting the runstatedir update.

I think that runstatedir is a Debian (and derivatives) addition; it would probably
suffice to just drop that line manually.

The patch itself looks like it would be OK, probably even obvious, but I
cannot approve it.

I'm a bit surprised that you don't need to have "exp" != None for 
validate-failures to work after your exp addition, but I take it you checked 
that aspect :-)

thanks, again!

>
>I would appreciate someone checking on their side to make sure I don't have 
>something weird going on in my setup.
>
>--
>Maxim Kuvyrkov
>https://www.linaro.org
>



Re: [Patch, fortran] PR98498 - Interp request: defined operators and unlimited polymorphic

2023-11-02 Thread Paul Richard Thomas
Hi Harald,

I was overthinking the problem. The rejected cases led me to a fix that can
only be described as a considerable simplification compared with the first
patch!

The testcase now reflects the requirements of the standard and
regtests without failures.

OK for mainline?

Thanks

Paul

Fortran: Defined operators with unlimited polymorphic args [PR98498]

2023-11-02  Paul Thomas  

gcc/fortran
PR fortran/98498
* interface.cc (upoly_ok): Defined operators using unlimited
polymorphic formal arguments must not override the intrinsic
operator use.

gcc/testsuite/
PR fortran/98498
* gfortran.dg/interface_50.f90: New test.


On Wed, 1 Nov 2023 at 20:12, Harald Anlauf  wrote:

> Hi Paul,
>
> On 01.11.23 at 19:02, Paul Richard Thomas wrote:
> > The interpretation request came in a long time ago but I only just got
> > around to implementing it.
> >
> > The updated text from the standard is in the comment. Now I am writing
> > this, I think that I should perhaps use switch(op)/case rather than using
> > if/else if and depending on the order of the gfc_intrinsic_op enum being
> > maintained. Thoughts?
>
> the logic is likely harder to parse with if/else than with
> switch(op)/case.  However, I do not think that the order of
> the enum will ever be changed, as the module format relies
> on that very order.
>
> > The testcase runs fine with both mainline and nagfor. I think that
> > compile-only with counts of star-eq and star_not should suffice.
>
> I found other cases that are rejected even with your patch,
> but which are accepted by nagfor.  Example:
>
> print *, ('a' == c)
>
> Nagfor prints F at runtime as expected, as it correctly resolves
> this to star_eq.  Further examples can be easily constructed.
>
> Can you have a look?
>
> Thanks,
> Harald
>
> > Regtests with no regressions. OK for mainline?
> >
> > Paul
> >
> > Fortran: Defined operators with unlimited polymorphic args [PR98498]
> >
> > 2023-11-01  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/98498
> > * interface.cc (upoly_ok): New function.
> > (gfc_extend_expr): Use new function to ensure that defined
> > operators using unlimited polymorphic formal arguments do not
> > override their intrinsic uses.
> >
> > gcc/testsuite/
> > PR fortran/98498
> > * gfortran.dg/interface_50.f90: New test.
> >
>
>
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 8c4571e0aa6..fc4fe662eab 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -4737,6 +4737,17 @@ gfc_extend_expr (gfc_expr *e)
 	  if (sym != NULL)
 	break;
 	}
+
+  /* F2018(15.4.3.4.2) requires that the use of unlimited polymorphic
+	 formal arguments does not override the intrinsic uses.  */
+  gfc_push_suppress_errors ();
+  if (sym
+	  && (UNLIMITED_POLY (sym->formal->sym)
+	  || (sym->formal->next
+		  && UNLIMITED_POLY (sym->formal->next->sym)))
+	  && !gfc_check_operator_interface (sym, e->value.op.op, e->where))
+	sym = NULL;
+  gfc_pop_suppress_errors ();
 }
 
   /* TODO: Do an ambiguity-check and error if multiple matching interfaces are
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
!
! Tests the fix for PR98498, which was subject to an interpretation request
! as to whether or not the interface operator overrode the intrinsic use.
! (See PR for correspondence)
!
! Contributed by Paul Thomas  
!
MODULE mytypes
  IMPLICIT none

  TYPE pvar
 character(len=20) :: name
 integer   :: level
  end TYPE pvar

  interface operator (==)
 module procedure star_eq
  end interface

  interface operator (.not.)
 module procedure star_not
  end interface

contains
  function star_eq(a, b)
implicit none
class(*), intent(in) :: a, b
logical :: star_eq
select type (a)
  type is (pvar)
  select type (b)
type is (pvar)
  if((a%level .eq. b%level) .and. (a%name .eq. b%name)) then
star_eq = .true.
  else
star_eq = .false.
  end if
type is (integer)
  star_eq = (a%level == b)
  end select
  class default
star_eq = .false.
end select
  end function star_eq

  function star_not (a)
implicit none
class(*), intent(in) :: a
type(pvar) :: star_not
select type (a)
  type is (pvar)
star_not = a
star_not%level = -star_not%level
  type is (real)
star_not = pvar ("real", -int(a))
  class default
star_not = pvar ("noname", 0)
end select
  end function

end MODULE mytypes

program test_eq
   use mytypes
   implicit none

   type(pvar) x, y
   integer :: i = 4
   real :: r = 2.0
   character(len = 4, kind =4) :: c = "abcd"
! Check that intrinsic use of .not. and == is not overridden.
   if (.not.(i == 2*int (r))) stop 1
   if (r == 1.0) stop 2

! Test defined operator ==
   x = pvar('test 1', 100)
   y = pvar('test 1', 100)
   if (.not.(x == y)) stop 3
   y = pvar('test 2', 100)
   if (x == y) stop 4
   if (x == r) stop 5! cla

Re: [PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread Rainer Orth
rep.dot@gmail.com writes:

> On 2 November 2023 18:06:54 CET, Maxim Kuvyrkov 
> wrote:
>>> On Nov 2, 2023, at 21:02, rep.dot@gmail.com wrote:
>>> 
>>> Hi Maxim!
>>> 
>>> Many thanks for the patch! Quick question below..
>>> 
>>> On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov
>>>  wrote:
[...]
>>> Are you sure you used the correct version of automake?
>>
>>I used automake 1.15.1 (from Ubuntu 20.04 automake-1.15 package), and I
>> double-checked after getting the runstatedir update.
>
> I think that runstatedir is a Debian (and derivatives) addition, would
> probably suffice to just drop that line manually..

One needs to use the exact version of the autotools as documented on
https://gcc.gnu.org/install/prerequisites.html.  Since distros often
apply local patches, it's best to use a self-built version to guard
against those.  Manually dropping parts of the regenerated files is
heavily fraught with error, especially since you usually don't know what
to drop.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH V3 5/6] aarch64: Add front-end argument type checking for target builtins

2023-11-02 Thread Victor Do Nascimento
In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.

Example:

  const char *regname = "amcgcr_el0";
  long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

  long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
New.
* config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add check_general_builtin_call call.
* config/aarch64/aarch64-protos.h (check_general_builtin_call):
New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-3.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc| 31 +++
 gcc/config/aarch64/aarch64-c.cc   |  4 +--
 gcc/config/aarch64/aarch64-protos.h   |  4 +++
 .../gcc.target/aarch64/acle/rwsr-3.c  | 18 +++
 4 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index dd76cca611b..c5f20f68bca 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2127,6 +2127,37 @@ aarch64_general_builtin_decl (unsigned code, bool)
   return aarch64_builtin_decls[code];
 }
 
+bool
+aarch64_check_general_builtin_call (location_t location, vec<location_t>,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) != NOP_EXPR
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || (TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST))
+   {
+ error_at (location, "first argument to %qD must be a string literal",
+   fndecl);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
 typedef enum
 {
   SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index ab8844f6049..be8b7236cf9 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -339,8 +339,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
   switch (code & AARCH64_BUILTIN_CLASS)
 {
 case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return aarch64_check_general_builtin_call (loc, arg_loc, subcode,
+orig_fndecl, nargs, args);
 case AARCH64_BUILTIN_SVE:
   return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 5d6a1e75700..dbd486cfea4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,10 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
 void handle_arm_acle_h (void);
 void handle_arm_neon_h (void);
 
+bool aarch64_check_general_builtin_call (location_t, vec<location_t>,
+unsigned int, tree, unsigned int,
+tree *);
+
 namespace aarch64_sve {
   void init_builtins ();
   void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
new file mode 100644
index 000..17038fefbf6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
@@ -0,0 +1,18 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics family.  */
+/* Ensure that illegal behavior is rejected by the compiler.  */
+
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -O3 -march=armv8.4-a" } */
+
+#include <arm_acle.h>
+
+void
+test_non_const_sysreg_name ()
+{
+  const char *regname = "trcseqstr";
+  long long a = __arm_rsr64 (regname); /* { dg-error "first argument to 
'__builtin_aarch64_rsr64' must be a string literal" } */
+  __arm_wsr64 (regname, a); /* { dg-error "first argument to 
'__builtin_aar

Re: [PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread Maxim Kuvyrkov
> On Nov 2, 2023, at 21:02, rep.dot@gmail.com wrote:
> 
> Hi Maxim!
> 
> Many thanks for the patch! Quick question below..
> 
> On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov  
> wrote:
>> ... to restore compatibility with validate_failures.py .
>> The testsuite script validate_failures.py expects
>> "Running  ..." to extract  values,
>> and gotools.sum provided "Running ".
>> 
>> Note that libgo.sum, which also uses Makefile logic to generate
>> DejaGnu-like output, already has "..." suffix.
>> 
>> gotools/ChangeLog:
>> 
>> * Makefile.am: Update "Running  ..." output
>> * Makefile.in: Regenerate.
>> ---
>> gotools/Makefile.am | 4 ++--
>> gotools/Makefile.in | 5 +++--
>> 2 files changed, 5 insertions(+), 4 deletions(-)
>> 
>> diff --git a/gotools/Makefile.am b/gotools/Makefile.am
>> index 7b5302990f8..d2376b9c25b 100644
>> --- a/gotools/Makefile.am
>> +++ b/gotools/Makefile.am
>> @@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
>> check-cgo-test check-carchive-test
>> @cp gotools.sum gotools.log
>> @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog 
>> cmd_vet-testlog embed-testlog; do \
>>   testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
>> -   echo "Running $${testname}" >> gotools.sum; \
>> -   echo "Running $${testname}" >> gotools.log; \
>> +   echo "Running $${testname} ..." >> gotools.sum; \
>> +   echo "Running $${testname} ..." >> gotools.log; \
>>   sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
>>   grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
>> 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
>> done
>> diff --git a/gotools/Makefile.in b/gotools/Makefile.in
>> index 2783b91ef4b..9cc238e748d 100644
>> --- a/gotools/Makefile.in
>> +++ b/gotools/Makefile.in
>> @@ -317,6 +317,7 @@ pdfdir = @pdfdir@
>> prefix = @prefix@
>> program_transform_name = @program_transform_name@
>> psdir = @psdir@
>> +runstatedir = @runstatedir@
> 
> Are you sure you used the correct version of automake?

I used automake 1.15.1 (from Ubuntu 20.04 automake-1.15 package), and I 
double-checked after getting the runstatedir update.

I would appreciate someone checking on their side to make sure I don't have 
something weird going on in my setup.

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread rep . dot . nop
Hi Maxim!

Many thanks for the patch! Quick question below..

On 2 November 2023 13:48:55 CET, Maxim Kuvyrkov  
wrote:
>... to restore compatibility with validate_failures.py .
>The testsuite script validate_failures.py expects
>"Running  ..." to extract  values,
>and gotools.sum provided "Running ".
>
>Note that libgo.sum, which also uses Makefile logic to generate
>DejaGnu-like output, already has "..." suffix.
>
>gotools/ChangeLog:
>
>   * Makefile.am: Update "Running  ..." output
>   * Makefile.in: Regenerate.
>---
> gotools/Makefile.am | 4 ++--
> gotools/Makefile.in | 5 +++--
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
>diff --git a/gotools/Makefile.am b/gotools/Makefile.am
>index 7b5302990f8..d2376b9c25b 100644
>--- a/gotools/Makefile.am
>+++ b/gotools/Makefile.am
>@@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
>check-cgo-test check-carchive-test
>   @cp gotools.sum gotools.log
>   @for file in cmd_go-testlog runtime-testlog cgo-testlog 
> carchive-testlog cmd_vet-testlog embed-testlog; do \
> testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
>-echo "Running $${testname}" >> gotools.sum; \
>-echo "Running $${testname}" >> gotools.log; \
>+echo "Running $${testname} ..." >> gotools.sum; \
>+echo "Running $${testname} ..." >> gotools.log; \
> sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
> grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
> 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
>   done
>diff --git a/gotools/Makefile.in b/gotools/Makefile.in
>index 2783b91ef4b..9cc238e748d 100644
>--- a/gotools/Makefile.in
>+++ b/gotools/Makefile.in
>@@ -317,6 +317,7 @@ pdfdir = @pdfdir@
> prefix = @prefix@
> program_transform_name = @program_transform_name@
> psdir = @psdir@
>+runstatedir = @runstatedir@

Are you sure you used the correct version of automake?

thanks

> sbindir = @sbindir@
> sharedstatedir = @sharedstatedir@
> srcdir = @srcdir@
>@@ -1003,8 +1004,8 @@ mostlyclean-local:
> @NATIVE_TRUE@ @cp gotools.sum gotools.log
> @NATIVE_TRUE@ @for file in cmd_go-testlog runtime-testlog cgo-testlog 
> carchive-testlog cmd_vet-testlog embed-testlog; do \
> @NATIVE_TRUE@   testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; 
> \
>-@NATIVE_TRUE@   echo "Running $${testname}" >> gotools.sum; \
>-@NATIVE_TRUE@   echo "Running $${testname}" >> gotools.log; \
>+@NATIVE_TRUE@   echo "Running $${testname} ..." >> gotools.sum; \
>+@NATIVE_TRUE@   echo "Running $${testname} ..." >> gotools.log; \
> @NATIVE_TRUE@   sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> 
> gotools.log; \
> @NATIVE_TRUE@   grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' 
> -e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
> @NATIVE_TRUE@ done



[PATCH V3 4/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-11-02 Thread Victor Do Nascimento
Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr.c: New.
* gcc.target/aarch64/acle/rwsr-1.c: Likewise.
* gcc.target/aarch64/acle/rwsr-2.c: Likewise.
* gcc.dg/pch/rwsr-pch.c: Likewise.
* gcc.dg/pch/rwsr-pch.hs: Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc| 191 ++
 gcc/config/aarch64/aarch64.md |  18 ++
 gcc/config/aarch64/arm_acle.h |  30 +++
 gcc/testsuite/gcc.dg/pch/rwsr-pch.c   |   7 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs  |  10 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |  29 +++
 .../gcc.target/aarch64/acle/rwsr-2.c  |  25 +++
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
 8 files changed, 454 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.c
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..dd76cca611b 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -47,6 +47,7 @@
 #include "stringpool.h"
 #include "attribs.h"
 #include "gimple-fold.h"
+#include "builtins.h"
 
 #define v8qi_UP  E_V8QImode
 #define v8di_UP  E_V8DImode
@@ -808,6 +809,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1810,65 @@ aarch64_init_rng_builtins (void)
   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for reading system register.  */
+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NU

[PATCH V3 6/6] aarch64: Add system register duplication check selftest

2023-11-02 Thread Victor Do Nascimento
Add a build-time test to check whether system register data, as
imported from `aarch64-sys-regs.def', has any duplicate entries.

Duplicate entries are defined as any two SYSREG entries in the .def
file which share the same encoding values (as specified by their `CPENC'
field) and where the relationship between the two does not fit into
one of the following categories:

* Simple aliasing: In some cases, it is observed that one
register name serves as an alias to another.  One example of
this is where TRCEXTINSELR aliases TRCEXTINSELR0.
* Expressing intent: It is possible that when a given register
serves two distinct functions depending on how it is used, it
is given two distinct names whose use should match the context
under which it is being used.  Example:  Debug Data Transfer
Register. When used to receive data, it should be accessed as
DBGDTRRX_EL0 while when transmitting data it should be
accessed via DBGDTRTX_EL0.
* Register deprecation: Some register names have been
deprecated and should no longer be used, but backwards-
compatibility requires that such names continue to be
recognized, as is the case for the SPSR_EL1 register, whose
access via the SPSR_SVC name is now deprecated.
* Same encoding different target: Some encodings are given
different meanings depending on the target architecture and, as
such, are given different names in each of these contexts.
We see an example of this for CPENC(3,4,2,0,0), which
corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
for Armv8-R targets.

A consequence of these observations is that `CPENC' duplication is
acceptable iff at least one of the `properties' or `arch_reqs' fields
of the `sysreg_t' structs associated with the two registers in
question differs, and it is this condition that is checked by the new
`aarch64_test_sysreg_encoding_clashes' function.
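
As a rough, standalone illustration of that condition (not the code from
the patch, which groups entries in a hash map), the sketch below compares
every pair of entries directly.  The struct layout mirrors `sysreg_t' as
described in this series, but the flag values, the `DEMO_*' names, the
`demo_enc' encoding string and both tables are assumptions made up for
the example.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the sysreg_t struct described in this series.  */
typedef struct {
  const char *name;
  const char *encoding;   /* Stringified CPENC value.  */
  unsigned properties;    /* F_* flags.  */
  unsigned arch_reqs;     /* Architectural feature bits.  */
} sysreg_t;

/* Hypothetical flag values, for illustration only.  */
#define DEMO_F_REG_READ   (1u << 2)
#define DEMO_F_REG_WRITE  (1u << 3)
#define DEMO_ARCH_V8A     (1u << 0)
#define DEMO_ARCH_V8R     (1u << 1)

/* Count pairs of entries that share an encoding and cannot be told
   apart by either their properties or their arch_reqs field.  */
size_t
count_encoding_clashes (const sysreg_t *regs, size_t n)
{
  size_t clashes = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      {
        if (strcmp (regs[i].encoding, regs[j].encoding) != 0)
          continue;
        if (regs[i].properties == regs[j].properties
            && regs[i].arch_reqs == regs[j].arch_reqs)
          clashes++;
      }
  return clashes;
}

/* Acceptable duplicates: each pair differs in one of the two fields.  */
static const sysreg_t demo_ok_table[] = {
  { "ttbr0_el2",    "s3_4_c2_c0_0", 0, DEMO_ARCH_V8A },
  { "vsctlr_el2",   "s3_4_c2_c0_0", 0, DEMO_ARCH_V8R },
  { "dbgdtrrx_el0", "demo_enc",     DEMO_F_REG_READ,  0 },
  { "dbgdtrtx_el0", "demo_enc",     DEMO_F_REG_WRITE, 0 },
};

/* A genuine clash: same encoding, same properties, same arch_reqs.  */
static const sysreg_t demo_bad_table[] = {
  { "reg_a", "s3_0_c1_c0_1", 0, 0 },
  { "reg_b", "s3_0_c1_c0_1", 0, 0 },
};
```

Calling `count_encoding_clashes' on the first table should report no
clashes and on the second exactly one.  The pairwise loop is quadratic in
the number of entries, which is why grouping by encoding first, as the
patch does, is the better fit for the full register list.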

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_test_sysreg_encoding_clashes): New.
(aarch64_run_selftests): Add call to
aarch64_test_sysreg_encoding_clashes selftest.
---
 gcc/config/aarch64/aarch64.cc | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eaeab0be436..c0d75f167be 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22,6 +22,7 @@
 
 #define INCLUDE_STRING
 #define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -28390,6 +28391,48 @@ aarch64_test_fractional_cost ()
   ASSERT_EQ (cf (1, 2).as_double (), 0.5);
 }
 
+/* Calculate whether our system register data, as imported from
+   `aarch64-sys-reg.def' has any duplicate entries.  */
+static void
+aarch64_test_sysreg_encoding_clashes (void)
+{
+  using dup_instances_t = hash_map>;
+
+  dup_instances_t duplicate_instances;
+
+  /* Every time an encoding is established to come up more than once
+ we add it to a "clash-analysis queue", which is then used to extract
+ necessary information from our hash map when establishing whether
+ repeated encodings are valid.  */
+
+  /* 1) Collect recurrence information.  */
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+
+  std::vector *tmp
+   = &duplicate_instances.get_or_insert (reg->encoding);
+
+  tmp->push_back (reg);
+}
+
+  /* 2) Carry out analysis on collected data.  */
+  for (auto instance : duplicate_instances)
+{
+  unsigned nrep = instance.second.size ();
+  if (nrep > 1)
+   for (unsigned i = 0; i < nrep; i++)
+ for (unsigned j = i + 1; j < nrep; j++)
+   {
+ const sysreg_t *a = instance.second[i];
+ const sysreg_t *b = instance.second[j];
+ ASSERT_TRUE ((a->properties != b->properties)
+  || (a->arch_reqs != b->arch_reqs));
+   }
+}
+}
+
 /* Run all target-specific selftests.  */
 
 static void
@@ -28397,6 +28440,7 @@ aarch64_run_selftests (void)
 {
   aarch64_test_loading_full_dump ();
   aarch64_test_fractional_cost ();
+  aarch64_test_sysreg_encoding_clashes ();
 }
 
 } // namespace selftest
-- 
2.41.0



[PATCH V3 0/6] aarch64: Add support for __arm_rsr and __arm_wsr ACLE function family

2023-11-02 Thread Victor Do Nascimento
Implement changes resulting from upstream discussion about the
implementation as presented in V2 of this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633458.html

Note that patch 4/7 of the previous iteration of this series (Add
basic target_print_operand support for CONST_STRING) was resubmitted
and upstreamed separately due to its use in other work which had since
been submitted.

---

This patch series adds support for reading from and writing to
system registers via the relevant ACLE-defined builtins [1].

The patch series makes a series of additions to the aarch64-specific
areas of the compiler to make this possible.

Firstly, a mechanism for defining system registers is established via a
new .def file and the new SYSREG macro.  This macro is the same as is
used in Binutils and system register entries are compatible with
either code-base.

Given the information contained in this system register definition
file, a compile-time validation mechanism is implemented, such that any
system register name passed as a string literal argument to these
builtins can be checked against known system registers and its use
for a given target architecture validated.

Finally, patterns for each of these builtins are added to the back-end
such that, if all validation criteria are met, the correct assembly is
emitted.

Thus, the following example of system register access is now valid for
GCC:

long long old = __arm_rsr("trcseqstr");
__arm_wsr("trcseqstr", new);

Testing:
 - Bootstrap/regtest on aarch64-linux-gnu done.

[1] https://arm-software.github.io/acle/main/acle.html

Victor Do Nascimento (6):
  aarch64: Sync system register information with Binutils
  aarch64: Add support for aarch64-sys-regs.def
  aarch64: Implement system register validation tools
  aarch64: Implement system register r/w arm ACLE intrinsic functions
  aarch64: Add front-end argument type checking for target builtins
  aarch64: Add system register duplication check selftest

 gcc/config/aarch64/aarch64-builtins.cc|  222 
 gcc/config/aarch64/aarch64-c.cc   |4 +-
 gcc/config/aarch64/aarch64-protos.h   |6 +
 gcc/config/aarch64/aarch64-sys-regs.def   | 1064 +
 gcc/config/aarch64/aarch64.cc |  244 
 gcc/config/aarch64/aarch64.h  |   22 +
 gcc/config/aarch64/aarch64.md |   18 +
 gcc/config/aarch64/arm_acle.h |   30 +
 gcc/config/aarch64/predicates.md  |4 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.c   |7 +
 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs  |   10 +
 .../gcc.target/aarch64/acle/rwsr-1.c  |   29 +
 .../gcc.target/aarch64/acle/rwsr-2.c  |   25 +
 .../gcc.target/aarch64/acle/rwsr-3.c  |   18 +
 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  |  144 +++
 15 files changed, 1845 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.c
 create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

-- 
2.41.0



[PATCH V3 3/6] aarch64: Implement system register validation tools

2023-11-02 Thread Victor Do Nascimento
Given the implementation of a mechanism for encoding system registers
into GCC, this patch provides the mechanism for validating their use by
the compiler.  In particular, this involves:

  1. Ensuring a supplied string corresponds to a known system
 register name.  System registers can be accessed either via their
 name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
 Register names are validated using a hash map, populated from
 the `aarch64-sys-regs.def' file, which maps known system
 register names to their corresponding `sysreg_t' structs.
 Register name validation is done via `aarch64_lookup_sysreg_map',
 while the encoding naming convention is validated via a parser
 implemented in this patch - `is_implem_def_reg'.
  2. Once a given register name is deemed to be valid, it is checked
 against two further criteria:
   a. Is the referenced register implemented in the target
  architecture?  This is achieved by comparing the ARCH field
  in the relevant SYSREG entry from `aarch64-sys-regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
   b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.
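
The checks listed above can be sketched in standalone C roughly as
follows.  This is a simplified illustration under stated assumptions and
not the code in this patch: the table is a three-entry stand-in for the
.def file, the lookup is a linear scan rather than a hash map, the
`demo_*' and `DEMO_*' names are hypothetical, and the feature test
assumes every bit set in `arch_reqs' must be enabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Flag values follow the definitions quoted elsewhere in this series.  */
#define F_REG_READ   (1u << 2)   /* Register is read-only.  */
#define F_REG_WRITE  (1u << 3)   /* Register is write-only.  */

/* Hypothetical feature bit, for illustration only.  */
#define DEMO_FEAT_V8_1A (1u << 0)

typedef struct {
  const char *name;
  const char *encoding;
  unsigned properties;
  unsigned arch_reqs;
} sysreg_t;

/* Tiny stand-in table; "demo_ro" is a made-up read-only register.  */
static const sysreg_t demo_regs[] = {
  { "actlr_el1",  "s3_0_c1_c0_1", 0,          0 },
  { "afsr0_el12", "s3_5_c5_c1_0", 0,          DEMO_FEAT_V8_1A },
  { "demo_ro",    "s3_0_c0_c0_0", F_REG_READ, 0 },
};

/* Step 1: is NAME a known system register?  */
static const sysreg_t *
demo_lookup (const char *name)
{
  for (size_t i = 0; i < sizeof demo_regs / sizeof demo_regs[0]; i++)
    if (strcmp (demo_regs[i].name, name) == 0)
      return &demo_regs[i];
  return NULL;
}

/* Return the encoding to emit, or NULL if the access is rejected.  */
const char *
demo_retrieve_sysreg (const char *name, bool write_p,
                      unsigned enabled_features)
{
  const sysreg_t *reg = demo_lookup (name);
  if (reg == NULL)
    return NULL;
  /* Step 2a: all required architectural features must be enabled.  */
  if ((reg->arch_reqs & enabled_features) != reg->arch_reqs)
    return NULL;
  /* Step 2b: no writes to read-only registers, no reads from
     write-only ones.  */
  if (write_p && (reg->properties & F_REG_READ))
    return NULL;
  if (!write_p && (reg->properties & F_REG_WRITE))
    return NULL;
  return reg->encoding;
}
```

With this table, a read of "afsr0_el12" is rejected unless the
hypothetical DEMO_FEAT_V8_1A bit is passed in, and a write to "demo_ro"
is rejected regardless of the enabled features.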

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): New.
(aarch64_retrieve_sysreg): Likewise.
* config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_register_sysreg): Likewise.
(aarch64_init_sysregs): Likewise.
(aarch64_lookup_sysreg_map): Likewise.
* config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
 gcc/config/aarch64/aarch64-protos.h |   2 +
 gcc/config/aarch64/aarch64.cc   | 147 
 gcc/config/aarch64/predicates.md|   4 +
 3 files changed, 153 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..5d6a1e75700 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
 bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (const char *, bool);
 rtx aarch64_check_zero_based_sve_index_immediate (rtx);
 bool aarch64_sve_index_immediate_p (rtx);
 bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a4a9e2e51ea..eaeab0be436 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -85,6 +85,7 @@
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
 #include "ssa.h"
+#include "hash-map.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2860,6 +2861,51 @@ const sysreg_t sysreg_structs[] =
 
 const unsigned nsysreg = ARRAY_SIZE (sysreg_structs);
 
+using sysreg_map_t = hash_map;
+static sysreg_map_t *sysreg_map = nullptr;
+
+/* Map system register names to their hardware metadata: encoding,
+   feature flags and architectural feature requirements, all of which
+   are encoded in a sysreg_t struct.  */
+void
+aarch64_register_sysreg (const char *name, const sysreg_t *metadata)
+{
+  bool dup = sysreg_map->put (name, metadata);
+  gcc_checking_assert (!dup);
+}
+
+/* Lazily initialize hash table for system register validation,
+   checking the validity of supplied register name and returning
+   register's associated metadata.  */
+static void
+aarch64_init_sysregs (void)
+{
+  gcc_assert (!sysreg_map);
+  sysreg_map = new sysreg_map_t;
+
+  for (unsigned i = 0; i < nsysreg; i++)
+{
+  const sysreg_t *reg = sysreg_structs + i;
+  aarch64_register_sysreg (reg->name, reg);
+}
+}
+
+/* No direct access to the sysreg hash-map should be made.  Doing so
+   risks trying to access an uninitialized hash-map and dereferencing the
+   returned double pointer without due care risks dereferencing a
+   null-pointer.  */
+const sysreg_t *
+aarch64_lookup_sysreg_map (const char *regname)
+{
+  if (!sysreg_map)
+aarch64_init_sysregs ();
+
+  const sysreg_t **sysreg_entry = sysreg_map->get (regname);
+  if (sysreg_entry != NULL)
+return *sysreg_entry;
+  return NULL;
+}
+
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
@@ -28116,6 +28162,107 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }
 
+/* Parse an implementation-defined system register name of
+   the form S[0-3]_[0-7]_C[0-15]_C[0-

[PATCH V3 1/6] aarch64: Sync system register information with Binutils

2023-11-02 Thread Victor Do Nascimento
This patch adds the `aarch64-sys-regs.def' file, originally written
for Binutils, to GCC. In so doing, it provides GCC with the necessary
information for teaching the compiler about system registers known to
the assembler and how these can be used.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain.  By keeping both copies of the file in sync,
any `SYSREG (...)' that is added in one project is automatically added
to its counterpart.  This being the case, no change should be made in
the GCC copy of the file.  Any modifications should first be made in
Binutils and the resulting file copied over to GCC.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-sys-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.

gcc/ChangeLog:

* config/aarch64/aarch64-sys-regs.def: New.
---
 gcc/config/aarch64/aarch64-sys-regs.def | 1064 +++
 1 file changed, 1064 insertions(+)
 create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def

diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d24a2455503
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1064 @@
+/* aarch64-system-regs.def -- AArch64 opcode support.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of the GNU opcodes library.
+
+   This library is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   It is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING3.  If not,
+   see <http://www.gnu.org/licenses/>.  */
+
+/* Array of system registers and their associated arch features.
+
+   This file is also used by GCC.  Where necessary, any updates should
+   be made in Binutils and the updated file copied across to GCC, such
+   that the two projects are kept in sync at all times.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1",   CPENC (3,0,13,0,5), 0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1", CPENC (3,0,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2", CPENC (3,4,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3", CPENC (3,6,1,0,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1", CPENC (3,0,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",CPENC (3,5,5,1,0),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2", CPENC (3,4,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3", CPENC (3,6,5,1,0),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1", CPENC (3,0,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",CPENC (3,5,5,1,1),  F_ARCHEXT,  
AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2", CPENC (3,4,5,1,1),  0,  
AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3", CPENC (3,6,5,1,1),  0,  

[PATCH V3 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-11-02 Thread Victor Do Nascimento
This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-sys-regs.def file should be as follows:

  SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
  - NAME:  The system register name, as used in the assembly language.
  - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

  - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
  - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.
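
To make the entry format concrete, here is a small standalone sketch of
how SYSREG entries of this shape can be turned into a table with the
usual X-macro technique.  It is an illustration under assumptions rather
than the code in this patch: the entries live in a `DEMO_SYSREG_LIST'
macro instead of an included .def file, the feature bit values and the
"demo_reg" entry are made up, and the stringifying `CPENC' definition is
only one possible way to produce the encoding string.

```c
#include <stddef.h>
#include <string.h>

/* Flag and feature values below are illustrative assumptions.  */
#define F_ARCHEXT          (1u << 4)
#define AARCH64_FL_V8_1A   (1u << 0)
#define AARCH64_FL_RAS     (1u << 1)

#define AARCH64_NO_FEATURES 0u
#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
#define AARCH64_OR_FEATURES_1(X, F1) AARCH64_FEATURE (F1)
#define AARCH64_OR_FEATURES_2(X, F1, F2) \
  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
#define AARCH64_FEATURES(N, ...) AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)

/* Build the encoding string from the five CPENC fields.  */
#define CPENC(SN, OP1, CN, CM, OP2) \
  "s" #SN "_" #OP1 "_c" #CN "_c" #CM "_" #OP2

/* Stand-in for the contents of the .def file; "demo_reg" is made up.  */
#define DEMO_SYSREG_LIST \
  SYSREG ("actlr_el1",  CPENC (3,0,1,0,1), 0, AARCH64_NO_FEATURES) \
  SYSREG ("afsr0_el12", CPENC (3,5,5,1,0), F_ARCHEXT, \
          AARCH64_FEATURE (V8_1A)) \
  SYSREG ("demo_reg",   CPENC (3,0,0,0,0), 0, \
          AARCH64_FEATURES (2, V8_1A, RAS))

typedef struct {
  const char *name;
  const char *encoding;
  unsigned properties;
  unsigned arch_reqs;
} sysreg_t;

/* Expand every SYSREG entry into one struct initializer.  */
#define SYSREG(NAME, ENC, FLAGS, ARCH) { NAME, ENC, FLAGS, ARCH },
static const sysreg_t sysreg_structs[] = { DEMO_SYSREG_LIST };
#undef SYSREG

static const size_t nsysreg
  = sizeof sysreg_structs / sizeof sysreg_structs[0];

/* Linear lookup by name; returns NULL for unknown registers.  */
const sysreg_t *
demo_find_sysreg (const char *name)
{
  for (size_t i = 0; i < nsysreg; i++)
    if (strcmp (sysreg_structs[i].name, name) == 0)
      return &sysreg_structs[i];
  return NULL;
}
```

Because SYSREG is defined by the consumer just before the list is
expanded, the same entries can be reused to generate other tables
without editing them, which is what allows one file to be shared with
Binutils.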

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (sysreg_t): New.
(sysreg_structs): Likewise.
(nsysreg): Likewise.
(AARCH64_FEATURE): Likewise.
(AARCH64_FEATURES): Likewise.
(AARCH64_NO_FEATURES): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 53 +++
 gcc/config/aarch64/aarch64.h  | 22 +++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5fd7063663c..a4a9e2e51ea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2806,6 +2806,59 @@ static const struct processor all_cores[] =
feature_deps::V8A ().enable, &generic_tunings},
   {NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
 };
+/* Internal representation of system registers.  */
+typedef struct {
+  const char *name;
+  /* Stringified sysreg encoding values, represented as
+ s<sn>_<op1>_c<cn>_c<cm>_<op2>.  */
+  const char *encoding;
+  /* Flags affecting sysreg usage, such as read/write-only.  */
+  unsigned properties;
+  /* Architectural features implied by sysreg.  */
+  aarch64_feature_flags arch_reqs;
+} sysreg_t;
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+#define AARCH64_NO_FEATURES   0
+
+/* Flags associated with the properties of system registers.  It mainly serves
+   to mark particular registers as read or write only.  */
+#define F_DEPRECATED  (1 << 1)
+#define F_REG_READ    (1 << 2)
+#define F_REG_WRITE   (1 << 3)
+#define F_ARCHEXT (1 << 4)
+/* Flag indicating register name is alias for another system register.  */
+#define F_REG_ALIAS   (1 << 5)
+
+/* Database of system registers, their encodings and architectural
+   requirements.  */
+const sysreg_

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Siddhesh Poyarekar

On 2023-11-02 10:12, Martin Uecker wrote:

This shouldn't be necessary. The object-size pass
can track pointer arithmetic if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd


The problem is dependency tracking through the pointer arithmetic, which 
Jakub suggested to work around by passing a reference to the size in 
.ACCESS_WITH_SIZE to avoid DCE/reordering.


Thanks,
Sid


Re: [PATCH V2] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-02 Thread Robin Dapp
Hi Juzhe,

in principle this LGTM.  It could use some function comments, though ;)
> +imm_avl_p (machine_mode mode)
>  {
>poly_uint64 nuints = GET_MODE_NUNITS (mode);
>  
>return nuints.is_constant ()
> -/* The vsetivli can only hold register 0~31.  */
> -? (IN_RANGE (nuints.to_constant (), 0, 31))
> -/* Only allowed in VLS-VLMAX mode.  */
> -: false;
> +/* The vsetivli can only hold register 0~31.  */
> +? (IN_RANGE (nuints.to_constant (), 0, 31))
> +/* Only allowed in VLS-VLMAX mode.  */
> +: false;
>  }

Please replace nuints (or untis) with nunits here everywhere.
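
For reference, a stand-alone sketch of the predicate with the variable
spelled "nunits".  It is only a model: poly_uint64 is replaced by a
hypothetical std::optional stand-in that is empty when the unit count is not
a compile-time constant, so this is not the code from the patch.

```cpp
#include <cassert>   // only needed by callers that want to check results
#include <cstdint>
#include <optional>

// Stand-in for poly_uint64: holds a value only when the number of units
// is a compile-time constant (hypothetical model, not the GCC type).
using unit_count = std::optional<std::uint64_t>;

// Sketch of imm_avl_p: true if the unit count is constant and fits the
// 5-bit immediate AVL field of vsetivli, i.e. lies in the range 0..31.
inline bool imm_avl_p(const unit_count &nunits) {
  return nunits.has_value() && *nunits <= 31;
}
```

A constant count of 16 or 31 is accepted, 32 is rejected, and a
non-constant count (an empty unit_count) is always rejected.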

> +;; The index of operand[] represents the machine mode of the instruction.
> +(define_attr "mode_idx" ""
> + (cond [(eq_attr "type" 
> "vlde,vste,vldm,vstm,vlds,vsts,vldux,vldox,vldff,vldr,vstr,\
> + 
> vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,vialu,vext,vicalu,\
> + 
> vshift,vicmp,viminmax,vimul,vidiv,vimuladd,vimerge,vimov,\
> + 
> vsalu,vaalu,vsmul,vsshift,vfalu,vfmul,vfdiv,vfmuladd,vfsqrt,vfrecp,\
> + vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\
> + 
> vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\
> + 
> vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
> + vgather,vcompress,vmov")
> +(const_int 0)
> +
> +(eq_attr "type" "vimovvx,vfmovvf")
> +(const_int 1)
> +
> +(eq_attr "type" "vssegte,vnshift,vmpop,vmffs")
> +(const_int 2)   

I'm not that fond of the growing number of necessary indices even though I
realize that it's the most painless way for now.  Why is vnshift "2" and
not "0", though?

"4" for vnclip also looks dubious.  I didn't go through all of them.

Regards
 Robin



[PATCH v3] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-11-02 Thread Marek Polacek
On Sat, Oct 14, 2023 at 12:56:11AM -0400, Jason Merrill wrote:
> On 10/10/23 13:20, Marek Polacek wrote:
> > Thanks for looking into this.  It's kept me occupied for quite a while.
> 
> Thanks, nice work.
> 
> > On Tue, Aug 29, 2023 at 03:26:46PM -0400, Jason Merrill wrote:
> > > On 8/23/23 15:49, Marek Polacek wrote:
> > > > +struct A {
> > > > +  int x;
> > > > +  int y = id(x);
> > > > +};
> > > > +
> > > > +template
> > > > +constexpr int k(int) {  // k is not an immediate function 
> > > > because A(42) is a
> > > > +  return A(42).y;   // constant expression and thus not 
> > > > immediate-escalating
> > > > +}
> > > 
> > > Needs use(s) of k to test the comment.
> > 
> > True, and that revealed what I think is a bug in the standard.
> > In the test I'm saying:
> > 
> > // ??? [expr.const]#example-9 says:
> > //   k is not an immediate function because A(42) is a
> > //   constant expression and thus not immediate-escalating
> > // But I think the call to id(x) is *not* a constant expression and thus
> > // it is an immediate-escalating expression.  Therefore k *is*
> > // an immediate function.  So we get the error below.  clang++ agrees.
> > id(x) is not a constant expression because x isn't constant.
> 
> Not when considering id(x) by itself, but in the evaluation of A(42), the
> member x has just been initialized to constant 42.  And A(42) is
> constant-evaluated because "An aggregate initialization is an immediate
> invocation if it evaluates a default member initializer that has a
> subexpression that is an immediate-escalating expression."
> 
> I assume clang doesn't handle this passage properly yet because it was added
> during core review of the paper, for parity between aggregate initialization
> and constructor escalation.
> 
> This can be a follow-up patch.

I see.  So the fix will be to, for the aggregate initialization case, pass
the whole A(42).y thing to cxx_constant_eval, not just id(x).
 
> > So.  I think we want to refrain from instantiating things early
> > given how many problems that caused.  On the other hand, stashing
> > all the immediate-escalating decls into immediate_escalating_decls
> > and walking their bodies isn't going to be cheap.  I've checked
> > how big the vectors can get, but our testsuite isn't the best litmus
> > test because it's mostly smallish testcases without many #includes.
> > The worst offender is uninit-pr105562.C with
> > 
> > (gdb) p immediate_escalating_decls->length()
> > $2 = 2204
> > (gdb) p deferred_escalating_exprs->length()
> > $3 = 501
> > 
> > Compiling uninit-pr105562.C with g++13 and g++14 with this patch:
> > >        g++13   g++14
> > > real   7.51    7.67
> > > user   7.32    7.49
> > > sys    0.15    0.14
> > 
> > I've made sure not to walk the same bodies twice.  But there's room
> > for further optimization; I suppose we could escalate instantiated
> > functions right away rather than putting them into
> > immediate_escalating_decls and waiting till later.
> 
> Absolutely; if we see a call to a known consteval function, we should
> escalate...immediately.  As the patch seems to do already?

Right, I'm not sure what I was thinking.
 
> > I'm not certain
> > if I can just look at DECL_TEMPLATE_INSTANTIATED.
> 
> I'm not sure what you mean, but a constexpr function being instantiated
> doesn't necessarily imply that everything it calls has been instantiated, so
> we might not know yet if it needs to escalate.

I was pondering exactly that but you are of course correct here.
 
> > I suppose some
> > functions cannot possibly be promoted because they don't contain
> > any CALL_EXPRs.  So we may be able to rule them out while doing
> > cp_fold_r early.
> 
> Yes.  Or, the only immediate-escalating functions referenced have already
> been checked.
> 
> We can also do some escalation during constexpr evaluation: all the
> functions involved need to be instantiated for the evaluation, and if we
> encounter an immediate-escalating expression while evaluating a call to an
> immediate-escalating function, we can promote it then.  Though we can't
> necessarily mark it as not needing promotion, as there might be i-e exprs in
> branches that the particular evaluation doesn't take.

I've tried but I didn't get anywhere.  The patch was basically

--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2983,7 +2983,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
   } fb (new_call.bindings);

   if (*non_constant_p)
-return t;
+{
+  if (cxx_dialect >= cxx20
+ && ctx->manifestly_const_eval == mce_false
+ && DECL_IMMEDIATE_FUNCTION_P (fun))
+   maybe_promote_function_to_consteval (current_function_decl);
+  return t;
+}

   /* We can't defer instantiating the function any longer.  */
   if (!DECL_INITIAL (fun)

but since I have to check mce_false, it didn't do anything useful
in practice (that is, it wouldn't escalate anything in my tests).

> > If a function is t

Re: [PATCH v3] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-11-02 Thread Marek Polacek
On Thu, Nov 02, 2023 at 11:28:43AM -0400, Marek Polacek wrote:
> On Sat, Oct 14, 2023 at 12:56:11AM -0400, Jason Merrill wrote:
> > As discussed above, we probably don't want to exclude implicitly-declared
> > special member functions.
> 
> Done (but removing the DECL_ARTIFICIAL check).

s/but/by/



[PATCH] vect: allow using inbranch simdclones for masked loops

2023-11-02 Thread Andre Vieira (lists)

Hi,

In a previous patch I did most of the work for this, but forgot to 
change the check for number of arguments matching between call and 
simdclone.  This check should accept calls without a mask to be matched 
against simdclones with mask arguments.  I also added tests to verify 
this feature actually works.



For the simd-builtins tests I decided to remove the sin (double) 
simdclone, which would now be used because it is inbranch and we now 
enable inbranch clones for calls that are not in a branch.  Given the 
nature of the test, removing it made more sense, but that's not a strong 
opinion; happy to change.


Bootstrapped and regression tested on aarch64-unknown-linux-gnu and 
x86_64-pc-linux-gnu.


OK for trunk?

PS: I'll be away for two weeks from tomorrow, it would be really nice if 
this can go in for gcc-14, otherwise the previous work I did for this 
won't have any actual visible effect :(



gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_simd_clone_call): Allow unmasked
calls to use masked simdclones.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-clone-20.c: New file.
* gfortran.dg/simd-builtins-1.h: Adapt.
* gfortran.dg/simd-builtins-6.f90: Adapt.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
new file mode 100644
index 
..9f51a68f3a0c8851af2cd26bd8235c771b851d7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
@@ -0,0 +1,87 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly.  */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned.  */
+#pragma omp declare simd inbranch
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+  return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly.  */
+
+void __attribute__((noipa))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+  #pragma omp simd
+  for (int i = 0; i < size; i++)
+b[i] = foo(a[i]);
+}
+
+/* Check that "inbranch" works when there might be unrolling.  */
+
+void __attribute__((noipa))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+  #pragma omp simd
+  for (int i = 0; i < 128; i++)
+b[i] = foo(a[i]);
+}
+
+/* Validate the outputs.  */
+
+void
+check_masked (TYPE *b, int size)
+{
+  for (int i = 0; i < size; i++)
+if (b[i] != (TYPE)(i + 1))
+  {
+   __builtin_printf ("error at %d\n", i);
+   __builtin_exit (1);
+  }
+}
+
+int
+main ()
+{
+  TYPE a[1024];
+  TYPE b[1024];
+
+  for (int i = 0; i < 1024; i++)
+a[i] = i;
+
+  masked_fixed (a, b);
+  check_masked (b, 128);
+
+  /* Test various sizes to cover machines with different vectorization
+ factors.  */
+  for (int size = 8; size <= 1024; size *= 2)
+{
+  masked (a, b, size);
+  check_masked (b, size);
+}
+
+  /* Test sizes that might exercise the partial vector code-path.  */
+  for (int size = 8; size <= 1024; size *= 2)
+{
+  masked (a, b, size-4);
+  check_masked (b, size-4);
+}
+
+  return 0;
+}
+
+/* Ensure the in-branch simd clones are used on targets that support them.  */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" 
{ target { aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" 
{ target { x86_64*-*-* } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-1.h 
b/gcc/testsuite/gfortran.dg/simd-builtins-1.h
index 
88d555cf41ad065ea525a63d7c05d15d3e5b54ed..08b73514a67d5791d35203530d039741946e9dcc
 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-1.h
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-1.h
@@ -1,4 +1,3 @@
-!GCC$ builtin (sin) attributes simd (inbranch)
 !GCC$ builtin (sinf) attributes simd (notinbranch)
 !GCC$ builtin (cosf) attributes simd
 !GCC$ builtin (cosf) attributes simd (notinbranch)
diff --git a/gcc/testsuite/gfortran.dg/simd-builtins-6.f90 
b/gcc/testsuite/gfortran.dg/simd-builtins-6.f90
index 
60bcac78f3e0cc492930f3eb73cf97065312dc1c..2c68f9f1818a35674a0aef15793aa312a48199a8
 100644
--- a/gcc/testsuite/gfortran.dg/simd-builtins-6.f90
+++ b/gcc/testsuite/gfortran.dg/simd-builtins-6.f90
@@ -2,7 +2,6 @@
 ! { dg-additional-options "-nostdinc -Ofast -fdump-tree-optimized" }
 ! { dg-additional-options "-msse2 -mno-avx" { target i?86-*-linux* 
x86_64-*-linux* } }
 
-!GCC$ builtin (sin) attributes simd (inbranch)
 !GCC$ builtin (sinf) attributes simd (notinbranch)
 !GCC$ builtin (cosf) attributes simd
 !GCC$ builtin (cosf) attributes simd (notinbranch)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
a9200767f67a4c9a8e106259be97a7bc7cd7e9dc.

[committed] libstdc++: Add assertion to std::string_view::remove_suffix [PR112314]

2023-11-02 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

Backports seem reasonable.
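
To illustrate the precondition that the new assertion checks
(n <= size()), here is a small sketch of a caller-side guard.  The helper
name try_remove_suffix is hypothetical and not part of libstdc++; it only
shows the condition under which calling remove_suffix is well defined.

```cpp
#include <cassert>   // only needed by callers that want to check results
#include <cstddef>
#include <string_view>

// Hypothetical helper: drop the last N characters of SV only when the
// precondition n <= sv.size() holds.  Returns false and leaves SV
// untouched otherwise.
constexpr bool try_remove_suffix(std::string_view &sv, std::size_t n) {
  if (n > sv.size())
    return false;        // sv.remove_suffix(n) would be undefined here
  sv.remove_suffix(n);
  return true;
}
```

For a view of "123", asking to remove 4 characters is refused and the view
is unchanged, whereas removing 2 characters leaves "1".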

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/112314
* include/std/string_view (string_view::remove_suffix): Add
debug assertion.
* 
testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc:
New test.
* 
testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc:
New test.
---
 libstdc++-v3/include/std/string_view   |  5 -
 .../modifiers/remove_prefix/debug.cc   | 14 ++
 .../modifiers/remove_suffix/debug.cc   | 14 ++
 3 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index d103abda668..9deae25f712 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -301,7 +301,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr void
   remove_suffix(size_type __n) noexcept
-  { this->_M_len -= __n; }
+  {
+   __glibcxx_assert(this->_M_len >= __n);
+   this->_M_len -= __n;
+  }
 
   constexpr void
   swap(basic_string_view& __sv) noexcept
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc
new file mode 100644
index 000..37204583b71
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_prefix/debug.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++17 } }
+
+#include <string_view>
+
+constexpr bool
+check_remove_prefix()
+{
+  std::string_view sv("123");
+  sv.remove_prefix(4);
+  // { dg-error "not a constant expression" "" { target *-*-* } 0 }
+  return true;
+}
+
+constexpr bool test = check_remove_prefix();
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc
new file mode 100644
index 000..a549e4c2471
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/modifiers/remove_suffix/debug.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++17 } }
+
+#include <string_view>
+
+constexpr bool
+check_remove_suffix()
+{
+  std::string_view sv("123");
+  sv.remove_suffix(4);
+  // { dg-error "not a constant expression" "" { target *-*-* } 0 }
+  return true;
+}
+
+constexpr bool test = check_remove_suffix();
-- 
2.41.0



[PATCH] libstdc++: Improve static assert messages for monadic operations

2023-11-02 Thread Jonathan Wakely
Any objections or suggestions for better wording?

Tested x86_64-linux.

-- >8 --

The monadic operations for std::optional and std::expected make use of
internal helper traits __is_optional and __is_expected, which are not
very user-friendly when shown in diagnostics. Add messages to the
assertions explaining the problem more clearly.
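
The idea can be sketched outside the library as follows.  This is a
free-function model written for C++17, not the libstdc++ code: the trait
is_optional, the function opt_and_then and the message text are all
hypothetical and only show how a string literal turns a failed trait check
into a readable diagnostic.

```cpp
#include <cassert>   // only needed by callers that want to check results
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical trait, similar in spirit to the internal __is_optional.
template <typename T> struct is_optional : std::false_type {};
template <typename T>
struct is_optional<std::optional<T>> : std::true_type {};

// Free-function sketch of and_then for std::optional.
template <typename T, typename Fn>
auto opt_and_then(const std::optional<T> &opt, Fn &&f) {
  using result = std::remove_cv_t<
      std::remove_reference_t<std::invoke_result_t<Fn, const T &>>>;
  // Without the message, the diagnostic would only name the trait.
  static_assert(is_optional<result>::value,
                "the function passed to opt_and_then "
                "must return a std::optional");
  if (opt.has_value())
    return result(std::forward<Fn>(f)(*opt));
  return result();
}
```

Passing a callable that returns, say, a plain int makes the compiler print
the message above instead of a bare reference to the helper trait.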

libstdc++-v3/ChangeLog:

* include/std/expected (expected::and_then, expected::or_else):
Add string literals to static assertions.
* include/std/optional (optional::and_then, optional::or_else):
Likewise.
---
 libstdc++-v3/include/std/expected | 64 +++
 libstdc++-v3/include/std/optional | 24 +---
 2 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/include/std/expected 
b/libstdc++-v3/include/std/expected
index a796f0b6f27..a176d4c3a78 100644
--- a/libstdc++-v3/include/std/expected
+++ b/libstdc++-v3/include/std/expected
@@ -843,8 +843,12 @@ namespace __expected
and_then(_Fn&& __f) &
{
  using _Up = __expected::__result<_Fn, _Tp&>;
- static_assert(__expected::__is_expected<_Up>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Up>,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected with the same error_type");
 
  if (has_value())
return std::__invoke(std::forward<_Fn>(__f), _M_val);
@@ -857,8 +861,12 @@ namespace __expected
and_then(_Fn&& __f) const &
{
  using _Up = __expected::__result<_Fn, const _Tp&>;
- static_assert(__expected::__is_expected<_Up>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Up>,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected with the same error_type");
 
  if (has_value())
return std::__invoke(std::forward<_Fn>(__f), _M_val);
@@ -871,8 +879,12 @@ namespace __expected
and_then(_Fn&& __f) &&
{
  using _Up = __expected::__result<_Fn, _Tp&&>;
- static_assert(__expected::__is_expected<_Up>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Up>,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected with the same error_type");
 
  if (has_value())
return std::__invoke(std::forward<_Fn>(__f), std::move(_M_val));
@@ -886,8 +898,12 @@ namespace __expected
and_then(_Fn&& __f) const &&
{
  using _Up = __expected::__result<_Fn, const _Tp&&>;
- static_assert(__expected::__is_expected<_Up>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Up>,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to std::expected::and_then "
+   "must return a std::expected with the same error_type");
 
  if (has_value())
return std::__invoke(std::forward<_Fn>(__f), std::move(_M_val));
@@ -900,8 +916,12 @@ namespace __expected
or_else(_Fn&& __f) &
{
  using _Gr = __expected::__result<_Fn, _Er&>;
- static_assert(__expected::__is_expected<_Gr>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Gr>,
+   "the function passed to std::expected::or_else "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to std::expected::or_else "
+   "must return a std::expected with the same value_type");
 
  if (has_value())
return _Gr(in_place, _M_val);
@@ -914,8 +934,12 @@ namespace __expected
or_else(_Fn&& __f) const &
{
  using _Gr = __expected::__result<_Fn, const _Er&>;
- static_assert(__expected::__is_expected<_Gr>);
- static_assert(is_same_v);
+ static_assert(__expected::__is_expected<_Gr>,
+   "the function passed to std::expected::or_else "
+   "must return a std::expected");
+ static_assert(is_same_v,
+   "the function passed to st

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 9:54 AM, Richard Biener  wrote:
> 
> On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Nov 2, 2023, at 3:57 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
 
 
 
> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> for any object with such type:
>> 
>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is
> considered to be an accurate representation of the space available for buf
> (given that, after malloc or realloc, it's going to be temporarily
> inaccurate).  If the intent is that inaccurate size at such a time means
> undefined behavior, say so explicitly.
 
 Yes, good point. We need to define this clearly in the beginning.
 We need to explicitly say that
 
 the size of the FAM is defined by the latest “counted_by” value. And it is 
 undefined behavior if the size field is not defined when the FAM is 
 referenced.
 
 Is the above good enough?
 
 
> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small 
>> example,
>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>> if YES, replace the reference to "obj->buf" with a call to
>>.ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> 
> This seems plausible - but you should also consider the case of static
> initializers - remember the GNU extension for statically allocated objects
> with flexible array members (unless you're not allowing it with
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = &x.buf;
> 
> I'd expect that to be valid - and unless you say such a usage is invalid,
 
 At this moment, I think that this should be valid.
 
 I.e., the following:
 
 struct A
 {
 size_t size;
 char buf[] __attribute__((counted_by(size)));
 };
 
 static struct A x = {sizeof "hello", "hello"};
 
 Should be valid, and x.size represents the number of elements of x.buf.
 Both x.size and x.buf are initialized statically.
 
> you should avoid the replacement in such a static initializer context when
> the FAM reference is to an object with a constant address (if
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> expression; if it works fine as a constant-address lvalue, then the
> replacement would be OK).
 
 Then if such usage for the “counted_by” is valid, we need to replace the 
 FAM
 reference by a call to  .ACCESS_WITH_SIZE as well.
 Otherwise the “counted_by” relationship will be lost to the Middle end.
 
 With the current definition of .ACCESS_WITH_SIZE
 
 PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
 
 Isn’t the PTR (return value of the call) a LVALUE?
>>> 
>>> You probably want to specify that when a pointer to the array is taken the
>>> pointer has to be to the first array element (or do we want to mangle the
>>> 'size' accordingly for the instrumentation?).
>> 
>> Yes. Will add this into the user documentation.
>> 
>>> You also want to specify that
>>> the 'size' associated with such pointer is assumed to be unchanging and
>>> after changing the size such pointer has to be re-obtained.
>> 
>> What do you mean by “re-obtained”?
> 
> do
> 
> p = &container.array[0];
> 
> after any adjustments to 'array' or 'len' again and base further accesses on
> the new 'p'.


Then for the following example from Kees:

struct foo *f;
char *p;
int i;

f = alloc(maximum_possible);
f->count = 0;
p = f->buf;

for (i; data_is_available() && i < maximum_possible; i++) {
f->count ++;
p[i] = next_data_item();
}

Will not work?

We have to change it as:

struct foo *f;
char *p;
int i;

f = alloc(maximum_possible);
f->count = 0;
p = f->buf;

for (i; data_is_available() && i < maximum_possible; i++) {
 

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, ??? wrote:

> Hi, Richi.
> 
> >> Do we really need to have two modes for the optab though or could we
> >> simply require the target to support arbitrary offset modes (give it
> >> is implicitly constrained to ptr_mode for the base already)?  Or
> >> properly extend/truncate the offset at expansion time, say to ptr_mode
> >> or to the mode of sizetype.
> 
> For RVV, it's ok to set the stride type as ptr_mode/size_type by default.
> Is it ok that I define strided load/store as single mode optab and default 
> Pmode as stride operand?
> How about scale and signed/unsigned operand ?
> It seems scale operand can be removed ? Since we can pass DR_STEP directly to 
> the stride arguments.
> But I think we can't remove signed/unsigned operand since for strided mode = 
> SI mode, the unsigned
> maximum stride = 2^31 whereas signed is 2^30.

On the GIMPLE side I think we want to have a sizetype operand and
indeed drop 'scale', the sizetype operand should be readily available.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 21:52
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
>  
> > As previous Richard's suggested, we should support strided load/store in
> > loop vectorizer instead hacking RISC-V backend.
> > 
> > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > 
> > The GIMPLE IR is same as mask_len_gather_load/mask_len_scatter_store but 
> > with
> > changing vector offset into scalar stride.
>  
> I see that it follows gather/scatter.  I'll note that when introducing
> those we failed to add a specifier for TBAA and alignment info for the
> data access.  That means we have to use alias-set zero for the accesses
> (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> but TBAA info might have been the "easy" and obvious property to 
> preserve).  For alignment we either have to assume unaligned or reject
> vectorization of accesses that do not have their original scalar accesses
> naturally aligned (aligned according to their mode).  We don't seem
> to check that though.
>  
> It might be fine to go forward with this since gather/scatter are broken
> in a similar way.
>  
> Do we really need to have two modes for the optab though or could we
> simply require the target to support arbitrary offset modes (given it
> is implicitly constrained to ptr_mode for the base already)?  Or
> properly extend/truncate the offset at expansion time, say to ptr_mode
> or to the mode of sizetype.
>  
> Thanks,
> Richard.
> > We don't have strided_load/strided_store and 
> > mask_strided_load/mask_strided_store since
> > it's unlikely RVV will have such optabs and we can't add the patterns that 
> > we can't test them.
> >
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > * internal-fn.cc (internal_load_fn_p): Ditto.
> > (internal_strided_fn_p): Ditto.
> > (internal_fn_len_index): Ditto.
> > (internal_fn_mask_index): Ditto.
> > (internal_fn_stored_value_index): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > (MASK_LEN_STRIDED_STORE): Ditto.
> > * internal-fn.h (internal_strided_fn_p): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * optabs.def (OPTAB_CD): Ditto.
> > 
> > ---
> >  gcc/doc/md.texi | 51 +
> >  gcc/internal-fn.cc  | 44 ++
> >  gcc/internal-fn.def |  4 
> >  gcc/internal-fn.h   |  2 ++
> >  gcc/optabs.def  |  2 ++
> >  5 files changed, 103 insertions(+)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..5bac713a0dd 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >  
> > +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}@var{n}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > mode @var{n}.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 2 as step.
> > +For each element index i:
> > +
> > +@itemize @bullet
> > +@item
> > +extend the stride to address width, using zero
> > +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> > +@item
>

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, ??? wrote:

> Ok. So drop 'scale' and keep signed/unsigned argument, is that right?

I don't think we need signed/unsigned.  RTL expansion has the signedness
of the offset argument there and can just extend to the appropriate mode
to offset a pointer.

> And I wonder I should create the stride_type using size_type_node or 
> ptrdiff_type_node ?
> Which is preferable?

'sizetype' - that's the type we require to be used for 
the POINTER_PLUS_EXPR offset operand.
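
To make the intended semantics concrete, here is a scalar reference model
of a masked, length-limited strided load with the stride held in an
unsigned size type.  It is only a sketch under assumptions: the function
name, the use of 32-bit elements, the byte-based stride and the choice to
leave inactive lanes untouched are illustrative and not taken from the
patch.

```cpp
#include <cassert>   // only needed by callers that want to check results
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar model (an assumption, not GCC code) of a masked strided load.
// STRIDE is a byte distance in an unsigned size type, mirroring a
// sizetype operand.  MASK must have at least dest.size() elements.
inline void strided_load_model(std::vector<std::int32_t> &dest,
                               const unsigned char *base,
                               std::size_t stride,
                               const std::vector<bool> &mask,
                               std::size_t len) {
  for (std::size_t i = 0; i < dest.size() && i < len; ++i) {
    if (!mask[i])
      continue;  // inactive lanes are left untouched in this model
    // Element i lives at base + i * stride.
    std::memcpy(&dest[i], base + i * stride, sizeof(std::int32_t));
  }
}
```

With a stride of two elements, lane i reads source element 2 * i as long as
its mask bit is set and i is below the length.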


> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:27
> To: ???
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, ??? wrote:
>  
> > Hi, Richi.
> > 
> > >> Do we really need to have two modes for the optab though or could we
> > >> simply require the target to support arbitrary offset modes (give it
> > >> is implicitly constrained to ptr_mode for the base already)?  Or
> > >> properly extend/truncate the offset at expansion time, say to ptr_mode
> > >> or to the mode of sizetype.
> > 
> > For RVV, it's ok to set the stride type as ptr_mode/size_type by 
> > default.
> > Is it ok that I define strided load/store as single mode optab and default 
> > Pmode as stride operand?
> > How about scale and signed/unsigned operand ?
> > It seems scale operand can be removed ? Since we can pass DR_STEP directly 
> > to the stride arguments.
> > But I think we can't remove signed/unsigned operand since for strided mode 
> > = SI mode, the unsigned
> > maximum stride = 2^31 whereas signed is 2^30.
>  
> On the GIMPLE side I think we want to have a sizetype operand and
> indeed drop 'scale', the sizetype operand should be readily available.
>  
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-11-02 21:52
> > To: Juzhe-Zhong
> > CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> > Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> > mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> > On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > As Richard previously suggested, we should support strided load/store in
> > > the loop vectorizer instead of hacking the RISC-V backend.
> > > 
> > > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > > 
> > > The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
> > > but with the vector offset changed into a scalar stride.
> >  
> > I see that it follows gather/scatter.  I'll note that when introducing
> > those we failed to add a specifier for TBAA and alignment info for the
> > data access.  That means we have to use alias-set zero for the accesses
> > (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> > but TBAA info might have been the "easy" and obvious property to 
> > preserve).  For alignment we either have to assume unaligned or reject
> > vectorization of accesses that do not have their original scalar accesses
> > naturally aligned (aligned according to their mode).  We don't seem
> > to check that though.
> >  
> > It might be fine to go forward with this since gather/scatter are broken
> > in a similar way.
> >  
> > Do we really need to have two modes for the optab though or could we
> > simply require the target to support arbitrary offset modes (given it
> > is implicitly constrained to ptr_mode for the base already)?  Or
> > properly extend/truncate the offset at expansion time, say to ptr_mode
> > or to the mode of sizetype.
> >  
> > Thanks,
> > Richard.
> > > We don't have strided_load/strided_store and
> > > mask_strided_load/mask_strided_store since
> > > it's unlikely RVV will have such optabs and we can't add patterns
> > > that we can't test.
> > >
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > > * internal-fn.cc (internal_load_fn_p): Ditto.
> > > (internal_strided_fn_p): Ditto.
> > > (internal_fn_len_index): Ditto.
> > > (internal_fn_mask_index): Ditto.
> > > (internal_fn_stored_value_index): Ditto.
> > > (internal_strided_fn_supported_p): Ditto.
> > > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > > (MASK_LEN_STRIDED_STORE): Ditto.
> > > * internal-fn.h (internal_strided_fn_p): Ditto.
> > > (internal_strided_fn_supported_p): Ditto.
> > > * optabs.def (OPTAB_CD): Ditto.
> > > 
> > > ---
> > >  gcc/doc/md.texi | 51 +
> > >  gcc/internal-fn.cc  | 44 ++
> > >  gcc/internal-fn.def |  4 
> > >  gcc/internal-fn.h   |  2 ++
> > >  gcc/optabs.def  |  2 ++
> > >  5 files changed, 103 insertions(+)
> > > 
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index fab2513105a..5bac713a0dd 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} 
> > > of the result should
> >

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread 钟居哲
Hi, Richi.

>> Do we really need to have two modes for the optab though or could we
>> simply require the target to support arbitrary offset modes (given it
>> is implicitly constrained to ptr_mode for the base already)?  Or
>> properly extend/truncate the offset at expansion time, say to ptr_mode
>> or to the mode of sizetype.

For RVV, it's OK to set the stride type to ptr_mode/size_type by default.
Is it OK if I define strided load/store as a single-mode optab and default
to Pmode for the stride operand?
What about the scale and signed/unsigned operands?
It seems the scale operand can be removed, since we can pass DR_STEP directly to
the stride argument.
But I think we can't remove the signed/unsigned operand, since for stride
mode = SI mode the unsigned maximum stride is 2^32 - 1 whereas the signed
maximum is 2^31 - 1.
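
To make the concern concrete, here is a small sketch of the two ways a 32-bit stride could be widened to a 64-bit address width. The helper name and its `is_unsigned` flag are made up for this illustration and only stand in for the signed/unsigned operand being discussed; the narrowing cast relies on the usual two's-complement behaviour that GCC documents.

```c
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit (SImode-like) stride to 64 bits.
   IS_UNSIGNED stands in for the signed/unsigned operand under discussion.  */
int64_t
extend_stride (uint32_t raw, int is_unsigned)
{
  if (is_unsigned)
    return (int64_t) raw;           /* zero extension */
  /* Sign extension; converting an out-of-range value to int32_t is
     implementation-defined, and wraps on GCC.  */
  return (int64_t) (int32_t) raw;
}
```

The same bit pattern 0xFFFFFFFC is a stride of 4294967292 bytes when zero-extended and a stride of -4 bytes when sign-extended, which is why the two interpretations cannot be mixed up silently.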




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 21:52
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
Subject: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
 
> As Richard previously suggested, we should support strided load/store in
> the loop vectorizer instead of hacking the RISC-V backend.
> 
> This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> 
> The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
> but with the vector offset changed into a scalar stride.
 
I see that it follows gather/scatter.  I'll note that when introducing
those we failed to add a specifier for TBAA and alignment info for the
data access.  That means we have to use alias-set zero for the accesses
(I see existing targets use UNSPECs with some not elaborate MEM anyway,
but TBAA info might have been the "easy" and obvious property to 
preserve).  For alignment we either have to assume unaligned or reject
vectorization of accesses that do not have their original scalar accesses
naturally aligned (aligned according to their mode).  We don't seem
to check that though.
 
It might be fine to go forward with this since gather/scatter are broken
in a similar way.
 
Do we really need to have two modes for the optab though or could we
simply require the target to support arbitrary offset modes (given it
is implicitly constrained to ptr_mode for the base already)?  Or
properly extend/truncate the offset at expansion time, say to ptr_mode
or to the mode of sizetype.
 
Thanks,
Richard.
> We don't have strided_load/strided_store and
> mask_strided_load/mask_strided_store since
> it's unlikely RVV will have such optabs and we can't add patterns that we
> can't test.
>
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> * internal-fn.cc (internal_load_fn_p): Ditto.
> (internal_strided_fn_p): Ditto.
> (internal_fn_len_index): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> (MASK_LEN_STRIDED_STORE): Ditto.
> * internal-fn.h (internal_strided_fn_p): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 51 +
>  gcc/internal-fn.cc  | 44 ++
>  gcc/internal-fn.def |  4 
>  gcc/internal-fn.h   |  2 ++
>  gcc/optabs.def  |  2 ++
>  5 files changed, 103 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..5bac713a0dd 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of mode 
> @var{n}.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i:
> +
> +@itemize @bullet
> +@item
> +extend the stride to address width, using zero
> +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> +@item
> +multiply the extended stride by operand 4;
> +@item
> +add the result to the base; and
> +@item
> +load the value at that address (operand 1 + @var{i} * multiplied and 
> extended stride) into element @var{i} of operand 0.
> +@end itemize
> +
> +Similar to mask_len_load, the instruction loads at most (operand 6 + operand 
> 7) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} o

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread 钟居哲
Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
And I wonder whether I should create the stride_type using size_type_node or
ptrdiff_type_node?
Which is preferable?

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 22:27
To: 钟居哲
CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Thu, 2 Nov 2023, 钟居哲 wrote:
 
> Hi, Richi.
> 
> >> Do we really need to have two modes for the optab though or could we
> >> simply require the target to support arbitrary offset modes (given it
> >> is implicitly constrained to ptr_mode for the base already)?  Or
> >> properly extend/truncate the offset at expansion time, say to ptr_mode
> >> or to the mode of sizetype.
> 
> For RVV, it's OK to set the stride type to ptr_mode/size_type by default.
> Is it OK if I define strided load/store as a single-mode optab and default
> to Pmode for the stride operand?
> What about the scale and signed/unsigned operands?
> It seems the scale operand can be removed, since we can pass DR_STEP directly to
> the stride argument.
> But I think we can't remove the signed/unsigned operand, since for stride
> mode = SI mode the unsigned maximum stride is 2^32 - 1 whereas the signed
> maximum is 2^31 - 1.
 
On the GIMPLE side I think we want to have a sizetype operand and
indeed drop 'scale', the sizetype operand should be readily available.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 21:52
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
> Subject: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
>  
> > As Richard previously suggested, we should support strided load/store in
> > the loop vectorizer instead of hacking the RISC-V backend.
> > 
> > This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> > 
> > The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
> > but with the vector offset changed into a scalar stride.
>  
> I see that it follows gather/scatter.  I'll note that when introducing
> those we failed to add a specifier for TBAA and alignment info for the
> data access.  That means we have to use alias-set zero for the accesses
> (I see existing targets use UNSPECs with some not elaborate MEM anyway,
> but TBAA info might have been the "easy" and obvious property to 
> preserve).  For alignment we either have to assume unaligned or reject
> vectorization of accesses that do not have their original scalar accesses
> naturally aligned (aligned according to their mode).  We don't seem
> to check that though.
>  
> It might be fine to go forward with this since gather/scatter are broken
> in a similar way.
>  
> Do we really need to have two modes for the optab though or could we
> simply require the target to support arbitrary offset modes (given it
> is implicitly constrained to ptr_mode for the base already)?  Or
> properly extend/truncate the offset at expansion time, say to ptr_mode
> or to the mode of sizetype.
>  
> Thanks,
> Richard.
> > We don't have strided_load/strided_store and
> > mask_strided_load/mask_strided_store since
> > it's unlikely RVV will have such optabs and we can't add patterns that
> > we can't test.
> >
> > 
> > gcc/ChangeLog:
> > 
> > * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> > * internal-fn.cc (internal_load_fn_p): Ditto.
> > (internal_strided_fn_p): Ditto.
> > (internal_fn_len_index): Ditto.
> > (internal_fn_mask_index): Ditto.
> > (internal_fn_stored_value_index): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> > (MASK_LEN_STRIDED_STORE): Ditto.
> > * internal-fn.h (internal_strided_fn_p): Ditto.
> > (internal_strided_fn_supported_p): Ditto.
> > * optabs.def (OPTAB_CD): Ditto.
> > 
> > ---
> >  gcc/doc/md.texi | 51 +
> >  gcc/internal-fn.cc  | 44 ++
> >  gcc/internal-fn.def |  4 
> >  gcc/internal-fn.h   |  2 ++
> >  gcc/optabs.def  |  2 ++
> >  5 files changed, 103 insertions(+)
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..5bac713a0dd 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> >  
> > +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}@var{n}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> >

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Martin Uecker
On Thursday, 02.11.2023 at 13:50 +, Qing Zhao wrote:
> 
> > On Nov 2, 2023, at 3:57 AM, Richard Biener  
> > wrote:
> > 
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> > > 
> > > 
> > > 
> > > > On Oct 31, 2023, at 6:14 PM, Joseph Myers  
> > > > wrote:
> > > > 
> > > > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > > > 
> > > > > 2.3 A new semantic requirement in the user documentation of 
> > > > > "counted_by"
> > > > > 
> > > > > For the following structure including a FAM with a counted_by 
> > > > > attribute:
> > > > > 
> > > > > struct A
> > > > > {
> > > > >  size_t size;
> > > > >  char buf[] __attribute__((counted_by(size)));
> > > > > };
> > > > > 
> > > > > for any object with such type:
> > > > > 
> > > > > struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > > > sizeof(char));
> > > > > 
> > > > > The setting to the size field should be done before the first 
> > > > > reference
> > > > > to the FAM field.
> > > > > 
> > > > > Such requirement to the user will guarantee that the first reference 
> > > > > to
> > > > > the FAM knows the size of the FAM.
> > > > > 
> > > > > We need to add this additional requirement to the user document.
> > > > 
> > > > Make sure the manual is very specific about exactly when size is
> > > > considered to be an accurate representation of the space available for 
> > > > buf
> > > > (given that, after malloc or realloc, it's going to be temporarily
> > > > inaccurate).  If the intent is that inaccurate size at such a time means
> > > > undefined behavior, say so explicitly.
> > > 
> > > Yes, good point. We need to define this clearly in the beginning.
> > > > We need to explicitly say that
> > > 
> > > the size of the FAM is defined by the latest “counted_by” value. And it’s 
> > > an undefined behavior when the size field is not defined when the FAM is 
> > > referenced.
> > > 
> > > Is the above good enough?
> > > 
> > > 
> > > > 
> > > > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > > > 
> > > > > In C FE:
> > > > > 
> > > > > for every reference to a FAM, for example, "obj->buf" in the small 
> > > > > example,
> > > > > check whether the corresponding FIELD_DECL has a "counted_by" 
> > > > > attribute?
> > > > > if YES, replace the reference to "obj->buf" with a call to
> > > > > .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> > > > 
> > > > This seems plausible - but you should also consider the case of static
> > > > initializers - remember the GNU extension for statically allocated 
> > > > objects
> > > > with flexible array members (unless you're not allowing it with
> > > > counted_by).
> > > > 
> > > > static struct A x = { sizeof "hello", "hello" };
> > > > static char *y = &x.buf;
> > > > 
> > > > I'd expect that to be valid - and unless you say such a usage is 
> > > > invalid,
> > > 
> > > At this moment, I think that this should be valid.
> > > 
> > > > I.e., the following:
> > > 
> > > struct A
> > > {
> > > size_t size;
> > > char buf[] __attribute__((counted_by(size)));
> > > };
> > > 
> > > > static struct A x = {sizeof "hello", "hello"};
> > > 
> > > Should be valid, and x.size represents the number of elements of x.buf.
> > > Both x.size and x.buf are initialized statically.
> > > 
> > > > you should avoid the replacement in such a static initializer context 
> > > > when
> > > > the FAM reference is to an object with a constant address (if
> > > > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> > > > expression; if it works fine as a constant-address lvalue, then the
> > > > replacement would be OK).
> > > 
> > > Then if such usage for the “counted_by” is valid, we need to replace the 
> > > FAM
> > > reference by a call to  .ACCESS_WITH_SIZE as well.
> > > Otherwise the “counted_by” relationship will be lost to the Middle end.
> > > 
> > > With the current definition of .ACCESS_WITH_SIZE
> > > 
> > > PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> > > 
> > > Isn’t the PTR (return value of the call) a LVALUE?
> > 
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
> 
> Yes. Will add this into the user documentation.

This shouldn't be necessary. The object-size pass
can track pointer arithmetic if it comes after
inserting the .ACCESS_WITH_SIZE.

https://godbolt.org/z/fvc3aoPfd
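
As a rough illustration of the kind of tracking meant here (this is not a reproduction of the linked example, and it uses the size known from malloc rather than .ACCESS_WITH_SIZE, which does not exist yet), the sketch below asks the compiler how many bytes remain after some pointer arithmetic. The function name is made up, and the `__has_builtin` fallback is an assumption for compilers that lack the builtin.

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* Hypothetical helper: report how many bytes the compiler believes are
   left in an N-byte allocation after advancing SKIP bytes into it.
   Returns (size_t) -1 when the size is unknown to the compiler.  */
size_t
bytes_left_after (size_t n, size_t skip)
{
  char *p = malloc (n);
  if (p == NULL)
    return (size_t) -1;
  char *q = p + skip;   /* pointer arithmetic the pass has to follow */
#if __has_builtin (__builtin_dynamic_object_size)
  size_t left = __builtin_dynamic_object_size (q, 0);
#else
  size_t left = (size_t) -1;
#endif
  free (p);
  return left;
}
```

With optimization enabled the result is typically `n - skip`; without it the builtin may simply report the size as unknown.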

> 
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
> 
> What do you mean by “re-obtained”? 
> 
> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
> 
> Through an lvalue with the containing type?
> 
> Yes, will add this too. 

I do not understand this.  It shouldn't matter how
it is

[committed] c: Add missing conditions in Walloc-size to avoid ICEs [PR112347]

2023-11-02 Thread Uecker, Martin

I forgot to guard against some more cases.

Committed as obvious.


Martin


c: Add missing conditions in Walloc-size to avoid ICEs [PR112347]

Fix ICE because of forgotten checks for pointers to void
and incomplete arrays.

Committed as obvious.

PR c/112347

gcc/c:
* c-typeck.cc (convert_for_assignment): Add missing check.

gcc/testsuite:

* gcc.dg/Walloc-size-3.c: New test.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 16fadfb5468..bdd57aae3ff 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7367,6 +7367,7 @@ convert_for_assignment (location_t location, location_t 
expr_loc, tree type,
idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1;
  tree arg = CALL_EXPR_ARG (rhs, idx);
  if (TREE_CODE (arg) == INTEGER_CST
+ && !VOID_TYPE_P (ttl) && TYPE_SIZE_UNIT (ttl)
  && INTEGER_CST == TREE_CODE (TYPE_SIZE_UNIT (ttl))
  && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
 warning_at (location, OPT_Walloc_size, "allocation of "
diff --git a/gcc/testsuite/gcc.dg/Walloc-size-3.c 
b/gcc/testsuite/gcc.dg/Walloc-size-3.c
new file mode 100644
index 000..b95e04a8d99
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-size-3.c
@@ -0,0 +1,15 @@
+/* PR 112347 
+   { dg-do compile }
+   { dg-options "-Walloc-size" }
+ * */
+
+// Test that various types without size do not crash with -Walloc-size
+
+int * mallocx(unsigned long) __attribute__((malloc)) 
__attribute__((alloc_size(1)));
+void test_oom(void) { void *a_ = mallocx(1); }
+
+void parse_args(char (**child_args_ptr_ptr)[]) {
+  *child_args_ptr_ptr = __builtin_calloc(1, sizeof(char));
+}
+
+




[committed] d: Merge upstream dmd, druntime 643b1261bb, phobos 1c98326e7

2023-11-02 Thread Iain Buclaw
Hi,

This patch merges the D front-end and runtime library with upstream dmd
643b1261bb, and standard library with phobos 1c98326e7.

Synchronizing with the v2.106.0-beta.1 release.

This is being done a little earlier than usual because a lot of code is
being moved around internally upstream at the moment, to reduce both
the extern(C++) surface area and the cyclic dependencies between the D
modules that implement the compiler. Merging now keeps the diff
below the 400kb threshold enforced on the mailing list.

D front-end changes:

- Suggested preview switches now give gdc flags (PR109681).
- `new S[10]' is now lowered to `_d_newarrayT!S(10)'.

D runtime changes:

- Runtime compiler library functions `_d_newarrayU', `_d_newarrayT',
  `_d_newarrayiT' have been converted to templates.

Phobos changes:

- Add new `std.traits.Unshared' template.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 643b1261bb.
* d-attribs.cc (build_attributes): Update for new front-end interface.
* d-lang.cc (d_post_options): Likewise.
* decl.cc (layout_class_initializer): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 643b1261bb.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES_FREEBSD): Add
core/sys/freebsd/ifaddrs.d, core/sys/freebsd/net/if_dl.d,
core/sys/freebsd/sys/socket.d, core/sys/freebsd/sys/types.d.
(DRUNTIME_DSOURCES_LINUX): Add core/sys/linux/linux/if_arp.d,
core/sys/linux/linux/if_packet.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 1c98326e7.
---
 gcc/d/d-attribs.cc|   2 +-
 gcc/d/d-lang.cc   |   1 -
 gcc/d/decl.cc |   2 +-
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/aggregate.d | 184 +++---
 gcc/d/dmd/attrib.d|   6 +-
 gcc/d/dmd/cond.d  |   1 +
 gcc/d/dmd/constfold.d |  24 +-
 gcc/d/dmd/cparse.d|   1 +
 gcc/d/dmd/dcast.d |   3 +-
 gcc/d/dmd/dclass.d|   2 +-
 gcc/d/dmd/declaration.d   |  50 +-
 gcc/d/dmd/dinterpret.d|   3 +-
 gcc/d/dmd/dmangle.d   |   1 +
 gcc/d/dmd/doc.d   |   2 +-
 gcc/d/dmd/dstruct.d   |   2 +-
 gcc/d/dmd/dsymbol.d   |  74 ++-
 gcc/d/dmd/dsymbolsem.d|  11 +-
 gcc/d/dmd/dtemplate.d |  15 +-
 gcc/d/dmd/expression.d| 546 +-
 gcc/d/dmd/expression.h|  20 +-
 gcc/d/dmd/expressionsem.d | 511 +++-
 gcc/d/dmd/func.d  |   1 +
 gcc/d/dmd/globals.h   |   1 -
 gcc/d/dmd/gluelayer.d |   5 -
 gcc/d/dmd/initsem.d   |   1 +
 gcc/d/dmd/lexer.d |   1 -
 gcc/d/dmd/mtype.d |  25 +-
 gcc/d/dmd/mtype.h |   2 +-
 gcc/d/dmd/optimize.d  |   1 +
 gcc/d/dmd/parse.d |  22 +-
 gcc/d/dmd/semantic3.d |   7 +-
 gcc/d/dmd/statementsem.d  |   5 +-
 gcc/d/dmd/staticcond.d|   1 +
 gcc/d/dmd/templateparamsem.d  |   1 +
 gcc/d/dmd/traits.d|   1 +
 gcc/d/dmd/typesem.d   |   2 +
 gcc/d/dmd/typinf.d|  30 +-
 gcc/d/dmd/typinf.h|  22 +
 gcc/testsuite/gdc.test/compilable/dbitfield.d |  13 +
 .../gdc.test/compilable/deprecate14283.d  |   8 +-
 .../gdc.test/compilable/named_arguments.d |  18 +-
 gcc/testsuite/gdc.test/compilable/test20039.d |   2 +-
 .../gdc.test/fail_compilation/b23686.d|  42 ++
 .../gdc.test/fail_compilation/diag4596.d  |   4 +-
 .../gdc.test/fail_compilation/fail13116.d |   2 +-
 .../gdc.test/fail_compilation/fail24208.d |  20 +
 .../gdc.test/fail_compilation/fail24212.d |  30 +
 .../gdc.test/fail_compilation/fail24213.d |  17 +
 .../gdc.test/fail_compilation/ice23865.d  |  32 +
 .../gdc.test/fail_compilation/ice24188.d  |  14 +
 .../fail_compilation/ice24188_a/ice24188_c.d  |   0
 .../gdc.test/fail_compilation/test18480.d |   1 +
 .../gdc.test/fail_compilation/test24157.d |  28 +
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/Makefile.am |   7 +-
 libphobos/libdruntime/Makefile.in |  34 +-
 .../libdruntime/core/sys/linux/linux/if_arp.d | 136 +

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 2:50 PM Qing Zhao  wrote:
>
>
>
> > On Nov 2, 2023, at 3:57 AM, Richard Biener  
> > wrote:
> >
> > On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
> >>
> >>
> >>
> >>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> >>>
> >>> On Tue, 31 Oct 2023, Qing Zhao wrote:
> >>>
> >>>> 2.3 A new semantic requirement in the user documentation of "counted_by"
> >>>>
> >>>> For the following structure including a FAM with a counted_by attribute:
> >>>>
> >>>> struct A
> >>>> {
> >>>>  size_t size;
> >>>>  char buf[] __attribute__((counted_by(size)));
> >>>> };
> >>>>
> >>>> for any object with such type:
> >>>>
> >>>> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> >>>>
> >>>> The setting to the size field should be done before the first reference
> >>>> to the FAM field.
> >>>>
> >>>> Such requirement to the user will guarantee that the first reference to
> >>>> the FAM knows the size of the FAM.
> >>>>
> >>>> We need to add this additional requirement to the user document.
> >>>
> >>> Make sure the manual is very specific about exactly when size is
> >>> considered to be an accurate representation of the space available for buf
> >>> (given that, after malloc or realloc, it's going to be temporarily
> >>> inaccurate).  If the intent is that inaccurate size at such a time means
> >>> undefined behavior, say so explicitly.
> >>
> >> Yes, good point. We need to define this clearly in the beginning.
> >> We need to explicitly say that
> >>
> >> the size of the FAM is defined by the latest “counted_by” value. And it’s 
> >> an undefined behavior when the size field is not defined when the FAM is 
> >> referenced.
> >>
> >> Is the above good enough?
> >>
> >>
> >>>
> >>>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> >>>>
> >>>> In C FE:
> >>>>
> >>>> for every reference to a FAM, for example, "obj->buf" in the small
> >>>> example,
> >>>> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> >>>> if YES, replace the reference to "obj->buf" with a call to
> >>>> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
> >>>
> >>> This seems plausible - but you should also consider the case of static
> >>> initializers - remember the GNU extension for statically allocated objects
> >>> with flexible array members (unless you're not allowing it with
> >>> counted_by).
> >>>
> >>> static struct A x = { sizeof "hello", "hello" };
> >>> static char *y = &x.buf;
> >>>
> >>> I'd expect that to be valid - and unless you say such a usage is invalid,
> >>
> >> At this moment, I think that this should be valid.
> >>
> >> I.e., the following:
> >>
> >> struct A
> >> {
> >> size_t size;
> >> char buf[] __attribute__((counted_by(size)));
> >> };
> >>
> >> static struct A x = {sizeof "hello", "hello"};
> >>
> >> Should be valid, and x.size represents the number of elements of x.buf.
> >> Both x.size and x.buf are initialized statically.
> >>
> >>> you should avoid the replacement in such a static initializer context when
> >>> the FAM reference is to an object with a constant address (if
> >>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
> >>> expression; if it works fine as a constant-address lvalue, then the
> >>> replacement would be OK).
> >>
> >> Then if such usage for the “counted_by” is valid, we need to replace the 
> >> FAM
> >> reference by a call to  .ACCESS_WITH_SIZE as well.
> >> Otherwise the “counted_by” relationship will be lost to the Middle end.
> >>
> >> With the current definition of .ACCESS_WITH_SIZE
> >>
> >> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> >>
> >> Isn’t the PTR (return value of the call) a LVALUE?
> >
> > You probably want to specify that when a pointer to the array is taken the
> > pointer has to be to the first array element (or do we want to mangle the
> > 'size' accordingly for the instrumentation?).
>
> Yes. Will add this into the user documentation.
>
> >  You also want to specify that
> > the 'size' associated with such pointer is assumed to be unchanging and
> > after changing the size such pointer has to be re-obtained.
>
> What do you mean by “re-obtained”?

do

p = &container.array[0];

after any adjustments to 'array' or 'len' again and base further accesses on
the new 'p'.
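
A fuller sketch of that pattern, assuming the struct from the proposal: the counter is set before the FAM is first referenced, and after a resize the counter is updated through the containing type and the element pointer is derived again. The `COUNTED_BY` macro and both helper names are made up for this example; the macro expands to nothing on compilers that do not know the attribute.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper so the sketch also builds where counted_by is
   not available.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY(size);
};

/* Allocate an object with N elements; 'size' is set before 'buf' is
   referenced for the first time.  */
struct A *
make_a (size_t n)
{
  struct A *obj = malloc (sizeof (struct A) + n * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = n;
  memset (obj->buf, 0, n);
  return obj;
}

/* Resize to N elements.  The counter is updated through an lvalue of the
   containing type; callers must re-derive any pointer into 'buf'.  */
struct A *
grow_a (struct A *obj, size_t n)
{
  struct A *bigger = realloc (obj, sizeof (struct A) + n * sizeof (char));
  if (bigger == NULL)
    return NULL;
  size_t old = bigger->size;
  bigger->size = n;
  if (n > old)
    memset (bigger->buf + old, 0, n - old);
  return bigger;
}
```

A caller would do `a = grow_a (a, 8); p = &a->buf[0];` and base all further accesses on the new `p`.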

> >  Plus that
> > changes to the allocated object/size have to be performed through an
> > lvalue where the containing type and thus the 'counted_by' attribute is
> > visible.
>
> Through an lvalue with the containing type?
>
> Yes, will add this too.
>
>
> >  That is,
> >
> > size_t *s = &a.size;
> > *s = 1;
> >
> > is invoking undefined behavior,
>
> right.
>
> > likewise modifying 'buf' (makes it a bit
> > awkward since for example that wouldn't support using posix_memalign
> > for allocation, though aligned_alloc would be fine).
> Is there a small example for the undefined behavior for this?

a.len = len;
posix_memalign (&a.buf, 16, len);

we would probably have to somehow instrument this.

[committed] libstdc++: Fix warning during configure

2023-11-02 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The checks for snprintf give a -Wformat warning due to a missing
argument.
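
For context, a minimal sketch of a well-formed call of the kind the check now uses: every conversion in the format string has a matching argument, and a zero size makes snprintf only report the length it would need. The helper name is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: number of characters needed to print VALUE with
   "%i", not counting the terminating null.  With a size of 0 nothing is
   written, so a null buffer is allowed (C99).  */
int
formatted_length (int value)
{
  return snprintf (NULL, 0, "%i", value);
}
```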

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_C99): Fix snprintf checks.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 4 ++--
 libstdc++-v3/configure| 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index d8f0ba1c3e2..654b99e92d7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -997,7 +997,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
vscanf("%i", args);
vsnprintf(fmt, 0, "%i", args);
vsscanf(fmt, "%i", args);
-   snprintf(fmt, 0, "%i");
+   snprintf(fmt, 0, "%i", 1);
  }], [],
 [glibcxx_cv_c99_stdio_cxx98=yes], [glibcxx_cv_c99_stdio_cxx98=no])
 ])
@@ -1578,7 +1578,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
vscanf("%i", args);
vsnprintf(fmt, 0, "%i", args);
vsscanf(fmt, "%i", args);
-   snprintf(fmt, 0, "%i");
+   snprintf(fmt, 0, "%i", 1);
  }], [],
 [glibcxx_cv_c99_stdio_cxx11=yes], [glibcxx_cv_c99_stdio_cxx11=no])
 ])



Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-02 Thread Richard Biener
On Tue, 31 Oct 2023, Juzhe-Zhong wrote:

> As Richard previously suggested, we should support strided load/store in
> the loop vectorizer instead of hacking the RISC-V backend.
> 
> This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> 
> The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
> but with the vector offset changed into a scalar stride.

I see that it follows gather/scatter.  I'll note that when introducing
those we failed to add a specifier for TBAA and alignment info for the
data access.  That means we have to use alias-set zero for the accesses
(I see existing targets use UNSPECs with some not elaborate MEM anyway,
but TBAA info might have been the "easy" and obvious property to 
preserve).  For alignment we either have to assume unaligned or reject
vectorization of accesses that do not have their original scalar accesses
naturally aligned (aligned according to their mode).  We don't seem
to check that though.

It might be fine to go forward with this since gather/scatter are broken
in a similar way.

Do we really need to have two modes for the optab though or could we
simply require the target to support arbitrary offset modes (given it
is implicitly constrained to ptr_mode for the base already)?  Or
properly extend/truncate the offset at expansion time, say to ptr_mode
or to the mode of sizetype.

Thanks,
Richard.
 
> We don't have strided_load/strided_store and
> mask_strided_load/mask_strided_store since
> it's unlikely RVV will have such optabs and we can't add patterns that we
> can't test.
>
> 
> gcc/ChangeLog:
> 
>   * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
>   * internal-fn.cc (internal_load_fn_p): Ditto.
>   (internal_strided_fn_p): Ditto.
>   (internal_fn_len_index): Ditto.
>   (internal_fn_mask_index): Ditto.
>   (internal_fn_stored_value_index): Ditto.
>   (internal_strided_fn_supported_p): Ditto.
>   * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
>   (MASK_LEN_STRIDED_STORE): Ditto.
>   * internal-fn.h (internal_strided_fn_p): Ditto.
>   (internal_strided_fn_supported_p): Ditto.
>   * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 51 +
>  gcc/internal-fn.cc  | 44 ++
>  gcc/internal-fn.def |  4 
>  gcc/internal-fn.h   |  2 ++
>  gcc/optabs.def  |  2 ++
>  5 files changed, 103 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..5bac713a0dd 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of mode 
> @var{n}.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i:
> +
> +@itemize @bullet
> +@item
> +extend the stride to address width, using zero
> +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> +@item
> +multiply the extended stride by operand 4;
> +@item
> +add the result to the base; and
> +@item
> +load the value at that address (operand 1 + @var{i} * multiplied and 
> extended stride) into element @var{i} of operand 0.
> +@end itemize
> +
> +Similar to mask_len_load, the instruction loads at most (operand 6 + operand 
> 7) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}@var{n}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of mode 
> @var{n}.
> +Operand 2 is the vector of values that should b

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Qing Zhao


> On Nov 2, 2023, at 3:57 AM, Richard Biener  wrote:
> 
> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
 2.3 A new semantic requirement in the user documentation of "counted_by"
 
 For the following structure including a FAM with a counted_by attribute:
 
 struct A
 {
  size_t size;
  char buf[] __attribute__((counted_by(size)));
 };
 
 for any object with such type:
 
 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 
 The setting to the size field should be done before the first reference
 to the FAM field.
 
 Such requirement to the user will guarantee that the first reference to
 the FAM knows the size of the FAM.
 
 We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is
>>> considered to be an accurate representation of the space available for buf
>>> (given that, after malloc or realloc, it's going to be temporarily
>>> inaccurate).  If the intent is that inaccurate size at such a time means
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning.
>> We need to explicitly say that
>> 
>> the size of the FAM is defined by the latest “counted_by” value, and it is
>> undefined behavior if the size field is not defined when the FAM is
>> referenced.
>> 
>> Is the above good enough?
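
A minimal sketch of the ordering requirement discussed above: the size field is assigned before the first reference to the FAM. The COUNTED_BY macro and the helper names alloc_A and set_A are assumptions made for this illustration, and the attribute is only applied when the compiler reports support for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Use counted_by only where the compiler knows it (assumption: it is
   probed with __has_attribute); otherwise expand to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct A
{
  size_t size;
  char buf[] COUNTED_BY(size);
};

/* Allocate an object with room for SZ elements in buf.  */
static struct A *
alloc_A (size_t sz)
{
  struct A *obj = malloc (sizeof (struct A) + sz * sizeof (char));
  if (obj == NULL)
    return NULL;
  obj->size = sz;              /* Set the count first ...  */
  memset (obj->buf, 0, sz);    /* ... then reference the FAM.  */
  return obj;
}

/* Store C at index I, which must lie within the counted size.  */
static void
set_A (struct A *obj, size_t i, char c)
{
  assert (i < obj->size);
  obj->buf[i] = c;
}
```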
>> 
>> 
>>> 
 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
 
 In C FE:
 
 for every reference to a FAM, for example, "obj->buf" in the small example,
 check whether the corresponding FIELD_DECL has a "counted_by" attribute?
 if YES, replace the reference to "obj->buf" with a call to
 .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
>>> 
>>> This seems plausible - but you should also consider the case of static
>>> initializers - remember the GNU extension for statically allocated objects
>>> with flexible array members (unless you're not allowing it with
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = &x.buf;
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid,
>> 
>> At this moment, I think that this should be valid.
>> 
>> I.e., the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello"};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf.
>> Both x.size and x.buf are initialized statically.
>> 
>>> you should avoid the replacement in such a static initializer context when
>>> the FAM reference is to an object with a constant address (if
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
>>> expression; if it works fine as a constant-address lvalue, then the
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE?
> 
> You probably want to specify that when a pointer to the array is taken the
> pointer has to be to the first array element (or do we want to mangle the
> 'size' accordingly for the instrumentation?).

Yes. Will add this into the user documentation.

>  You also want to specify that
> the 'size' associated with such pointer is assumed to be unchanging and
> after changing the size such pointer has to be re-obtained.

What do you mean by “re-obtained”? 

>  Plus that
> changes to the allocated object/size have to be performed through an
> lvalue where the containing type and thus the 'counted_by' attribute is
> visible.

Through an lvalue with the containing type?

Yes, will add this too. 


>  That is,
> 
> size_t *s = &a.size;
> *s = 1;
> 
> is invoking undefined behavior,

right.

> likewise modifying 'buf' (makes it a bit
> awkward since for example that wouldn't support using posix_memalign
> for allocation, though aligned_alloc would be fine).
Is there a small example for the undefined behavior for this?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> --
>>> Joseph S. Myers
>>> jos...@codesourcery.com
>> 



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Robin Dapp
> Looks reasonable overall.  The new match patterns are 1:1 the
> same as the COND_ ones.  That's a bit awkward, but I don't see
> a good way to "macroize" stuff further there.  Can you at least
> interleave the COND_LEN_* ones with the other ones instead of
> putting them all at the end?

Yes, no problem.  It's supposed to be only temporary anyway (FWIW)
as I didn't manage with the "stripping _LEN" way on the first few tries.
Still on the todo list but unlikely to be done before stage 1 closes.

I believe Richard "kind of" LGTM'ed the rest minus the spurious
pattern (which is gone now) but there is still the direct optab change
that he didn't comment on so I think we should wait for his remarks
still.

Regards
 Robin



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-02 Thread Richard Biener
On Thu, 26 Oct 2023, Robin Dapp wrote:

> Ok, next try.  Now without dubious pattern and with direct optab
> but still dedicated expander function.
> 
> This will cause one riscv regression in cond_widen_reduc-2.c that
> we can deal with later.  It is just a missed optimization where
> we do not combine something that we used to because of the
> now-present length masking.
> 
> I'd also like to postpone handling vcond_mask_len simplifications
> via stripping the length and falling back to vec_cond and its fold
> patterns to a later time.  As is, this helps us avoid execution
> failures in at least five test cases.
> 
> Bootstrap et al. running on x86, aarch64 and power10.

Looks reasonable overall.  The new match patterns are 1:1 the
same as the COND_ ones.  That's a bit awkward, but I don't see
a good way to "macroize" stuff further there.  Can you at least
interleave the COND_LEN_* ones with the other ones instead of
putting them all at the end?

Thanks,
Richard.


> Regards
>  Robin
> 
> From 7acdebb5b13b71331621af08da6649fe08476fe8 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 25 Oct 2023 22:19:43 +0200
> Subject: [PATCH v3] internal-fn: Add VCOND_MASK_LEN.
> 
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.
> 
> It also adds new match patterns that allow the combination of
> unconditional unary, binary and ternary operations with the
> VCOND_MASK_LEN into a conditional operation if the target supports it.
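
A scalar sketch of why the length matters, assuming that lanes at or past the length take the else value. The function names are hypothetical and `len` stands for the length plus bias. Even with an all-true mask, a length-masked conditional add differs from a plain add in the lanes past the length, which is what a length-aware select preserves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length-masked select: lane i takes THEN_VAL only if it is below LEN
   and its mask bit is set; otherwise it takes ELSE_VAL.  */
static void
vcond_mask_len_model (int32_t *dst, const bool *mask,
                      const int32_t *then_val, const int32_t *else_val,
                      size_t len, size_t nunits)
{
  assert (len <= nunits);
  for (size_t i = 0; i < nunits; i++)
    dst[i] = (i < len && mask[i]) ? then_val[i] : else_val[i];
}

/* Length-masked conditional add, written directly.  It gives the same
   result as an unconditional add followed by the select above.  */
static void
cond_len_add_model (int32_t *dst, const bool *mask,
                    const int32_t *a, const int32_t *b,
                    const int32_t *else_val, size_t len, size_t nunits)
{
  assert (len <= nunits);
  for (size_t i = 0; i < nunits; i++)
    dst[i] = (i < len && mask[i]) ? a[i] + b[i] : else_val[i];
}
```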
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/111760
> 
>   * config/riscv/autovec.md (vcond_mask_len_): Add
>   expander.
>   * config/riscv/riscv-protos.h (enum insn_type): Add.
>   * doc/md.texi: Add vcond_mask_len.
>   * gimple-match-exports.cc (maybe_resimplify_conditional_op):
>   Create VCOND_MASK_LEN when
>   length masking.
>   * gimple-match.h (gimple_match_op::gimple_match_op): Allow
>   matching of 6 and 7 parameters.
>   (gimple_match_op::set_op): Ditto.
>   (gimple_match_op::gimple_match_op): Always initialize len and
>   bias.
>   * internal-fn.cc (vec_cond_mask_len_direct): Add.
>   (expand_vec_cond_mask_len_optab_fn): Add.
>   (direct_vec_cond_mask_len_optab_supported_p): Add.
>   (internal_fn_len_index): Add VCOND_MASK_LEN.
>   (internal_fn_mask_index): Ditto.
>   * internal-fn.def (VCOND_MASK_LEN): New internal function.
>   * match.pd: Combine unconditional unary, binary and ternary
>   operations into the respective COND_LEN operations.
>   * optabs.def (OPTAB_D): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md | 37 
>  gcc/config/riscv/riscv-protos.h |  5 +++
>  gcc/doc/md.texi |  9 
>  gcc/gimple-match-exports.cc | 13 --
>  gcc/gimple-match.h  | 78 -
>  gcc/internal-fn.cc  | 42 ++
>  gcc/internal-fn.def |  2 +
>  gcc/match.pd| 61 ++
>  gcc/optabs.def  |  1 +
>  9 files changed, 243 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 80910ba3cc2..dadb71c1165 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_"
>[(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_"
> +  [(match_operand:V_VLS 0 "register_operand")
> +(match_operand: 3 "nonmemory_operand")
> +(match_operand:V_VLS 1 "nonmemory_operand")
> +(match_operand:V_VLS 2 "autovec_else_operand")
> +(match_operand 4 "autovec_length_operand")
> +(match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +if (satisfies_constraint_Wc1 (operands[3]))
> +  {
> + rtx ops[] = {operands[0], operands[2], operands[1]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> +   riscv_vector::UNARY_OP_TUMA,
> +   ops, operands[4]);
> +  }
> +else if (satisfies_constraint_Wc0 (operands[3]))
> +  {
> + rtx ops[] = {operands[0], operands[2], operands[2]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> +   riscv_vector::UNARY_OP_TUMA,
> +   ops, operands[4]);
> +  }
> +else
> +  {
> + /* The order of vcond_mask is opposite to pred_merge.  */
> + rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
> +  operands[3]};
> + riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (mode),
> +   riscv_vector::MERGE_OP_TUMA,
> +   ops, operands[4

[committed] Improve H8 sequences for single bit sign extractions

2023-11-02 Thread Jeff Law
Spurred by Roger's recent work on ARC, this patch improves the code we
generate for single bit sign extractions.


The basic idea is to get the bit we want into C, then use a
subx;exts.w;exts.l sequence to sign extend it in a GPR.


For bits 0..15 we can use a bld instruction to get the bit we want into
C.  For bits 16..31, we can move the high word into the low word, then
use bld.  There are a couple of special cases where we can shift the bit
we want from the high word into C, which is one instruction shorter.
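
As a rough C model of the sequence described above (an illustration, not the generated code): the selected bit plays the role of the carry flag, subx of a register with itself yields 0 or -1 in the low byte, and the two extensions widen that to 32 bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a single-bit sign extraction: the result is 0 if the bit at
   POSITION is clear and -1 if it is set.  */
static int32_t
sign_extract_bit_model (uint32_t value, unsigned position)
{
  assert (position < 32);

  /* Get the bit we want into "C".  */
  unsigned carry = (value >> position) & 1u;

  /* subx r,r computes r - r - C, i.e. 0 or -1 in the low byte.  */
  int8_t low_byte = (int8_t) -(int) carry;

  /* exts.w, then exts.l: sign extend byte to word, word to long.  */
  int16_t word = low_byte;
  int32_t result = word;
  return result;
}
```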


Not surprisingly most cases seen in newlib and the test suite are 
extractions from the low byte, HImode sign bit and top two bits of SImode.


Regression tested on the H8 with no regressions.  Installing on the trunk.

Jeff

commit 0f9f3fc885a1f830ff09a095e8c14919c2796a9d
Author: Jeff Law 
Date:   Thu Nov 2 07:25:39 2023 -0600

[committed] Improve H8 sequences for single bit sign extractions

Spurred by Roger's recent work on ARC, this patch improves the code we
generate for single bit sign extractions.

The basic idea is to get the bit we want into C, then use a subx;exts.w;exts.l
sequence to sign extend it in a GPR.

For bits 0..15 we can use a bld instruction to get the bit we want into C.
For bits 16..31, we can move the high word into the low word, then use bld.
There are a couple of special cases where we can shift the bit we want from
the high word into C, which is one instruction shorter.

Not surprisingly most cases seen in newlib and the test suite are extractions
from the low byte, HImode sign bit and top two bits of SImode.

Regression tested on the H8 with no regressions.  Installing on the trunk.

gcc/
* config/h8300/combiner.md: Add new patterns for single bit
sign extractions.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index fd5cf2f4af4..2f7faf77c93 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1268,3 +1268,94 @@ (define_insn ""
 ;;   (label_ref (match_dup 1))
 ;;   (pc)))]
 ;;   "")
+
+;; Various ways to extract a single bit bitfield and sign extend it
+;;
+;; Testing showed this only triggering with SImode, probably because
+;; of how insv/extv are defined.
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
+(const_int 1)
+(match_operand 2 "immediate_operand")))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0)
+  (sign_extract:SI (match_dup 1) (const_int 1) (match_dup 2)))
+ (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
+(const_int 1)
+(match_operand 2 "immediate_operand")))
+   (clobber (reg:CC CC_REG))]
+  ""
+{
+  int position = INTVAL (operands[2]);
+
+  /* For bit position 31, 30, left shift the bit we want into C.  */
+  bool bit_in_c = false;
+  if (position == 31)
+{
+  output_asm_insn ("shll.l\t%0", operands);
+  bit_in_c = true;
+}
+  else if (position == 30 && TARGET_H8300S)
+{
+  output_asm_insn ("shll.l\t#2,%0", operands);
+  bit_in_c = true;
+}
+
+  /* Similar for positions 16, 17, but with a right shift into C.  */
+  else if (position == 16)
+{
+  output_asm_insn ("shlr.w\t%e0", operands);
+  bit_in_c = true;
+}
+  else if (position == 17 && TARGET_H8300S)
+{
+  output_asm_insn ("shlr.w\t#2,%e0", operands);
+  bit_in_c = true;
+}
+
+
+  /* For all the other cases in the upper 16 bits, move the upper 16
+ bits into the lower 16 bits, then use the standard sequence for
+ extracting one of the low 16 bits.  */
+  else if (position >= 16)
+{
+  output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+  /* We'll use the standard sequences for the low word now.  */
+  position %= 16;
+}
+
+  /* Same size/speed as the general sequence, but slightly faster
+ to simulate.  */
+  if (position == 0)
+return "and.l\t#1,%0\;neg.l\t%0";
+
+  rtx xoperands[3];
+  xoperands[0] = operands[0];
+  xoperands[1] = operands[1];
+  xoperands[2] = GEN_INT (position);
+
+  /* If the bit we want is not already in C, get it there  */
+  if (!bit_in_c)
+{
+  if (position >= 8)
+   {
+ xoperands[2] = GEN_INT (position % 8);
+ output_asm_insn ("bld\t%2,%t1", xoperands);
+   }
+  else
+   output_asm_insn ("bld\t%2,%s1", xoperands);
+}
+
+  /* Now the bit we want is in C, emit the generalized sequence
+ to get that bit into the destination, properly extended.  */
+  return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
+}
+  [(set_attr "length" "10")])


Re: [PATCH 01/12] [contrib] validate_failures.py: Avoid testsuite aliasing

2023-11-02 Thread Maxim Kuvyrkov
Patch proposed at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635000.html
--
Maxim Kuvyrkov
https://www.linaro.org

> On Sep 27, 2023, at 18:47, Maxim Kuvyrkov  wrote:
> 
> Hi Bernhard,
> 
> Thanks, I meant to fix this, but forgot.
> 
> The underlying problem here is that we want to detect which sub-testsuites 
> had failures.  Current regex doesn't match go's case because there is no 
> "..." at the end: "Running foo" vs "Running foo ..." .
> 
> My preferred way of fixing this is to make go's testsuite print out "..." .  
> We have a similar patch for glibc [1].
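
To illustrate the mismatch described above, here is a small C rendering of the matching rule as stated in this thread: a "Running" line only counts when it ends in " ...". The actual script is Python and its regex is not shown here, so the function name and the exact rule are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true and copy the testsuite name into NAME if LINE has the
   form "Running <name> ...".  Hypothetical helper for illustration.  */
static bool
parse_running_line (const char *line, char *name, size_t name_size)
{
  static const char prefix[] = "Running ";
  static const char suffix[] = " ...";

  assert (line != NULL && name != NULL && name_size > 0);

  size_t prefix_len = sizeof prefix - 1;
  size_t suffix_len = sizeof suffix - 1;
  size_t line_len = strlen (line);

  if (line_len <= prefix_len + suffix_len)
    return false;
  if (strncmp (line, prefix, prefix_len) != 0)
    return false;
  if (strcmp (line + line_len - suffix_len, suffix) != 0)
    return false;

  size_t n = line_len - prefix_len - suffix_len;
  if (n >= name_size)
    return false;
  memcpy (name, line + prefix_len, n);
  name[n] = '\0';
  return true;
}
```

Under this rule "Running gcc/foo.exp ..." yields the name gcc/foo.exp, while a gotools line such as "Running cmd/go" is not recognized at all.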
> 
> [1] https://sourceware.org/pipermail/libc-alpha/2023-June/148702.html
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On Sep 26, 2023, at 19:46, Bernhard Reutner-Fischer  
>> wrote:
>> 
>> Hi Maxim!
>> 
>> On Mon, 5 Jun 2023 18:06:25 +0400
>> Maxim Kuvyrkov via Gcc-patches  wrote:
>> 
 On Jun 3, 2023, at 19:17, Jeff Law  wrote:
 
 On 6/2/23 09:20, Maxim Kuvyrkov via Gcc-patches wrote:  
> This patch adds tracking of current testsuite "tool" and "exp"
> to the processing of .sum files.  This avoids aliasing between
> tests from different testsuites with same name+description.
> E.g., this is necessary for testsuite/c-c++-common, which is run
> for both gcc and g++ "tools".
> This patch changes manifest format from ...
> 
> FAIL: gcc_test
> FAIL: g++_test
> 
> ... to ...
> 
> === gcc tests ===
> Running gcc/foo.exp ...
> FAIL: gcc_test
> === gcc Summary ==
> === g++ tests ===
> Running g++/bar.exp ...
> FAIL: g++_test
> === g++ Summary ==
> .
> The new format uses same formatting as DejaGnu's .sum files
> to specify which "tool" and "exp" the test belongs to.  
 I think the series is fine.  You're not likely to hear from Diego or Doug 
 I suspect, I don't think either are involved in GNU stuff anymore.
 
>>> 
>>> Thanks, Jeff.  I'll wait for a couple of days and will merge if there are 
>>> no new comments.
>> 
>> Maxim, may I ask you to have a look at the following problem, please?
>> 
>> ISTM that your exp code does not work as expected for go, maybe you
>> forgot to test the changes with go enabled?
>> 
>> Ever since your changes in summer I see the following:
>> 
>> gcc-14.mine$ 
>> /scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py 
>> --clean_build ../gcc-14.orig/
>> Getting actual results from build directory .
>> ./gcc/testsuite/go/go.sum
>> ./gcc/testsuite/gcc/gcc.sum
>> ./gcc/testsuite/objc/objc.sum
>> ./gcc/testsuite/jit/jit.sum
>> ./gcc/testsuite/gdc/gdc.sum
>> ./gcc/testsuite/gnat/gnat.sum
>> ./gcc/testsuite/ada/acats/acats.sum
>> ./gcc/testsuite/g++/g++.sum
>> ./gcc/testsuite/obj-c++/obj-c++.sum
>> ./gcc/testsuite/rust/rust.sum
>> ./gcc/testsuite/gfortran/gfortran.sum
>> ./x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
>> ./x86_64-pc-linux-gnu/libphobos/testsuite/libphobos.sum
>> ./x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
>> ./x86_64-pc-linux-gnu/libffi/testsuite/libffi.sum
>> ./x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
>> ./x86_64-pc-linux-gnu/libgo/libgo.sum
>> ./x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
>> ./gotools/gotools.sum
>> .sum file seems to be broken: tool="gotools", exp="None", 
>> summary_line="FAIL: TestScript"
>> Traceback (most recent call last):
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 732, in 
>>   retval = Main(sys.argv)
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 721, in Main
>>   retval = CompareBuilds()
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 622, in CompareBuilds
>>   actual = GetResults(sum_files)
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 466, in GetResults
>>   build_results.update(ParseSummary(sum_fname))
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 405, in ParseSummary
>>   result = result_set.MakeTestResult(line, ordinal)
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 239, in MakeTestResult
>>   return TestResult(summary_line, ordinal,
>> File 
>> "/scratch/src/gcc-14.mine/contrib/testsuite-management/validate_failures.py",
>>  line 151, in __init__
>>   raise
>> RuntimeError: No active exception to reraise
>> 
>> 
>> The problem seems to be that gotools.sum does not mention any ".exp"
>> files.
>> 
>> $ grep "Running " gotools/gotools.sum 
>> Running cmd/go
>> Running runtime
>> Running cgo
>> Running carchive
>> Running cmd/vet
>> Running embed
>> $ grep -c "\.exp" gotools/gotools.sum 
>> 0
>> 
>> The .sum file looks like this:
>> ---8<---
>> Test Run By foo on Tue Sep 26 14:46:48 CEST 2023
>> Native configuration is x86_64-foo-linux-gnu
>> 
>>   === gotools tests 

[PATCH 1/4] c/c++: rework pragma parsing

2023-11-02 Thread David Malcolm
This patch reworks pragma parsing in c-pragma.cc, with the
following improvements:

- it replaces the GCC_BAD* macros (that contained "return") in favor
of helper classes and functions for emitting diagnostics, making control
flow more explicit

- the -Wpragmas diagnostics are reworded from the form e.g.:
  DESCRIPTION OF PROBLEM; ignored
to:
  ignoring malformed '#pragma FOO': DESCRIPTION OF PROBLEM

- the locations of the warnings are fixed to more accurately
reflect the location of the problem

- the names of the pragmas are URLified into links to the
documentation for the pragma.  For example, in:

  warning: ignoring malformed '#pragma weak': expected name [-Wpragmas]

in a suitable terminal, the "#pragma weak" within quotes is a link
to https://gcc.gnu.org/onlinedocs/gcc/Weak-Pragmas.html; similarly with

  warning: '#pragma pack' has no effect with '-fpack-struct' - ignored [-Wpragmas]

the "#pragma pack" text is linkified to
  https://gcc.gnu.org/onlinedocs/gcc/Structure-Layout-Pragmas.html
and the "-fpack-struct" text is linkified to:
  https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fpack-struct

I have a more general and maintainable approach to adding URLs to
diagnostics which is in a followup.

gcc/c-family/ChangeLog:
* c-pragma.cc (GCC_BAD): Delete.
(GCC_BAD2): Delete.
(GCC_BAD_AT): Delete.
(GCC_BAD2_AT): Delete.
(get_doc_url): New.
(class pragma_parser): New.
(handle_pragma_pack): Delete redundant forward decl.
(pop_alignment): Add param "p" and use it to get doc urls.
(enum class pack_action): Move here from within
handle_pragma_pack.
(class pragma_pack_parser): New.
(handle_pragma_pack): Rewrite using pragma_pack_parser
and enum class pack_action, eliminating uses of GCC_BAD*,
rewording diagnostics.
(handle_pragma_weak): Rewrite using pragma_parser, eliminating
uses of GCC_BAD*, rewording diagnostics.
(class pragma_scalar_storage_order_parser): New.
(handle_pragma_scalar_storage_order): Rewrite using above,
eliminating uses of GCC_BAD*, rewording diagnostics.
(handle_pragma_redefine_extname): Rewrite using pragma_parser,
eliminating uses of GCC_BAD*, rewording diagnostics.  Fix overlong
line.
(handle_pragma_visibility): Remove redundant forward decl.
(push_visibility): Add "const pragma_parser *" param.  Rewrite to
eliminate uses of GCC_BAD*.  Add note that warning was ignored.
(handle_pragma_visibility): Rewrite using pragma_parser,
eliminating uses of GCC_BAD*, rewording diagnostics.
(handle_pragma_target): Fix name of pragma in "error".  Eliminate
uses of GCC_BAD*.
(handle_pragma_optimize): Eliminate uses of GCC_BAD.
(handle_pragma_message): Rewrite using pragma_parser, eliminating
uses of GCC_BAD*, rewording diagnostics.
* c-pragma.h (class pragma_parser): New forward decl.
(push_visibility): Add optional "const pragma_parser *" param.

gcc/testsuite/ChangeLog:
* c-c++-common/pragma-message-parsing.c: New test.
* c-c++-common/pragma-optimize-parsing.c: New test.
* c-c++-common/pragma-pack-parsing-1.c: New test.
* c-c++-common/pragma-pack-parsing-2.c: New test.
* c-c++-common/pragma-redefine_extname-parsing.c: New test.
* c-c++-common/pragma-target-parsing.c: New test.
* c-c++-common/pragma-visibility-parsing.c: New test.
* c-c++-common/pragma-weak-parsing.c: New test.
* gcc.dg/bad-pragma-locations.c: Update for changes to wording and
location of -Wpragmas.
* gcc.dg/pragma-scalar_storate_order-parsing.c: New test.
* gcc.dg/sso-6.c: Update for changes to wording of -Wpragmas.
---
 gcc/c-family/c-pragma.cc  | 569 ++
 gcc/c-family/c-pragma.h   |   5 +-
 .../c-c++-common/pragma-message-parsing.c |  21 +
 .../c-c++-common/pragma-optimize-parsing.c|  16 +
 .../c-c++-common/pragma-pack-parsing-1.c  |  19 +
 .../c-c++-common/pragma-pack-parsing-2.c  |   4 +
 .../pragma-redefine_extname-parsing.c |   9 +
 .../c-c++-common/pragma-target-parsing.c  |  14 +
 .../c-c++-common/pragma-visibility-parsing.c  |  13 +
 .../c-c++-common/pragma-weak-parsing.c|  24 +
 gcc/testsuite/gcc.dg/bad-pragma-locations.c   |  22 +-
 .../pragma-scalar_storate_order-parsing.c |   8 +
 gcc/testsuite/gcc.dg/sso-6.c  |   2 +-
 13 files changed, 588 insertions(+), 138 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pragma-message-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-optimize-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-pack-parsing-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-pack-parsing-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-redefine_extname-parsi

[PATCH 2/4] c: add #pragma GCC show_layout

2023-11-02 Thread David Malcolm
This patch adds a new pragma to the C frontend that will
make it emit a human-readable diagram of a struct's layout.

For example, given this contrived usage:

struct example {
  char foo : 7;
  char bar;
  char visible : 1;
  char active  : 1;
};

the compiler will emit output similar to the following:

note: 'sizeof(struct example)' == 3; layout:

  
  ┌───────┬────┬───────────────┬─────────────────────┬─────────────────────────┬───────────────────────┐
  │Offsets│Byte│       0       │          1          │            2            │           3           │
  ├───────┼────┼─┬─┬─┬─┬─┬─┬─┬─┼─┬─┬──┬──┬──┬──┬──┬──┼───┬───┬──┬──┬──┬──┬──┬──┼──┬──┬──┬──┬──┬──┬──┬──┤
  │ Byte  │Bit │0│1│2│3│4│5│6│7│8│9│10│11│12│13│14│15│16 │17 │18│19│20│21│22│23│24│25│26│27│28│29│30│31│
  ├───────┼────┼─┴─┴─┴─┴─┴─┴─┼─┼─┴─┴──┴──┴──┴──┴──┴──┼───┼───┼──┴──┴──┴──┴──┴──┼──┴──┴──┴──┴──┴──┴──┴──┘
  │   0   │ 0  │    'foo'    │*│        'bar'        │(1)│(2)│     padding     │
  └───────┴────┴─────────────┴─┴─────────────────────┴───┴───┴─────────────────┘
  *: padding
  (1): 'visible'
  (2): 'active'
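
The byte-level part of this layout can be cross-checked without the pragma. The sketch below assumes the x86_64 GCC ABI that the example output was taken from; on other targets the size and offsets may differ, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct example {
  char foo : 7;
  char bar;
  char visible : 1;
  char active  : 1;
};

/* Check the facts shown in the diagram: three bytes in total, with
   'bar' occupying byte 1.  Bit-field positions cannot be queried with
   offsetof, so they are not checked here.  */
static void
check_example_layout (void)
{
  assert (sizeof (struct example) == 3);
  assert (offsetof (struct example, bar) == 1);
}
```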

The output is intended for humans, rather than scripts, and is
subject to change.

One wart is that it uses some analyzer internals, and thus requires
GCC to have been configured without disabling the analyzer.

Caveat: only tested on x86_64, and probably has some endianness and
packing assumptions in the testcases.

Thoughts?

gcc/analyzer/ChangeLog:
* record-layout.cc: Define INCLUDE_ALGORITHM and
INCLUDE_VECTOR.  Include "intl.h", "text-art/table.h",
"text-art/widget.h", and "diagnostic-diagram.h".
(class layout_diagram): New.
(layout_diagram::layout_diagram): New.
(layout_diagram::bit_to_table_coord): New.
(layout_diagram::ensure_table_rows): New.
(layout_diagram::get_string_for_item): New.
(impl_show_record_layout): New.
(show_record_layout): New.
* record-layout.h (class layout_diagram): New forward decl.
(class record_layout): Add friend class layout_diagram.

gcc/c-family/ChangeLog:
* c-pragma.cc: Include "stor-layout.h".
(class pragma_parser_show_layout): New.
(handle_pragma_show_layout): New.
(init_pragma): Register it.

gcc/ChangeLog:
* doc/extend.texi (Other Pragmas): New subsection,
with '#pragma GCC show_layout'.
* stor-layout.h (show_record_layout): New decl.

gcc/testsuite/ChangeLog:
* gcc.dg/parsing-pragma-show_layout.c: New test.
* gcc.dg/pragma-show_layout-1.c: New test.
* gcc.dg/pragma-show_layout-2.c: New test.
* gcc.dg/pragma-show_layout-infoleak-CVE-2017-18550.c: New test.
---
 gcc/analyzer/record-layout.cc | 235 ++
 gcc/analyzer/record-layout.h  |   4 +
 gcc/c-family/c-pragma.cc  |  95 +++
 gcc/doc/extend.texi   |  49 
 gcc/stor-layout.h |   3 +
 .../gcc.dg/parsing-pragma-show_layout.c   |  15 ++
 gcc/testsuite/gcc.dg/pragma-show_layout-1.c   |  12 +
 gcc/testsuite/gcc.dg/pragma-show_layout-2.c   | 184 ++
 ...agma-show_layout-infoleak-CVE-2017-18550.c | 175 +
 9 files changed, 772 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/parsing-pragma-show_layout.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-show_layout-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-show_layout-2.c
 create mode 100644 
gcc/testsuite/gcc.dg/pragma-show_layout-infoleak-CVE-2017-18550.c

diff --git a/gcc/analyzer/record-layout.cc b/gcc/analyzer/record-layout.cc
index 1369bfb5eff..242a9895309 100644
--- a/gcc/analyzer/record-layout.cc
+++ b/gcc/analyzer/record-layout.cc
@@ -19,7 +19,9 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "tree.h"
@@ -28,8 +30,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "diagnostic.h"
 #include "tree-diagnostic.h"
+#include "intl.h"
+#include "make-unique.h"
 #include "analyzer/analyzer.h"
 #include "analyzer/record-layout.h"
+#include "text-art/table.h"
+#include "text-art/widget.h"
+#include "diagnostic-diagram.h"
 
 #if ENABLE_ANALYZER
 
@@ -120,6 +127,234 @@ record_layout::maybe_pad_to (bit_offset_t next_offset)
 }
 }
 
+class layout_diagram : public text_art::vbox_widget
+{
+public:
+  layout_diagram (const ana::record_layout &layout,
+ text_art::style_manager &sm,
+ const text_art::theme &theme);
+
+private:
+  text_art::table::coord_t bit_to_table_coord (ana::bit_offset_t bit);
+
+  void ensure_table_rows (text_art::style_manager &sm,
+ text_art::table &table,
+ int table_y);
+
+  text_art::styled_string
+  get_string_for_item (const ana::rec

[PATCH 3/4] diagnostics: add automatic URL-ification within messages

2023-11-02 Thread David Malcolm
In r10-3781-gd26082357676a3 GCC's pretty-print framework gained
the ability to embed URLs via escape sequences for marking up text
output.

In r10-3783-gb4c7ca2ef3915a GCC started using this for the
[-Wname-of-option] emitted at the end of each diagnostic so that it
becomes a hyperlink to the documentation for that option on the GCC
website.

This makes it much more convenient for the user to locate pertinent
documentation when a diagnostic is emitted.

The above involved special-casing in one specific place, but there is
plenty of quoted text throughout GCC's diagnostic messages that could
usefully have a documentation URL: references to options, pragmas, etc.

This patch adds a new optional "urlifier" parameter to pp_format.
The idea is that a urlifier object has responsibility for mapping from
quoted strings in diagnostic messages to URLs, and pp_format has the
ability to automatically add URL escapes for strings that the urlifier
gives it URLs for.

For example, given the format string:

  "%<#pragma pack%> has no effect with %<-fpack-struct%>"

with this patch GCC is able to automatically linkify the "#pragma pack"
text to
  https://gcc.gnu.org/onlinedocs/gcc/Structure-Layout-Pragmas.html
and the "-fpack-struct" text to:
  https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fpack-struct

and we don't have to modify the format string itself.
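
As a rough illustration of the first half of that (finding the quoted runs
that a urlifier would be asked about), here is a minimal standalone sketch.
It is not code from the patch: quoted_runs is a hypothetical helper, it uses
std::string rather than pp_format's obstack machinery, and it ignores "%%"
escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the patch: collect the text of every
// %<...%> run in a diagnostic format string, in order of appearance.
// Assumption: "%%" escapes and nested quotes are not handled.
std::vector<std::string> quoted_runs (const std::string &fmt)
{
  std::vector<std::string> runs;
  std::size_t pos = 0;
  while ((pos = fmt.find ("%<", pos)) != std::string::npos)
    {
      std::size_t start = pos + 2;
      std::size_t end = fmt.find ("%>", start);
      if (end == std::string::npos)
	break;  // Unterminated quote: stop scanning.
      runs.push_back (fmt.substr (start, end - start));
      pos = end + 2;
    }
  return runs;
}
```

For the format string above this yields "#pragma pack" and "-fpack-struct",
which are the two strings that get mapped to documentation URLs.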

This is only done for the pp_format within diagnostic_report_diagnostic
i.e. just for the primary message in each diagnostic, and not for other
places within GCC that use pp_format internally.

"urlifier" is an abstract base class, with a GCC-specific subclass
implementing the logic for generating URLs into GCC's HTML
documentation via binary search in a data table.  This patch implements
the gcc_urlifier with a small table generated by hand; the data table in
this patch only covers enough pragmas and options to allow undoing some
of the hardcoding from the previous pragma-parsing patch.
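
To make the shape of this concrete, here is a simplified sketch of an
abstract base class plus a table-driven subclass using binary search.  The
class names, the doc_entry struct, the two-entry table and the use of
std::string are all assumptions for illustration; the real declarations live
in pretty-print-urlifier.h and gcc-urlifier.cc and may differ.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical interface: map quoted text to a URL, or "" if none is known.
class urlifier_sketch
{
public:
  virtual ~urlifier_sketch () {}
  virtual std::string get_url_for (const std::string &quoted) const = 0;
};

// Illustrative table entry; the field names are assumptions.
struct doc_entry
{
  const char *quoted_text;
  const char *url_suffix;
};

// Must stay sorted by quoted_text for the binary search below.
static const doc_entry doc_table[] = {
  { "#pragma pack", "gcc/Structure-Layout-Pragmas.html" },
  { "-fpack-struct", "gcc/Code-Gen-Options.html#index-fpack-struct" },
};

class table_urlifier_sketch : public urlifier_sketch
{
public:
  std::string get_url_for (const std::string &quoted) const override
  {
    const doc_entry *end = std::end (doc_table);
    // Binary search for the first entry not less than the key.
    const doc_entry *it
      = std::lower_bound (std::begin (doc_table), end, quoted,
			  [] (const doc_entry &e, const std::string &key)
			  { return e.quoted_text < key; });
    if (it == end || quoted != it->quoted_text)
      return "";
    return std::string ("https://gcc.gnu.org/onlinedocs/") + it->url_suffix;
  }
};
```

Keeping the table sorted is what makes a scripted generator attractive: the
lookup stays O(log n) however many options and pragmas end up listed.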

I have a followup patch that scripts the creation of this data by
directly scraping the output of "make html", thus automating all this,
and (I hope) minimizing the work of ensuring that documentation URLs
emitted by GCC match the generated documentation.

gcc/ChangeLog:
* Makefile.in (GCC_OBJS): Add gcc-urlifier.o.
(OBJS): Likewise.

gcc/c-family/ChangeLog:
* c-pragma.cc: Eliminate uses of %{ and %} and get_doc_url
in all places where it's just the name of the pragma (or of an
option).
(handle_pragma_push_options): Fix missing "GCC" in name of pragma
in "junk" message.
(handle_pragma_pop_options): Likewise.

gcc/ChangeLog:
* diagnostic.cc: Include "pretty-print-urlifier.h".
(diagnostic_initialize): Initialize m_urlifier.
(diagnostic_finish): Clean up m_urlifier.
(diagnostic_report_diagnostic): Pass context->m_urlifier to
pp_format.
* diagnostic.h (diagnostic_context::m_urlifier): New field.
* gcc-urlifier.cc: New file.
* gcc-urlifier.def: New file.
* gcc-urlifier.h: New file.
* gcc.cc: Include "gcc-urlifier.h".
(driver::global_initializations): Initialize global_dc->m_urlifier.
* pretty-print-urlifier.h: New file.
* pretty-print.cc: Include "pretty-print-urlifier.h".
(obstack_append_string): New.
(urlify_quoted_string): New.
(pp_format): Add "urlifier" param and use it to implement optional
urlification of quoted text strings.
(pp_output_formatted_text): Make buffer a const pointer.
(selftest::pp_printf_with_urlifier): New.
(selftest::test_urlification): New.
(selftest::pretty_print_cc_tests): Call it.
* pretty-print.h (class urlifier): New forward declaration.
(pp_format): Add optional urlifier param.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::gcc_urlifier_cc_tests .
* selftest.h (selftest::gcc_urlifier_cc_tests): New decl.
* toplev.cc: Include "gcc-urlifier.h".
(general_init): Initialize global_dc->m_urlifier.
---
 gcc/Makefile.in |   3 +-
 gcc/c-family/c-pragma.cc|  73 ---
 gcc/diagnostic.cc   |   8 +-
 gcc/diagnostic.h|   4 +
 gcc/gcc-urlifier.cc | 159 +++
 gcc/gcc-urlifier.def|  20 +++
 gcc/gcc-urlifier.h  |  26 
 gcc/gcc.cc  |   2 +
 gcc/pretty-print-urlifier.h |  33 +
 gcc/pretty-print.cc | 242 +++-
 gcc/pretty-print.h  |   5 +-
 gcc/selftest-run-tests.cc   |   1 +
 gcc/selftest.h  |   1 +
 gcc/toplev.cc   |   2 +
 14 files changed, 520 insertions(+), 59 deletions(-)
 create mode 100644 gcc/gcc-urlifier.cc
 create mode 100644 gcc/gcc-urlifier.def
 create mode 100644 gcc/gcc-urlifier.h
 create mode 100644 gcc/pretty-print-urlifier.h

diff --git a/gcc/Makefile.in b/gcc/Ma

[PATCH/RFC 0/4] C/C++/diagnostics: various UX improvements

2023-11-02 Thread David Malcolm
The following patch kit implements the:
  #pragma GCC show_layout (struct foo)
idea I mentioned in my Cauldron talk (in patch 2), and the other
patches implement various related user experience changes I came
across when implementing it.

Patch 1 reworks how c-pragma.cc parses pragmas, and experiments with
adding links to documentation to the diagnostics messages (on a
suitably capable terminal).

Patch 2 implements the new "show_layout" pragma

Patch 3 adds a new mechanism to the diagnostics subsystem for
automatically adding documentation links to messages, with enough
data to handle the pragmas from patch 1.

Patch 4 attempts to automatically populate the URL data for our docs by
parsing the results of "make html".

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

I'd like to go ahead with patch 1 and patch 3; patch 2 and patch 4 may
need more work, but posting here for feedback.

Thoughts?

David Malcolm (4):
  c/c++: rework pragma parsing
  c: add #pragma GCC show_layout
  diagnostics: add automatic URL-ification within messages
  RFC: add contrib/regenerate-index-urls.py

 contrib/regenerate-index-urls.py  |  245 ++
 gcc/Makefile.in   |3 +-
 gcc/analyzer/record-layout.cc |  235 ++
 gcc/analyzer/record-layout.h  |4 +
 gcc/c-family/c-pragma.cc  |  641 -
 gcc/c-family/c-pragma.h   |5 +-
 gcc/diagnostic.cc |8 +-
 gcc/diagnostic.h  |4 +
 gcc/doc/extend.texi   |   49 +
 gcc/gcc-urlifier.cc   |  159 ++
 gcc/gcc-urlifier.def  | 2532 +
 gcc/gcc-urlifier.h|   26 +
 gcc/gcc.cc|2 +
 gcc/pretty-print-urlifier.h   |   33 +
 gcc/pretty-print.cc   |  242 +-
 gcc/pretty-print.h|5 +-
 gcc/selftest-run-tests.cc |1 +
 gcc/selftest.h|1 +
 gcc/stor-layout.h |3 +
 .../c-c++-common/pragma-message-parsing.c |   21 +
 .../c-c++-common/pragma-optimize-parsing.c|   16 +
 .../c-c++-common/pragma-pack-parsing-1.c  |   19 +
 .../c-c++-common/pragma-pack-parsing-2.c  |4 +
 .../pragma-redefine_extname-parsing.c |9 +
 .../c-c++-common/pragma-target-parsing.c  |   14 +
 .../c-c++-common/pragma-visibility-parsing.c  |   13 +
 .../c-c++-common/pragma-weak-parsing.c|   24 +
 gcc/testsuite/gcc.dg/bad-pragma-locations.c   |   22 +-
 .../gcc.dg/parsing-pragma-show_layout.c   |   15 +
 .../pragma-scalar_storate_order-parsing.c |8 +
 gcc/testsuite/gcc.dg/pragma-show_layout-1.c   |   12 +
 gcc/testsuite/gcc.dg/pragma-show_layout-2.c   |  184 ++
 ...agma-show_layout-infoleak-CVE-2017-18550.c |  175 ++
 gcc/testsuite/gcc.dg/sso-6.c  |2 +-
 gcc/toplev.cc |2 +
 35 files changed, 4589 insertions(+), 149 deletions(-)
 create mode 100755 contrib/regenerate-index-urls.py
 create mode 100644 gcc/gcc-urlifier.cc
 create mode 100644 gcc/gcc-urlifier.def
 create mode 100644 gcc/gcc-urlifier.h
 create mode 100644 gcc/pretty-print-urlifier.h
 create mode 100644 gcc/testsuite/c-c++-common/pragma-message-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-optimize-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-pack-parsing-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-pack-parsing-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-redefine_extname-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-target-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-visibility-parsing.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-weak-parsing.c
 create mode 100644 gcc/testsuite/gcc.dg/parsing-pragma-show_layout.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-scalar_storate_order-parsing.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-show_layout-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-show_layout-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pragma-show_layout-infoleak-CVE-2017-18550.c

-- 
2.26.3



Re: Re: [tree-optimization/111721] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richi.
> 
> The following is the V2 patch:
> Testing on X86 and aarch64 are running.
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..e7f7f976f11 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -760,7 +760,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
> - || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> + && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))

That's not what I wrote.  I wrote to let == BOOLEAN_TYPE pass without
check here, thus

 - && (TREE_CODE (type) == BOOLEAN_TYPE
 + && TREE_CODE (type) != BOOLEAN_TYPE
   && !can_duplicate...
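
To spell out the difference between the three forms, here is a small
standalone model.  It is only a sketch: the function names are made up, the
two bools stand in for "TREE_CODE (type) == BOOLEAN_TYPE" and
"can_duplicate_and_interleave_p (...)", and the other conjuncts of the
condition (dt and the variable-length mode check) are assumed to be true.

```cpp
// Each function returns true when the "Build SLP failed" path is taken.
// is_boolean models TREE_CODE (type) == BOOLEAN_TYPE;
// can_dup models can_duplicate_and_interleave_p (vinfo, ..., type).

// Condition on trunk before the patch.
bool rejects_original (bool is_boolean, bool can_dup)
{
  return is_boolean || !can_dup;
}

// Condition in the V2 patch quoted above ("||" changed to "&&").
bool rejects_v2 (bool is_boolean, bool can_dup)
{
  return is_boolean && !can_dup;
}

// Condition suggested in this reply: booleans pass without the check.
bool rejects_suggested (bool is_boolean, bool can_dup)
{
  return !is_boolean && !can_dup;
}
```

With is_boolean false and can_dup false, rejects_v2 returns false, i.e. V2
stops rejecting non-boolean operands that cannot be duplicated, while the
suggested form still rejects them and only lets boolean operands through.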

> {
>   if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ce4868d3e1..6c47121e158 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
>mask_index = internal_fn_mask_index (ifn);
>if (mask_index >= 0 && slp_node)
> mask_index = vect_slp_child_index_for_operand (call, mask_index);
> +  slp_tree slp_op = NULL;
>if (mask_index >= 0
>   && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> - &mask, NULL, &mask_dt, &mask_vectype))
> + &mask, &slp_op, &mask_dt, 
> &mask_vectype))
> return false;
> +  /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
> +MASK_VECTYPE.  */
> +  if (mask_index >= 0 && slp_node && mask_dt == vect_constant_def
> + && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> +   gcc_unreachable ();

You shouldn't do this here.  There's code in if (costing_p) that
would need to be updated if you (correctly) want to track slp_op here.

>  }
> 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 19:11
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [tree-optimization/111721] VECT: Support SLP for 
> MASK_LEN_GATHER_LOAD with dummy mask
> On Thu, 2 Nov 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes following FAILs for RVV:
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > Bootstrap on X86 and regtest passed.
> > 
> > Tested on aarch64 passed.
> > 
> > Ok for trunk ?
> > 
> > PR tree-optimization/111721
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> > dummy mask -1.
> > * tree-vect-stmts.cc (vectorizable_load): Ditto.
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 14 --
> >  gcc/tree-vect-stmts.cc |  8 +++-
> >  2 files changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 43d742e3c92..23ca0318e31 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -756,8 +756,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> > char swap,
> >  {
> >tree type = TREE_TYPE (oprnd);
> >dt = dts[i];
> > -   if ((dt == vect_constant_def
> > -|| dt == vect_external_def)
> > +   if (dt == vect_external_def
> >&& !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >&& (TREE_CODE (type) == BOOLEAN_TYPE
> >|| !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> > @@ -769,6 +768,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> > char swap,
> >  "for variable-length SLP %T\n", oprnd);
> >return -1;
> >  }
> > +   if (dt == vect_constant_def
> > +   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> > +   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> > + {
> > +   if (dump_enabled_p ())
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + "Build SLP failed: invalid type of def "
> > + "for variable-length SLP %T\n",
> > + oprnd);
> > +   return -1;
> > + }
>  
> I don't think that's quite correct.  can_duplicate_and_interleave_p
> doesn't get enough info here and IIRC even materializing arbitrary
> constants isn't possible with VLA vectors.  The very first thing
> the function does is
>  
>   tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, 
> count);
>   if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
> return false;
>  
> but for masks that's not going to get us the correct vector type.
> While I don't under

[PATCH v2] A new copy propagation and PHI elimination pass

2023-11-02 Thread Filip Kastl
> Hi,
> 
> this is a patch that I submitted two months ago as an RFC. I added some polish
> since.
> 
> It is a new lightweight pass that removes redundant PHI functions and as a
> bonus does basic copy propagation. With Jan Hubička we measured that it is
> able to remove usually more than 5% of all PHI functions when run among early
> passes (sometimes even 13% or more). Those are mostly PHI functions that would
> be later optimized away but with this pass it is possible to remove them early
> enough so that they don't get streamed when running LTO (and also potentially
> inlined at multiple places). It is also able to remove some redundant PHIs
> that otherwise would still be present during RTL expansion.
> 
> Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> with and without this patch. These are the sizes of .debug_info and
> .debug_loclists
> 
> .debug_info without patch 181694311
> .debug_info with patch 181692320
> +0.0011% change
> 
> .debug_loclists without patch 47934753
> .debug_loclists with patch 47934966
> -0.0004% change
> 
> I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> the program working on my machine sadly. Hope this suffices. Seems to me that
> my patch doesn't have a significant impact on debug info.
> 
> Bootstrapped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However that is because the test is
> dependent on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.
> 
> Ok to commit?
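
For anyone wanting to recheck the section-size figures quoted above, the
relative change can be recomputed with a one-liner like the following.
percent_change is a made-up helper, and it takes the unpatched build as the
baseline, which is an assumption on my part.

```cpp
// Hypothetical helper: relative change from BEFORE to AFTER, in percent.
// A negative result means AFTER is smaller than BEFORE.
double percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Calling percent_change (181694311, 181692320) gives roughly -0.0011 and
percent_change (47934753, 47934966) gives roughly 0.0004.  The magnitudes
match the quoted figures; the sign depends on which build is taken as the
baseline.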

This is a second version of the patch.  In this version, I modified the
pr79691.c testcase so that it works as intended with other changes from the
patch.

The pr79691.c testcase checks that we get constants from snprintf calls and
that they simplify into a single constant.  The testcase doesn't account for
the fact that this constant may be further copy propagated which is exactly
what happens with this patch applied.
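
For reference, the value the test expects can be reproduced outside the
testsuite with a sketch like this.  f4_sketch is a made-up name and it uses
std::snprintf instead of __builtin_snprintf; the format strings and values
are the ones from the testcase.

```cpp
#include <cstdio>

// Same computation as f4 in pr79691.c.  With a null buffer and size 0,
// snprintf returns the number of characters it would have written.
int f4_sketch ()
{
  int n1 = std::snprintf (nullptr, 0, "%i", 1234);   // "1234" is 4 chars
  int n2 = std::snprintf (nullptr, 0, "%i", 12345);  // "12345" is 5 chars
  return n1 + n2;
}
```

Both calls fold to constants, so the function reduces to returning 4 + 5;
whether the dump shows that as an assignment followed by a return or as a
direct return depends on how far the constant is propagated.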

Bootstrapped and tested on x86_64-pc-linux-gnu.

Ok to commit?

Filip Kastl

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
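
Both examples come down to the same test: a set of mutually dependent PHIs
is redundant when all arguments coming from outside the set are one and the
same value.  The sketch below models only that test.  It is not the code in
tree-ssa-sccopy.cc: the function name is made up, SSA names and constants
are modelled as plain integer ids, and the set of PHIs is assumed to have
been found already (e.g. as a strongly connected component).

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical model: PHIS maps each PHI result id to its argument ids.
// Returns true and stores the common value in REPLACEMENT if every
// argument that is not itself one of the PHIs is the same single value.
bool single_outside_value_p (const std::map<int, std::vector<int>> &phis,
			     int &replacement)
{
  std::set<int> outside;
  for (const auto &phi : phis)
    for (int arg : phi.second)
      if (phis.count (arg) == 0)
	outside.insert (arg);  // Argument defined outside the set.
  if (outside.size () != 1)
    return false;
  replacement = *outside.begin ();
  return true;
}
```

With ids 8, 9 and 10 for the three PHIs of the second example and id 1 for
_1, the function reports 1 as the replacement.  A set with two or more
distinct outside values is left alone by this sketch.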

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
  for additional copy propagation this patch adds.
* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-sccopy.cc  | 867 
 6 files changed, 951 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a25a1e32fbc..2bd5a015676 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1736,6 +1736,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-sccopy.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_if_to_switch);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
+ NEXT_PASS (pass_sccopy);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_local_pure_const);
  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+  NEXT_PASS (pass_sccopy);
   NEXT_PASS (pass_tail_calls);
   /* Split critical edges before late uninit warning to reduce the
  number of false positives from it.  */
@@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);

Re: [PATCH] A new copy propagation and PHI elimination pass

2023-11-02 Thread Filip Kastl
Hi,

thanks for the guidance.  I'm going to post a new version of the patch with the
testcase modified so that it searches for 'return 9;' instead of '= 9;'.

Filip Kastl


On Fri 2023-10-27 13:55:37, Jeff Law wrote:
> 
> 
> On 10/20/23 07:52, Filip Kastl wrote:
> > On Fri 2023-10-20 15:50:25, Filip Kastl wrote:
> > > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test 
> > > is
> > > dependent on a certain variable not being copy propagated. I will go into 
> > > more
> > > detail about this in a reply to this mail.
> > 
> > This testcase checks for the string '= 9' being present in the 
> > tree-optimized
> > gimple dump ({ dg-final { scan-tree-dump " = 9;" "optimized" } }). This is 
> > how
> > the relevant place in the dump looks like without my patch:
> > 
> > int f4 (int i)
> > {
> >int _6;
> > 
> > [local count: 1073741824]:
> >_6 = 9;
> >return _6;
> > 
> > }
> > 
> > Note that '= 9' is indeed present but there is an opportunity for copy
> > propagation. With my patch, the copy propagation happens:
> > 
> > int f4 (int i)
> > {
> >int _6;
> > 
> > [local count: 1073741824]:
> >return 9;
> > 
> > }
> > 
> > Which means no '= 9' is present and therefore the test fails.
> > 
> > What should I do? I don't suppose that changing the testcase to search for 
> > just
> > '9' would be wise since the dump may contain other '9's. I could change it 
> > to
> > search for 'return 9'. That would make it dependent on some copy propagation
> > being run late enough. However it is currently dependent on *no* copy
> > propagation being run late in the compilation. Also, if the test would 
> > search
> > for 'return 9', it would search for the most optimized version of the 
> > function
> > f4.
> > 
> > Or maybe searching for '9;' would work.
> So in general you have to go back and try to assess the original intent of
> the test.  Once you have the original intent, the path forward is often
> clear.
> 
> In this specific case the source is:
> +/* Verify -fprintf-return-value results used for constant propagation.  */
> +int f4 (int i)
> +{
> +  int n1 = __builtin_snprintf (0, 0, "%i", 1234);
> +  int n2 = __builtin_snprintf (0, 0, "%i", 12345);
> +  return n1 + n2;
> +}
> 
> And the intent of the test is to verify that we get constants from the
> snprintf calls and that they in turn simplify to a constant.
> 
> That is certainly still the case after your patch, just the form of the
> output is different (the constant is propagated further).  So I think
> testing for "return 9" would be the right approach here.
> 
> jeff


[PATCH] Format gotools.sum closer to what DejaGnu does

2023-11-02 Thread Maxim Kuvyrkov
... to restore compatibility with validate_failures.py .
The testsuite script validate_failures.py expects
"Running  ..." to extract  values,
and gotools.sum provided "Running ".

Note that libgo.sum, which also uses Makefile logic to generate
DejaGnu-like output, already has "..." suffix.
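
For clarity, the header line the Makefile loop emits after this change can
be modelled as below.  This is only a sketch mirroring the two sed
substitutions in the rule; running_line is a made-up name and nothing here
is part of the patch.

```cpp
#include <cstddef>
#include <string>

// Mirrors: sed -e 's/-testlog//' -e 's|_|/|', then the patched echo.
std::string running_line (std::string file)
{
  const std::string suffix = "-testlog";
  std::size_t p = file.find (suffix);
  if (p != std::string::npos)
    file.erase (p, suffix.size ());  // Drop the first "-testlog".
  p = file.find ('_');
  if (p != std::string::npos)
    file[p] = '/';                   // Replace the first '_' with '/'.
  return "Running " + file + " ...";
}
```

For example, cmd_go-testlog becomes "Running cmd/go ..." and
runtime-testlog becomes "Running runtime ...".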

gotools/ChangeLog:

* Makefile.am: Update "Running  ..." output
* Makefile.in: Regenerate.
---
 gotools/Makefile.am | 4 ++--
 gotools/Makefile.in | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 7b5302990f8..d2376b9c25b 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
check-cgo-test check-carchive-test
@cp gotools.sum gotools.log
@for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
  testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
- echo "Running $${testname}" >> gotools.sum; \
- echo "Running $${testname}" >> gotools.log; \
+ echo "Running $${testname} ..." >> gotools.sum; \
+ echo "Running $${testname} ..." >> gotools.log; \
  sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
  grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
done
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 2783b91ef4b..9cc238e748d 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -317,6 +317,7 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
+runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -1003,8 +1004,8 @@ mostlyclean-local:
 @NATIVE_TRUE@  @cp gotools.sum gotools.log
 @NATIVE_TRUE@  @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
 @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; 
\
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \
 @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> 
gotools.log; \
 @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' 
-e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
 @NATIVE_TRUE@  done
-- 
2.34.1



[pushed] analyzer: fix clang warnings [PR112317]

2023-11-02 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5080-gc71028c979d55f.

gcc/analyzer/ChangeLog:
PR analyzer/112317
* access-diagram.cc (class x_aligned_x_ruler_widget): Eliminate
unused field "m_col_widths".
(access_diagram_impl::add_valid_vs_invalid_ruler): Update for
above change.
* region-model.cc
(check_one_function_attr_null_terminated_string_arg): Remove
unused variables "cd_unchecked", "strlen_sval", and
"limited_sval".
* region-model.h (region_model_context_decorator::warn): Add
missing "override".
---
 gcc/analyzer/access-diagram.cc |  9 +++--
 gcc/analyzer/region-model.cc   | 21 +
 gcc/analyzer/region-model.h|  2 +-
 3 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index c7d190e3188..fb8c0282e75 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -919,11 +919,9 @@ class x_aligned_x_ruler_widget : public leaf_widget
 {
 public:
   x_aligned_x_ruler_widget (const access_diagram_impl &dia_impl,
-   const theme &theme,
-   table_dimension_sizes &col_widths)
+   const theme &theme)
   : m_dia_impl (dia_impl),
-m_theme (theme),
-m_col_widths (col_widths)
+m_theme (theme)
   {
   }
 
@@ -973,7 +971,6 @@ private:
 
   const access_diagram_impl &m_dia_impl;
   const theme &m_theme;
-  table_dimension_sizes &m_col_widths;
   std::vector m_labels;
 };
 
@@ -2361,7 +2358,7 @@ private:
 LOG_SCOPE (m_logger);
 
 x_aligned_x_ruler_widget *w
-  = new x_aligned_x_ruler_widget (*this, m_theme, *m_col_widths);
+  = new x_aligned_x_ruler_widget (*this, m_theme);
 
 access_range invalid_before_bits;
 if (m_op.maybe_get_invalid_before_bits (&invalid_before_bits))
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 9479bcf380c..dc834406520 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1877,23 +1877,13 @@ check_one_function_attr_null_terminated_string_arg 
(const gcall *call,
 || access->mode == access_read_write)
&& access->sizarg != UINT_MAX)
   {
-   /* First, check for a null-terminated string *without*
-  emitting warnings (via a null context), to get an
-  svalue for the strlen of the buffer (possibly
-  nullptr if there would be an issue).  */
-   call_details cd_unchecked (call, this, nullptr);
-   const svalue *strlen_sval
- = check_for_null_terminated_string_arg (cd_unchecked,
- arg_idx);
-
-   /* Get svalue for the size limit argument.  */
call_details cd_checked (call, this, ctxt);
const svalue *limit_sval
  = cd_checked.get_arg_svalue (access->sizarg);
const svalue *ptr_sval
  = cd_checked.get_arg_svalue (arg_idx);
/* Try reading all of the bytes expressed by the size param,
-  but without checking (via a null context).  */
+  but without emitting warnings (via a null context).  */
const svalue *limited_sval
  = read_bytes (deref_rvalue (ptr_sval, NULL_TREE, nullptr),
NULL_TREE,
@@ -1912,11 +1902,10 @@ check_one_function_attr_null_terminated_string_arg 
(const gcall *call,
  {
/* Reading up to the truncation limit seems OK; repeat
   the read, but with checking enabled.  */
-   const svalue *limited_sval
- = read_bytes (deref_rvalue (ptr_sval, NULL_TREE, ctxt),
-   NULL_TREE,
-   limit_sval,
-   ctxt);
+   read_bytes (deref_rvalue (ptr_sval, NULL_TREE, ctxt),
+   NULL_TREE,
+   limit_sval,
+   ctxt);
  }
return;
   }
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 8bfb06880ff..4d8480df141 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -890,7 +890,7 @@ class region_model_context_decorator : public 
region_model_context
 {
  public:
   bool warn (std::unique_ptr d,
-const stmt_finder *custom_finder)
+const stmt_finder *custom_finder) override
   {
 if (m_inner)
   return m_inner->warn (std::move (d), custom_finder);
-- 
2.26.3



RE: [PATCH v1] EXPMED: Allow vector mode for DSE extract_low_bits [PR111720]

2023-11-02 Thread Li, Pan2
Thanks Richard B for comments.

> when there are integer modes for the vector modes you now go a different path,
> a little less "regressing" would be to write it as
> 
>   if (int_mode_for_mode (src_mode).exists (&src_int_mode)
>&& int_mode_for_mode (mode).exists (&int_mode))
>  {
> ... old code ...
>  }
>   else if (VECTOR_MODE_P (mode) && VECTOR_MODE_P (src_mode))
>  {
> ... new code ...
>}
>   else
>  return NULL_RTX;

That makes sense to me, will update it in V2.

> so you're really expecting to generate a subreg here?  Given "vector
> register layout"
> isn't something that's very well defined I fear it's going to be
> difficult to guarantee
> the desired semantics of this function.  IIRC powerpc64le has big-endian lane
> order for example.

This could indeed be a problem; I need to think more about how this behaves
across different backends.

Pan


-Original Message-
From: Richard Biener  
Sent: Thursday, November 2, 2023 4:20 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; jeffreya...@gmail.com; 
richard.sandif...@arm.com
Subject: Re: [PATCH v1] EXPMED: Allow vector mode for DSE extract_low_bits 
[PR111720]

On Thu, Nov 2, 2023 at 4:15 AM  wrote:
>
> From: Pan Li 
>
> The extract_low_bits only tries the scalar mode if the bitsize of
> the mode and src_mode is not equal. When vector mode is given
> from get_stored_val in DSE, it will always fail and return NULL_RTX.
>
> This patch would like to allow the vector mode in the extract_low_bits
> if and only if the size of mode is less than or equals to the size of
> the src_mode.
>
> Given below example code with --param=riscv-autovec-preference=fixed-vlmax.
>
> vuint8m1_t test () {
>   uint8_t arr[32] = {
> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
> 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
>   };
>
>   return __riscv_vle8_v_u8m1(arr, 32);
> }
>
> Before this patch:
>
> test:
>   lui     a5,%hi(.LANCHOR0)
>   addi    sp,sp,-32
>   addi    a5,a5,%lo(.LANCHOR0)
>   li      a3,32
>   vl2re64.v   v2,0(a5)
>   vsetvli zero,a3,e8,m1,ta,ma
>   vs2r.v  v2,0(sp) <== Unnecessary store to stack
>   vle8.v  v1,0(sp) <== Ditto
>   vs1r.v  v1,0(a0)
>   addi    sp,sp,32
>   jr      ra
>
> After this patch:
>
> test:
>   lui     a5,%hi(.LANCHOR0)
>   addi    a5,a5,%lo(.LANCHOR0)
>   li      a4,32
>   addi    sp,sp,-32
>   vsetvli zero,a4,e8,m1,ta,ma
>   vle8.v  v1,0(a5)
>   vs1r.v  v1,0(a0)
>   addi    sp,sp,32
>   jr      ra
>
> Below tests are passed within this patch:
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression test.
>
> PR target/111720
>
> gcc/ChangeLog:
>
> * expmed.cc (extract_low_bits): Allow vector mode if the
> mode size is less than or equal to src_mode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr111720-0.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-1.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-10.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-2.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-3.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-4.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-5.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-6.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-7.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-8.c: New test.
> * gcc.target/riscv/rvv/base/pr111720-9.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/expmed.cc | 44 ---
>  .../gcc.target/riscv/rvv/base/pr111720-0.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-1.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-10.c   | 18 
>  .../gcc.target/riscv/rvv/base/pr111720-2.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-3.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-4.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-5.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-6.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-7.c| 21 +
>  .../gcc.target/riscv/rvv/base/pr111720-8.c| 18 
>  .../gcc.target/riscv/rvv/base/pr111720-9.c| 15 +++
>  12 files changed, 227 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c
>  create mode

RE: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zhong 
Sent: Thursday, November 2, 2023 8:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; Li, Pan2 ; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec 
iterator

lgtm
---- Replied Message ----
From: pan2...@intel.com
Date: 11/02/2023 19:48
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 12:52:50PM +0100, Richard Biener wrote:
> > What I meant is to emit
> > tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
> > p_5 = &tmp_4[2];
> > i.e. don't associate the pointer with a value of the size, but with
> > an address where to find the size (plus how large it is), basically escape
> > pointer to the size at that point.  And __builtin_dynamic_object_size is 
> > pure,
> > so supposedly it can depend on what the escaped pointer points to.
> 
> Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
> escape point (quite bad IMHO)

That is why I've said we need to decide what cost we want to suffer because
of that.

> and __builtin_dynamic_object_size being
> non-const (that's probably not too bad).

It is already pure,leaf,nothrow (unlike __builtin_object_size which is obviously
const,leaf,nothrow).  Because under the hood, it can read memory when
expanded.

> > We'd see that a particular pointer is size associated with &s.a address
> > and would use that address cast to the type of the third argument (to
> > preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> > VN CSE it anyway if one has say
> > union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
> >   struct T { char c, d, e, f; char g __attribute__((counted_by 
> > (c))) []; } t; };
> > and
> > .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
> > ...
> > .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
> > ?
> 
> We'd probably CSE that - the usual issue of address-with-same-value.
> 
> > It would mean though that counted_by wouldn't be allowed to be a
> > bit-field...
> 
> Yup.  We could also pass a pointer to the container though, that's good enough
> for the escape, and pass the size by value in addition to that.

I was wondering about stuff like _BitInt.  But sure, counted_by is just an
extension, we can just refuse counting by _BitInt in addition to counting by
floating point, pointers, aggregates, bit-fields, or we could somehow encode
all the needed type's properties numerically into an integral constant.
Similarly for alias set (unless it uses 0 for reads).

Jakub
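
The pure versus const distinction drawn above can be illustrated with GCC's function attributes. This is only a sketch: struct counted and both helper functions are hypothetical stand-ins, not GCC internals, and the real __builtin_dynamic_object_size is a builtin rather than an ordinary function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container standing in for the thread's struct S; a
   fixed-size array replaces the counted_by flexible array member so
   that the sketch builds with compilers lacking the attribute.  */
struct counted
{
  int count;
  char data[8];
};

/* 'const': the result depends only on the argument values, so two
   calls with equal arguments may be merged even across stores.  */
__attribute__ ((const)) size_t
remaining_from_value (int count, size_t offset)
{
  assert (count >= 0);
  return (size_t) count > offset ? (size_t) count - offset : 0;
}

/* 'pure': no side effects, but the function may read memory, so a
   store to c->count between two calls keeps the calls distinct.  */
__attribute__ ((pure)) size_t
remaining_from_memory (const struct counted *c, size_t offset)
{
  assert (c->count >= 0);
  return (size_t) c->count > offset ? (size_t) c->count - offset : 0;
}
```

A caller that changes c->count between two remaining_from_memory calls must see the new value, which is the property the address-based .ACCESS_WITH_SIZE variant relies on.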



Re: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread juzhe.zhong
lgtm

Replied Message
From: pan2...@intel.com
Date: 11/02/2023 19:48
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator


Re: [PATCH] doc: explicitly say 'lifetime' for DCE

2023-11-02 Thread Sam James


Richard Biener  writes:

> On Thu, Nov 2, 2023 at 11:25 AM Sam James  wrote:
>>
>>
>> Richard Biener  writes:
>>
>> > On Thu, Nov 2, 2023 at 10:03 AM Sam James  wrote:
>> >>
>> >> Say 'memory lifetime' rather than 'memory life' as lifetime is the more
>> >> standard term nowadays (indeed we have e.g. -fno-lifetime-dse).
>> >>
>> >> It's also easier to grep for if someone is looking for the documentation 
>> >> on
>> >> where we do that.
>> >
>> > OK
>>
>> Could you push for me please? I have a sw account but no gcc access
>> (yet).
>
> Done after fixing ChangeLog format.
>
> Richard.

Thanks!

>
>> cheers
>>
>> >
>> >> gcc/ChangeLog:
>> >> * doc/passes.texi (Dead code elimination): Explicitly say 'lifetime'
>> >> as this has become the standard term for what we're doing here.
>> >>
>> >> Signed-off-by: Sam James 
>> >> ---
>> >>  gcc/doc/passes.texi | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
>> >> index eb2bb6062834..470ac498a132 100644
>> >> --- a/gcc/doc/passes.texi
>> >> +++ b/gcc/doc/passes.texi
>> >> @@ -543,7 +543,7 @@ and is defined by 
>> >> @code{pass_early_warn_uninitialized} and
>> >>  @item Dead code elimination
>> >>
>> >>  This pass scans the function for statements without side effects whose
>> >> -result is unused.  It does not do memory life analysis, so any value
>> >> +result is unused.  It does not do memory lifetime analysis, so any value
>> >>  that is stored in memory is considered used.  The pass is run multiple
>> >>  times throughout the optimization process.  It is located in
>> >>  @file{tree-ssa-dce.cc} and is described by @code{pass_dce}.
>> >> --
>> >> 2.42.0
>> >>
>>



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 11:40 AM Jakub Jelinek  wrote:
>
> On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one 
> > > of
> > > the arguments not the size value, but its address.  Then at __bdos time
> > > we would dereference that pointer to get the size.
> > > So,
> > > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > > struct S s;
> > > s.a = 5;
> > > char *p = &s.b[2];
> > > int i1 = __builtin_dynamic_object_size (p, 0);
> > > s.a = 3;
> > > int i2 = __builtin_dynamic_object_size (p, 0);
> > > would then yield 3 and 1 rather than 3 and 3.
> >
> > I fail to see how we can get the __builtin_dynamic_object_size call
> > data dependent on s.a, thus avoid re-ordering or even DSE of the
> > store.
>
> If &s.b[2] is lowered as
> sz_1 = s.a;
> tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
> p_3 = &tmp_2[2];
> then sure, there is no way, you get the size from that point.
> tree-object-size.cc tracking then determines that in a particular
> case the pointer is size associated with sz_1 and use that value
> as the size (with the usual adjustments for pointer arithmetics and the
> like).
>
> What I meant is to emit
> tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
> p_5 = &tmp_4[2];
> i.e. don't associate the pointer with a value of the size, but with
> an address where to find the size (plus how large it is), basically escape
> pointer to the size at that point.  And __builtin_dynamic_object_size is pure,
> so supposedly it can depend on what the escaped pointer points to.

Well, yeah - that would work but depend on .ACCESS_WITH_SIZE being an
escape point (quite bad IMHO) and __builtin_dynamic_object_size being
non-const (that's probably not too bad).

> We'd see that a particular pointer is size associated with &s.a address
> and would use that address cast to the type of the third argument (to
> preserve the exact pointer type on INTEGER_CST, though not sure, wouldn't
> VN CSE it anyway if one has say
> union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
>   struct T { char c, d, e, f; char g __attribute__((counted_by (c))) 
> []; } t; };
> and
> .ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
> ...
> .ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
> ?

We'd probably CSE that - the usual issue of address-with-same-value.

> It would mean though that counted_by wouldn't be allowed to be a
> bit-field...

Yup.  We could also pass a pointer to the container though, that's good enough
for the escape, and pass the size by value in addition to that.

> Jakub
>
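
The difference between associating a pointer with the size value and with the address of the size can be modelled in plain C. This is a sketch under stated assumptions: the two access_* structs and the remaining_* helpers are hypothetical, a fixed-size array stands in for the flexible array member, and nothing here is how tree-object-size.cc actually tracks sizes.

```c
#include <assert.h>
#include <stddef.h>

struct S
{
  int a;      /* element count for b */
  char b[8];  /* fixed-size stand-in for the counted_by array */
};

/* Variant 1: the size is captured by value when the access is formed,
   like .ACCESS_WITH_SIZE (&s.b[0], sz_1).  */
struct access_by_value
{
  char *base;
  int size;
};

/* Variant 2: only the address of the size is captured,
   like .ACCESS_WITH_SIZE (&s.b[0], &s.a, ...).  */
struct access_by_address
{
  char *base;
  const int *size_addr;
};

size_t
remaining_by_value (struct access_by_value acc, const char *p)
{
  ptrdiff_t used = p - acc.base;
  assert (used >= 0);
  return used < acc.size ? (size_t) (acc.size - used) : 0;
}

size_t
remaining_by_address (struct access_by_address acc, const char *p)
{
  ptrdiff_t used = p - acc.base;
  assert (used >= 0);
  /* The size is read at query time, so later stores are observed.  */
  return used < *acc.size_addr ? (size_t) (*acc.size_addr - used) : 0;
}
```

With s.a = 5 and p = &s.b[2], both helpers report 3. After s.a = 3, the by-value variant still reports 3 while the by-address variant reports 1, matching the "3 and 1 rather than 3 and 3" outcome quoted above.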


Re: Re: [tree-optimization/111721] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-02 Thread juzhe.zh...@rivai.ai
Thanks Richi.

The following is the V2 patch:
Testing on X86 and aarch64 is running.

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 43d742e3c92..e7f7f976f11 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -760,7 +760,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char 
swap,
   || dt == vect_external_def)
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
  && (TREE_CODE (type) == BOOLEAN_TYPE
- || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
+ && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
  type)))
{
  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6ce4868d3e1..6c47121e158 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
   mask_index = internal_fn_mask_index (ifn);
   if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
+  slp_tree slp_op = NULL;
   if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &slp_op, &mask_dt, &mask_vectype))
return false;
+  /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
+MASK_VECTYPE.  */
+  if (mask_index >= 0 && slp_node && mask_dt == vect_constant_def
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+   gcc_unreachable ();
 }




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 19:11
To: Juzhe-Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [tree-optimization/111721] VECT: Support SLP for 
MASK_LEN_GATHER_LOAD with dummy mask
On Thu, 2 Nov 2023, Juzhe-Zhong wrote:
 
> This patch fixes following FAILs for RVV:
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> 
> Bootstrap on X86 and regtest passed.
> 
> Tested on aarch64 passed.
> 
> Ok for trunk ?
> 
> PR tree-optimization/111721
> 
> gcc/ChangeLog:
> 
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> dummy mask -1.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
> 
> ---
>  gcc/tree-vect-slp.cc   | 14 --
>  gcc/tree-vect-stmts.cc |  8 +++-
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..23ca0318e31 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -756,8 +756,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>  {
>tree type = TREE_TYPE (oprnd);
>dt = dts[i];
> -   if ((dt == vect_constant_def
> -|| dt == vect_external_def)
> +   if (dt == vect_external_def
>&& !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>&& (TREE_CODE (type) == BOOLEAN_TYPE
>|| !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> @@ -769,6 +768,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>  "for variable-length SLP %T\n", oprnd);
>return -1;
>  }
> +   if (dt == vect_constant_def
> +   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> +   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "Build SLP failed: invalid type of def "
> + "for variable-length SLP %T\n",
> + oprnd);
> +   return -1;
> + }
 
I don't think that's quite correct.  can_duplicate_and_interleave_p
doesn't get enough info here and IIRC even materializing arbitrary
constants isn't possible with VLA vectors.  The very first thing
the function does is
 
  tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, 
count);
  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
return false;
 
but for masks that's not going to get us the correct vector type.
While I don't understand why we have that 'BOOLEAN_TYPE' special
case (maybe the intent was to identify 'mask' operands that way?),
we might want to require that we can materialize both all-zero
and all-ones constant 'mask's.  But then 'mask' operands should
be properly identified here.
 
Maybe we can also simply delay the check to the point we know
whether we're facing a uniform constant or not (note for 'first',
we cannot really special-case vect_constant_def as the second
SLP lane might demote that to vect_external_def).  It's always
a balance of whether to reject sth at SLP build time (possibly
allowing operand swapping to do magic) or to delay checks
to stmt analysis

[AVR PATCH] Improvements to SImode and PSImode shifts by constants.

2023-11-02 Thread Roger Sayle

This patch provides non-looping implementations for more SImode (32-bit)
and PSImode (24-bit) shifts on AVR.  For most cases, these are shorter
and faster than using a loop, but for a few (controlled by optimize_size)
they are a little larger but significantly faster.  The approach is to
perform byte-based shifts by 1, 2 or 3 bytes, followed by bit-based shifts
(effectively in a narrower type) for the remaining bits, beyond 8, 16 or 24.

For example, the simple test case below (inspired by PR 112268):

unsigned long foo(unsigned long x)
{
  return x >> 26;
}

gcc -O2 currently generates:

foo:ldi r18,26
1:  lsr r25
ror r24
ror r23
ror r22
dec r18
brne 1b
ret

which is 8 instructions, and takes ~158 cycles.
With this patch, we now generate:

foo:mov r22,r25
clr r23
clr r24
clr r25
lsr r22
lsr r22
ret

which is 7 instructions, and takes ~7 cycles.
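
The byte-then-bit decomposition described above can be modelled in portable C. This is only an illustration of the idea, not the code avr.cc emits; the function name lshr32_bytewise is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of a 32-bit value, done as a whole-byte move
   followed by a short bit shift over the bytes that are still live.
   in[0]/out[0] is the least significant byte, like r22 above.  */
uint32_t
lshr32_bytewise (uint32_t x, unsigned count)
{
  uint8_t in[4], out[4] = { 0, 0, 0, 0 };
  unsigned bytes = count / 8;   /* whole-byte part: 0..3 */
  unsigned bits = count % 8;    /* leftover bit part: 0..7 */
  unsigned i;

  assert (count < 32);

  for (i = 0; i < 4; i++)
    in[i] = (uint8_t) (x >> (8 * i));

  /* Byte step: register moves; the vacated high bytes stay cleared.  */
  for (i = 0; i + bytes < 4; i++)
    out[i] = in[i + bytes];

  /* Bit step: shift only the 4 - bytes live bytes, low byte first so
     that out[i + 1] is still unshifted when its low bits are borrowed.  */
  for (i = 0; i + bytes < 4; i++)
    {
      uint8_t carry = 0;
      if (i + 1 + bytes < 4)
        carry = (uint8_t) (out[i + 1] << (8 - bits));
      out[i] = (uint8_t) ((out[i] >> bits) | carry);
    }

  return (uint32_t) out[0] | ((uint32_t) out[1] << 8)
         | ((uint32_t) out[2] << 16) | ((uint32_t) out[3] << 24);
}
```

For a count of 26 the byte step moves the top byte into the bottom byte and the bit step shifts that single byte by 2, which corresponds to the mov, three clr and two lsr instructions in the listing above.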

One complication is that the modified functions sometimes use spaces instead
of TABs, with occasional mistakes in GNU-style formatting, so I've fixed
these indentation/whitespace issues.  There's no change in the code for the
cases previously handled/special-cased, with the exception of ashrqi3 reg,5
where with -Os a (4-instruction) loop is shorter than the five single-bit
shifts of a fully unrolled implementation.

This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions.  If someone could test this more
thoroughly that would be great.


2023-11-02  Roger Sayle  

gcc/ChangeLog
* config/avr/avr.cc (ashlqi3_out): Fix indentation whitespace.
(ashlhi3_out): Likewise.
(avr_out_ashlpsi3): Likewise.  Handle shifts by 9 and 17-22.
(ashlsi3_out): Fix formatting.  Handle shifts by 9 and 25-30.
(ashrqi3_out): Use loop for shifts by 5 when optimizing for size.
Fix indentation whitespace.
(ashrhi3_out): Likewise.
(avr_out_ashrpsi3): Likewise.  Handle shifts by 17.
(ashrsi3_out): Fix indentation.  Handle shifts by 17 and 25.
(lshrqi3_out): Fix whitespace.
(lshrhi3_out): Likewise.
(avr_out_lshrpsi3): Likewise.  Handle shifts by 9 and 17-22.
(lshrsi3_out): Fix indentation.  Handle shifts by 9,17,18 and 25-30.

gcc/testsuite/ChangeLog
* gcc.target/avr/ashlsi-1.c: New test case.
* gcc.target/avr/ashlsi-2.c: Likewise.
* gcc.target/avr/ashrsi-1.c: Likewise.
* gcc.target/avr/ashrsi-2.c: Likewise.
* gcc.target/avr/lshrsi-1.c: Likewise.
* gcc.target/avr/lshrsi-2.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 5e0217de36fc..706599b4aa6a 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -6715,7 +6715,7 @@ ashlqi3_out (rtx_insn *insn, rtx operands[], int *len)
 fatal_insn ("internal compiler error.  Incorrect shift:", insn);
 
   out_shift_with_cnt ("lsl %0",
-  insn, operands, len, 1);
+ insn, operands, len, 1);
   return "";
 }
 
@@ -6728,8 +6728,8 @@ ashlhi3_out (rtx_insn *insn, rtx operands[], int *len)
   if (CONST_INT_P (operands[2]))
 {
   int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL
- && XVECLEN (PATTERN (insn), 0) == 3
- && REG_P (operands[3]));
+&& XVECLEN (PATTERN (insn), 0) == 3
+&& REG_P (operands[3]));
   int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]);
   int k;
   int *t = len;
@@ -6826,8 +6826,9 @@ ashlhi3_out (rtx_insn *insn, rtx operands[], int *len)
  "ror %A0");
 
case 8:
- return *len = 2, ("mov %B0,%A1" CR_TAB
-   "clr %A0");
+ *len = 2;
+ return ("mov %B0,%A1" CR_TAB
+ "clr %A0");
 
case 9:
  *len = 3;
@@ -6974,7 +6975,7 @@ ashlhi3_out (rtx_insn *insn, rtx operands[], int *len)
   len = t;
 }
   out_shift_with_cnt ("lsl %A0" CR_TAB
-  "rol %B0", insn, operands, len, 2);
+ "rol %B0", insn, operands, len, 2);
   return "";
 }
 
@@ -6990,54 +6991,126 @@ avr_out_ashlpsi3 (rtx_insn *insn, rtx *op, int *plen)
   if (CONST_INT_P (op[2]))
 {
   switch (INTVAL (op[2]))
-{
-default:
-  if (INTVAL (op[2]) < 24)
-break;
+   {
+   default:
+ if (INTVAL (op[2]) < 24)
+   break;
 
-  return avr_asm_len ("clr %A0" CR_TAB
-  "clr %B0" CR_TAB
-  "clr %C0", op, plen, 3);
+ return avr_asm_len ("clr %A0" CR_TAB
+ "clr %B0" CR_TAB
+ "clr %C0", op, plen, 3);
 
-case 8:
-  {
-int reg0 = REGNO (op[0]);
-  

[AVR PATCH] Optimize (X>>C)&1 for C in [1, 4, 8, 16, 24] in *insv.any_shift..

2023-11-02 Thread Roger Sayle

This patch optimizes a few special cases in avr.md's *insv.any_shift.
instruction.  This template handles tests for a single bit, where the
result has only a single (different) bit set.  Currently this always
requires a three-instruction sequence of a BST, a CLR and a BLD
(plus any additional CLR instructions to clear the rest of the result
bytes).
The special cases considered here are those that can be done with only two
instructions (plus CLRs): an ANDI preceded by either a MOV, a SHIFT or a
SWAP.

Hence for C=1 in HImode, GCC with -O2 currently generates:

bst r24,1
clr r24
clr r25
bld r24,0

with this patch, we now generate:

lsr r24
andi r24,1
clr r25

Likewise, HImode C=4 now becomes:

swap r24
andi r24,1
clr r25

and SImode C=8 now becomes:

mov r22,r23
andi r22,1
clr 23
clr 24
clr 25
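
The three sequences above can be checked against the plain (X>>C)&1 expression with a byte-level model in C. This is a sketch only: the function names are hypothetical, each statement mimics one AVR instruction on the low byte, and it says nothing about how avr.md selects the sequences.

```c
#include <assert.h>
#include <stdint.h>

/* HImode, C = 1: LSR on the low byte, then ANDI.  */
uint16_t
bit1_hi (uint16_t x)
{
  uint8_t lo = (uint8_t) x;
  lo = (uint8_t) (lo >> 1);                /* lsr r24 */
  lo &= 1;                                 /* andi r24,1 */
  return lo;                               /* clr r25 */
}

/* HImode, C = 4: SWAP exchanges the nibbles of the low byte.  */
uint16_t
bit4_hi (uint16_t x)
{
  uint8_t lo = (uint8_t) x;
  lo = (uint8_t) ((lo << 4) | (lo >> 4));  /* swap r24 */
  lo &= 1;                                 /* andi r24,1 */
  return lo;                               /* clr r25 */
}

/* SImode, C = 8: MOV copies byte 1 into byte 0.  */
uint32_t
bit8_si (uint32_t x)
{
  uint8_t lo = (uint8_t) (x >> 8);         /* mov r22,r23 */
  lo &= 1;                                 /* andi r22,1 */
  return lo;                               /* clr of the upper bytes */
}

/* Exhaustive check over all 16-bit inputs, plus a few 32-bit ones.  */
void
check_single_bit_extracts (void)
{
  uint32_t x;
  for (x = 0; x <= 0xFFFFu; x++)
    {
      assert (bit1_hi ((uint16_t) x) == ((x >> 1) & 1));
      assert (bit4_hi ((uint16_t) x) == ((x >> 4) & 1));
      assert (bit8_si (x) == ((x >> 8) & 1));
    }
  assert (bit8_si (0xFFFFFEFFu) == 0);
  assert (bit8_si (0x00000100u) == 1);
}
```

In each case the wanted bit is first brought to bit 0 of the low byte with a single instruction, so a single ANDI isolates it.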


I've not attempted to model the instruction length accurately for these
special cases; the logic would be ugly, but it's safe to use the current
(1 insn longer) length.

This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions.  If someone could test this more
thoroughly that would be great.


2023-11-02  Roger Sayle  

gcc/ChangeLog
* config/avr/avr.md (*insv.any_shift.): Optimize special
cases of *insv.any_shift that save one instruction by using
ANDI with either a MOV, a SHIFT or a SWAP.

gcc/testsuite/ChangeLog
* gcc.target/avr/insvhi-1.c: New HImode test case.
* gcc.target/avr/insvhi-2.c: Likewise.
* gcc.target/avr/insvhi-3.c: Likewise.
* gcc.target/avr/insvhi-4.c: Likewise.
* gcc.target/avr/insvhi-5.c: Likewise.
* gcc.target/avr/insvqi-1.c: New QImode test case.
* gcc.target/avr/insvqi-2.c: Likewise.
* gcc.target/avr/insvqi-3.c: Likewise.
* gcc.target/avr/insvqi-4.c: Likewise.
* gcc.target/avr/insvsi-1.c: New SImode test case.
* gcc.target/avr/insvsi-2.c: Likewise.
* gcc.target/avr/insvsi-3.c: Likewise.
* gcc.target/avr/insvsi-4.c: Likewise.
* gcc.target/avr/insvsi-5.c: Likewise.
* gcc.target/avr/insvsi-6.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 83dd15040b07..c2a1931733f8 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -9840,6 +9840,7 @@
(clobber (reg:CC REG_CC))]
   "reload_completed"
   {
+int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]);
 int shift =  == ASHIFT ? INTVAL (operands[2]) : -INTVAL 
(operands[2]);
 int mask = GET_MODE_MASK (mode) & INTVAL (operands[3]);
 // Position of the output / input bit, respectively.
@@ -9850,6 +9851,217 @@
 operands[3] = GEN_INT (obit);
 operands[2] = GEN_INT (ibit);
 
+/* Special cases requiring MOV to low byte and ANDI.  */
+if ((shift & 7) == 0 && ldi_ok)
+  {
+   if (IN_RANGE (obit, 0, 7))
+ {
+   if (shift == -8)
+ {
+   if ( == 2)
+ return "mov %A0,%B1\;andi %A0,lo8(1<<%3)\;clr %B0";
+   if ( == 3)
+ return "mov %A0,%B1\;andi %A0,lo8(1<<%3)\;clr %B0\;clr %C0";
+   if ( == 4 && !AVR_HAVE_MOVW)
+ return "mov %A0,%B1\;andi %A0,lo8(1<<%3)\;"
+"clr %B0\;clr %C0\;clr %D0";
+ }
+   else if (shift == -16)
+ {
+   if ( == 3)
+ return "mov %A0,%C1\;andi %A0,lo8(1<<%3)\;clr %B0\;clr %C0";
+   if ( == 4 && !AVR_HAVE_MOVW)
+ return "mov %A0,%C1\;andi %A0,lo8(1<<%3)\;"
+"clr %B0\;clr %C0\;clr %D0";
+ }
+   else if (shift == -24 && !AVR_HAVE_MOVW)
+ return "mov %A0,%D1\;andi %A0,lo8(1<<%3)\;"
+"clr %B0\;clr %C0\;clr %D0";
+ }
+
+   /* Special cases requiring MOV and ANDI.  */
+   else if (IN_RANGE (obit, 8, 15))
+ {
+   if (shift == 8)
+ {
+   if ( == 2)
+ return "mov %B0,%A1\;andi %B0,lo8(1<<(%3-8))\;clr %A0";
+   if ( == 3)
+ return "mov %B0,%A1\;andi %B0,lo8(1<<(%3-8))\;"
+"clr %A0\;clr %C0";
+   if ( == 4 && !AVR_HAVE_MOVW)
+ return "mov %B0,%A1\;andi %B0,lo8(1<<(%3-8))\;"
+"clr %A0\;clr %C0\;clr %D0";
+ }
+   else if (shift == -8)
+ {
+   if ( == 3)
+ return "mov %B0,%C1\;andi %B0,lo8(1<<(%3-8))\;"
+"clr %A0\;clr %C0";
+   if ( == 4 && !AVR_HAVE_MOVW)
+ return "mov %B0,%C1\;andi %B0,lo8(1<<(%3-8))\;"
+"clr %B0\;clr %C0\;clr %D0";
+  

Re: [PATCH] doc: explicitly say 'lifetime' for DCE

2023-11-02 Thread Richard Biener
On Thu, Nov 2, 2023 at 11:25 AM Sam James  wrote:
>
>
> Richard Biener  writes:
>
> > On Thu, Nov 2, 2023 at 10:03 AM Sam James  wrote:
> >>
> >> Say 'memory lifetime' rather than 'memory life' as lifetime is the more
> >> standard term nowadays (indeed we have e.g. -fno-lifetime-dse).
> >>
> >> It's also easier to grep for if someone is looking for the documentation on
> >> where we do that.
> >
> > OK
>
> Could you push for me please? I have a sw account but no gcc access
> (yet).

Done after fixing ChangeLog format.

Richard.

> cheers
>
> >
> >> gcc/ChangeLog:
> >> * doc/passes.texi (Dead code elimination): Explicitly say 'lifetime'
> >> as this has become the standard term for what we're doing here.
> >>
> >> Signed-off-by: Sam James 
> >> ---
> >>  gcc/doc/passes.texi | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> >> index eb2bb6062834..470ac498a132 100644
> >> --- a/gcc/doc/passes.texi
> >> +++ b/gcc/doc/passes.texi
> >> @@ -543,7 +543,7 @@ and is defined by @code{pass_early_warn_uninitialized} 
> >> and
> >>  @item Dead code elimination
> >>
> >>  This pass scans the function for statements without side effects whose
> >> -result is unused.  It does not do memory life analysis, so any value
> >> +result is unused.  It does not do memory lifetime analysis, so any value
> >>  that is stored in memory is considered used.  The pass is run multiple
> >>  times throughout the optimization process.  It is located in
> >>  @file{tree-ssa-dce.cc} and is described by @code{pass_dce}.
> >> --
> >> 2.42.0
> >>
>


[PATCH v1] RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator

2023-11-02 Thread pan2 . li
From: Pan Li 

The previous rounding APIs prefixed with i/l/ll only worked when the
source and destination modes had the same size, as shown below, and the
iterator was arranged similarly to fcvt.

* SF => SI
* DF => DI

After this limitation was lifted in the middle-end, these APIs can also
be vectorized with different type sizes, namely:

* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI

The existing iterator cannot handle this easily, so this patch
re-arranges it into two iterators:

* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI

The related mode_attrs are updated to match the new iterators.

gcc/ChangeLog:

* config/riscv/autovec.md (lrint2): Remove.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
(lrint2): New pattern for cvt from
FP to SI.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
(lrint2): New pattern for cvt from
FP to DI.
(lround2): Ditto.
(lceil2): Ditto.
(lfloor2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md  |  72 +++---
 gcc/config/riscv/vector-iterators.md | 199 ---
 2 files changed, 237 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f5e3e347ace..81acb1a815b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2395,42 +2395,82 @@ (define_expand "roundeven2"
   }
 )
 
-(define_expand "lrint2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+(define_expand "lrint2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lround2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+(define_expand "lrint2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lrint (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lceil2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+(define_expand "lround2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
 
-(define_expand "lfloor2"
-  [(match_operand:0 "register_operand")
-   (match_operand:V_VLS_FCONVERT_I_L_LL 1 "register_operand")]
+(define_expand "lround2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
   "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
-riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
+riscv_vector::expand_vec_lround (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lceil2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lceil2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lceil (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lfloor2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
+
+(define_expand "lfloor2"
+  [(match_operand:   0 "register_operand")
+   (match_operand:V_VLS_F_CONVERT_DI 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
 DONE;
   }
 )
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index d9b5dec5edb..f2d9f60b631 100644
--- a/gcc/config/riscv/vector-iter

RE: [PATCH] RISC-V: Fix bug of AVL propagation PASS

2023-11-02 Thread Li, Pan2
Committed, thanks Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Thursday, November 2, 2023 7:34 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@gmail.com; kito.ch...@sifive.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix bug of AVL propagation PASS

LGTM.

Regards
 Robin


Re: [PATCH] RISC-V: Fix bug of AVL propagation PASS

2023-11-02 Thread Robin Dapp
LGTM.

Regards
 Robin


[PATCH] RISC-V: Fix bug of AVL propagation PASS

2023-11-02 Thread Juzhe-Zhong
A run FAIL that I didn't have before suddenly showed up for me today:
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c 
execution test

After investigating, I realized that there is a bug in the AVL propagation PASS.

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc 
(pass_avlprop::get_vlmax_ta_preferred_avl): Don't allow non-real insn AVL 
propagation.

---
 gcc/config/riscv/riscv-avlprop.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index bec1e3c715a..1dfaa8742da 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -308,6 +308,13 @@ pass_avlprop::get_vlmax_ta_preferred_avl (insn_info *insn) 
const
  def_info *def2 = dl.prev_def (use_insn);
  if (!def1 || !def2 || def1 != def2)
return NULL_RTX;
+ /* For vectorized codes, we always use SELECT_VL/MIN_EXPR to
+calculate the loop len at the header of the loop.
+We only allow AVL propagation for real instruction for now.
+TODO: We may enhance it for intrinsic codes if it is necessary.
+ */
+ if (!def1->insn ()->is_real ())
+   return NULL_RTX;
 
  /* FIXME: We only all AVL propation within a block which should
 be totally enough for vectorized codes.
-- 
2.36.3



Re: [tree-optimization/111721] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-02 Thread Richard Biener
On Thu, 2 Nov 2023, Juzhe-Zhong wrote:

> This patch fixes following FAILs for RVV:
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> 
> Bootstrap on X86 and regtest passed.
> 
> Tested on aarch64 passed.
> 
> Ok for trunk ?
> 
> PR tree-optimization/111721
> 
> gcc/ChangeLog:
> 
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> dummy mask -1.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
> 
> ---
>  gcc/tree-vect-slp.cc   | 14 --
>  gcc/tree-vect-stmts.cc |  8 +++-
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..23ca0318e31 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -756,8 +756,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>   {
> tree type = TREE_TYPE (oprnd);
> dt = dts[i];
> -   if ((dt == vect_constant_def
> -|| dt == vect_external_def)
> +   if (dt == vect_external_def
> && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> && (TREE_CODE (type) == BOOLEAN_TYPE
> || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> @@ -769,6 +768,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>"for variable-length SLP %T\n", oprnd);
> return -1;
>   }
> +   if (dt == vect_constant_def
> +   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> +   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "Build SLP failed: invalid type of def "
> +  "for variable-length SLP %T\n",
> +  oprnd);
> +   return -1;
> + }

I don't think that's quite correct.  can_duplicate_and_interleave_p
doesn't get enough info here and IIRC even materializing arbitrary
constants isn't possible with VLA vectors.  The very first thing
the function does is

  tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, 
count);
  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
return false;

but for masks that's not going to get us the correct vector type.
While I don't understand why we have that 'BOOLEAN_TYPE' special
case (maybe the intent was to identify 'mask' operands that way?),
we might want to require that we can materialize both all-zero
and all-ones constant 'mask's.  But then 'mask' operands should
be properly identified here.

Maybe we can also simply delay the check to the point we know
whether we're facing a uniform constant or not (note for 'first',
we cannot really special-case vect_constant_def as the second
SLP lane might demote that to vect_external_def).  It's always
a balance of whether to reject sth at SLP build time (possibly
allowing operand swapping to do magic) or to delay checks
to stmt analysis time.  That might also explain that you
do not see fallout of the "wrong" change (the later checking
will catch it anyway).

There's probably testsuite coverage for SVE here.

That said, a "correct" patch might be to simply change

  && (TREE_CODE (type) == BOOLEAN_TYPE
      || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
                                          type)))

to

  && TREE_CODE (type) != BOOLEAN_TYPE
  && !can_duplicate_and_interleave_p (vinfo, stmts.length (),
                                      type)

thus delay 'mask' operand validation here.

Note I still think we should improve TREE_CODE (type) == BOOLEAN_TYPE
to identify internal function mask operands only.
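
To make the difference between the two conditions concrete, here is a
minimal C sketch of just the final conjunct.  It assumes the preceding
tests (constant or external def, non-constant vector mode size) already
hold.  The function names and the two boolean parameters are hypothetical
stand-ins for TREE_CODE (type) == BOOLEAN_TYPE and for the result of
can_duplicate_and_interleave_p; this is not GCC code.

```c
#include <stdbool.h>

/* Hypothetical model of the current check: a boolean-typed operand is
   rejected outright, anything else is rejected only when it cannot be
   duplicated and interleaved.  */
static bool
rejects_current (bool is_boolean_type, bool can_dup_interleave)
{
  return is_boolean_type || !can_dup_interleave;
}

/* Hypothetical model of the suggested check: boolean-typed operands are
   never rejected here (their validation is left to later analysis);
   other operands are treated as before.  */
static bool
rejects_suggested (bool is_boolean_type, bool can_dup_interleave)
{
  return !is_boolean_type && !can_dup_interleave;
}
```

With these stand-ins, a boolean-typed operand is always rejected by the
current form and never by the suggested form, while non-boolean operands
get the same answer from both.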

Richard.


>  
> /* For the swapping logic below force vect_reduction_def
>for the reduction op in a SLP reduction group.  */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ce4868d3e1..6c47121e158 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
>mask_index = internal_fn_mask_index (ifn);
>if (mask_index >= 0 && slp_node)
>   mask_index = vect_slp_child_index_for_operand (call, mask_index);
> +  slp_tree slp_op = NULL;
>if (mask_index >= 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> -   &mask, NULL, &mask_dt, &mask_vectype))
> +   &mask, &slp_op, &mask_dt, &mask_vectype))
>   return false;
> +  /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
> +  MASK_VECTYPE.  */
>

Re: [committed] libstdc++: Minor update to installation docs

2023-11-02 Thread Jonathan Wakely
On Wed, 1 Nov 2023 at 22:11, Gerald Pfeifer  wrote:
>
> On Mon, 18 Sep 2023, Jonathan Wakely via Gcc-patches wrote:
> > @@ -103,8 +103,10 @@ ln -s libiconv-1.16 libiconv
> >   
> > If GCC 3.1.0 or later on is being used on GNU/Linux, an attempt
> > will be made to use "C" library functionality necessary for
> > -   C++ named locale support.  For GCC 4.6.0 and later, this
> > -   means that glibc 2.3 or later is required.
> > +   C++ named locale support, e.g. the newlocale
> > +   and uselocale functions.
> > +   For GCC 4.6.0 and later,
> > +   this means that glibc 2.3 or later is required.
>
> Do we still need to provide those details on GCC 3.1+ and GCC 4.6+?
>
> Would it make sense to simply require glibc 2.3 (or higher)?

Yes that probably makes sense now.



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-02 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 11:18:09AM +0100, Richard Biener wrote:
> > Or, if we want to pay further price, .ACCESS_WITH_SIZE could take as one of
> > the arguments not the size value, but its address.  Then at __bdos time
> > we would dereference that pointer to get the size.
> > So,
> > struct S { int a; char b __attribute__((counted_by (a))) []; };
> > struct S s;
> > s.a = 5;
> > char *p = &s.b[2];
> > int i1 = __builtin_dynamic_object_size (p, 0);
> > s.a = 3;
> > int i2 = __builtin_dynamic_object_size (p, 0);
> > would then yield 3 and 1 rather than 3 and 3.
> 
> I fail to see how we can get the __builtin_dynamic_object_size call
> data dependent on s.a, thus avoid re-ordering or even DSE of the
> store.

If &s.b[2] is lowered as
sz_1 = s.a;
tmp_2 = .ACCESS_WITH_SIZE (&s.b[0], sz_1);
p_3 = &tmp_2[2];
then sure, there is no way, you get the size from that point.
tree-object-size.cc tracking then determines that in a particular
case the pointer is size-associated with sz_1 and uses that value
as the size (with the usual adjustments for pointer arithmetic and the
like).

What I meant is to emit
tmp_4 = .ACCESS_WITH_SIZE (&s.b[0], &s.a, (typeof (&s.a)) 0);
p_5 = &tmp_4[2];
i.e. don't associate the pointer with a value of the size, but with
an address where to find the size (plus how large it is); basically, let
the pointer to the size escape at that point.  And
__builtin_dynamic_object_size is pure, so supposedly it can depend on
what the escaped pointer points to.
We'd see that a particular pointer is size-associated with the &s.a address
and would use that address cast to the type of the third argument (to
preserve the exact pointer type on INTEGER_CST; though I'm not sure,
wouldn't VN CSE it anyway if one has, say,
union U { struct S { int a; char b __attribute__((counted_by (a))) []; } s;
          struct T { char c, d, e, f; char g __attribute__((counted_by (c))) []; } t; };
and
.ACCESS_WITH_SIZE (&v.s.b[0], &v.s.a, (int *) 0);
...
.ACCESS_WITH_SIZE (&v.t.g[0], &v.t.c, (int *) 0);
?)

It would mean though that counted_by wouldn't be allowed to be a
bit-field...
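
As an illustration only, the two lowerings can be modelled in plain C.
This is a sketch, not what tree-object-size.cc does: the struct and
function names are hypothetical, the flexible array is replaced by a
fixed char b[8], and the counted_by attribute itself is not used, so the
code compiles with any C compiler.

```c
#include <stddef.h>

/* Stand-in for the example struct; b would be a flexible array member
   with counted_by (a) in the real case.  */
struct S { int a; char b[8]; };

/* Model of the first lowering: the size VALUE is captured when the
   pointer is formed.  */
struct size_by_value { const char *base; int size; };

/* Model of the second lowering: only the ADDRESS of the size is
   captured; the size is read when the query is made.  */
struct size_by_address { const char *base; const int *size_ptr; };

static struct size_by_value
associate_value (const struct S *s)
{
  struct size_by_value t = { s->b, s->a };
  return t;
}

static struct size_by_address
associate_address (const struct S *s)
{
  struct size_by_address t = { s->b, &s->a };
  return t;
}

/* Remaining bytes from p to the end of the counted array.  */
static int
remaining_by_value (struct size_by_value t, const char *p)
{
  return t.size - (int) (p - t.base);
}

static int
remaining_by_address (struct size_by_address t, const char *p)
{
  return *t.size_ptr - (int) (p - t.base);
}
```

Running the quoted sequence against this model (s.a = 5, p = &s.b[2],
query, s.a = 3, query) gives 3 and 3 for the value form and 3 and 1 for
the address form, matching the numbers discussed above.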

Jakub



[PATCH] tree-optimization/112320 - bogus debug IL after SCCP

2023-11-02 Thread Richard Biener
The following addresses wrong debug IL created by SCCP rewriting stmts
to defined overflow.  I addressed another inefficiency there but
needed to adjust the API of rewrite_to_defined_overflow for this
which is now taking a stmt iterator for in-place operation and a
stmt for sequence producing because gsi_for_stmt doesn't work for
stmts not in the IL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112320
* gimple-fold.h (rewrite_to_defined_overflow): New overload
for in-place operation.
* gimple-fold.cc (rewrite_to_defined_overflow): Add stmt
iterator argument to worker, define separate API for
in-place and not in-place operation.
* tree-if-conv.cc (predicate_statements): Simplify.
* tree-scalar-evolution.cc (final_value_replacement_loop):
Likewise.
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Adjust.
* tree-ssa-reassoc.cc (update_range_test): Likewise.

* gcc.dg/pr112320.c: New testcase.
---
 gcc/gimple-fold.cc  | 25 ++---
 gcc/gimple-fold.h   |  3 ++-
 gcc/testsuite/gcc.dg/pr112320.c | 14 ++
 gcc/tree-if-conv.cc | 19 +--
 gcc/tree-scalar-evolution.cc| 15 ---
 gcc/tree-ssa-ifcombine.cc   |  2 +-
 gcc/tree-ssa-reassoc.cc |  2 +-
 7 files changed, 41 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112320.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 853edd9e5d4..a5be2ee048b 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8769,12 +8769,14 @@ arith_code_with_undefined_signed_overflow (tree_code code)
its operand, carrying out the operation in the corresponding unsigned
type and converting the result back to the original type.
 
-   If IN_PLACE is true, adjust the stmt in place and return NULL.
+   If IN_PLACE is true, *GSI points to STMT, adjust the stmt in place and
+   return NULL.
Otherwise returns a sequence of statements that replace STMT and also
contain a modified form of STMT itself.  */
 
-gimple_seq
-rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
+static gimple_seq
+rewrite_to_defined_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
+bool in_place)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -8801,9 +8803,8 @@ rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
   gimple_set_modified (stmt, true);
   if (in_place)
 {
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
   if (stmts)
-   gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
   stmts = NULL;
 }
   else
@@ -8811,8 +8812,7 @@ rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
   gimple *cvt = gimple_build_assign (lhs, NOP_EXPR, gimple_assign_lhs (stmt));
   if (in_place)
 {
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  gsi_insert_after (&gsi, cvt, GSI_SAME_STMT);
+  gsi_insert_after (gsi, cvt, GSI_SAME_STMT);
   update_stmt (stmt);
 }
   else
@@ -8821,6 +8821,17 @@ rewrite_to_defined_overflow (gimple *stmt, bool in_place /* = false */)
   return stmts;
 }
 
+void
+rewrite_to_defined_overflow (gimple_stmt_iterator *gsi)
+{
+  rewrite_to_defined_overflow (gsi, gsi_stmt (*gsi), true);
+}
+
+gimple_seq
+rewrite_to_defined_overflow (gimple *stmt)
+{
+  return rewrite_to_defined_overflow (nullptr, stmt, false);
+}
 
 /* The valueization hook we use for the gimple_build API simplification.
This makes us match fold_buildN behavior by only combining with
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 2fd58db9a2e..f69bcc7d3e4 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -60,7 +60,8 @@ extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
-extern gimple_seq rewrite_to_defined_overflow (gimple *, bool = false);
+extern void rewrite_to_defined_overflow (gimple_stmt_iterator *);
+extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
 extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 extern void gsi_replace_with_seq_vops (gimple_stmt_iterator *, gimple_seq);
diff --git a/gcc/testsuite/gcc.dg/pr112320.c b/gcc/testsuite/gcc.dg/pr112320.c
new file mode 100644
index 000..15cf39f898c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr112320.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+
+unsigned void0_effective_addr2;
+int void0_i, void0_m, void0_p2;
+void void0()
+{
+  void0_m = 800 - (void0_effective_addr2 & 5);
+  int b1;
+  void0_i = 0;
+  for (; void0_i < void0
