Re: RFC: Introduce -fhardened to enable security-related flags

2023-09-16 Thread Sam James via Gcc-patches


Hans-Peter Nilsson via Gcc-patches writes:

>> Date: Tue, 29 Aug 2023 15:42:27 -0400
>> From: Marek Polacek via Gcc-patches 
>
>> Surely, there must be no ABI impact, the option cannot cause
>> severe performance issues,
>
>> Currently, -fhardened enables:
> ...
>>   -ftrivial-auto-var-init=zero
>
>> Thoughts?
>
> Regarding -ftrivial-auto-var-init=zero, I was consulted when
> colleagues investigating a performance regression
> pinpointed it as *causing severe performance issues*;
> cf. https://github.com/systemd/systemd.git commit 1a4e392760
> (TL;DR: adds "-ftrivial-auto-var-init=zero" to the systemd
> build).
>
> The situation was described as "we noticed that some test
> suites takes 35% percent longer time to finish.  After
> further investigation it was noticed that running systemctl
> unmask x takes around 5s more time on [version including
> patch vs. before that patch]" (timing out some tests).
> Reverting that patch fixed the drop in performance.

Did some bug ever get filed for this to see if we can do a bit
better here?

Some slowdown doesn't mean it's of the expected magnitude.

>
> Just a data point, but I believe also exactly your intended
> use.  IMO including -ftrivial-auto-var-init is worth extra
> consideration.
>
> Alternatively, strike the whole "cannot cause severe
> performance issues".
>
> brgds, H-P



Re: RFC: Introduce -fhardened to enable security-related flags

2023-09-16 Thread Hans-Peter Nilsson via Gcc-patches
> Date: Tue, 29 Aug 2023 15:42:27 -0400
> From: Marek Polacek via Gcc-patches 

> Surely, there must be no ABI impact, the option cannot cause
> severe performance issues,

> Currently, -fhardened enables:
...
>   -ftrivial-auto-var-init=zero

> Thoughts?

Regarding -ftrivial-auto-var-init=zero, I was consulted when
colleagues investigating a performance regression
pinpointed it as *causing severe performance issues*;
cf. https://github.com/systemd/systemd.git commit 1a4e392760
(TL;DR: adds "-ftrivial-auto-var-init=zero" to the systemd
build).

The situation was described as "we noticed that some test
suites takes 35% percent longer time to finish.  After
further investigation it was noticed that running systemctl
unmask x takes around 5s more time on [version including
patch vs. before that patch]" (timing out some tests).
Reverting that patch fixed the drop in performance.
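
As a rough illustration of where such a cost can come from (a hypothetical
sketch, not code from systemd or from the report quoted above): with
-ftrivial-auto-var-init=zero the compiler is asked to zero-fill an automatic
buffer that would otherwise be left uninitialized, on every entry to the
function, even if only a few bytes of it are ever used.  The function name
and the buffer size below are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not taken from systemd.  Only a short prefix of
// `buf` is ever written or read.
std::size_t copy_prefix(const char *src)
{
  // Left uninitialized in a default build.  When built with
  // -ftrivial-auto-var-init=zero, the compiler is asked to zero all
  // 4096 bytes on each call.
  char buf[4096];

  std::size_t n = std::strlen(src);
  if (n >= sizeof buf)
    n = sizeof buf - 1;
  std::memcpy(buf, src, n);
  buf[n] = '\0';
  return std::strlen(buf);
}
```

Whether the extra stores matter depends on how hot the function is and on
how much of the initialization the optimizer can prove dead.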

Just a data point, but I believe also exactly your intended
use.  IMO including -ftrivial-auto-var-init is worth extra
consideration.

Alternatively, strike the whole "cannot cause severe
performance issues".

brgds, H-P


[PATCH] RISC-V: Support VLS modes reduction[PR111153]

2023-09-16 Thread Juzhe-Zhong
This patch supports VLS reduction vectorization.

It improves the reduction vectorization codegen under the current cost
model.

#define DEF_REDUC_PLUS(TYPE)                        \
  TYPE __attribute__ ((noinline, noclone))          \
  reduc_plus_##TYPE (TYPE *__restrict a, int n)     \
  {                                                 \
    TYPE r = 0;                                     \
    for (int i = 0; i < n; ++i)                     \
      r += a[i];                                    \
    return r;                                       \
  }

#define TEST_PLUS(T)                                \
  T (int32_t)

TEST_PLUS (DEF_REDUC_PLUS)


Before this patch:

vle32.v v2,0(a5)
addi    a5,a5,16
vadd.vv v1,v1,v2
bne     a5,a4,.L4
lui     a4,%hi(.LC0)
lui     a5,%hi(.LC1)
addi    a4,a4,%lo(.LC0)
vlm.v   v0,0(a4)
addi    a5,a5,%lo(.LC1)
andi    a1,a1,-4
vmv1r.v v2,v3
vlm.v   v4,0(a5)
vcompress.vm    v2,v1,v0
vmv1r.v v0,v4
vadd.vv v1,v2,v1
vcompress.vm    v3,v1,v0
vadd.vv v3,v3,v1
vmv.x.s a0,v3
sext.w  a0,a0
beq     a3,a1,.L12

After this patch:

vle32.v v2,0(a5)
addi    a5,a5,16
vadd.vv v1,v1,v2
bne     a5,a4,.L4
li      a5,0
andi    a1,a1,-4
vmv.s.x v2,a5
vredsum.vs      v1,v1,v2
vmv.x.s a0,v1
beq     a3,a1,.L12
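
The shape of the two sequences can be modeled in plain C++.  This is a sketch
only: the 4-lane width is inferred from the 16-byte stride in the loop above,
and the function names are made up for illustration.  The vector loop keeps
per-lane partial sums in both versions; the difference lies in the final step
that collapses the lanes into one scalar, which vredsum.vs performs in a
single instruction.

```cpp
#include <cstdint>

// Scalar reference, equivalent to DEF_REDUC_PLUS (int32_t) without the
// noinline/noclone attributes.
int32_t reduc_plus_scalar(const int32_t *a, int n)
{
  int32_t r = 0;
  for (int i = 0; i < n; ++i)
    r += a[i];
  return r;
}

// Hypothetical model of the vectorized form with 4 lanes.
int32_t reduc_plus_lanes(const int32_t *a, int n)
{
  int32_t acc[4] = {0, 0, 0, 0};
  int i = 0;
  // Main loop: lane-wise adds (models vle32.v + vadd.vv).
  for (; i + 4 <= n; i += 4)
    for (int lane = 0; lane < 4; ++lane)
      acc[lane] += a[i + lane];
  // Horizontal reduction of the lanes (models vredsum.vs).
  int32_t r = acc[0] + acc[1] + acc[2] + acc[3];
  // Scalar epilogue for the remaining n % 4 elements.
  for (; i < n; ++i)
    r += a[i];
  return r;
}
```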

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction case.
* gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.

---
 gcc/config/riscv/autovec.md   |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/def.h| 30 +++
 .../riscv/rvv/autovec/vls/reduc-1.c   | 31 +++
 .../riscv/rvv/autovec/vls/reduc-10.c  | 50 
 .../riscv/rvv/autovec/vls/reduc-11.c  | 46 +++
 .../riscv/rvv/autovec/vls/reduc-12.c  | 30 +++
 .../riscv/rvv/autovec/vls/reduc-13.c  | 28 +++
 .../riscv/rvv/autovec/vls/reduc-14.c  | 26 ++
 .../riscv/rvv/autovec/vls/reduc-15.c  | 81 +++
 .../riscv/rvv/autovec/vls/reduc-16.c  | 75 +
 .../riscv/rvv/autovec/vls/reduc-17.c  | 69 
 .../riscv/rvv/autovec/vls/reduc-18.c  | 63 +++
 .../riscv/rvv/autovec/vls/reduc-19.c  | 18 +
 .../riscv/rvv/autovec/vls/reduc-2.c   | 29 +++
 .../riscv/rvv/autovec/vls/reduc-20.c  | 17 
 .../riscv/rvv/autovec/vls/reduc-21.c  | 16 
 .../riscv/rvv/autovec/vls/reduc-3.c   | 27 +++
 .../riscv/rvv/autovec/vls/reduc-4.c   | 25 ++
 .../riscv/rvv/autovec/vls/reduc-5.c   | 18 +
 .../riscv/rvv/autovec/vls/reduc-6.c   | 17 
 .../riscv/rvv/autovec/vls/reduc-7.c   | 16 
 .../riscv/rvv/autovec/vls/reduc-8.c   | 58 +
 .../riscv/rvv/autovec/vls/reduc-9.c   | 54 +
 23 files changed, 825 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-14.c
 create mode 100644 

[PATCH] MATCH: Avoid recusive zero_one_valued_p for conversions

2023-09-16 Thread Andrew Pinski via Gcc-patches
When VN finds a name that has a nop conversion, it treats both names
as equivalent to each other, and the valueization function for one
returns the other. This normally causes no issues because there are
no recursive matches. But r14-4038-gb975c0dc3be285 added one, so we
would recurse infinitely on the match and never finish.
This fixes the issue (and adds a comment in match.pd) by handling
only one level for converts instead of always recursing.
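
A toy model of the problem and of the fix (hypothetical types and names, not
GCC's internal API): two names that are nop conversions of each other form a
cycle, so a predicate that re-applies itself to the conversion operand never
terminates, whereas one that inspects only the operand's known nonzero bits
looks exactly one level deep.

```cpp
// Toy stand-in for an SSA name; not GCC's tree representation.
struct Name {
  const Name *conv_src;   // operand of a nop conversion, or nullptr
  unsigned nonzero_bits;  // conservative mask of bits that may be set
};

// One-level check in the spirit of the patch: consult the conversion
// operand's nonzero bits instead of re-applying the predicate to it.
bool zero_one_valued_p(const Name &n)
{
  if (n.nonzero_bits <= 1u)
    return true;
  if (n.conv_src != nullptr)
    return n.conv_src->nonzero_bits <= 1u;
  return false;
}
```

Because the function never calls itself, a pair of mutually equivalent names
cannot drive it into unbounded recursion.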

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note the testcase was reduced from tree-ssa-loop-niter.cc and then
changed slightly into C rather than C++, but it still needs exceptions
turned on to get the IR in which VN produces this equivalence relationship.
I also had to turn off early inlining to force put to be inlined later.

PR tree-optimization/111435

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Don't do recursion
on converts.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111435-1.c: New test.
---
 gcc/match.pd   |  8 +++-
 .../gcc.c-torture/compile/pr111435-1.c | 18 ++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111435-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 97405e6a5c3..887665633d4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2188,8 +2188,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* A conversion from an zero_one_valued_p is still a [0,1].
This is useful when the range of a variable is not known */
+/* Note this matches can't be recusive because of the way VN handles
+   nop conversions being equivalent and then recusive between them. */
 (match zero_one_valued_p
- (convert@0 zero_one_valued_p))
+ (convert@0 @1)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && (TYPE_UNSIGNED (TREE_TYPE (@1))
+ || TYPE_PRECISION (TREE_TYPE (@1)) > 1)
+  && wi::leu_p (tree_nonzero_bits (@1), 1))))
 
 /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }.  */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
new file mode 100644
index 000..afa84dd59dd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fexceptions -fno-early-inlining" } */
+/* { dg-require-effective-target exceptions } */
+
+void find_slot_with_hash(const int *);
+
+void put(const int *k, const int *) {
+find_slot_with_hash(k);
+}
+unsigned len();
+int *address();
+void h(int header, int **bounds) {
+  if (!*bounds)
+return;
+  unsigned t = *bounds ? len() : 0;
+  int queue_index = t;
+  address()[(unsigned)queue_index] = 0;
+  put(&header, &queue_index);
+}
-- 
2.31.1



Re: [PATCH] LoongArch: Fix lo_sum rtx cost

2023-09-16 Thread chenglulu



On 2023/9/16 at 10:52 PM, WANG Xuerui wrote:

Hi,

On 9/16/23 17:16, mengqinggang wrote:

The cost of the lo_sum rtx for the addi.d instruction may be a very large
number if computed by the common function. This may cause some symbols to be
saved to the stack and loaded back from the stack if there are not enough
registers during loop optimization.


Thanks for the patch! It seems though this change is done in order to 
optimize some previously pathetic codegen, am I right? If so, it's 
appreciated to have a minimal test case attached, in order to ensure 
that codegen never regresses. (You can have your teammates help you if 
you're not familiar with that.)


This is a performance optimization problem that Meng Qinggang discovered 
while debugging the spec. The specific test cases are not easy to 
extract.


We will try to extract a simple test case that reproduces this optimization. 
If we cannot, we will note that in the description.


Thanks!





gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum 
cost.

---
  gcc/config/loongarch/loongarch.cc | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc

index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
       *total = COSTS_N_INSNS (4);
       return false;
 
+    case LO_SUM:
+      *total = set_src_cost (XEXP (x, 0), mode, speed);
+      return true;
+
In order for the code to be more maintainable, it may be better to 
duplicate some of the change reasons here, just in case someone in the 
future questions this piece of code that's without any explanation, 
and regresses things (because there's no test case).

  case LT:
  case LTU:
  case LE:




Re: [PATCH] c++: constness of decltype of NTTP object [PR98820]

2023-09-16 Thread Patrick Palka via Gcc-patches
On Sat, 16 Sep 2023, Jason Merrill wrote:

> On 9/15/23 13:55, Patrick Palka wrote:
> > This corrects decltype of a (class) NTTP object as per
> > [dcl.type.decltype]/1.2 and [temp.param]/6 in the type-dependent case.
> > In the non-dependent case (nontype-class8.C) we resolve the decltype
> > ahead of time, and finish_decltype_type already made sure to drop the
> > const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX.
> 
> Hmm, seems like dropping the VIEW_CONVERT_EXPR is wrong in this case? I'm not
> sure why I added that.

Ah sorry, my commit message was a bit sloppy.

In the non-dependent case we resolve the decltype ahead of time, in
which case finish_decltype_type drops the const VIEW_CONVERT_EXPR
wrapper around the TEMPLATE_PARM_INDEX, and the latter has the
desired non-const type.

In the type-dependent case, tsubst drops the VIEW_CONVERT_EXPR
because the substituted class NTTP is the already const object
created by get_template_parm_object.  So finish_decltype_type
at instantiation time sees the bare const object, which this patch
now adds special handling for.

So we need to continue dropping the VIEW_CONVERT_EXPR to handle the
non-dependent case.

> 
> Jason
> 
> 



Re: [PATCH] c++: overeager type completion in convert_to_void [PR111419]

2023-09-16 Thread Patrick Palka via Gcc-patches
On Sat, 16 Sep 2023, Jason Merrill wrote:

> On 9/15/23 12:03, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > -- >8 --
> > 
> > Here convert_to_void always completes the type of an INDIRECT_REF or
> > VAR_DECL expression, but according to [expr.context] an lvalue-to-rvalue
> > conversion is applied to a discarded-value expression only if "the
> > expression is a glvalue of volatile-qualified type".  This patch restricts
> > convert_to_void's type completion accordingly.
> > 
> > PR c++/111419
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cvt.cc (convert_to_void) : Only call
> > complete_type if the type is volatile and the INDIRECT_REF
> > isn't an implicit one.
> 
> Hmm, what does implicit have to do with it?  The expression forms listed in
> https://eel.is/c++draft/expr.context#2 include "id-expression"...

When there's an implicit INDIRECT_REF, I reckoned the type of the
id-expression is really a reference type, which can't be cv-qualified?

> 
> > diff --git a/gcc/testsuite/g++.dg/expr/discarded1a.C
> > b/gcc/testsuite/g++.dg/expr/discarded1a.C
> > new file mode 100644
> > index 000..5516ff46fe9
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/expr/discarded1a.C
> > @@ -0,0 +1,16 @@
> > +// PR c++/111419
> > +
> > +struct Incomplete;
> > +
> > +template<class T> struct Holder { T t; }; // { dg-error "incomplete" }
> > +
> > +extern volatile Holder<Incomplete> a;
> > +extern volatile Holder<Incomplete>& b;
> > +extern volatile Holder<Incomplete>* c;
> > +
> > +int main() {
> > +  a; // { dg-message "required from here" }
> > +  b; // { dg-warning "implicit dereference will not access object" }
> > + // { dg-bogus "required from here" "" { target *-*-* } .-1 }
> 
> ...so it seems to me this line should get the lvalue-rvalue conversion (and
> not the warning about no access).
> 
> > +  *c; // { dg-message "required from here" }
> > +}
> 
> 



Re: [PATCH] c++: always check arity before deduction

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/12/23 20:33, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

This simple patch extends the r12-3271-gf1e73199569287 optimization
to apply to deduction without explicit template arguments as well.
The motivation for this is to accept testcases such as conv20.C and
ttp40.C below, which don't use explicit template arguments but for which
unnecessary template instantiation during deduction could be avoided if
we pruned overloads according to arity early in this case as well.  This
incidentally causes us to accept one reduced testcase from PR c++/84075,
but the underlying issue there still remains unfixed.

As an added bonus, this change ends up causing the "candidate expects
N argument(s)" note during overload resolution failure to point to the
template candidate instead of the call site, which seems like an
improvement similar to r14-309-g14e881eb030509.
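
The early arity check can be pictured as follows.  This is a simplified
sketch with hypothetical types; the real check in add_template_candidate_real
works on the template's parameter list.  A candidate whose parameter count
cannot match the number of call arguments is rejected before any deduction
or instantiation is attempted.

```cpp
// Hypothetical summary of a function template's parameter list.
struct Candidate {
  int min_arity;   // parameters that must receive an argument
  int max_arity;   // all non-pack parameters
  bool variadic;   // true if the list ends in a pack or an ellipsis
};

// Returns false when the call can be rejected on arity alone.
bool arity_viable(const Candidate &c, int nargs)
{
  if (nargs < c.min_arity)
    return false;
  if (!c.variadic && nargs > c.max_arity)
    return false;
  return true;
}
```

For a declaration shaped like foo(T1, Tn...), this model uses min_arity 1
with variadic set, so a call with no arguments is rejected up front, which
matches the "candidate expects at least 1 argument, 0 provided" note.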

gcc/cp/ChangeLog:

* call.cc (add_template_candidate_real): Check arity even
when there are no explicit template arguments.  Combine the
two adjacent '!obj' tests into one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/vt-57397-1.C: Expect "candidate expects ... N
argument(s)" at the declaration site instead of the call site.
* g++.dg/cpp0x/vt-57397-2.C: Likewise.
* g++.dg/overload/template5.C: Likewise.
* g++.dg/template/local6.C: Likewise.
* g++.dg/template/conv20.C: New test.
* g++.dg/template/ttp40.C: New test.
---
  gcc/cp/call.cc| 14 ++---
  gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C   |  6 +++---
  gcc/testsuite/g++.dg/cpp0x/vt-57397-2.C   |  6 +++---
  gcc/testsuite/g++.dg/overload/template5.C |  4 ++--
  gcc/testsuite/g++.dg/template/conv20.C| 17 +++
  gcc/testsuite/g++.dg/template/local6.C|  4 ++--
  gcc/testsuite/g++.dg/template/ttp40.C | 25 +++
  7 files changed, 58 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/conv20.C
  create mode 100644 gcc/testsuite/g++.dg/template/ttp40.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 399345307ea..2bbaeee039d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3535,13 +3535,13 @@ add_template_candidate_real (struct z_candidate 
**candidates, tree tmpl,
  }
gcc_assert (ia == nargs_without_in_chrg);
  
-  if (!obj && explicit_targs)

+  if (!obj)
  {
/* Check that there's no obvious arity mismatch before proceeding with
 deduction.  This avoids substituting explicit template arguments
-into the template (which could result in an error outside the
-immediate context) when the resulting candidate would be unviable
-anyway.  */
+into the template or e.g. derived-to-base parm/arg unification
+(which could result in an error outside the immediate context) when
+the resulting candidate would be unviable anyway.  */
int min_arity = 0, max_arity = 0;
tree parms = TYPE_ARG_TYPES (TREE_TYPE (tmpl));
parms = skip_artificial_parms_for (tmpl, parms);
@@ -3571,11 +3571,7 @@ add_template_candidate_real (struct z_candidate 
**candidates, tree tmpl,
  reason = arity_rejection (NULL_TREE, max_arity, ia);
  goto fail;
}
-}
  
-  errs = errorcount+sorrycount;

-  if (!obj)
-{
convs = alloc_conversions (nargs);
  
if (shortcut_bad_convs

@@ -3602,6 +3598,8 @@ add_template_candidate_real (struct z_candidate 
**candidates, tree tmpl,
}
}
  }
+
+  errs = errorcount+sorrycount;
fn = fn_type_unification (tmpl, explicit_targs, targs,
args_without_in_chrg,
nargs_without_in_chrg,
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C 
b/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
index 440bea5b2f7..bac3b64ad7e 100644
--- a/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
@@ -3,20 +3,20 @@
  
  template

  void foo(T1, Tn...);
+// { dg-message "candidate expects at least 1 argument, 0 provided" "" { 
target *-*-* } .-1 }
  
  template

  void bar(T1, T2, Tn...);
+// { dg-message "candidate expects at least 2 arguments, 0 provided" "" { 
target *-*-* } .-1 }
+// { dg-message "candidate expects at least 2 arguments, 1 provided" "" { 
target *-*-* } .-2 }
  
  int main()

  {
foo();   // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 1 argument, 0 provided" "" { 
target *-*-* } .-1 }
foo(1);
foo(1, 2);
bar();   // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 2 arguments, 0 provided" "" { 
target *-*-* } .-1 }
bar(1);  // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 2 arguments, 1 provided" "" { 
target *-*-* } .-1 }
bar(1, 2);
bar(1, 2, 3);
  }
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-57397-2.C 

Re: [PATCH] c++: unifying identical tmpls from current inst [PR108347]

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/13/23 13:53, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Here more_specialized_partial_spec considers the two partial
specializations to be unordered ultimately because unify for identical
parm=arg=A::C returns failure due to C being dependent.

This patch fixes this by relaxing unify's early-exit identity test to
also accept dependent decls; we can't deduce anything further from them
anyway.


OK.


In passing this patch removes the CONST_DECL case of unify:
we should never see the CONST_DECL version of a template parameter here,
and for other CONST_DECLs (such as enumerators) it seems we can rely on
them already having been folded to their DECL_INITIAL.


Hmm, I think I'd prefer to add a gcc_unreachable in case we decide to 
defer that folding at some point.


Jason



Re: [PATCH] c++: optimize unification of class specializations [PR89231]

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/13/23 13:53, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

Since the LHS of a qualified-id is a non-deduced context, it effectively
means we can't deduce from outer template arguments of a class template
specialization.  And checking for equality between the TI_TEMPLATE of a
class specialization parm/arg already implies that the outer template
arguments are the same.  Hence recursing into outer template arguments
during unification of class specializations is redundant, so this patch
makes unify recurse only into innermost arguments.
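
The first point, that the left-hand side of a qualified-id is a non-deduced
context, can be seen in a small standalone example.  The names Wrap, width
and width_deduced are invented for illustration and do not come from the
patch.

```cpp
template<class T> struct Wrap { using type = T; };

// T appears only to the left of '::', which is a non-deduced context,
// so callers have to spell T out explicitly.
template<class T>
int width(typename Wrap<T>::type) { return static_cast<int>(sizeof(T)); }

// Here T is deduced from the argument as usual.
template<class T>
int width_deduced(T) { return static_cast<int>(sizeof(T)); }
```

A call such as width<char>('a') is fine, while width('a') without the
explicit argument would fail to compile because T cannot be deduced.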

This incidentally fixes the testcase from PR89231 because there
more_specialized_partial_inst considers the two partial specializations
to be unordered ultimately because unify for identical
parm=arg=A::Collect gets confused when it recurses into
parm=arg={Ps...} since the level of Ps doesn't match the innermost level
of tparms that we're actually deducing.

PR c++/89231

gcc/cp/ChangeLog:

* pt.cc (try_class_unification): Strengthen TI_TEMPLATE equality
test by not calling most_general_template.  Only unify the
innermost levels of template arguments.
(unify) : Only unify the innermost levels of
template arguments.  Don't unify template arguments if the
template is not primary.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-partial3.C: New test.
---
  gcc/cp/pt.cc  | 17 +++--
  .../g++.dg/cpp0x/variadic-partial3.C  | 19 +++
  2 files changed, 30 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 838179d5fe3..c88e9cd0fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23999,8 +23999,7 @@ try_class_unification (tree tparms, tree targs, tree 
parm, tree arg,
  return NULL_TREE;
else if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
  /* Matches anything.  */;
-  else if (most_general_template (CLASSTYPE_TI_TEMPLATE (arg))
-  != most_general_template (CLASSTYPE_TI_TEMPLATE (parm)))
+  else if (CLASSTYPE_TI_TEMPLATE (arg) != CLASSTYPE_TI_TEMPLATE (parm))
  return NULL_TREE;
  
/* We need to make a new template argument vector for the call to

@@ -24041,8 +24040,10 @@ try_class_unification (tree tparms, tree targs, tree 
parm, tree arg,
if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
  err = unify_bound_ttp_args (tparms, targs, parm, arg, explain_p);
else
-err = unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),
-CLASSTYPE_TI_ARGS (arg), UNIFY_ALLOW_NONE, explain_p);
+err = unify (tparms, targs,
+INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (arg)),
+UNIFY_ALLOW_NONE, explain_p);
  
return err ? NULL_TREE : arg;

  }
@@ -25167,11 +25168,15 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
/* There's no chance of unification succeeding.  */
return unify_type_mismatch (explain_p, parm, arg);
  
-	  return unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),

-   CLASSTYPE_TI_ARGS (t), UNIFY_ALLOW_NONE, explain_p);
+ if (PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (t)))
+   return unify (tparms, targs,
+ INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+ INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (t)),
+ UNIFY_ALLOW_NONE, explain_p);
}
else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
return unify_type_mismatch (explain_p, parm, arg);
+
return unify_success (explain_p);
  
  case METHOD_TYPE:

diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
new file mode 100644
index 000..5af60711320
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
@@ -0,0 +1,19 @@
+// PR c++/89231
+// { dg-do compile { target c++11 } }
+
+template
+struct A {
+  template
+  struct Collect { };
+
+  template>
+  struct Seq;
+
+  template
+  struct Seq> : Seq> { };
+
+  template
+  struct Seq<0, I, Collect> : Collect { };
+};
+
+A::Seq<4> test;




Re: [PATCH] c++: overeager type completion in convert_to_void [PR111419]

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/15/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here convert_to_void always completes the type of an INDIRECT_REF or
VAR_DECL expression, but according to [expr.context] an lvalue-to-rvalue
conversion is applied to a discarded-value expression only if "the
expression is a glvalue of volatile-qualified type".  This patch restricts
convert_to_void's type completion accordingly.

PR c++/111419

gcc/cp/ChangeLog:

* cvt.cc (convert_to_void) : Only call
complete_type if the type is volatile and the INDIRECT_REF
isn't an implicit one.


Hmm, what does implicit have to do with it?  The expression forms listed 
in https://eel.is/c++draft/expr.context#2 include "id-expression"...



diff --git a/gcc/testsuite/g++.dg/expr/discarded1a.C 
b/gcc/testsuite/g++.dg/expr/discarded1a.C
new file mode 100644
index 000..5516ff46fe9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/discarded1a.C
@@ -0,0 +1,16 @@
+// PR c++/111419
+
+struct Incomplete;
+
+template<class T> struct Holder { T t; }; // { dg-error "incomplete" }
+
+extern volatile Holder<Incomplete> a;
+extern volatile Holder<Incomplete>& b;
+extern volatile Holder<Incomplete>* c;
+
+int main() {
+  a; // { dg-message "required from here" }
+  b; // { dg-warning "implicit dereference will not access object" }
+ // { dg-bogus "required from here" "" { target *-*-* } .-1 }


...so it seems to me this line should get the lvalue-rvalue conversion 
(and not the warning about no access).



+  *c; // { dg-message "required from here" }
+}




Re: [PATCH] c++: visibility wrt template and ptrmem targs [PR70413]

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/15/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

When constraining the visibility of an instantiation, we weren't
properly considering the visibility of PTRMEM_CST and TEMPLATE_DECL
template arguments.

PR c++/70413

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r): Handle PTRMEM_CST and TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/abi/no-linkage-expr2.C: New test.
* g++.dg/abi/no-linkage-expr3.C: New test.
---
  gcc/cp/decl2.cc | 18 ++
  gcc/testsuite/g++.dg/abi/no-linkage-expr2.C | 15 +++
  gcc/testsuite/g++.dg/abi/no-linkage-expr3.C | 17 +
  3 files changed, 46 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
  create mode 100644 gcc/testsuite/g++.dg/abi/no-linkage-expr3.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index b402befba6d..5006372a646 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2582,7 +2582,10 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void 
*data)
int *vis_p = (int *)data;
int tpvis = VISIBILITY_DEFAULT;
  
-  switch (TREE_CODE (*tp))

+  tree t = *tp;
+  if (TREE_CODE (t) == PTRMEM_CST)
+t = PTRMEM_CST_MEMBER (t);
+  switch (TREE_CODE (t))
  {
  case CAST_EXPR:
  case IMPLICIT_CONV_EXPR:
@@ -2593,15 +2596,22 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void 
*data)
  case NEW_EXPR:
  case CONSTRUCTOR:
  case LAMBDA_EXPR:
-  tpvis = type_visibility (TREE_TYPE (*tp));
+  tpvis = type_visibility (TREE_TYPE (t));
break;
  
+case TEMPLATE_DECL:

+  t = DECL_TEMPLATE_RESULT (t);
+  /* Fall through.  */
  case VAR_DECL:
  case FUNCTION_DECL:
-  if (! TREE_PUBLIC (*tp))
+  if (! TREE_PUBLIC (t))
tpvis = VISIBILITY_ANON;
else
-   tpvis = DECL_VISIBILITY (*tp);
+   tpvis = DECL_VISIBILITY (t);
+  break;
+
+case FIELD_DECL:
+  tpvis = type_visibility (DECL_CONTEXT (t));
break;
  
  default:

diff --git a/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C 
b/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
new file mode 100644
index 000..db23570bb08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
@@ -0,0 +1,15 @@
+// PR c++/70413
+// { dg-do compile { target c++11 } }
+// { dg-final { scan-assembler-not "weak.*_Z" } }
+
+namespace {
+  template struct A;
+  template using B = int;
+}
+
+template class Q> void f() { }
+
+int main() {
+  f();
+  f();
+}
diff --git a/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C 
b/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C
new file mode 100644
index 000..a2db1a45c74
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C
@@ -0,0 +1,17 @@
+// PR c++/70413
+// { dg-final { scan-assembler-not "weak.*_Z" } }
+
+namespace {
+  struct A {
+void f();
+int m;
+  };
+}
+
+template void g() { }
+template void h() { }
+
+int main() {
+  g<::f>();
+  h<::m>();
+}




Re: [PATCH] c++: constness of decltype of NTTP object [PR98820]

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/15/23 13:55, Patrick Palka wrote:

This corrects decltype of a (class) NTTP object as per
[dcl.type.decltype]/1.2 and [temp.param]/6 in the type-dependent case.
In the non-dependent case (nontype-class8.C) we resolve the decltype
ahead of time, and finish_decltype_type already made sure to drop the
const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX.


Hmm, seems like dropping the VIEW_CONVERT_EXPR is wrong in this case? 
I'm not sure why I added that.


Jason



Re: [PATCH v6] c++: Move consteval folding to cp_fold_r

2023-09-16 Thread Jason Merrill via Gcc-patches

On 9/15/23 16:32, Marek Polacek wrote:

On Fri, Sep 15, 2023 at 02:08:46PM -0400, Jason Merrill wrote:

On 9/13/23 20:02, Marek Polacek wrote:

On Wed, Sep 13, 2023 at 05:57:47PM -0400, Jason Merrill wrote:

On 9/13/23 16:56, Marek Polacek wrote:

On Tue, Sep 12, 2023 at 05:26:25PM -0400, Jason Merrill wrote:

On 9/8/23 14:24, Marek Polacek wrote:

+  switch (TREE_CODE (stmt))
+{
+/* Unfortunately we must handle code like
+false ? bar () : 42
+   where we have to check bar too.  */
+case COND_EXPR:
+  if (cp_fold_immediate_r (&TREE_OPERAND (stmt, 1), walk_subtrees, data))
+   return error_mark_node;
+  if (TREE_OPERAND (stmt, 2)
+ && cp_fold_immediate_r (&TREE_OPERAND (stmt, 2), walk_subtrees, data))
+   return error_mark_node;


Is this necessary?  Doesn't walk_tree already walk into the arms of
COND_EXPR?


Unfortunately yes.  The cp_fold call in cp_fold_r could fold the ?: into
a constant before we see it here.  I've added a comment saying just that.


Ah.  But in that case I guess we need to walk into the arms, not just check
the top-level expression in them.

Arg, of course.  I was fooled into thinking that it would recurse, but
you're right.  Fixed by using cp_walk_tree as I intended.  Tested in
consteval34.C.


But maybe cp_fold_r should do that before the cp_fold, instead of this
function?


I...am not sure how that would be better than what I did.


Callers of cp_fold_immediate don't need this because cp_fold_r isn't
involved, so it isn't folding anything.


This is true.
  

cp_fold_r can walk the arms with cp_fold_r and then clear *walk_subtrees to
avoid walking the arms again normally.


I didn't think we wanted to do everything cp_fold_r does even in dead
branches, but ok.


Ah, that's a good point.  With the recursive walk in 
cp_fold_immediate_r, I suppose we could suppress it when called from 
cp_fold_immediate with a new fold_flag?  That would still allow for 
cp_walk_tree_without_duplicates.


Incidentally, I notice you check for null op2 of COND_EXPR, should 
probably also check op1.
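
To illustrate both points raised in this thread (walking into the arms of a
conditional rather than checking only their top-level expression, and
tolerating a missing operand), here is a toy walker over a made-up
expression type.  None of these names are GCC's; a missing operand is
modeled as a null pointer.

```cpp
// Toy expression node; hypothetical, unrelated to GCC's tree type.
struct Expr {
  enum Kind { Constant, ImmediateCall, Cond, Add } kind;
  const Expr *op0;
  const Expr *op1;
  const Expr *op2;
};

// Returns true if an immediate call occurs anywhere in the expression,
// including inside the arms of a conditional whose condition is constant.
bool has_immediate_call(const Expr *e)
{
  if (e == nullptr)   // tolerate missing operands (op1 as well as op2)
    return false;
  if (e->kind == Expr::ImmediateCall)
    return true;
  return has_immediate_call(e->op0)
         || has_immediate_call(e->op1)
         || has_immediate_call(e->op2);
}
```

With this shape, an expression like "false ? bar () + 1 : 42" is still
reported, because the walk descends through the addition in the dead arm.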


Jason



Re: [PATCH] core: Support heap-based trampolines

2023-09-16 Thread Iain Sandoe
Hi Richard,

> On 14 Sep 2023, at 11:18, Richard Biener  wrote:
> 
> On Wed, Sep 6, 2023 at 5:44 PM FX Coudert  wrote:
>> 

>> ping**2 on the revised patch, for Richard or another global reviewer. So far 
>> all review feedback is that it’s a step forward, and it’s been widely used 
>> for both aarch64-darwin and x86_64-darwin distributions for almost three 
>> years now.
>> 
>> OK to commit?
> 
> I just noticed that ftrampoline-impl isn't Optimize, thus it's not
> streamed with LTO.

I think this is fine, the nested pass runs before LTO streaming and lowers to 
the relevant built-ins for the chosen impl.
The builtins are distinct and can co-exist in the linked exe.

>  How does mixing different -ftrampoline-impl for different LTO TUs behave?

Assuming that a target can support multiple implementations, then each is 
applied local to a single TU.  The nested functions are scoped within their 
parent and thus should not be candidates for merging by LTO.

For a target that cannot support both, then one or more of the TUs should be 
rejected before we even get to LTO.

>  How does mis-specifying -ftrampoline-impl at LTO link time compared to 
> compile-time behave?

The flag should be a NOP at LTO link time (but I do not think we want to 
reject it; that would probably create other issues?)

>  Is the state fully reflected during pre-IPA compilation and the flag not 
> needed after that?  

yes, that is my understanding, nested runs very early.

> It appears so, but did you check?

I actually checked on x86_64-darwin (which does support both), using two TUs
with nested functions and a third with main(), and we see:

$ nm -mapv ./nn.ltrans0.ltrans.o

as expected, two instances of the nested “bar”.

01a8 (__TEXT,__cstring) non-external lC0
001f (__TEXT,__text) non-external _bar.0.lto_priv.0 
01d0 (__TEXT,__cstring) non-external lC1
00ec (__TEXT,__text) non-external _bar.0.lto_priv.1
007c (__TEXT,__text) external _foo_1
0149 (__TEXT,__text) external _foo_2
 (__TEXT,__text) external _main

>>> these for heap-based:
 (undefined) external ___builtin_nested_func_ptr_created 
 (undefined) external ___builtin_nested_func_ptr_deleted

>>> this for stack-based.
 (undefined) external ___enable_execute_stack

(and the code executes as expected).

> OK if that's a non-issue.

thanks, we'll wait a day or two in case of any follow-on comments,
Iain

P.S. I was investigating some unrelated unwinder issues a couple of weeks ago, 
but that did highlight that we have a possibility to avoid the leaks from 
longjmp if we hang on the forced_unwind() machinery [TODO, tho, not part of 
this initial patch]


> 
> Thanks,
> Richard.
> 
>> FX
>> 
>> 
>> 
>>> Le 5 août 2023 à 16:20, FX Coudert  a écrit :
>>> 
>>> Hi Richard,
>>> 
>>> Thanks for your feedback. Here is an amended version of the patch, taking 
>>> into consideration your requests and the following discussion. There is no 
>>> configure option for the libgcc part, and the documentation is amended. The 
>>> patch is split into three commits for core, target and libgcc.
>>> 
>>> Currently regtesting on x86_64 linux and darwin (it was fine before I split 
>>> up into three commits, so I’m re-testing to make sure I didn’t screw 
>>> anything up).
>>> 
>>> OK to commit?
>>> FX
>> 



[PATCH] MATCH: Add simplifications of `(a == CST) & a`

2023-09-16 Thread Andrew Pinski via Gcc-patches
`(a == CST) & a` can be simplified to either `a == CST` or 0,
depending on the lowest bit of CST.
This is an extension of the existing `X & !X` pattern and allows
us to remove the 2 xfails on gcc.dg/binop-notand1a.c and 
gcc.dg/binop-notand4a.c.
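
The identity behind the pattern can be checked by brute force.  The block
below is only an illustrative sketch, not part of the patch: the function
names are hypothetical, and the constants are the ones used in the new
tests (0, 1, 5, 2, 1025).  When `a == CST` holds, the comparison yields 1,
so the AND keeps just the lowest bit of `a`, which is the lowest bit of CST;
when it does not hold, the result is 0 either way.

```c
#include <assert.h>

/* Literal form of the expression the pattern matches.  */
static int
and_eq_literal (int a, int cst)
{
  int b = (a == cst);	/* 0 or 1 */
  return b & a;
}

/* Expected folded form: the comparison itself when the lowest bit of
   CST is set, constant 0 otherwise.  */
static int
and_eq_folded (int a, int cst)
{
  return (cst & 1) ? (a == cst) : 0;
}

/* Return 1 if both forms agree for every a in [lo, hi].  */
static int
and_eq_forms_agree (int cst, int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    if (and_eq_literal (a, cst) != and_eq_folded (a, cst))
      return 0;
  return 1;
}

/* Check the constants used by the new testcases.  */
static void
check_pattern_examples (void)
{
  static const int csts[] = { 0, 1, 5, 2, 1025 };
  for (unsigned i = 0; i < sizeof csts / sizeof csts[0]; i++)
    assert (and_eq_forms_agree (csts[i], -2000, 2000));
}
```

Calling check_pattern_examples () from a test driver exercises both the
"fold to 0" case (even CST) and the "fold to the comparison" case (odd CST).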

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111431

gcc/ChangeLog:

* match.pd (`(a == CST) & a`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/binop-notand1a.c: Remove xfail.
* gcc.dg/binop-notand4a.c: Likewise.
* gcc.c-torture/execute/pr111431-1.c: New test.
* gcc.dg/binop-andeq1.c: New test.
* gcc.dg/binop-andeq2.c: New test.
* gcc.dg/binop-notand7.c: New test.
* gcc.dg/binop-notand7a.c: New test.
---
 gcc/match.pd  |  8 
 .../gcc.c-torture/execute/pr111431-1.c| 39 +++
 gcc/testsuite/gcc.dg/binop-andeq1.c   | 12 ++
 gcc/testsuite/gcc.dg/binop-andeq2.c   | 14 +++
 gcc/testsuite/gcc.dg/binop-notand1a.c |  4 +-
 gcc/testsuite/gcc.dg/binop-notand4a.c |  4 +-
 gcc/testsuite/gcc.dg/binop-notand7.c  | 12 ++
 gcc/testsuite/gcc.dg/binop-notand7a.c | 12 ++
 8 files changed, 99 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-andeq1.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-andeq2.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-notand7.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-notand7a.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ebb50ee0581..65960a1701e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5172,6 +5172,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  )
 )
 
+/* `(a == CST) & a` can be simplified to `0` or `(a == CST)` depending
+   on the first bit of the CST.  */
+(simplify
+ (bit_and:c (convert@2 (eq @0 INTEGER_CST@1)) (convert? @0))
+ (if ((wi::to_wide (@1) & 1) != 0)
+  @2
+  { build_zero_cst (type); }))
+
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
x_5 ? cstN ? cst4 : cst3
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
new file mode 100644
index 000..a96dbadf2b5
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
@@ -0,0 +1,39 @@
+int
+foo (int a)
+{
+  int b = a == 0;
+  return (a & b);
+}
+
+#define function(vol,cst) \
+__attribute__((noipa)) \
+_Bool func_##cst##_##vol(vol int a) \
+{ \
+  vol int b = a == cst; \
+  return (a & b); \
+}
+
+#define funcdefs(cst) \
+function(,cst) \
+function(volatile,cst)
+
+#define funcs(f) \
+f(0) \
+f(1) \
+f(5)
+
+funcs(funcdefs)
+
+#define test(cst) \
+do { \
+ if(func_##cst##_(a) != func_##cst##_volatile(a))\
+   __builtin_abort(); \
+} while(0);
+int main(void)
+{
+  for(int a = -10; a <= 10; a++)
+   {
+ funcs(test)
+   }
+}
+
diff --git a/gcc/testsuite/gcc.dg/binop-andeq1.c b/gcc/testsuite/gcc.dg/binop-andeq1.c
new file mode 100644
index 000..2a92b8f95df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/binop-andeq1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/111431 */
+
+int
+foo (int a)
+{
+  int b = a == 2;
+  return (a & b);
+}
+
+/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/binop-andeq2.c b/gcc/testsuite/gcc.dg/binop-andeq2.c
new file mode 100644
index 000..895262fc17e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/binop-andeq2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/111431 */
+
+int
+foo (int a)
+{
+  int b = a == 1025;
+  return (a & b);
+}
+
+/* { dg-final { scan-tree-dump-not "return 0"  "optimized" } } */
+/* { dg-final { scan-tree-dump-not " & "  "optimized" } } */
+/* { dg-final { scan-tree-dump-times " == 1025;" 1  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/binop-notand1a.c b/gcc/testsuite/gcc.dg/binop-notand1a.c
index c7e932b2638..d94685eb4ce 100644
--- a/gcc/testsuite/gcc.dg/binop-notand1a.c
+++ b/gcc/testsuite/gcc.dg/binop-notand1a.c
@@ -7,6 +7,4 @@ foo (char a, unsigned short b)
   return (a & !a) | (b & !b);
 }
 
-/* As long as comparisons aren't boolified and casts from boolean-types
-   aren't preserved, the folding of  X & !X to zero fails.  */
-/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized"  } } */
diff --git a/gcc/testsuite/gcc.dg/binop-notand4a.c b/gcc/testsuite/gcc.dg/binop-notand4a.c
index dce6a5c7eb5..bd9c7cce638 100644
--- a/gcc/testsuite/gcc.dg/binop-notand4a.c
+++ b/gcc/testsuite/gcc.dg/binop-notand4a.c
@@ -7,6 +7,4 @@ foo (unsigned char a, _Bool b)
   return (!a & a) | (b & !b);
 }
 
-/* As long as comparisons aren't boolified and casts from boolean-types
-   aren't 

Re: [PATCH] LoongArch: Fix lo_sum rtx cost

2023-09-16 Thread WANG Xuerui

Hi,

On 9/16/23 17:16, mengqinggang wrote:

The cost of the lo_sum rtx for the addi.d instruction may be a very big number
if computed by the common function. This may cause some symbols to be saved to
and reloaded from the stack if there are not enough registers during loop
optimization.


Thanks for the patch! It seems though this change is done in order to 
optimize some previously pathetic codegen, am I right? If so, it's 
appreciated to have a minimal test case attached, in order to ensure 
that codegen never regresses. (You can have your teammates help you if 
you're not familiar with that.)




gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost.
---
  gcc/config/loongarch/loongarch.cc | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
*total = COSTS_N_INSNS (4);
return false;
  
+case LO_SUM:
+  *total = set_src_cost (XEXP (x, 0), mode, speed);
+  return true;
+

In order for the code to be more maintainable, it may be better to 
duplicate some of the change reasons here, just in case someone in the 
future questions this piece of code that's without any explanation, and 
regresses things (because there's no test case).

  case LT:
  case LTU:
  case LE:


RE: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Friday, September 15, 2023 11:44 PM
To: 钟居哲 ; Jeff Law ; kito.cheng 

Cc: rdapp@gmail.com; gcc-patches ; kito.cheng 

Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> You mean this patch is ok?

I thought about it a bit more.  From my point of view the patch is OK
for now in order to get the bug out of the way.

In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related.  I can take care of that
once I have a bit of time but for now let's go ahead.

Regards
 Robin


[PATCH] LoongArch: Fix lo_sum rtx cost

2023-09-16 Thread mengqinggang
The cost of the lo_sum rtx for the addi.d instruction may be a very big number
if computed by the common function. This may cause some symbols to be saved to
and reloaded from the stack if there are not enough registers during loop
optimization.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost.
---
 gcc/config/loongarch/loongarch.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
*total = COSTS_N_INSNS (4);
   return false;
 
+case LO_SUM:
+  *total = set_src_cost (XEXP (x, 0), mode, speed);
+  return true;
+
 case LT:
 case LTU:
 case LE:
-- 
2.36.0



Re: RFC: Introduce -fhardened to enable security-related flags

2023-09-16 Thread Martin Uecker
Am Freitag, dem 15.09.2023 um 11:11 -0400 schrieb Marek Polacek:
> On Wed, Aug 30, 2023 at 10:46:14AM +0200, Martin Uecker wrote:
> > > Improving the security of software has been a major trend in the recent
> > > years.  Fortunately, GCC offers a wide variety of flags that enable extra
> > > hardening.  These flags aren't enabled by default, though.  And since
> > > there are a lot of hardening flags, with more to come, it's been difficult
> > > to keep on top of them; more so for the users of GCC who ought not to be
> > > expected to keep track of all the new options.
> > > 
> > > To alleviate some of the problems I mentioned, we thought it would
> > > be useful to provide a new umbrella option that enables a reasonable set
> > > of hardening flags.  What's "reasonable" in this context is not easy to
> > > pin down.  Surely, there must be no ABI impact, the option cannot cause
> > > severe performance issues, and, I suspect, it should not cause build
> > > errors by enabling stricter compile-time errors (such as, -Wimplicit-int,
> > > -Wint-conversion).  Including a controversial option in -fhardened
> > > would likely cause that users would not use -fhardened at all.  It's
> > > roughly akin to -Wall or -O2 -- those also enable a reasonable set of
> > > options, and evolve over time, and are not kept in sync with other
> > > compilers.
> > > 
> > > Currently, -fhardened enables:
> > > 
> > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > >   -D_GLIBCXX_ASSERTIONS
> > >   -ftrivial-auto-var-init=zero
> > >   -fPIE  -pie  -Wl,-z,relro,-z,now
> > >   -fstack-protector-strong
> > >   -fstack-clash-protection
> > >   -fcf-protection=full (x86 GNU/Linux only)
> > > 
> > > -fsanitize=undefined is specifically not enabled.  -fstrict-flex-arrays is
> > > also liable to break a lot of code so I didn't include it.
> > > 
> > > Appended is a proof-of-concept patch.  It doesn't implement 
> > > --help=hardened
> > > yet.  A fairly crucial point is that -fhardened will not override options
> > > that were specified on the command line (before or after -fhardened).  For
> > > example,
> > >  
> > >  -D_FORTIFY_SOURCE=1 -fhardened
> > > 
> > > means that _FORTIFY_SOURCE=1 will be used.  Similarly,
> > > 
> > >   -fhardened -fstack-protector
> > > 
> > > will not enable -fstack-protector-strong.
> > > 
> > > Thoughts?
> > 
> > I think this is a great idea!  Considering that it is difficult to
> > decide what should be activated and what not and the baseline should
> > not cause compile errors,  I wonder whether there should be higher
> > levels  similar to -O1,2,3 ? 
>  
> Thanks.  I would like to avoid any levels if at all possible; I think
> they would be confusing.
> 
> > Although it would be nice to have a one-letter or very short
> > option similar to -O2 or -Wall, but maybe this is not possible 
> > because all short ones are already taken. Of course, 
> > "-fhardening" would already be a huge improvement to the 
> > current situation.
> 
> There are some free ones, like -Z, but I'm not confident I could take
> it :).
> 

It would send a message.

Today I can get crazy optimizations with 

-O3 

but for (somewhat) decent security, I need something
like:

  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
  -D_GLIBCXX_ASSERTIONS
  -ftrivial-auto-var-init=pattern
  -fPIE  -pie  -Wl,-z,relro,-z,now
  -fstack-protector-strong
  -fstack-clash-protection
  -fcf-protection=full 
  -fsanitize=undefined
  -fsanitize-undefined-trap-on-error
  -Wall
  -Wextra

which also sends a message.

Martin