Re: RFC: Introduce -fhardened to enable security-related flags
Hans-Peter Nilsson via Gcc-patches writes:

>> Date: Tue, 29 Aug 2023 15:42:27 -0400
>> From: Marek Polacek via Gcc-patches
>
>> Surely, there must be no ABI impact, the option cannot cause
>> severe performance issues,
>
>> Currently, -fhardened enables:
> ...
>> -ftrivial-auto-var-init=zero
>
>> Thoughts?
>
> Regarding -ftrivial-auto-var-init=zero, I was consulted when
> colleagues investigating a performance regression
> pinpointed it as *causing severe performance issues*;
> cf. https://github.com/systemd/systemd.git commit 1a4e392760
> (TL;DR: adds "-ftrivial-auto-var-init=zero" to the systemd
> build).
>
> The situation was described as "we noticed that some test
> suites takes 35% percent longer time to finish. After
> further investigation it was noticed that running systemctl
> unmask x takes around 5s more time on [version including
> patch vs. before that patch]" (timing out some tests).
> Reverting that patch fixed the drop in performance.

Did a bug ever get filed for this, to see if we can do a bit better
here? That some slowdown is expected doesn't mean this slowdown is of
the expected magnitude.

> Just a data point, but I believe also exactly your intended
> use. IMO including -ftrivial-auto-var-init is worth extra
> consideration.
>
> Alternatively, strike the whole "cannot cause severe
> performance issues".
>
> brgds, H-P
Re: RFC: Introduce -fhardened to enable security-related flags
> Date: Tue, 29 Aug 2023 15:42:27 -0400
> From: Marek Polacek via Gcc-patches

> Surely, there must be no ABI impact, the option cannot cause
> severe performance issues,

> Currently, -fhardened enables:
...
> -ftrivial-auto-var-init=zero

> Thoughts?

Regarding -ftrivial-auto-var-init=zero, I was consulted when
colleagues investigating a performance regression
pinpointed it as *causing severe performance issues*;
cf. https://github.com/systemd/systemd.git commit 1a4e392760
(TL;DR: adds "-ftrivial-auto-var-init=zero" to the systemd
build).

The situation was described as "we noticed that some test
suites takes 35% percent longer time to finish. After
further investigation it was noticed that running systemctl
unmask x takes around 5s more time on [version including
patch vs. before that patch]" (timing out some tests).
Reverting that patch fixed the drop in performance.

Just a data point, but I believe also exactly your intended
use. IMO including -ftrivial-auto-var-init is worth extra
consideration.

Alternatively, strike the whole "cannot cause severe
performance issues".

brgds, H-P
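To make the concern above concrete, here is a minimal sketch of the kind of code where -ftrivial-auto-var-init=zero can add work. It is not taken from systemd: the function name unit_name_length and the 4096-byte buffer size are assumptions chosen for illustration. With the flag, the compiler zero-fills automatic variables that have no initializer, so unless it can prove the fill is dead, every call pays for clearing the whole buffer even though only a few bytes are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (illustration only, not systemd code): copies a
// short name into a large scratch buffer and returns its length.
std::size_t unit_name_length(const char *name) {
    // No initializer here. Built with -ftrivial-auto-var-init=zero, the
    // compiler may emit a 4096-byte zero-fill on every call; built
    // without the flag, the buffer starts out uninitialized.
    char scratch[4096];

    std::size_t n = std::strlen(name);
    if (n >= sizeof scratch)
        n = sizeof scratch - 1;      // clamp so the terminator fits
    std::memcpy(scratch, name, n);   // only the first n bytes are written
    scratch[n] = '\0';
    return std::strlen(scratch);
}
```

The result is the same with or without the flag; only the amount of work per call differs, which is why such a regression would show up in timing rather than in test failures.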
[PATCH] RISC-V: Support VLS modes reduction[PR111153]
This patch supports VLS reduction vectorization.

It can optimize the current reduction vectorization codegen with the current COST model.

#define DEF_REDUC_PLUS(TYPE)                      \
  TYPE __attribute__ ((noinline, noclone))        \
  reduc_plus_##TYPE (TYPE * __restrict a, int n)  \
  {                                               \
    TYPE r = 0;                                   \
    for (int i = 0; i < n; ++i)                   \
      r += a[i];                                  \
    return r;                                     \
  }

#define TEST_PLUS(T)                              \
  T (int32_t)                                     \

TEST_PLUS (DEF_REDUC_PLUS)

Before this patch:

        vle32.v v2,0(a5)
        addi    a5,a5,16
        vadd.vv v1,v1,v2
        bne     a5,a4,.L4
        lui     a4,%hi(.LC0)
        lui     a5,%hi(.LC1)
        addi    a4,a4,%lo(.LC0)
        vlm.v   v0,0(a4)
        addi    a5,a5,%lo(.LC1)
        andi    a1,a1,-4
        vmv1r.v v2,v3
        vlm.v   v4,0(a5)
        vcompress.vm    v2,v1,v0
        vmv1r.v v0,v4
        vadd.vv v1,v2,v1
        vcompress.vm    v3,v1,v0
        vadd.vv v3,v3,v1
        vmv.x.s a0,v3
        sext.w  a0,a0
        beq     a3,a1,.L12

After this patch:

        vle32.v v2,0(a5)
        addi    a5,a5,16
        vadd.vv v1,v1,v2
        bne     a5,a4,.L4
        li      a5,0
        andi    a1,a1,-4
        vmv.s.x v2,a5
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        beq     a3,a1,.L12

gcc/ChangeLog:

        * config/riscv/autovec.md: Add VLS modes.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS mode reduction case.
        * gcc.target/riscv/rvv/autovec/vls/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-10.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-11.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-12.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-13.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-14.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-15.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-16.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-17.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-18.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-19.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-20.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-21.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-5.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-6.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-7.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-8.c: New test.
        * gcc.target/riscv/rvv/autovec/vls/reduc-9.c: New test.

---
 gcc/config/riscv/autovec.md                 |  2 +-
 .../gcc.target/riscv/rvv/autovec/vls/def.h  | 30
 .../riscv/rvv/autovec/vls/reduc-1.c         | 31
 .../riscv/rvv/autovec/vls/reduc-10.c        | 50
 .../riscv/rvv/autovec/vls/reduc-11.c        | 46
 .../riscv/rvv/autovec/vls/reduc-12.c        | 30
 .../riscv/rvv/autovec/vls/reduc-13.c        | 28
 .../riscv/rvv/autovec/vls/reduc-14.c        | 26
 .../riscv/rvv/autovec/vls/reduc-15.c        | 81
 .../riscv/rvv/autovec/vls/reduc-16.c        | 75
 .../riscv/rvv/autovec/vls/reduc-17.c        | 69
 .../riscv/rvv/autovec/vls/reduc-18.c        | 63
 .../riscv/rvv/autovec/vls/reduc-19.c        | 18
 .../riscv/rvv/autovec/vls/reduc-2.c         | 29
 .../riscv/rvv/autovec/vls/reduc-20.c        | 17
 .../riscv/rvv/autovec/vls/reduc-21.c        | 16
 .../riscv/rvv/autovec/vls/reduc-3.c         | 27
 .../riscv/rvv/autovec/vls/reduc-4.c         | 25
 .../riscv/rvv/autovec/vls/reduc-5.c         | 18
 .../riscv/rvv/autovec/vls/reduc-6.c         | 17
 .../riscv/rvv/autovec/vls/reduc-7.c         | 16
 .../riscv/rvv/autovec/vls/reduc-8.c         | 58
 .../riscv/rvv/autovec/vls/reduc-9.c         | 54
 23 files changed, 825 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc-14.c
 create mode 100644
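For readers who want to see the loop being vectorized without the macro layer, the following is a hand-expanded sketch of what DEF_REDUC_PLUS (int32_t) from the message above produces. The noinline/noclone attributes are left out here as an assumption for portability; they only keep the compiler from inlining or cloning the function in the real test. The accumulation into r is the sum reduction that the "after" listing performs with a single vredsum.vs.

```cpp
#include <cassert>
#include <cstdint>

// Hand expansion of DEF_REDUC_PLUS (int32_t), minus the GCC-specific
// noinline/noclone attributes (omitted here for illustration only).
std::int32_t reduc_plus_int32_t(std::int32_t *__restrict a, int n) {
    std::int32_t r = 0;
    for (int i = 0; i < n; ++i)
        r += a[i];   // sum reduction over the array
    return r;
}
```

Whether a given compiler invocation actually emits vredsum.vs for this function depends on the target flags and cost model, which this sketch does not try to reproduce.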
[PATCH] MATCH: Avoid recursive zero_one_valued_p for conversions
When VN finds a name that has a nop conversion, it treats both names as equivalent to each other, and the valueization function for one returns the other. This normally does not cause any issues, as there are no recursive matches. But r14-4038-gb975c0dc3be285 added one, so we would recurse infinitely on the match and never finish.

This fixes the issue (and adds a comment in match.pd) by handling just one level for converts instead of always recursing.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note the testcase was reduced from tree-ssa-loop-niter.cc and then changed slightly into C rather than C++, but it still needs exceptions turned on to get the IR in which VN produces this equivalence relationship. I also had to turn off early inlining to force put to be inlined later.

        PR tree-optimization/111435

gcc/ChangeLog:

        * match.pd (zero_one_valued_p): Don't do recursion on converts.

gcc/testsuite/ChangeLog:

        * gcc.c-torture/compile/pr111435-1.c: New test.

---
 gcc/match.pd                                   |  8 +++-
 .../gcc.c-torture/compile/pr111435-1.c         | 18 ++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111435-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 97405e6a5c3..887665633d4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2188,8 +2188,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* A conversion from an zero_one_valued_p is still a [0,1].
    This is useful when the range of a variable is not known */
+/* Note this matches can't be recusive because of the way VN handles
+   nop conversions being equivalent and then recusive between them. */
 (match zero_one_valued_p
- (convert@0 zero_one_valued_p))
+ (convert@0 @1)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && (TYPE_UNSIGNED (TREE_TYPE (@1))
+          || TYPE_PRECISION (TREE_TYPE (@1)) > 1)
+      && wi::leu_p (tree_nonzero_bits (@1), 1))))
 
 /* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 }. */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
new file mode 100644
index 000..afa84dd59dd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr111435-1.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fexceptions -fno-early-inlining" } */
+/* { dg-require-effective-target exceptions } */
+
+void find_slot_with_hash(const int *);
+
+void put(const int *k, const int *) {
+    find_slot_with_hash(k);
+}
+unsigned len();
+int *address();
+void h(int header, int **bounds) {
+  if (!*bounds)
+    return;
+  unsigned t = *bounds ? len() : 0;
+  int queue_index = t;
+  address()[(unsigned)queue_index] = 0;
+  put(&header, &queue_index);
+}
-- 
2.31.1
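The difference between the recursive and the one-level form can be illustrated with a toy model. This is only a sketch and not GCC's IR or the match.pd machinery: the Name struct, its fields and the function name are all assumptions made up for the example. Two names that are nop conversions of each other form a cycle; a predicate that recursed through the conversion would never terminate on them, while one that only consults the operand's known nonzero bits always does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for an SSA name (hypothetical, for illustration only).
struct Name {
    bool is_nop_convert;         // defined as a nop conversion of `operand`
    int operand;                 // index of the converted name, if any
    std::uint64_t nonzero_bits;  // bits that may be set, in the spirit of
                                 // tree_nonzero_bits
};

// One-level check: look at the operand's nonzero bits, but never follow
// the operand's own definition, so equivalence cycles cannot loop.
bool zero_one_valued_one_level(const std::vector<Name>& names, int i) {
    const Name& n = names[i];
    if (n.nonzero_bits <= 1)
        return true;             // the name itself is known to be 0 or 1
    return n.is_nop_convert && names[n.operand].nonzero_bits <= 1;
}
```

In the cyclic case the function simply answers "not known to be 0 or 1", which is a conservative result rather than a hang.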
Re: [PATCH] LoongArch: Fix lo_sum rtx cost
On 2023/9/16 at 10:52 PM, WANG Xuerui wrote:

Hi,

On 9/16/23 17:16, mengqinggang wrote:

The cost of the lo_sum rtx for the addi.d instruction may be a very big number if computed by the common function. It may cause some symbols to be saved to and loaded from the stack if there are not enough registers during loop optimization.

Thanks for the patch! It seems though this change is done in order to optimize some previously pathetic codegen, am I right? If so, it's appreciated to have a minimal test case attached, in order to ensure that codegen never regresses. (You can have your teammates help you if you're not familiar with that.)

This is a performance optimization problem discovered by Meng Qinggang when he was debugging the spec. The specific test cases are not easy to extract. We will try to extract simple test cases to reproduce this optimization. If that is not possible, we will add the explanation to the patch description.

Thanks!

gcc/ChangeLog:

        * config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost.

---
 gcc/config/loongarch/loongarch.cc | 4
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
       *total = COSTS_N_INSNS (4);
       return false;
 
+    case LO_SUM:
+      *total = set_src_cost (XEXP (x, 0), mode, speed);
+      return true;
+

In order for the code to be more maintainable, it may be better to duplicate some of the change reasons here, just in case someone in the future questions this piece of code that's without any explanation, and regresses things (because there's no test case).

     case LT:
     case LTU:
     case LE:
Re: [PATCH] c++: constness of decltype of NTTP object [PR98820]
On Sat, 16 Sep 2023, Jason Merrill wrote:

> On 9/15/23 13:55, Patrick Palka wrote:
> > This corrects decltype of a (class) NTTP object as per
> > [dcl.type.decltype]/1.2 and [temp.param]/6 in the type-dependent case.
> > In the non-dependent case (nontype-class8.C) we resolve the decltype
> > ahead of time, and finish_decltype_type already made sure to drop the
> > const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX.
>
> Hmm, seems like dropping the VIEW_CONVERT_EXPR is wrong in this case? I'm not
> sure why I added that.

Ah sorry, my commit message was a bit sloppy.

In the non-dependent case we resolve the decltype ahead of time, in which case finish_decltype_type drops the const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX, and the latter has the desired non-const type.

In the type-dependent case, tsubst drops the VIEW_CONVERT_EXPR because the substituted class NTTP is the already const object created by get_template_parm_object. So finish_decltype_type at instantiation time sees the bare const object, which this patch now adds special handling for.

So we need to continue dropping the VIEW_CONVERT_EXPR to handle the non-dependent case.

>
> Jason
>
>
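The language rule being discussed can be checked with a small sketch, assuming a compiler with C++20 class-type NTTP support. It covers only the non-dependent case that the message says already works; the names A, Check and class_nttp_status are made up for the example, and the whole check is skipped (status 0) when the feature-test macro says class NTTPs are unavailable.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__cpp_nontype_template_args) && __cpp_nontype_template_args >= 201911L
struct A { int x; };

template<A a>
struct Check {
    // Unparenthesized id-expression naming the NTTP: decltype yields the
    // declared parameter type, without const.
    static_assert(std::is_same<decltype(a), A>::value,
                  "decltype(a) should be A");
    // Parenthesized: an lvalue referring to the const template
    // parameter object.
    static_assert(std::is_same<decltype((a)), const A&>::value,
                  "decltype((a)) should be const A&");
    static constexpr int member = a.x;
};

// Returns 1 when the static_asserts above were compiled and passed.
constexpr int class_nttp_status() { return Check<A{1}>::member; }
#else
// Class-type NTTPs unavailable: nothing was checked.
constexpr int class_nttp_status() { return 0; }
#endif
```

The type-dependent variant (for example a parameter declared with a placeholder type) is the case the patch addresses, so it is deliberately not asserted here.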
Re: [PATCH] c++: overeager type completion in convert_to_void [PR111419]
On Sat, 16 Sep 2023, Jason Merrill wrote:

> On 9/15/23 12:03, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> >
> > -- >8 --
> >
> > Here convert_to_void always completes the type of an INDIRECT_REF or
> > VAR_DECL expression, but according to [expr.context] an lvalue-to-rvalue
> > conversion is applied to a discarded-value expression only if "the
> > expression is a glvalue of volatile-qualified type". This patch restricts
> > convert_to_void's type completion accordingly.
> >
> > PR c++/111419
> >
> > gcc/cp/ChangeLog:
> >
> > * cvt.cc (convert_to_void) : Only call
> > complete_type if the type is volatile and the INDIRECT_REF
> > isn't an implicit one.
>
> Hmm, what does implicit have to do with it? The expression forms listed in
> https://eel.is/c++draft/expr.context#2 include "id-expression"...

When there's an implicit INDIRECT_REF, I reckoned the type of the id-expression is really a reference type, which can't be cv-qualified?

>
> > diff --git a/gcc/testsuite/g++.dg/expr/discarded1a.C
> > b/gcc/testsuite/g++.dg/expr/discarded1a.C
> > new file mode 100644
> > index 000..5516ff46fe9
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/expr/discarded1a.C
> > @@ -0,0 +1,16 @@
> > +// PR c++/111419
> > +
> > +struct Incomplete;
> > +
> > +template struct Holder { T t; }; // { dg-error "incomplete" }
> > +
> > +extern volatile Holder a;
> > +extern volatile Holder& b;
> > +extern volatile Holder* c;
> > +
> > +int main() {
> > + a; // { dg-message "required from here" }
> > + b; // { dg-warning "implicit dereference will not access object" }
> > + // { dg-bogus "required from here" "" { target *-*-* } .-1 }
>
> ...so it seems to me this line should get the lvalue-rvalue conversion (and
> not the warning about no access).
>
> > + *c; // { dg-message "required from here" }
> > +}
> >
Re: [PATCH] c++: always check arity before deduction
On 9/12/23 20:33, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK.

-- >8 --

This simple patch extends the r12-3271-gf1e73199569287 optimization to apply to deduction without explicit template arguments as well. The motivation for this is to accept testcases such as conv20.C and ttp40.C below, which don't use explicit template arguments but for which unnecessary template instantiation during deduction could be avoided if we pruned overloads according to arity early in this case as well. This incidentally causes us to accept one reduced testcase from PR c++/84075, but the underlying issue there still remains unfixed.

As an added bonus, this change ends up causing the "candidate expects N argument(s)" note during overload resolution failure to point to the template candidate instead of the call site, which seems like an improvement similar to r14-309-g14e881eb030509.

gcc/cp/ChangeLog:

        * call.cc (add_template_candidate_real): Check arity even
        when there are no explicit template arguments. Combine the
        two adjacent '!obj' tests into one.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/vt-57397-1.C: Expect "candidate expects ... N
        argument(s)" at the declaration site instead of the call site.
        * g++.dg/cpp0x/vt-57397-2.C: Likewise.
        * g++.dg/overload/template5.C: Likewise.
        * g++.dg/template/local6.C: Likewise.
        * g++.dg/template/conv20.C: New test.
        * g++.dg/template/ttp40.C: New test.

---
 gcc/cp/call.cc                            | 14 ++---
 gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C   |  6 +++---
 gcc/testsuite/g++.dg/cpp0x/vt-57397-2.C   |  6 +++---
 gcc/testsuite/g++.dg/overload/template5.C |  4 ++--
 gcc/testsuite/g++.dg/template/conv20.C    | 17 +++
 gcc/testsuite/g++.dg/template/local6.C    |  4 ++--
 gcc/testsuite/g++.dg/template/ttp40.C     | 25 +++
 7 files changed, 58 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/conv20.C
 create mode 100644 gcc/testsuite/g++.dg/template/ttp40.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 399345307ea..2bbaeee039d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -3535,13 +3535,13 @@ add_template_candidate_real (struct z_candidate **candidates, tree tmpl,
     }
   gcc_assert (ia == nargs_without_in_chrg);
 
-  if (!obj && explicit_targs)
+  if (!obj)
     {
       /* Check that there's no obvious arity mismatch before proceeding with
          deduction. This avoids substituting explicit template arguments
-         into the template (which could result in an error outside the
-         immediate context) when the resulting candidate would be unviable
-         anyway. */
+         into the template or e.g. derived-to-base parm/arg unification
+         (which could result in an error outside the immediate context) when
+         the resulting candidate would be unviable anyway. */
       int min_arity = 0, max_arity = 0;
       tree parms = TYPE_ARG_TYPES (TREE_TYPE (tmpl));
       parms = skip_artificial_parms_for (tmpl, parms);
@@ -3571,11 +3571,7 @@ add_template_candidate_real (struct z_candidate **candidates, tree tmpl,
           reason = arity_rejection (NULL_TREE, max_arity, ia);
           goto fail;
         }
-    }
 
-  errs = errorcount+sorrycount;
-  if (!obj)
-    {
       convs = alloc_conversions (nargs);
 
       if (shortcut_bad_convs
@@ -3602,6 +3598,8 @@ add_template_candidate_real (struct z_candidate **candidates, tree tmpl,
             }
         }
     }
+
+  errs = errorcount+sorrycount;
   fn = fn_type_unification (tmpl, explicit_targs, targs,
                             args_without_in_chrg,
                             nargs_without_in_chrg,
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C b/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
index 440bea5b2f7..bac3b64ad7e 100644
--- a/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/vt-57397-1.C
@@ -3,20 +3,20 @@
 template void foo(T1, Tn...);
+// { dg-message "candidate expects at least 1 argument, 0 provided" "" { target *-*-* } .-1 }
 
 template void bar(T1, T2, Tn...);
+// { dg-message "candidate expects at least 2 arguments, 0 provided" "" { target *-*-* } .-1 }
+// { dg-message "candidate expects at least 2 arguments, 1 provided" "" { target *-*-* } .-2 }
 
 int main()
 {
   foo(); // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 1 argument, 0 provided" "" { target *-*-* } .-1 }
   foo(1);
   foo(1, 2);
 
   bar(); // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 2 arguments, 0 provided" "" { target *-*-* } .-1 }
   bar(1); // { dg-error "no matching" }
-  // { dg-message "candidate expects at least 2 arguments, 1 provided" "" { target *-*-* } .-1 }
   bar(1, 2);
   bar(1, 2, 3);
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-57397-2.C
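The arity pruning described in this patch can be sketched outside the compiler. The following is a toy model and not the code in call.cc: the Param and Arity types and the function names are assumptions for illustration. It computes the minimum and maximum number of arguments a candidate accepts from its parameter list, so that a call with an impossible argument count can be rejected before any deduction work is attempted.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Toy description of one function parameter (hypothetical, not GCC trees).
struct Param {
    bool has_default;  // parameter has a default argument
    bool is_pack;      // parameter is a function parameter pack
};

struct Arity {
    int min;
    int max;           // INT_MAX stands for "no upper bound"
};

// Walk the parameter list once: required parameters raise both bounds,
// defaulted ones raise only the maximum, a pack removes the upper bound.
Arity compute_arity(const std::vector<Param>& params) {
    Arity a{0, 0};
    for (const Param& p : params) {
        if (p.is_pack) {
            a.max = INT_MAX;
            break;
        }
        ++a.max;
        if (!p.has_default)
            ++a.min;
    }
    return a;
}

// Cheap early rejection: true if a call with nargs arguments could match.
bool arity_ok(const std::vector<Param>& params, int nargs) {
    const Arity a = compute_arity(params);
    return nargs >= a.min && nargs <= a.max;
}
```

With parameter lists shaped like foo(T1, Tn...) and bar(T1, T2, Tn...) from vt-57397-1.C, this model rejects foo() and bar(1) and accepts foo(1) and bar(1, 2, 3), matching the "expects at least N arguments" diagnostics in that test.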
Re: [PATCH] c++: unifying identical tmpls from current inst [PR108347]
On 9/13/23 13:53, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

Here more_specialized_partial_spec considers the two partial specializations to be unordered ultimately because unify for identical parm=arg=A::C returns failure due to C being dependent.

This patch fixes this by relaxing unify's early-exit identity test to also accept dependent decls; we can't deduce anything further from them anyway.

OK.

In passing this patch removes the CONST_DECL case of unify: we should never see the CONST_DECL version of a template parameter here, and for other CONST_DECLs (such as enumerators) it seems we can rely on them already having been folded to their DECL_INITIAL.

Hmm, I think I'd prefer to add a gcc_unreachable in case we decide to defer that folding at some point.

Jason
Re: [PATCH] c++: optimize unification of class specializations [PR89231]
On 9/13/23 13:53, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK.

-- >8 --

Since the LHS of a qualified-id is a non-deduced context, it effectively means we can't deduce from outer template arguments of a class template specialization. And checking for equality between the TI_TEMPLATE of a class specialization parm/arg already implies that the outer template arguments are the same. Hence recursing into outer template arguments during unification of class specializations is redundant, so this patch makes unify recurse only into innermost arguments.

This incidentally fixes the testcase from PR89231 because there more_specialized_partial_inst considers the two partial specializations to be unordered ultimately because unify for identical parm=arg=A::Collect gets confused when it recurses into parm=arg={Ps...} since the level of Ps doesn't match the innermost level of tparms that we're actually deducing.

        PR c++/89231

gcc/cp/ChangeLog:

        * pt.cc (try_class_unification): Strengthen TI_TEMPLATE equality
        test by not calling most_general_template. Only unify the
        innermost levels of template arguments.
        (unify) : Only unify the innermost levels of template
        arguments. Don't unify template arguments if the template is
        not primary.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/variadic-partial3.C: New test.

---
 gcc/cp/pt.cc                                   | 17 +++--
 .../g++.dg/cpp0x/variadic-partial3.C           | 19 +++
 2 files changed, 30 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 838179d5fe3..c88e9cd0fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23999,8 +23999,7 @@ try_class_unification (tree tparms, tree targs, tree parm, tree arg,
     return NULL_TREE;
   else if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
     /* Matches anything. */;
-  else if (most_general_template (CLASSTYPE_TI_TEMPLATE (arg))
-           != most_general_template (CLASSTYPE_TI_TEMPLATE (parm)))
+  else if (CLASSTYPE_TI_TEMPLATE (arg) != CLASSTYPE_TI_TEMPLATE (parm))
     return NULL_TREE;
 
   /* We need to make a new template argument vector for the call to
@@ -24041,8 +24040,10 @@ try_class_unification (tree tparms, tree targs, tree parm, tree arg,
   if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
     err = unify_bound_ttp_args (tparms, targs, parm, arg, explain_p);
   else
-    err = unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),
-                 CLASSTYPE_TI_ARGS (arg), UNIFY_ALLOW_NONE, explain_p);
+    err = unify (tparms, targs,
+                 INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+                 INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (arg)),
+                 UNIFY_ALLOW_NONE, explain_p);
 
   return err ? NULL_TREE : arg;
 }
@@ -25167,11 +25168,15 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
             /* There's no chance of unification succeeding. */
             return unify_type_mismatch (explain_p, parm, arg);
 
-          return unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),
-                        CLASSTYPE_TI_ARGS (t), UNIFY_ALLOW_NONE, explain_p);
+          if (PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (t)))
+            return unify (tparms, targs,
+                          INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+                          INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (t)),
+                          UNIFY_ALLOW_NONE, explain_p);
         }
       else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
         return unify_type_mismatch (explain_p, parm, arg);
+
       return unify_success (explain_p);
 
     case METHOD_TYPE:
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
new file mode 100644
index 000..5af60711320
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
@@ -0,0 +1,19 @@
+// PR c++/89231
+// { dg-do compile { target c++11 } }
+
+template
+struct A {
+  template
+  struct Collect { };
+
+  template>
+  struct Seq;
+
+  template
+  struct Seq> : Seq> { };
+
+  template
+  struct Seq<0, I, Collect> : Collect { };
+};
+
+A::Seq<4> test;
Re: [PATCH] c++: overeager type completion in convert_to_void [PR111419]
On 9/15/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

Here convert_to_void always completes the type of an INDIRECT_REF or VAR_DECL expression, but according to [expr.context] an lvalue-to-rvalue conversion is applied to a discarded-value expression only if "the expression is a glvalue of volatile-qualified type". This patch restricts convert_to_void's type completion accordingly.

        PR c++/111419

gcc/cp/ChangeLog:

        * cvt.cc (convert_to_void) : Only call complete_type if the
        type is volatile and the INDIRECT_REF isn't an implicit one.

Hmm, what does implicit have to do with it? The expression forms listed in https://eel.is/c++draft/expr.context#2 include "id-expression"...

diff --git a/gcc/testsuite/g++.dg/expr/discarded1a.C b/gcc/testsuite/g++.dg/expr/discarded1a.C
new file mode 100644
index 000..5516ff46fe9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/discarded1a.C
@@ -0,0 +1,16 @@
+// PR c++/111419
+
+struct Incomplete;
+
+template struct Holder { T t; }; // { dg-error "incomplete" }
+
+extern volatile Holder a;
+extern volatile Holder& b;
+extern volatile Holder* c;
+
+int main() {
+ a; // { dg-message "required from here" }
+ b; // { dg-warning "implicit dereference will not access object" }
+ // { dg-bogus "required from here" "" { target *-*-* } .-1 }

...so it seems to me this line should get the lvalue-rvalue conversion (and not the warning about no access).

+ *c; // { dg-message "required from here" }
+}
Re: [PATCH] c++: visibility wrt template and ptrmem targs [PR70413]
On 9/15/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

OK.

-- >8 --

When constraining the visibility of an instantiation, we weren't properly considering the visibility of PTRMEM_CST and TEMPLATE_DECL template arguments.

        PR c++/70413

gcc/cp/ChangeLog:

        * decl2.cc (min_vis_expr_r): Handle PTRMEM_CST and TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

        * g++.dg/abi/no-linkage-expr2.C: New test.
        * g++.dg/abi/no-linkage-expr3.C: New test.

---
 gcc/cp/decl2.cc                             | 18 ++
 gcc/testsuite/g++.dg/abi/no-linkage-expr2.C | 15 +++
 gcc/testsuite/g++.dg/abi/no-linkage-expr3.C | 17 +
 3 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
 create mode 100644 gcc/testsuite/g++.dg/abi/no-linkage-expr3.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index b402befba6d..5006372a646 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2582,7 +2582,10 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
   int *vis_p = (int *)data;
   int tpvis = VISIBILITY_DEFAULT;
 
-  switch (TREE_CODE (*tp))
+  tree t = *tp;
+  if (TREE_CODE (t) == PTRMEM_CST)
+    t = PTRMEM_CST_MEMBER (t);
+  switch (TREE_CODE (t))
     {
     case CAST_EXPR:
     case IMPLICIT_CONV_EXPR:
@@ -2593,15 +2596,22 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
     case NEW_EXPR:
     case CONSTRUCTOR:
     case LAMBDA_EXPR:
-      tpvis = type_visibility (TREE_TYPE (*tp));
+      tpvis = type_visibility (TREE_TYPE (t));
       break;
 
+    case TEMPLATE_DECL:
+      t = DECL_TEMPLATE_RESULT (t);
+      /* Fall through. */
     case VAR_DECL:
     case FUNCTION_DECL:
-      if (! TREE_PUBLIC (*tp))
+      if (! TREE_PUBLIC (t))
         tpvis = VISIBILITY_ANON;
       else
-        tpvis = DECL_VISIBILITY (*tp);
+        tpvis = DECL_VISIBILITY (t);
+      break;
+
+    case FIELD_DECL:
+      tpvis = type_visibility (DECL_CONTEXT (t));
       break;
 
     default:
diff --git a/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C b/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
new file mode 100644
index 000..db23570bb08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/no-linkage-expr2.C
@@ -0,0 +1,15 @@
+// PR c++/70413
+// { dg-do compile { target c++11 } }
+// { dg-final { scan-assembler-not "weak.*_Z" } }
+
+namespace {
+  template struct A;
+  template using B = int;
+}
+
+template class Q> void f() { }
+
+int main() {
+  f();
+  f();
+}
diff --git a/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C b/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C
new file mode 100644
index 000..a2db1a45c74
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/no-linkage-expr3.C
@@ -0,0 +1,17 @@
+// PR c++/70413
+// { dg-final { scan-assembler-not "weak.*_Z" } }
+
+namespace {
+  struct A {
+    void f();
+    int m;
+  };
+}
+
+template void g() { }
+template void h() { }
+
+int main() {
+  g<&A::f>();
+  h<&A::m>();
+}
Re: [PATCH] c++: constness of decltype of NTTP object [PR98820]
On 9/15/23 13:55, Patrick Palka wrote:

This corrects decltype of a (class) NTTP object as per [dcl.type.decltype]/1.2 and [temp.param]/6 in the type-dependent case.

In the non-dependent case (nontype-class8.C) we resolve the decltype ahead of time, and finish_decltype_type already made sure to drop the const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX.

Hmm, seems like dropping the VIEW_CONVERT_EXPR is wrong in this case? I'm not sure why I added that.

Jason
Re: [PATCH v6] c++: Move consteval folding to cp_fold_r
On 9/15/23 16:32, Marek Polacek wrote:
On Fri, Sep 15, 2023 at 02:08:46PM -0400, Jason Merrill wrote:
On 9/13/23 20:02, Marek Polacek wrote:
On Wed, Sep 13, 2023 at 05:57:47PM -0400, Jason Merrill wrote:
On 9/13/23 16:56, Marek Polacek wrote:
On Tue, Sep 12, 2023 at 05:26:25PM -0400, Jason Merrill wrote:
On 9/8/23 14:24, Marek Polacek wrote:

+  switch (TREE_CODE (stmt))
+    {
+    /* Unfortunately we must handle code like
+         false ? bar () : 42
+       where we have to check bar too. */
+    case COND_EXPR:
+      if (cp_fold_immediate_r (&TREE_OPERAND (stmt, 1), walk_subtrees, data))
+        return error_mark_node;
+      if (TREE_OPERAND (stmt, 2)
+          && cp_fold_immediate_r (&TREE_OPERAND (stmt, 2), walk_subtrees, data))
+        return error_mark_node;

Is this necessary? Doesn't walk_tree already walk into the arms of COND_EXPR?

Unfortunately yes. The cp_fold call in cp_fold_r could fold the ?: into a constant before we see it here. I've added a comment saying just that.

Ah. But in that case I guess we need to walk into the arms, not just check the top-level expression in them.

Arg, of course. I was fooled into thinking that it would recurse, but you're right. Fixed by using cp_walk_tree as I intended. Tested in consteval34.C.

But maybe cp_fold_r should do that before the cp_fold, instead of this function?

I...am not sure how that would be better than what I did.

Callers of cp_fold_immediate don't need this because cp_fold_r isn't involved, so it isn't folding anything.

This is true.

cp_fold_r can walk the arms with cp_fold_r and then clear *walk_subtrees to avoid walking the arms again normally.

I didn't think we wanted to do everything cp_fold_r does even in dead branches, but ok.

Ah, that's a good point. With the recursive walk in cp_fold_immediate_r, I suppose we could suppress it when called from cp_fold_immediate with a new fold_flag? That would still allow for cp_walk_tree_without_duplicates.

Incidentally, I notice you check for null op2 of COND_EXPR, should probably also check op1.

Jason
Re: [PATCH] core: Support heap-based trampolines
Hi Richard, > On 14 Sep 2023, at 11:18, Richard Biener wrote: > > On Wed, Sep 6, 2023 at 5:44 PM FX Coudert wrote: >> >> ping**2 on the revised patch, for Richard or another global reviewer. So far >> all review feedback is that it’s a step forward, and it’s been widely used >> for both aarch64-darwin and x86_64-darwin distributions for almost three >> years now. >> >> OK to commit? > > I just noticed that ftrampoline-impl isn't Optimize, thus it's not > streamed with LTO. I think this is fine, the nested pass runs before LTO streaming and lowers to the relevant built-ins for the chosen impl. The builtins are distinct and can co-exist in the linked exe, > How does mixing different -ftrampoline-impl for different LTO TUs behave? Assuming that a target can support multiple implementations, then each is applied local to a single TU. The nested functions are scoped within their parent and thus should not be candidates for merging by LTO. For a target that cannot support both, then one or more of the TUs should be rejected before we even get to LTO. > How does mis-specifying -ftrampoline-impl at LTO link time compared to > compile-time behave? The flag should be a NOP at LTO link time (but I do not think we want to reject it, that would probably create other issues?) > Is the state fully reflected during pre-IPA compilation and the flag not > needed after that? yes, that is my understanding, nested runs very early. > It appears so, but did you check? I actually checked on x86_64-darwin (which does support both) and we see… here with two tus with nested fns and a third with the main(). $ nm -mapv ./nn.ltrans0.ltrans.o as expected, two instances of the nested “bar”. 
01a8 (__TEXT,__cstring) non-external lC0
001f (__TEXT,__text) non-external _bar.0.lto_priv.0
01d0 (__TEXT,__cstring) non-external lC1
00ec (__TEXT,__text) non-external _bar.0.lto_priv.1
007c (__TEXT,__text) external _foo_1
0149 (__TEXT,__text) external _foo_2
(__TEXT,__text) external _main
>>> these for heap-based:
(undefined) external ___builtin_nested_func_ptr_created
(undefined) external ___builtin_nested_func_ptr_deleted
>>> this for stack-based.
(undefined) external ___enable_execute_stack

(and the code executes as expected).

> OK if that's a non-issue.

thanks, we'll wait a day or two in case of any follow-on comments,
Iain

P.S. I was investigating some unrelated unwinder issues a couple of weeks ago, but that did highlight that we have a possibility to avoid the leaks from longjmp if we hang on the forced_unwind() machinery [TODO, tho, not part of this initial patch]

>
> Thanks,
> Richard.
>
>> FX
>>
>>
>>
>>> On 5 Aug 2023, at 16:20, FX Coudert wrote:
>>>
>>> Hi Richard,
>>>
>>> Thanks for your feedback. Here is an amended version of the patch, taking
>>> into consideration your requests and the following discussion. There is no
>>> configure option for the libgcc part, and the documentation is amended. The
>>> patch is split into three commits for core, target and libgcc.
>>>
>>> Currently regtesting on x86_64 linux and darwin (it was fine before I split
>>> up into three commits, so I’m re-testing to make sure I didn’t screw
>>> anything up).
>>>
>>> OK to commit?
>>> FX
>>
[PATCH] MATCH: Add simplifications of `(a == CST) & a`
`(a == CST) & a` can be simplified to either `a == CST` or 0, depending on the lowest bit of the CST. This is an extension of the already existing `X & !X` pattern and allows us to remove the 2 xfails on gcc.dg/binop-notand1a.c and gcc.dg/binop-notand4a.c.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

        PR tree-optimization/111431

gcc/ChangeLog:

        * match.pd (`(a == CST) & a`): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.dg/binop-notand1a.c: Remove xfail.
        * gcc.dg/binop-notand4a.c: Likewise.
        * gcc.c-torture/execute/pr111431-1.c: New test.
        * gcc.dg/binop-andeq1.c: New test.
        * gcc.dg/binop-andeq2.c: New test.
        * gcc.dg/binop-notand7.c: New test.
        * gcc.dg/binop-notand7a.c: New test.
---
 gcc/match.pd | 8
 .../gcc.c-torture/execute/pr111431-1.c| 39 +++
 gcc/testsuite/gcc.dg/binop-andeq1.c | 12 ++
 gcc/testsuite/gcc.dg/binop-andeq2.c | 14 +++
 gcc/testsuite/gcc.dg/binop-notand1a.c | 4 +-
 gcc/testsuite/gcc.dg/binop-notand4a.c | 4 +-
 gcc/testsuite/gcc.dg/binop-notand7.c | 12 ++
 gcc/testsuite/gcc.dg/binop-notand7a.c | 12 ++
 8 files changed, 99 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-andeq1.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-andeq2.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-notand7.c
 create mode 100644 gcc/testsuite/gcc.dg/binop-notand7a.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ebb50ee0581..65960a1701e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5172,6 +5172,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  )
 )
 
+/* `(a == CST) & a` can be simplified to `0` or `(a == CST)` depending
+   on the first bit of the CST.  */
+(simplify
+ (bit_and:c (convert@2 (eq @0 INTEGER_CST@1)) (convert? @0))
+ (if ((wi::to_wide (@1) & 1) != 0)
+  @2
+  { build_zero_cst (type); }))
+
 /* Optimize
    # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
    x_5 ? cstN ? cst4 : cst3
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
new file mode 100644
index 000..a96dbadf2b5
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111431-1.c
@@ -0,0 +1,39 @@
+int
+foo (int a)
+{
+  int b = a == 0;
+  return (a & b);
+}
+
+#define function(vol,cst) \
+__attribute__((noipa)) \
+_Bool func_##cst##_##vol(vol int a) \
+{ \
+  vol int b = a == cst; \
+  return (a & b); \
+}
+
+#define funcdefs(cst) \
+function(,cst) \
+function(volatile,cst)
+
+#define funcs(f) \
+f(0) \
+f(1) \
+f(5)
+
+funcs(funcdefs)
+
+#define test(cst) \
+do { \
+  if(func_##cst##_(a) != func_##cst##_volatile(a))\
+    __builtin_abort(); \
+} while(0);
+int main(void)
+{
+  for(int a = -10; a <= 10; a++)
+   {
+     funcs(test)
+   }
+}
+
diff --git a/gcc/testsuite/gcc.dg/binop-andeq1.c b/gcc/testsuite/gcc.dg/binop-andeq1.c
new file mode 100644
index 000..2a92b8f95df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/binop-andeq1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/111431 */
+
+int
+foo (int a)
+{
+  int b = a == 2;
+  return (a & b);
+}
+
+/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/binop-andeq2.c b/gcc/testsuite/gcc.dg/binop-andeq2.c
new file mode 100644
index 000..895262fc17e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/binop-andeq2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/111431 */
+
+int
+foo (int a)
+{
+  int b = a == 1025;
+  return (a & b);
+}
+
+/* { dg-final { scan-tree-dump-not "return 0" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " & " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " == 1025;" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/binop-notand1a.c b/gcc/testsuite/gcc.dg/binop-notand1a.c
index c7e932b2638..d94685eb4ce 100644
--- a/gcc/testsuite/gcc.dg/binop-notand1a.c
+++ b/gcc/testsuite/gcc.dg/binop-notand1a.c
@@ -7,6 +7,4 @@ foo (char a, unsigned short b)
   return (a & !a) | (b & !b);
 }
 
-/* As long as comparisons aren't boolified and casts from boolean-types
-   aren't preserved, the folding of X & !X to zero fails.  */
-/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/binop-notand4a.c b/gcc/testsuite/gcc.dg/binop-notand4a.c
index dce6a5c7eb5..bd9c7cce638 100644
--- a/gcc/testsuite/gcc.dg/binop-notand4a.c
+++ b/gcc/testsuite/gcc.dg/binop-notand4a.c
@@ -7,6 +7,4 @@ foo (unsigned char a, _Bool b)
   return (!a & a) | (b & !b);
 }
 
-/* As long as comparisons aren't boolified and casts from boolean-types
-   aren't
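
For readers who want to convince themselves of the identity behind the new match.pd rule, here is a small brute-force sketch in plain C. It is not part of the patch: the function names (andeq_original, andeq_simplified, andeq_mismatches) are made up for illustration, and it only checks the arithmetic claim from the patch description (the result is the comparison when the low bit of CST is set, and 0 otherwise), not what GCC actually emits.

```c
#include <assert.h>

/* Left-hand side of the pattern, written as in the patch description:
   the comparison yields 0 or 1, which is then ANDed with a.  */
int
andeq_original (int a, int cst)
{
  return (a == cst) & a;
}

/* What the rule is described as producing: the comparison itself when
   the low bit of CST is set, and 0 otherwise.  */
int
andeq_simplified (int a, int cst)
{
  return (cst & 1) != 0 ? (a == cst) : 0;
}

/* Hypothetical helper (not from the patch): count the (a, cst) pairs in
   [lo, hi] x [lo, hi] on which the two forms disagree.  */
int
andeq_mismatches (int lo, int hi)
{
  int bad = 0;

  assert (lo <= hi);
  for (int cst = lo; cst <= hi; cst++)
    for (int a = lo; a <= hi; a++)
      if (andeq_original (a, cst) != andeq_simplified (a, cst))
        bad++;
  return bad;
}
```

Calling andeq_mismatches over a small range such as -64..64 should report no disagreements; the two constants used in the new dg tests (2 and 1025) are the even and odd cases respectively.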
Re: [PATCH] LoongArch: Fix lo_sum rtx cost
Hi,

On 9/16/23 17:16, mengqinggang wrote:
> The cost of lo_sum rtx for addi.d instruction may be a very big number if
> computed by the common function. It may cause some symbols to be saved to
> the stack and loaded back from it if there are not enough registers during
> loop optimization.

Thanks for the patch! It seems though this change is done in order to optimize some previously pathetic codegen, am I right? If so, it would be appreciated if you attached a minimal test case, in order to ensure that codegen never regresses. (You can have your teammates help you if you're not familiar with that.)

> gcc/ChangeLog:
>
>         * config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost.
> ---
>  gcc/config/loongarch/loongarch.cc | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 845fad5a8e8..0e57f09379c 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
>        *total = COSTS_N_INSNS (4);
>        return false;
>
> +    case LO_SUM:
> +      *total = set_src_cost (XEXP (x, 0), mode, speed);
> +      return true;
> +

To keep the code maintainable, it may be better to repeat some of the reasons for the change in a comment here, just in case someone in the future questions this piece of code that's without any explanation, and regresses things (because there's no test case).

>      case LT:
>      case LTU:
>      case LE:
RE: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
Committed, thanks Robin.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Robin Dapp via Gcc-patches
Sent: Friday, September 15, 2023 11:44 PM
To: 钟居哲; Jeff Law; kito.cheng
Cc: rdapp@gmail.com; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

> You mean this patch is ok?

I thought about it a bit more. From my point of view the patch is OK for now in order to get the bug out of the way. In the longer term I would really prefer a more "regular" solution (i.e. via hard_regno_mode_ok) and related. I can take care of that once I have a bit of time but for now let's go ahead.

Regards
 Robin
[PATCH] LoongArch: Fix lo_sum rtx cost
The cost of lo_sum rtx for addi.d instruction may be a very big number if computed by the common function. It may cause some symbols to be saved to the stack and loaded back from it if there are not enough registers during loop optimization.

gcc/ChangeLog:

        * config/loongarch/loongarch.cc (loongarch_rtx_costs): Add lo_sum cost.
---
 gcc/config/loongarch/loongarch.cc | 4
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 845fad5a8e8..0e57f09379c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3648,6 +3648,10 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
       *total = COSTS_N_INSNS (4);
       return false;
 
+    case LO_SUM:
+      *total = set_src_cost (XEXP (x, 0), mode, speed);
+      return true;
+
     case LT:
     case LTU:
     case LE:
-- 
2.36.0
Re: RFC: Introduce -fhardened to enable security-related flags
On Friday, 15.09.2023 at 11:11 -0400, Marek Polacek wrote:
> On Wed, Aug 30, 2023 at 10:46:14AM +0200, Martin Uecker wrote:
> > > Improving the security of software has been a major trend in the recent
> > > years. Fortunately, GCC offers a wide variety of flags that enable extra
> > > hardening. These flags aren't enabled by default, though. And since
> > > there are a lot of hardening flags, with more to come, it's been difficult
> > > to keep on top of them; more so for the users of GCC who ought not to be
> > > expected to keep track of all the new options.
> > >
> > > To alleviate some of the problems I mentioned, we thought it would
> > > be useful to provide a new umbrella option that enables a reasonable set
> > > of hardening flags. What's "reasonable" in this context is not easy to
> > > pin down. Surely, there must be no ABI impact, the option cannot cause
> > > severe performance issues, and, I suspect, it should not cause build
> > > errors by enabling stricter compile-time errors (such as, -Wimplicit-int,
> > > -Wint-conversion). Including a controversial option in -fhardened
> > > would likely cause that users would not use -fhardened at all. It's
> > > roughly akin to -Wall or -O2 -- those also enable a reasonable set of
> > > options, and evolve over time, and are not kept in sync with other
> > > compilers.
> > >
> > > Currently, -fhardened enables:
> > >
> > > -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> > > -D_GLIBCXX_ASSERTIONS
> > > -ftrivial-auto-var-init=zero
> > > -fPIE -pie -Wl,-z,relro,-z,now
> > > -fstack-protector-strong
> > > -fstack-clash-protection
> > > -fcf-protection=full (x86 GNU/Linux only)
> > >
> > > -fsanitize=undefined is specifically not enabled. -fstrict-flex-arrays is
> > > also liable to break a lot of code so I didn't include it.
> > >
> > > Appended is a proof-of-concept patch. It doesn't implement
> > > --help=hardened
> > > yet. A fairly crucial point is that -fhardened will not override options
> > > that were specified on the command line (before or after -fhardened). For
> > > example,
> > >
> > > -D_FORTIFY_SOURCE=1 -fhardened
> > >
> > > means that _FORTIFY_SOURCE=1 will be used. Similarly,
> > >
> > > -fhardened -fstack-protector
> > >
> > > will not enable -fstack-protector-strong.
> > >
> > > Thoughts?
> >
> > I think this is a great idea! Considering that it is difficult to
> > decide what should be activated and what not and the baseline should
> > not cause compile errors, I wonder whether there should be higher
> > levels similar to -O1,2,3 ?
>
> Thanks. I would like to avoid any levels if at all possible; I think
> they would be confusing.
>
> > Although it would be nice to have a one-letter or very short
> > option similar to -O2 or -Wall, but maybe this is not possible
> > because all short ones are already taken. Of course,
> > "-fhardened" would already be a huge improvement to the
> > current situation.
>
> There are some free ones, like -Z, but I'm not confident I could take
> it :).
>

It would send a message. Today I can get crazy optimizations with -O3 but for (somewhat) decent security, I need something like:

-D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
-D_GLIBCXX_ASSERTIONS
-ftrivial-auto-var-init=pattern
-fPIE -pie -Wl,-z,relro,-z,now
-fstack-protector-strong
-fstack-clash-protection
-fcf-protection=full
-fsanitize=undefined
-fsanitize-undefined-trap-on-error
-Wall -Wextra

which also sends a message.

Martin
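
As an aside, the precedence rule quoted above (an explicit option wins, whether it appears before or after -fhardened) can be modelled in a few lines of C. This is only a toy model and makes no claim about how the proof-of-concept patch implements it: struct toy_opts, its fields and toy_apply_hardened are all hypothetical names, and the model assumes the newer-glibc default of _FORTIFY_SOURCE=3.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option record: a value plus a flag recording whether the
   user gave the option explicitly.  None of these names come from GCC.  */
struct toy_opts
{
  int fortify_level;              /* models -D_FORTIFY_SOURCE=N */
  bool fortify_explicit;
  int stack_protector;            /* 0 = off, 1 = -fstack-protector,
                                     2 = -fstack-protector-strong */
  bool stack_protector_explicit;
};

/* Fill in the umbrella defaults only where the user said nothing.  The
   model assumes this runs once the whole command line has been parsed,
   so the position of -fhardened relative to an explicit option does not
   matter.  */
void
toy_apply_hardened (struct toy_opts *o)
{
  assert (o != 0);
  if (!o->fortify_explicit)
    o->fortify_level = 3;
  if (!o->stack_protector_explicit)
    o->stack_protector = 2;
}
```

With this model, "-D_FORTIFY_SOURCE=1 -fhardened" keeps level 1 while still picking up the strong stack protector, and "-fhardened -fstack-protector" keeps the plain stack protector, which matches the two examples in the quoted proposal.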