Re: [PATCH] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-22 Thread Andreas Krebbel via Gcc-patches
On 6/22/21 12:20 AM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> s390 glibc does not need counters in the .data section, since it stores
> edge hits in its own data structure.  Therefore counters only waste
> space and confuse diffing tools (e.g. kpatch), so don't generate them.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (s390_function_profiler): Ignore labelno
>   parameter.
>   * config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

Just two minor nits below. Ok with these changes. Thanks!

Andreas

> ---
>  gcc/config/s390/s390.c | 14 ++
>  gcc/config/s390/s390.h |  2 ++
>  2 files changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 6bbeb640e1f..96c9a9db53b 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw)
>  }
>  }
>  
> -/* Output assembler code to FILE to increment profiler label # LABELNO
> -   for profiling a function entry.  */
> +/* Output assembler code to FILE to call a profiler hook.  */
>  
>  void
> -s390_function_profiler (FILE *file, int labelno)
> +s390_function_profiler (FILE *file, int /* labelno */)

ATTRIBUTE_UNUSED?

>  {
>rtx op[8];
>  
> -  char label[128];
> -  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
> -
>fprintf (file, "# function profiler \n");
>  
>op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
> @@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno)
>op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
>op[7] = GEN_INT (UNITS_PER_LONG);
>  
> -  op[2] = gen_rtx_REG (Pmode, 1);
> -  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
> -  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
> -

Shouldn't we remove these two slots from the op array and renumber the 
subsequent entries then?

>op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
>if (flag_pic)
>  {
> @@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno)
> output_asm_insn ("stg\t%0,%1", op);
> if (flag_dwarf2_cfi_asm)
>   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
> -   output_asm_insn ("larl\t%2,%3", op);
> output_asm_insn ("brasl\t%0,%4", op);
> output_asm_insn ("lg\t%0,%1", op);
> if (flag_dwarf2_cfi_asm)
> @@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno)
> output_asm_insn ("st\t%0,%1", op);
> if (flag_dwarf2_cfi_asm)
>   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
> -   output_asm_insn ("larl\t%2,%3", op);
> output_asm_insn ("brasl\t%0,%4", op);
> output_asm_insn ("l\t%0,%1", op);
> if (flag_dwarf2_cfi_asm)
> diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
> index 3b876160420..fb16a455a03 100644
> --- a/gcc/config/s390/s390.h
> +++ b/gcc/config/s390/s390.h
> @@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
>  
>  #define PROFILE_BEFORE_PROLOGUE 1
>  
> +#define NO_PROFILE_COUNTERS 1
> +
>  
>  /* Trampolines for nested functions.  */
>  
> 
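
For readers following the ATTRIBUTE_UNUSED nit above, here is a minimal sketch of the two ways to silence the unused-parameter warning. The function names are hypothetical stand-ins, not the real s390 code, and the macro definition is an assumption mirroring what include/ansidecl.h provides for GNU-compatible compilers.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: stand-in for the macro GCC gets from include/ansidecl.h.
#ifndef ATTRIBUTE_UNUSED
#define ATTRIBUTE_UNUSED __attribute__ ((__unused__))
#endif

// Style used in the patch: leave the parameter unnamed.
// (Hypothetical function, not s390_function_profiler itself.)
int
profiler_unnamed (std::FILE *file, int /* labelno */)
{
  return std::fprintf (file, "# function profiler \n");
}

// Style suggested in the review: keep the name and mark it unused.
int
profiler_attr (std::FILE *file, int labelno ATTRIBUTE_UNUSED)
{
  return std::fprintf (file, "# function profiler \n");
}
```

Both variants behave identically at run time; the difference is only whether the parameter name stays visible in the signature.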



Re: [PATCH 5/6] make get_dominated_by_region return an auto_vec

2021-06-22 Thread Trevor Saunders
On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
> On 6/21/21 1:15 AM, Richard Biener wrote:
> > On Fri, Jun 18, 2021 at 6:03 PM Martin Sebor  wrote:
> > > 
> > > On 6/18/21 4:38 AM, Richard Biener wrote:
> > > > On Thu, Jun 17, 2021 at 4:43 PM Martin Sebor  wrote:
> > > > > 
> > > > > On 6/17/21 12:03 AM, Richard Biener wrote:
> > > > > > On Wed, Jun 16, 2021 at 6:01 PM Martin Sebor  
> > > > > > wrote:
> > > > > > > 
> > > > > > > On 6/16/21 6:46 AM, Richard Sandiford via Gcc-patches wrote:
> > > > > > > > Richard Biener via Gcc-patches  writes:
> > > > > > > > > On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders 
> > > > > > > > >  wrote:
> > > > > > > > > > 
> > > > > > > > > > This makes it clear the caller owns the vector, and ensures 
> > > > > > > > > > it is cleaned up.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Trevor Saunders 
> > > > > > > > > > 
> > > > > > > > > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> > > > > > > > > 
> > > > > > > > > OK.
> > > > > > > > > 
> > > > > > > > > Btw, are "standard API" returns places we can use 'auto'?  
> > > > > > > > > That would avoid
> > > > > > > > > excessive indent for
> > > > > > > > > 
> > > > > > > > > -  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> > > > > > > > > -bbs.address (),
> > > > > > > > > -bbs.length ());
> > > > > > > > > > +  auto_vec dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> > > > > > > > > > +      bbs.address (),
> > > > > > > > > > +      bbs.length ());
> > > > > > > > > 
> > > > > > > > > and just uses
> > > > > > > > > 
> > > > > > > > >   auto dom_bbs = get_dominated_by_region (...
> > > > > > > > > 
> > > > > > > > > Not asking you to do this, just a question for the audience.
> > > > > > > > 
> > > > > > > > Personally I think this would be surprising for something that 
> > > > > > > > doesn't
> > > > > > > > have copy semantics.  (Not that I'm trying to reopen that 
> > > > > > > > debate here :-)
> > > > > > > > FWIW, I agree not having copy semantics is probably the most 
> > > > > > > > practical
> > > > > > > > way forward for now.)
> > > > > > > 
> > > > > > > But you did open the door for me to reiterate my strong 
> > > > > > > disagreement
> > > > > > > with that.  The best C++ practice going back to the early 1990's 
> > > > > > > is
> > > > > > > to make types safely copyable and assignable.  It is the default 
> > > > > > > for
> > > > > > > all types, in both C++ and C, and so natural and expected.
> > > > > > > 
> > > > > > > Preventing copying is appropriate in special and rare 
> > > > > > > circumstances
> > > > > > > (e.g, a mutex may not be copyable, or a file or iostream object 
> > > > > > > may
> > > > > > > not be because they represent a unique physical resource.)
> > > > > > > 
> > > > > > > In the absence of such special circumstances preventing copying is
> > > > > > > unexpected, and in the case of an essential building block such as
> > > > > > > a container, makes the type difficult to use.
> > > > > > > 
> > > > > > > The only argument for disabling copying that has been given is
> > > > > > > that it could be surprising(*).  But because all types are 
> > > > > > > copyable
> > > > > > > by default the "surprise" is usually when one can't be.
> > > > > > > 
> > > > > > > I think Richi's "surprising" has to do with the fact that it lets
> > > > > > > one inadvertently copy a large amount of data, thus leading to
> > > > > > > an inefficiency.  But by analogy, there are infinitely many ways
> > > > > > > to end up with inefficient code (e.g., deep recursion, or heap
> > > > > > > allocation in a loop), and they are not a reason to ban the coding
> > > > > > > constructs that might lead to it.
> > > > > > > 
> > > > > > > IIUC, Jason's comment about surprising effects was about implicit
> > > > > > > conversion from auto_vec to vec.  I share that concern, and agree
> > > > > > > that it should be addressed by preventing the conversion (as Jason
> > > > > > > suggested).
> > > > > > 
> > > > > > But fact is that how vec<> and auto_vec<> are used today in GCC
> > > > > > do not favor that.  In fact your proposed vec<> would be quite 
> > > > > > radically
> > > > > > different (and IMHO vec<> and auto_vec<> should be unified then to
> > > > > > form your proposed new container).  auto_vec<> at the moment simply
> > > > > > maintains ownership like a smart pointer - which is _also_ not 
> > > > > > copyable.
> > > > > 
> > > > > Yes, as we discussed in the review below, vec is not a good model
> > > > > because (as you note again above) it's constrained by its legacy
> > > > > uses.  The best I think we can do for it is to make it safer to
> > > > > use.
> > > > > 

Re: [PATCH] c++: CTAD and deduction guide selection [PR86439]

2021-06-22 Thread Patrick Palka via Gcc-patches
On Tue, 22 Jun 2021, Jonathan Wakely wrote:

> On Tue, 22 Jun 2021 at 19:45, Patrick Palka wrote:
> > This change causes us to reject some container CTAD examples in the
> > libstdc++ testsuite due to deduction failure for {}, which AFAICT is the
> > correct behavior.  Previously, in the case of e.g. the first removed
> > example for std::map, the type of {} would be deduced to less as a
> > side effect of forming the call to the selected guide
> >
> >   template,
> >  typename _Allocator = allocator>,
> >  typename = _RequireNotAllocator<_Compare>,
> >  typename = _RequireAllocator<_Allocator>>
> >   map(initializer_list>,
> >   _Compare = _Compare(), _Allocator = _Allocator())
> >   -> map<_Key, _Tp, _Compare, _Allocator>;
> >
> > which made later overload resolution for the constructor call
> > unambiguous.  Now, the type of {} remains undeduced until constructor
> > overload resolution, and we complain about ambiguity with the two
> > constructors
> >
> >   map(initializer_list __l,
> >   const _Compare& __comp = _Compare(),
> >   const allocator_type& __a = allocator_type())
> >
> >   map(initializer_list __l, const allocator_type& __a)
> >
> > This patch just removes these problematic container CTAD examples.
> 
> Do all the problematic cases have a corresponding case that doesn't
> use {} but uses an actual type?
> 
> If not, we might want to add such cases, to ensure we're still
> covering all the cases that really *should* work.

Ah, it looks like most tests have one such corresponding case, but since
the {} can potentially be an allocator or a function, for maximum
coverage we should have two corresponding cases.  So the revised patch
below instead replaces each problematic CTAD example with the other
untested case.

Interestingly two of these new CTAD examples for std::set and std::multiset
end up triggering an unrelated CTAD bug PR101174 on trunk, so the patch
comments out these adjusted examples for now.
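
As a rough illustration of the two kinds of replacement case described above, the sketch below passes an actual comparator type or an actual allocator type instead of {}. It is not taken from the testsuite; the function name is hypothetical and the code needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical helper: builds two maps via CTAD and returns the total size.
std::size_t
ctad_total_size ()
{
  // Second argument is a real comparator, so _Compare is deduced from it.
  std::map by_compare{{std::pair{1, 2.0}, {2, 3.0}, {3, 4.0}},
		      std::greater<int>{}};
  static_assert (std::is_same_v<decltype (by_compare),
				std::map<int, double, std::greater<int>>>);

  // Second argument is a real allocator, so the allocator guide is chosen.
  std::map by_alloc{{std::pair{1, 2.0}, {2, 3.0}},
		    std::allocator<std::pair<const int, double>>{}};
  static_assert (std::is_same_v<decltype (by_alloc), std::map<int, double>>);

  return by_compare.size () + by_alloc.size ();
}
```

Neither call leaves an argument whose type must be guessed from an empty brace pair, so guide selection and the later constructor overload resolution agree.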

-- >8 --

PR c++/86439
PR c++/101174

gcc/cp/ChangeLog:

* call.c (print_error_for_call_failure): Constify 'args' parameter.
(perform_dguide_overload_resolution): Define.
* cp-tree.h: (perform_dguide_overload_resolution): Declare.
* pt.c (do_class_deduction): Use perform_dguide_overload_resolution
instead of build_new_function_call.  Don't use tf_decltype or
set cp_unevaluated_operand.  Remove unnecessary NULL_TREE tests.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/map/cons/deduction.cc: Replace ambiguous
CTAD examples.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
Mention one of the replaced examples is broken due to PR101174.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Replace
ambiguous CTAD examples.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction88.C: New test.
* g++.dg/cpp1z/class-deduction89.C: New test.
* g++.dg/cpp1z/class-deduction90.C: New test.
---
 gcc/cp/call.c | 36 +++-
 gcc/cp/cp-tree.h  |  2 +
 gcc/cp/pt.c   | 41 +++
 .../g++.dg/cpp1z/class-deduction88.C  | 18 
 .../g++.dg/cpp1z/class-deduction89.C  | 15 +++
 .../g++.dg/cpp1z/class-deduction90.C  | 16 
 .../23_containers/map/cons/deduction.cc   |  8 ++--
 .../23_containers/multimap/cons/deduction.cc  |  8 ++--
 .../23_containers/multiset/cons/deduction.cc  |  8 ++--
 .../23_containers/set/cons/deduction.cc   |  8 ++--
 .../unordered_map/cons/deduction.cc   | 17 ++--
 .../unordered_multimap/cons/deduction.cc  | 17 ++--
 .../unordered_multiset/cons/deduction.cc  | 14 ++-
 .../unordered_set/cons/deduction.cc   | 14 ++-
 14 files changed, 170 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction88.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction89.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction90.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 9f03534c20c..aafc7acca24 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4629,7 +4629,7 @@ perform_overload_resolution (tree fn,
functions.  */
 
 static void
-print_error_for_call_failure (tree fn, vec *args,
+print_error_for_call_failure (tree fn, const vec *args,
  struct z_candidate *candidates)
 {
   tree targs = NULL_TREE;
@@ -4654,6 

Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-22 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the prompt review!

on 2021/6/23 2:56 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 21, 2021 at 05:27:06PM +0800, Kewen.Lin wrote:
>> Recently if we build gcc on Power with the assembler which doesn't
>> have Power10 support, the build will fail when building libgcc with
>> one error message like:
>>
>> Error: invalid switch -mpower10
>> Error: unrecognized option -mpower10
>> make[2]: *** [...gcc/gcc-base/libgcc/shared-object.mk:14: float128-p10.o] 
>> Error 1
> 
> In general, it is recommended to use a binutils of approximately the
> same age as the GCC you are trying to build.  This is similar to us not
> supporting most other non-sensical configurations.  An important reason
> for that is it cannot ever be tested, there are just too many strange
> combinations possible.
> 
> That said :-)
> 

Ah, ok!  That explains why no one reported this.  I had only used the customized
binutils for Power10 build/testing before, and will work with a new binutils
from now on.  It does give more testing coverage.

>> By checking the culprit commit r12-1340, it's caused by some typos.
> 
> (That is 9090f4807161.)
> 
>>   - fix test case used for libgcc_cv_powerpc_3_1_float128_hw check.
> 
> I was confused here for a bit, "test case" usually means something in
> testsuite/, I'd call this just "test" :-)
> 
> >> BTW, there is some noise during regression testing due to the
> >> newer binutils version, but it was identified as unrelated
> >> after some checking.
> 
> Hrm, what kind of noise?
> 

Some check selectors fail without a new binutils, so those cases become
unsupported, like the Power10 support checks.  The location of the pre-built
binutils on our system holds not only binutils but also some other tools,
such as a newer gdb, which caused some differences, for example in the guality
cases.

>>  * config/rs6000/t-float128-hw(fp128_3_1_hw_funcs,
>>  fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
>>  fp128_3_1_hw_obj): Remove variables for ISA 3.1 support.
> 
> Needs a space before the opening paren.  Doesn't need a line break so
> early on that line btw.
> 

Good catch, fixed.

> Just "Remove." or "Delete." is less confusing btw: what you wrote can be
> read as "Remove the variables from these declarations" or similar.  And
> of course terseness is usually best in a changelog.
> 

Fixed.

>>  * config/rs6000/t-float128-p10-hw (FLOAT128_HW_INSNS): Append
>>  macro FLOAT128_HW_INSNS_ISA3_1 for ISA 3.1 support.
> 
> Don't say what it is for, just say what changed :-)
> 

Fixed.

>>  (FP128_3_1_CFLAGS_HW): Fix option typo.
>>  * config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): Guarded
>>  with FLOAT128_HW_INSNS_ISA3_1.
> 
> "Guard", not "Guarded", all entries are written in the imperative, like,
> "Do this" or "Guard that".
> 

Got it, fixed.

>> +#ifdef FLOAT128_HW_INSNS_ISA3_1
>>  TFtype __floattikf (TItype_ppc)
>>__attribute__ ((__ifunc__ ("__floattikf_resolve")));
> 
> I wonder if we now need TItype_ppc at all anymore, btw?
> 

Sorry, I don't quite follow this question.

I think it stands for the signed int128 type; the function converts
signed int128 to float128, so the type is needed here.
But I think that's not what you asked.  Or are you referring
to replacing this type with signed int128 directly, without hiding it
behind a mode?  If so, it sounds like a question for Mike.

> Okay for trunk with the changelog slightly massaged.  Thanks!
> 

Thanks for catching changelog problems and the fix suggestion!

Fixed all of them accordingly and committed in r12-1738.


BR,
Kewen


[PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-06-22 Thread Xi Ruoyao via Gcc-patches
Commit message shamelessly copied from 1777beb6b129 by jakub:

This function, because it is sometimes called even outside of function
bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in order
for that to work, when first referenced, the VAR_DECLs need to appear in a
TARGET_EXPR so that during gimplification the var gets the right
DECL_CONTEXT and is added to local decls.

Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and backport
to 11, 10, and 9?
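
For context, the GCC internals manual describes the hold, clear and update expressions built by this hook as equivalents of feholdexcept, feclearexcept and feupdateenv around an atomic compound assignment.  The sketch below is only a conceptual model of that sequence written with <cfenv>; atomic_add is a hypothetical helper and is not what the compiler emits.

```cpp
#include <atomic>
#include <cassert>
#include <cfenv>

// Conceptual model only: add DELTA to an atomic double with the
// hold / clear / update pattern around a compare-and-exchange loop.
double
atomic_add (std::atomic<double> &target, double delta)
{
  std::fenv_t saved_env;
  std::feholdexcept (&saved_env);		// "hold": save env, clear flags

  double expected = target.load ();
  for (;;)
    {
      double desired = expected + delta;
      if (target.compare_exchange_strong (expected, desired))
	{
	  // "update": restore env and raise exceptions from the
	  // iteration whose result was actually stored.
	  std::feupdateenv (&saved_env);
	  return desired;
	}
      // "clear": discard exceptions raised by the failed attempt.
      std::feclearexcept (FE_ALL_EXCEPT);
    }
}
```

The temporaries that hold the saved state are what the patch wraps in TARGET_EXPR so that gimplification registers them properly.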

gcc/

* config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
  TARGET_EXPR instead of MODIFY_EXPR.
---
 gcc/config/mips/mips.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 8f043399a8e..89d1be6cea6 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
   tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
   tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
-  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
- fcsr_orig_var, get_fcsr_hold_call);
+  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
+ fcsr_orig_var, get_fcsr_hold_call, NULL, NULL);
   tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI, fcsr_orig_var,
  build_int_cst (MIPS_ATYPE_USI, 0xf003));
-  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
-fcsr_mod_var, hold_mod_val);
+  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
+fcsr_mod_var, hold_mod_val, NULL, NULL);
   tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1, fcsr_mod_var);
   tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
  hold_assign_orig, hold_assign_mod);
@@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
 
   tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
-  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
-   exceptions_var, get_fcsr_update_call);
+  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
+   exceptions_var, get_fcsr_update_call, NULL, NULL);
   tree set_fcsr_update_call = build_call_expr (set_fcsr, 1, fcsr_orig_var);
   *update = build2 (COMPOUND_EXPR, void_type_node, *update,
set_fcsr_update_call);
-- 
2.32.0





Re: [PATCH 2/7] Duplicate the range information of the phi onto the new ssa_name

2021-06-22 Thread Andrew Pinski via Gcc-patches
On Sun, Jun 20, 2021 at 11:50 PM Richard Biener via Gcc-patches
 wrote:
>
> On Sat, Jun 19, 2021 at 9:49 PM apinski--- via Gcc-patches
>  wrote:
> >
> > From: Andrew Pinski 
> >
> > Since match_simplify_replacement uses gimple_simplify, there is a new
> > ssa name created sometimes and then we go and replace the phi edge with
> > this new ssa name, the range information on the phi is lost.
> > I don't have a testcase right now where we lose the range information
> > though but it does show up when enhancing match.pd to handle
> > some min/max patterns and g++.dg/warn/Wstringop-overflow-1.C starts
> > to fail.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-phiopt.c (match_simplify_replacement): Duplicate range
> > info if we're the only things setting the target PHI.
> > ---
> >  gcc/tree-ssa-phiopt.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> > index 24cbce9955a..feb8ca8d0d1 100644
> > --- a/gcc/tree-ssa-phiopt.c
> > +++ b/gcc/tree-ssa-phiopt.c
> > @@ -894,6 +894,14 @@ match_simplify_replacement (basic_block cond_bb, 
> > basic_block middle_bb,
> >gsi_move_before (, );
> >reset_flow_sensitive_info (gimple_assign_lhs (stmt_to_move));
> >  }
> > > +  /* Duplicate range info if we're the only things setting the target PHI.  */
> > +  tree phi_result = PHI_RESULT (phi);
> > +  if (!gimple_seq_empty_p (seq)
> > +  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
> > +  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
>
> Please use INTEGRAL_TYPE_P (...)

Yes, using INTEGRAL_TYPE_P makes more sense here.
Note I copied exactly what was already done in minmax_replacement.

>
> > +  && SSA_NAME_RANGE_INFO (phi_result)
>
> && !SSA_NAME_RANGE_INFO (result)
>
> ?  Why conditional on !gimple_seq_empty_p (seq)?

As I understand it, if !gimple_seq_empty_p (seq) is true, then
the result will be a new SSA name, and therefore !SSA_NAME_RANGE_INFO
(result) will also be true.

> It looks like we could do this trick (actually in both directions,
> wherever the range
> info is missing?) in replace_phi_edge_with_variable instead?

Let me look into doing that.

Thanks,
Andrew

>
> Thanks,
> Richard.
>
> )
> > > +duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
> > +  SSA_NAME_RANGE_INFO (phi_result));
> >if (seq)
> >  gsi_insert_seq_before (, seq, GSI_SAME_STMT);
> >
> > --
> > 2.27.0
> >


Re: [PATCH 2/2] elf: Add GNU_PROPERTY_1_NEEDED check

2021-06-22 Thread H.J. Lu via Gcc-patches
On Tue, Jun 22, 2021 at 11:15 AM Fangrui Song  wrote:
>
> On 2021-06-22, H.J. Lu wrote:
> >On Mon, Jun 21, 2021 at 10:46 PM Fangrui Song  wrote:
> >>
> >> On 2021-06-21, H.J. Lu wrote:
> >> >On Mon, Jun 21, 2021 at 9:16 PM Alan Modra  wrote:
> >> >>
> >> >> On Mon, Jun 21, 2021 at 07:12:02PM -0700, H.J. Lu wrote:
> >> >> > On Mon, Jun 21, 2021 at 5:06 PM Alan Modra  wrote:
> >> >> > >
> >> >> > > On Mon, Jun 21, 2021 at 03:34:38PM -0700, Fangrui Song wrote:
> >> >> > > > clang -fno-pic -fno-direct-access-extern-data  works with 
> >> >> > > > clang>=12.0.0 today.
> >> >> > >
> >> >> > > -fno-direct-access-extern-data or variations on that also seem good 
> >> >> > > to
> >> >> > > me.  -fpic-extern would also work.  I liked -fprotected-abi because
> >> >> > > it shows the intent of correcting abi issues related to protected
> >> >> > > visibility.  (Yes, it affects code for all undefined symbols because
> >> >> > > the compiler clearly isn't seeing the entire program if there are
> >> >> > > undefined symbols.)
> >> >> >
> >> >> > I need an option which can be turned on and off.   How about
> >> >> > -fextern-access=direct and -fextern-access=indirect?  It will cover
> >> >> > both data and function?
> >>
> >> -fno-direct-access-external-data and -fdirect-access-external-data can 
> >> turn on/off the bit.
> >>
> >> clang -fno-pic -fno-direct-access-external-data  works for x86-64 and 
> >> aarch64.
> >>
> >> We can add a -fno-direct-access-external
> >
> >Since both clang and GCC will add a new option for both data and function
> >symbols, can we have an agreement on the new option name?  I am listing
> >options here:
> >
> >1. -fdirect-access-external/-fno-direct-access-external
> >2. -fdirect-extern-access/-fno-direct-extern-access
> >3. -fdirect-external-access/-fno-direct-external-access
> >4. -fextern-access=direct/-fextern-access=indirect
> >5. -fexternal-access=direct/-fexternal-access=indirect
> >
> >My order of preference is 4, 5, 2, 3, 1.
>
> Preferring "extern" to "external" looks fine to me. (`extern` is the C/C++ 
> construct anyway and this option describes what to do with default visibility 
> non-definition `extern int xxx`/`extern void foo()`).
>
> -fextern-access=direct/-fextern-access=indirect and 
> -fdirect-extern-access/-fno-direct-extern-access
>
> look good to me.

Let's go with -fdirect-extern-access/-fno-direct-extern-access

> I am happy to add aliases to clang if consensus is moving toward  
> -fextern-access=indirect or -fno-direct-extern-access.
>
> >> >> Yes, FWIW that option name for gcc also looks good to me.
> >> >
> >> >I will change the gcc option to
> >> >
> >> >-fextern-access=direct
> >> >-fextern-access=indirect
> >> >
> >> >and change GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION
> >> >to GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS
> >>
> >> Note that this will be a glibc + GNU ld specific thing.
> >>
> >> gold and ld.lld error for copy relocations on protected data symbols by 
> >> default.
> >
> >At run-time, there will be a mixture of components built with different tools
> >over time.  A marker will help glibc to avoid potential run-time failures due
> >to binary incompatibility.
>
> glibc can perform the check without a GNU PROPERTY.
>
>
> For a st_value!=0 && st_shndx==0 symbol lookup during relocation
> processing, if the definition is found protected in a shared object,
> ld.so can report an error and make a suggestion like
> "consider recompiling the executable with -fno-direct-extern-access or
> -fpie/-fpic"

There is no run-time issue when the shared object isn't built with
-fno-direct-extern-access.   That is what the marker is used for.

>
> I echo Michael's question in another thread
> https://sourceware.org/pipermail/binutils/2021-June/117137.html
>
> "IOW: which scenario do you want to not error on when you want to make the 
> error conditional?"
>
> For such rare cases, we can use a LD_* environment variable.
>
> >> >> Now as to the need for a corresponding linker option, I'm of the
> >> >> opinion that it is ideal for the linker to be able to cope without
> >> >> needing special options.  Can you show me a set of object files (or
> >> >> just describe them) where ld cannot deduce from relocations and
> >> >> dynamic symbols what dynbss copies, plt stubs, and dynamic relocations
> >> >> are needed?  I'm fairly sure I manage to do that for powerpc.
> >> >>
> >> >> Note that I'm not against a new option to force the linker to go
> >> >> against what it would do based on input object files (perhaps
> >> >
> >> >I'd like to turn it on in linker without any compiler changes, especially
> >> >when building shared libraries, kind of a subset of -Bsymbolic.
> >> >
> >> >> reporting errors), but don't think we should have a new option without
> >> >> some effort being made to see whether we really need it.
> >> >
> >> >Here is a glibc patch to use both linker options on some testcases:
> >> >
> >> 

Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-22 Thread David Malcolm via Gcc-patches
On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:
> On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:
> > The attached patch introduces the suppress_warning(),
> > warning_suppressed(), and copy_no_warning() APIs without making
> > use of them in the rest of GCC.  They are in three files:
> > 
> >    diagnostic-spec.{h,c}: Location-centric overloads.
> >    warning-control.cc: Tree- and gimple*-centric overloads.
> > 
> > The location-centric overloads are suitable to use from the
> > diagnostic
> > subsystem.  The rest can be used from the front ends and the middle
> > end.
> 
> [...snip...]
> 
> > +/* Return true if warning OPT is suppressed for decl/expression
> > EXPR.
> > +   By default tests the disposition for any warning.  */
> > +
> > +bool
> > +warning_suppressed_p (const_tree expr, opt_code opt /* =
> > all_warnings */)
> > +{
> > +  const nowarn_spec_t *spec = get_nowarn_spec (expr);
> > +
> > +  if (!spec)
> > +    return get_no_warning_bit (expr);
> > +
> > +  const nowarn_spec_t optspec (opt);
> > +  bool dis = *spec & optspec;
> > +  gcc_assert (get_no_warning_bit (expr) || !dis);
> > +  return dis;
> 
> Looking through the patch, I don't like the use of "dis" for the "is
> suppressed" bool, since...
> 
> [...snip...]
> 
> > +
> > +/* Enable, or by default disable, a warning for the statement STMT.
> > +   The wildcard OPT of -1 controls all warnings.  */
> 
> ...I find the above comment to be confusingly worded due to the double-
> negative.
> 
> If I'm reading your intent correctly, how about this wording:
> 
> /* Change the suppression of warnings for statement STMT.
>    OPT controls which warnings are affected.
>    The wildcard OPT of -1 controls all warnings.
>    If SUPP is true (the default), enable the suppression of the
> warnings.
>    If SUPP is false, disable the suppression of the warnings.  */
> 
> ...and rename "dis" to "supp".
> 
> Or have I misread the intent of the patch?
> 
> > +
> > +void
> > +suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */,
> > + bool dis /* = true */)
> 
> > +{
> > +  if (opt == no_warning)
> > +    return;
> > +
> > +  const key_type_t key = convert_to_key (stmt);
> > +
> > +  dis = suppress_warning_at (key, opt, dis) || dis;
> > +  set_no_warning_bit (stmt, dis);
> > +}
> 
> [...snip...]

Also, I think I prefer having separate entrypoints for the "all
warnings" case; on reading through the various patches I think that in
e.g.:

- TREE_NO_WARNING (*expr_p) = 1;
+ suppress_warning (*expr_p);

I prefer:

  suppress_warnings (*expr_p);

(note the plural) since that way we can grep for them, and it seems
like better grammar to me.

Both entrypoints could be implemented by a static suppress_warning_1
internally if that makes it easier.

In that vein, "unsuppress_warning" seems far clearer to me than
"suppress_warning (FOO, false)"; IIRC there are very few uses of this
non-default arg (I couldn't find any in a quick look through the v2
kit).
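
To illustrate the shape I have in mind, here is a toy sketch.  Everything in it (toy_opt, toy_expr, the bookkeeping) is hypothetical and stands in for the real tree/gimple machinery in the patch; only the split into suppress_warning, suppress_warnings and unsuppress_warning over a shared suppress_warning_1 is the point.

```cpp
#include <cassert>
#include <set>

// Hypothetical option codes, standing in for opt_code.
enum toy_opt
{
  TOY_NO_WARNING,
  TOY_ALL_WARNINGS,
  TOY_WUNUSED,
  TOY_WUNINITIALIZED
};

// Hypothetical expression carrying its own suppression state.
struct toy_expr
{
  bool all_suppressed = false;
  std::set<toy_opt> suppressed;
};

// Shared worker behind the public entrypoints.
static void
suppress_warning_1 (toy_expr &expr, toy_opt opt, bool supp)
{
  if (opt == TOY_NO_WARNING)
    return;

  if (opt == TOY_ALL_WARNINGS)
    {
      expr.all_suppressed = supp;
      if (!supp)
	expr.suppressed.clear ();
      return;
    }

  if (supp)
    expr.suppressed.insert (opt);
  else
    expr.suppressed.erase (opt);
}

// Suppress one specific warning.
void
suppress_warning (toy_expr &expr, toy_opt opt)
{
  suppress_warning_1 (expr, opt, true);
}

// Suppress all warnings (note the plural, which is easy to grep for).
void
suppress_warnings (toy_expr &expr)
{
  suppress_warning_1 (expr, TOY_ALL_WARNINGS, true);
}

// Re-enable one specific warning.
void
unsuppress_warning (toy_expr &expr, toy_opt opt)
{
  suppress_warning_1 (expr, opt, false);
}

bool
warning_suppressed_p (const toy_expr &expr, toy_opt opt)
{
  return expr.all_suppressed || expr.suppressed.count (opt) != 0;
}
```

In this toy model a blanket suppression wins over a later unsuppress_warning of a single option; the real semantics would need to be decided in the patch.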

Does this make sense?
Dave



Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-22 Thread David Malcolm via Gcc-patches
On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:
> The attached patch introduces the suppress_warning(),
> warning_suppressed(), and copy_no_warning() APIs without making
> use of them in the rest of GCC.  They are in three files:
> 
>    diagnostic-spec.{h,c}: Location-centric overloads.
>    warning-control.cc: Tree- and gimple*-centric overloads.
> 
> The location-centric overloads are suitable to use from the diagnostic
> subsystem.  The rest can be used from the front ends and the middle
> end.

[...snip...]

> +/* Return true if warning OPT is suppressed for decl/expression EXPR.
> +   By default tests the disposition for any warning.  */
> +
> +bool
> +warning_suppressed_p (const_tree expr, opt_code opt /* = all_warnings */)
> +{
> +  const nowarn_spec_t *spec = get_nowarn_spec (expr);
> +
> +  if (!spec)
> +return get_no_warning_bit (expr);
> +
> +  const nowarn_spec_t optspec (opt);
> +  bool dis = *spec & optspec;
> +  gcc_assert (get_no_warning_bit (expr) || !dis);
> +  return dis;

Looking through the patch, I don't like the use of "dis" for the "is
suppressed" bool, since...

[...snip...]

> +
> +/* Enable, or by default disable, a warning for the statement STMT.
> +   The wildcard OPT of -1 controls all warnings.  */

...I find the above comment to be confusingly worded due to the double-
negative.

If I'm reading your intent correctly, how about this wording:

/* Change the suppression of warnings for statement STMT.
   OPT controls which warnings are affected.
   The wildcard OPT of -1 controls all warnings.
   If SUPP is true (the default), enable the suppression of the warnings.
   If SUPP is false, disable the suppression of the warnings.  */

...and rename "dis" to "supp".

Or have I misread the intent of the patch?

> +
> +void
> +suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */,
> +   bool dis /* = true */)

> +{
> +  if (opt == no_warning)
> +return;
> +
> +  const key_type_t key = convert_to_key (stmt);
> +
> +  dis = suppress_warning_at (key, opt, dis) || dis;
> +  set_no_warning_bit (stmt, dis);
> +}

[...snip...]

Dave





Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-22 Thread Gaius Mulley via Gcc-patches
Jakub Jelinek  writes:

>
> On Mon, Jun 21, 2021 at 11:36:48PM +0100, Gaius Mulley via Gcc-patches wrote:
>> > : error: the file containing the definition module 
>> > <80><98>M2RTS
>> > <80><99> cannot be found
>> > compiler exited with status 1
>> > output is:
>> > : error: the file containing the definition module 
>> > <80><98>M2RTS
>> > <80><99> cannot be found
>>
>> ah yes, it would be good to make it autoconf locale utf-8
>
> No, whether gcc is configured on an UTF-8 capable terminal or using UTF-8
> locale doesn't imply whether it will actually be used in such a terminal
> later on.
> See e.g. gcc/intl.c (gcc_init_libintl) how it decides whether to use UTF-8
> or normal quotes.

thank you for the steer - and to avoid a mistake of confusing host and
build.  I've generated a small patch which works using gcc/intl.c
open_quote/close_quote.  Hopefully it improves the aesthetics on host
machines.
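
The idea of choosing the quote characters when the compiler runs, rather
than when it is configured, can be sketched as follows. This is an
illustration only and not the code in gcc/intl.c: pick_quotes and
quote_pair are hypothetical names, and the codeset string is assumed to be
whatever the host reports for the active locale (for example via
nl_langinfo (CODESET) on POSIX systems).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical pair of C strings, mirroring the open/close quote idea.
struct quote_pair {
  const char *open;
  const char *close;
};

// Lower-case a copy of S so the codeset comparison is case-insensitive.
std::string to_lower(std::string s)
{
  for (char &c : s)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return s;
}

// Pick UTF-8 curly quotes only when the run-time codeset is UTF-8;
// otherwise fall back to plain ASCII apostrophes.
quote_pair pick_quotes(const std::string &codeset)
{
  const std::string cs = to_lower(codeset);
  if (cs == "utf-8" || cs == "utf8")
    return { "\xe2\x80\x98", "\xe2\x80\x99" };  // U+2018, U+2019
  return { "'", "'" };
}
```

Because the decision is taken from the run-time locale, the same binary
prints readable quotes on both UTF-8 and non-UTF-8 terminals.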

diff --git a/gcc-versionno/gcc/m2/ChangeLog b/gcc-versionno/gcc/m2/ChangeLog
index 25cee8ed..86a95270 100644
--- a/gcc-versionno/gcc/m2/ChangeLog
+++ b/gcc-versionno/gcc/m2/ChangeLog
@@ -1,3 +1,16 @@
+2021-06-22   Gaius Mulley 
+
+   * m2/gm2-compiler/M2ColorString.mod:  import open_quote and
+   close_quote from m2color.
+   * m2/gm2-gcc/m2color.c:  (open_quote) New function implemented.
+   (close_quote) New function implemented, both functions import
+   open and close quotes from gcc/intl.c to pick up whether the
+   host has utf-8 capability.
+   * m2/gm2-gcc/m2color.def:  (open_quote) New function defined.
+   (close_quote) New function defined.
+   * m2/gm2-gcc/m2color.h:  (open_quote) and (close_quote) provide C
+   prototypes for external functions.
+
 2021-06-21   Gaius Mulley 
 
* tools-src/calcpath:  (New file).
diff --git a/gcc-versionno/gcc/m2/gm2-compiler/M2ColorString.mod 
b/gcc-versionno/gcc/m2/gm2-compiler/M2ColorString.mod
index cecee131..f32ef88c 100644
--- a/gcc-versionno/gcc/m2/gm2-compiler/M2ColorString.mod
+++ b/gcc-versionno/gcc/m2/gm2-compiler/M2ColorString.mod
@@ -21,7 +21,7 @@ along with GNU Modula-2; see the file COPYING3.  If not see
 
 IMPLEMENTATION MODULE M2ColorString ;
 
-FROM m2color IMPORT colorize_start, colorize_stop ;
+FROM m2color IMPORT colorize_start, colorize_stop, open_quote, close_quote ;
 FROM DynamicStrings IMPORT InitString, InitStringCharStar,
ConCat, ConCatChar, Mark, string, KillString,
Dup, char, Length, Mult ;
@@ -70,7 +70,7 @@ END append ;
 
 PROCEDURE quoteOpen (s: String) : String ;
 BEGIN
-   RETURN ConCat (append (s, "quote"), Mark (InitString ("‘")))
+   RETURN ConCat (append (s, "quote"), Mark (InitStringCharStar (open_quote ())))
 END quoteOpen ;
 
 
@@ -82,7 +82,7 @@ PROCEDURE quoteClose (s: String) : String ;
 BEGIN
s := endColor (s) ;
s := append (s, "quote") ;
-   s := ConCat (s, Mark (InitString ("’"))) ;
+   s := ConCat (s, Mark (InitStringCharStar (close_quote ()))) ;
s := endColor (s) ;
RETURN s
 END quoteClose ;
diff --git a/gcc-versionno/gcc/m2/gm2-gcc/m2color.c 
b/gcc-versionno/gcc/m2/gm2-gcc/m2color.c
index ec58a4b7..72299e34 100644
--- a/gcc-versionno/gcc/m2/gm2-gcc/m2color.c
+++ b/gcc-versionno/gcc/m2/gm2-gcc/m2color.c
@@ -38,6 +38,18 @@ const char *m2color_colorize_stop (bool show_color)
 }
 
 
+const char *m2color_open_quote (void)
+{
+  return open_quote;
+}
+
+
+const char *m2color_close_quote (void)
+{
+  return close_quote;
+}
+
+
 void _M2_m2color_init ()
 {
 }
diff --git a/gcc-versionno/gcc/m2/gm2-gcc/m2color.def 
b/gcc-versionno/gcc/m2/gm2-gcc/m2color.def
index 6fa48d2a..a6e96e21 100644
--- a/gcc-versionno/gcc/m2/gm2-gcc/m2color.def
+++ b/gcc-versionno/gcc/m2/gm2-gcc/m2color.def
@@ -39,4 +39,16 @@ PROCEDURE colorize_start (show_color: BOOLEAN;
 PROCEDURE colorize_stop (show_color: BOOLEAN) : ADDRESS ;
 
 
+(* open_quote - return a C string containing the open quote character which
+   might be a UTF-8 character if on a UTF-8 supporting host.  *)
+
+PROCEDURE open_quote () : ADDRESS ;
+
+
+(* close_quote - return a C string containing the close quote character which
+   might be a UTF-8 character if on a UTF-8 supporting host.  *)
+
+PROCEDURE close_quote () : ADDRESS ;
+
+
 END m2color.
diff --git a/gcc-versionno/gcc/m2/gm2-gcc/m2color.h 
b/gcc-versionno/gcc/m2/gm2-gcc/m2color.h
index 08ef9e72..1b9be66b 100644
--- a/gcc-versionno/gcc/m2/gm2-gcc/m2color.h
+++ b/gcc-versionno/gcc/m2/gm2-gcc/m2color.h
@@ -37,9 +37,11 @@ along with GNU Modula-2; see the file COPYING3.  If not see
 
 
 EXTERN const char *m2color_colorize_start (bool show_color, char *name, 
unsigned int name_len);
-
 EXTERN const char *m2color_colorize_stop (bool show_color);
 
+EXTERN const char *m2color_open_quote (void);
+EXTERN const char *m2color_close_quote (void);
+
 EXTERN void _M2_m2color_init ();
 EXTERN void _M2_m2color_finish ();


regards,
Gaius


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches
Okay.

Now, I believe that we agreed on the following:

For this current patch:

1. Use byte-repeatable pattern for pattern-initialization;
2. Use one pattern for all types;
3. Use “0xFE” for the byte pattern value.

Possible future improvement:

1. Type specific patterns if needed;
2. User-specified pattern if needed; (add a new option for user to change the 
patterns).
3. Make the code generation part a target hook if needed.

Let me know if I miss anything.
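
For reference, a small sketch of what a byte-repeated 0xFE pattern looks
like in a few types. It is not the code the patch generates:
pattern_value is a hypothetical helper, and the float check assumes IEEE
754 single precision and a two's complement int32_t.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// The byte chosen for pattern initialization in this thread.
constexpr unsigned char pattern_byte = 0xFE;

// Hypothetical helper: build a value of trivially copyable type T whose
// every byte is pattern_byte, as a byte-repeatable pattern would.
template<typename T>
T pattern_value()
{
  T v;
  std::memset(&v, pattern_byte, sizeof v);
  return v;
}
```

With these assumptions a 32-bit unsigned value is 0xFEFEFEFE, the signed
value is -16843010, and the float is roughly -1.69e+38 (not a NaN), so
the same pattern looks unusual in both integer and float contexts.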

Thanks.

Qing

> On Jun 22, 2021, at 1:18 PM, Richard Sandiford  
> wrote:
> 
> Kees Cook  writes:
>> On Tue, Jun 22, 2021 at 09:25:57AM +0100, Richard Sandiford wrote:
>>> Kees Cook  writes:
>>>> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
>>>>> So, if “pattern value” is “0x”, then it’s a valid 
>>>>> canonical virtual memory address.  However, for most OS, 
>>>>> “0x” should be not in user space.
>>>>> 
>>>>> My question is, is “0xF” good for pointer? Or 
>>>>> “0x” better?
>>>> 
>>>> I think 0xFF repeating is fine for this version. Everything else is a
>>>> "nice to have" for the pattern-init, IMO. :)
>>> 
>>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>>> 
>>> For integer types, all values are valid representations, and we're
>>> relying on the pattern being “obviously” wrong in context.  0x…
>>> is unlikely to be a correct integer but 0x… would instead be a
>>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>>> came from pattern init rather than a deliberate choice.
>> 
>> I can live with 0xAA. On x86_64, this puts it nicely in the middle of
>> the non-canonical space:
>> 
>> 0x8000 - 0x7fff
>> 
>> The only trouble is with 32-bit, where the value 0x is a
>> legitimate allocatable userspace address. If we want some kind-of middle
>> ground, how about 0xFE? That'll be non-canonical on x86_64, and at the
>> high end of the i386 kernel address space.
> 
> Sounds good to me FWIW.  That'd give float -1.694739530317379e+38
> (suspiciously big even for astrophysics, I hope!) and would still
> look unusual in an integer context.
> 
>>> I agree that, all other things being equal, it would be nice to use NaNs
>>> for floats.  But relying on wrong numerical values for floats doesn't
>>> seem worse than doing that for integers.
>>> 
>>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>>> should sacrifice integer debugging for float debugging here.
>> 
>> In some future version type-specific patterns would be a nice improvement,
>> but I don't want that to block getting the zero-init portion landed. :)
> 
> Yeah.
> 
> Thanks,
> Richard
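
The point about non-canonical pointers can be checked with a few lines.
This sketch assumes x86_64 with 48-bit virtual addresses, where an
address is canonical only if bits 63..47 are all equal; is_canonical_48
and repeat_byte are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 48-bit virtual addresses, so bits 63..47 (17 bits) must be
// all zeros or all ones for the address to be canonical.
constexpr bool is_canonical_48(std::uint64_t addr)
{
  const std::uint64_t top = addr >> 47;
  return top == 0 || top == 0x1FFFF;
}

// Build a 64-bit value whose eight bytes are all equal to B.
constexpr std::uint64_t repeat_byte(unsigned char b)
{
  return 0x0101010101010101ull * b;
}
```

Under that assumption both 0xAA-repeating and 0xFE-repeating pointers are
non-canonical, while 0xFF-repeating is a canonical address at the very top
of the address space.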



Re: [PATCH RFA] expand: empty class return optimization [PR88529]

2021-06-22 Thread Joseph Myers
This introduces an ICE building libgomp for ColdFire, as detected by my 
glibc bot.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101170

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, rs6000] Do not enable pcrel-opt by default

2021-06-22 Thread Aaron Sawdey via Gcc-patches
SPEC2017 testing on p10 shows that this optimization does not have a
positive impact on performance. So we are no longer going to enable it
by default. The test cases for it needed to be updated so they always
enable it to test it.
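
As a generic illustration of the mechanism (not the rs6000 code), a flag
that is dropped from a default mask can still be turned on explicitly,
which is what the updated tests rely on. All names below are hypothetical
stand-ins for the OPTION_MASK_* bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option bits.
enum : std::uint64_t {
  MASK_MMA       = 1ull << 0,
  MASK_PCREL     = 1ull << 1,
  MASK_PCREL_OPT = 1ull << 2,
  MASK_PREFIXED  = 1ull << 3
};

// Default set for the CPU, with the PCREL_OPT bit left out.
constexpr std::uint64_t power10_defaults
  = MASK_MMA | MASK_PCREL | MASK_PREFIXED;

// Combine CPU defaults with flags the user switched on or off explicitly.
constexpr std::uint64_t effective_flags(std::uint64_t defaults,
                                        std::uint64_t explicit_on,
                                        std::uint64_t explicit_off)
{
  return (defaults | explicit_on) & ~explicit_off;
}
```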

OK for trunk and backport to 11 if bootstrap/regtest passes?

Thanks!
   Aaron

gcc/

* config/rs6000/rs6000-cpus.def: Take OPTION_MASK_PCREL_OPT out
 of OTHER_POWER10_MASKS so it will not be enabled by default.

gcc/testsuite/

* gcc.target/powerpc/pcrel-opt-inc-di.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-df.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-di.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-si.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: Enable -mpcrel-opt to
test it.
* gcc.target/powerpc/pcrel-opt-st-df.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-di.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-hi.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-qi.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-sf.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-si.c: Enable -mpcrel-opt to test it.
* gcc.target/powerpc/pcrel-opt-st-vector.c: Enable -mpcrel-opt to
test it.
---
 gcc/config/rs6000/rs6000-cpus.def  | 3 ++-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c| 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c | 2 +-
 16 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index 52ce84835f7..1e8c9a68c3f 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -75,9 +75,10 @@
 | OPTION_MASK_P9_VECTOR)
 
 /* Flags that need to be turned off if -mno-power10.  */
+/* PCREL_OPT is now disabled by default so we comment it out here.  */
 #define OTHER_POWER10_MASKS(OPTION_MASK_MMA\
 | OPTION_MASK_PCREL\
-| OPTION_MASK_PCREL_OPT\
+/* | OPTION_MASK_PCREL_OPT */  \
 | OPTION_MASK_PREFIXED)
 
 #define ISA_3_1_MASKS_SERVER   (ISA_3_0_MASKS_SERVER   \
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
index c82041c9dc6..6272f5c72c3 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target powerpc_pcrel } */
-/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mpcrel-opt" } */
 
 #define TYPE   unsigned int
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
index d35862fcb6e..0dcab311add 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target powerpc_pcrel } */
-/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mpcrel-opt" } */
 
 #define TYPE   double
 #define LARGE  0x2
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c 
b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
index 7e1ff99f20e..95b60f3b151 100644
--- a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target powerpc_pcrel } */

[PING][PATCH] make rich_location safe to copy

2021-06-22 Thread Martin Sebor via Gcc-patches

Ping: David, I'm still looking for approval of the semi_embedded_vec
change in the originally posted patch (independent of the already
approved subsequent change to rich_location).

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572845.html

On 6/15/21 7:48 PM, Martin Sebor wrote:

While debugging locations I noticed the semi_embedded_vec template
in line-map.h doesn't declare a copy ctor or copy assignment, but
is being copied in a couple of places in the C++ parser (via
gcc_rich_location).  It gets away with it most likely because it
never grows beyond the embedded buffer.

The attached patch defines the copy ctor and also copy assignment
and adds the corresponding move functions.
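
To illustrate the hazard, here is a minimal sketch of a vector with an
embedded buffer plus heap overflow storage and user-defined copy and move
operations. It is not the semi_embedded_vec from line-map.h: small_vec
and its members are hypothetical, and T is assumed to be default
constructible and copy assignable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

template<typename T, std::size_t N>
class small_vec {
public:
  small_vec() = default;

  // Deep copy: the overflow storage must be duplicated, not shared.
  small_vec(const small_vec &o) : size_(o.size_)
  {
    std::copy(o.embedded_, o.embedded_ + N, embedded_);
    if (o.size_ > N)
      {
        extra_cap_ = o.size_ - N;
        extra_ = new T[extra_cap_];
        std::copy(o.extra_, o.extra_ + extra_cap_, extra_);
      }
  }

  // Move: steal the other object's state, leaving it empty.
  small_vec(small_vec &&o) noexcept { swap(o); }

  // Unified assignment: handles both copy and move through the by-value
  // parameter.
  small_vec &operator=(small_vec o) noexcept { swap(o); return *this; }

  ~small_vec() { delete[] extra_; }

  void push_back(const T &v)
  {
    if (size_ >= N)
      {
        const std::size_t idx = size_ - N;
        if (idx == extra_cap_)
          {
            // Grow the heap part once the embedded buffer is full.
            const std::size_t cap = extra_cap_ ? 2 * extra_cap_ : 4;
            T *p = new T[cap];
            std::copy(extra_, extra_ + idx, p);
            delete[] extra_;
            extra_ = p;
            extra_cap_ = cap;
          }
        extra_[idx] = v;
      }
    else
      embedded_[size_] = v;
    ++size_;
  }

  T &operator[](std::size_t i) { return i < N ? embedded_[i] : extra_[i - N]; }
  std::size_t size() const { return size_; }

  void swap(small_vec &o) noexcept
  {
    std::swap(embedded_, o.embedded_);
    std::swap(extra_, o.extra_);
    std::swap(extra_cap_, o.extra_cap_);
    std::swap(size_, o.size_);
  }

private:
  T embedded_[N] = {};
  T *extra_ = nullptr;
  std::size_t extra_cap_ = 0;
  std::size_t size_ = 0;
};
```

With compiler-generated copies, two objects would share extra_ once the
vector grows past N elements and both destructors would free it; while
everything fits in the embedded buffer the problem stays hidden.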

Tested on x86_64-linux.

Martin




[PATCH v3] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Jonathan Wakely via Gcc-patches
On Tue, 22 Jun 2021 at 20:51, Jonathan Wakely wrote:
>
> On Tue, 22 Jun 2021 at 17:03, Matthias Kretz  wrote:
> >
> > On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> > > On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > > > This does a try_lock on all lockables even if any of them fails. I think
> > > > that's
> > > > not only more expensive but also non-conforming. I think you need to 
> > > > defer
> > > > locking and then loop from beginning to end to break the loop on the 
> > > > first
> > > > unsuccessful try_lock.
> > >
> > > Oops, good point. I'll add a test for that too. Here's the fixed code:
> > >
> > > template
> > >   inline int
> > >   __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
> > >   {
> > > #if __cplusplus >= 201703L
> > > if constexpr ((is_same_v<_L0, _Lockables> && ...))
> > >   {
> > > constexpr int _Np = 1 + sizeof...(_Lockables);
> > > unique_lock<_L0> __locks[_Np] = {
> > > {__l0, defer_lock}, {__lockables, defer_lock}...
> > > };
> > > for (int __i = 0; __i < _Np; ++__i)
> >
> > I thought coding style requires a { here?
>
> Maybe for the compiler, but I don't think libstdc++ has such a rule. I
> can add the braces though, it's probably better.
>
> >
> > >   if (!__locks[__i].try_lock())
> > > {
> > >   const int __failed = __i;
> > >   while (__i--)
> > > __locks[__i].unlock();
> > >   return __i;
> >
> > You meant `return __failed`?
>
> Yep, copy error while trying to avoid the TABs in the real code
> screwing up the gmail formatting :-(
>
>
> > > }
> > > for (auto& __l : __locks)
> > >   __l.release();
> > > return -1;
> > >   }
> > > else
> > > #endif
> > >
> > > > [...]
> > > > Yes, if only we had a wrapping integer type that wraps at an arbitrary 
> > > > N.
> > > > Like
> > > >
> > > > unsigned int but with parameter, like:
> > > >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> > > >
> > > > __locks[__k - 1].unlock();
> > > >
> > > > This is the loop I wanted to write, except --__k is simpler to write and
> > > > __k -
> > > > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the 
> > > > only
> > > > place it's not important enough to abstract.
> > >
> > > We might be able to use __wrapping_uint in std::seed_seq::generate too, 
> > > and
> > > maybe some other places in . But we can add that later if we 
> > > decide
> > > it's worth it.
> >
> > OK.
> >
> > > > I also considered moving it down here. Makes sense unless you want to 
> > > > call
> > > > __detail::__lock_impl from other functions. And if we want to make it 
> > > > work
> > > > for
> > > > pre-C++11 we could do
> > > >
> > > >   using __homogeneous
> > > >
> > > > = __and_, is_same<_L1, _L3>...>;
> > > >
> > > >   int __i = 0;
> > > >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> > >
> > > We don't need tag dispatching, we could just do:
> > >
> > > if _GLIBCXX17_CONSTEXPR (homogeneous::value)
> > >  ...
> > > else
> > >  ...
> > >
> > > because both branches are valid for the homogeneous case, i.e. we aren't
> > > using if-constexpr to avoid invalid instantiations.
> >
> > But for the inhomogeneous case the homogeneous code is invalid 
> > (initialization
> > of C-array of unique_lock<_L1>).
>
> Oops, yeah of course.
>
> >
> > > But given that the default -std option is gnu++17 now, I'm OK with the
> > > iterative version only being used for C++17.
> >
> > Fair enough.
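
The corrected loop discussed above can be tried outside the library. The
following is a minimal C++17 sketch, not the libstdc++ implementation:
try_lock_all is a hypothetical free function, and flag_lock is a toy,
single-threaded lockable used only so the behaviour can be observed
without threads.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Toy lockable for demonstration only; it never blocks.
struct flag_lock {
  bool held = false;
  void lock() { held = true; }
  bool try_lock() { if (held) return false; held = true; return true; }
  void unlock() { held = false; }
};

// Try to lock every lockable in order.  Returns -1 on success, otherwise
// the zero-based index of the first lockable that could not be locked,
// with all earlier ones unlocked again.
template<typename L, typename... Ls>
int try_lock_all(L &l0, Ls &...ls)
{
  static_assert((std::is_same_v<L, Ls> && ...),
                "this sketch only handles lockables of one type");
  constexpr int n = 1 + sizeof...(Ls);
  // Deferred locks: nothing is locked until try_lock is called below.
  std::unique_lock<L> locks[n] = {
    {l0, std::defer_lock}, {ls, std::defer_lock}...
  };
  for (int i = 0; i < n; ++i)
    {
      if (!locks[i].try_lock())
        {
          const int failed = i;
          while (i--)
            locks[i].unlock();
          return failed;  // not i, which is -1 after the loop
        }
    }
  // Success: give up ownership so the destructors do not unlock.
  for (auto &l : locks)
    l.release();
  return -1;
}
```

The loop stops at the first failed try_lock, so later lockables are never
touched, which is the conformance point raised earlier in the thread.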

Here's what I've tested and pushed to trunk. Thanks for the
improvement and comments.
commit c556596119307f9ef1c9079ef2bd3a035dea355d
Author: Jonathan Wakely 
Date:   Tue Jun 22 13:35:19 2021

libstdc++: Simplify std::try_lock and std::lock further

The std::try_lock and std::lock algorithms can use iteration instead of
recursion when all lockables have the same type and can be held by an
array of unique_lock objects.

By making this change to __detail::__try_lock_impl it also benefits
__detail::__lock_impl, which uses it. For std::lock we can just put the
iterative version directly in std::lock, to avoid making any call to
__detail::__lock_impl.

Signed-off-by: Matthias Kretz 
Signed-off-by: Jonathan Wakely 

Co-authored-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/std/mutex (lock): Replace recursion with iteration
when lockables all have the same type.
(__detail::__try_lock_impl): Likewise. Pass lockables as
parameters, instead of a tuple. Always lock the first one, and
recurse for the rest.
(__detail::__lock_impl): Adjust call to __try_lock_impl.
(__detail::__try_to_lock): Remove.
* 

[committed] libstdc++: Remove garbage collection support for C++23 [P2186R2]

2021-06-22 Thread Jonathan Wakely via Gcc-patches
This removes the non-functional garbage collection support from <memory>,
as proposed for C++23 by P2186R2.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/memory (declare_reachable, undeclare_reachable)
(declare_no_pointers, undeclare_no_pointers, get_pointer_safety)
(pointer_safety): Only define for C++11 to C++20 inclusive.
* testsuite/20_util/pointer_safety/1.cc: Do not run for C++23.

Tested powerpc64le-linux. Committed to trunk.
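
User code that still calls these functions can guard the call with the
same language-version range as the header. This is only a sketch of one
way to do that: pointer_safety_is_relaxed is a hypothetical helper, and
the fallback value for C++23 is an assumption made for the example.

```cpp
#include <cassert>
#include <memory>

// Returns true if the implementation reports relaxed pointer safety.
// From C++23 on the API is gone, so the helper simply assumes "relaxed".
bool pointer_safety_is_relaxed()
{
#if __cplusplus >= 201103L && __cplusplus <= 202002L
  return std::get_pointer_safety() == std::pointer_safety::relaxed;
#else
  return true;
#endif
}
```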

commit b5a29741db11007e37d8d4ff977b89a8314abfda
Author: Jonathan Wakely 
Date:   Wed Jun 2 16:41:26 2021

libstdc++: Remove garbage collection support for C++23 [P2186R2]

This removes the non-functional garbage collection support from <memory>,
as proposed for C++23 by P2186R2.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/memory (declare_reachable, undeclare_reachable)
(declare_no_pointers, undeclare_no_pointers, get_pointer_safety)
(pointer_safety): Only define for C++11 to C++20 inclusive.
* testsuite/20_util/pointer_safety/1.cc: Do not run for C++23.

diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index f19de275b2b..da64be2471a 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -87,7 +87,7 @@
 #  include 
 #endif
 
-#if __cplusplus >= 201103L
+#if __cplusplus >= 201103L && __cplusplus <= 202002L
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -132,7 +132,7 @@ get_pointer_safety() noexcept { return 
pointer_safety::relaxed; }
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
-#endif // C++11
+#endif // C++11 to C++20
 
 #if __cplusplus >= 201703L
 // Parallel STL algorithms
diff --git a/libstdc++-v3/testsuite/20_util/pointer_safety/1.cc 
b/libstdc++-v3/testsuite/20_util/pointer_safety/1.cc
index 7d9a425e3e7..bfacbce27d2 100644
--- a/libstdc++-v3/testsuite/20_util/pointer_safety/1.cc
+++ b/libstdc++-v3/testsuite/20_util/pointer_safety/1.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-do run { target c++11 } }
+// { dg-do run { target { c++11 && { ! c++23 } } } }
 
 #include 
 #include 


[committed] libstdc++: Implement LWG 3422 for std::seed_seq

2021-06-22 Thread Jonathan Wakely via Gcc-patches
This ensures that the std::seed_seq initializer-list constructor will
not be used for list-initialization unless the initializers in the list
are integers. This allows list-initialization syntax to be used with a
pair of pointers and for that to use the appropriate constructor.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/random.h (seed_seq): Constrain initializer-list
constructor.
* include/bits/random.tcc (seed_seq): Add template parameter.
* testsuite/26_numerics/random/seed_seq/cons/default.cc: Check
for noexcept.
* testsuite/26_numerics/random/seed_seq/cons/initlist.cc: Check
constraints.

Tested powerpc64le-linux. Committed to trunk.
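
The effect of the constraint can be reproduced with a small stand-in
class. This is a sketch, not the libstdc++ code: seed_store is a
hypothetical type with the same two constructor shapes as std::seed_seq,
and the constraint is written with std::enable_if_t (C++14).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <type_traits>
#include <vector>

class seed_store {
public:
  // Only participates in overload resolution for integer element types.
  template<typename Int,
           typename = std::enable_if_t<std::is_integral<Int>::value>>
  seed_store(std::initializer_list<Int> il)
  {
    for (auto v : il)
      v_.push_back(static_cast<std::uint32_t>(v));
  }

  // Range constructor, used for a pair of iterators or pointers.
  template<typename It>
  seed_store(It first, It last)
  {
    for (; first != last; ++first)
      v_.push_back(static_cast<std::uint32_t>(*first));
  }

  std::size_t size() const { return v_.size(); }

private:
  std::vector<std::uint32_t> v_;
};
```

Without the constraint, seed_store{p, p + 32} would deduce
initializer_list<int*> and select the initializer-list constructor; with
it, that candidate drops out and the range constructor is used.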

commit 6c63cb231e4cf99552bb7904ebe402f7adcafda4
Author: Jonathan Wakely 
Date:   Tue Jun 22 18:05:11 2021

libstdc++: Implement LWG 3422 for std::seed_seq

This ensures that the std::seed_seq initializer-list constructor will
not be used for list-initialization unless the initializers in the list
are integers. This allows list-initialization syntax to be used with a
pair of pointers and for that to use the appropriate constructor.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/random.h (seed_seq): Constrain initializer-list
constructor.
* include/bits/random.tcc (seed_seq): Add template parameter.
* testsuite/26_numerics/random/seed_seq/cons/default.cc: Check
for noexcept.
* testsuite/26_numerics/random/seed_seq/cons/initlist.cc: Check
constraints.

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 0da013c5f45..ed0d7a832f1 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -6073,7 +6073,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _M_v()
 { }
 
-template
+template>>
   seed_seq(std::initializer_list<_IntType> __il);
 
 template
diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index 1357e181874..8e2b702b0be 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -3231,7 +3231,7 @@ namespace __detail
 }
 
 
-  template
+  template
 seed_seq::seed_seq(std::initializer_list<_IntType> __il)
 {
   for (auto __iter = __il.begin(); __iter != __il.end(); ++__iter)
diff --git a/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/default.cc 
b/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/default.cc
index 18f55e723f0..62434a66591 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/default.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/default.cc
@@ -25,6 +25,9 @@
 #include 
 #include 
 
+static_assert( std::is_nothrow_default_constructible::value,
+  "LWG 3422" );
+
 void
 test01()
 {
@@ -34,7 +37,6 @@ test01()
   seq.generate(foo.begin(), foo.end());
 
   VERIFY( seq.size() == 0 );
-  //VERIFY();
 }
 
 int
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/initlist.cc 
b/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/initlist.cc
index 44b855e5627..1ed9eb784c3 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/initlist.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/seed_seq/cons/initlist.cc
@@ -36,6 +36,13 @@ test01()
   VERIFY( seq.size() == 10 );
 }
 
+void
+lwg3422()
+{
+  int i[32] = { };
+  std::seed_seq ss{i, i+32}; // LWG 3422
+}
+
 int main()
 {
   test01();


Re: [[PATCH V9] 0/7] Support for the CTF and BTF debug formats

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/21/21 10:01 AM, Richard Biener wrote:

On Mon, May 31, 2021 at 7:16 PM Jose E. Marchesi via Gcc-patches
 wrote:


[Changes from V8:
- Rebased to today's master.
- Adapted to use the write-symbols new infrastructure recently
   applied upstream.
- Little change in libiberty to copy .BTF sections over when
   LTOing.]

Hi people!

Last year we submitted a first patch series introducing support for
the CTF debugging format in GCC [1].  We got a lot of feedback that
prompted us to change the approach used to generate the debug info,
and this patch series is the result of that.

This series also adds support for the BTF debug format, which is needed
by the BPF backend (more on this below.)

This implementation works, but there are several points that need
discussion and agreement with the upstream community, as they impact
the way debugging options work.  We are also proposing a way to add
additional debugging formats in the future.  See below for more
details.

Finally, a patch makes the BPF GCC backend use the DWARF debug
hooks in order to make -gbtf available to it.

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01297.html

About CTF
=========

CTF is a debugging format designed in order to express C types in a
very compact way.  The key is compactness and simplicity.  For more
information see:

- CTF specification
   http://www.esperi.org.uk/~oranix/ctf/ctf-spec.pdf

- Compact C-Type support in the GNU toolchain (talk + slides)
   https://linuxplumbersconf.org/event/4/contributions/396/

- On type de-duplication in CTF (talk + slides)
   https://linuxplumbersconf.org/event/7/contributions/725/

About BTF
=========

BTF is a debugging format, similar to CTF, that is used in the Linux
kernel as the debugging format for BPF programs.  From the kernel
documentation:

"BTF (BPF Type Format) is the metadata format which encodes the debug
  info related to BPF program/map. The name BTF was used initially to
  describe data types. The BTF was later extended to include function
  info for defined subroutines, and line info for source/line
  information."

Supporting BTF in GCC is important because compiled BPF programs
(which GCC supports as a target) require the type information in order
to be loaded and run in diverse kernel versions.  This mechanism is
known as CO-RE (compile-once, run-everywhere) and is described in the
"Update of the BPF support in the GNU Toolchain" talk mentioned below.

The BTF is documented in the Linux kernel documentation tree:
- linux/Documentation/bpf/btf.rst

CTF in the GNU Toolchain
========================

During the last year we have been working on adding support for CTF to
several components of the GNU toolchain:

- binutils support is already upstream.  It supports linking objects
   with CTF information with full type de-duplication.

- GDB support is to be sent upstream very shortly.  It makes the
   debugger capable of using the CTF information whenever available.
   This is useful in cases where DWARF has been stripped out but CTF is
   kept.

- GCC support is being discussed and submitted in this series.

Overview of the Implementation
==============================

   dwarf2out.c

 The enabled debug formats are hooked in dwarf2out_early_finish.

   dwarf2int.h

 Internal interface that exports a few functions and data types
 defined in dwarf2out.c.

   dwarf2ctf.c

 Code that transforms the internal GCC DWARF DIEs into CTF container
 structures.  This file uses the dwarf2int.h interface.

   ctfc.c
   ctfc.h

 These two files implement the "CTF container", which is shared
 among CTF and BTF, due to the many similarities between both
 formats.

   ctfout.c

 Code that emits assembler with the .ctf section data, from the CTF
 container.

   btfout.c

 Code that emits assembler with the .BTF section data, from the CTF
 container.

From debug hooks to debug formats
=================================

Our first attempt in adding CTF to GCC used the obvious approach of
adding a new set of debug hooks as defined in gcc/debug.h.

During our first interaction with the upstream community we were told
to _not_ use debug hooks, because these are to be obsoleted at some
point.  We were suggested to instead hook our handlers (which
processed type TREE nodes producing CTF types from them) somewhere
else.  So we did.

However at the time we were also facing the need to support BTF, which
is another type-related debug format needed by the BPF GCC backend.
Hooking here and there doesn't sound like such a good idea when it
comes to support several debug formats.

Therefore we thought about how to make GCC support diverse debugging
formats in a better way.  This led to a proposal we tried to discuss
at the GNU Tools Track in LPC2020:

- Update of the BPF support in the GNU Toolchain
   https://linuxplumbersconf.org/event/7/contributions/724/

Basically, the current situation in terms of diversity of debugging
formats in GCC can 

Re: [[PATCH V9] 3/7] CTF/BTF debug formats

2021-06-22 Thread Jason Merrill via Gcc-patches

On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote:

@@ -28219,7 +28219,7 @@ dwarf2out_source_line (unsigned int line, unsigned int 
column,
dw_line_info_table *table;
static var_loc_view lvugid;
  
-  if (debug_info_level < DINFO_LEVEL_TERSE)

+  if (debug_info_level < DINFO_LEVEL_TERSE || !dwarf_debuginfo_p ())


This should have a comment that the current dwarf-based debug formats 
don't use line info.


Jason



Re: [PATCH] Add gnu::diagnose_as attribute

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/22/21 4:01 PM, Matthias Kretz wrote:

On Tuesday, 22 June 2021 21:52:16 CEST Jason Merrill wrote:

For alias templates, you probably want the attribute only on the
templated class, not on the instantiations.


Oh good point. My current patch does not allow the attribute on alias
templates. Consider:

template 
   struct X {};

template 
   using foo [[gnu::diagnose_as]] = X;

I have no idea how this could work. I would have to set the attribute for an
implicit partial specialization (not that I know of the existence of such a
thing)? I.e. X would have to be diagnosed as foo, but X would have to be diagnosed as X, not foo.

So if anything it should only support alias templates if they are strictly
"renaming" the type. I.e. their template parameters must match up exactly. Can
I constrain the attribute like this?


Yes.  You can check that with get_underlying_template.

Or you could support the above by putting the attribute on the 
instantiation with the TEMPLATE_INFO for foo rather than a simple name.


Jason



Re: [PATCH] c++: CTAD and deduction guide selection [PR86439]

2021-06-22 Thread Jonathan Wakely via Gcc-patches
On Tue, 22 Jun 2021 at 19:45, Patrick Palka wrote:
> This change causes us to reject some container CTAD examples in the
> libstdc++ testsuite due to deduction failure for {}, which AFAICT is the
> correct behavior.  Previously, in the case of e.g. the first removed
> example for std::map, the type of {} would be deduced to less as a
> side effect of forming the call to the selected guide
>
>   template,
>  typename _Allocator = allocator>,
>  typename = _RequireNotAllocator<_Compare>,
>  typename = _RequireAllocator<_Allocator>>
>   map(initializer_list>,
>   _Compare = _Compare(), _Allocator = _Allocator())
>   -> map<_Key, _Tp, _Compare, _Allocator>;
>
> which made later overload resolution for the constructor call
> unambiguous.  Now, the type of {} remains undeduced until constructor
> overload resolution, and we complain about ambiguity with the two
> constructors
>
>   map(initializer_list __l,
>   const _Compare& __comp = _Compare(),
>   const allocator_type& __a = allocator_type())
>
>   map(initializer_list __l, const allocator_type& __a)
>
> This patch just removes these problematic container CTAD examples.

Do all the problematic cases have a corresponding case that doesn't
use {} but uses an actual type?

If not, we might want to add such cases, to ensure we're still
covering all the cases that really *should* work.



Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-22 Thread Jason Merrill via Gcc-patches

On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote:

This patch introduces a dwarf2int.h header, to be used by code that
needs access to the internal DIE structures and their attributes.


Why not put these bits in dwarf2out.h?


The following functions which were previously defined as static in
dwarf2out.c are now non-static, and extern prototypes for them have
been added to dwarf2int.h:

- get_AT
- AT_int
- get_AT_ref
- get_AT_string
- get_AT_class
- AT_unsigned
- get_AT_unsigned
- get_AT_flag
- add_name_attribute
- new_die_raw
- base_type_die
- lookup_decl_die
- get_AT_file

Note how this patch doesn't change the names of these functions to
avoid a massive renaming in dwarf2out.c, but in the future we probably
want these functions to sport a dw_* prefix.

Also, some type definitions have been moved from dwarf2out.c to
dwarf2int.h:

- dw_attr_node
- struct dwarf_file_data

Finally, three new accessor functions have been added to dwarf2out.c
with prototypes in dwarf2int.h:

- dw_get_die_child
- dw_get_die_sib
- dw_get_die_tag

2021-05-14  Jose E. Marchesi  

* dwarf2int.h: New file.
* dwarf2out.c (get_AT): Function is no longer static.
(get_AT_string): Likewise.
(get_AT_flag): Likewise.
(get_AT_unsigned): Likewise.
(get_AT_ref): Likewise.
(new_die_raw): Likewise.
(lookup_decl_die): Likewise.
(base_type_die): Likewise.
(add_name_attribute): Likewise.
(dw_get_die_tag): New function.
(dw_get_die_child): Likewise.
(dw_get_die_sib): Likewise.
Include dwarf2int.h.
* gengtype.c: add dwarf2int.h to open_base_files.
* Makefile.in (GTFILES): Add dwarf2int.h.
---
  gcc/Makefile.in |  1 +
  gcc/dwarf2int.h | 67 +
  gcc/dwarf2out.c | 79 -
  gcc/gengtype.c  |  6 ++--
  4 files changed, 109 insertions(+), 44 deletions(-)
  create mode 100644 gcc/dwarf2int.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4cb2966157e..95d5e18ad9d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2648,6 +2648,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
$(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.c \
$(srcdir)/ipa-modref-tree.h \
$(srcdir)/signop.h \
+  $(srcdir)/dwarf2int.h \
$(srcdir)/dwarf2out.h \
$(srcdir)/dwarf2asm.c \
$(srcdir)/dwarf2cfi.c \
diff --git a/gcc/dwarf2int.h b/gcc/dwarf2int.h
new file mode 100644
index 000..f49f51d957b
--- /dev/null
+++ b/gcc/dwarf2int.h
@@ -0,0 +1,67 @@
+/* Prototypes for functions manipulating DWARF2 DIEs.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file contains prototypes for functions defined in dwarf2out.c.  It is
+   intended to be included in source files that need some internal knowledge of
+   the GCC dwarf structures.  */
+
+#ifndef GCC_DWARF2INT_H
+#define GCC_DWARF2INT_H 1
+
+/* Each DIE attribute has a field specifying the attribute kind,
+   a link to the next attribute in the chain, and an attribute value.
+   Attributes are typically linked below the DIE they modify.  */
+
+typedef struct GTY(()) dw_attr_struct {
+  enum dwarf_attribute dw_attr;
+  dw_val_node dw_attr_val;
+}
+dw_attr_node;
+
+extern dw_attr_node *get_AT (dw_die_ref, enum dwarf_attribute);
+extern HOST_WIDE_INT AT_int (dw_attr_node *);
+extern unsigned HOST_WIDE_INT AT_unsigned (dw_attr_node *a);
+extern dw_die_ref get_AT_ref (dw_die_ref, enum dwarf_attribute);
+extern const char *get_AT_string (dw_die_ref, enum dwarf_attribute);
+extern enum dw_val_class AT_class (dw_attr_node *);
+extern unsigned HOST_WIDE_INT AT_unsigned (dw_attr_node *);
+extern unsigned get_AT_unsigned (dw_die_ref, enum dwarf_attribute);
+extern int get_AT_flag (dw_die_ref, enum dwarf_attribute);
+
+extern void add_name_attribute (dw_die_ref, const char *);
+
+extern dw_die_ref new_die_raw (enum dwarf_tag);
+extern dw_die_ref base_type_die (tree, bool);
+
+extern dw_die_ref lookup_decl_die (tree);
+
+extern dw_die_ref dw_get_die_child (dw_die_ref);
+extern dw_die_ref dw_get_die_sib (dw_die_ref);
+extern enum dwarf_tag dw_get_die_tag (dw_die_ref);
+
+/* Data about a single source file.  */
+struct GTY((for_user)) dwarf_file_data {
+  const char * filename;
+  int emitted_number;
+};
+
+extern struct 

Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-22 Thread Martin Sebor via Gcc-patches

On 6/21/21 1:15 AM, Richard Biener wrote:

On Fri, Jun 18, 2021 at 6:03 PM Martin Sebor  wrote:


On 6/18/21 4:38 AM, Richard Biener wrote:

On Thu, Jun 17, 2021 at 4:43 PM Martin Sebor  wrote:


On 6/17/21 12:03 AM, Richard Biener wrote:

On Wed, Jun 16, 2021 at 6:01 PM Martin Sebor  wrote:


On 6/16/21 6:46 AM, Richard Sandiford via Gcc-patches wrote:

Richard Biener via Gcc-patches  writes:

On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders  wrote:


This makes it clear the caller owns the vector, and ensures it is cleaned up.

Signed-off-by: Trevor Saunders 

bootstrapped and regtested on x86_64-linux-gnu, ok?


OK.

Btw, are "standard API" returns places we can use 'auto'?  That would avoid
excessive indent for

-  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
-bbs.address (),
-bbs.length ());
+  auto_vec dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
+  bbs.address (),
+  bbs.length ());

and just uses

  auto dom_bbs = get_dominated_by_region (...

Not asking you to do this, just a question for the audience.
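
A minimal sketch of the 'auto' form being asked about, assuming a move-only owning type. The names owning_vec, get_region and region_length are hypothetical stand-ins for illustration, not GCC's auto_vec or get_dominated_by_region:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a move-only owning container.
class owning_vec
{
public:
  explicit owning_vec (std::size_t n) : data_ (n) {}
  owning_vec (owning_vec &&) = default;
  owning_vec &operator= (owning_vec &&) = default;
  owning_vec (const owning_vec &) = delete;
  owning_vec &operator= (const owning_vec &) = delete;

  std::size_t length () const { return data_.size (); }

private:
  std::vector<int> data_;
};

static_assert (!std::is_copy_constructible_v<owning_vec>);

// Hypothetical API function that hands ownership to the caller.
inline owning_vec
get_region (std::size_t n)
{
  return owning_vec (n);
}

inline std::size_t
region_length (std::size_t n)
{
  // 'auto' deduces owning_vec; the result is moved or elided, never copied,
  // and it is released when bbs goes out of scope.
  auto bbs = get_region (n);
  return bbs.length ();
}
```

The declaration stays short, but the reader has to look at get_region to learn that bbs owns its storage and cannot be copied.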


Personally I think this would be surprising for something that doesn't
have copy semantics.  (Not that I'm trying to reopen that debate here :-)
FWIW, I agree not having copy semantics is probably the most practical
way forward for now.)


But you did open the door for me to reiterate my strong disagreement
with that.  The best C++ practice going back to the early 1990's is
to make types safely copyable and assignable.  It is the default for
all types, in both C++ and C, and so natural and expected.

Preventing copying is appropriate in special and rare circumstances
(e.g, a mutex may not be copyable, or a file or iostream object may
not be because they represent a unique physical resource.)

In the absence of such special circumstances preventing copying is
unexpected, and in the case of an essential building block such as
a container, makes the type difficult to use.

The only argument for disabling copying that has been given is
that it could be surprising(*).  But because all types are copyable
by default the "surprise" is usually when one can't be.

I think Richi's "surprising" has to do with the fact that it lets
one inadvertently copy a large amount of data, thus leading to
an inefficiency.  But by analogy, there are infinitely many ways
to end up with inefficient code (e.g., deep recursion, or heap
allocation in a loop), and they are not a reason to ban the coding
constructs that might lead to it.

IIUC, Jason's comment about surprising effects was about implicit
conversion from auto_vec to vec.  I share that concern, and agree
that it should be addressed by preventing the conversion (as Jason
suggested).


But fact is that how vec<> and auto_vec<> are used today in GCC
do not favor that.  In fact your proposed vec<> would be quite radically
different (and IMHO vec<> and auto_vec<> should be unified then to
form your proposed new container).  auto_vec<> at the moment simply
maintains ownership like a smart pointer - which is _also_ not copyable.


Yes, as we discussed in the review below, vec is not a good model
because (as you note again above) it's constrained by its legacy
uses.  The best I think we can do for it is to make it safer to
use.
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571622.html


Which is what Trevor's patches do by simply disallowing things
that do not work at the moment.


(Smart pointers don't rule out copying.  A std::unique_ptr does
and std::shared_ptr doesn't.  But vec and especially auto_vec
are designed to be containers, not "unique pointers" so any
relationship there is purely superficial and a distraction.)
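
The copyability claims above can be checked with standard type traits. This is only a sketch using standard library types; vec and auto_vec themselves are not involved, and vector_copy_is_deep is a name made up for the illustration:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

// std::unique_ptr rules out copying, std::shared_ptr does not.
static_assert (!std::is_copy_constructible_v<std::unique_ptr<int>>);
static_assert (std::is_copy_constructible_v<std::shared_ptr<int>>);

// A standard container is copyable, and the copy is independent.
static_assert (std::is_copy_constructible_v<std::vector<int>>);

inline bool
vector_copy_is_deep ()
{
  std::vector<int> a{1, 2, 3};
  std::vector<int> b = a;	// copies the elements
  b[0] = 42;			// does not affect a
  return a[0] == 1 && b[0] == 42;
}
```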

That auto_vec and vec share a name and an is-a relationship is
incidental, an implementation detail leaked into the API.  A better
name than vector is hard to come up with, but the public inheritance
is a design flaw, a bug waiting to be introduced due to the conversion
and the assumptions the base vec makes about POD-ness and shallow
copying.  Hindsight is always 20/20 but past mistakes should not
dictate the design of a general purpose vector-like container in
GCC.


That auto_vec<> "decays" to vec<> was a deliberate design choice.

By-value passing of vec<> is also deliberate, to avoid an extra
pointer indirection on each access.


I think you may have misunderstood what I mean by is-a relationship.
It's fine to convert an auto_vec to another interface.  The danger
is in allowing that to happen implicitly because that tends to let
it happen even when it's not intended.  The usual way to avoid
that risk is to provide a conversion function, like
auto_vec::to_vec().  This is also why standard classes like
std::vector or std::string don't allow such implicit conversions
and instead provide member functions (see for example Stroustrup:
The C++ Programming 

Re: [PATCH] Add gnu::diagnose_as attribute

2021-06-22 Thread Matthias Kretz
On Tuesday, 22 June 2021 21:52:16 CEST Jason Merrill wrote:
> For alias templates, you probably want the attribute only on the
> templated class, not on the instantiations.

Oh good point. My current patch does not allow the attribute on alias 
templates. Consider:

template 
  struct X {};

template 
  using foo [[gnu::diagnose_as]] = X;

I have no idea how this could work. I would have to set the attribute for an 
implicit partial specialization (not that I know of the existence of such a 
thing)? I.e. X would have to be diagnosed as foo, but X would have to be diagnosed as X, not foo.

So if anything it should only support alias templates if they are strictly 
"renaming" the type. I.e. their template parameters must match up exactly. Can 
I constrain the attribute like this? Or should we rely on developers to be 
reasonable and only use it for template aliases with matching template params?

-Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


Re: [PATCH] c++: CTAD and deduction guide selection [PR86439]

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/22/21 2:45 PM, Patrick Palka wrote:

During CTAD, we select the best viable deduction guide via
build_new_function_call, which performs overload resolution on the set
of candidate guides and then forms a call to the guide.  As the PR
points out, this latter step is unnecessary and occasionally gives us
the wrong answer since a call to the selected guide may be ill-formed,
or forming the call may have side effects such as prematurely deducing
the type of a {}.

This patch introduces a specialized subroutine modeled off of
build_new_function_call that stops short of building a call to the
selected function, and makes do_class_deduction use this subroutine
instead.  And since we no longer build a call, do_class_deduction
doesn't need to set tf_decltype or cp_unevaluated_operand.

This change causes us to reject some container CTAD examples in the
libstdc++ testsuite due to deduction failure for {}, which AFAICT is the
correct behavior.  Previously, in the case of e.g. the first removed
example for std::map, the type of {} would be deduced to less as a
side effect of forming the call to the selected guide

   template<typename _Key, typename _Tp, typename _Compare = less<_Key>,
  typename _Allocator = allocator<pair<const _Key, _Tp>>,
  typename = _RequireNotAllocator<_Compare>,
  typename = _RequireAllocator<_Allocator>>
   map(initializer_list<pair<_Key, _Tp>>,
   _Compare = _Compare(), _Allocator = _Allocator())
   -> map<_Key, _Tp, _Compare, _Allocator>;

which made later overload resolution for the constructor call
unambiguous.  Now, the type of {} remains undeduced until constructor
overload resolution, and we complain about ambiguity with the two
constructors

   map(initializer_list<value_type> __l,
   const _Compare& __comp = _Compare(),
   const allocator_type& __a = allocator_type())

   map(initializer_list<value_type> __l, const allocator_type& __a)

This patch just removes these problematic container CTAD examples.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/86439

gcc/cp/ChangeLog:

* call.c (print_error_for_call_failure): Constify 'args'
parameter.
(perform_dguide_overload_resolution): Define.
* cp-tree.h: (perform_dguide_overload_resolution): Declare.
* pt.c (do_class_deduction): Use perform_dguide_overload_resolution
instead of build_new_function_call.  Don't use tf_decltype or
set cp_unevaluated_operand.  Remove unnecessary NULL_TREE tests.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/map/cons/deduction.cc: Remove
ambiguous CTAD constructs.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction88.C: New test.
* g++.dg/cpp1z/class-deduction89.C: New test.
* g++.dg/cpp1z/class-deduction90.C: New test.
---
  gcc/cp/call.c | 36 +++-
  gcc/cp/cp-tree.h  |  2 +
  gcc/cp/pt.c   | 41 +++
  .../g++.dg/cpp1z/class-deduction88.C  | 20 +
  .../g++.dg/cpp1z/class-deduction89.C  | 15 +++
  .../g++.dg/cpp1z/class-deduction90.C  | 16 
  .../23_containers/map/cons/deduction.cc   | 19 -
  .../23_containers/multimap/cons/deduction.cc  | 20 -
  .../23_containers/multiset/cons/deduction.cc  | 14 ---
  .../23_containers/set/cons/deduction.cc   | 15 ---
  .../unordered_map/cons/deduction.cc   | 16 
  .../unordered_multimap/cons/deduction.cc  | 16 
  .../unordered_multiset/cons/deduction.cc  | 10 -
  .../unordered_set/cons/deduction.cc   | 10 -
  14 files changed, 102 insertions(+), 148 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction88.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction89.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction90.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 9f03534c20c..aafc7acca24 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4629,7 +4629,7 @@ perform_overload_resolution (tree fn,
 functions.  */
  
  static void

-print_error_for_call_failure (tree fn, vec *args,
+print_error_for_call_failure (tree fn, const vec *args,
  struct z_candidate *candidates)
  {
tree targs = NULL_TREE;
@@ -4654,6 +4654,40 @@ print_error_for_call_failure (tree fn, vec 
*args,
  print_z_candidates (loc, candidates);
  }

Re: [PATCH] Add gnu::diagnose_as attribute

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/22/21 3:30 AM, Matthias Kretz wrote:

On Wednesday, 16 June 2021 02:48:09 CEST Jason Merrill wrote:

IIUC, your main concern is that my proposed diagnose_as *can* be used to
make diagnostics worse, by replacing names with strings that are not
valid identifiers. Of course, whoever uses the attribute to that effect
should have a good reason to do so. Is your other concern that using the
attribute in a "good" way is repetitive? Would you be happier if I make
the string argument to the attribute optional for type aliases?


Yes, and namespace aliases.


I'll look into making the attribute argument optional for aliases. Would you
accept the patch with this change?


Yes (after resolving any technical details, of course).


Questions:

1. If a type alias applies the attribute after a type was completed /
implicitly instantiated (and possibly already used in diagnostics) should /
can I still modify the type and add the attribute?


Yes, because it has no semantic effect.

For alias templates, you probably want the attribute only on the 
templated class, not on the instantiations.



2. About the namespace aliases: IIUC an attribute would currently be rejected
because of the C++ grammar. Do you want to make it valid before WG21
officially decides how to proceed? And if you have a pointer for me where I'd
have to adjust the grammar rules, that'd help. :)


You will want to adjust cp_parser_namespace_alias_definition to handle 
attributes like cp_parser_namespace_definition.  The latter currently 
accepts attributes both before and after the name, which seems like a 
good pattern to follow so it doesn't matter which WG21 chooses. 
Probably best to pedwarn about C++11 attributes in both locations for 
now, not just after.


Jason



Re: [PATCH v2] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Jonathan Wakely via Gcc-patches
On Tue, 22 Jun 2021 at 17:03, Matthias Kretz  wrote:
>
> On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> > On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > > This does a try_lock on all lockables even if any of them fails. I think
> > > that's
> > > not only more expensive but also non-conforming. I think you need to defer
> > > locking and then loop from beginning to end to break the loop on the first
> > > unsuccessful try_lock.
> >
> > Oops, good point. I'll add a test for that too. Here's the fixed code:
> >
> > template<typename _L0, typename... _Lockables>
> >   inline int
> >   __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
> >   {
> > #if __cplusplus >= 201703L
> > if constexpr ((is_same_v<_L0, _Lockables> && ...))
> >   {
> > constexpr int _Np = 1 + sizeof...(_Lockables);
> > unique_lock<_L0> __locks[_Np] = {
> > {__l0, defer_lock}, {__lockables, defer_lock}...
> > };
> > for (int __i = 0; __i < _Np; ++__i)
>
> I thought coding style requires a { here?

Maybe for the compiler, but I don't think libstdc++ has such a rule. I
can add the braces though, it's probably better.

>
> >   if (!__locks[__i].try_lock())
> > {
> >   const int __failed = __i;
> >   while (__i--)
> > __locks[__i].unlock();
> >   return __i;
>
> You meant `return __failed`?

Yep, copy error while trying to avoid the TABs in the real code
screwing up the gmail formatting :-(


> > }
> > for (auto& __l : __locks)
> >   __l.release();
> > return -1;
> >   }
> > else
> > #endif
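
A self-contained sketch of the homogeneous branch discussed above, with the `return __failed` fix applied and braces on the loop. It assumes C++17, uses ordinary names instead of libstdc++'s reserved ones, and is not the code that was committed. flag_lock is a hypothetical test lockable, used because calling try_lock on a std::mutex the same thread already owns is undefined:

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical single-threaded lockable, for demonstration only.
struct flag_lock
{
  bool held = false;
  void lock () { held = true; }
  bool try_lock () { if (held) return false; held = true; return true; }
  void unlock () { held = false; }
};

// Try to lock every lockable in order.  On the first failure, unlock the
// ones already taken and return the zero-based index of the one that
// failed.  Return -1 if all were locked (they stay locked on return).
template<typename L0, typename... Ls>
int
try_lock_all (L0& l0, Ls&... ls)
{
  static_assert ((std::is_same_v<L0, Ls> && ...),
		 "this sketch only handles a single lock type");
  constexpr int N = 1 + sizeof... (Ls);
  std::unique_lock<L0> locks[N] = {
    {l0, std::defer_lock}, {ls, std::defer_lock}...
  };
  for (int i = 0; i < N; ++i)
    {
      if (!locks[i].try_lock ())
	{
	  const int failed = i;
	  while (i--)
	    locks[i].unlock ();
	  return failed;
	}
    }
  // Success: give up ownership so the destructors do not unlock.
  for (auto& l : locks)
    l.release ();
  return -1;
}
```

Because the locks are deferred and the loop stops at the first failure, no try_lock is attempted on the lockables after the one that failed.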
> >
> > > [...]
> > > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > > Like
> > >
> > > unsigned int but with parameter, like:
> > >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> > >
> > > __locks[__k - 1].unlock();
> > >
> > > This is the loop I wanted to write, except --__k is simpler to write and
> > > __k -
> > > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > > place it's not important enough to abstract.
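
A rough sketch of what such a wrapping integer could look like. The class name wrapping_uint and its interface are hypothetical; no such type exists in libstdc++, and this is only one possible shape for it:

```cpp
#include <cassert>
#include <climits>

// Hypothetical unsigned integer that wraps at an arbitrary modulus N.
template<unsigned N>
class wrapping_uint
{
  static_assert (N > 0 && N <= UINT_MAX / 2,
		 "modulus must be nonzero and small enough not to overflow");
  unsigned val_;

public:
  constexpr explicit wrapping_uint (unsigned v = 0) : val_ (v % N) {}

  constexpr unsigned value () const { return val_; }

  // Decrementing 0 wraps around to N - 1.
  constexpr wrapping_uint&
  operator-- ()
  {
    val_ = (val_ + N - 1) % N;
    return *this;
  }

  // Subtraction wraps as well, so k - 1 is N - 1 when k is 0.
  friend constexpr wrapping_uint
  operator- (wrapping_uint a, unsigned b)
  { return wrapping_uint ((a.val_ + N - b % N) % N); }

  friend constexpr bool
  operator!= (wrapping_uint a, wrapping_uint b)
  { return a.val_ != b.val_; }
};
```

The conversion to unsigned is spelled value() rather than being implicit, so that k - 1 cannot be ambiguous with the built-in subtraction.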
> >
> > We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> > maybe some other places in . But we can add that later if we decide
> > it's worth it.
>
> OK.
>
> > > I also considered moving it down here. Makes sense unless you want to call
> > > __detail::__lock_impl from other functions. And if we want to make it work
> > > for
> > > pre-C++11 we could do
> > >
> > >   using __homogeneous
> > >
> > > = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> > >
> > >   int __i = 0;
> > >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> >
> > We don't need tag dispatching, we could just do:
> >
> > if _GLIBCXX17_CONSTEXPR (homogeneous::value)
> >  ...
> > else
> >  ...
> >
> > because both branches are valid for the homogeneous case, i.e. we aren't
> > using if-constexpr to avoid invalid instantiations.
>
> But for the inhomogeneous case the homogeneous code is invalid (initialization
> of C-array of unique_lock<_L1>).

Oops, yeah of course.

>
> > But given that the default -std option is gnu++17 now, I'm OK with the
> > iterative version only being used for C++17.
>
> Fair enough.



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Biener
On June 22, 2021 4:33:09 PM GMT+02:00, Qing Zhao  wrote:
>
>
>> On Jun 22, 2021, at 9:15 AM, Richard Biener 
>wrote:
>> 
>> On Tue, 22 Jun 2021, Qing Zhao wrote:
>> 
>>> 
>>> 
 On Jun 22, 2021, at 9:00 AM, Richard Biener 
>wrote:
 
 On Tue, 22 Jun 2021, Qing Zhao wrote:
 
> So, I am wondering why not still keep my current implementation on
>
> assign different patterns for different types?
> 
> This major issue with this design is the code size and runtime
>overhead, 
> but for debugging purpose, those are not that important, right?
>And we 
> can add some optimization later to improve the code size and
>runtime 
> overhead.
> 
> Otherwise, if we only use one pattern for all the types in this
>initial 
> version, later we still might need change it.
> 
> What do you think?
 
 No, let's not re-open that discussion.  As said we can look to
>support
 multi-byte pattern if that has a chance to improve things but only
 as followup.
>>> 
>>> I am fine with this.
>>> 
>>> However, we need to decide whether we will use one-byte repeatable
>pattern, or multiple-byte repeatable pattern now,
>>> Since the implementation will be different. If using one-byte, the
>implementation will be the simplest, we can use memset for all
>>> VLA, non-vla, zero-init, or pattern-init consistently.
>>> 
>>> However, if we choose multiple-byte pattern, then the implementation
>will be different, we cannot use memset for pattern-init, and 
>>> The implementation for VLA pattern-init also is different.
>> 
>> As said, we can do this as followup.  For now get the easiest thing
>> working - one-byte patterns via memset.  
>
>Okay. I will work on this.
>
>> There's enough bits in the
>> patch that will likely need followup fixes (the .DEFERED_INIT stuff),
>
>Do you mean your previous suggestion to merge the handling of VLA to
>non-VLA during gimplification phase?
>I have done with this change locally.

No, just bugs that will inevitably show up. 

>> actual code generation of the init is separate enough we can deal with
>> it later.  Also IMHO not all targets necessarily need to behave the
>> same there.
>
>Then, shall we make the code generation part a target hook now? Or do
>this later?

Do this later, if the need arises. 

Richard. 

>Qing
>> 
>> Richard.
>> 
>>> Qing
 
 Thanks,
 Richard.
 
> Qing
> 
> On Jun 22, 2021, at 3:59 AM, Richard Biener wrote:
> 
> On Tue, 22 Jun 2021, Richard Sandiford wrote:
> 
> Kees Cook writes:
> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> So, if “pattern value” is “0x”, then it’s a valid
>canonical virtual memory address.  However, for most OS,
>“0x” should be not in user space.
> 
> My question is, is “0xF” good for pointer? Or
>“0x” better?
> 
> I think 0xFF repeating is fine for this version. Everything else
>is a
> "nice to have" for the pattern-init, IMO. :)
> 
> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
> 
> For integer types, all values are valid representations, and we're
> relying on the pattern being “obviously” wrong in context. 
>0x…
> is unlikely to be a correct integer but 0x… would instead be a
> “nice” -1.  It would be difficult to tell in a debugger that a -1
> came from pattern init rather than a deliberate choice.
> 
> I agree that, all other things being equal, it would be nice to
>use NaNs
> for floats.  But relying on wrong numerical values for floats
>doesn't
> seem worse than doing that for integers.
> 
> 0xAA… for float is (if I've got this right)
>-3.0316488252093987e-13,
> which admittedly doesn't stand out as wrong.  But I'm not sure we
> should sacrifice integer debugging for float debugging here.
> 
> We can always expose the actual value as --param.  Now, I think
> we'd need a two-byte pattern to reliably produce NaNs anyway,
> so with floats taken out of the picture the focus should be on
> pointers where IMHO val & 1 and val & 15 would be nice to have.
> So sth like 0xf7 would work for those.  With a two-byte pattern
> we could use 0xffef or 0x7fef.
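
A small sketch of what one-byte patterns via memset produce for the types discussed above. It assumes a 32-bit IEEE 754 float and two's complement integers; pattern_fill is a hypothetical helper for illustration, not part of the patch:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return an object of trivially copyable type T
// whose bytes are all set to the given one-byte pattern.
template<typename T>
T
pattern_fill (unsigned char byte)
{
  T value;
  std::memset (&value, byte, sizeof value);
  return value;
}

// 0xFF repeated reads back as a "nice" -1 for a signed integer.
inline bool
ff_is_minus_one ()
{
  return pattern_fill<std::int32_t> (0xFF) == -1;
}

// 0xAA repeated reads back as a small negative float,
// about -3.03e-13, which does not stand out as wrong.
inline float
aa_as_float ()
{
  return pattern_fill<float> (0xAA);
}

// 0xF7 repeated has nonzero low bits, so as a pointer value
// it fails both the val & 1 and the val & 15 alignment checks.
inline std::uintptr_t
f7_low_bits ()
{
  return pattern_fill<std::uintptr_t> (0xF7) & 15;
}
```

A two-byte pattern such as 0xffef cannot be written with a single memset call, which is the implementation difference raised earlier in the thread.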
> 
> Anyway, it's probably down to priorities of the project involved
> (debugging FP stuff or integer stuff).
> 
> Richard.
> 
> 
 
 -- 
 Richard Biener 
 SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>Nuernberg,
 Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>>> 
>>> 
>> 
>> -- 
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>Nuernberg,
>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-22 Thread Segher Boessenkool
Hi!

On Mon, Jun 21, 2021 at 05:27:06PM +0800, Kewen.Lin wrote:
> Recently if we build gcc on Power with the assembler which doesn't
> have Power10 support, the build will fail when building libgcc with
> one error message like:
> 
> Error: invalid switch -mpower10
> Error: unrecognized option -mpower10
> make[2]: *** [...gcc/gcc-base/libgcc/shared-object.mk:14: float128-p10.o] 
> Error 1

In general, it is recommended to use a binutils of approximately the
same age as the GCC you are trying to build.  This is similar to us not
supporting most other non-sensical configurations.  An important reason
for that is it cannot ever be tested, there are just too many strange
combinations possible.

That said :-)

> By checking the culprit commit r12-1340, it's caused by some typos.

(That is 9090f4807161.)

>   - fix test case used for libgcc_cv_powerpc_3_1_float128_hw check.

I was confused here for a bit, "test case" usually means something in
testsuite/, I'd call this just "test" :-)

> BTW, there was some noise during regression testing due to
> newer binutils versions, but it was identified as unrelated
> after some checking.

Hrm, what kind of noise?

>   * config/rs6000/t-float128-hw(fp128_3_1_hw_funcs,
>   fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
>   fp128_3_1_hw_obj): Remove variables for ISA 3.1 support.

Needs a space before the opening paren.  Doesn't need a line break so
early on that line btw.

Just "Remove." or "Delete." is less confusing btw: what you wrote can be
read as "Remove the variables from these declarations" or similar.  And
of course terseness is usually best in a changelog.

>   * config/rs6000/t-float128-p10-hw (FLOAT128_HW_INSNS): Append
>   macro FLOAT128_HW_INSNS_ISA3_1 for ISA 3.1 support.

Don't say what it is for, just say what changed :-)

>   (FP128_3_1_CFLAGS_HW): Fix option typo.
>   * config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): Guarded
>   with FLOAT128_HW_INSNS_ISA3_1.

"Guard", not "Guarded", all entries are written in the imperative, like,
"Do this" or "Guard that".

> +#ifdef FLOAT128_HW_INSNS_ISA3_1
>  TFtype __floattikf (TItype_ppc)
>__attribute__ ((__ifunc__ ("__floattikf_resolve")));

I wonder if we now need TItype_ppc at all anymore, btw?

Okay for trunk with the changelog slightly massaged.  Thanks!


Segher


Re: [wwwdocs] New C++23 papers

2021-06-22 Thread Marek Polacek via Gcc-patches
On Tue, Jun 22, 2021 at 02:38:46PM -0400, Jason Merrill wrote:
> On 6/22/21 1:14 PM, Marek Polacek wrote:
> > P1847 has always been "implemented" as the paper says.
> > P2186 needs a few libstdc++ changes that Jonathan already implemented.
> 
> I figured these removals didn't need to be called out in the table, but it's
> also fine to have them.
> 
> > commit 7b804041d34c344a190105e78c6058e2645bf7cb
> > Author: Marek Polacek 
> > Date:   Tue Jun 22 13:10:41 2021 -0400
> > 
> >  cxx-status: Add more C++23 proposals
> > 
> > diff --git a/htdocs/projects/cxx-status.html 
> > b/htdocs/projects/cxx-status.html
> > index 00a929a7..ec9b4933 100644
> > --- a/htdocs/projects/cxx-status.html
> > +++ b/htdocs/projects/cxx-status.html
> > @@ -111,6 +111,24 @@
> > Yes
> >  
> >   
> > +
> > +   Make declaration order layout mandated 
> > +   https://wg21.link/p1847r4;>P1847R4
> > +  Yes
> > +   
> > +
> > +
> > +   Removing Garbage Collection Support 
> > +   https://wg21.link/p2186r2;>P2186R2
> > +> href="../gcc-12/changes.html#cxx">12
> > +   
> > +
> > +
> > +   Simpler implicit move 
> > +   https://wg21.link/p2266r1;>P2266R1
> > +> href="https://gcc.gnu.org/PR101165;>No
> > +   
> > +
> 
> This paper hasn't been to Core; it's still waiting for a vote in EWG.
> https://github.com/cplusplus/papers/issues/968
> 
> It seems likely to be part of C++23, but isn't yet.

Ah, ok.  I've commented it out for now.


Thanks,
Marek



[PATCH] c++: CTAD and deduction guide selection [PR86439]

2021-06-22 Thread Patrick Palka via Gcc-patches
During CTAD, we select the best viable deduction guide via
build_new_function_call, which performs overload resolution on the set
of candidate guides and then forms a call to the guide.  As the PR
points out, this latter step is unnecessary and occasionally gives us
the wrong answer since a call to the selected guide may be ill-formed,
or forming the call may have side effects such as prematurely deducing
the type of a {}.

This patch introduces a specialized subroutine modeled off of
build_new_function_call that stops short of building a call to the
selected function, and makes do_class_deduction use this subroutine
instead.  And since we no longer build a call, do_class_deduction
doesn't need to set tf_decltype or cp_unevaluated_operand.

This change causes us to reject some container CTAD examples in the
libstdc++ testsuite due to deduction failure for {}, which AFAICT is the
correct behavior.  Previously, in the case of e.g. the first removed
example for std::map, the type of {} would be deduced to less as a
side effect of forming the call to the selected guide

  template<typename _Key, typename _Tp, typename _Compare = less<_Key>,
 typename _Allocator = allocator<pair<const _Key, _Tp>>,
 typename = _RequireNotAllocator<_Compare>,
 typename = _RequireAllocator<_Allocator>>
  map(initializer_list<pair<_Key, _Tp>>,
  _Compare = _Compare(), _Allocator = _Allocator())
  -> map<_Key, _Tp, _Compare, _Allocator>;

which made later overload resolution for the constructor call
unambiguous.  Now, the type of {} remains undeduced until constructor
overload resolution, and we complain about ambiguity with the two
constructors

  map(initializer_list<value_type> __l,
  const _Compare& __comp = _Compare(),
  const allocator_type& __a = allocator_type())

  map(initializer_list<value_type> __l, const allocator_type& __a)

This patch just removes these problematic container CTAD examples.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/86439

gcc/cp/ChangeLog:

* call.c (print_error_for_call_failure): Constify 'args'
parameter.
(perform_dguide_overload_resolution): Define.
* cp-tree.h: (perform_dguide_overload_resolution): Declare.
* pt.c (do_class_deduction): Use perform_dguide_overload_resolution
instead of build_new_function_call.  Don't use tf_decltype or
set cp_unevaluated_operand.  Remove unnecessary NULL_TREE tests.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/map/cons/deduction.cc: Remove
ambiguous CTAD constructs.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction88.C: New test.
* g++.dg/cpp1z/class-deduction89.C: New test.
* g++.dg/cpp1z/class-deduction90.C: New test.
---
 gcc/cp/call.c | 36 +++-
 gcc/cp/cp-tree.h  |  2 +
 gcc/cp/pt.c   | 41 +++
 .../g++.dg/cpp1z/class-deduction88.C  | 20 +
 .../g++.dg/cpp1z/class-deduction89.C  | 15 +++
 .../g++.dg/cpp1z/class-deduction90.C  | 16 
 .../23_containers/map/cons/deduction.cc   | 19 -
 .../23_containers/multimap/cons/deduction.cc  | 20 -
 .../23_containers/multiset/cons/deduction.cc  | 14 ---
 .../23_containers/set/cons/deduction.cc   | 15 ---
 .../unordered_map/cons/deduction.cc   | 16 
 .../unordered_multimap/cons/deduction.cc  | 16 
 .../unordered_multiset/cons/deduction.cc  | 10 -
 .../unordered_set/cons/deduction.cc   | 10 -
 14 files changed, 102 insertions(+), 148 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction88.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction89.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction90.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 9f03534c20c..aafc7acca24 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4629,7 +4629,7 @@ perform_overload_resolution (tree fn,
functions.  */
 
 static void
-print_error_for_call_failure (tree fn, vec<tree, va_gc> *args,
+print_error_for_call_failure (tree fn, const vec<tree, va_gc> *args,
  struct z_candidate *candidates)
 {
   tree targs = NULL_TREE;
@@ -4654,6 +4654,40 @@ print_error_for_call_failure (tree fn, vec<tree, va_gc> *args,
 print_z_candidates (loc, candidates);
 }
 
+/* Perform overload resolution on the set of deduction guides DGUIDES
+   using 

Re: [RFC][PATCH] contrib: add git-commit-mklog wrapper

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/22/21 3:30 AM, Martin Liška wrote:

Hello.

There's a patch candidate that comes up with a wrapper for 'git 
commit-mklog' alias.

Using my patch, one can do:

$ git commit-mklog -a -b 12345,

Thoughts?


Looks good to me.

Can one do that without the wrapper script and passing data through env. 
variable?


The hook seems like the way to adjust the commit message, and we can't 
affect its command line arguments, so we're left with environment 
variables or a file somewhere for communicating to it.


Jason



Re: [wwwdocs] New C++23 papers

2021-06-22 Thread Jason Merrill via Gcc-patches

On 6/22/21 1:14 PM, Marek Polacek wrote:

P1847 has always been "implemented" as the paper says.
P2186 needs a few libstdc++ changes that Jonathan already implemented.


I figured these removals didn't need to be called out in the table, but 
it's also fine to have them.



commit 7b804041d34c344a190105e78c6058e2645bf7cb
Author: Marek Polacek 
Date:   Tue Jun 22 13:10:41 2021 -0400

 cxx-status: Add more C++23 proposals

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 00a929a7..ec9b4933 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -111,6 +111,24 @@
Yes
 
  
+
+   Make declaration order layout mandated 
+   https://wg21.link/p1847r4;>P1847R4
+  Yes
+   
+
+
+   Removing Garbage Collection Support 
+   https://wg21.link/p2186r2;>P2186R2
+   12
+   
+
+
+   Simpler implicit move 
+   https://wg21.link/p2266r1;>P2266R1
+   https://gcc.gnu.org/PR101165;>No
+   
+


This paper hasn't been to Core; it's still waiting for a vote in EWG.
https://github.com/cplusplus/papers/issues/968

It seems likely to be part of C++23, but isn't yet.

Jason



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Sandiford via Gcc-patches
Kees Cook  writes:
> On Tue, Jun 22, 2021 at 09:25:57AM +0100, Richard Sandiford wrote:
>> Kees Cook  writes:
>> > On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
>> >> So, if “pattern value” is “0x”, then it’s a valid 
>> >> canonical virtual memory address.  However, for most OS, 
>> >> “0x” should be not in user space.
>> >> 
>> >> My question is, is “0xF” good for pointer? Or 
>> >> “0x” better?
>> >
>> > I think 0xFF repeating is fine for this version. Everything else is a
>> > "nice to have" for the pattern-init, IMO. :)
>> 
>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>> 
>> For integer types, all values are valid representations, and we're
>> relying on the pattern being “obviously” wrong in context.  0xAA…
>> is unlikely to be a correct integer but 0xFF… would instead be a
>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>> came from pattern init rather than a deliberate choice.
>
> I can live with 0xAA. On x86_64, this puts it nicely in the middle of
> the non-canonical space:
>
> 0x8000 - 0x7fff
>
> The only trouble is with 32-bit, where the value 0x is a
> legitimate allocatable userspace address. If we want some kind-of middle
> ground, how about 0xFE? That'll be non-canonical on x86_64, and at the
> high end of the i386 kernel address space.

Sounds good to me FWIW.  That'd give float -1.694739530317379e+38
(suspiciously big even for astrophysics, I hope!) and would still
look unusual in an integer context.
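
For anyone who wants to check these numbers, here is a small sketch.  It is
not part of the patch; repeat_byte, as_float and as_int are made-up helper
names, and it assumes that float is an IEEE-754 binary32 value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a 32-bit word by repeating one byte, e.g. 0xAA -> 0xAAAAAAAA.
inline std::uint32_t repeat_byte (std::uint8_t b)
{
  return 0x01010101u * b;
}

// Reinterpret the 32 bits as a float.  memcpy is used instead of a
// pointer cast to stay clear of aliasing problems.
inline float as_float (std::uint32_t bits)
{
  static_assert (sizeof (float) == sizeof (std::uint32_t),
		 "this sketch assumes a 32-bit float");
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

// Reinterpret the 32 bits as a signed integer.
inline std::int32_t as_int (std::uint32_t bits)
{
  std::int32_t i;
  std::memcpy (&i, &bits, sizeof i);
  return i;
}
```

Under that assumption, 0xFF repeating is a NaN as a float but a plain -1 as
an integer, 0xAA repeating is about -3.03e-13 and -1431655766, and 0xFE
repeating is about -1.69e+38 and -16843010.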

>> I agree that, all other things being equal, it would be nice to use NaNs
>> for floats.  But relying on wrong numerical values for floats doesn't
>> seem worse than doing that for integers.
>> 
>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>> should sacrifice integer debugging for float debugging here.
>
> In some future version type-specific patterns would be a nice improvement,
> but I don't want that to block getting the zero-init portion landed. :)

Yeah.

Thanks,
Richard


Re: [PATCH 2/2] elf: Add GNU_PROPERTY_1_NEEDED check

2021-06-22 Thread Fangrui Song

On 2021-06-22, H.J. Lu wrote:

On Mon, Jun 21, 2021 at 10:46 PM Fangrui Song  wrote:


On 2021-06-21, H.J. Lu wrote:
>On Mon, Jun 21, 2021 at 9:16 PM Alan Modra  wrote:
>>
>> On Mon, Jun 21, 2021 at 07:12:02PM -0700, H.J. Lu wrote:
>> > On Mon, Jun 21, 2021 at 5:06 PM Alan Modra  wrote:
>> > >
>> > > On Mon, Jun 21, 2021 at 03:34:38PM -0700, Fangrui Song wrote:
>> > > > clang -fno-pic -fno-direct-access-extern-data works with clang>=12.0.0 today.
>> > >
>> > > -fno-direct-access-extern-data or variations on that also seem good to
>> > > me.  -fpic-extern would also work.  I liked -fprotected-abi because
>> > > it shows the intent of correcting abi issues related to protected
>> > > visibility.  (Yes, it affects code for all undefined symbols because
>> > > the compiler clearly isn't seeing the entire program if there are
>> > > undefined symbols.)
>> >
>> > I need an option which can be turned on and off.   How about
>> > -fextern-access=direct and -fextern-access=indirect?  It will cover
>> > both data and function?

-fno-direct-access-external-data and -fdirect-access-external-data can turn 
on/off the bit.

clang -fno-pic -fno-direct-access-external-data  works for x86-64 and aarch64.

We can add a -fno-direct-access-external


Since both clang and GCC will add a new option for both data and function
symbols, can we have an agreement on the new option name?  I am listing
options here:

1. -fdirect-access-external/-fno-direct-access-external
2. -fdirect-extern-access/-fno-direct-extern-access
3. -fdirect-external-access/-fno-direct-external-access
4. -fextern-access=direct/-fextern-access=indirect
5. -fexternal-access=direct/-fexternal-access=indirect

My order of preference is 4, 5, 2, 3, 1.


Preferring "extern" to "external" looks fine to me. (`extern` is the C/C++ 
construct anyway and this option describes what to do with default visibility non-definition 
`extern int xxx`/`extern void foo()`).

-fextern-access=direct/-fextern-access=indirect and 
-fdirect-extern-access/-fno-direct-extern-access

look good to me.

I am happy to add aliases to clang if consensus is moving toward  
-fextern-access=indirect or -fno-direct-extern-access.


>> Yes, FWIW that option name for gcc also looks good to me.
>
>I will change the gcc option to
>
>-fextern-access=direct
>-fextern-access=indirect
>
>and change GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION
>to GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS

Note that this will be a glibc + GNU ld specific thing.

gold and ld.lld error for copy relocations on protected data symbols by default.


At run-time, there will be a mixture of components built with different tools
over time.  A marker will help glibc to avoid potential run-time failures due
to binary incompatibility.


glibc can perform the check without a GNU PROPERTY.


For a st_value!=0 && st_shndx==0 symbol lookup during relocation
processing, if the definition is found protected in a shared object,
ld.so can report an error and make a suggestion like
"consider recompiling the executable with -fno-direct-extern-access or
-fpie/-fpic"
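
A rough sketch of the condition described above, only to make the proposal
concrete.  This is not glibc code: SymInfo and should_report_error are
hypothetical names, and a real loader would work on ElfW(Sym) entries rather
than on this cut-down struct.  The constants are the standard ELF values
(SHN_UNDEF is 0, STV_PROTECTED is 3, visibility is the low two bits of
st_other).

```cpp
#include <cassert>
#include <cstdint>

// Cut-down, hypothetical view of the ELF symbol fields the check needs.
struct SymInfo
{
  std::uint64_t st_value;
  std::uint16_t st_shndx;
  std::uint8_t st_other;
};

constexpr std::uint16_t kShnUndef = 0;	  // SHN_UNDEF
constexpr std::uint8_t kStvProtected = 3; // STV_PROTECTED

// The visibility is stored in the low two bits of st_other.
inline bool is_protected (const SymInfo &sym)
{
  return (sym.st_other & 0x3) == kStvProtected;
}

// REF is the symbol being looked up, DEF is the definition that the
// lookup found in a shared object.  Report an error when REF has a
// non-zero value but no section (st_value != 0 && st_shndx == 0) and
// DEF is protected.
inline bool should_report_error (const SymInfo &ref, const SymInfo &def)
{
  const bool nonzero_undef
    = ref.st_value != 0 && ref.st_shndx == kShnUndef;
  return nonzero_undef && is_protected (def);
}
```

The point of the sketch is that both inputs are already available during
relocation processing, so the decision needs no extra marker in the object.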


I echo Michael's question in another thread
https://sourceware.org/pipermail/binutils/2021-June/117137.html

"IOW: which scenario do you want to not error on when you want to make the error 
conditional?"

For such rare cases, we can use a LD_* environment variable.


>> Now as to the need for a corresponding linker option, I'm of the
>> opinion that it is ideal for the linker to be able to cope without
>> needing special options.  Can you show me a set of object files (or
>> just describe them) where ld cannot deduce from relocations and
>> dynamic symbols what dynbss copies, plt stubs, and dynamic relocations
>> are needed?  I'm fairly sure I manage to do that for powerpc.
>>
>> Note that I'm not against a new option to force the linker to go
>> against what it would do based on input object files (perhaps
>
>I'd like to turn it on in linker without any compiler changes, especially
>when building shared libraries, kind of a subset of -Bsymbolic.
>
>> reporting errors), but don't think we should have a new option without
>> some effort being made to see whether we really need it.
>
>Here is a glibc patch to use both linker options on some testcases:
>
>https://sourceware.org/pipermail/libc-alpha/2021-June/127770.html
>
>> > > The main thing that struck me about -fsingle-global-definition is that
>> > > the option doesn't do what it says.  You can still have multiple
>> > > global definitions of a given symbol, one in the executable and one in
>> > > each of the shared libraries making up the complete program.  Which of
>> > > course is no different to code without -fsingle-global-definition.
>> >
>> >
>> > --
>> > H.J.
>>
>> --
>> Alan Modra
>> Australia Development Lab, IBM
>
>
>
>--
>H.J.




--
H.J.


Re: [PATCH] libstdc++: Fix for deadlock in std::counting_semaphore [PR100806]

2021-06-22 Thread Thomas Rodgers via Gcc-patches
Tested x86_64-pc-linux-gnu.
Committed to master, backported to releases/gcc-11.

On Thu, Jun 17, 2021 at 9:46 AM Jonathan Wakely 
wrote:

> On Wed, 16 Jun 2021 at 20:53, Thomas Rodgers 
> wrote:
> >
> > Same as previous version except removing the copyright notice from the
> > test.
> >
> > libstdc++-v3/ChangeLog:
> > libstdc++/PR100806
> > * include/bits/semaphore_base.h
> (__atomic_semaphore::_M_release():
> > Force _M_release() to wake all waiting threads.
> > * testsuite/30_threads/semaphore/100806.cc: New test.
>
> OK for trunk and 11, thanks.
>
>
> > ---
> >  libstdc++-v3/include/bits/semaphore_base.h|  4 +-
> >  .../testsuite/30_threads/semaphore/100806.cc  | 60 +++
> >  2 files changed, 63 insertions(+), 1 deletion(-)
> >  create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
> >
> > diff --git a/libstdc++-v3/include/bits/semaphore_base.h
> b/libstdc++-v3/include/bits/semaphore_base.h
> > index 9a55978068f..c4565d7e560 100644
> > --- a/libstdc++-v3/include/bits/semaphore_base.h
> > +++ b/libstdc++-v3/include/bits/semaphore_base.h
> > @@ -256,7 +256,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >if (__update > 1)
> > __atomic_notify_address_bare(&_M_counter, true);
> >else
> > -   __atomic_notify_address_bare(&_M_counter, false);
> > +   __atomic_notify_address_bare(&_M_counter, true);
> > +// FIXME - Figure out why this does not wake a waiting thread
> > +// __atomic_notify_address_bare(&_M_counter, false);
> >  }
> >
> >private:
> > diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
> b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
> > new file mode 100644
> > index 000..938c2793be1
> > --- /dev/null
> > +++ b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
> > @@ -0,0 +1,60 @@
> > +// { dg-options "-std=gnu++2a -pthread" }
> > +// { dg-do run { target c++2a } }
> > +// { dg-require-effective-target pthread }
> > +// { dg-require-gthreads "" }
> > +// { dg-add-options libatomic }
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +std::counting_semaphore<4> semaphore{6};
> > +
> > +std::mutex mtx;
> > +std::vector<std::string> results;
> > +
> > +void thread_main(size_t x)
> > +{
> > +  semaphore.acquire();
> > +  std::this_thread::sleep_for(std::chrono::milliseconds(100));
> > +  semaphore.release();
> > +  {
> > +std::ostringstream stm;
> > +stm << "Thread " << x << " finished.";
> > +std::lock_guard g{ mtx };
> > +results.push_back(stm.str());
> > +  }
> > +}
> > +
> > +int main()
> > +{
> > +
> > +constexpr auto nthreads = 10;
> > +
> > +std::vector<std::thread> threads(nthreads);
> > +
> > +
> > +size_t counter{0};
> > +for(auto& t : threads)
> > +{
> > +t = std::thread(thread_main, counter++);
> > +}
> > +
> > +for(auto& t : threads)
> > +  {
> > +t.join();
> > +{
> > +  std::lock_guard g{ mtx };
> > +  for (auto&& r : results)
> > +std::cout << r << '\n';
> > +  std::cout.flush();
> > +  results.clear();
> > +}
> > +  }
> > +}
> > --
> > 2.26.2
> >
>
>
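
As an illustration of the "wake all waiting threads" approach, here is a toy
semaphore built on a mutex and a condition variable.  It is an assumption
made for this example and not the libstdc++ implementation, which works on
an atomic counter; toy_semaphore and its members are hypothetical names.
The part that corresponds to the change is release () always calling
notify_all ().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Toy counting semaphore, for illustration only.
class toy_semaphore
{
public:
  explicit toy_semaphore (std::ptrdiff_t initial) : m_count (initial) {}

  // Block until a permit is available, then take it.
  void acquire ()
  {
    std::unique_lock<std::mutex> lock (m_mutex);
    m_cv.wait (lock, [this] { return m_count > 0; });
    --m_count;
  }

  // Take a permit if one is available, without blocking.
  bool try_acquire ()
  {
    std::lock_guard<std::mutex> lock (m_mutex);
    if (m_count == 0)
      return false;
    --m_count;
    return true;
  }

  // Return UPDATE permits and wake every waiter.  Waiters that find no
  // permit left simply go back to sleep in acquire ().
  void release (std::ptrdiff_t update = 1)
  {
    {
      std::lock_guard<std::mutex> lock (m_mutex);
      m_count += update;
    }
    m_cv.notify_all ();
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::ptrdiff_t m_count;
};
```

Waking everyone costs some spurious wakeups, but a waiter can never be left
asleep while a permit is available.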


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Kees Cook via Gcc-patches
On Tue, Jun 22, 2021 at 09:25:57AM +0100, Richard Sandiford wrote:
> Kees Cook  writes:
> > On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> >> So, if “pattern value” is “0x”, then it’s a valid 
> >> canonical virtual memory address.  However, for most OS, 
> >> “0x” should be not in user space.
> >> 
> >> My question is, is “0xF” good for pointer? Or 
> >> “0x” better?
> >
> > I think 0xFF repeating is fine for this version. Everything else is a
> > "nice to have" for the pattern-init, IMO. :)
> 
> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
> 
> For integer types, all values are valid representations, and we're
> relying on the pattern being “obviously” wrong in context.  0xAA…
> is unlikely to be a correct integer but 0xFF… would instead be a
> “nice” -1.  It would be difficult to tell in a debugger that a -1
> came from pattern init rather than a deliberate choice.

I can live with 0xAA. On x86_64, this puts it nicely in the middle of
the non-canonical space:

0x8000 - 0x7fff

The only trouble is with 32-bit, where the value 0x is a
legitimate allocatable userspace address. If we want some kind-of middle
ground, how about 0xFE? That'll be non-canonical on x86_64, and at the
high end of the i386 kernel address space.
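
A quick way to check the x86_64 part of this, as a sketch only.  It assumes
48-bit virtual addresses (4-level paging), where an address is canonical
when bits 63..47 are all equal; is_canonical_48 and repeat_byte64 are
made-up helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 48-bit virtual addresses, so bits 63..47 must all be
// copies of bit 47 for the address to be canonical.
inline bool is_canonical_48 (std::uint64_t addr)
{
  const std::uint64_t top = addr >> 47; // the 17 bits 63..47
  return top == 0 || top == 0x1FFFF;
}

// Build a 64-bit pattern by repeating one byte.
inline std::uint64_t repeat_byte64 (std::uint8_t b)
{
  return 0x0101010101010101ull * b;
}
```

Under that assumption the 0xAA and 0xFE patterns are both non-canonical,
while 0xFF repeating is a canonical address.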

> I agree that, all other things being equal, it would be nice to use NaNs
> for floats.  But relying on wrong numerical values for floats doesn't
> seem worse than doing that for integers.
> 
> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
> which admittedly doesn't stand out as wrong.  But I'm not sure we
> should sacrifice integer debugging for float debugging here.

In some future version type-specific patterns would be a nice improvement,
but I don't want that to block getting the zero-init portion landed. :)

-- 
Kees Cook


[committed] analyzer: fix ICE on malloc/alloca param type mismatch [PR101143]

2021-06-22 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-1731-gea4e32181d7a36055b57421abd0ced4735654cf6.

gcc/analyzer/ChangeLog:
PR analyzer/101143
* region-model.cc (compat_types_p): New function.
(region_model::create_region_for_heap_alloc): Convert assertion to
an error check.
(region_model::create_region_for_alloca): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/101143
* gcc.dg/analyzer/pr101143.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc | 19 +++
 gcc/testsuite/gcc.dg/analyzer/pr101143.c | 18 ++
 2 files changed, 33 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr101143.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 462fe6d8b3c..ee11e82bdf2 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1443,6 +1443,17 @@ assert_compat_types (tree src_type, tree dst_type)
 }
 }
 
+/* Return true if SRC_TYPE can be converted to DST_TYPE as a no-op.  */
+
+static bool
+compat_types_p (tree src_type, tree dst_type)
+{
+  if (src_type && dst_type && !VOID_TYPE_P (dst_type))
+if (!(useless_type_conversion_p (src_type, dst_type)))
+  return false;
+  return true;
+}
+
 /* Get the region for PV within this region_model,
emitting any diagnostics to CTXT.  */
 
@@ -3402,8 +3413,8 @@ const region *
 region_model::create_region_for_heap_alloc (const svalue *size_in_bytes)
 {
   const region *reg = m_mgr->create_region_for_heap_alloc ();
-  assert_compat_types (size_in_bytes->get_type (), size_type_node);
-  set_dynamic_extents (reg, size_in_bytes);
+  if (compat_types_p (size_in_bytes->get_type (), size_type_node))
+set_dynamic_extents (reg, size_in_bytes);
   return reg;
 }
 
@@ -3414,8 +3425,8 @@ const region *
 region_model::create_region_for_alloca (const svalue *size_in_bytes)
 {
   const region *reg = m_mgr->create_region_for_alloca (m_current_frame);
-  assert_compat_types (size_in_bytes->get_type (), size_type_node);
-  set_dynamic_extents (reg, size_in_bytes);
+  if (compat_types_p (size_in_bytes->get_type (), size_type_node))
+set_dynamic_extents (reg, size_in_bytes);
   return reg;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr101143.c b/gcc/testsuite/gcc.dg/analyzer/pr101143.c
new file mode 100644
index 000..bcc0974d4e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr101143.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-Wno-builtin-declaration-mismatch" } */
+
+extern void *malloc (unsigned int);
+extern void *alloca (unsigned int);
+extern void unknown_fn (void *);
+
+void *
+test_malloc (void)
+{
+  return malloc (sizeof (int));
+}
+
+void *
+test_alloca (void)
+{
+  void *p = alloca (sizeof (int));
+  unknown_fn (p);
+}
-- 
2.26.3



[wwwdocs] Document new C++ features in C++23

2021-06-22 Thread Marek Polacek via Gcc-patches
It's time to start adding new C++ features.

Pushed.

commit e373348138d8d767067c0a79b3ddc6a70cbee3a4
Author: Marek Polacek 
Date:   Tue Jun 22 13:19:37 2021 -0400

gcc-12/changes.html: Add if consteval

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 42ded16d..07f70b8b 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -72,7 +72,15 @@ a work-in-progress.
   
 
 
-
+C++
+
+  Several C++23 features have been implemented:
+
+  P1938R3, if consteval
+
+  
+
+
 
 
 



[PATCH 3/3] [amdgcn] Add hook for DWARF address spaces.

2021-06-22 Thread Hafiz Abid Qadeer
Map GCN address spaces to the proposed DWARF address spaces defined by AMD at
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table

gcc/

* config/gcn/gcn.c: Include dwarf2.h.
(gcn_addr_space_debug): New function.
(TARGET_ADDR_SPACE_DEBUG): New hook.
---
 gcc/config/gcn/gcn.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 0eac3aa3844..25996dc83de 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -50,6 +50,7 @@
 #include "varasm.h"
 #include "intl.h"
 #include "rtl-iter.h"
+#include "dwarf2.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1497,6 +1498,32 @@ gcn_addr_space_convert (rtx op, tree from_type, tree to_type)
 gcc_unreachable ();
 }
 
+/* Implement TARGET_ADDR_SPACE_DEBUG.
+
+   Return the dwarf address space class for each hardware address space.  */
+
+static int
+gcn_addr_space_debug (addr_space_t as)
+{
+  switch (as)
+{
+  case ADDR_SPACE_DEFAULT:
+  case ADDR_SPACE_FLAT:
+  case ADDR_SPACE_SCALAR_FLAT:
+  case ADDR_SPACE_FLAT_SCRATCH:
+   return DW_ADDR_none;
+  case ADDR_SPACE_GLOBAL:
+   return 1;  // DW_ADDR_LLVM_global
+  case ADDR_SPACE_LDS:
+   return 3;  // DW_ADDR_LLVM_group
+  case ADDR_SPACE_SCRATCH:
+   return 4;  // DW_ADDR_LLVM_private
+  case ADDR_SPACE_GDS:
+   return 0x8000; // DW_ADDR_AMDGPU_region
+}
+  gcc_unreachable ();
+}
+
 
 /* Implement REGNO_MODE_CODE_OK_FOR_BASE_P via gcn.h

@@ -6354,6 +6381,8 @@ gcn_dwarf_register_span (rtx rtl)
 
 #undef  TARGET_ADDR_SPACE_ADDRESS_MODE
 #define TARGET_ADDR_SPACE_ADDRESS_MODE gcn_addr_space_address_mode
+#undef  TARGET_ADDR_SPACE_DEBUG
+#define TARGET_ADDR_SPACE_DEBUG gcn_addr_space_debug
 #undef  TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
 #define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P \
   gcn_addr_space_legitimate_address_p
-- 
2.25.1



[PATCH 2/3] [amdgcn] Use frame pointer for CFA expressions.

2021-06-22 Thread Hafiz Abid Qadeer
As the size of an address is bigger than a register in amdgcn, we are forced
to use DW_CFA_def_cfa_expression to make an expression that concatenates
multiple registers for the value of the CFA.  This then prohibits us from
using many of the dwarf ops which expect the CFA rule to be a single register
plus an offset.  Using the frame pointer in the CFA rule is the only real
possibility, as it is saved in every frame and it is easy to unwind its value.

So unless the user gives -fomit-frame-pointer, we use the frame pointer for
the CFI information.  This option also has a different default now: it is
enabled at -O3 and above rather than at -O1 and above.

gcc/

* common/config/gcn/gcn-common.c
(gcn_option_optimization_table): Change OPT_fomit_frame_pointer to -O3.
* config/gcn/gcn.c (gcn_expand_prologue): Prefer the frame pointer when
emitting CFI.
(gcn_frame_pointer_rqd): New function.
(TARGET_FRAME_POINTER_REQUIRED): New hook.
---
 gcc/common/config/gcn/gcn-common.c |  2 +-
 gcc/config/gcn/gcn.c   | 60 +++---
 2 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/gcc/common/config/gcn/gcn-common.c b/gcc/common/config/gcn/gcn-common.c
index 305c310f940..695eb467e34 100644
--- a/gcc/common/config/gcn/gcn-common.c
+++ b/gcc/common/config/gcn/gcn-common.c
@@ -27,7 +27,7 @@
 /* Set default optimization options.  */
 static const struct default_options gcn_option_optimization_table[] =
   {
-{ OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 3ab16548aad..0eac3aa3844 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2900,10 +2900,14 @@ gcn_expand_prologue ()
  rtx adjustment = gen_int_mode (sp_adjust, SImode);
  rtx insn = emit_insn (gen_addsi3_scalar_carry (sp_lo, sp_lo,
 adjustment, scc));
- RTX_FRAME_RELATED_P (insn) = 1;
- add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (sp,
-gen_rtx_PLUS (DImode, sp, adjustment)));
+ if (!offsets->need_frame_pointer)
+   {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+   gen_rtx_SET (sp,
+gen_rtx_PLUS (DImode, sp,
+  adjustment)));
+   }
  emit_insn (gen_addcsi3_scalar_zero (sp_hi, sp_hi, scc));
}
 
@@ -2917,25 +2921,24 @@ gcn_expand_prologue ()
  rtx adjustment = gen_int_mode (fp_adjust, SImode);
  rtx insn = emit_insn (gen_addsi3_scalar_carry(fp_lo, sp_lo,
adjustment, scc));
- RTX_FRAME_RELATED_P (insn) = 1;
- add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (fp,
-gen_rtx_PLUS (DImode, sp, adjustment)));
  emit_insn (gen_addcsi3_scalar (fp_hi, sp_hi,
 (fp_adjust < 0 ? GEN_INT (-1)
  : const0_rtx),
 scc, scc));
+
+ /* Set the CFA to the entry stack address, as an offset from the
+frame pointer.  This is preferred because the frame pointer is
+saved in each frame, whereas the stack pointer is not.  */
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_CFA_DEF_CFA,
+   gen_rtx_PLUS (DImode, fp,
+ GEN_INT (-(offsets->pretend_size
+    + offsets->callee_saves))));
}
 
   rtx_insn *seq = get_insns ();
   end_sequence ();
 
-  /* FIXME: Prologue insns should have this flag set for debug output, etc.
-but it causes issues for now.
-  for (insn = seq; insn; insn = NEXT_INSN (insn))
-if (INSN_P (insn))
- RTX_FRAME_RELATED_P (insn) = 1;*/
-
   emit_insn (seq);
 }
   else
@@ -3011,6 +3014,16 @@ gcn_expand_prologue ()
gen_rtx_SET (sp, gen_rtx_PLUS (DImode, sp,
   dbg_adjustment)));
 
+  if (offsets->need_frame_pointer)
+   {
+ /* Set the CFA to the entry stack address, as an offset from the
+frame pointer.  This is necessary when alloca is used, and
+harmless otherwise.  */
+ rtx neg_adjust = gen_int_mode (-offsets->callee_saves, DImode);
+ add_reg_note (insn, REG_CFA_DEF_CFA,
+   gen_rtx_PLUS (DImode, fp, neg_adjust));
+   }
+
   /* Make sure the flat scratch reg doesn't get optimised away.  */
   emit_insn (gen_prologue_use (gen_rtx_REG (DImode, FLAT_SCRATCH_REG)));
 }
@@ -3114,6 +3127,23 @@ 

[PATCH 1/3] [amdgcn] Update CFI configuration

2021-06-22 Thread Hafiz Abid Qadeer
Currently we don't get any call frame information for the amdgcn target.
This patch makes the necessary adjustments to generate CFI that can work with
ROCGDB (ROCm 3.8+).

gcc/

* config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for
prologue register saves.
(gcn_debug_unwind_info): Use UI_DWARF2.
(gcn_dwarf_register_number): Map DWARF_LINK_REGISTER to DWARF PC.
(gcn_dwarf_register_span): DWARF_LINK_REGISTER doesn't span.
* config/gcn/gcn.h (DWARF_FRAME_RETURN_COLUMN): New define.
(DWARF_LINK_REGISTER): New define.
(FIRST_PSEUDO_REGISTER): Increment.
(FIXED_REGISTERS): Add entry for DWARF_LINK_REGISTER.
(CALL_USED_REGISTERS): Likewise.
(REGISTER_NAMES): Likewise.
---
 gcc/config/gcn/gcn.c | 82 
 gcc/config/gcn/gcn.h | 10 +++---
 2 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..3ab16548aad 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2649,6 +2649,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
   rtx as = gen_rtx_CONST_INT (VOIDmode, STACK_ADDR_SPACE);
   HOST_WIDE_INT exec_set = 0;
   int offreg_set = 0;
+  auto_vec saved_sgprs;
 
   start_sequence ();
 
@@ -2665,7 +2666,10 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
int lane = saved_scalars % 64;
 
if (prologue)
- emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+ {
+   emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+   saved_sgprs.safe_push (regno);
+ }
else
  emit_insn (gen_vec_extractv64sisi (reg, vreg, GEN_INT (lane)));
 
@@ -2698,7 +2702,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
  gcn_gen_undef (V64SImode), exec));
 
   /* Move vectors.  */
-  for (regno = FIRST_VGPR_REG, offset = offsets->pretend_size;
+  for (regno = FIRST_VGPR_REG, offset = 0;
regno < FIRST_PSEUDO_REGISTER; regno++)
 if ((df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
|| (regno == VGPR_REGNO (6) && saved_scalars > 0)
@@ -2719,8 +2723,67 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,
  }
 
if (prologue)
- emit_insn (gen_scatterv64si_insn_1offset_exec (vsp, const0_rtx, reg,
-as, const0_rtx, exec));
+ {
+   rtx insn = emit_insn (gen_scatterv64si_insn_1offset_exec
+ (vsp, const0_rtx, reg, as, const0_rtx,
+  exec));
+
+   /* Add CFI metadata.  */
+   rtx note;
+   if (regno == VGPR_REGNO (6) || regno == VGPR_REGNO (7))
+ {
+   int start = (regno == VGPR_REGNO (7) ? 64 : 0);
+   int count = MIN (saved_scalars - start, 64);
+   int add_lr = (regno == VGPR_REGNO (6)
+ && df_regs_ever_live_p (LINK_REGNUM));
+   int lrdest = -1;
+   rtvec seq = rtvec_alloc (count + add_lr);
+
+   /* Add an REG_FRAME_RELATED_EXPR entry for each scalar
+  register that was saved in this batch.  */
+   for (int idx = 0; idx < count; idx++)
+ {
+   int stackaddr = offset + idx * 4;
+   rtx dest = gen_rtx_MEM (SImode,
+   gen_rtx_PLUS
+   (DImode, sp,
+GEN_INT (stackaddr)));
+   rtx src = gen_rtx_REG (SImode, saved_sgprs[start + idx]);
+   rtx set = gen_rtx_SET (dest, src);
+   RTX_FRAME_RELATED_P (set) = 1;
+   RTVEC_ELT (seq, idx) = set;
+
+   if (saved_sgprs[start + idx] == LINK_REGNUM)
+ lrdest = stackaddr;
+ }
+
+   /* Add an additional expression for DWARF_LINK_REGISTER if
+  LINK_REGNUM was saved.  */
+   if (lrdest != -1)
+ {
+   rtx dest = gen_rtx_MEM (DImode,
+   gen_rtx_PLUS
+   (DImode, sp,
+GEN_INT (lrdest)));
+   rtx src = gen_rtx_REG (DImode, DWARF_LINK_REGISTER);
+   rtx set = gen_rtx_SET (dest, src);
+   RTX_FRAME_RELATED_P (set) = 1;
+   RTVEC_ELT (seq, count) = set;
+ }
+
+   note = gen_rtx_SEQUENCE (VOIDmode, seq);
+ }
+   else
+ {
+   rtx dest = gen_rtx_MEM (V64SImode,
+   gen_rtx_PLUS (DImode, sp,
+  

[PATCH 0/3] [amdgcn] Improve debug support.

2021-06-22 Thread Hafiz Abid Qadeer
This patch series improves the debug experience with ROCGDB.  It enables
generation of CFI, makes use of the frame pointer in CFA calculations and
uses the proposed values for address spaces in the debug information.

Tested on the amdgcn-amdhsa target with no regressions.  Debugger testing was
done manually.

Hafiz Abid Qadeer (3):
  [amdgcn] Update CFI configuration
  [amdgcn] Use frame pointer for CFA expressions.
  [amdgcn] Add hook for DWARF address spaces.

 gcc/common/config/gcn/gcn-common.c |   2 +-
 gcc/config/gcn/gcn.c   | 171 +
 gcc/config/gcn/gcn.h   |  10 +-
 3 files changed, 157 insertions(+), 26 deletions(-)

-- 
2.25.1



[wwwdocs] New C++23 papers

2021-06-22 Thread Marek Polacek via Gcc-patches
P1847 has always been "implemented" as the paper says.
P2186 needs a few libstdc++ changes that Jonathan already implemented.

Pushed.

commit 7b804041d34c344a190105e78c6058e2645bf7cb
Author: Marek Polacek 
Date:   Tue Jun 22 13:10:41 2021 -0400

cxx-status: Add more C++23 proposals

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 00a929a7..ec9b4933 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -111,6 +111,24 @@
   Yes

 
+
+   Make declaration order layout mandated 
+   https://wg21.link/p1847r4;>P1847R4
+  Yes
+   
+
+
+   Removing Garbage Collection Support 
+   https://wg21.link/p2186r2;>P2186R2
+   12
+   
+
+
+   Simpler implicit move 
+   https://wg21.link/p2266r1;>P2266R1
+   https://gcc.gnu.org/PR101165;>No
+   
+
 
CWG 2397: auto specifier for pointers and references to arrays 
https://wg21.link/cwg2397;>CWG2397



Re: predcom: Refactor more by encapsulating global states

2021-06-22 Thread Martin Sebor via Gcc-patches

On 6/21/21 8:35 PM, Kewen.Lin wrote:

Hi Richi and Martin,



Thanks Richi!  One draft (not ready for review) is attached for further
discussion.  It follows the idea of RAII-style cleanup.  I noticed that
Martin suggested going a step further and making
tree_predictive_commoning_loop and its callees into one class (thanks
Martin).  Since there are not many C++-style worker functions of this kind,
I want to double-check: which option do you prefer?



Such general cleanup is of course desired - Giuliano started some of it within
GSoC two years ago in the attempt to thread the compilation process.  The
cleanup then helps to get rid of global state which of course interferes here
(and avoids unnecessary use of TLS vars).

So yes, encapsulating global state into a class and making accessors
member functions is something that is desired (but a lot of mechanical
work).

Thanks
Richard.

I meant that not necessarily as something to include in this patch
but as a suggestion for a future improvement.  If you'd like to
tackle it at any point that would be great, of course.  In any
event, thanks for double-checking!

The attached patch looks good to me as well (more for the sake of
style than anything else, declaring the class copy ctor and copy
assignment = delete would make it clear it's not meant to be
copied, although in this case it's unlikely to make a practical
difference).

Martin.



Thanks for your explanation!  Sorry for the late response.
Since encapsulating global state into a class and making the accessors
member functions looks more complete, I gave up the RAII draft and
switched to this approach.

This patch encapsulates the global state in a class, makes its
accessors member functions, removes cleanup code that this makes
redundant, and does some cleanup with RAII.


Nice!

A further improvement worth considering (if you're so inclined :)
is replacing the pcom_worker vec members with auto_vec (obviating
having to explicitly release them) and for the same reason also
replacing the comp_ptrs bare pointer members with auto_vecs.
There may be other opportunities to do the same in individual
functions (I'd look to get rid of as many calls to functions
like XNEW()/XNEWVEC() and free() and use auto_vec instead).

An unrelated but worthwhile change is to replace the FOR_EACH_
loops with C++ 11 range loops, analogously to:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html
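
The two suggestions above can be illustrated outside of GCC with a
small sketch.  It uses std::vector purely as a stand-in for GCC's
internal auto_vec (which is not used here), and the function names
are made up for the example.

```cpp
#include <cstdlib>
#include <vector>

// Manual style: every exit path has to remember to free the buffer.
int sum_manual (int n)
{
  int *buf = static_cast<int *> (std::malloc (sizeof (int) * n));
  if (!buf)
    return -1;
  int sum = 0;
  for (int i = 0; i < n; ++i)
    {
      buf[i] = i;
      sum += buf[i];
    }
  std::free (buf);
  return sum;
}

// RAII style: the container releases its storage in its destructor,
// and a C++11 range loop replaces the index-based iteration.
int sum_raii (int n)
{
  std::vector<int> buf (n);
  for (int i = 0; i < n; ++i)
    buf[i] = i;
  int sum = 0;
  for (int v : buf)
    sum += v;
  return sum;
}
```

Both functions assume n is positive; the second has no explicit
release call to forget on an early return.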

Finally, the only loosely followed naming convention for member
variables is to start them with the m_ prefix.

These are just suggestions that could be done in a followup, not
something I would consider prerequisite for accepting the patch
as is if I were in a position to make such a decision.

Martin



Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* tree-predcom.c (class pcom_worker): New class.
(release_chain): Renamed to...
(pcom_worker::release_chain): ...this.
(release_chains): Renamed to...
(pcom_worker::release_chains): ...this.
(aff_combination_dr_offset): Renamed to...
(pcom_worker::aff_combination_dr_offset): ...this.
(determine_offset): Renamed to...
(pcom_worker::determine_offset): ...this.
(class comp_ptrs): New class.
(split_data_refs_to_components): Renamed to...
(pcom_worker::split_data_refs_to_components): ...this,
and update with class comp_ptrs.
(suitable_component_p): Renamed to...
(pcom_worker::suitable_component_p): ...this.
(filter_suitable_components): Renamed to...
(pcom_worker::filter_suitable_components): ...this.
(valid_initializer_p): Renamed to...
(pcom_worker::valid_initializer_p): ...this.
(find_looparound_phi): Renamed to...
(pcom_worker::find_looparound_phi): ...this.
(add_looparound_copies): Renamed to...
(pcom_worker::add_looparound_copies): ...this.
(determine_roots_comp): Renamed to...
(pcom_worker::determine_roots_comp): ...this.
(determine_roots): Renamed to...
(pcom_worker::determine_roots): ...this.
(single_nonlooparound_use): Renamed to...
(pcom_worker::single_nonlooparound_use): ...this.
(remove_stmt): Renamed to...
(pcom_worker::remove_stmt): ...this.
(execute_pred_commoning_chain): Renamed to...
(pcom_worker::execute_pred_commoning_chain): ...this.
(execute_pred_commoning): Renamed to...
(pcom_worker::execute_pred_commoning): ...this.
(struct epcc_data): New member worker.
(execute_pred_commoning_cbck): Call execute_pred_commoning
with pcom_worker pointer.
(find_use_stmt): Renamed to...
(pcom_worker::find_use_stmt): ...this.
(find_associative_operation_root): Renamed to...

Re: [PATCH v2] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Matthias Kretz
On Dienstag, 22. Juni 2021 17:20:41 CEST Jonathan Wakely wrote:
> On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:
> > This does a try_lock on all lockables even if any of them fails. I think
> > that's
> > not only more expensive but also non-conforming. I think you need to defer
> > locking and then loop from beginning to end to break the loop on the first
> > unsuccessful try_lock.
> 
> Oops, good point. I'll add a test for that too. Here's the fixed code:
> 
> template<typename _L0, typename... _Lockables>
>   inline int
>   __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
>   {
> #if __cplusplus >= 201703L
> if constexpr ((is_same_v<_L0, _Lockables> && ...))
>   {
> constexpr int _Np = 1 + sizeof...(_Lockables);
> unique_lock<_L0> __locks[_Np] = {
> {__l0, defer_lock}, {__lockables, defer_lock}...
> };
> for (int __i = 0; __i < _Np; ++__i)

I thought coding style requires a { here?

>   if (!__locks[__i].try_lock())
> {
>   const int __failed = __i;
>   while (__i--)
> __locks[__i].unlock();
>   return __i;

You meant `return __failed`?

> }
> for (auto& __l : __locks)
>   __l.release();
> return -1;
>   }
> else
> #endif
> 
> > [...]
> > Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> > Like
> > 
> > unsigned int but with parameter, like:
> >   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> >   
> > __locks[__k - 1].unlock();
> > 
> > This is the loop I wanted to write, except --__k is simpler to write and
> > __k -
> > 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> > place it's not important enough to abstract.
> 
> We might be able to use __wrapping_uint in std::seed_seq::generate too, and
> maybe some other places in <random>. But we can add that later if we decide
> it's worth it.

OK.
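
For readers who want to see what such a wrapping counter could look
like, here is a rough sketch.  wrapping_uint and held_before are
hypothetical names written for this note; nothing like them exists in
libstdc++ as far as this thread shows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter that stays in [0, N) and wraps in both directions.
template <std::size_t N>
class wrapping_uint
{
  static_assert (N > 0, "modulus must be positive");

public:
  wrapping_uint (std::size_t v = 0) : m_value (v % N) {}

  wrapping_uint &operator-- ()
  {
    m_value = (m_value == 0 ? N - 1 : m_value - 1);
    return *this;
  }

  // The value one step back, wrapping 0 to N - 1.
  wrapping_uint prev () const
  {
    wrapping_uint r (*this);
    --r;
    return r;
  }

  std::size_t value () const { return m_value; }

  friend bool operator== (wrapping_uint a, wrapping_uint b)
  { return a.m_value == b.m_value; }
  friend bool operator!= (wrapping_uint a, wrapping_uint b)
  { return a.m_value != b.m_value; }

private:
  std::size_t m_value;
};

// Indices that would be unlocked after try_lock failed at idx, when
// locking started at first: idx - 1, idx - 2, ... down to first.
template <std::size_t N>
std::vector<std::size_t> held_before (std::size_t idx, std::size_t first)
{
  std::vector<std::size_t> out;
  for (wrapping_uint<N> k (idx); k != wrapping_uint<N> (first); --k)
    out.push_back (k.prev ().value ());
  return out;
}
```

With four lockables, starting at index 2 and failing at index 1, the
loop visits 0, 3 and 2, which are exactly the locks held at that
point.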

> > I also considered moving it down here. Makes sense unless you want to call
> > __detail::__lock_impl from other functions. And if we want to make it work
> > for
> > pre-C++11 we could do
> > 
> >   using __homogeneous
> >     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
> >   
> >   int __i = 0;
> >   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
> 
> We don't need tag dispatching, we could just do:
> 
> if _GLIBCXX17_CONSTEXPR (homogeneous::value)
>  ...
> else
>  ...
> 
> because both branches are valid for the homogeneous case, i.e. we aren't
> using if-constexpr to avoid invalid instantiations.

But for the inhomogeneous case the homogeneous code is invalid (initialization 
of C-array of unique_lock<_L1>).
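
A small illustration of that point, assuming C++17.  The function
count_if_homogeneous is invented for this note: its first branch
builds an array of T0*, which only compiles when every argument has
type T0, so it relies on if constexpr discarding the branch for mixed
types.  With a plain "if" the mixed-type call would be ill-formed.

```cpp
#include <type_traits>

// Returns the argument count when all arguments share one type,
// otherwise -1.
template <typename T0, typename... Ts>
int count_if_homogeneous (T0 &first, Ts &...rest)
{
  if constexpr ((std::is_same_v<T0, Ts> && ...))
    {
      // Only instantiated in the homogeneous case: a T0* array cannot
      // be initialised from pointers to unrelated types.
      T0 *ptrs[] = { &first, &rest... };
      return static_cast<int> (sizeof (ptrs) / sizeof (ptrs[0]));
    }
  else
    return -1;
}
```

Calling it with two ints takes the array branch; calling it with an
int and a double compiles only because that branch is discarded.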

> But given that the default -std option is gnu++17 now, I'm OK with the
> iterative version only being used for C++17.

Fair enough.

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──





Re: [[PATCH V9] 0/7] Support for the CTF and BTF debug formats

2021-06-22 Thread Indu Bhagat via Gcc-patches

On 6/21/21 7:01 AM, Richard Biener via Gcc-patches wrote:

Command line options for debug formats
======================================

This implementation adds the following command-line options to select the
emission of CTF and BTF:

  -gctf[123]
  -gbtf

These options mimic the -g[123...] options for DWARF.

This involved adding new entries for debug_info_type:

  CTF_DEBUG            - Write CTF debug info.
  BTF_DEBUG            - Write BTF debug info.
  CTF_AND_DWARF2_DEBUG - Write both CTF and DWARF info.

That's probably obsolete info now?



Yes, that's correct. Since GCC now supports bitmasks in
write_symbols, defining entries for combinations of debug formats such as
CTF_AND_DWARF2_DEBUG is no longer necessary.
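
As a rough illustration of why a bitmask makes combined entries
redundant: the enumerator names and values below are hypothetical and
are not the ones GCC uses for write_symbols.

```cpp
#include <cstdint>

// Hypothetical flag values; the real GCC enumerators may differ.
enum debug_format : std::uint32_t
{
  FMT_NONE  = 0,
  FMT_DWARF = 1u << 0,
  FMT_CTF   = 1u << 1,
  FMT_BTF   = 1u << 2
};

// True when format f is part of the selected set.
inline bool format_enabled (std::uint32_t selected, debug_format f)
{
  return (selected & f) != 0;
}
```

"CTF and DWARF together" is then just FMT_CTF | FMT_DWARF, with no
dedicated enumerator for the combination.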


Thanks for pointing it out.
Indu



Re: [PATCH v2] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Jonathan Wakely via Gcc-patches
On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:

> On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> > With your suggestion
> > to also drop std::tuple the number of parameters decides which
> > function we call. And we don't instantiate std::tuple. And we can also
> > get rid of the __try_to_lock function, which was only used to deduce
> > the lock type rather than use tuple_element to get it. That's much
> > nicer.
>
> 
>
> > > How about optimizing a likely common case where all lockables have the
> > > same
> > > type? In that case we don't require recursion and can manage stack
> usage
> > > much
> > > simpler:
> > The stack usage is bounded by the number of mutexes being locked,
> > which is unlikely to get large, but we can do that.
>
> Right. I meant simpler, because it takes a while of staring at the
> recursive
> implementation to understand how it works. :)
>
> > We can do it for try_lock too:
> >
> >   template<typename _L1, typename _L2, typename... _L3>
> > int
> > try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> > {
> > #if __cplusplus >= 201703L
> >   if constexpr (is_same_v<_L1, _L2>
> > && (is_same_v<_L1, _L3> && ...))
> > {
> >   constexpr int _Np = 2 + sizeof...(_L3);
> >   unique_lock<_L1> __locks[_Np] = {
> >   {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> > try_to_lock}... };
>
> This does a try_lock on all lockables even if any of them fails. I think
> that's
> not only more expensive but also non-conforming. I think you need to defer
> locking and then loop from beginning to end to break the loop on the first
> unsuccessful try_lock.
>

Oops, good point. I'll add a test for that too. Here's the fixed code:

template<typename _L0, typename... _Lockables>
  inline int
  __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
  {
#if __cplusplus >= 201703L
    if constexpr ((is_same_v<_L0, _Lockables> && ...))
      {
        constexpr int _Np = 1 + sizeof...(_Lockables);
        unique_lock<_L0> __locks[_Np] = {
            {__l0, defer_lock}, {__lockables, defer_lock}...
        };
        for (int __i = 0; __i < _Np; ++__i)
          if (!__locks[__i].try_lock())
            {
              const int __failed = __i;
              while (__i--)
                __locks[__i].unlock();
              return __i;
            }
        for (auto& __l : __locks)
          __l.release();
        return -1;
      }
    else
#endif
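
A standalone sketch of the same deferred try_lock idea, written with
ordinary names so it can be compiled outside libstdc++.  toy_lock and
try_lock_all are made up for this note; the function returns the
saved index of the lockable whose try_lock failed, or -1 when every
lock was taken.

```cpp
#include <cstddef>
#include <mutex>

// Toy lockable for demonstration: try_lock fails while busy is set.
struct toy_lock
{
  bool held = false;
  bool busy = false;
  void lock () { held = true; }
  bool try_lock ()
  {
    if (busy || held)
      return false;
    held = true;
    return true;
  }
  void unlock () { held = false; }
};

// Tries each lock in order and stops at the first failure.
template <typename Lock, std::size_t N>
int try_lock_all (Lock *(&locks)[N])
{
  const int n = static_cast<int> (N);
  std::unique_lock<Lock> guards[N];
  for (int i = 0; i < n; ++i)
    guards[i] = std::unique_lock<Lock> (*locks[i], std::defer_lock);

  for (int i = 0; i < n; ++i)
    if (!guards[i].try_lock ())
      {
        const int failed = i;   // remember the index before unwinding
        while (i--)
          guards[i].unlock ();
        return failed;
      }

  // Success: detach the guards so the locks stay held on return.
  for (auto &g : guards)
    g.release ();
  return -1;
}
```

Because the guards start out deferred, no lockable after the failing
one is ever touched.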




> >   for (int __i = 0; __i < _Np; ++__i)
> > if (!__locks[__i])
> >   return __i;
> >   for (auto& __l : __locks)
> > __l.release();
> >   return -1;
> > }
> >   else
> > #endif
> >   return __detail::__try_lock_impl(__l1, __l2, __l3...);
> > }
> >
> > > if constexpr ((is_same_v<_L0, _L1> && ...))
> > > {
> > > constexpr int _Np = 1 + sizeof...(_L1);
> > > std::array<unique_lock<_L0>, _Np> __locks = {
> > > {__l0, defer_lock}, {__l1, defer_lock}...
> > > };
> > > int __first = 0;
> > > do {
> > > __locks[__first].lock();
> > > for (int __j = 1; __j < _Np; ++__j)
> > > {
> > > const int __idx = (__first + __j) % _Np;
> > > if (!__locks[__idx].try_lock())
> > > {
> > > for (int __k = __idx; __k != __first;
> > > __k = __k == 1 ? _Np : __k - 1)
> > > __locks[__k - 1].unlock();
> >
> > This loop doesn't work if any try_lock fails when first==0, because
> > the loop termination condition is never reached.
>
> Uh yes. Which is the same reason why the __j loop doesn't start from
> __first +
> 1.
>
> > I find this a bit easier to understand than the loop above, and
> > correct (I think):
> >
> >   for (int __k = __j; __k != 0; --__k)
> > __locks[(__first + __k - 1) % _Np].unlock();
>
> Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> Like
> unsigned int but with parameter, like:
>
>   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
> __locks[__k - 1].unlock();
>
> This is the loop I wanted to write, except --__k is simpler to write and
> __k -
> 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> place it's not important enough to abstract.
>

We might be able to use __wrapping_uint in std::seed_seq::generate too, and
maybe some other places in <random>. But we can add that later if we decide
it's worth it.
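
For reference, the try-and-back-off loop discussed above can be
sketched without the libstdc++ internals.  toy_lock and lock_all are
hypothetical names; the unlock loop uses the (first + k - 1) % N form
from this thread, and the sketch leaves out exception safety.

```cpp
#include <array>
#include <cstddef>

// Toy lockable: the next `refusals` calls to try_lock fail.
struct toy_lock
{
  bool held = false;
  int refusals = 0;
  void lock () { held = true; }
  bool try_lock ()
  {
    if (refusals > 0)
      {
        --refusals;
        return false;
      }
    if (held)
      return false;
    held = true;
    return true;
  }
  void unlock () { held = false; }
};

// Blocks on one lock, try_locks the rest in circular order, and on
// failure releases everything and restarts at the contended lock.
template <typename Lock, std::size_t N>
void lock_all (const std::array<Lock *, N> &locks)
{
  static_assert (N > 0, "need at least one lockable");
  std::size_t first = 0;
  for (;;)
    {
      locks[first]->lock ();
      std::size_t j = 1;
      for (; j < N; ++j)
        {
          const std::size_t idx = (first + j) % N;
          if (!locks[idx]->try_lock ())
            {
              // Release the j locks held so far, newest first.
              for (std::size_t k = j; k != 0; --k)
                locks[(first + k - 1) % N]->unlock ();
              first = idx;
              break;
            }
        }
      if (j == N)
        return;
    }
}
```

The unlock loop counts offsets from first rather than absolute
indices, so it terminates even when first is 0.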


> > [...]
> > @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     *  @post All arguments are locked.
> >     *
> >     *  All arguments are locked via a sequence of calls to lock(), try_lock()
> > -   *  and unlock().  If the call exits via an exception any locks that were
> > -   *  obtained will be released.
> > +   *  and unlock().  If this function exits via an exception any locks that
> > +   *  were obtained will be released.
> >     */
> >    template<typename _L1, typename _L2, typename... _L3>
> >      void
> >      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >      {
> > -  int __i = 0;
> > -  __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> > +#if 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches


> On Jun 22, 2021, at 9:15 AM, Richard Biener  wrote:
> 
> On Tue, 22 Jun 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jun 22, 2021, at 9:00 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 22 Jun 2021, Qing Zhao wrote:
>>> 
 So, I am wondering why not still keep my current implementation on 
 assign different patterns for different types?
 
 This major issue with this design is the code size and runtime overhead, 
 but for debugging purpose, those are not that important, right? And we 
 can add some optimization later to improve the code size and runtime 
 overhead.
 
 Otherwise, if we only use one pattern for all the types in this initial 
 version, later we still might need change it.
 
 How do you think?
>>> 
>>> No, let's not re-open that discussion.  As said we can look to support
>>> multi-byte pattern if that has a chance to improve things but only
>>> as followup.
>> 
>> I am fine with this.
>> 
>> However, we need to decide whether we will use one-byte repeatable pattern, 
>> or multiple-byte repeatable pattern now,
>> Since the implementation will be different. If using one-byte, the 
>> implementation will be the simplest, we can use memset for all
>> VLA, non-vla, zero-init, or pattern-init consistently.
>> 
>> However, if we choose multiple-byte pattern, then the implementation will be 
>> different, we cannot use memset for pattern-init, and 
> >> The implementation for VLA pattern-init also is different.
> 
> As said, we can do this as followup.  For now get the easiest thing
> working - one-byte patterns via memset.  

Okay. I will work on this.

> There's enough bits in the
> patch that will likely need followup fixes (the .DEFERED_INIT stuff),

Do you mean your previous suggestion to merge the handling of VLA to non-VLA 
during gimplification phase?
I have done with this change locally.

> actual code generation of the init is separate enough we can deal with
> it later.  Also IMHO not all targets necessarily need to behave the
> same there.

Then, shall we make the code generation part a target hook now? Or do this 
later?

Qing
> 
> Richard.
> 
>> Qing
>>> 
>>> Thanks,
>>> Richard.
>>> 
 Qing
 
 On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
 
 On Tue, 22 Jun 2021, Richard Sandiford wrote:
 
 Kees Cook <keesc...@chromium.org> writes:
 On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
 So, if “pattern value” is “0x”, then it’s a valid 
 canonical virtual memory address.  However, for most OS, 
 “0x” should be not in user space.
 
 My question is, is “0xF” good for pointer? Or 
 “0x” better?
 
 I think 0xFF repeating is fine for this version. Everything else is a
 "nice to have" for the pattern-init, IMO. :)
 
 Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
 
 For integer types, all values are valid representations, and we're
 relying on the pattern being “obviously” wrong in context.  0x…
 is unlikely to be a correct integer but 0x… would instead be a
 “nice” -1.  It would be difficult to tell in a debugger that a -1
 came from pattern init rather than a deliberate choice.
 
 I agree that, all other things being equal, it would be nice to use NaNs
 for floats.  But relying on wrong numerical values for floats doesn't
 seem worse than doing that for integers.
 
 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
 which admittedly doesn't stand out as wrong.  But I'm not sure we
 should sacrifice integer debugging for float debugging here.
 
 We can always expose the actual value as --param.  Now, I think
 we'd need a two-byte pattern to reliably produce NaNs anyway,
 so with floats taken out of the picture the focus should be on
 pointers where IMHO val & 1 and val & 15 would be nice to have.
 So sth like 0xf7 would work for those.  With a two-byte pattern
 we could use 0xffef or 0x7fef.
 
 Anyway, it's probably down to priorities of the project involved
 (debugging FP stuff or integer stuff).
 
 Richard.
 
 
>>> 
>>> -- 
>>> Richard Biener 
>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>>> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Qing Zhao wrote:

> 
> 
> > On Jun 22, 2021, at 9:00 AM, Richard Biener  wrote:
> > 
> > On Tue, 22 Jun 2021, Qing Zhao wrote:
> > 
> >> So, I am wondering why not still keep my current implementation on 
> >> assign different patterns for different types?
> >> 
> >> This major issue with this design is the code size and runtime overhead, 
> >> but for debugging purpose, those are not that important, right? And we 
> >> can add some optimization later to improve the code size and runtime 
> >> overhead.
> >> 
> >> Otherwise, if we only use one pattern for all the types in this initial 
> >> version, later we still might need change it.
> >> 
> >> How do you think?
> > 
> > No, let's not re-open that discussion.  As said we can look to support
> > multi-byte pattern if that has a chance to improve things but only
> > as followup.
> 
> I am fine with this.
> 
> However, we need to decide whether we will use one-byte repeatable pattern, 
> or multiple-byte repeatable pattern now,
> Since the implementation will be different. If using one-byte, the 
> implementation will be the simplest, we can use memset for all
> VLA, non-vla, zero-init, or pattern-init consistently.
> 
> However, if we choose multiple-byte pattern, then the implementation will be 
> different, we cannot use memset for pattern-init, and 
> The implementation for VLA pattern-init also is different.

As said, we can do this as followup.  For now get the easiest thing
working - one-byte patterns via memset.  There's enough bits in the
patch that will likely need followup fixes (the .DEFERED_INIT stuff),
actual code generation of the init is separate enough we can deal with
it later.  Also IMHO not all targets necessarily need to behave the
same there.
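
To make the implementation difference concrete, here is a small
sketch written for this note (it is not code from the patch, and the
function names are made up).  A one-byte pattern maps directly onto
memset for any object size; a two-byte pattern needs an explicit
loop.

```cpp
#include <cstddef>
#include <cstring>

// One-byte pattern: a single memset covers any object, including one
// whose size is only known at run time.
inline void fill_one_byte (void *p, std::size_t n, unsigned char byte)
{
  std::memset (p, byte, n);
}

// Two-byte pattern: memset cannot express it, so the bytes are
// written one at a time, alternating b0 and b1.
inline void fill_two_byte (void *p, std::size_t n,
                           unsigned char b0, unsigned char b1)
{
  unsigned char *out = static_cast<unsigned char *> (p);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = (i % 2 == 0) ? b0 : b1;
}
```

The first form is the same call whether the object is a fixed-size
variable or a VLA, which is the simplification being argued for.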

Richard.

> Qing
> > 
> > Thanks,
> > Richard.
> > 
> >> Qing
> >> 
> >> On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
> >> 
> >> On Tue, 22 Jun 2021, Richard Sandiford wrote:
> >> 
> >> Kees Cook <keesc...@chromium.org> writes:
> >> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> >> So, if “pattern value” is “0x”, then it’s a valid 
> >> canonical virtual memory address.  However, for most OS, 
> >> “0x” should be not in user space.
> >> 
> >> My question is, is “0xF” good for pointer? Or 
> >> “0x” better?
> >> 
> >> I think 0xFF repeating is fine for this version. Everything else is a
> >> "nice to have" for the pattern-init, IMO. :)
> >> 
> >> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
> >> 
> >> For integer types, all values are valid representations, and we're
> >> relying on the pattern being “obviously” wrong in context.  0x…
> >> is unlikely to be a correct integer but 0x… would instead be a
> >> “nice” -1.  It would be difficult to tell in a debugger that a -1
> >> came from pattern init rather than a deliberate choice.
> >> 
> >> I agree that, all other things being equal, it would be nice to use NaNs
> >> for floats.  But relying on wrong numerical values for floats doesn't
> >> seem worse than doing that for integers.
> >> 
> >> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
> >> which admittedly doesn't stand out as wrong.  But I'm not sure we
> >> should sacrifice integer debugging for float debugging here.
> >> 
> >> We can always expose the actual value as --param.  Now, I think
> >> we'd need a two-byte pattern to reliably produce NaNs anyway,
> >> so with floats taken out of the picture the focus should be on
> >> pointers where IMHO val & 1 and val & 15 would be nice to have.
> >> So sth like 0xf7 would work for those.  With a two-byte pattern
> >> we could use 0xffef or 0x7fef.
> >> 
> >> Anyway, it's probably down to priorities of the project involved
> >> (debugging FP stuff or integer stuff).
> >> 
> >> Richard.
> >> 
> >> 
> > 
> > -- 
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


RE: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, June 22, 2021 1:08 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH]middle-end[RFC] slp: new implementation of complex
> > numbers
> > 
> > On Mon, 21 Jun 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > This patch is still very much incomplete and I do know that it is
> > > missing things but it's complete enough such that examples are working
> > > and allows me to show what I'm working towards.
> > >
> > > note, that this approach will remove a lot of code in
> > > tree-vect-slp-patterns but to keep the diff readable I've left them in
> > > and just commented out the calls or removed them where needed.
> > >
> > > The patch rewrites the complex numbers detection by splitting the
> > > detection of structure from dataflow analysis.  In principle the
> > > biggest difference between this and the previous implementation is
> > > that instead of trying to detect valid complex operations it *makes*
> > > an operation a valid complex operation.
> > >
> > > To do this each operation gets a dual optab which matches the same
> > > structure but has no dataflow requirement.
> > >
> > > i.e. in this patch I added 4, ADDSUB, SUBADD, MUL_ADDSUB,
> > MUL_SUBADD.
> > >
> > > There is then a mapping between these and their variant with the
> > dataflow:
> > >
> > > * ADDSUB -> COMPLEX_ADD_ROT270
> > > * SUBADD -> COMPLEX_ADD_ROT90
> > > * MUL_ADDSUB -> COMPLEX_MUL_CONJ
> > > * MUL_SUBADD -> COMPLEX_MUL
> > >
> > > with the intention that when we detect the structure of an operation
> > > we query the backend for both optabs.
> > >
> > > This should result in one of four states:
> > >
> > >  * not supported: Move on.
> > >  * Supports ADDSUB only: Rewrite using ADDSUB, set type to
> > 'cannot_transform'
> > >  * Supports COMPLEX_ADD_ROT270 only: Rewrite using ADDSUB, set type
> > to 'must_transform'
> > >  * Supports both: Rewrite using ADDSUB, set type to 'can_transform'
> > >
> > > The idea behind `can_transform` is to check the costs of the
> > > inverse permute needed to use the complex operation and, if this is
> > > very expensive, to stick to addsub.  This requires the target to be
> > > able to cost the operations reasonably correctly.
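
The states listed above amount to a two-input decision.  A rough
sketch follows; the enum and function names are hypothetical and are
not taken from the patch, and the two booleans stand for the results
of querying the backend for the structure-only optab and for its
dataflow variant.

```cpp
// Hypothetical names for the states described above.
enum class transform_kind
{
  unsupported,       // neither optab available: move on
  cannot_transform,  // only ADDSUB: keep the structure-only form
  must_transform,    // only COMPLEX_ADD_ROT270: must use the complex op
  can_transform      // both: decide later based on permute cost
};

// has_addsub:  target supports the structure-only optab (e.g. ADDSUB).
// has_complex: target supports the dataflow variant
//              (e.g. COMPLEX_ADD_ROT270).
inline transform_kind classify (bool has_addsub, bool has_complex)
{
  if (has_addsub && has_complex)
    return transform_kind::can_transform;
  if (has_complex)
    return transform_kind::must_transform;
  if (has_addsub)
    return transform_kind::cannot_transform;
  return transform_kind::unsupported;
}
```

Only the can_transform case needs the cost comparison; the other
three are decided by optab support alone.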
> > >
> > > So for ADD this looks like
> > >
> > >  === vect_match_slp_patterns ===
> > >  Analyzing SLP tree 0x494e970 for patterns  Found ADDSUB pattern in
> > > SLP tree  Target does not support ADDSUB for vector type vector(4)
> > > float  Found COMPLEX_ADD_ROT270 pattern in SLP tree  Target supports
> > > COMPLEX_ADD_ROT270 vectorization with mode vector(4) float Pattern
> > > matched SLP tree node 0x494e970 (max_nunits=4, refcnt=1) op template:
> > > REALPART_EXPR <*_10> = _23;
> > >   stmt 0 REALPART_EXPR <*_10> = _23;
> > >   stmt 1 IMAGPART_EXPR <*_10> = _22;
> > >   children 0x494ea00
> > > node 0x494ea00 (max_nunits=4, refcnt=1) op template: slp_patt_39 =
> > > .ADDSUB (_23, _23);
> > >   stmt 0 _23 = _6 + _13;
> > >   stmt 1 _22 = _12 - _8;
> > >   children 0x494eb20 0x494ebb0
> > > node 0x494eb20 (max_nunits=4, refcnt=1) op template: _13 =
> > > REALPART_EXPR <*_3>;
> > >   stmt 0 _13 = REALPART_EXPR <*_3>;
> > >   stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > > node 0x494ebb0 (max_nunits=4, refcnt=1)
> > > op: VEC_PERM_EXPR
> > >   { }
> > >   lane permutation { 0[1] 0[0] }
> > >   children 0x494ec40
> > > node 0x494ec40 (max_nunits=1, refcnt=2) op template: _8 =
> > > REALPART_EXPR <*_5>;
> > >   stmt 0 _8 = REALPART_EXPR <*_5>;
> > >   stmt 1 _6 = IMAGPART_EXPR <*_5>;
> > >   load permutation { 0 1 }
> > >
> > > and later during optimize_slp we get
> > >
> > > Tranforming SLP expression from ADDSUB to COMPLEX_ADD_ROT270
> > > processing node 0x494ebb0 simplifying permute node 0x494ebb0
> > Optimized
> > > SLP instances:
> > > node 0x494e970 (max_nunits=4, refcnt=1) op template: REALPART_EXPR
> > > <*_10> = _23;
> > >stmt 0 REALPART_EXPR <*_10> = _23;
> > >stmt 1 IMAGPART_EXPR <*_10> = _22;
> > >children 0x494ea00
> > > node 0x494ea00 (max_nunits=4, refcnt=1) op template: slp_patt_39 =
> > > .COMPLEX_ADD_ROT270 (_23, _23);
> > >stmt 0 _23 = _6 + _13;
> > >stmt 1 _22 = _12 - _8;
> > >children 0x494eb20 0x494ebb0
> > > node 0x494eb20 (max_nunits=4, refcnt=1) op template: _13 =
> > > REALPART_EXPR <*_3>;
> > >stmt 0 _13 = REALPART_EXPR <*_3>;
> > >stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > > node 0x494ebb0 (max_nunits=4, refcnt=1)
> > > op: VEC_PERM_EXPR
> > >{ }
> > >lane permutation { 0[0] 0[1] }
> > >children 0x494ec40
> > > node 0x494ec40 (max_nunits=1, refcnt=2) op template: _8 =
> > > REALPART_EXPR <*_5>;
> > >stmt 0 _8 = REALPART_EXPR <*_5>;
> > >stmt 1 _6 = IMAGPART_EXPR <*_5>;
> > >
> > > Now I still have to elide the VEC_PERM_EXPR here but that's easy.
> > 
> > So having skimmed half of the patch - this means SLP pattern recog will
> > 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches


> On Jun 22, 2021, at 9:00 AM, Richard Biener  wrote:
> 
> On Tue, 22 Jun 2021, Qing Zhao wrote:
> 
>> So, I am wondering why not still keep my current implementation on 
>> assign different patterns for different types?
>> 
>> This major issue with this design is the code size and runtime overhead, 
>> but for debugging purpose, those are not that important, right? And we 
>> can add some optimization later to improve the code size and runtime 
>> overhead.
>> 
>> Otherwise, if we only use one pattern for all the types in this initial 
>> version, later we still might need change it.
>> 
>> How do you think?
> 
> No, let's not re-open that discussion.  As said we can look to support
> multi-byte pattern if that has a chance to improve things but only
> as followup.

I am fine with this.

However, we need to decide whether we will use one-byte repeatable pattern, or 
multiple-byte repeatable pattern now,
Since the implementation will be different. If using one-byte, the 
implementation will be the simplest, we can use memset for all
VLA, non-vla, zero-init, or pattern-init consistently.

However, if we choose multiple-byte pattern, then the implementation will be 
different, we cannot use memset for pattern-init, and 
The implementation for VLA pattern-init also is different.

Qing
> 
> Thanks,
> Richard.
> 
>> Qing
>> 
>> On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
>> 
>> On Tue, 22 Jun 2021, Richard Sandiford wrote:
>> 
>> Kees Cook <keesc...@chromium.org> writes:
>> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
>> So, if “pattern value” is “0x”, then it’s a valid canonical 
>> virtual memory address.  However, for most OS, “0x” should 
>> be not in user space.
>> 
>> My question is, is “0xF” good for pointer? Or 
>> “0x” better?
>> 
>> I think 0xFF repeating is fine for this version. Everything else is a
>> "nice to have" for the pattern-init, IMO. :)
>> 
>> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
>> 
>> For integer types, all values are valid representations, and we're
>> relying on the pattern being “obviously” wrong in context.  0x…
>> is unlikely to be a correct integer but 0x… would instead be a
>> “nice” -1.  It would be difficult to tell in a debugger that a -1
>> came from pattern init rather than a deliberate choice.
>> 
>> I agree that, all other things being equal, it would be nice to use NaNs
>> for floats.  But relying on wrong numerical values for floats doesn't
>> seem worse than doing that for integers.
>> 
>> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
>> which admittedly doesn't stand out as wrong.  But I'm not sure we
>> should sacrifice integer debugging for float debugging here.
>> 
>> We can always expose the actual value as --param.  Now, I think
>> we'd need a two-byte pattern to reliably produce NaNs anyway,
>> so with floats taken out of the picture the focus should be on
>> pointers where IMHO val & 1 and val & 15 would be nice to have.
>> So sth like 0xf7 would work for those.  With a two-byte pattern
>> we could use 0xffef or 0x7fef.
>> 
>> Anyway, it's probably down to priorities of the project involved
>> (debugging FP stuff or integer stuff).
>> 
>> Richard.
>> 
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Qing Zhao wrote:

> So, I am wondering why not still keep my current implementation on 
> assign different patterns for different types?
> 
> This major issue with this design is the code size and runtime overhead, 
> but for debugging purpose, those are not that important, right? And we 
> can add some optimization later to improve the code size and runtime 
> overhead.
> 
> Otherwise, if we only use one pattern for all the types in this initial 
> version, later we still might need change it.
> 
> How do you think?

No, let's not re-open that discussion.  As said we can look to support
multi-byte pattern if that has a chance to improve things but only
as followup.

Thanks,
Richard.

> Qing
> 
> On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:
> 
> On Tue, 22 Jun 2021, Richard Sandiford wrote:
> 
> Kees Cook <keesc...@chromium.org> writes:
> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> So, if “pattern value” is “0x”, then it’s a valid canonical 
> virtual memory address.  However, for most OS, “0x” should be 
> not in user space.
> 
> My question is, is “0xF” good for pointer? Or 
> “0x” better?
> 
> I think 0xFF repeating is fine for this version. Everything else is a
> "nice to have" for the pattern-init, IMO. :)
> 
> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
> 
> For integer types, all values are valid representations, and we're
> relying on the pattern being “obviously” wrong in context.  0x…
> is unlikely to be a correct integer but 0x… would instead be a
> “nice” -1.  It would be difficult to tell in a debugger that a -1
> came from pattern init rather than a deliberate choice.
> 
> I agree that, all other things being equal, it would be nice to use NaNs
> for floats.  But relying on wrong numerical values for floats doesn't
> seem worse than doing that for integers.
> 
> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
> which admittedly doesn't stand out as wrong.  But I'm not sure we
> should sacrifice integer debugging for float debugging here.
> 
> We can always expose the actual value as --param.  Now, I think
> we'd need a two-byte pattern to reliably produce NaNs anyway,
> so with floats taken out of the picture the focus should be on
> pointers where IMHO val & 1 and val & 15 would be nice to have.
> So sth like 0xf7 would work for those.  With a two-byte pattern
> we could use 0xffef or 0x7fef.
> 
> Anyway, it's probably down to priorities of the project involved
> (debugging FP stuff or integer stuff).
> 
> Richard.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Qing Zhao via Gcc-patches
So, I am wondering: why not keep my current implementation, which assigns
different patterns to different types?

The major issue with that design is the code size and runtime overhead, but
for debugging purposes those are not that important, right? And we can add some
optimization later to reduce the code size and runtime overhead.

Otherwise, if we use only one pattern for all types in this initial
version, we might still need to change it later.

What do you think?

Qing

On Jun 22, 2021, at 3:59 AM, Richard Biener <rguent...@suse.de> wrote:

On Tue, 22 Jun 2021, Richard Sandiford wrote:

Kees Cook <keesc...@chromium.org> writes:
On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
So, if “pattern value” is “0x”, then it’s a valid canonical 
virtual memory address.  However, for most OS, “0x” should be 
not in user space.

My question is, is “0xF” good for pointer? Or 
“0x” better?

I think 0xFF repeating is fine for this version. Everything else is a
"nice to have" for the pattern-init, IMO. :)

Sorry to be awkward, but 0xFF seems worse than 0xAA to me.

For integer types, all values are valid representations, and we're
relying on the pattern being “obviously” wrong in context.  0x…
is unlikely to be a correct integer but 0x… would instead be a
“nice” -1.  It would be difficult to tell in a debugger that a -1
came from pattern init rather than a deliberate choice.

I agree that, all other things being equal, it would be nice to use NaNs
for floats.  But relying on wrong numerical values for floats doesn't
seem worse than doing that for integers.

0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
which admittedly doesn't stand out as wrong.  But I'm not sure we
should sacrifice integer debugging for float debugging here.

We can always expose the actual value as --param.  Now, I think
we'd need a two-byte pattern to reliably produce NaNs anyway,
so with floats taken out of the picture the focus should be on
pointers where IMHO val & 1 and val & 15 would be nice to have.
So sth like 0xf7 would work for those.  With a two-byte pattern
we could use 0xffef or 0x7fef.

Anyway, it's probably down to priorities of the project involved
(debugging FP stuff or integer stuff).

Richard.
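
To make the trade-off discussed above concrete, here is a minimal sketch of what a repeated one-byte pattern looks like when read back as different types. The helper name fill_with_byte is hypothetical and not part of the patch; the sketch assumes a two's complement 32-bit integer and an IEEE-754 binary32 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): build a value of type T whose
// bytes all equal BYTE, i.e. what a repeated one-byte pattern init would
// leave behind in an otherwise uninitialized variable.
template <typename T>
T fill_with_byte (unsigned char byte)
{
  T value;
  std::memset (&value, byte, sizeof value);
  return value;
}

// True if VALUE, viewed as a pointer-sized integer, is misaligned both for
// 2-byte and for 16-byte accesses (the "val & 1" and "val & 15" checks).
inline bool looks_misaligned (std::uint64_t value)
{
  return (value & 1) != 0 && (value & 15) != 0;
}
```

Under those assumptions, fill_with_byte<std::int32_t> (0xFF) reads as -1 and the float version is a NaN, while 0xAA gives -1431655766 as an integer and a tiny negative float of about -3.03e-13. looks_misaligned holds for a 64-bit value filled with 0xF7.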



RE: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-22 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, June 22, 2021 1:08 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH]middle-end[RFC] slp: new implementation of complex
> numbers
> 
> On Mon, 21 Jun 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > This patch is still very much incomplete and I do know that it is
> > missing things but it's complete enough such that examples are working
> > and allows me to show what I'm working towards.
> >
> > note, that this approach will remove a lot of code in
> > tree-vect-slp-patterns but to keep the diff readable I've left them in
> > and just commented out the calls or removed them where needed.
> >
> > The patch rewrites the complex numbers detection by splitting the
> > detection of structure from dataflow analysis.  In principle the
> > biggest difference between this and the previous implementation is
> > that instead of trying to detect valid complex operations it *makes*
> > an operation a valid complex operation.
> >
> > To do this each operation gets a dual optab which matches the same
> > structure but has no dataflow requirement.
> >
> > i.e. in this patch I added 4: ADDSUB, SUBADD, MUL_ADDSUB and
> > MUL_SUBADD.
> >
> > There is then a mapping between these and their variants with the
> > dataflow:
> > * ADDSUB -> COMPLEX_ADD_ROT270
> > * SUBADD -> COMPLEX_ADD_ROT90
> > * MUL_ADDSUB -> COMPLEX_MUL_CONJ
> > * MUL_SUBADD -> COMPLEX_MUL
> >
> > with the intention that when we detect the structure of an operation
> > we query the backend for both optabs.
> >
> > This should result in one of four states:
> >
> >  * not supported: Move on.
> >  * Supports ADDSUB only: Rewrite using ADDSUB, set type to
> >    'cannot_transform'
> >  * Supports COMPLEX_ADD_ROT270 only: Rewrite using ADDSUB, set type
> >    to 'must_transform'
> >  * Supports both: Rewrite using ADDSUB, set type to 'can_transform'
> >
> > The idea behind `can_transform` is to check the costs of the
> > inverse permute needed to use the complex operation and, if this is
> > very expensive, to stick to addsub.  This requires the target to be
> > able to cost the operations reasonably correctly.
> >
> > So for ADD this looks like
> >
> >  === vect_match_slp_patterns ===
> >  Analyzing SLP tree 0x494e970 for patterns
> >  Found ADDSUB pattern in SLP tree
> >  Target does not support ADDSUB for vector type vector(4) float
> >  Found COMPLEX_ADD_ROT270 pattern in SLP tree
> >  Target supports COMPLEX_ADD_ROT270 vectorization with mode vector(4) float
> >  Pattern matched SLP tree
> > node 0x494e970 (max_nunits=4, refcnt=1) op template: REALPART_EXPR <*_10> = _23;
> >   stmt 0 REALPART_EXPR <*_10> = _23;
> >   stmt 1 IMAGPART_EXPR <*_10> = _22;
> >   children 0x494ea00
> > node 0x494ea00 (max_nunits=4, refcnt=1) op template: slp_patt_39 =
> > .ADDSUB (_23, _23);
> >   stmt 0 _23 = _6 + _13;
> >   stmt 1 _22 = _12 - _8;
> >   children 0x494eb20 0x494ebb0
> > node 0x494eb20 (max_nunits=4, refcnt=1) op template: _13 =
> > REALPART_EXPR <*_3>;
> >   stmt 0 _13 = REALPART_EXPR <*_3>;
> >   stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > node 0x494ebb0 (max_nunits=4, refcnt=1)
> > op: VEC_PERM_EXPR
> >   { }
> >   lane permutation { 0[1] 0[0] }
> >   children 0x494ec40
> > node 0x494ec40 (max_nunits=1, refcnt=2) op template: _8 =
> > REALPART_EXPR <*_5>;
> >   stmt 0 _8 = REALPART_EXPR <*_5>;
> >   stmt 1 _6 = IMAGPART_EXPR <*_5>;
> >   load permutation { 0 1 }
> >
> > and later during optimize_slp we get
> >
> > Tranforming SLP expression from ADDSUB to COMPLEX_ADD_ROT270
> > processing node 0x494ebb0
> > simplifying permute node 0x494ebb0
> > Optimized SLP instances:
> > node 0x494e970 (max_nunits=4, refcnt=1) op template: REALPART_EXPR <*_10> = _23;
> >stmt 0 REALPART_EXPR <*_10> = _23;
> >stmt 1 IMAGPART_EXPR <*_10> = _22;
> >children 0x494ea00
> > node 0x494ea00 (max_nunits=4, refcnt=1) op template: slp_patt_39 =
> > .COMPLEX_ADD_ROT270 (_23, _23);
> >stmt 0 _23 = _6 + _13;
> >stmt 1 _22 = _12 - _8;
> >children 0x494eb20 0x494ebb0
> > node 0x494eb20 (max_nunits=4, refcnt=1) op template: _13 =
> > REALPART_EXPR <*_3>;
> >stmt 0 _13 = REALPART_EXPR <*_3>;
> >stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > node 0x494ebb0 (max_nunits=4, refcnt=1)
> > op: VEC_PERM_EXPR
> >{ }
> >lane permutation { 0[0] 0[1] }
> >children 0x494ec40
> > node 0x494ec40 (max_nunits=1, refcnt=2) op template: _8 =
> > REALPART_EXPR <*_5>;
> >stmt 0 _8 = REALPART_EXPR <*_5>;
> >stmt 1 _6 = IMAGPART_EXPR <*_5>;
> >
> > Now I still have to elide the VEC_PERM_EXPR here but that's easy.
> 
> So having skimmed half of the patch - this means SLP pattern recog will
> initially recognize { +, -, +, - } as ADDSUB for example but not factor in
> lane permutes from loads yet.  Now, suppose we have { +, -, -, + } seen in
> pattern recog - how's that handled?

These will be rejected; the lanes are still checked to ensure that it's a
{ +, -, +, - }.
The lane 
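
For readers following along, the outcomes of querying both optabs (as listed in the quoted description above) can be sketched as a small classification function. This is only an illustration: the enum and function names are hypothetical, and only the three *_transform labels come from the mail.

```cpp
// Hypothetical names; only the *_transform labels appear in the mail.
enum class complex_support
{
  not_supported,     // neither optab is available: move on
  cannot_transform,  // only the structure-only optab (e.g. ADDSUB)
  must_transform,    // only the dataflow variant (e.g. COMPLEX_ADD_ROT270)
  can_transform      // both: decide later based on permute cost
};

// SUPPORTS_PLAIN:   target implements the structure-only optab.
// SUPPORTS_COMPLEX: target implements the variant with the dataflow
//                   requirement.
inline complex_support
classify_support (bool supports_plain, bool supports_complex)
{
  if (supports_plain && supports_complex)
    return complex_support::can_transform;
  if (supports_complex)
    return complex_support::must_transform;
  if (supports_plain)
    return complex_support::cannot_transform;
  return complex_support::not_supported;
}
```

In the dump quoted above the target lacks ADDSUB but supports COMPLEX_ADD_ROT270, which corresponds to classify_support (false, true), i.e. must_transform.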

Re: [PATCH 2/2] elf: Add GNU_PROPERTY_1_NEEDED check

2021-06-22 Thread H.J. Lu via Gcc-patches
On Mon, Jun 21, 2021 at 10:46 PM Fangrui Song  wrote:
>
> On 2021-06-21, H.J. Lu wrote:
> >On Mon, Jun 21, 2021 at 9:16 PM Alan Modra  wrote:
> >>
> >> On Mon, Jun 21, 2021 at 07:12:02PM -0700, H.J. Lu wrote:
> >> > On Mon, Jun 21, 2021 at 5:06 PM Alan Modra  wrote:
> >> > >
> >> > > On Mon, Jun 21, 2021 at 03:34:38PM -0700, Fangrui Song wrote:
> >> > > > clang -fno-pic -fno-direct-access-extern-data  works with 
> >> > > > clang>=12.0.0 today.
> >> > >
> >> > > -fno-direct-access-extern-data or variations on that also seem good to
> >> > > me.  -fpic-extern would also work.  I liked -fprotected-abi because
> >> > > it shows the intent of correcting abi issues related to protected
> >> > > visibility.  (Yes, it affects code for all undefined symbols because
> >> > > the compiler clearly isn't seeing the entire program if there are
> >> > > undefined symbols.)
> >> >
> >> > I need an option which can be turned on and off.   How about
> >> > -fextern-access=direct and -fextern-access=indirect?  It will cover
> >> > both data and function?
>
> -fno-direct-access-external-data and -fdirect-access-external-data can turn 
> on/off the bit.
>
> clang -fno-pic -fno-direct-access-external-data  works for x86-64 and aarch64.
>
> We can add a -fno-direct-access-external

Since both clang and GCC will add a new option for both data and function
symbols, can we have an agreement on the new option name?  I am listing
options here:

1. -fdirect-access-external/-fno-direct-access-external
2. -fdirect-extern-access/-fno-direct-extern-access
3. -fdirect-external-access/-fno-direct-external-access
4. -fextern-access=direct/-fextern-access=indirect
5. -fexternal-access=direct/-fexternal-access=indirect

My order of preference is 4, 5, 2, 3, 1.

> >> Yes, FWIW that option name for gcc also looks good to me.
> >
> >I will change the gcc option to
> >
> >-fextern-access=direct
> >-fextern-access=indirect
> >
> >and change GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION
> >to GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS
>
> Note that this will be a glibc + GNU ld specific thing.
>
> gold and ld.lld error for copy relocations on protected data symbols by 
> default.

At run-time, there will be a mixture of components built with different tools
over time.  A marker will help glibc to avoid potential run-time failures due
to binary incompatibility.

> >> Now as to the need for a corresponding linker option, I'm of the
> >> opinion that it is ideal for the linker to be able to cope without
> >> needing special options.  Can you show me a set of object files (or
> >> just describe them) where ld cannot deduce from relocations and
> >> dynamic symbols what dynbss copies, plt stubs, and dynamic relocations
> >> are needed?  I'm fairly sure I manage to do that for powerpc.
> >>
> >> Note that I'm not against a new option to force the linker to go
> >> against what it would do based on input object files (perhaps
> >
> >I'd like to turn it on in linker without any compiler changes, especially
> >when building shared libraries, kind of a subset of -Bsymbolic.
> >
> >> reporting errors), but don't think we should have a new option without
> >> some effort being made to see whether we really need it.
> >
> >Here is a glibc patch to use both linker options on some testcases:
> >
> >https://sourceware.org/pipermail/libc-alpha/2021-June/127770.html
> >
> >> > > The main thing that struck me about -fsingle-global-definition is that
> >> > > the option doesn't do what it says.  You can still have multiple
> >> > > global definitions of a given symbol, one in the executable and one in
> >> > > each of the shared libraries making up the complete program.  Which of
> >> > > course is no different to code without -fsingle-global-definition.
> >> >
> >> >
> >> > --
> >> > H.J.
> >>
> >> --
> >> Alan Modra
> >> Australia Development Lab, IBM
> >
> >
> >
> >--
> >H.J.



-- 
H.J.


Re: [PATCH v2] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Matthias Kretz
On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> With your suggestion
> to also drop std::tuple the number of parameters decides which
> function we call. And we don't instantiate std::tuple. And we can also
> get rid of the __try_to_lock function, which was only used to deduce
> the lock type rather than use tuple_element to get it. That's much
> nicer.



> > How about optimizing a likely common case where all lockables have the
> > same type? In that case we don't require recursion and can manage stack
> > usage much simpler:
> The stack usage is bounded by the number of mutexes being locked,
> which is unlikely to get large, but we can do that.

Right. I meant simpler, because it takes a while of staring at the recursive 
implementation to understand how it works. :)

> We can do it for try_lock too:
> 
>   template<typename _L1, typename _L2, typename... _L3>
> int
> try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> {
> #if __cplusplus >= 201703L
>   if constexpr (is_same_v<_L1, _L2>
> && (is_same_v<_L1, _L3> && ...))
> {
>   constexpr int _Np = 2 + sizeof...(_L3);
>   unique_lock<_L1> __locks[_Np] = {
>   {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> try_to_lock}... };

This does a try_lock on all lockables even if one of them fails. I think that's 
not only more expensive but also non-conforming. I think you need to defer 
locking and then loop from beginning to end, breaking out of the loop on the 
first unsuccessful try_lock.
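
A minimal sketch of that deferred variant, written with ordinary identifiers rather than libstdc++'s reserved names. The function try_lock_homogeneous and the toy counting_lock type are hypothetical and only serve to show that later locks are never attempted after a failure; all arguments are assumed to have the same type.

```cpp
#include <mutex>

// Toy lockable, for demonstration only: counts try_lock attempts and can be
// told to fail.
struct counting_lock
{
  bool fail = false;
  bool locked = false;
  int attempts = 0;

  void lock () { locked = true; }
  bool try_lock ()
  {
    ++attempts;
    if (fail)
      return false;
    locked = true;
    return true;
  }
  void unlock () { locked = false; }
};

// Sketch: try to lock every argument in order and stop at the first failure.
// Returns -1 on success, otherwise the 0-based index of the lock that failed.
template <typename L, typename... Ls>
int try_lock_homogeneous (L& l1, Ls&... ls)
{
  constexpr int n = 1 + sizeof...(Ls);
  // defer_lock: construct the guards without touching the locks yet.
  std::unique_lock<L> locks[n] = { {l1, std::defer_lock},
                                   {ls, std::defer_lock}... };
  for (int i = 0; i < n; ++i)
    if (!locks[i].try_lock ())
      return i;  // guards that own a lock release it in their destructors
  for (auto& l : locks)
    l.release ();  // success: leave everything locked for the caller
  return -1;
}
```

With three counting_lock objects where the second is set to fail, the call returns 1, the first lock is released again, and the third lock's attempts counter stays at 0.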

>   for (int __i = 0; __i < _Np; ++__i)
> if (!__locks[__i])
>   return __i;
>   for (auto& __l : __locks)
> __l.release();
>   return -1;
> }
>   else
> #endif
>   return __detail::__try_lock_impl(__l1, __l2, __l3...);
> }
> 
> > if constexpr ((is_same_v<_L0, _L1> && ...))
> > {
> > constexpr int _Np = 1 + sizeof...(_L1);
> > std::array<unique_lock<_L0>, _Np> __locks = {
> > {__l0, defer_lock}, {__l1, defer_lock}...
> > };
> > int __first = 0;
> > do {
> > __locks[__first].lock();
> > for (int __j = 1; __j < _Np; ++__j)
> > {
> > const int __idx = (__first + __j) % _Np;
> > if (!__locks[__idx].try_lock())
> > {
> > for (int __k = __idx; __k != __first;
> > __k = __k == 1 ? _Np : __k - 1)
> > __locks[__k - 1].unlock();
> 
> This loop doesn't work if any try_lock fails when first==0, because
> the loop termination condition is never reached.

Uh yes. Which is the same reason why the __j loop doesn't start from __first + 
1.

> I find this a bit easier to understand than the loop above, and
> correct (I think):
> 
>   for (int __k = __j; __k != 0; --__k)
> __locks[(__first + __k - 1) % _Np].unlock();

Yes, if only we had a wrapping integer type that wraps at an arbitrary N. Like 
unsigned int but with parameter, like:

  for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
__locks[__k - 1].unlock();

This is the loop I wanted to write, except --__k is simpler to write and __k - 
1 would also wrap around to _Np - 1 for __k == 0. But if this is the only 
place it's not important enough to abstract.
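
To double-check the index arithmetic of the quoted unlock loop, here is a small sketch that records which indices it would unlock. The helper unlock_order is hypothetical; it mirrors the loop with plain names, where first is the index locked with lock(), j is the offset at which try_lock failed, and n is the number of locks.

```cpp
#include <vector>

// Hypothetical helper: the indices visited by
//   for (k = j; k != 0; --k) unlock ((first + k - 1) % n);
inline std::vector<int> unlock_order (int first, int j, int n)
{
  std::vector<int> order;
  for (int k = j; k != 0; --k)
    order.push_back ((first + k - 1) % n);
  return order;
}
```

For example, with n = 3 and first = 2, a failure at offset j = 2 (index 1) unlocks index 0 and then index 2, which are exactly the two locks held at that point. With first = 0 the loop still terminates, since it counts the offset down to zero instead of comparing wrapped indices.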

> [...]
> @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> *  @post All arguments are locked.
> *
> *  All arguments are locked via a sequence of calls to lock(), try_lock()
> -   *  and unlock().  If the call exits via an exception any locks that were
> -   *  obtained will be released.
> +   *  and unlock().  If this function exits via an exception any locks that
> +   *  were obtained will be released.
> */
>template<typename _L1, typename _L2, typename... _L3>
>  void
>  lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>  {
> -  int __i = 0;
> -  __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> +#if __cplusplus >= 201703L

I also considered moving it down here. Makes sense unless you want to call 
__detail::__lock_impl from other functions. And if we want to make it work for 
pre-C++11 we could do

  using __homogeneous
= __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
  int __i = 0;
  __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);


-Matthias

> +  if constexpr (is_same_v<_L1, _L2> && (is_same_v<_L1, _L3> && ...))
> +   {
> + constexpr int _Np = 2 + sizeof...(_L3);
> + unique_lock<_L1> __locks[] = {
> + {__l1, defer_lock}, {__l2, defer_lock}, {__l3, defer_lock}...
> + };
> + int __first = 0;
> + do {
> +   __locks[__first].lock();
> +   for (int __j = 1; __j < _Np; ++__j)
> + {
> +   const int __idx = (__first + __j) % _Np;
> +   if (!__locks[__idx].try_lock())
> + {
> +   for (int __k = __j; __k != 0; --__k)
> + __locks[(__first + __k - 1) % _Np].unlock();
> +   __first = __idx;
> +   break;
> + }
> + }
> + } while (!__locks[__first]);
> +
> + for (auto& __l : __locks)
> +   __l.release();
> +   }
> 

[COMMITTED 7/7] Add relational self-tests.

2021-06-22 Thread Andrew MacLeod via Gcc-patches
This patch just adds some basic self tests for some of the new relation 
operations.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From ca1f9f22854049d6f9cab5b4bfbc46edbcb5c990 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 13:40:05 -0400
Subject: [PATCH 7/7] Add relational self-tests.

	* range-op.cc (range_relational_tests): New.
	(range_op_tests): Call range_relational_tests.
---
 gcc/range-op.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 92b314df9dd..1692a096e20 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4244,6 +4244,30 @@ range_op_bitwise_and_tests ()
   ASSERT_FALSE (res.contains_p (INT (0)));
 }
 
+static void
+range_relational_tests ()
+{
+  int_range<2> lhs (unsigned_char_type_node);
+  int_range<2> op1 (UCHAR (8), UCHAR (10));
+  int_range<2> op2 (UCHAR (20), UCHAR (20));
+
+  // Never wrapping additions mean LHS > OP1.
+  tree_code code = op_plus.lhs_op1_relation (lhs, op1, op2);
+  ASSERT_TRUE (code == GT_EXPR);
+
+  // Most wrapping additions mean nothing...
+  op1 = int_range<2> (UCHAR (8), UCHAR (10));
+  op2 = int_range<2> (UCHAR (0), UCHAR (255));
+  code = op_plus.lhs_op1_relation (lhs, op1, op2);
+  ASSERT_TRUE (code == VREL_NONE);
+
+  // However, always wrapping additions mean LHS < OP1.
+  op1 = int_range<2> (UCHAR (1), UCHAR (255));
+  op2 = int_range<2> (UCHAR (255), UCHAR (255));
+  code = op_plus.lhs_op1_relation (lhs, op1, op2);
+  ASSERT_TRUE (code == LT_EXPR);
+}
+
 void
 range_op_tests ()
 {
@@ -4251,6 +4275,7 @@ range_op_tests ()
   range_op_lshift_tests ();
   range_op_bitwise_and_tests ();
   range_op_cast_tests ();
+  range_relational_tests ();
 }
 
 } // namespace selftest
-- 
2.17.2



[COMMITTED 6/7] Add relation between LHS and op1 for casts and copies.

2021-06-22 Thread Andrew MacLeod via Gcc-patches
This patch provides basic relations between the LHS and OP1 for casts 
and copies.


Copies are very straightforward,  LHS == OP1 always.

Casts are a little trickier. If the operand of the cast has the same 
precision as the LHS, then we can consider them EQ_EXPR.  
Otherwise, they are considered to have NO relation.


There is a comment regarding exactly why this is the case in the 
implementation of operator_cast::lhs_op1_relation.
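
The sign-extension problem behind this restriction can be reproduced in a few lines. This is only an illustration with fixed-width integer types; the function names are hypothetical and the value 0xFFFFFFFF is an arbitrary choice for the example. The narrowing conversion to a signed 32-bit type is assumed to wrap modulo 2^32, which C++20 guarantees and GCC has always done.

```cpp
#include <cstdint>

// s32 = (s32) u32;  s64 = (s64) s32;
inline std::int64_t widen_via_signed (std::uint32_t u)
{
  std::int32_t s = static_cast<std::int32_t> (u);
  return static_cast<std::int64_t> (s);  // sign-extends
}

// s64 = (s64) u32;  skipping the intermediate signed cast.
inline std::int64_t widen_directly (std::uint32_t u)
{
  return static_cast<std::int64_t> (u);  // zero-extends
}
```

For u = 0xFFFFFFFF the first function yields -1 and the second yields 4294967295, so treating a wider LHS as equivalent to a narrower operand and then collapsing a chain of casts would change the value.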


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 0f7ccc063a42407f91fa52a54cc480950a45e75c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 13:39:02 -0400
Subject: [PATCH 6/7] Add relation between LHS and op1 for casts and copies.

	* range-op.cc (operator_cast::lhs_op1_relation): New.
	(operator_identity::lhs_op1_relation): New.
---
 gcc/range-op.cc | 41 +
 1 file changed, 41 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index ec4816d69fa..92b314df9dd 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2115,6 +2115,10 @@ public:
 			  const irange ,
 			  const irange ,
 			  relation_kind rel = VREL_NONE) const;
+  virtual enum tree_code lhs_op1_relation (const irange &lhs,
+	   const irange &op1,
+	   const irange &op2) const;
+
 private:
   bool truncating_cast_p (const irange , const irange ) const;
   bool inside_domain_p (const wide_int , const wide_int ,
@@ -2123,6 +2127,27 @@ private:
 			   const irange ) const;
 } op_convert;
 
+// Determine if there is a relationship between LHS and OP1.
+
+enum tree_code
+operator_cast::lhs_op1_relation (const irange &lhs,
+ const irange &op1,
+ const irange &op2 ATTRIBUTE_UNUSED) const
+{
+  if (op1.undefined_p ())
+return VREL_NONE;
+  // We can't make larger types equivalent to smaller types because we can
+  // miss sign extensions in a chain of casts.
+  // u32 = 0xf
+  // s32 = (s32) u32
+  // s64 = (s64) s32
+  // we cant simply "convert" s64 = (s64)u32  or we get positive 0x
+  // value instead of sign extended negative value.
+  if (TYPE_PRECISION (lhs.type ()) == TYPE_PRECISION (op1.type ()))
+return EQ_EXPR;
+  return VREL_NONE;
+}
+
 // Return TRUE if casting from INNER to OUTER is a truncating cast.
 
 inline bool
@@ -3325,8 +3350,24 @@ public:
 			  const irange ,
 			  const irange ,
 			  relation_kind rel = VREL_NONE) const;
+  virtual enum tree_code lhs_op1_relation (const irange &lhs,
+	   const irange &op1,
+	   const irange &op2) const;
 } op_identity;
 
+// Determine if there is a relationship between LHS and OP1.
+
+enum tree_code
+operator_identity::lhs_op1_relation (const irange &lhs,
+ const irange &op1 ATTRIBUTE_UNUSED,
+ const irange &op2 ATTRIBUTE_UNUSED) const
+{
+  if (lhs.undefined_p ())
+return VREL_NONE;
+  // Simply a copy, so they are equivalent.
+  return EQ_EXPR;
+}
+
 bool
 operator_identity::fold_range (irange , tree type ATTRIBUTE_UNUSED,
 			   const irange ,
-- 
2.17.2



[COMMITTED 5/7] Add relation effects between operands to MINUS_EXPR.

2021-06-22 Thread Andrew MacLeod via Gcc-patches
This patch enhances processing of OP_MINUS to show how we can utilize 
relations between op1 and op2 to produce a better result. Given:


a_3 = b_4 - d_1

if we know b_4 > d_1, then on top of whatever other range calculations we can 
do with the actual ranges, we can apply the knowledge that the result 
must also be in the range [1, +INF].


Likewise, if we know b_4 == d_1, we know the result is [0,0].

This provides a sample of how applying a relation between 2 operands is 
implemented.
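
The idea can be sketched outside of GCC with a toy interval type. Everything below (toy_range, minus_rel, toy_minus_relation_effect) is hypothetical and merely stands in for irange, relation_kind and the new range-op hook; real ranges also have to deal with precision, sign and multiple sub-ranges.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Toy stand-ins, not GCC types.
enum class minus_rel { none, gt, ge, eq };
struct toy_range { std::int64_t lo, hi; };  // inclusive; lo > hi means empty

// Narrow LHS (the range of op1 - op2) using a known relation between op1 and
// op2.  Returns false if the relation tells us nothing.
inline bool toy_minus_relation_effect (toy_range& lhs, minus_rel rel)
{
  const std::int64_t max = std::numeric_limits<std::int64_t>::max ();
  toy_range rel_range;
  switch (rel)
    {
    case minus_rel::gt: rel_range = {1, max}; break;  // op1 > op2
    case minus_rel::ge: rel_range = {0, max}; break;  // op1 >= op2
    case minus_rel::eq: rel_range = {0, 0}; break;    // op1 == op2
    default: return false;
    }
  // Intersect the computed range with the range implied by the relation.
  lhs.lo = std::max (lhs.lo, rel_range.lo);
  lhs.hi = std::min (lhs.hi, rel_range.hi);
  return true;
}
```

Starting from a computed range of [-10, 10], a known b_4 > d_1 narrows the result to [1, 10], and a known b_4 == d_1 narrows it to [0, 0].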


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From ae6b830f31a47aca7ca24c4fea245c29214eef3a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 13:38:03 -0400
Subject: [PATCH 5/7] Add relation effects between operands to MINUS_EXPR.

	* range-op.cc (operator_minus::op1_op2_relation_effect): New.
---
 gcc/range-op.cc | 44 
 1 file changed, 44 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index a7698f21b0d..ec4816d69fa 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1279,6 +1279,11 @@ public:
 		const wide_int _ub,
 		const wide_int _lb,
 		const wide_int _ub) const;
+  virtual bool op1_op2_relation_effect (irange &lhs_range,
+	tree type,
+	const irange &op1_range,
+	const irange &op2_range,
+	relation_kind rel) const;
 } op_minus;
 
 void 
@@ -1293,6 +1298,45 @@ operator_minus::wi_fold (irange , tree type,
   value_range_with_overflow (r, type, new_lb, new_ub, ov_lb, ov_ub);
 }
 
+// Check to see if the relation REL between OP1 and OP2 has any effect on the
+// LHS of the epxression.  If so, apply it to LHS_RANGE.
+
+bool
+operator_minus::op1_op2_relation_effect (irange &lhs_range, tree type,
+  const irange &op1_range ATTRIBUTE_UNUSED,
+  const irange &op2_range ATTRIBUTE_UNUSED,
+  relation_kind rel) const
+{
+  if (rel == VREL_NONE)
+return false;
+
+  int_range<2> rel_range;
+  unsigned prec = TYPE_PRECISION (type);
+  signop sgn = TYPE_SIGN (type);
+
+  switch (rel)
+{
+  // op1 > op2,  op1 - op2 can be restricted to  [1, max]
+  case GT_EXPR:
+	rel_range = int_range<2> (type, wi::one (prec),
+  wi::max_value (prec, sgn));
+	break;
+  // op1 >= op2,  op1 - op2 can be restricted to  [0, max]
+  case GE_EXPR:
+	rel_range = int_range<2> (type, wi::zero (prec),
+  wi::max_value (prec, sgn));
+	break;
+  // op1 == op2,  op1 - op2 can be restricted to  [0, 0]
+  case EQ_EXPR:
+	rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
+	break;
+  default:
+	return false;
+}
+  lhs_range.intersect (rel_range);
+  return true;
+}
+
 bool
 operator_minus::op1_range (irange , tree type,
 			   const irange ,
-- 
2.17.2



[COMMITTED 4/7] Add relations between LHS and op1/op2 for PLUS_EXPR.

2021-06-22 Thread Andrew MacLeod via Gcc-patches
This patch demonstrates how to add relation generation between the LHS 
of an expression and one of the operands in range-ops, using PLUS_EXPR.


a_2 = b_3 + c_1

if c_1 == [0, 0], we know a_2 == b_3

if c_1 > 0, and there is no overflow/wrapping, we know a_2 > b_3

likewise, if c_1 < 0 we know a_2 < b_3

if c_1 does not contain zero, we also know a_2 != b_3.

plus is symmetrical, so we can draw similar conclusions about the 
relation between a_2 and c_1 based on what we know about b_3's range.
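
As a rough illustration of the rules above, here is a toy version restricted to unsigned 8-bit wrapping arithmetic. The names (toy_rel, u8_range, toy_plus_lhs_op1) are hypothetical and the sketch is not the range-op implementation; it only mirrors the never-wrapping and always-wrapping cases.

```cpp
// Toy stand-ins, not GCC types.
enum class toy_rel { none, eq, ne, lt, gt, ge };
struct u8_range { unsigned lo, hi; };  // inclusive bounds, each in [0, 255]

// Relation between LHS and OP1 for LHS = OP1 + OP2 in wrapping 8-bit
// unsigned arithmetic.
inline toy_rel toy_plus_lhs_op1 (u8_range op1, u8_range op2)
{
  // LHS = OP1 + 0 means LHS == OP1.
  if (op2.lo == 0 && op2.hi == 0)
    return toy_rel::eq;

  bool lo_wraps = op1.lo + op2.lo > 255;
  bool hi_wraps = op1.hi + op2.hi > 255;

  // If the largest sum does not wrap, no sum wraps.
  if (!lo_wraps && !hi_wraps)
    return op2.lo > 0 ? toy_rel::gt : toy_rel::ge;

  // If the smallest sum wraps, every sum wraps; OP2 is then nonzero and
  // below 256, so the wrapped result is smaller than OP1.
  if (lo_wraps && hi_wraps)
    return toy_rel::lt;

  // Mixed case: all we may know is that a nonzero OP2 changes the value.
  return op2.lo > 0 ? toy_rel::ne : toy_rel::none;
}
```

With op1 = [8, 10] and op2 = [20, 20] this gives gt; with op2 = [0, 255] it gives none; and with op1 = [1, 255], op2 = [255, 255] it gives lt.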


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From c526de3f432a037bdbdd44eb6fa43af4f3b22694 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 13:35:10 -0400
Subject: [PATCH 4/7] Add relations between LHS and op1/op2 for PLUS_EXPR.

	* range-op.cc (operator_plus::lhs_op1_relation): New.
	(operator_plus::lhs_op2_relation): New.
---
 gcc/range-op.cc | 80 +
 1 file changed, 80 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index d807693900a..a7698f21b0d 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1150,8 +1150,88 @@ public:
 		const wide_int _ub,
 		const wide_int _lb,
 		const wide_int _ub) const;
+  virtual enum tree_code lhs_op1_relation (const irange &lhs, const irange &op1,
+	   const irange &op2) const;
+  virtual enum tree_code lhs_op2_relation (const irange &lhs, const irange &op1,
+	   const irange &op2) const;
 } op_plus;
 
+// Check to see if the range of OP2 indicates anything about the relation
+// between LHS and OP1.
+
+enum tree_code
+operator_plus::lhs_op1_relation (const irange &lhs,
+ const irange &op1,
+ const irange &op2) const
+{
+  if (lhs.undefined_p () || op1.undefined_p () || op2.undefined_p ())
+return VREL_NONE;
+
+  tree type = lhs.type ();
+  unsigned prec = TYPE_PRECISION (type);
+  wi::overflow_type ovf1, ovf2;
+  signop sign = TYPE_SIGN (type);
+
+  // LHS = OP1 + 0  indicates LHS == OP1.
+  if (op2.zero_p ())
+return EQ_EXPR;
+
+  if (TYPE_OVERFLOW_WRAPS (type))
+{
+  wi::add (op1.lower_bound (), op2.lower_bound (), sign, &ovf1);
+  wi::add (op1.upper_bound (), op2.upper_bound (), sign, &ovf2);
+}
+  else
+ovf1 = ovf2 = wi::OVF_NONE;
+
+  // Never wrapping additions.
+  if (!ovf1 && !ovf2)
+{
+  // Positive op2 means lhs > op1.
+  if (wi::gt_p (op2.lower_bound (), wi::zero (prec), sign))
+	return GT_EXPR;
+  if (wi::ge_p (op2.lower_bound (), wi::zero (prec), sign))
+	return GE_EXPR;
+
+  // Negative op2 means lhs < op1.
+  if (wi::lt_p (op2.upper_bound (), wi::zero (prec), sign))
+	return LT_EXPR;
+  if (wi::le_p (op2.upper_bound (), wi::zero (prec), sign))
+	return LE_EXPR;
+}
+  // Always wrapping additions.
+  else if (ovf1 && ovf1 == ovf2)
+{
+  // Positive op2 means lhs < op1.
+  if (wi::gt_p (op2.lower_bound (), wi::zero (prec), sign))
+	return LT_EXPR;
+  if (wi::ge_p (op2.lower_bound (), wi::zero (prec), sign))
+	return LE_EXPR;
+
+  // Negative op2 means lhs > op1.
+  if (wi::lt_p (op2.upper_bound (), wi::zero (prec), sign))
+	return GT_EXPR;
+  if (wi::le_p (op2.upper_bound (), wi::zero (prec), sign))
+	return GE_EXPR;
+}
+
+  // If op2 does not contain 0, then LHS and OP1 can never be equal.
+  if (!range_includes_zero_p (&op2))
+return NE_EXPR;
+
+  return VREL_NONE;
+}
+
+// PLUS is symmetrical, so we can simply call lhs_op1_relation with reversed
+// operands.
+
+enum tree_code
+operator_plus::lhs_op2_relation (const irange &lhs, const irange &op1,
+ const irange &op2) const
+{
+  return lhs_op1_relation (lhs, op2, op1);
+}
+
 void
 operator_plus::wi_fold (irange , tree type,
 			const wide_int _lb, const wide_int _ub,
-- 
2.17.2



[COMMITTED 3/7] Add relational support to fold_using_range

2021-06-22 Thread Andrew MacLeod via Gcc-patches
This patch gets the ball rolling by adding relation support to 
fold_using_range. This enables relations to be set and queried by 
ranger, and the results applied to any ranges being calculated.


At this point, any further additions to range-ops will be reflected in 
relational processing.  Currently only range-ops enabled opcodes are 
being handled, but the design of fold_using_range and gori_computes is 
such that we can now add relation processing to builtins or any other 
kind of statement easily.   That will be follow on work.


With this patch we can finally fold useless relations away.

I've tried to be efficient, and the current overhead is less than 1% of 
compile time in EVRP.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From a2c9173331914eff3d728c07afaeee71892689ba Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 14:09:48 -0400
Subject: [PATCH 3/7] Add relational support to fold_using_range

Enable a relation oracle in ranger, and add full range-op relation support
to fold_using_range.

	* gimple-range-cache.cc (ranger_cache::ranger_cache): Create a
	relation_oracle if dominators exist.
	(ranger_cache::~ranger_cache): Dispose of oracle.
	(ranger_cache::dump_bb): Dump oracle.
	* gimple-range.cc (fur_source::fur_source): New.
	(fur_source::get_operand): Use member query.
	(fur_source::get_phi_operand): Use member_query.
	(fur_source::query_relation): New.
	(fur_source::register_dependency): Delete.
	(fur_source::register_relation): New.
	(fur_edge::fur_edge): Adjust.
	(fur_edge::get_phi_operand): Fix comment.
	(fur_edge::query): Delete.
	(fur_stmt::fur_stmt): Adjust.
	(fur_stmt::query): Delete.
	(fur_depend::fur_depend): Adjust.
	(fur_depend::register_relation): New.
	(fur_depend::register_relation): New.
	(fur_list::fur_list): Adjust.
	(fur_list::get_operand): Use member query.
	(fold_using_range::range_of_range_op): Process and query relations.
	(fold_using_range::range_of_address): Adjust dependency call.
	(fold_using_range::range_of_phi): Ditto.
	(gimple_ranger::gimple_ranger): New.  Use ranger_cache oracle.
	(fold_using_range::relation_fold_and_or): New.
	(fold_using_range::postfold_gcond_edges): New.
	* gimple-range.h (class gimple_ranger): Adjust.
	(class fur_source): Adjust members.
	(class fur_stmt): Ditto.
	(class fold_using_range): Ditto.
---
 gcc/gimple-range-cache.cc |  10 ++
 gcc/gimple-range.cc   | 355 +++---
 gcc/gimple-range.h|  22 ++-
 3 files changed, 324 insertions(+), 63 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index def604dc149..4347485cf98 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -714,6 +714,12 @@ ranger_cache::ranger_cache ()
   m_update_list.safe_grow_cleared (last_basic_block_for_fn (cfun));
   m_update_list.truncate (0);
   m_temporal = new temporal_cache;
+  // If DOM info is available, spawn an oracle as well.
+  if (dom_info_available_p (CDI_DOMINATORS))
+  m_oracle = new relation_oracle ();
+else
+  m_oracle = NULL;
+
   unsigned x, lim = last_basic_block_for_fn (cfun);
   // Calculate outgoing range info upfront.  This will fully populate the
   // m_maybe_variant bitmap which will help eliminate processing of names
@@ -728,6 +734,8 @@ ranger_cache::ranger_cache ()
 
 ranger_cache::~ranger_cache ()
 {
+  if (m_oracle)
+delete m_oracle;
   delete m_temporal;
   m_workback.release ();
   m_update_list.release ();
@@ -750,6 +758,8 @@ ranger_cache::dump_bb (FILE *f, basic_block bb)
 {
   m_gori.gori_map::dump (f, bb, false);
   m_on_entry.dump (f, bb);
+  if (m_oracle)
+m_oracle->dump (f, bb);
 }
 
 // Get the global range for NAME, and return in R.  Return false if the
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 0a2c72b29aa..385cecf330b 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -47,14 +47,25 @@ along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "gimple-range.h"
 
-// Evaluate expression EXPR using the source information the class was
-// instantiated with.  Place the result in R, and return TRUE.  If a range
-// cannot be calculated, return FALSE.
+// Construct a fur_source, and set the m_query field.
+
+fur_source::fur_source (range_query *q)
+{
+  if (q)
+m_query = q;
+  else if (cfun)
+m_query = get_range_query (cfun);
+  else
+m_query = get_global_range_query ();
+  m_gori = NULL;
+}
+
+// Invoke range_of_expr on EXPR.
 
 bool
 fur_source::get_operand (irange , tree expr)
 {
-  return get_range_query (cfun)->range_of_expr (r, expr);
+  return m_query->range_of_expr (r, expr);
 }
 
 // Evaluate EXPR for this stmt as a PHI argument on edge E.  Use the current
@@ -63,23 +74,36 @@ fur_source::get_operand (irange , tree expr)
 bool
 fur_source::get_phi_operand (irange , tree expr, edge e)
 {
-  return get_range_query (cfun)->range_on_edge (r, e, expr);
+  return m_query->range_on_edge (r, e, expr);
+}

[COMMITTED 2/7] Add relational support to range-op.

2021-06-22 Thread Andrew MacLeod via Gcc-patches

This patch adds relation support to range-ops.

A relation_kind is added to all fold and op1/2_range operations, which
will allow any known relation to be applied during calculations.


Four more routines are provided which enable range-ops to indicate when a
relation is caused by an expression/result and what effects it may have.
All relations are driven from these routines.


virtual enum tree_code lhs_op1_relation (const irange &lhs, const irange &op1,
                                         const irange &op2) const;
virtual enum tree_code lhs_op2_relation (const irange &lhs, const irange &op1,
                                         const irange &op2) const;

virtual enum tree_code op1_op2_relation (const irange &lhs) const;
virtual bool op1_op2_relation_effect (irange &lhs_range, tree type,
                                      const irange &op1_range,
                                      const irange &op2_range,
                                      relation_kind rel) const;


This initial patch provides the basics of the relation opcodes.  E.g.,
operator_equal has added:


enum tree_code
operator_equal::op1_op2_relation (const irange &lhs) const
{
  if (lhs.undefined_p ())
    return VREL_EMPTY;

  // FALSE = op1 == op2 indicates NE_EXPR.
  if (lhs.zero_p ())
    return NE_EXPR;

  // TRUE = op1 == op2 indicates EQ_EXPR.
  if (!lhs.contains_p (build_zero_cst (lhs.type ())))
    return EQ_EXPR;
  return VREL_NONE;
}

This teaches range-ops that there is a relation between op1 and op2 if
the LHS has a range of FALSE or TRUE.  Otherwise, we don't know what
the relation is.
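
To illustrate the mapping described above, here is a minimal standalone
sketch.  It does not use GCC's irange or enum tree_code: BoolRange, Relation
and equal_op1_op2_relation are hypothetical stand-ins, and the boolean range
is modelled simply as "may be false" / "may be true".

```cpp
#include <cassert>

// Hypothetical stand-ins; GCC itself uses irange and enum tree_code here.
enum class Relation { None, Empty, Eq, Ne };

// A toy boolean range: which of the values false/true the LHS may take.
struct BoolRange
{
  bool may_be_false;
  bool may_be_true;
  bool undefined_p () const { return !may_be_false && !may_be_true; }
};

// Relation between op1 and op2 implied by the result of "op1 == op2".
Relation
equal_op1_op2_relation (const BoolRange &lhs)
{
  if (lhs.undefined_p ())
    return Relation::Empty;
  if (!lhs.may_be_true)      // LHS is exactly FALSE: op1 != op2.
    return Relation::Ne;
  if (!lhs.may_be_false)     // LHS cannot be zero: op1 == op2.
    return Relation::Eq;
  return Relation::None;     // LHS may be either: nothing is known.
}

// Small usage example.
void
equal_relation_demo ()
{
  assert (equal_op1_op2_relation ({true, false}) == Relation::Ne);
  assert (equal_op1_op2_relation ({false, true}) == Relation::Eq);
}
```

The other comparison operators would presumably follow the same shape, with
the relations for the TRUE and FALSE cases swapped in as appropriate.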


All 6 basic relations are implemented, as well as folding of these
opcodes when a relation is to be applied.  E.g., if a_2 == b_2 is being
folded and the known relation coming into this expression is a_2 > b_2,
it will fold to [0,0], or false.
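
As a rough sketch of that folding step (not the patch's relop_early_resolve,
whose body is not shown here), the following standalone code folds a
comparison given a relation already known to hold between its operands.  The
Relation and BoolResult enums and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for relation_kind and a boolean result range.
enum class Relation { None, Lt, Le, Gt, Ge, Eq, Ne };
enum class BoolResult { False, True, Unknown };

// Fold "op1 == op2" given a relation already known to hold for op1, op2.
BoolResult
fold_equal (Relation known)
{
  switch (known)
    {
    case Relation::Eq:
      return BoolResult::True;
    case Relation::Lt:
    case Relation::Gt:
    case Relation::Ne:
      return BoolResult::False;
    default:
      return BoolResult::Unknown;   // Le, Ge and None do not decide it.
    }
}

// Fold "op1 < op2" given a relation already known to hold for op1, op2.
BoolResult
fold_less_than (Relation known)
{
  switch (known)
    {
    case Relation::Lt:
      return BoolResult::True;
    case Relation::Ge:
    case Relation::Gt:
    case Relation::Eq:
      return BoolResult::False;
    default:
      return BoolResult::Unknown;   // Le, Ne and None do not decide it.
    }
}

// Usage: the example from the text, a_2 == b_2 with a_2 > b_2 known.
void
fold_demo ()
{
  assert (fold_equal (Relation::Gt) == BoolResult::False);
}
```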


Again, this patch on its own will not cause anything to actually happen;
that will be enabled in the next patch. Virtually all relation
generation and application is handled via these range-ops routines.


 Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

>From 80dd13f5c3bdc7899ee6e863e05b254815ec0cef Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 11:49:21 -0400
Subject: [PATCH 2/7] Add relational support to range-op.

This patch integrates relations with range-op functionality so that any
known relations can be used to help reduce or resolve ranges.
Initially handle  EQ_EXPR, NE_EXPR, LE_EXPR, LT_EXPR, GT_EXPR and GE_EXPR.

	* range-op.cc (range_operator::wi_fold): Apply relation effect.
	(range_operator::fold_range): Adjust and apply relation effect.
	(*::fold_range): Add relation parameters.
	(*::op1_range): Ditto.
	(*::op2_range): Ditto.
	(range_operator::lhs_op1_relation): New.
	(range_operator::lhs_op2_relation): New.
	(range_operator::op1_op2_relation): New.
	(range_operator::op1_op2_relation_effect): New.
	(relop_early_resolve): New.
	(operator_equal::op1_op2_relation): New.
	(operator_equal::fold_range): Call relop_early_resolve.
	(operator_not_equal::op1_op2_relation): New.
	(operator_not_equal::fold_range): Call relop_early_resolve.
	(operator_lt::op1_op2_relation): New.
	(operator_lt::fold_range): Call relop_early_resolve.
	(operator_le::op1_op2_relation): New.
	(operator_le::fold_range): Call relop_early_resolve.
	(operator_gt::op1_op2_relation): New.
	(operator_gt::fold_range): Call relop_early_resolve.
	(operator_ge::op1_op2_relation): New.
	(operator_ge::fold_range): Call relop_early_resolve.
	* range-op.h (class range_operator): Adjust parameters and methods.
---
 gcc/range-op.cc | 584 +---
 gcc/range-op.h  |  24 +-
 2 files changed, 469 insertions(+), 139 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index e805f26a333..d807693900a 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-walk.h"
 #include "tree-cfg.h"
 #include "wide-int.h"
+#include "value-relation.h"
 #include "range-op.h"
 
 // Return the upper limit for a type.
@@ -138,7 +139,8 @@ range_operator::wi_fold (irange &r, tree type,
 bool
 range_operator::fold_range (irange &r, tree type,
 			const irange &lh,
-			const irange &rh) const
+			const irange &rh,
+			relation_kind rel) const
 {
   gcc_checking_assert (irange::supports_type_p (type));
   if (empty_range_varying (r, type, lh, rh))
@@ -152,6 +154,7 @@ range_operator::fold_range (irange &r, tree type,
 {
   wi_fold (r, type, lh.lower_bound (0), lh.upper_bound (0),
 	   rh.lower_bound (0), rh.upper_bound (0));
+  op1_op2_relation_effect (r, type, lh, rh, rel);
   return true;
 }
 
@@ -167,8 +170,12 @@ range_operator::fold_range (irange &r, tree type,
 	wi_fold (tmp, type, lh_lb, lh_ub, rh_lb, rh_ub);
 	r.union_ (tmp);
 	if (r.varying_p ())
-	  return true;
+	  {
+	op1_op2_relation_effect (r, type, lh, rh, rel);
+	return true;
+	  }
   }
+  op1_op2_relation_effect (r, type, lh, rh, rel);
   return true;
 }
 
@@ -178,7 +185,8 @@ bool
 range_operator::op1_range (irange &r ATTRIBUTE_UNUSED,
 			   tree type ATTRIBUTE_UNUSED,
 			   const 

[COMMITTED 1/7] Initial value-relation code.

2021-06-22 Thread Andrew MacLeod via Gcc-patches

This file introduces a relation oracle to GCC.

Rather than introducing a new enum for relations, I chose to reuse a 
range of enum tree_code, leaving us with mostly familiar names:


EQ_EXPR, NE_EXPR, GT_EXPR, GE_EXPR, LT_EXPR and LE_EXPR.  In addition to
these relations, there are 2 other codes:

#define VREL_NONE  TRUTH_NOT_EXPR
#define VREL_EMPTY LTGT_EXPR
VREL_NONE represents a lack of a relation (i.e., in a = b + c there is no
relation between b and c, so that's VREL_NONE).


VREL_EMPTY represents an empty, or impossible, relation.  This is
usually the result of combining relations that apply (i.e., a > b && a <
b provides a VREL_EMPTY representing the relation between a and b on
the true side.  It's impossible.)
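
One way to picture how combining relations can produce an impossible one is
to treat each relation as the set of basic outcomes (<, ==, >) it allows.
The sketch below is only an illustration under that assumption: it uses a
bitmask rather than the tree codes and lookup tables this patch uses, and
all names in it (Rel, REL_*, rel_intersect, rel_union) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Each relation is the set of outcomes it permits between two values.
enum Rel : std::uint8_t
{
  REL_EMPTY = 0,                        // no outcome possible
  REL_LT = 1,
  REL_EQ = 2,
  REL_GT = 4,
  REL_LE = REL_LT | REL_EQ,
  REL_GE = REL_GT | REL_EQ,
  REL_NE = REL_LT | REL_GT,
  REL_NONE = REL_LT | REL_EQ | REL_GT   // anything possible: no relation
};

// Both relations hold at once (e.g. along the true side of "a && b").
inline Rel
rel_intersect (Rel a, Rel b)
{
  return static_cast<Rel> (a & b);
}

// Either relation may hold (e.g. where two paths merge).
inline Rel
rel_union (Rel a, Rel b)
{
  return static_cast<Rel> (a | b);
}

// Usage: a > b && a < b cannot both be true.
void
rel_demo ()
{
  assert (rel_intersect (REL_GT, REL_LT) == REL_EMPTY);
}
```

With this encoding, a <= b combined with a >= b intersects to a == b, and
intersecting anything with "no relation" leaves it unchanged.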


The additional relations have tree codes chosen carefully to form a
contiguous series from VREL_NONE to NE_EXPR, enabling some internal
calculation tables to be used, indexing from 0..last_relation.  This is
easily changed, but seems convenient.



The oracle is pretty basic to start, but will be enhanced as we move
along. It requires a dominance tree and stores/looks up things based on
dominance. It hasn't been extensively analyzed in an iterative/on-demand
environment yet, so there may be some warts that show up.  It should
never give a wrong result, but it's possible it may miss something in
some instances.  We'll work those out as we find them.  It exists in
2 parts:
- an equivalence oracle which manages all the EQ_EXPR relations, and 
groups equivalences as sets, and
- a relation oracle which derives from that which handles all the other 
relations.


The API is straightforward.  Relations are registered via one of 2 routines:

void register_relation (gimple *stmt, relation_kind k, tree op1, tree op2);
void register_relation (edge e, relation_kind k, tree op1, tree op2);

and there is a single query routine:

relation_kind query_relation (basic_block bb, tree ssa1, tree ssa2);
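
To make the register/query pairing concrete, here is a much simplified,
standalone sketch of a dominance-based lookup.  It is not the oracle from
this patch: ToyOracle, its string-named SSA operands and the integer block
numbers are all hypothetical, relations are registered on a block rather
than on a statement or edge, and equivalence sets are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Relation { None, Lt, Le, Gt, Ge, Eq, Ne };

// The same relation seen from the other side: a < b  <=>  b > a.
Relation
swap_operands (Relation k)
{
  switch (k)
    {
    case Relation::Lt: return Relation::Gt;
    case Relation::Gt: return Relation::Lt;
    case Relation::Le: return Relation::Ge;
    case Relation::Ge: return Relation::Le;
    default: return k;
    }
}

class ToyOracle
{
public:
  // idom[bb] is the immediate dominator of block bb, or -1 for the entry.
  explicit ToyOracle (std::vector<int> idom)
    : m_idom (std::move (idom)), m_facts (m_idom.size ()) {}

  // Record that "op1 K op2" holds in block BB and everything it dominates.
  void register_relation (int bb, Relation k,
                          const std::string &op1, const std::string &op2)
  {
    assert (bb >= 0 && static_cast<std::size_t> (bb) < m_facts.size ());
    m_facts[bb][std::make_pair (op1, op2)] = k;
    m_facts[bb][std::make_pair (op2, op1)] = swap_operands (k);
  }

  // Walk up the dominator chain from BB looking for a recorded relation.
  Relation query_relation (int bb, const std::string &ssa1,
                           const std::string &ssa2) const
  {
    const auto key = std::make_pair (ssa1, ssa2);
    for (int cur = bb; cur >= 0; cur = m_idom[cur])
      {
        auto it = m_facts[cur].find (key);
        if (it != m_facts[cur].end ())
          return it->second;
      }
    return Relation::None;   // No oracle knowledge: no relation.
  }

private:
  typedef std::map<std::pair<std::string, std::string>, Relation> FactMap;
  std::vector<int> m_idom;
  std::vector<FactMap> m_facts;
};
```

With blocks 1 and 2 both immediately dominated by block 0, and block 3
dominated by block 1, registering a_2 < b_3 in block 1 makes a query from
block 3 return Lt, while a query from block 2 returns None.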

Our friend the range_query object now contains a relation pointer, and
this is set and used by ranger.  range_query has also had 2 query
routines added, so if a pass is using a ranger, it can also invoke
these queries the same way range_of_expr is invoked, i.e.:
get_range_query (cfun)->query_relation (stmt, ssa1, ssa2).  This
mechanism has the advantage that you don't need to check if an oracle is
present; it'll simply return VREL_NONE (no relation) between 2 ssa-names
if there is no oracle.


More advanced users can access the oracle directly via the
get_range_query (cfun)->oracle () call. It would be possible to create
an oracle for a pass without a ranger, and manage the relations in the
pass. That is not being done yet, but it would be easy enough to add
enable/disable oracle routines, much like the enable/disable_ranger
routines.


When using ranger, this information is set and propagated automatically, 
and the results are transparently applied to the ranges that are 
generated.  For instance,


if (a_2 < b_3)   // register relation a_2 < b_3 on the true edge
   c_3 = a_2 > b_3    // applies a_2 < b_3, and the range of c_3 is set
                      // to [0, 0], or false.


Currently only direct 1st order relations are tracked. I will get to
transitive relations soon, but that is not in this first round.
Relations on edges are also currently limited to single predecessor
blocks.  They are simply dropped/ignored if the destination of the
edge has multiple preds.


I will be writing up a relation guide in the not too distant future,
along with a few of the other components.


This patch provides the oracle, as well as the range_query interface, 
but does not actually do anything.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew



>From 3aaa69e5f30e1904d7ca2bb711b1cb0c62b6895f Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 17 Jun 2021 10:19:31 -0400
Subject: [PATCH 1/7] Initial value-relation code.

This code provides a both an equivalence and relation oracle which can be
accessed via a range_query object.  This initial code drop includes the
oracles and access them, but does not utilize them yet.

	* Makefile.in (OBJS): Add value-relation.o.
	* gimple-range.h: Adjust include files.
	* tree-data-ref.c: Adjust include file order.
	* value-query.cc (range_query::get_value_range): Default to no oracle.
	(range_query::query_relation): New.
	(range_query::query_relation): New.
	* value-query.h (class range_query): Adjust.
	* value-relation.cc: New.
	* value-relation.h: New.
---
 gcc/Makefile.in   |   1 +
 gcc/gimple-range.h|   2 +-
 gcc/tree-data-ref.c   |   2 +-
 gcc/value-query.cc|  50 +++
 gcc/value-query.h |  11 +
 gcc/value-relation.cc | 932 ++
 gcc/value-relation.h  | 159 +++
 7 files changed, 1155 insertions(+), 2 deletions(-)
 create mode 100644 gcc/value-relation.cc
 create mode 100644 gcc/value-relation.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 

Re: [PATCH] testsuite: Add testcase for recently fixed PR [PR101159]

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Jakub Jelinek wrote:

> On Tue, Jun 22, 2021 at 11:00:51AM +0200, Richard Biener wrote:
> > 2021-06-22  Richard Biener  
> > 
> > PR tree-optimization/101159
> > * tree-vect-patterns.c (vect_recog_popcount_pattern): Add
> > missing NULL vectype check.
> 
> The following patch adds the testcase for it, IMHO it can't hurt and
> from my experience testcases often trigger other bugs later on (rather
> than the original bugs reappearing, though even that happens),
> and also fixes a couple of typos in the new function.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2021-06-22  Jakub Jelinek  
> 
>   PR tree-optimization/101159
>   * tree-vect-patterns.c (vect_recog_widen_minus_pattern): Fix some
>   comment typos.
> 
>   * gcc.c-torture/compile/pr101159.c: New test.
> 
> --- gcc/tree-vect-patterns.c.jj   2021-06-22 12:19:09.168846556 +0200
> +++ gcc/tree-vect-patterns.c  2021-06-22 12:41:35.334932438 +0200
> @@ -1300,7 +1300,7 @@ vect_recog_widen_minus_pattern (vec_info
> TYPE1 B;
> UTYPE2 temp_in;
> TYPE3 temp_out;
> -   temp_in = (TYPE2)A;
> +   temp_in = (UTYPE2)A;
>  
> temp_out = __builtin_popcount{,l,ll} (temp_in);
> B = (TYPE1) temp_out;
> @@ -1372,8 +1372,8 @@ vect_recog_popcount_pattern (vec_info *v
>if (!rhs_origin)
>  return NULL;
>  
> -  /* Input and outout of .POPCOUNT should be same-precision integer.
> - Also A should be unsigned or same presion as temp_in,
> +  /* Input and output of .POPCOUNT should be same-precision integer.
> + Also A should be unsigned or same precision as temp_in,
>   otherwise there would be sign_extend from A to temp_in.  */
>if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)
>|| (!TYPE_UNSIGNED (unprom_diff.type)
> @@ -1384,7 +1384,7 @@ vect_recog_popcount_pattern (vec_info *v
>  
>vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt);
>vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
> -  /* Do it only the backend existed popcount2.  */
> +  /* Do it only if the backend has popcount2 pattern.  */
>if (!vec_type
>|| !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type,
> OPTIMIZE_FOR_SPEED))
> --- gcc/testsuite/gcc.c-torture/compile/pr101159.c.jj 2021-06-22 
> 12:41:54.742665843 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr101159.c2021-06-22 
> 12:38:15.267680653 +0200
> @@ -0,0 +1,10 @@
> +/* PR tree-optimization/101159 */
> +
> +unsigned long a;
> +long b;
> +
> +void
> +foo (void)
> +{
> +  a += __builtin_popcountl (b);
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] expand: Fix up empty class return optimization [PR101160]

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Jakub Jelinek wrote:

> On Mon, Jun 14, 2021 at 11:24:22PM -0400, Jason Merrill via Gcc-patches wrote:
> > The x86_64 psABI says that an empty class isn't passed or returned in 
> > memory or
> > registers, so we shouldn't set %eax in this function.  Is this a reasonable
> > place to implement that?  Another possibility would be to remove the hack to
> > prevent i386.c:function_value_64 from returning NULL in this case and fix 
> > the
> > callers to deal, but that seems like more work.
> > 
> > The df-scan hunk catches the case where we look at a 0-length reg and build
> > a range the length of unsigned int, which happened before I changed
> > assign_parms to match expand_function_end.
> 
> The assign_parms change unfortunately breaks e.g. the following testcase.
> The problem is that some passes (e.g. subreg lowering, but assign_parms
> comments also talk about delay slot scheduling) rely on crtl->return_rtx
> not to contain pseudo registers, and the assign_parms change results
> in the pseudo in there not being replaced with a hard register.
> 
> The following patch instead clears crtl->return_rtx if a function
> returns a TYPE_EMPTY_P structure; that way (use (pseudo)) is not emitted
> into the IL and it is treated more like functions returning void.
> 
> I've also changed the effective target on the empty-class1.C testcase, so
> that it doesn't fail on x86_64-linux with -m32 testing.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.
 
> 2021-06-22  Jakub Jelinek  
> 
>   PR middle-end/101160
>   * function.c (assign_parms): For decl_result with TYPE_EMPTY_P type
>   clear crtl->return_rtx instead of keeping it referencing a pseudo.
> 
>   * g++.target/i386/empty-class1.C: Require lp64 effective target
>   instead of x86_64-*-*.
>   * g++.target/i386/empty-class2.C: New test.
> 
> --- gcc/function.c.jj 2021-06-22 10:04:46.0 +0200
> +++ gcc/function.c2021-06-22 11:30:36.615264498 +0200
> @@ -3821,17 +3821,22 @@ assign_parms (tree fndecl)
>tree decl_result = DECL_RESULT (fndecl);
>rtx decl_rtl = DECL_RTL (decl_result);
>  
> -  if ((REG_P (decl_rtl)
> -? REGNO (decl_rtl) >= FIRST_PSEUDO_REGISTER
> -: DECL_REGISTER (decl_result))
> -   /* Unless the psABI says not to.  */
> -   && !TYPE_EMPTY_P (TREE_TYPE (decl_result)))
> +  if (REG_P (decl_rtl)
> +   ? REGNO (decl_rtl) >= FIRST_PSEUDO_REGISTER
> +   : DECL_REGISTER (decl_result))
>   {
> rtx real_decl_rtl;
>  
> -   real_decl_rtl = targetm.calls.function_value (TREE_TYPE (decl_result),
> - fndecl, true);
> -   REG_FUNCTION_VALUE_P (real_decl_rtl) = 1;
> +   /* Unless the psABI says not to.  */
> +   if (TYPE_EMPTY_P (TREE_TYPE (decl_result)))
> + real_decl_rtl = NULL_RTX;
> +   else
> + {
> +   real_decl_rtl
> + = targetm.calls.function_value (TREE_TYPE (decl_result),
> + fndecl, true);
> +   REG_FUNCTION_VALUE_P (real_decl_rtl) = 1;
> + }
> /* The delay slot scheduler assumes that crtl->return_rtx
>holds the hard register containing the return value, not a
>temporary pseudo.  */
> --- gcc/testsuite/g++.target/i386/empty-class1.C.jj   2021-06-22 
> 10:04:46.377208914 +0200
> +++ gcc/testsuite/g++.target/i386/empty-class1.C  2021-06-22 
> 11:37:53.463375502 +0200
> @@ -1,5 +1,5 @@
>  // PR target/88529
> -// { dg-do compile { target { c++11 && x86_64-*-* } } }
> +// { dg-do compile { target { c++11 && lp64 } } }
>  // { dg-additional-options -fdump-rtl-expand }
>  // { dg-final { scan-rtl-dump-not "set" "expand" } }
>  // The x86_64 psABI says that f() doesn't put the return value anywhere.
> --- gcc/testsuite/g++.target/i386/empty-class2.C.jj   2021-06-22 
> 11:34:53.422805115 +0200
> +++ gcc/testsuite/g++.target/i386/empty-class2.C  2021-06-22 
> 11:35:34.048257864 +0200
> @@ -0,0 +1,20 @@
> +// PR middle-end/101160
> +// Test passing aligned empty aggregate
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +// { dg-additional-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } 
> && ilp32 } } }
> +
> +struct S { union {} a; } __attribute__((aligned));
> +
> +S
> +foo (S arg)
> +{
> +  return arg;
> +}
> +
> +void
> +bar (void)
> +{
> +  S arg;
> +  foo (arg);
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Richard Biener wrote:

> On Mon, 21 Jun 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> > 
[...]
> > since we are removing the TWO_OPERANDS node we need to drop one of the 
> > multiply
> > and so we need to give it the original 2 vectors as a parameter.  The 
> > current
> > implementation emits a permute operation to combine the loads again and 
> > later
> > pushes the permute down.  This is problematic as it again requires us to do 
> > df
> > analysis early.
> > 
> > To counter this, in the patch I have changed loads to no longer come out of
> > build_slp with LOAD_PERMUTES but instead to have a VEC_PERM above each load.
> 
> Yep.  I've been there before (not sure if I ever sent you my 
> work-in-progress here).  There's some wrongs in your patch but I guess
> doing this exactly for the permutes optimize_slp handles would be fine.
> 
> We should see to do this independently of the stuff above, I can handle
> this and will prepare a patch for this later.

So it's of course difficult, and the current optimize_slp is tied closely
to what the original vect_attempt_slp_rearrange_stmts did.

If you for example consider

double x[2], y[2], z[4];
void foo ()
{
  z[0] = x[0] + y[0];
  z[1] = x[1] + y[1];
  z[2] = x[1] + y[0];
  z[3] = x[0] + y[1];
}

then the x[0], x[1] loads look uniform enough to be handled
but of course we end up with a group_size of 4 here and
a { 0, 1, 1, 0 } load permutation which optimize_slp wouldn't
handle either.  Of course in the end we should have split
the SLP at vector size boundary but the question is how
we should ensure this (if at all ...) or if we should
eventually even create a SLP tree like

  { x[0], x[1] }
 |   \
 | VEC_PERM { 0[1], 0[0] }
 \  /
  VEC_PERM { 0[0], 0[1], 1[0], 1[1] }
 |

for the load.  Note all lane splitting/duplicating has
effects on vectorization factor compute - one complication
I ran into when originally trying to split out load permutations
from loads.
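
For readers less familiar with the notation, the { 0, 1, 1, 0 } above lists,
for each of the four lanes, which element of the x group that lane reads.
The standalone sketch below only illustrates that reading and why such a
vector duplicates lanes rather than reordering them; LoadPermutation and
both helper functions are hypothetical and not part of tree-vect-slp.c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lane i of the load node reads element perm[i] of the interleaving group.
typedef std::vector<unsigned> LoadPermutation;

// True if the lanes read the group elements in their original order.
bool
is_identity (const LoadPermutation &perm)
{
  for (std::size_t i = 0; i < perm.size (); ++i)
    if (perm[i] != i)
      return false;
  return true;
}

// True if every group element 0..group_elems-1 is read exactly once,
// i.e. the lanes are a pure reordering with no duplication.
bool
is_true_permutation (const LoadPermutation &perm, unsigned group_elems)
{
  if (perm.size () != group_elems)
    return false;
  std::vector<bool> seen (group_elems, false);
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      unsigned idx = perm[i];
      if (idx >= group_elems || seen[idx])
        return false;
      seen[idx] = true;
    }
  return true;
}

// Usage: the x and y loads of the four-lane example in the text.
void
load_perm_demo ()
{
  LoadPermutation x_perm = {0, 1, 1, 0};   // x[0], x[1], x[1], x[0]
  LoadPermutation y_perm = {0, 1, 0, 1};   // y[0], y[1], y[0], y[1]
  assert (!is_true_permutation (x_perm, 2));
  assert (!is_true_permutation (y_perm, 2));
}
```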

I'm not sure whether your example motivating the load SLP
changes is good enough - if you consider that the loaded
values get modified, such as { x[0]/a + y[1]/a, x[1]/a - y[0]/a },
splitting the load permutation from the load will not get you
the division CSEd at SLP build, and if we divide by a different value
there's no CSE opportunity.  What would work and what should work
right now is that you get a lane permute down but you'll not know
whether values are "related"?  If you need that info and that
directly on the child you can look at the representative's
DR group leader, if any, as a heuristic.

I've pasted below what I was playing with, it shows CSE for
cases like

double x[2], z[4];
void foo ()
{
  z[0] = x[0] + 2 * x[1];
  z[1] = x[1] + 2 * x[0];
}

but it breaks various vect.exp tests that look for permutes
being merged with reductions (thus the optimize_slp propagation
somehow doesn't work - as I said it's a bit fragile).

Richard.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 6a26ccdd290..187bbfb70db 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -343,12 +343,14 @@ vect_slp_tree_uniform_p (slp_tree node)
 }
 
 /* Find the place of the data-ref in STMT_INFO in the interleaving chain
-   that starts from FIRST_STMT_INFO.  Return -1 if the data-ref is not a part
-   of the chain.  */
+   that starts from FIRST_STMT_INFO.  If ADD_GAPS is true then if there's
+   a gap between elements account for that as well.
+   Return -1 if the data-ref is not a part of the chain.  */
 
 int
 vect_get_place_in_interleaving_chain (stmt_vec_info stmt_info,
- stmt_vec_info first_stmt_info)
+ stmt_vec_info first_stmt_info,
+ bool add_gaps)
 {
   stmt_vec_info next_stmt_info = first_stmt_info;
   int result = 0;
@@ -362,7 +364,7 @@ vect_get_place_in_interleaving_chain (stmt_vec_info 
stmt_info,
return result;
   next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
   if (next_stmt_info)
-   result += DR_GROUP_GAP (next_stmt_info);
+   result += add_gaps ? DR_GROUP_GAP (next_stmt_info) : 1;
 }
   while (next_stmt_info);
 
@@ -1769,24 +1771,65 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
{
  *max_nunits = this_max_nunits;
  (*tree_size)++;
- node = vect_create_new_slp_node (node, stmts, 0);
- SLP_TREE_VECTYPE (node) = vectype;
  /* And compute the load permutation.  Whether it is actually
 a permutation depends on the unrolling factor which is
 decided later.  */
- vec load_permutation;
  int j;
  stmt_vec_info load_info;
+ stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmts[0]);
+ vec load_permutation;
  load_permutation.create (group_size);
- stmt_vec_info first_stmt_info
-   = DR_GROUP_FIRST_ELEMENT 

[PATCH] testsuite: Add testcase for recently fixed PR [PR101159]

2021-06-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 22, 2021 at 11:00:51AM +0200, Richard Biener wrote:
> 2021-06-22  Richard Biener  
> 
>   PR tree-optimization/101159
>   * tree-vect-patterns.c (vect_recog_popcount_pattern): Add
>   missing NULL vectype check.

The following patch adds the testcase for it, IMHO it can't hurt and
from my experience testcases often trigger other bugs later on (rather
than the original bugs reappearing, though even that happens),
and also fixes a couple of typos in the new function.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-22  Jakub Jelinek  

PR tree-optimization/101159
* tree-vect-patterns.c (vect_recog_widen_minus_pattern): Fix some
comment typos.

* gcc.c-torture/compile/pr101159.c: New test.

--- gcc/tree-vect-patterns.c.jj 2021-06-22 12:19:09.168846556 +0200
+++ gcc/tree-vect-patterns.c2021-06-22 12:41:35.334932438 +0200
@@ -1300,7 +1300,7 @@ vect_recog_widen_minus_pattern (vec_info
TYPE1 B;
UTYPE2 temp_in;
TYPE3 temp_out;
-   temp_in = (TYPE2)A;
+   temp_in = (UTYPE2)A;
 
temp_out = __builtin_popcount{,l,ll} (temp_in);
B = (TYPE1) temp_out;
@@ -1372,8 +1372,8 @@ vect_recog_popcount_pattern (vec_info *v
   if (!rhs_origin)
 return NULL;
 
-  /* Input and outout of .POPCOUNT should be same-precision integer.
- Also A should be unsigned or same presion as temp_in,
+  /* Input and output of .POPCOUNT should be same-precision integer.
+ Also A should be unsigned or same precision as temp_in,
  otherwise there would be sign_extend from A to temp_in.  */
   if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)
   || (!TYPE_UNSIGNED (unprom_diff.type)
@@ -1384,7 +1384,7 @@ vect_recog_popcount_pattern (vec_info *v
 
   vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt);
   vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
-  /* Do it only the backend existed popcount2.  */
+  /* Do it only if the backend has popcount2 pattern.  */
   if (!vec_type
   || !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type,
  OPTIMIZE_FOR_SPEED))
--- gcc/testsuite/gcc.c-torture/compile/pr101159.c.jj   2021-06-22 
12:41:54.742665843 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr101159.c  2021-06-22 
12:38:15.267680653 +0200
@@ -0,0 +1,10 @@
+/* PR tree-optimization/101159 */
+
+unsigned long a;
+long b;
+
+void
+foo (void)
+{
+  a += __builtin_popcountl (b);
+}


Jakub



[PATCH] expand: Fix up empty class return optimization [PR101160]

2021-06-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 14, 2021 at 11:24:22PM -0400, Jason Merrill via Gcc-patches wrote:
> The x86_64 psABI says that an empty class isn't passed or returned in memory 
> or
> registers, so we shouldn't set %eax in this function.  Is this a reasonable
> place to implement that?  Another possibility would be to remove the hack to
> prevent i386.c:function_value_64 from returning NULL in this case and fix the
> callers to deal, but that seems like more work.
> 
> The df-scan hunk catches the case where we look at a 0-length reg and build
> a range the length of unsigned int, which happened before I changed
> assign_parms to match expand_function_end.

The assign_parms change unfortunately breaks e.g. the following testcase.
The problem is that some passes (e.g. subreg lowering, but assign_parms
comments also talk about delay slot scheduling) rely on crtl->return_rtx
not to contain pseudo registers, and the assign_parms change results
in the pseudo in there not being replaced with a hard register.

The following patch instead clears crtl->return_rtx if a function
returns a TYPE_EMPTY_P structure; that way (use (pseudo)) is not emitted
into the IL and it is treated more like functions returning void.

I've also changed the effective target on the empty-class1.C testcase, so
that it doesn't fail on x86_64-linux with -m32 testing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-22  Jakub Jelinek  

PR middle-end/101160
* function.c (assign_parms): For decl_result with TYPE_EMPTY_P type
clear crtl->return_rtx instead of keeping it referencing a pseudo.

* g++.target/i386/empty-class1.C: Require lp64 effective target
instead of x86_64-*-*.
* g++.target/i386/empty-class2.C: New test.

--- gcc/function.c.jj   2021-06-22 10:04:46.0 +0200
+++ gcc/function.c  2021-06-22 11:30:36.615264498 +0200
@@ -3821,17 +3821,22 @@ assign_parms (tree fndecl)
   tree decl_result = DECL_RESULT (fndecl);
   rtx decl_rtl = DECL_RTL (decl_result);
 
-  if ((REG_P (decl_rtl)
-  ? REGNO (decl_rtl) >= FIRST_PSEUDO_REGISTER
-  : DECL_REGISTER (decl_result))
- /* Unless the psABI says not to.  */
- && !TYPE_EMPTY_P (TREE_TYPE (decl_result)))
+  if (REG_P (decl_rtl)
+ ? REGNO (decl_rtl) >= FIRST_PSEUDO_REGISTER
+ : DECL_REGISTER (decl_result))
{
  rtx real_decl_rtl;
 
- real_decl_rtl = targetm.calls.function_value (TREE_TYPE (decl_result),
-   fndecl, true);
- REG_FUNCTION_VALUE_P (real_decl_rtl) = 1;
+ /* Unless the psABI says not to.  */
+ if (TYPE_EMPTY_P (TREE_TYPE (decl_result)))
+   real_decl_rtl = NULL_RTX;
+ else
+   {
+ real_decl_rtl
+   = targetm.calls.function_value (TREE_TYPE (decl_result),
+   fndecl, true);
+ REG_FUNCTION_VALUE_P (real_decl_rtl) = 1;
+   }
  /* The delay slot scheduler assumes that crtl->return_rtx
 holds the hard register containing the return value, not a
 temporary pseudo.  */
--- gcc/testsuite/g++.target/i386/empty-class1.C.jj 2021-06-22 
10:04:46.377208914 +0200
+++ gcc/testsuite/g++.target/i386/empty-class1.C2021-06-22 
11:37:53.463375502 +0200
@@ -1,5 +1,5 @@
 // PR target/88529
-// { dg-do compile { target { c++11 && x86_64-*-* } } }
+// { dg-do compile { target { c++11 && lp64 } } }
 // { dg-additional-options -fdump-rtl-expand }
 // { dg-final { scan-rtl-dump-not "set" "expand" } }
 // The x86_64 psABI says that f() doesn't put the return value anywhere.
--- gcc/testsuite/g++.target/i386/empty-class2.C.jj 2021-06-22 
11:34:53.422805115 +0200
+++ gcc/testsuite/g++.target/i386/empty-class2.C2021-06-22 
11:35:34.048257864 +0200
@@ -0,0 +1,20 @@
+// PR middle-end/101160
+// Test passing aligned empty aggregate
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && 
ilp32 } } }
+
+struct S { union {} a; } __attribute__((aligned));
+
+S
+foo (S arg)
+{
+  return arg;
+}
+
+void
+bar (void)
+{
+  S arg;
+  foo (arg);
+}


Jakub



Re: [PATCH] fold-const: Return corresponding integral type for OFFSET_TYPE in range_check_type [PR101162]

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> Andrew's recent r12-1608-g2f1686ff70b25fceb04ca2ffc0a450fb682913ef change
> to fail verification on various unary and binary operations with OFFSET_TYPE
> revealed that e.g. switchconv happily performs multiplications and additions
> in OFFSET_TYPE.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

> 2021-06-22  Jakub Jelinek  
>   Andrew Pinski  
> 
>   PR tree-optimization/101162
>   * fold-const.c (range_check_type): Handle OFFSET_TYPE like pointer
>   types.
> 
>   * g++.dg/opt/pr101162.C: New test.
> 
> --- gcc/fold-const.c.jj   2021-06-14 12:27:18.572411152 +0200
> +++ gcc/fold-const.c  2021-06-22 11:06:49.718462451 +0200
> @@ -5360,7 +5360,7 @@ range_check_type (tree etype)
>else
>   return NULL_TREE;
>  }
> -  else if (POINTER_TYPE_P (etype))
> +  else if (POINTER_TYPE_P (etype) || TREE_CODE (etype) == OFFSET_TYPE)
>  etype = unsigned_type_for (etype);
>return etype;
>  }
> --- gcc/testsuite/g++.dg/opt/pr101162.C.jj2021-06-22 11:08:11.754359910 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr101162.C   2021-06-22 11:07:54.318594243 
> +0200
> @@ -0,0 +1,21 @@
> +// PR tree-optimization/101162
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +struct A { int i1, i2, i3, i4, i5, i6; };
> +
> +int A::*
> +foo (int i)
> +{
> +  switch (i)
> +{
> +case 1: return &A::i1;
> +case 2: return &A::i2;
> +case 3: return &A::i3;
> +case 4: return &A::i4;
> +case 5: return &A::i5;
> +case 6: return &A::i6;
> +}
> +
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] fold-const: Return corresponding integral type for OFFSET_TYPE in range_check_type [PR101162]

2021-06-22 Thread Jakub Jelinek via Gcc-patches
Hi!

Andrew's recent r12-1608-g2f1686ff70b25fceb04ca2ffc0a450fb682913ef change
to fail verification on various unary and binary operations with OFFSET_TYPE
revealed that e.g. switchconv happily performs multiplications and additions
in OFFSET_TYPE.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2021-06-22  Jakub Jelinek  
Andrew Pinski  

PR tree-optimization/101162
* fold-const.c (range_check_type): Handle OFFSET_TYPE like pointer
types.

* g++.dg/opt/pr101162.C: New test.

--- gcc/fold-const.c.jj 2021-06-14 12:27:18.572411152 +0200
+++ gcc/fold-const.c2021-06-22 11:06:49.718462451 +0200
@@ -5360,7 +5360,7 @@ range_check_type (tree etype)
   else
return NULL_TREE;
 }
-  else if (POINTER_TYPE_P (etype))
+  else if (POINTER_TYPE_P (etype) || TREE_CODE (etype) == OFFSET_TYPE)
 etype = unsigned_type_for (etype);
   return etype;
 }
--- gcc/testsuite/g++.dg/opt/pr101162.C.jj  2021-06-22 11:08:11.754359910 
+0200
+++ gcc/testsuite/g++.dg/opt/pr101162.C 2021-06-22 11:07:54.318594243 +0200
@@ -0,0 +1,21 @@
+// PR tree-optimization/101162
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct A { int i1, i2, i3, i4, i5, i6; };
+
+int A::*
+foo (int i)
+{
+  switch (i)
+{
+case 1: return &A::i1;
+case 2: return &A::i2;
+case 3: return &A::i3;
+case 4: return &A::i4;
+case 5: return &A::i5;
+case 6: return &A::i6;
+}
+
+  return 0;
+}

Jakub



[PATCH v2] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Jonathan Wakely via Gcc-patches
On Tue, 22 Jun 2021 at 10:07, Matthias Kretz wrote:
>
> On Monday, 21 June 2021 19:31:59 CEST Jonathan Wakely via Gcc-patches wrote:
> > +// Lock the last element of the tuple, after all previous ones are locked.
> > +template
> > +  inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
> > +  __try_lock_impl(tuple<_Lockables&...>& __lockables)
>
> Couldn't you drop the need for enable_if and tuple if you define the function
> like this? (Or - without constexpr if - two overloads with
> __try_lock_impl(_L1& __l1) and __try_lock_impl(_L1& __l1, _L2& __l2,
> _Lockables&... __lockables)
>
> template<typename _L1, typename... _Lockables>
>   inline int
>   __try_lock_impl(_L1& __l1, _Lockables&... __lockables)
>   {
> if (auto __lock = __detail::__try_to_lock(__l1))
>   {
> if constexpr (sizeof...(_Lockables))
>   {
> int __idx = __detail::__try_lock_impl(__lockables...);
> if (__idx >= 0)
>   return __idx + 1;
>   }
> __lock.release();
> return -1;
>   }
> else
>   return 0;
>   }

Yes, I did try something like that, but we can't use if-constexpr
unconditionally. Doing it with two overloads still needed enable_if or
tag dispatching, but that's because I retained the use of std::tuple,
so was passing around the whole parameter pack. With your suggestion
to also drop std::tuple the number of parameters decides which
function we call. And we don't instantiate std::tuple. And we can also
get rid of the __try_to_lock function, which was only used to deduce
the lock type rather than use tuple_element to get it. That's much
nicer.

>
> > [...]
> > +template
> > +  void
> > +  __lock_impl(int& __i, int __depth, _L0& __l0, _L1&... __l1)
> > +  {
>
> How about optimizing a likely common case where all lockables have the same
> type? In that case we don't require recursion and can manage stack usage much
> simpler:

The stack usage is bounded by the number of mutexes being locked,
which is unlikely to get large, but we can do that.

We can do it for try_lock too:

  template<typename _L1, typename _L2, typename... _L3>
    int
    try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
    {
#if __cplusplus >= 201703L
      if constexpr (is_same_v<_L1, _L2>
                    && (is_same_v<_L1, _L3> && ...))
        {
          constexpr int _Np = 2 + sizeof...(_L3);
          unique_lock<_L1> __locks[_Np] = {
              {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3, try_to_lock}...
          };
          for (int __i = 0; __i < _Np; ++__i)
            if (!__locks[__i])
              return __i;
          for (auto& __l : __locks)
            __l.release();
          return -1;
        }
      else
#endif
      return __detail::__try_lock_impl(__l1, __l2, __l3...);
    }
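
For readers unfamiliar with the return convention used above, here is a
small sketch of how std::try_lock behaves from the caller's side: it
returns -1 when every lockable was acquired, otherwise the zero-based index
of the one whose try_lock() failed, after releasing any acquired before it.
The toy_lock type and try_pair helper are hypothetical and exist only for
this illustration; they are not part of the patch.

```cpp
#include <mutex>

// Hypothetical Lockable used only for this illustration.
struct toy_lock
{
  bool locked = false;

  void lock () { locked = true; }

  bool try_lock ()
  {
    if (locked)
      return false;
    locked = true;
    return true;
  }

  void unlock () { locked = false; }
};

// Returns whatever std::try_lock reports for the pair.
inline int
try_pair (toy_lock &a, toy_lock &b)
{
  return std::try_lock (a, b);
}
```

Because both arguments have the same type, a C++17 build with the change
sketched above would presumably take the array-based path rather than the
recursive one.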



>
>   if constexpr ((is_same_v<_L0, _L1> && ...))
> {
>   constexpr int _Np = 1 + sizeof...(_L1);
>   std::array<unique_lock<_L0>, _Np> __locks = {
> {__l0, defer_lock}, {__l1, defer_lock}...
>   };
>   int __first = 0;
>   do {
> __locks[__first].lock();
> for (int __j = 1; __j < _Np; ++__j)
>   {
> const int __idx = (__first + __j) % _Np;
> if (!__locks[__idx].try_lock())
>   {
> for (int __k = __idx; __k != __first;
> __k = __k == 1 ? _Np : __k - 1)
>   __locks[__k - 1].unlock();

This loop doesn't work if any try_lock fails when first==0, because
the loop termination condition is never reached.

I find this a bit easier to understand than the loop above, and
correct (I think):

  for (int __k = __j; __k != 0; --__k)
__locks[(__first + __k - 1) % _Np].unlock();
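
To make the index arithmetic of the replacement loop concrete, here is a
small stand-alone sketch that records which array slots it would unlock.
The helper name unlock_order and its parameters are hypothetical: first is
the slot locked with lock(), j is the offset at which try_lock() failed,
and np is the number of lockables.

```cpp
#include <vector>

// Slots released by the replacement loop, in release order, when the
// try_lock at offset j (relative to first) fails.  The slots held at that
// point are first, first + 1, ..., first + j - 1, all taken modulo np.
inline std::vector<int>
unlock_order (int first, int j, int np)
{
  std::vector<int> order;
  for (int k = j; k != 0; --k)
    order.push_back ((first + k - 1) % np);
  return order;
}
```

The loop counts the offset down to zero instead of comparing a wrapped
index against first, so it terminates for every starting slot, including
first == 0.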

I'll finish testing the attached patch. I should probably add more
tests, so that each test is run for a set of lockables of the same
type, and also for lockables of different types.
commit 7d7cf35ed3c4b9d8c7fc4a52b4c7b1788c85c46d
Author: Jonathan Wakely 
Date:   Tue Jun 22 13:35:19 2021

libstdc++: Simplify std::try_lock and std::lock further

The std::try_lock and std::lock algorithms can use iteration instead of
recursion when all lockables have the same type and can be held by an
array of unique_lock objects.

By making this change to __detail::__try_lock_impl it also benefits
__detail::__lock_impl, which uses it. For std::lock we can just put the
iterative version directly in std::lock, to avoid making any call to
__detail::__lock_impl.

Signed-off-by: Matthias Kretz 
Signed-off-by: Jonathan Wakely 

Co-authored-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/std/mutex (lock): Replace recursion with iteration
when lockables all have the same type.
(__detail::__try_lock_impl): Likewise. Pass lockables as
parameters, instead of a tuple. Always lock the first one, and
recurse for the rest.

Re: [PATCH] AArch64: Add support for __builtin_roundeven[f] [PR100966]

2021-06-22 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> So rather than have two patterns that generate frintn, I think
> it would be better to change the existing frint_pattern entry to
> "roundeven" instead, and fix whatever the fallout is.  Hopefully it
> shouldn't be too bad, since we already use the optab names for the
> other UNSPEC_FRINT* codes.

Well it requires various changes to the arm_neon headers since they use
existing intrinsics. If that is not considered a risky ABI change then here is 
v2:

Enable __builtin_roundeven[f] by changing existing frintn to roundeven.

Bootstrap OK and passes regress.

ChangeLog:
2021-06-18  Wilco Dijkstra  

PR target/100966
* config/aarch64/aarch64.md (frint_pattern): Update comment.
* config/aarch64/aarch64-simd-builtins.def: Change frintn to roundeven.
* config/aarch64/arm_fp16.h: Change frintn to roundeven.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (frint_pattern): Use roundeven for FRINTN.

gcc/testsuite
PR target/100966
* gcc.target/aarch64/frint.x: Add roundeven tests.
* gcc.target/aarch64/frint_double.c: Likewise.
* gcc.target/aarch64/frint_float.c: Likewise.

---

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b885bd5b38bf7ad83eb9d801284bf9b34db17210..534cc8ccb538c4fb0c208a7035020e131656260d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -485,7 +485,7 @@
   BUILTIN_VHSDF (UNOP, nearbyint, 2, FP)
   BUILTIN_VHSDF (UNOP, rint, 2, FP)
   BUILTIN_VHSDF (UNOP, round, 2, FP)
-  BUILTIN_VHSDF_HSDF (UNOP, frintn, 2, FP)
+  BUILTIN_VHSDF_HSDF (UNOP, roundeven, 2, FP)

   VAR1 (UNOP, btrunc, 2, FP, hf)
   VAR1 (UNOP, ceil, 2, FP, hf)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 30effca6f3562f6870a6cc8097750e63bb0d424d..8977330b142d2cde1d2faa4a01282b01a68e25c5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5922,7 +5922,7 @@ (define_insn "*bswapsi2_uxtw"
 ;; ---

 ;; frint floating-point round to integral standard patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, roundeven.

 (define_insn "<frint_pattern><mode>2"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
index 2afbd1203361b54d6e1315ffaa1bec21834c060e..3efa7e1f19817df1409bf781266f4e238c128f0b 100644
--- a/gcc/config/aarch64/arm_fp16.h
+++ b/gcc/config/aarch64/arm_fp16.h
@@ -333,7 +333,7 @@ vrndmh_f16 (float16_t __a)
 __extension__ static __inline float16_t __attribute__ ((__always_inline__))
 vrndnh_f16 (float16_t __a)
 {
-  return __builtin_aarch64_frintnhf (__a);
+  return __builtin_aarch64_roundevenhf (__a);
 }

 __extension__ static __inline float16_t __attribute__ ((__always_inline__))
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index baa30bd5a9d96c1bf04a37fb105091ea56a6444a..c88a8a627d3082d3577b0fe222381e93c35d7251 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -24657,35 +24657,35 @@ __extension__ extern __inline float32_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndns_f32 (float32_t __a)
 {
-  return __builtin_aarch64_frintnsf (__a);
+  return __builtin_aarch64_roundevensf (__a);
 }

 __extension__ extern __inline float32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndn_f32 (float32x2_t __a)
 {
-  return __builtin_aarch64_frintnv2sf (__a);
+  return __builtin_aarch64_roundevenv2sf (__a);
 }

 __extension__ extern __inline float64x1_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndn_f64 (float64x1_t __a)
 {
-  return (float64x1_t) {__builtin_aarch64_frintndf (__a[0])};
+  return (float64x1_t) {__builtin_aarch64_roundevendf (__a[0])};
 }

 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndnq_f32 (float32x4_t __a)
 {
-  return __builtin_aarch64_frintnv4sf (__a);
+  return __builtin_aarch64_roundevenv4sf (__a);
 }

 __extension__ extern __inline float64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndnq_f64 (float64x2_t __a)
 {
-  return __builtin_aarch64_frintnv2df (__a);
+  return __builtin_aarch64_roundevenv2df (__a);
 }

 /* vrndp  */
@@ -31287,14 +31287,14 @@ __extension__ extern __inline float16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndn_f16 (float16x4_t __a)
 {
-  return __builtin_aarch64_frintnv4hf (__a);
+  return __builtin_aarch64_roundevenv4hf (__a);
 }

 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vrndnq_f16 (float16x8_t __a)
 {
-  return 

Re: [PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-22 Thread Richard Biener
On Mon, 21 Jun 2021, Tamar Christina wrote:

> Hi Richi,
> 
> This patch is still very much incomplete and I do know that it is 
> missing things but it's complete enough such that examples are working 
> and allows me to show what I'm working towards.
> 
> note, that this approach will remove a lot of code in 
> tree-vect-slp-patterns but to keep the diff readable I've left them in 
> and just commented out the calls or removed them where needed.
> 
> The patch rewrites the complex numbers detection by splitting the 
> detection of structure from dataflow analysis.  In principle the biggest 
> difference between this and the previous implementation is that instead 
> of trying to detect valid complex operations it *makes* an operation a 
> valid complex operation.
> 
> To do this each operation gets a dual optab which matches the same structure
> but has no dataflow requirement.
> 
> i.e. in this patch I added four: ADDSUB, SUBADD, MUL_ADDSUB, MUL_SUBADD.
> 
> There is then a mapping between these and their variant with the dataflow:
> 
> * ADDSUB -> COMPLEX_ADD_ROT270
> * SUBADD -> COMPLEX_ADD_ROT90
> * MUL_ADDSUB -> COMPLEX_MUL_CONJ
> * MUL_SUBADD -> COMPLEX_MUL
> 
> with the intention that when we detect the structure of an operation we query
> the backend for both optabs.
> 
> This should result in one of four states:
> 
>  * not supported: Move on.
>  * Supports ADDSUB only: Rewrite using ADDSUB, set type to 'cannot_transform'
>  * Supports COMPLEX_ADD_ROT270 only: Rewrite using ADDSUB, set type to
>    'must_transform'
>  * Supports both: Rewrite using ADDSUB, set type to 'can_transform'
> 
> The idea behind `can_transform` is to check the costs of the inverse
> permute needed to use the complex operation and, if this is very expensive,
> stick to addsub.  This requires the target to be able to cost the operations
> reasonably correctly.
> 
> So for ADD this looks like
> 
>  === vect_match_slp_patterns ===
>  Analyzing SLP tree 0x494e970 for patterns
>  Found ADDSUB pattern in SLP tree
>  Target does not support ADDSUB for vector type vector(4) float
>  Found COMPLEX_ADD_ROT270 pattern in SLP tree
>  Target supports COMPLEX_ADD_ROT270 vectorization with mode vector(4) float
> Pattern matched SLP tree
> node 0x494e970 (max_nunits=4, refcnt=1)
> op template: REALPART_EXPR <*_10> = _23;
>   stmt 0 REALPART_EXPR <*_10> = _23;
>   stmt 1 IMAGPART_EXPR <*_10> = _22;
>   children 0x494ea00
> node 0x494ea00 (max_nunits=4, refcnt=1)
> op template: slp_patt_39 = .ADDSUB (_23, _23);
>   stmt 0 _23 = _6 + _13;
>   stmt 1 _22 = _12 - _8;
>   children 0x494eb20 0x494ebb0
> node 0x494eb20 (max_nunits=4, refcnt=1)
> op template: _13 = REALPART_EXPR <*_3>;
>   stmt 0 _13 = REALPART_EXPR <*_3>;
>   stmt 1 _12 = IMAGPART_EXPR <*_3>;
> node 0x494ebb0 (max_nunits=4, refcnt=1)
> op: VEC_PERM_EXPR
>   { }
>   lane permutation { 0[1] 0[0] }
>   children 0x494ec40
> node 0x494ec40 (max_nunits=1, refcnt=2)
> op template: _8 = REALPART_EXPR <*_5>;
>   stmt 0 _8 = REALPART_EXPR <*_5>;
>   stmt 1 _6 = IMAGPART_EXPR <*_5>;
>   load permutation { 0 1 }
> 
> and later during optimize_slp we get
> 
> Tranforming SLP expression from ADDSUB to COMPLEX_ADD_ROT270
> processing node 0x494ebb0
> simplifying permute node 0x494ebb0
> Optimized SLP instances:
> node 0x494e970 (max_nunits=4, refcnt=1)
> op template: REALPART_EXPR <*_10> = _23;
>stmt 0 REALPART_EXPR <*_10> = _23;
>stmt 1 IMAGPART_EXPR <*_10> = _22;
>children 0x494ea00
> node 0x494ea00 (max_nunits=4, refcnt=1)
> op template: slp_patt_39 = .COMPLEX_ADD_ROT270 (_23, _23);
>stmt 0 _23 = _6 + _13;
>stmt 1 _22 = _12 - _8;
>children 0x494eb20 0x494ebb0
> node 0x494eb20 (max_nunits=4, refcnt=1)
> op template: _13 = REALPART_EXPR <*_3>;
>stmt 0 _13 = REALPART_EXPR <*_3>;
>stmt 1 _12 = IMAGPART_EXPR <*_3>;
> node 0x494ebb0 (max_nunits=4, refcnt=1)
> op: VEC_PERM_EXPR
>{ }
>lane permutation { 0[0] 0[1] }
>children 0x494ec40
> node 0x494ec40 (max_nunits=1, refcnt=2)
> op template: _8 = REALPART_EXPR <*_5>;
>stmt 0 _8 = REALPART_EXPR <*_5>;
>stmt 1 _6 = IMAGPART_EXPR <*_5>;
> 
> Now I still have to elide the VEC_PERM_EXPR here but that's easy.
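
As a cross-check of the ADDSUB -> COMPLEX_ADD_ROT270 mapping shown in the
dumps above, the following sketch writes out the two scalar statements from
the SLP node (real = a.re + b.im, imag = a.im - b.re) and compares them with
adding b rotated by 270 degrees, i.e. multiplied by -i.  The function names
are hypothetical, and a and b stand for the values loaded from *_3 and *_5
respectively.

```cpp
#include <complex>

// Lane-wise form taken from the dump: lane 0 adds, lane 1 subtracts,
// with the lanes of the second operand swapped.
inline std::complex<float>
addsub_swapped (std::complex<float> a, std::complex<float> b)
{
  return std::complex<float> (a.real () + b.imag (),
                              a.imag () - b.real ());
}

// Reference form: a plus b rotated by 270 degrees (b times -i).
inline std::complex<float>
add_rot270 (std::complex<float> a, std::complex<float> b)
{
  return a + b * std::complex<float> (0.0f, -1.0f);
}
```

The lane swap on b is what the VEC_PERM_EXPR node expresses in the first
dump; once the complex operation is chosen, the swap is implied by the
operation itself.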

So having skimmed half of the patch - this means SLP pattern recog
will initially recognize { +, -, +, - } as ADDSUB for example
but not factor in lane permutes from loads yet.  Now, suppose we
have { +, -, -, + } seen in pattern recog - how's that handled?
It might feed operations where we'd like to have inputs permuted
and thus would end up with ADDSUB in the end?

That said, you do

+ /* Check to see if this node can be transformed during permute
+materialization.  */
+ bool patt_trans_cand_p = is_pattern_convert_candidate_p (node);
+ if (patt_trans_cand_p)
+   bitmap_set_bit (n_transform, idx);

in the propagation stage (but this looks like a static marking).

And then just for each of those candidates you somehow "undo" the
permutes on the 

Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-22 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
>> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>>/* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>>   inside the loop (in case we are analyzing an outer-loop).  */
>>vect_unpromoted_value unprom0[2];
>> +  enum optab_subtype subtype = optab_vector;
>>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
>> - false, 2, unprom0, &half_type))
>> + false, 2, unprom0, &half_type, &subtype))
>> +return NULL;
>> +
>> +  if (subtype == optab_vector_mixed_sign
>> +  && TYPE_UNSIGNED (unprom_mult.type)
>> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>>  return NULL;
>
> Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> I.e. we need to reject the case in which we multiply a signed and an
> unsigned value to get a (logically) signed result, but then zero-extend
> it (rather than sign-extend it) to the precision of the addition.
>
> That would make the test:
>
>   if (subtype == optab_vector_mixed_sign
>   && TYPE_UNSIGNED (unprom_mult.type)
>   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> return NULL;
>   
> instead.

And folding that into the existing test gives:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
          ? TYPE_UNSIGNED (unprom_mult.type)
          : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;

Thanks,
Richard
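
A small numeric illustration of the case the check above is meant to
reject, written as ordinary C++ rather than vectorizer code.  A signed
8-bit value times an unsigned 8-bit value gives a logically signed 16-bit
product; widening that product to 32 bits by zero extension instead of sign
extension changes negative results.  The function names are hypothetical
and the widths are chosen only for the example.

```cpp
#include <cstdint>

// Product held in a signed 16-bit type, then sign-extended to 32 bits.
inline std::int32_t
widen_sign_extend (std::int8_t a, std::uint8_t b)
{
  std::int16_t prod = static_cast<std::int16_t> (a * b);
  return prod;
}

// Same product held in an unsigned 16-bit type, so the widening to
// 32 bits is a zero extension.
inline std::int32_t
widen_zero_extend (std::int8_t a, std::uint8_t b)
{
  std::uint16_t prod = static_cast<std::uint16_t> (a * b);
  return prod;
}
```

For non-negative products the two agree, which is why only the combination
of a mixed-sign multiply and an unsigned intermediate type narrower than the
accumulation type needs rejecting.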


[PATCH] middle-end/101156 - remove not working optimization in gimplification

2021-06-22 Thread Richard Biener
This removes a premature and non-working optimization from the
gimplifier.  When gimplification is requested not to produce an SSA
name but produced one anyway, we try to avoid generating a copy and
instead replace the LHS of its definition.  But that only works if
there are no uses of the SSA name yet, which is something we cannot
easily check, so the following removes said optimization.

Statistics on the whole bootstrap show we hit this optimization
only for libiberty/cp-demangle.c and overall we have 21652112
gimplifications where just 240 copies are elided.  Preserving
the optimization would require scanning the original expression
and the pre and post sequences for SSA names and uses, which seems
excessive to avoid these 240 copies.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-06-22  Richard Biener  

PR middle-end/101156
* gimplify.c (gimplify_expr): Remove premature incorrect
optimization.

* gcc.dg/pr101156.c: New testcase.
---
 gcc/gimplify.c  | 15 +--
 gcc/testsuite/gcc.dg/pr101156.c |  8 
 2 files changed, 9 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101156.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 41bae9c188f..21e7a6cc959 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -15128,24 +15128,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   bool (*gimple_test_f) (tree), fallback_t fallback,
   bool allow_ssa)
 {
-  bool was_ssa_name_p = TREE_CODE (*expr_p) == SSA_NAME;
   enum gimplify_status ret = gimplify_expr (expr_p, pre_p, post_p,
gimple_test_f, fallback);
   if (! allow_ssa
   && TREE_CODE (*expr_p) == SSA_NAME)
-{
-  tree name = *expr_p;
-  if (was_ssa_name_p)
-   *expr_p = get_initialized_tmp_var (*expr_p, pre_p, NULL, false);
-  else
-   {
- /* Avoid the extra copy if possible.  */
- *expr_p = create_tmp_reg (TREE_TYPE (name));
- if (!gimple_nop_p (SSA_NAME_DEF_STMT (name)))
-   gimple_set_lhs (SSA_NAME_DEF_STMT (name), *expr_p);
- release_ssa_name (name);
-   }
-}
+*expr_p = get_initialized_tmp_var (*expr_p, pre_p, NULL, false);
   return ret;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr101156.c b/gcc/testsuite/gcc.dg/pr101156.c
new file mode 100644
index 000..5c25bd78a02
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101156.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-fchecking" } */
+
+struct S { int i; };
+void baz(struct S *p)
+{
+  __builtin_setjmp(p--);
+}
-- 
2.26.2


Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-22 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review.

Just concentrating on tree-vect-patterns.c, as before:

Tamar Christina  writes:
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>   TYPE_PRECISION (new_type));
>precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> + precision to perform the operation.  */
>if (precision * 2 > TYPE_PRECISION (type))
>  return false;
>  

Not sure what the comment means by “application” here, but the common
type we pick is signed rather than unsigned.

> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> tree_code widened_code, bool shift_p,
> unsigned int max_nops,
> -   vect_unpromoted_value *unprom, tree *common_type)
> +   vect_unpromoted_value *unprom, tree *common_type,
> +   enum optab_subtype *subtype = NULL)
>  {
>/* Check for an integer operation with the right code.  */
>gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>   = vinfo->lookup_def (this_unprom->op);
> nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  widened_code, shift_p, max_nops,
> -this_unprom, common_type);
> +this_unprom, common_type,
> +subtype);
> if (nops == 0)
>   return 0;
>  
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>   *common_type = this_unprom->type;
> else if (!vect_joust_widened_type (type, this_unprom->type,
>common_type))
> - return 0;
> + {
> +   if (subtype)
> + {

AIUI, if we get here then:

- there must be one unsigned operand (A) of precision P
- there must be one signed operand (B) with precision <= P
- we can't extend to precision 2*P 

A conversion is needed if B's precision is < P.
That conversion should be to a signed type with precision P.

So…

> +   tree new_type = *common_type;
> +   /* See if we can sign extend the smaller type.  */
> +   if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
> +   && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))

…I think this second line could be an assert and

> + new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);

…picking an unsigned type here looks wrong.  The net effect would
be to convert B (the previous signed operand) to an unsigned type.

> +
> +   if (tree_nop_conversion_p (this_unprom->type, new_type))
> + {
> +   *subtype = optab_vector_mixed_sign;
> +   *common_type = new_type;
> + }

IMO the sign of the common type shouldn't matter for optab_vector_mixed_sign:
if we need to convert operands later, it should be to the precision of
the common type but retaining the sign of the original type.
So I think it would be simpler to do:

  if (TYPE_PRECISION (this_unprom->type)
      > TYPE_PRECISION (*common_type))
    *common_type = this_unprom->type;
  *subtype = optab_vector_mixed_sign;

here and adjust the conversion code as described below.

This also has the advantage of coping with > 2 operands, in case that
ever becomes important in future.

> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
>  }
>  
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
>  
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>tree *result, tree type, vect_unpromoted_value *unprom,
> -  tree vectype)
> +  tree vectype, enum optab_subtype subtype = optab_default)
>  {
>for (unsigned int i = 0; i < n; ++i)
>  {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>for (j = 0; j < i; ++j)
>   if (unprom[j].op == unprom[i].op)
> break;
> +
>if (j < i)
>   result[i] = result[j];
> +  else if (subtype == 

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 22, 2021 at 12:34 PM Richard Biener  wrote:
>
> On Tue, 22 Jun 2021, Uros Bizjak wrote:
>
> > On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford
> >  wrote:
> >
> > > >> Well, the pattern is called addsub in the x86 world because highpart
> > > >> does add and lowpart does sub. In left-to-right writing systems
> > > >> highpart comes before lowpart, so you have addsub.
> > > >
> > > > The other targets mentioned do not seem to agree but I can live
> > > > with that, thus I'll change back to addsub.
> > >
> > > FWIW, subadd sounds clearer to me too.  It seems surprising to put
> > > imaginary before real when interpreting something as complex, for example.
> > >
> > > Putting the highpart first feels especially odd on an LE system like x86…
> >
> > The XMM vector is documented left to right with MSB at the left (c.f.
> > most significant *DIGIT* of the number at the left)
> >
> > xmm[MSB]  xmm[LSB]
> >
> > so, looking at x86 ADDSUBPD insn documentation:
> >
> > xmm2[127:64] xmm2[63:0]
> > ( + -)
> > xmm1[127:64] xmm1[63:0]
> > (=)
> > xmm1[127:64] holds ADD
> > xmm1[63:0] holds SUB
> >
> > xmm1[127:64] xmm1 [63:0]
> > ADD SUB
>
> I think if we really want to resolve the "ambiguity" we have to
> spell it out in the optab.  vec_addodd_subeven or vec_addsub_oddeven.
> As I noted there are targets who have the opposite so we could
> then add vec_addsub_evenodd (not vec_subadd_evenodd).
>
> Just tell me what you prefer - I'll adjust the patch accordingly.

I'd use addsub when add comes at the left of sub, and MSB is also
considered at the left. subadd for when sub comes at the left of add
where MSB is also at the left.

I think that the documentation should clear up the ambiguity.

Otherwise, just my 0.02€, I don't want to bikeshed about this ad
infinitum, so I'll stop there.

Uros.
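
For reference, a scalar model of the ADDSUBPD behaviour described in the
diagram above, with element 0 standing for bits [63:0] and element 1 for
bits [127:64].  The function name is hypothetical and the model ignores
everything except the arithmetic.

```cpp
#include <array>

// Scalar model of ADDSUBPD on two vectors of two doubles.
// Index 0 is the low half (bits 63:0), index 1 the high half (bits 127:64).
inline std::array<double, 2>
addsubpd_model (std::array<double, 2> xmm1, std::array<double, 2> xmm2)
{
  return {{ xmm1[0] - xmm2[0],     // low half holds SUB
            xmm1[1] + xmm2[1] }};  // high half holds ADD
}
```

Read in memory or lane order the result is { sub, add }, while read with
the MSB on the left it is ADD SUB, which is exactly the naming disagreement
in this thread.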


Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Uros Bizjak wrote:

> On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford
>  wrote:
> 
> > >> Well, the pattern is called addsub in the x86 world because highpart
> > >> does add and lowpart does sub. In left-to-right writing systems
> > >> highpart comes before lowpart, so you have addsub.
> > >
> > > The other targets mentioned do not seem to agree but I can live
> > > with that, thus I'll change back to addsub.
> >
> > FWIW, subadd sounds clearer to me too.  It seems surprising to put
> > imaginary before real when interpreting something as complex, for example.
> >
> > Putting the highpart first feels especially odd on an LE system like x86…
> 
> The XMM vector is documented left to right with MSB at the left (c.f.
> most significant *DIGIT* of the number at the left)
> 
> xmm[MSB]  xmm[LSB]
> 
> so, looking at x86 ADDSUBPD insn documentation:
> 
> xmm2[127:64] xmm2[63:0]
> ( + -)
> xmm1[127:64] xmm1[63:0]
> (=)
> xmm1[127:64] holds ADD
> xmm1[63:0] holds SUB
> 
> xmm1[127:64] xmm1 [63:0]
> ADD SUB

I think if we really want to resolve the "ambiguity" we have to
spell it out in the optab.  vec_addodd_subeven or vec_addsub_oddeven.
As I noted there are targets who have the opposite so we could
then add vec_addsub_evenodd (not vec_subadd_evenodd).

Just tell me what you prefer - I'll adjust the patch accordingly.

Richard.


Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford
 wrote:

> >> Well, the pattern is called addsub in the x86 world because highpart
> >> does add and lowpart does sub. In left-to-right writing systems
> >> highpart comes before lowpart, so you have addsub.
> >
> > The other targets mentioned do not seem to agree but I can live
> > with that, thus I'll change back to addsub.
>
> FWIW, subadd sounds clearer to me too.  It seems surprising to put
> imaginary before real when interpreting something as complex, for example.
>
> Putting the highpart first feels especially odd on an LE system like x86…

The XMM vector is documented left to right with MSB at the left (c.f.
most significant *DIGIT* of the number at the left)

xmm[MSB]  xmm[LSB]

so, looking at x86 ADDSUBPD insn documentation:

xmm2[127:64] xmm2[63:0]
( + -)
xmm1[127:64] xmm1[63:0]
(=)
xmm1[127:64] holds ADD
xmm1[63:0] holds SUB

xmm1[127:64] xmm1 [63:0]
ADD SUB

Uros.


[PATCH] tree-optimization/101151 - fix irreducible region check for sinking

2021-06-22 Thread Richard Biener
The check for whether two blocks are in the same irreducible region,
which makes post-dominance checks unreliable, was incomplete: an
irreducible region can contain reducible sub-regions, and if one
block is in the irreducible part and the other is not, the check
still doesn't work as expected.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-22  Richard Biener  

PR tree-optimization/101151
* tree-ssa-sink.c (statement_sink_location): Expand irreducible
region check.

* gcc.dg/torture/pr101151.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101151.c | 19 +++
 gcc/tree-ssa-sink.c |  9 -
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101151.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101151.c b/gcc/testsuite/gcc.dg/torture/pr101151.c
new file mode 100644
index 000..15c9a7b7f57
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101151.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+int a, *b = &a, c, d;
+int main() {
+  *b;
+  if (a) {
+  L1:
+a = 0;
+  L2:
+if (d) {
+  while (b)
+;
+  goto L1;
+}
+  }
+  if (c)
+goto L2;
+  return 0;
+}
diff --git a/gcc/tree-ssa-sink.c b/gcc/tree-ssa-sink.c
index d252cbb5c51..92f444ec1c8 100644
--- a/gcc/tree-ssa-sink.c
+++ b/gcc/tree-ssa-sink.c
@@ -398,7 +398,14 @@ statement_sink_location (gimple *stmt, basic_block frombb,
  && dominated_by_p (CDI_POST_DOMINATORS, commondom, bb)
  /* If the blocks are possibly within the same irreducible
 cycle the above check breaks down.  */
- && !(bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP))
+ && !((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
+  && bb->loop_father == commondom->loop_father)
+ && !((commondom->flags & BB_IRREDUCIBLE_LOOP)
+  && flow_loop_nested_p (commondom->loop_father,
+ bb->loop_father))
+ && !((bb->flags & BB_IRREDUCIBLE_LOOP)
+  && flow_loop_nested_p (bb->loop_father,
+ commondom->loop_father)))
continue;
  bb = EDGE_PRED (bb, PHI_ARG_INDEX_FROM_USE (use_p))->src;
}
-- 
2.26.2


[PATCH resend] testsuite: avoid no-stack-protector-attr-3 fail on mips*-*-*

2021-06-22 Thread Xi Ruoyao via Gcc-patches
[Resend because the original subject missed "[PATCH]" and the path in
ChangeLog is wrong.]

On MIPS a call to __stack_chk_fail needs an additional .reloc pseudo-op,
so "stack_chk_fail" will appear two times.

gcc/testsuite/

* g++.dg/no-stack-protector-attr-3.C (dg-final): Adjust for
  MIPS.
---
 gcc/testsuite/g++.dg/no-stack-protector-attr-3.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/no-stack-protector-attr-3.C b/gcc/testsuite/g++.dg/no-stack-protector-attr-3.C
index 56a4e74da50..76a5ec08681 100644
--- a/gcc/testsuite/g++.dg/no-stack-protector-attr-3.C
+++ b/gcc/testsuite/g++.dg/no-stack-protector-attr-3.C
@@ -20,4 +20,5 @@ int __attribute__((stack_protect)) bar()
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "stack_chk_fail" 1 } } */
+/* { dg-final { scan-assembler-times "stack_chk_fail" 1 { target { ! mips*-*-* } } } }*/
+/* { dg-final { scan-assembler-times "stack_chk_fail" 2 { target { mips*-*-* } } } }*/
-- 
2.32.0





Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, 17 Jun 2021, Uros Bizjak wrote:
>
>> On Thu, Jun 17, 2021 at 11:44 AM Richard Biener  wrote:
>> >
>> > This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
>> > instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... }
>> > thus subtract, add alternating on lanes, starting with subtract.
>> >
>> > It adds a corresponding optab and direct internal function,
>> > vec_subadd$a3 and at the moment to make the i386 backend changes
>> > "obvious", duplicates the existing avx_addsubv4df3 pattern with
>> > the new canonical name (CODE_FOR_* and gen_* are used throughout the
>> > intrinsic code, so the actual change to rename all existing patterns
>> > will be quite a bit bigger).  I expect some bike-shedding on
>> > subadd vs. addsub so I delay that change ;)
>> 
>> Well, the pattern is called addsub in the x86 world because highpart
>> does add and lowpart does sub. In left-to-right writing systems
>> highpart comes before lowpart, so you have addsub.
>
> The other targets mentioned do not seem to agree but I can live
> with that, thus I'll change back to addsub.

FWIW, subadd sounds clearer to me too.  It seems surprising to put
imaginary before real when interpreting something as complex, for example.

Putting the highpart first feels especially odd on an LE system like x86…

Richard


Re: [PATCH] testsuite: add -fwrapv for 950704-1.c

2021-06-22 Thread Xi Ruoyao via Gcc-patches
On Tue, 2021-06-22 at 10:37 +0200, Richard Biener wrote:
> On Mon, Jun 21, 2021 at 6:53 PM Xi Ruoyao via Gcc-patches
>  wrote:
> > 
> > This test relies on wrap behavior of signed overflow.  Without -fwrapv
> > it is known to fail on mips (and maybe some other targets as well).
> 
> OK.

I don't have write access, requiring a maintainer to commit the change.
-- 
Xi Ruoyao 
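
For context: signed integer overflow is undefined behavior in C and C++, and -fwrapv asks GCC to treat it as two's-complement wraparound instead. The following is a minimal C++ sketch, not the code from 950704-1.c, that computes the wrapped sum in a well-defined way using GCC's __builtin_add_overflow. The helper name wrapping_add is made up for illustration.

```cpp
#include <climits>

// Hypothetical helper (not part of the testsuite): returns a + b with
// two's-complement wraparound and reports whether the exact sum did
// not fit in long long.  __builtin_add_overflow is a GCC/Clang builtin
// that stores the wrapped result and returns true on overflow.
inline long long wrapping_add(long long a, long long b, bool& overflowed)
{
    long long result;
    overflowed = __builtin_add_overflow(a, b, &result);
    return result;
}
```

Code that instead writes a + b directly and expects LLONG_MAX + 1 to yield LLONG_MIN is only guaranteed to behave that way when built with -fwrapv.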



[ARM] PR66791: Gate comparison in vca intrinsics on __FAST_MATH__

2021-06-22 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
The attached patch gates abs(__a) cmp abs(__b) for vca intrinsics on
__FAST_MATH__. I moved vabs intrinsics before vcage_f32 since vca
intrinsics use those.
Bootstrapped+tested on arm-linux-gnueabihf.
OK to commit ?

Thanks,
Prathamesh
2021-06-22  Prathamesh Kulkarni  

PR target/66791
* gcc/config/arm_neon.h: Move vabs intrinsics before vcage_f32.
(vcage_f32): Gate comparison conditionally on __FAST_MATH__.
(vcageq_f32): Likewise.
(vcale_f32): Likewise.
(vcaleq_f32): Likewise.
(vcagt_f32): Likewise.
(vcagtq_f32): Likewise.
(vcalt_f32): Likewise.
(vcaltq_f32): Likewise.
(vcage_f16): Likewise.
(vcageq_f16): Likewise.
(vcale_f16): Likewise.
(vcaleq_f16): Likewise.
(vcagt_f16): Likewise.
(vcagtq_f16): Likewise.
(vcalt_f16): Likewise.
(vcaltq_f16): Likewise.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index dcd533fd003..4f81a55234c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -2859,60 +2859,189 @@ vcltq_u32 (uint32x4_t __a, uint32x4_t __b)
   return (__a < __b);
 }
 
+__extension__ extern __inline int8x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabs_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vabsv8qi (__a);
+}
+
+__extension__ extern __inline int16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabs_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vabsv4hi (__a);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabs_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vabsv2si (__a);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabs_f32 (float32x2_t __a)
+{
+  return (float32x2_t)__builtin_neon_vabsv2sf (__a);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabsq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vabsv16qi (__a);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabsq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vabsv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabsq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vabsv4si (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vabsq_f32 (float32x4_t __a)
+{
+  return (float32x4_t)__builtin_neon_vabsv4sf (__a);
+}
+
+__extension__ extern __inline int8x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabs_s8 (int8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vqabsv8qi (__a);
+}
+
+__extension__ extern __inline int16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabs_s16 (int16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vqabsv4hi (__a);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabs_s32 (int32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vqabsv2si (__a);
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabsq_s8 (int8x16_t __a)
+{
+  return (int8x16_t)__builtin_neon_vqabsv16qi (__a);
+}
+
+__extension__ extern __inline int16x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabsq_s16 (int16x8_t __a)
+{
+  return (int16x8_t)__builtin_neon_vqabsv8hi (__a);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vqabsq_s32 (int32x4_t __a)
+{
+  return (int32x4_t)__builtin_neon_vqabsv4si (__a);
+}
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vcage_f32 (float32x2_t __a, float32x2_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint32x2_t) (vabs_f32 (__a) >= vabs_f32 (__b));
+#else
   return (uint32x2_t)__builtin_neon_vcagev2sf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vcageq_f32 (float32x4_t __a, float32x4_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint32x4_t) (vabsq_f32 (__a) >= vabsq_f32 (__b));
+#else
   return (uint32x4_t)__builtin_neon_vcagev4sf (__a, __b);
+#endif
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vcale_f32 (float32x2_t __a, float32x2_t __b)
 {
+#ifdef __FAST_MATH__
+  return (uint32x2_t) (vabs_f32 (__a) <= vabs_f32 (__b));
+#else
   return (uint32x2_t)__builtin_neon_vcagev2sf (__b, __a);
+#endif
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__  

[committed] testsuite: Add pthread check to dg-module-cmi for omp module testing

2021-06-22 Thread Kito Cheng
gcc/testsuite:

* g++.dg/modules/omp-1_a.C: Check pthread is available for
dg-module-cmi.
* g++.dg/modules/omp-2_a.C: Ditto.
---
 gcc/testsuite/g++.dg/modules/omp-1_a.C | 2 +-
 gcc/testsuite/g++.dg/modules/omp-2_a.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/modules/omp-1_a.C b/gcc/testsuite/g++.dg/modules/omp-1_a.C
index 94e1171f03c..7ddb776d6a1 100644
--- a/gcc/testsuite/g++.dg/modules/omp-1_a.C
+++ b/gcc/testsuite/g++.dg/modules/omp-1_a.C
@@ -2,7 +2,7 @@
 // { dg-require-effective-target pthread }
 
 export module foo;
-// { dg-module-cmi foo }
+// { dg-module-cmi foo { target pthread } }
 
 export inline void frob (unsigned ()[64])
 {
diff --git a/gcc/testsuite/g++.dg/modules/omp-2_a.C b/gcc/testsuite/g++.dg/modules/omp-2_a.C
index b0d4bbc6e8a..e030ac7acf7 100644
--- a/gcc/testsuite/g++.dg/modules/omp-2_a.C
+++ b/gcc/testsuite/g++.dg/modules/omp-2_a.C
@@ -2,7 +2,7 @@
 // { dg-require-effective-target pthread }
 
 export module foo;
-// { dg-module-cmi foo }
+// { dg-module-cmi foo { target pthread } }
 
 // The OpenMPness doesn't escape to the interface.
 export void frob (unsigned ()[64])
-- 
2.31.1



Re: [PATCH] RISC-V: Add tune info for T-HEAD C906.

2021-06-22 Thread Kito Cheng via Gcc-patches
Thanks, committed :)

On Mon, Jun 21, 2021 at 8:44 PM Jojo R via Gcc-patches
 wrote:
>
> gcc/
> * config/riscv/riscv.c (thead_c906_tune_info): New.
> * config/riscv/riscv.c (riscv_tune_info_table): Use new tune.
> ---
>  gcc/config/riscv/riscv.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 1baa2990ee27..576960bb37cb 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -300,6 +300,19 @@ static const struct riscv_tune_param sifive_7_tune_info = {
>true,/* slow_unaligned_access */
>  };
>
> +/* Costs to use when optimizing for T-HEAD c906.  */
> +static const struct riscv_tune_param thead_c906_tune_info = {
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_add */
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
> +  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
> +  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
> +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
> +  1,/* issue_rate */
> +  3,/* branch_cost */
> +  5,/* memory_cost */
> +  false,/* slow_unaligned_access */
> +};
> +
>  /* Costs to use when optimizing for size.  */
>  static const struct riscv_tune_param optimize_size_tune_info = {
>{COSTS_N_INSNS (1), COSTS_N_INSNS (1)},  /* fp_add */
> @@ -348,6 +361,7 @@ static const struct riscv_tune_info riscv_tune_info_table[] = {
>{ "sifive-3-series", generic, _tune_info },
>{ "sifive-5-series", generic, _tune_info },
>{ "sifive-7-series", sifive_7, _7_tune_info },
> +  { "thead-c906", generic, _c906_tune_info },
>{ "size", generic, _size_tune_info },
>  };
>
> --
> 2.24.3 (Apple Git-128)
>


Re: [committed] libstdc++: Improve std::lock algorithm

2021-06-22 Thread Matthias Kretz
On Monday, 21 June 2021 19:31:59 CEST Jonathan Wakely via Gcc-patches wrote:
> diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
> index d4c5d13f654..5f2d8f9ee7b 100644
> --- a/libstdc++-v3/include/std/mutex
> +++ b/libstdc++-v3/include/std/mutex
> @@ -512,47 +512,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif // _GLIBCXX_HAS_GTHREADS
> 
>/// @cond undocumented
> -  template
> -inline unique_lock<_Lock>
> -__try_to_lock(_Lock& __l)
> -{ return unique_lock<_Lock>{__l, try_to_lock}; }
> +  namespace __detail
> +  {
> +template
> +  inline unique_lock<_Lockable>
> +  __try_to_lock(_Lockable& __l)
> +  { return unique_lock<_Lockable>{__l, try_to_lock}; }
> 
> -  template
> -struct __try_lock_impl
> -{
> -  template
> -   static void
> -   __do_try_lock(tuple<_Lock&...>& __locks, int& __idx)
> -   {
> -  __idx = _Idx;
> -  auto __lock = std::__try_to_lock(std::get<_Idx>(__locks));
> -  if (__lock.owns_lock())
> -{
> - constexpr bool __cont = _Idx + 2 < sizeof...(_Lock);
> - using __try_locker = __try_lock_impl<_Idx + 1, __cont>;
> - __try_locker::__do_try_lock(__locks, __idx);
> -  if (__idx == -1)
> -__lock.release();
> -}
> -   }
> -};
> +// Lock the last element of the tuple, after all previous ones are locked.
> +template
> +  inline __enable_if_t<_Idx + 1 == sizeof...(_Lockables), int>
> +  __try_lock_impl(tuple<_Lockables&...>& __lockables)

Couldn't you drop the need for enable_if and tuple if you define the function 
like this? (Or, without constexpr if, as two overloads: 
__try_lock_impl(_L1& __l1) and __try_lock_impl(_L1& __l1, _L2& __l2, 
_Lockables&... __lockables).)

template
  inline int
  __try_lock_impl(_L1& __l1, _Lockables&... __lockables)
  {
if (auto __lock = __detail::__try_to_lock(__l1))
  {
if constexpr (sizeof...(_Lockables))
  {
int __idx = __detail::__try_lock_impl(__lockables...);
if (__idx >= 0)
  return __idx + 1;
  }
__lock.release();
return -1;
  }
else
  return 0;
  }
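
A self-contained sketch of the same idea outside libstdc++, for experimentation (it needs C++17 for constexpr if). The names try_lock_all and FlagLock are invented here, and FlagLock is only a single-threaded stand-in for a real mutex. The return convention follows the snippet above: -1 when everything was locked, otherwise the zero-based index of the lockable that could not be acquired.

```cpp
#include <mutex>

// Hypothetical single-threaded lockable, used only so the behaviour
// can be observed without involving real threads.
struct FlagLock {
    bool held = false;
    void lock() { held = true; }
    bool try_lock() { if (held) return false; held = true; return true; }
    void unlock() { held = false; }
};

// Try to lock every argument in order.  Returns -1 if all were locked
// (and leaves them locked), otherwise the index of the first lockable
// whose try_lock() failed (and leaves nothing locked).
template<typename L1, typename... Ls>
int try_lock_all(L1& l1, Ls&... rest)
{
    if (std::unique_lock<L1> lock{l1, std::try_to_lock})
    {
        if constexpr (sizeof...(Ls) > 0)
        {
            int idx = try_lock_all(rest...);
            if (idx >= 0)
                return idx + 1;   // lock's destructor unlocks l1
        }
        lock.release();           // success: keep l1 locked
        return -1;
    }
    else
        return 0;                 // l1 itself could not be locked
}
```

The unique_lock guard is what makes the failure path simple: returning early unlocks everything acquired so far, and only the success path calls release().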

> [...]
> +template
> +  void
> +  __lock_impl(int& __i, int __depth, _L0& __l0, _L1&... __l1)
> +  {

How about optimizing a likely common case where all lockables have the same 
type? In that case we don't require recursion and can manage stack usage much 
more simply:

  if constexpr ((is_same_v<_L0, _L1> && ...))
{
  constexpr int _Np = 1 + sizeof...(_L1);
  std::array, _Np> __locks = {
{__l0, defer_lock}, {__l1, defer_lock}...
  };
  int __first = 0;
  do {
__locks[__first].lock();
for (int __j = 1; __j < _Np; ++__j)
  {
const int __idx = (__first + __j) % _Np;
if (!__locks[__idx].try_lock())
  {
for (int __k = __idx; __k != __first;
__k = __k == 1 ? _Np : __k - 1)
  __locks[__k - 1].unlock();
__first = __idx;
break;
  }
  }
  } while (!__locks[__first]);
  for (int __j = 0; __j < _Np; ++__j)
__locks[__j].release();
}
  else


> +   while (__i >= __depth)
> + {
> +   if (__i == __depth)
> + {
> +   int __failed = 1; // index that couldn't be locked
> +   {
> + unique_lock<_L0> __first(__l0);
> + auto __rest = std::tie(__l1...);
> + __failed += __detail::__try_lock_impl<0>(__rest);
> + if (!__failed)
> +   {
> + __i = -1; // finished
> + __first.release();
> + return;
> +   }
> +   }
> +#ifdef _GLIBCXX_USE_SCHED_YIELD
> +   __gthread_yield();
> +#endif
> +   constexpr auto __n = 1 + sizeof...(_L1);
> +   __i = (__depth + __failed) % __n;
> + }
> +   else // rotate left until l_i is first.
> + __detail::__lock_impl(__i, __depth + 1, __l1..., __l0);
> + }
> +  }
> +
> +  } // namespace __detail
> +  /// @endcond
> +
>/** @brief Generic lock.
> *  @param __l1 Meets Lockable requirements (try_lock() may throw).
> *  @param __l2 Meets Lockable requirements (try_lock() may throw).
> @@ -590,19 +627,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  void
>  lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>  {
> -  while (true)
> -{
> -  using __try_locker = __try_lock_impl<0, sizeof...(_L3) != 0>;
> -  unique_lock<_L1> __first(__l1);
> -  int __idx;
> -  auto __locks = std::tie(__l2, __l3...);
> -  __try_locker::__do_try_lock(__locks, __idx);
> -  

[PATCH] tree-optimization/101154 - fix out-of bound access in SLP

2021-06-22 Thread Richard Biener
This fixes an out-of-bound access of matches.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-22  Richard Biener  

PR tree-optimization/101154
* tree-vect-slp.c (vect_build_slp_tree_2): Fix out-of-bound access.
---
 gcc/tree-vect-slp.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index a32f86b8bc7..b9f91e7c7ba 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1963,15 +1963,15 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  if (dt == vect_constant_def
  || dt == vect_external_def)
{
-   /* We can always build those.  Might want to sort last
-  or defer building.  */
-  vec ops;
-  ops.create (group_size);
-  for (lane = 0; lane < group_size; ++lane)
-ops.quick_push (chains[lane][n].op);
-  slp_tree child = vect_create_new_slp_node (ops);
-  SLP_TREE_DEF_TYPE (child) = dt;
-  children.safe_push (child);
+ /* We can always build those.  Might want to sort last
+or defer building.  */
+ vec ops;
+ ops.create (group_size);
+ for (lane = 0; lane < group_size; ++lane)
+   ops.quick_push (chains[lane][n].op);
+ slp_tree child = vect_create_new_slp_node (ops);
+ SLP_TREE_DEF_TYPE (child) = dt;
+ children.safe_push (child);
}
  else if (dt != vect_internal_def)
{
@@ -2036,9 +2036,10 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
dump_printf_loc (MSG_NOTE, vect_location,
 "failed to match up op %d\n", n);
  op_stmts.release ();
- matches[lane] = false;
  if (lane != group_size - 1)
matches[0] = false;
+ else
+   matches[lane] = false;
  goto out;
}
  if (dump_enabled_p ())
-- 
2.26.2


[PATCH] tree-optimization/101159 - fix missing NULL check in popcount pattern

2021-06-22 Thread Richard Biener
This fixes a missing check for a NULL vectype in the new popcount
pattern.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-22  Richard Biener  

PR tree-optimization/101159
* tree-vect-patterns.c (vect_recog_popcount_pattern): Add
missing NULL vectype check.
---
 gcc/tree-vect-patterns.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 59727056dc7..d0a5c71dbe4 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1385,9 +1385,9 @@ vect_recog_popcount_pattern (vec_info *vinfo,
   vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt);
   vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
   /* Do it only the backend existed popcount2.  */
-  if (!direct_internal_fn_supported_p (IFN_POPCOUNT,
-  vec_type,
-  OPTIMIZE_FOR_SPEED))
+  if (!vec_type
+  || !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type,
+ OPTIMIZE_FOR_SPEED))
 return NULL;
 
   /* Create B = .POPCOUNT (A).  */
-- 
2.26.2


[PATCH] tree-optimization/101158 - adjust SLP call matching sequence

2021-06-22 Thread Richard Biener
This moves the check for same operands after verifying we're
facing compatible calls.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-22  Richard Biener  

PR tree-optimization/101158
* tree-vect-slp.c (vect_build_slp_tree_1): Move same operand
checking after checking for matching operation.

* gfortran.dg/pr101158.f90: New testcase.
---
 gcc/testsuite/gfortran.dg/pr101158.f90 | 25 +
 gcc/tree-vect-slp.c| 31 +-
 2 files changed, 41 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr101158.f90

diff --git a/gcc/testsuite/gfortran.dg/pr101158.f90 b/gcc/testsuite/gfortran.dg/pr101158.f90
new file mode 100644
index 000..9a4d9a2d7ae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101158.f90
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-O1 -ftree-slp-vectorize -fwrapv" }
+! { dg-additional-options "-march=armv8-a+sve" { target aarch64-*-* } }
+
+subroutine sprpl5 (left)
+  implicit none
+
+  integer :: left
+  integer :: avail1, avail2, delx1, delx2, i2, ic
+
+  ic = left
+  delx1 = ic / 2
+  delx2 = delx1 + 1
+  i2 = ic + delx2
+  avail1 = i2
+  avail2 = 1
+
+  do delx1 = 1, 2
+ ic = left + nint (real (left) / 2)
+ if (ic .ge. avail1) avail1 = ic + 1
+
+ i2 = ic + delx2
+ if (i2 .le. avail2) avail2 = i2 + 1
+  end do
+end subroutine sprpl5
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9f91e7c7ba..6c98acbe722 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1177,21 +1177,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  continue;
}
 
- if (need_same_oprnds)
-   {
- tree other_op1 = (call_stmt
-   ? gimple_call_arg (call_stmt, 1)
-   : gimple_assign_rhs2 (stmt));
- if (!operand_equal_p (first_op1, other_op1, 0))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: different shift "
-"arguments in %G", stmt);
- /* Mismatch.  */
- continue;
-   }
-   }
  if (!load_p
  && first_stmt_code == BIT_FIELD_REF
  && (TREE_OPERAND (gimple_assign_rhs1 (first_stmt_info->stmt), 0)
@@ -1231,6 +1216,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  continue;
}
 
+ if (need_same_oprnds)
+   {
+ tree other_op1 = (call_stmt
+   ? gimple_call_arg (call_stmt, 1)
+   : gimple_assign_rhs2 (stmt));
+ if (!operand_equal_p (first_op1, other_op1, 0))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: different shift "
+"arguments in %G", stmt);
+ /* Mismatch.  */
+ continue;
+   }
+   }
+
  if (!types_compatible_p (vectype, *node_vectype))
{
  if (dump_enabled_p ())
-- 
2.26.2


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Biener
On Tue, 22 Jun 2021, Richard Sandiford wrote:

> Kees Cook  writes:
> > On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> >> So, if “pattern value” is “0x”, then it’s a valid 
> >> canonical virtual memory address.  However, for most OS, 
> >> “0x” should be not in user space.
> >> 
> >> My question is, is “0xF” good for pointer? Or 
> >> “0x” better?
> >
> > I think 0xFF repeating is fine for this version. Everything else is a
> > "nice to have" for the pattern-init, IMO. :)
> 
> Sorry to be awkward, but 0xFF seems worse than 0xAA to me.
> 
> For integer types, all values are valid representations, and we're
> relying on the pattern being “obviously” wrong in context.  0x…
> is unlikely to be a correct integer but 0x… would instead be a
> “nice” -1.  It would be difficult to tell in a debugger that a -1
> came from pattern init rather than a deliberate choice.
> 
> I agree that, all other things being equal, it would be nice to use NaNs
> for floats.  But relying on wrong numerical values for floats doesn't
> seem worse than doing that for integers.
> 
> 0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
> which admittedly doesn't stand out as wrong.  But I'm not sure we
> should sacrifice integer debugging for float debugging here.

We can always expose the actual value as --param.  Now, I think
we'd need a two-byte pattern to reliably produce NaNs anyway,
so with floats taken out of the picture the focus should be on
pointers where IMHO val & 1 and val & 15 would be nice to have.
So something like 0xf7 would work for those.  With a two-byte pattern
we could use 0xffef or 0x7fef.
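
A small C++ sketch of the pointer argument, under the assumption that "val & 1 and val & 15 would be nice to have" means both should be nonzero, so that a pattern-initialized pointer is visibly misaligned. The helper names are hypothetical and not part of the patch.

```cpp
#include <cstdint>
#include <cstring>

// Fill a pointer-sized integer with one repeated byte, the way a
// single-byte pattern initializer would fill a pointer variable.
inline std::uintptr_t fill_with_byte(unsigned char byte)
{
    std::uintptr_t value;
    std::memset(&value, byte, sizeof value);
    return value;
}

// True if the value is odd and also not 16-byte aligned, i.e. both
// (value & 1) and (value & 15) are nonzero.
inline bool low_bits_set(std::uintptr_t value)
{
    return (value & 1) != 0 && (value & 15) != 0;
}
```

With these helpers, a 0xf7 fill satisfies low_bits_set, while a 0xAA fill does not because its lowest bit is clear.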

Anyway, it's probably down to priorities of the project involved
(debugging FP stuff or integer stuff).

Richard.


Re: [PATCH] testsuite: add -fwrapv for 950704-1.c

2021-06-22 Thread Richard Biener via Gcc-patches
On Mon, Jun 21, 2021 at 6:53 PM Xi Ruoyao via Gcc-patches
 wrote:
>
> This test relies on wrap behavior of signed overflow.  Without -fwrapv
> it is known to fail on mips (and maybe some other targets as well).

OK.

Richard.

> gcc/testsuite/
>
> * gcc.c-torture/execute/950704-1.c: Add -fwrapv to avoid
>   undefined behavior.
> ---
>  gcc/testsuite/gcc.c-torture/execute/950704-1.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/950704-1.c b/gcc/testsuite/gcc.c-torture/execute/950704-1.c
> index f11aff8cabc..67fe0885e5a 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/950704-1.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/950704-1.c
> @@ -1,3 +1,4 @@
> +/* { dg-additional-options "-fwrapv" } */
>  int errflag;
>
>  long long
> --
> 2.32.0
>
>
>


Re: [Ping^2, Patch, Fortran] PR100337 Should be able to pass non-present optional arguments to CO_BROADCAST

2021-06-22 Thread Tobias Burnus

Hi Andre,

On 22.06.21 09:40, Andre Vehreschild via Fortran wrote:

> To the questions:
> - I added a test only for -fcoarray=single because in the library case the
>   optional stat is just propagated to the library, which is already tested a
>   lot of times and which needs to handle the optional stat in any case. So an
>   error there would have been detected in one of the earlier tests. I did not
>   want to add unnecessary test overhead given that the tests already run for a
>   long time.

Fair point.

> - I did not add tests for the other CO_* routines, i.e. CO_MIN, CO_MAX,
>   CO_REDUCE or CO_SUM, that are also handled by this routine, because I believe
>   that showing that the fix works for CO_BROADCAST shows that the others work,
>   too. Because the four others do not have any special handling in their
>   implementation in trans_intrinsic. Or do you mean other coarray-routines
>   besides the five handled by conv_co_collective()?

Well, that relates more to the first point – for -fcoarray=lib, it
likely makes a difference. For -fcoarray=single not. If the former is
skipped, it is much less relevant for the second.

> If it is ok for you, I would apply the patch as is, or do you see a reason to
> add more tests?


OK.

Although, I am not that sure that libcaf_single gets that much testing.
On the other hand, -fcoarray=lib with -lcaf_single is also not that
relevant in the real world, either.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH] docs: drop unbalanced parenthesis in rtl.texi

2021-06-22 Thread Martin Liška

On 6/22/21 12:43 AM, Sergei Trofimovich via Gcc-patches wrote:

> From: Sergei Trofimovich 
>
> gcc/ChangeLog:
>
> * doc/rtl.texi: drop unbalanced parenthesis.
> ---
>  gcc/doc/rtl.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 5af71137a87..e1e76a93a8b 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -144,7 +144,7 @@ Currently, @file{rtl.def} defines these classes:
>  @item RTX_OBJ
>  An RTX code that represents an actual object, such as a register
>  (@code{REG}) or a memory location (@code{MEM}, @code{SYMBOL_REF}).
> -@code{LO_SUM}) is also included; instead, @code{SUBREG} and
> +@code{LO_SUM} is also included; instead, @code{SUBREG} and
>  @code{STRICT_LOW_PART} are not in this class, but in class
>  @code{RTX_EXTRA}.
  



Please push it, it's obvious.

Thanks,
Martin


Re: [RFC][PATCH] contrib: add git-commit-mklog wrapper

2021-06-22 Thread Martin Liška

On 6/22/21 10:23 AM, Tobias Burnus wrote:

> Hello,
>
> On 22.06.21 09:30, Martin Liška wrote:
>> There's a patch candidate that comes up with a wrapper for 'git
>> commit-mklog' alias.
>> Using my patch, one can do:
>> $ git commit-mklog -a -b 12345,
>> Thoughts?
>
> What about '-p' – to fetch the data from GCC Bugzilla?

Sure, that needs to be supported as well.

> I do note that
> 'git commit ' supports '-p, --patch' which may or may not be an issue.

People likely do not use it with commit-mklog alias.

Martin

> Tobias





Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-22 Thread Richard Sandiford via Gcc-patches
Kees Cook  writes:
> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
>> So, if “pattern value” is “0x”, then it’s a valid canonical 
>> virtual memory address.  However, for most OS, “0x” should 
>> be not in user space.
>> 
>> My question is, is “0xF” good for pointer? Or 
>> “0x” better?
>
> I think 0xFF repeating is fine for this version. Everything else is a
> "nice to have" for the pattern-init, IMO. :)

Sorry to be awkward, but 0xFF seems worse than 0xAA to me.

For integer types, all values are valid representations, and we're
relying on the pattern being “obviously” wrong in context.  0x…
is unlikely to be a correct integer but 0x… would instead be a
“nice” -1.  It would be difficult to tell in a debugger that a -1
came from pattern init rather than a deliberate choice.

I agree that, all other things being equal, it would be nice to use NaNs
for floats.  But relying on wrong numerical values for floats doesn't
seem worse than doing that for integers.

0xAA… for float is (if I've got this right) -3.0316488252093987e-13,
which admittedly doesn't stand out as wrong.  But I'm not sure we
should sacrifice integer debugging for float debugging here.
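
The values mentioned above can be checked with a short C++ sketch, assuming a 32-bit IEEE-754 float and a two's-complement 32-bit int. The helper names are made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a single-precision float.
inline float float_from_bits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reinterpret a 32-bit pattern as a signed 32-bit integer.
inline std::int32_t int_from_bits(std::uint32_t bits)
{
    std::int32_t i;
    std::memcpy(&i, &bits, sizeof i);
    return i;
}
```

Under those assumptions, int_from_bits(0xFFFFFFFF) is the "nice" -1 while int_from_bits(0xAAAAAAAA) is -1431655766, and float_from_bits(0xFFFFFFFF) is a NaN while float_from_bits(0xAAAAAAAA) is roughly -3.03e-13.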

Thanks,
Richard


Re: [RFC][PATCH] contrib: add git-commit-mklog wrapper

2021-06-22 Thread Tobias Burnus

Hello,

On 22.06.21 09:30, Martin Liška wrote:

There's a patch candidate that comes up with a wrapper for 'git
commit-mklog' alias.
Using my patch, one can do:
$ git commit-mklog -a -b 12345,
Thoughts?


What about '-p' – to fetch the data from GCC Bugzilla? I do note that
'git commit ' supports '-p, --patch' which may or may not be an issue.

Tobias



[PATCH] Remove my Write After Approval entry.

2021-06-22 Thread liuhongt via Gcc-patches
ChangeLog:

* MAINTAINERS: Remove my Write After Approval entry.
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ac4fc5f3bd..b4c50a93129 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -472,7 +472,6 @@ Nicolas Koenig  

 Boris Kolpackov

 Dave Korn  
 Julia Koval
-Hongtao Liu
 Matt Kraai 
 Jan Kratochvil 
 Louis Krupp
-- 
2.18.1



Re: [Ping^2, Patch, Fortran] PR100337 Should be able to pass non-present optional arguments to CO_BROADCAST

2021-06-22 Thread Andre Vehreschild via Gcc-patches
Hi Tobias,

thanks for the review.

To the questions: 

- I added a test only for -fcoarray=single because in the library case the
  optional stat is just propagated to the library, which is already tested a
  lot of times and which needs to handle the optional stat in any case. So an
  error there would have been detected in one of the earlier tests. I did not
  want to add unnecessary  test overhead given that the tests already run for a
  long time.

- I did not add tests for the other CO_* routines, i.e. CO_MIN, CO_MAX,
  CO_REDUCE or CO_SUM, that are also handled by this routine, because I believe
  that showing that the fix works for CO_BROADCAST shows that the others work,
  too. Because the four others do not have any special handling in their
  implementation in  trans_intrinsic. Or do you mean other coarray-routines
  besides the five handled by conv_co_collective()?

If it is ok for you, I would apply the patch as is, or do you see a reason to
add more tests?

Regards,
Andre

On Mon, 21 Jun 2021 14:30:21 +0200
Tobias Burnus  wrote:

> Any reason that you did not put it under
>gfortran.dg/coarray/
> such that it is also run with -fcoarray=lib (-lcaf_single)?
> I know that the issue only exists for single, but it also makes
> sense to check that libcaf_single works 
> 
> In that sense, I wonder whether also the other CO_* should be
> checked in the testsuite as they are handled differently in
> libcaf_... (but identical with -fcoarray=single).
> 
> Except for those two nits, it LGTM. Thanks!
> 
> Tobias
> 
> PS: The function is used by
>  case GFC_ISYM_CO_BROADCAST:
>  case GFC_ISYM_CO_MIN:
>  case GFC_ISYM_CO_MAX:
>  case GFC_ISYM_CO_REDUCE:
>  case GFC_ISYM_CO_SUM:
> and, with -fcoarray=single, errmsg is not touched
> as stat is (unconditionally) 0 (success)..
> 
> 
> On 19.06.21 13:23, Andre Vehreschild via Fortran wrote:
> > PING!
> >
> > On Fri, 4 Jun 2021 18:05:18 +0200
> > Andre Vehreschild  wrote:
> >  
> >> Ping!
> >>
> >> On Fri, 21 May 2021 15:33:11 +0200
> >> Andre Vehreschild  wrote:
> >>  
> >>> Hi,
> >>>
> >>> the attached patch fixes an issue when calling CO_BROADCAST in
> >>> -fcoarray=single mode, where the optional but non-present (in the calling
> >>> scope) stat variable was assigned to before checking for it being not
> >>> present.
> >>>
> >>> Regtests fine on x86-64-linux/f33. Ok for trunk?
> >>>
> >>> Regards,
> >>> Andre  
> >>  
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de  


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 

