Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-09-01 Thread Hongtao Liu via Gcc-patches
On Wed, Sep 1, 2021 at 8:52 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Sep 1, 2021 at 8:28 AM Hongtao Liu  wrote:
> >>
> >> On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
> >>  wrote:
> >> >
> >> > On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
> >> > >
> >> > > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
> >> > >  wrote:
> >> > > >
> >> > > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  
> >> > > > wrote:
> >> > > > >
> >> > > > >   When gimple simplification tries to combine op and vec_cond_expr
> >> > > > > into cond_op, it doesn't check whether the mask type matches.  This
> >> > > > > causes an ICE when expanding cond_op with a mismatched mode.
> >> > > > >   This patch adds a function named
> >> > > > > cond_vectorized_internal_fn_supported_p that checks the mask type
> >> > > > > in addition to what vectorized_internal_fn_supported_p checks.
> >> > > > >
> >> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >> > > > >   Ok for trunk?
> >> > > > >
> >> > > > > gcc/ChangeLog:
> >> > > > >
> >> > > > > PR middle-end/102080
> >> > > > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): 
> >> > > > > New functions.
> >> > > > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): 
> >> > > > > New declaration.
> >> > > > > * match.pd: Check the type of mask while generating 
> >> > > > > cond_op in
> >> > > > > gimple simplication.
> >> > > > >
> >> > > > > gcc/testsuite/ChangeLog:
> >> > > > >
> >> > > > > PR middle-end/102080
> >> > > > > * gcc.target/i386/pr102080.c: New test.
> >> > > > > ---
> >> > > > >  gcc/internal-fn.c| 22 
> >> > > > > ++
> >> > > > >  gcc/internal-fn.h|  1 +
> >> > > > >  gcc/match.pd | 24 
> >> > > > > 
> >> > > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> >> > > > >  4 files changed, 55 insertions(+), 8 deletions(-)
> >> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> >> > > > >
> >> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> >> > > > > index 1360a00f0b9..8b2b65db1a7 100644
> >> > > > > --- a/gcc/internal-fn.c
> >> > > > > +++ b/gcc/internal-fn.c
> >> > > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> >> > > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> >> > > > >  }
> >> > > > >
> >> > > > > +/* Check cond_op for vector modes since 
> >> > > > > vectorized_internal_fn_supported_p
> >> > > > > +   doesn't check if mask type matches.  */
> >> > > > > +bool
> >> > > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree 
> >> > > > > type,
> >> > > > > +tree mask_type)
> >> > > > > +{
> >> > > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> >> > > > > +return false;
> >> > > > > +
> >> > > > > +  machine_mode mask_mode;
> >> > > > > +  machine_mode vmode = TYPE_MODE (type);
> >> > > > > +  int size1, size2;
> >> > > > > +  if (VECTOR_MODE_P (vmode)
> >> > > > > +  && targetm.vectorize.get_mask_mode 
> >> > > > > (vmode).exists(&mask_mode)
> >> > > > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> >> > > > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant 
> >> > > > > (&size2)
> >> > > > > +  && size1 != size2)
> >> > > >
> >> > > > Why do we check for equal size rather than just mode equality which
> >> > > I originally thought TYPE_MODE of vector(8)  was
> >> > > not QImode.  Changed the patch to check mode equality.
> >> > > Updated patch.
> >> >
> >> > Looking at all this it seems the match.pd patterns should have not
> >> > used vectorized_internal_fn_supported_p but 
> >> > direct_internal_fn_supported_p
> >> > which is equivalent here because we're always working with vector modes?
>
> Yeah, looks like it.
>
> >> > And then shouldn't we look at the actual optab whether the mask mode 
> >> > matches
> >> > the expectation rather than going around via the target hook which may 
> >> > not have
> >> > enough context to decide which mask mode to use?
> >> How about this?
> >>
> >> +/* Return true if target supports cond_op with data TYPE and
> >> +   mask MASK_TYPE.  */
> >> +bool
> >> +cond_internal_fn_supported_p (internal_fn ifn, tree type,
> >> +   tree mask_type)
> >> +{
> >> +  tree_pair types = tree_pair (type, type);
> >> +  optab tmp = direct_internal_fn_optab (ifn, types);
> >> +  machine_mode vmode = TYPE_MODE (type);
> >> +  insn_code icode = direct_optab_handler (tmp, vmode);
> >> +  if (icode == CODE_FOR_nothing)
> >> +return false;
> >> +
> >> +  machine_mode mask_mode = TYPE_MODE (mask_type);
> >> +  /* Can't create rtx and use insn_operand_matches here.  */
> >> +  return insn_data[icode].operand[0].mode == vmode
> >> +&& insn_data[icode].operand[1].mode == mask_mode;
> >> +}
> >> +
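
The shape of the check proposed above can be sketched outside of GCC. This is a simplified standalone model, not GCC code: `Mode`, `InsnPattern`, `patterns` and `cond_op_supported` are hypothetical stand-ins for `machine_mode`, an `insn_data` entry, the target's pattern table and the proposed helper. It only illustrates the idea that a conditional operation is accepted when both the data operand mode and the mask operand mode match what the instruction pattern expects.

```cpp
// Hypothetical stand-in for GCC's machine_mode.
enum class Mode { V8SI, V16SI, QI, HI };

// Hypothetical stand-in for one insn_data[] entry: operand 0 is the
// data (destination) operand and operand 1 is the mask operand.
struct InsnPattern {
  Mode data_mode;
  Mode mask_mode;
};

// Toy pattern table (an assumption for this sketch, not real target data).
static const InsnPattern patterns[] = {
  { Mode::V8SI, Mode::QI },   // 8 x int, 8-bit scalar mask
  { Mode::V16SI, Mode::HI },  // 16 x int, 16-bit scalar mask
};

// Return true only if a pattern exists for DATA_MODE and that pattern
// also expects MASK_MODE for its mask operand.
bool cond_op_supported(Mode data_mode, Mode mask_mode) {
  for (const InsnPattern &p : patterns)
    if (p.data_mode == data_mode)
      return p.mask_mode == mask_mode;
  return false;  // no pattern at all for this data mode
}
```

With this table, `cond_op_supported(Mode::V8SI, Mode::QI)` is accepted while `cond_op_supported(Mode::V8SI, Mode::HI)` is rejected, which is the kind of mismatch the thread is trying to catch before expansion.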

Re: [PATCH 4/6] Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.

2021-09-01 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 24, 2021 at 5:39 PM Hongtao Liu  wrote:
>
> On Tue, Aug 17, 2021 at 9:53 AM Hongtao Liu  wrote:
> >
> > On Fri, Aug 6, 2021 at 2:06 PM Hongtao Liu  wrote:
> > >
> > > On Tue, Aug 3, 2021 at 10:44 AM Hongtao Liu  wrote:
> > > >
> > > > On Tue, Aug 3, 2021 at 3:34 AM Joseph Myers  
> > > > wrote:
> > > > >
> > > > > On Mon, 2 Aug 2021, liuhongt via Gcc-patches wrote:
> > > > >
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index 7979e240426..dc673c89bc8 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -23352,6 +23352,8 @@ ix86_get_excess_precision (enum 
> > > > > > excess_precision_type type)
> > > > > >   return (type == EXCESS_PRECISION_TYPE_STANDARD
> > > > > >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > > > >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > > > > +  case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > > > + return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > > > >default:
> > > > > >   gcc_unreachable ();
> > > > > >  }
> > > > >
> > > > > I'd expect an error for -fexcess-precision=16 with -mfpmath=387 
> > > > > (since x87
> > > > > doesn't do float or double arithmetic, but -fexcess-precision=16 
> > > > > implies
> > > > > that all of _Float16, float and double are represented to the range 
> > > > > and
> > > > > > precision of their type without any excess precision).
> > > > >
> > > > Yes, additional changes like this.
> > > >
> > > > modified   gcc/config/i386/i386.c
> > > > @@ -23443,6 +23443,9 @@ ix86_get_excess_precision (enum
> > > > excess_precision_type type)
> > > >   ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT
> > > >   : FLT_EVAL_METHOD_UNPREDICTABLE);
> > > >case EXCESS_PRECISION_TYPE_FLOAT16:
> > > > + if (TARGET_80387
> > > > + && !(TARGET_SSE_MATH && TARGET_SSE))
> > > > +   error ("%<-fexcess-precision=16%> is not compatible with 
> > > > %<-mfpmath=387%>");
> > > >   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > >default:
> > > >   gcc_unreachable ();
> > > > new file   gcc/testsuite/gcc.target/i386/float16-7.c
> > > > @@ -0,0 +1,9 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mfpmath=387 -fexcess-precision=16" } */
> > > > +/* { dg-excess-errors "'-fexcess-precision=16' is not compatible with
> > > > '-mfpmath=387'" } */
> > > > +_Float16
> > > > +foo (_Float16 a, _Float16 b)
> > > > +{
> > > > +  return a + b;/* { dg-error "'-fexcess-precision=16' is not
> > > > compatible with '-mfpmath=387'" } */
> > > > +}
> > > > +
> > > >
> > > > > --
> > > > > Joseph S. Myers
> > > > > jos...@codesourcery.com
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > > Updated patch and ping for it.
> > >
> > > Also for backend changes.
> > > > 1. For backends such as m68k/s390, which don't support _Float16 at all,
> > > > the backend will issue an error for -fexcess-precision=16; I think
> > > > that should be fine.
> > > > 2. For backends like arm/aarch64, which support _Float16, the backend
> > > > will set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for -fexcess-precision=16
> > > > even if hardware instructions for fp16 are not supported.  Would that
> > > > be ok for arm?
> >
> > Ping for this patch.
> >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
> Rebased and ping^3.  There are plenty of avx512fp16 patches blocked by
> this patch; I'd like someone to help review it.
I'm going to check in this patch if there are no objections in the next 48 hours.

> --
> BR,
> Hongtao



-- 
BR,
Hongtao
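
The control flow discussed in this thread (select an evaluation method for the float16 case, but reject the x87 combination) can be sketched as a standalone toy. The enumerators, the `Target` struct and `excess_precision` are hypothetical mirrors of the names in the quoted diff, and the non-float16 branches are deliberately simplified, so this is an illustration of the idea rather than the i386 backend logic.

```cpp
#include <stdexcept>

// Hypothetical mirrors of the enumerators discussed in this thread.
enum class ExcessPrecisionType { Fast, Standard, Float16 };
enum class FltEvalMethod { PromoteToFloat16, PromoteToFloat, Unpredictable };

// Hypothetical target description; uses_x87_math models -mfpmath=387.
struct Target {
  bool uses_x87_math;
};

// Toy selection of the evaluation method.  Only the Float16 case follows
// the thread closely; the other cases are simplified assumptions.
FltEvalMethod excess_precision(const Target &t, ExcessPrecisionType type) {
  switch (type) {
    case ExcessPrecisionType::Float16:
      // x87 cannot do float or double arithmetic without excess
      // precision, so the combination is diagnosed instead of accepted.
      if (t.uses_x87_math)
        throw std::invalid_argument(
            "-fexcess-precision=16 is not compatible with -mfpmath=387");
      return FltEvalMethod::PromoteToFloat16;
    case ExcessPrecisionType::Standard:
      return FltEvalMethod::PromoteToFloat;
    case ExcessPrecisionType::Fast:
      return t.uses_x87_math ? FltEvalMethod::Unpredictable
                             : FltEvalMethod::PromoteToFloat;
  }
  throw std::logic_error("unreachable excess precision type");
}
```

In this toy, calling `excess_precision(Target{true}, ExcessPrecisionType::Float16)` raises the error, mirroring the `error (...)` call added in the quoted hunk.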


Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-01 Thread Hongtao Liu via Gcc-patches
I'm going to check in the first 3 patches which are already approved.

  Update hf soft-fp from glibc.
  [i386] Enable _Float16 type for TARGET_SSE2 and above.
  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
truncations.

On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
>
> Update from v2:
>
> 1. Support -fexcess-precision=16 which will enable
> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> should not do anything different from -fexcess-precision=fast
>  regarding _Float16.
> 3. Avoiding macroization of HFmode patterns.
> 4. Allow (subreg:SI (reg:HF)).
> 5. Update documents corresponding exactly to the code changes in
> the same patch.
> 6. According to 32bit abi, pass vector _Float16 by sse registers
> for 32-bit mode, not stack.
>
> Guo, Xuepeng (1):
>   AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> instructions.
>
> liuhongt (5):
>   Update hf soft-fp from glibc.
>   [i386] Enable _Float16 type for TARGET_SSE2 and above.
>   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> truncations.
>   Support -fexcess-precision=16 which will enable
> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
>
>  gcc/ada/gcc-interface/misc.c  |   3 +
>  gcc/c-family/c-common.c   |   6 +-
>  gcc/c-family/c-cppbuiltin.c   |   6 +-
>  gcc/common.opt|   5 +-
>  gcc/common/config/i386/cpuinfo.h  |   2 +
>  gcc/common/config/i386/i386-common.c  |  26 +-
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/common/config/i386/i386-isas.h|   1 +
>  gcc/config.gcc|   2 +-
>  gcc/config/aarch64/aarch64.c  |   1 +
>  gcc/config/arm/arm.c  |   1 +
>  gcc/config/i386/avx512fp16intrin.h| 225 ++
>  gcc/config/i386/cpuid.h   |   1 +
>  gcc/config/i386/i386-builtin-types.def|   7 +-
>  gcc/config/i386/i386-builtins.c   |  23 +
>  gcc/config/i386/i386-c.c  |   2 +
>  gcc/config/i386/i386-expand.c | 129 +-
>  gcc/config/i386/i386-isa.def  |   1 +
>  gcc/config/i386/i386-modes.def|  13 +-
>  gcc/config/i386/i386-options.c|   4 +-
>  gcc/config/i386/i386.c| 243 +--
>  gcc/config/i386/i386.h|  29 +-
>  gcc/config/i386/i386.md   | 291 -
>  gcc/config/i386/i386.opt  |   4 +
>  gcc/config/i386/immintrin.h   |   4 +
>  gcc/config/i386/sse.md| 397 +-
>  gcc/config/m68k/m68k.c|   2 +
>  gcc/config/s390/s390.c|   2 +
>  gcc/coretypes.h   |   3 +-
>  gcc/doc/extend.texi   |  22 +
>  gcc/doc/invoke.texi   |  10 +-
>  gcc/doc/tm.texi   |  14 +-
>  gcc/doc/tm.texi.in|   3 +
>  gcc/emit-rtl.c|   5 +
>  gcc/flag-types.h  |   3 +-
>  gcc/fortran/options.c |   3 +
>  gcc/lto/lto-lang.c|   3 +
>  gcc/target.def|  11 +-
>  gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C |   8 +
>  gcc/testsuite/g++.target/i386/float16-2.C |  14 +
>  gcc/testsuite/g++.target/i386/float16-3.C |  10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c |   2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c |   2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>  .../gcc.target/i386/avx512fp16-12a.c  |  21 +
>  .../gcc.target/i386/avx512fp16-12b.c  |  27 ++
>  gcc/testsuite/gcc.target/i386/float16-3a.c|  10 +
>  gcc/testsuite/gcc.target/i386/float16-3b.c|  10 +
>  gcc/testsuite/gcc.target/i386/float16-4a.c|  10 +
>  gcc/testsuite/gcc.target/i386/float16-4b.c|  10 +
>  gcc/testsuite/gcc.target/i386/float16-5.c |  12 +
>  gcc/testsuite/gcc.target/i386/float16-6.c |   8 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  gcc/testsuite/gcc.target/i386/pr54855-12.c|  14 +
>  gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
>  gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
>  gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
>  .../gcc.target/i386/sse2-float16-1.c  |   8 +
>  .../gcc.target/i386/sse2-float16-2.c  |  16 +
>  .../gcc.target/i386/sse2-fl

[PATCH] rx: Add define "PREFERRED_DEBUGGING_TYPE" to rx-*-linux.

2021-09-01 Thread Yoshinori Sato
Added missing PREFERRED_DEBUGGING_TYPE.

gcc/ChangeLog

* config/rx/linux.h (PREFERRED_DEBUGGING_TYPE):
Added missing define.

---
 gcc/config/rx/linux.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/rx/linux.h b/gcc/config/rx/linux.h
index 9ee484af886..e9b51197a23 100644
--- a/gcc/config/rx/linux.h
+++ b/gcc/config/rx/linux.h
@@ -226,6 +226,7 @@
   while (0)
 
 #undef  PREFERRED_DEBUGGING_TYPE
+#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
 #undef TARGET_AS100_SYNTAX
 #define TARGET_AS100_SYNTAX 0
-- 
2.33.0



Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-01 Thread Xionghu Luo via Gcc-patches




On 2021/9/1 17:58, Richard Biener wrote:

This fixes the CFG walk order of fill_always_executed_in to use
RPO order rather than the dominator based order computed by
get_loop_body_in_dom_order.  That fixes correctness issues with
unordered dominator children.

The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
its for-iteration mode is a good match for the algorithm.
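
For readers unfamiliar with the term, a plain reverse post-order can be sketched as below. This is a minimal standalone illustration on a toy CFG and is not rev_post_order_and_mark_dfs_back_seme: it ignores region handling and the iteration-oriented ordering mentioned above. `Cfg`, `post_order` and `reverse_post_order` are hypothetical names. The property it shows is that every block is visited after all of its predecessors, except along back edges.

```cpp
#include <vector>

// Toy CFG: succs[b] lists the successor block indices of block b.
using Cfg = std::vector<std::vector<int>>;

// Depth-first walk that appends each block after all of its
// not-yet-visited successors have been appended (post-order).
static void post_order(const Cfg &succs, int b, std::vector<bool> &seen,
                       std::vector<int> &out) {
  seen[b] = true;
  for (int s : succs[b])
    if (!seen[s])
      post_order(succs, s, seen, out);
  out.push_back(b);
}

// Reverse post-order of the blocks reachable from ENTRY.
std::vector<int> reverse_post_order(const Cfg &succs, int entry) {
  std::vector<bool> seen(succs.size(), false);
  std::vector<int> order;
  post_order(succs, entry, seen, order);
  return std::vector<int>(order.rbegin(), order.rend());
}
```

For a CFG with edges 0->1, 1->2, 1->3 and the back edge 2->1, the loop header 1 comes before both the latch 2 and the exit block 3.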

Xionghu, I've tried to only fix the CFG walk order issue and not
change anything else with this so we have a more correct base
to work against.  The code still walks inner loop bodies
up to loop depth times and thus is quadratic in the loop depth.

Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
have any comments I plan to push this and then revisit what we
were circling around.


LGTM, thanks.



Richard.

2021-09-01  Richard Biener  

PR tree-optimization/102155
* tree-ssa-loop-im.c (fill_always_executed_in_1): Iterate
over a part of the RPO array and do not recurse here.
Dump blocks marked as always executed.
(fill_always_executed_in): Walk over the RPO array and
process loops whose header we run into.
(loop_invariant_motion_in_fun): Compute the first RPO
using rev_post_order_and_mark_dfs_back_seme in iteration
order and pass that to fill_always_executed_in.
---
  gcc/tree-ssa-loop-im.c | 136 ++---
  1 file changed, 73 insertions(+), 63 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..f3706dcdb8a 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3025,77 +3025,74 @@ do_store_motion (void)
  /* Fills ALWAYS_EXECUTED_IN information for basic blocks of LOOP, i.e.
 for each such basic block bb records the outermost loop for that execution
 of its header implies execution of bb.  CONTAINS_CALL is the bitmap of
-   blocks that contain a nonpure call.  */
+   blocks that contain a nonpure call.  The blocks of LOOP start at index
+   START of the RPO array of size N.  */
  
  static void

-fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
+fill_always_executed_in_1 (function *fun, class loop *loop,
+  int *rpo, int start, int n, sbitmap contains_call)
  {
-  basic_block bb = NULL, *bbs, last = NULL;
-  unsigned i;
-  edge e;
+  basic_block last = NULL;
class loop *inn_loop = loop;
  
-  if (ALWAYS_EXECUTED_IN (loop->header) == NULL)

+  for (int i = start; i < n; i++)
  {
-  bbs = get_loop_body_in_dom_order (loop);
-
-  for (i = 0; i < loop->num_nodes; i++)
-   {
- edge_iterator ei;
- bb = bbs[i];
-
- if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
-   last = bb;
+  basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i]);
+  /* Stop when we iterated over all blocks in this loop.  */
+  if (!flow_bb_inside_loop_p (loop, bb))
+   break;
  
-	  if (bitmap_bit_p (contains_call, bb->index))

-   break;
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+   last = bb;
  
-	  FOR_EACH_EDGE (e, ei, bb->succs)

-   {
- /* If there is an exit from this BB.  */
- if (!flow_bb_inside_loop_p (loop, e->dest))
-   break;
- /* Or we enter a possibly non-finite loop.  */
- if (flow_loop_nested_p (bb->loop_father,
- e->dest->loop_father)
- && ! finite_loop_p (e->dest->loop_father))
-   break;
-   }
- if (e)
-   break;
+  if (bitmap_bit_p (contains_call, bb->index))
+   break;
  
-	  /* A loop might be infinite (TODO use simple loop analysis

-to disprove this if possible).  */
- if (bb->flags & BB_IRREDUCIBLE_LOOP)
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   {
+ /* If there is an exit from this BB.  */
+ if (!flow_bb_inside_loop_p (loop, e->dest))
break;
-
- if (!flow_bb_inside_loop_p (inn_loop, bb))
+ /* Or we enter a possibly non-finite loop.  */
+ if (flow_loop_nested_p (bb->loop_father,
+ e->dest->loop_father)
+ && ! finite_loop_p (e->dest->loop_father))
break;
+   }
+  if (e)
+   break;
  
-	  if (bb->loop_father->header == bb)

-   {
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
-   break;
+  /* A loop might be infinite (TODO use simple loop analysis
+to disprove this if possible).  */
+  if (bb->flags & BB_IRREDUCIBLE_LOOP)
+   break;
  
-	  /* In a loop that is always entered we may proceed anyway.

-But record that we entered it and stop once we leave it.  */
- inn_loop = bb->loop_father;
-   }
-   }
+  if (!flow_bb_inside_loop_p (inn_loop, bb))
+   break;
  
-

[PATCH] Remove macro check for __AMX_BF16/INT8/TILE__ in header file.

2021-09-01 Thread liuhongt via Gcc-patches
Hi:
  Details discussed in PR.
  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Pushed to master and GCC-11.

gcc/ChangeLog:

PR target/102166
* config/i386/amxbf16intrin.h : Remove macro check for __AMX_BF16__.
* config/i386/amxint8intrin.h : Remove macro check for __AMX_INT8__.
* config/i386/amxtileintrin.h : Remove macro check for __AMX_TILE__.

gcc/testsuite/ChangeLog:

PR target/102166
* g++.target/i386/pr102166.C: New test.
---
 gcc/config/i386/amxbf16intrin.h  |  2 +-
 gcc/config/i386/amxint8intrin.h  |  2 +-
 gcc/config/i386/amxtileintrin.h  |  2 +-
 gcc/testsuite/g++.target/i386/pr102166.C | 20 
 4 files changed, 23 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr102166.C

diff --git a/gcc/config/i386/amxbf16intrin.h b/gcc/config/i386/amxbf16intrin.h
index 8c24cdd50a0..1d60e8e609f 100644
--- a/gcc/config/i386/amxbf16intrin.h
+++ b/gcc/config/i386/amxbf16intrin.h
@@ -34,7 +34,7 @@
 #define __DISABLE_AMX_BF16__
 #endif /* __AMX_BF16__ */
 
-#if defined(__x86_64__) && defined(__AMX_BF16__)
+#if defined(__x86_64__)
 #define _tile_dpbf16ps_internal(dst,src1,src2) 
\
   __asm__ volatile\
   ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1", 
%%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
diff --git a/gcc/config/i386/amxint8intrin.h b/gcc/config/i386/amxint8intrin.h
index 180c2436278..dbb7b6cc5ad 100644
--- a/gcc/config/i386/amxint8intrin.h
+++ b/gcc/config/i386/amxint8intrin.h
@@ -34,7 +34,7 @@
 #define __DISABLE_AMX_INT8__
 #endif /* __AMX_INT8__ */
 
-#if defined(__x86_64__) && defined(__AMX_INT8__)
+#if defined(__x86_64__)
 #define _tile_int8_dp_internal(name,dst,src1,src2) 
\
   __asm__ volatile \
   ("{"#name"\t%%tmm"#src2", %%tmm"#src1", %%tmm"#dst"|"#name"\t%%tmm"#dst", 
%%tmm"#src1", %%tmm"#src2"}" ::)
diff --git a/gcc/config/i386/amxtileintrin.h b/gcc/config/i386/amxtileintrin.h
index 16c8b6ef681..75d784ad160 100644
--- a/gcc/config/i386/amxtileintrin.h
+++ b/gcc/config/i386/amxtileintrin.h
@@ -34,7 +34,7 @@
 #define __DISABLE_AMX_TILE__
 #endif /* __AMX_TILE__ */
 
-#if defined(__x86_64__) && defined(__AMX_TILE__)
+#if defined(__x86_64__)
 extern __inline void
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _tile_loadconfig (const void *__config)
diff --git a/gcc/testsuite/g++.target/i386/pr102166.C 
b/gcc/testsuite/g++.target/i386/pr102166.C
new file mode 100644
index 000..751cd2c6d26
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr102166.C
@@ -0,0 +1,20 @@
+/* PR target/102166 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -std=c++14" } */
+
+#include
+__attribute__((target("amx-tile"))) void amx()
+{
+  _tile_loadd(0, 0, 0);
+  _tile_release();
+}
+
+__attribute__((target("amx-int8"))) void amxint8()
+{
+  _tile_dpbssd(0, 1, 2);
+}
+
+__attribute__((target("amx-bf16"))) void amxbf16()
+{
+  _tile_dpbf16ps (0, 1, 2);
+}
-- 
2.27.0



obstack.h __PTR_ALIGN vs. ubsan

2021-09-01 Thread Alan Modra via Gcc-patches
Current ubsan complains on every use of __PTR_ALIGN (when ptrdiff_t is
as large as a pointer), due to making calculations relative to a NULL
pointer.  This patch avoids the problem by extracting out and
simplifying __BPTR_ALIGN for the usual case.  I've continued to use
ptrdiff_t here, where it might be better to throw away __BPTR_ALIGN
entirely and just assume uintptr_t exists.
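
As an illustration of the uintptr_t alternative mentioned above, rounding a pointer up to an alignment boundary can be sketched without any arithmetic relative to a null pointer. This is a hypothetical helper, not the obstack.h macro: `align_up` is an assumed name, `mask` follows the obstack convention of alignment minus one, and the result is formed by offsetting the original pointer rather than by casting an integer back to a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round P up to the next address that is a multiple of (mask + 1).
// MASK must be of the form 2^k - 1, e.g. 7 for 8-byte alignment.
char *align_up(char *p, std::size_t mask) {
  assert(((mask + 1) & mask) == 0 && "mask must be 2^k - 1");
  // Do the rounding on the integer value of the address.
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t aligned =
      (addr + mask) & ~static_cast<std::uintptr_t>(mask);
  // Apply the difference to the original pointer, so no pointer is
  // ever computed relative to a null base.
  return p + (aligned - addr);
}
```

The caller is responsible for making sure the rounded-up address still lies inside the same buffer as `p`.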

OK to apply for gcc?

* obstack.h (__PTR_ALIGN): Expand and simplify __BPTR_ALIGN
rather than calculating relative to a NULL pointer.

diff --git a/include/obstack.h b/include/obstack.h
index a6eb6c95587..0d8746f835b 100644
--- a/include/obstack.h
+++ b/include/obstack.h
@@ -137,9 +137,9 @@
relative to B.  Otherwise, use the faster strategy of computing the
alignment relative to 0.  */
 
-#define __PTR_ALIGN(B, P, A) \
-  __BPTR_ALIGN (sizeof (ptrdiff_t) < sizeof (void *) ? (B) : (char *) 0,  \
-P, A)
+#define __PTR_ALIGN(B, P, A)   \
+  (sizeof (ptrdiff_t) < sizeof (void *) ? __BPTR_ALIGN (B, P, A)   \
+   : (char *) (((ptrdiff_t) (P) + (A)) & ~(A)))
 
 #ifndef __attribute_pure__
 # if defined __GNUC_MINOR__ && __GNUC__ * 1000 + __GNUC_MINOR__ >= 2096

-- 
Alan Modra
Australia Development Lab, IBM


Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-01 Thread Martin Sebor via Gcc-patches

On 8/30/21 4:46 AM, Thomas Schwinge wrote:

Hi!

Ping -- we still need to plug the memory leak; see patch attached, and/or
long discussion here:


Thanks for answering my questions.  I have no concerns with going
forward with the patch as is.  Just a suggestion/request: unless
this patch fixes all the outstanding problems you know of or suspect
in this area (leaks/missing dtor calls) and unless you plan to work
on those in the near future, please open a bug for them with a brain
dump of what you learned.  That should save us time when the day
comes to tackle those.

Martin



On 2021-08-16T14:10:00-0600, Martin Sebor  wrote:

On 8/16/21 6:44 AM, Thomas Schwinge wrote:

On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  wrote:

On 8/6/21 10:57 AM, Thomas Schwinge wrote:

So I'm trying to do some C++...  ;-)

Given:

   /* A map from SSA names or var decls to record fields.  */
   typedef hash_map field_map_t;

   /* For each propagation record type, this is a map from SSA names or var 
decls
  to propagate, to the field in the record type that should be used for
  transmission and reception.  */
   typedef hash_map record_field_map_t;

Thus, that's a 'hash_map>'.  (I may do that,
right?)  Looking through GCC implementation files, the vast majority of uses
of 'hash_map' boil down to pointer key ('tree', for example) and
pointer/integer value.


Right.  Because most GCC containers rely exclusively on GCC's own
uses for testing, if your use case is novel in some way, chances
are it might not work as intended in all circumstances.

I've wrestled with hash_map a number of times.  A use case that's
close to yours (i.e., a non-trivial value type) is in cp/parser.c:
see class_to_loc_map_t.


Indeed, at the time you sent this email, I already had started looking
into that one!  (The Fortran test cases that I originally analyzed, which
triggered other cases of non-POD/non-trivial destructor, all didn't
result in a memory leak, because the non-trivial constructor doesn't
actually allocate any resources dynamically -- that's indeed different in
this case here.)  ..., and indeed:


(I don't remember if I tested it for leaks
though.  It's used to implement -Wmismatched-tags so compiling
a few tests under Valgrind should show if it does leak.)


... it does leak memory at present.  :-| (See attached commit log for
details for one example.)


(Attached "Fix 'hash_table::expand' to destruct stale Value objects"
again.)


To that effect, to document the current behavior, I propose to
"Add more self-tests for 'hash_map' with Value type with non-trivial
constructor/destructor"


(We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
"Add more self-tests for 'hash_map' with Value type with non-trivial
constructor/destructor", quickly followed by bug fix
commit bb04a03c6f9bacc890118b9e12b657503093c2f8
"Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
work on 32-bit architectures [PR101959]".)


(Also cherry-pick into release branches, eventually?)



Then:

   record_field_map_t field_map ([...]); // see below
   for ([...])
 {
   tree record_type = [...];
   [...]
   bool existed;
   field_map_t &fields
 = field_map.get_or_insert (record_type, &existed);
   gcc_checking_assert (!existed);
   [...]
   for ([...])
 fields.put ([...], [...]);
   [...]
 }
   [stuff that looks up elements from 'field_map']
   field_map.empty ();

This generally works.

If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
If however I instantiate 'record_field_map_t field_map (13);' (where '13'
would be the default for 'hash_map'), Valgrind complains:

   2,080 bytes in 10 blocks are definitely lost in loss record 828 of 876
  at 0x483DD99: calloc (vg_replace_malloc.c:762)
  by 0x175F010: xcalloc (xmalloc.c:162)
  by 0xAF4A2C: hash_table, tree_node*> >::hash_entry, false, 
xcallocator>::hash_table(unsigned long, bool, bool, bool, mem_alloc_origin) (hash-table.h:275)
  by 0x15E0120: hash_map, tree_node*> 
>::hash_map(unsigned long, bool, bool, bool) (hash-map.h:143)
  by 0x15DEE87: hash_map, tree_node*> >, 
simple_hashmap_traits, hash_map, tree_node*> > > >::get_or_insert(tree_node* const&, 
bool*) (hash-map.h:205)
  by 0x15DD52C: execute_omp_oacc_neuter_broadcast() 
(omp-oacc-neuter-broadcast.cc:1371)
  [...]

(That's with '#pragma GCC optimize "O0"' at the top of the 'gcc/*.cc'
file.)

My suspicion was that it is due to the 'field_map' getting resized as it
incrementally grows (and '40' being big enough for that to never happen),
and somehow the non-POD (?) value objects not being properly handled
during that.  Working my way a bit through 'gcc/hash-map.*' and
'gcc/hash-table.*' (but not claiming that I understand all that, off
hand), it seems as if my theory is right: I'm able to plug this
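
The suspicion described above (values with a non-trivial destructor being abandoned when the table grows) can be reproduced in a toy setting. The sketch below is not GCC's hash_table; `Tracked`, `relocate` and `leaked_after_expand` are hypothetical names. It copies objects into new raw storage and optionally skips destructing the stale originals, then counts how many objects were never destructed.

```cpp
#include <cstddef>
#include <new>

// Value type with a non-trivial constructor/destructor.  'live' counts
// objects that have been constructed but not yet destructed.
struct Tracked {
  static int live;
  int payload;
  explicit Tracked(int p) : payload(p) { ++live; }
  Tracked(const Tracked &o) : payload(o.payload) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Copy N objects from raw storage FROM into raw storage TO, as a table
// might do when it grows.  If DESTROY_STALE is false, the old objects
// are abandoned without running their destructors.
void relocate(Tracked *from, Tracked *to, std::size_t n, bool destroy_stale) {
  for (std::size_t i = 0; i < n; ++i) {
    new (&to[i]) Tracked(from[i]);  // construct the copy in new storage
    if (destroy_stale)
      from[i].~Tracked();           // destruct the stale original
  }
}

// Build N objects, relocate them once, destruct the new copies, release
// the raw storage, and report how many objects were never destructed.
int leaked_after_expand(std::size_t n, bool destroy_stale) {
  Tracked::live = 0;
  Tracked *old_store =
      static_cast<Tracked *>(::operator new(n * sizeof(Tracked)));
  Tracked *new_store =
      static_cast<Tracked *>(::operator new(n * sizeof(Tracked)));
  for (std::size_t i = 0; i < n; ++i)
    new (&old_store[i]) Tracked(static_cast<int>(i));
  relocate(old_store, new_store, n, destroy_stale);
  for (std::size_t i = 0; i < n; ++i)
    new_store[i].~Tracked();
  ::operator delete(old_store);
  ::operator delete(new_store);
  return Tracked::live;
}
```

If the value type owned heap memory, as a nested hash_map does, each object left undestructed in the second mode would correspond to a block reported as lost by Valgrind.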

Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-09-01 Thread Martin Sebor via Gcc-patches

On 9/1/21 1:35 PM, Thomas Schwinge wrote:

Hi!

On 2021-06-23T13:47:08-0600, Martin Sebor via Gcc-patches 
 wrote:

On 6/22/21 5:28 PM, David Malcolm wrote:

On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:

On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:

The attached patch introduces the suppress_warning(),
warning_suppressed(), and copy_no_warning() APIs [etc.]


Martin, great work on this!

I was a bit surprised to see this key on 'location_t's -- but indeed it
appears to do the right thing.

I now had a bit of a deep dive into some aspects of this, in context of
 "gcc/sparseset.h:215:20: error: suggest
parentheses around assignment used as truth value [-Werror=parentheses]"
that I recently filed.  This seems difficult to reproduce, but I'm still
able to reliably reproduce it in one specific build
configuration/directory/machine/whatever.  Initially, we all quickly
assumed that it'd be some GC issue -- but "alas", it's not, at least not
directly.  (But I'll certainly assume that some GC aspects are involved
which make this issue come and go across different GCC source revisions,
and difficult to reproduce.)

First, two pieces of cleanup:


--- /dev/null
+++ b/gcc/warning-control.cc



+template 
+void copy_warning (ToType to, FromType from)
+{
+  const key_type_t to_key = convert_to_key (to);
+
+  if (nowarn_spec_t *from_map = get_nowarn_spec (from))
+{
+  /* If there's an entry in the map the no-warning bit must be set.  */
+  gcc_assert (get_no_warning_bit (from));
+
+  if (!nowarn_map)
+ nowarn_map = xint_hash_map_t::create_ggc (32);


OK to push "Simplify 'gcc/diagnostic-spec.h:nowarn_map' setup", see
attached?  If we've just read something from the map, we can be sure that
it exists.  ;-)


Cleanup is definitely okay by me.  I can't formally approve anything
but this looks clearly correct and an improvement.




--- /dev/null
+++ b/gcc/diagnostic-spec.h



+typedef location_t key_type_t;
+typedef int_hash  xint_hash_t;
+typedef hash_map xint_hash_map_t;
+
+/* A mapping from the location of an expression to the warning spec
+   set for it.  */
+extern GTY(()) xint_hash_map_t *nowarn_map;


More on that data structure setup in a later email; here I'd like to
"Clarify 'key_type_t' to 'location_t' as used for
'gcc/diagnostic-spec.h:nowarn_map'", see attached.  OK to push?  To make
it obvious what exactly the key type is.  No change in behavior.


That's fine with me too and also looks like worthwhile cleanup.

FWIW, I used different key_type_t while prototyping the solution but
now that we've settled on location_t I see no reason not to use it
directly.

By the way, it seems we should probably also use a manifest constant
for Empty (probably UNKNOWN_LOCATION since we're reserving it).



Why is this relevant?  Via current 'int_hash',
we create a 'int_hash' using "spare" value '0' for 'Empty' marker, and
"spare" value 'UINT_MAX' for 'Deleted' marker.  Now, the latter is
unlikely to ever trigger (but still not correct -- patch in testing), but
the former triggers very much so: value '0' is, per 'gcc/input.h':

 #define UNKNOWN_LOCATION ((location_t) 0)

..., and there are no safe-guards in the code here, so we'll happily put
key 'UNKNOWN_LOCATION' into the 'nowarn_map', and all the
'UNKNOWN_LOCATION' entries share (replace?) one single warning
disposition (problem!), and at the same time that key value is also used
as the 'Empty' marker (problem!).  I have not tried to understand why
this doesn't cause much greater breakage, but propose to fix this as per
the attached "Don't maintain a warning spec for
'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' [PR101574]".  OK to push?


You're right that all expressions/statements with no location end
up sharing the same entry in the map when one is written to.
The entry should never be read from for expressions or statements
(they should fall back on their no-warning bit).  The entry should
only be read by a call to warning_suppressed_at(loc, ...).
The Empty problem aside, I would think it's reasonable to return
the union of all suppressions for location zero (or for the decls
of all builtins).  But I never tested this (I'm not really sure
how), and I forgot to consider that UNKNOWN_LOCATION has the same
value as Empty.  So I agree that it ought to be fixed.
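
The clash between a real key and the 'Empty' marker can be shown with a toy open-addressing map. This is only a sketch under stated assumptions and not GCC's hash_map or int_hash: `ToyLocMap`, `kEmpty` and `kSlots` are hypothetical, the table is fixed-size, and the guard in `put`/`get` models the proposed behavior of not maintaining an entry for the reserved key at all.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy fixed-size open-addressing map keyed by an integer, where key
// value 0 doubles as the "empty slot" marker.
class ToyLocMap {
 public:
  static constexpr std::uint32_t kEmpty = 0;

  // Store VALUE for KEY.  Returns false and stores nothing for the
  // reserved key or when the table is full.
  bool put(std::uint32_t key, int value) {
    if (key == kEmpty)
      return false;  // guard: the marker value is never a real key
    std::size_t i = find_slot(key);
    if (i == kSlots)
      return false;
    keys_[i] = key;
    values_[i] = value;
    return true;
  }

  // Return a pointer to the stored value, or nullptr if absent.
  const int *get(std::uint32_t key) const {
    if (key == kEmpty)
      return nullptr;
    std::size_t i = find_slot(key);
    return (i != kSlots && keys_[i] == key) ? &values_[i] : nullptr;
  }

 private:
  static constexpr std::size_t kSlots = 8;

  // Linear probing: the slot already holding KEY, else the first empty
  // slot on the probe path, else kSlots if the table is full.
  std::size_t find_slot(std::uint32_t key) const {
    for (std::size_t n = 0; n < kSlots; ++n) {
      std::size_t i = (key + n) % kSlots;
      if (keys_[i] == key || keys_[i] == kEmpty)
        return i;
    }
    return kSlots;
  }

  std::array<std::uint32_t, kSlots> keys_{};  // zero-initialized: all empty
  std::array<int, kSlots> values_{};
};
```

Without the guard, a `put (0, ...)` would write into a slot that every later probe still treats as empty, which is the double role of the zero key described above.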



Leaving aside that for 'UNKNOWN_LOCATION' -- per my understanding, at
least, as per above -- the current implementation isn't doing the right
thing anyway, Richard had in

toyed with the idea that we for "UNKNOWN_LOCATION create a new location
with the source location being still UNKNOWN but with the appropriate
ad-hoc data to disable the warning".  On the other hand, we have Martin's
initial goal,
,
that he'd like to "work toward providing locations for all
expressions/statements".

[PATCH v3] x86-64: Add ABI warning for 64-bit vectors

2021-09-01 Thread H.J. Lu via Gcc-patches
TYPE_MODE of record and union depends on whether vector_mode_supported_p
returns true or not.  x86-64 backend uses TYPE_MODE to decide how to pass
a parameter and return a value in a function.  64-bit integer vectors
were supported only by MMX and 64-bit float vector was supported only by
3DNOW.  GCC 10 enabled 64-bit integer vectors without MMX by:

commit dfa61b9ed06d71901c4c430caa89820972ad68fe
Author: H.J. Lu 
Date:   Wed May 15 15:02:54 2019 +

i386: Allow MMX register modes in SSE registers

In 64-bit mode, SSE2 can be used to emulate MMX instructions without
3DNOW.  We can use SSE2 to support MMX register modes.

GCC 11 enabled 64-bit float vector without 3DNOW by:

commit 7c355156aa20eaec7401d7c66f6a6cfbe597abc2
Author: Uros Bizjak 
Date:   Mon May 11 11:16:31 2020 +0200

i386: Vectorize basic V2SFmode operations [PR94913]

Enable V2SFmode vectorization and vectorize V2SFmode PLUS,
MINUS, MULT, MIN and MAX operations using XMM registers.

Add ABI warnings for 64-bit integer vectors without MMX and 64-bit float
vector without 3DNOW.
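
For readers unfamiliar with the types involved, the following sketch shows 64-bit integer and float vectors wrapped in a record and a union, written with the GCC/Clang `vector_size` extension. The `toy_` names are hypothetical and are not taken from the new tests; whether a given compiler version warns for these declarations is not something this sketch asserts.

```c
#include <stddef.h>

/* GCC/Clang vector extension: two 32-bit lanes in 8 bytes.  */
typedef int toy_v2si __attribute__ ((vector_size (8)));
typedef float toy_v2sf __attribute__ ((vector_size (8)));

/* A record and a union whose only member is a 64-bit vector.  */
struct toy_wrap_v2si { toy_v2si v; };
union toy_wrap_v2sf { toy_v2sf v; };

size_t
toy_wrap_v2si_size (void)
{
  return sizeof (struct toy_wrap_v2si);
}

size_t
toy_wrap_v2sf_size (void)
{
  return sizeof (union toy_wrap_v2sf);
}

/* Passing and returning the wrapper by value is where the calling
   convention for the enclosed vector comes into play.  */
struct toy_wrap_v2si
toy_pass_through (struct toy_wrap_v2si x)
{
  return x;
}
```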

gcc/

PR target/102027
PR target/102105
* config/i386/i386.c (m64_mode): New function.
(examine_argument): Add ABI warnings for 64-bit vectors.

gcc/testsuite/

PR target/102027
PR target/102105
* gcc.target/i386/pr102027-1.c: New test.
* gcc.target/i386/pr102027-2.c: Likewise.
* gcc.target/i386/pr102027-3.c: Likewise.
* gcc.target/i386/pr102027-4.c: Likewise.
* gcc.target/i386/pr102105-1.c: Likewise.
* gcc.target/i386/pr102105-2.c: Likewise.
* gcc.target/i386/pr102105-3.c: Likewise.
* gcc.target/i386/pr102105-4.c: Likewise.
* gcc.target/i386/pr102105-5.c: Likewise.
* gcc.target/i386/sse2-mmx-4.c: Add -Wno-psabi.
---
 gcc/config/i386/i386.c | 111 +
 gcc/testsuite/gcc.target/i386/pr102027-1.c |  15 +++
 gcc/testsuite/gcc.target/i386/pr102027-2.c |  15 +++
 gcc/testsuite/gcc.target/i386/pr102027-3.c |  17 
 gcc/testsuite/gcc.target/i386/pr102027-4.c |  17 
 gcc/testsuite/gcc.target/i386/pr102105-1.c |  15 +++
 gcc/testsuite/gcc.target/i386/pr102105-2.c |  15 +++
 gcc/testsuite/gcc.target/i386/pr102105-3.c |  17 
 gcc/testsuite/gcc.target/i386/pr102105-4.c |  17 
 gcc/testsuite/gcc.target/i386/pr102105-5.c |  17 
 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c |   2 +-
 11 files changed, 257 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102027-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102027-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102027-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102027-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102105-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102105-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102105-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102105-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102105-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4681b667fa2..a42c5355f6c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2462,6 +2462,41 @@ classify_argument (machine_mode mode, const_tree type,
 }
 }
 
+/* Return the mode of the 64-bit vector type in TYPE.  */
+
+static machine_mode
+m64_mode (const_tree type)
+{
+  if ((TREE_CODE (type) == RECORD_TYPE
+   || TREE_CODE (type) == UNION_TYPE
+   || TREE_CODE (type) == QUAL_UNION_TYPE))
+{
+  const_tree field;
+
+  for (field = TYPE_FIELDS (type);
+  field;
+  field = DECL_CHAIN (field))
+   if (TREE_CODE (field) == FIELD_DECL)
+ {
+   const_tree field_type = TREE_TYPE (field);
+   if (field_type == error_mark_node)
+ continue;
+
+   /* Return the mode of a 64-bit vector.  */
+   if (TREE_CODE (field_type) == VECTOR_TYPE
+   && VECTOR_MODE_P (TYPE_MODE (field_type))
+   && GET_MODE_SIZE (TYPE_MODE (field_type)) == 8)
+ return TYPE_MODE (field_type);
+
+   machine_mode mode = m64_mode (field_type);
+   if (mode != VOIDmode)
+ return mode;
+ }
+}
+
+  return VOIDmode;
+}
+
 /* Examine the argument and return set number of register required in each
class.  Return true iff parameter should be passed in memory.  */
 
@@ -2477,6 +2512,82 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
 
   if (!n)
 return true;
+
+  if (warn_psabi
+  && n == 1
+  && regclass[0] == X86_64_SSE_CLASS
+  && (VECTOR_MODE_P (mode) || type)
+  && GET_MODE_SIZE (mode) == 8)
+{
+  const char *url;
+  if (!VECTOR_MODE_P (mode))
+   mode = m64_mode (type);
+  if (mode == V2SFmode)
+   {
+ /* GCC 11 enables V2SFmode without TARGET_3DNOW.  */
+ if (!TARGET_3DNOW)
+   {
+  

Re: [PATCH v2] x86-64: Add ABI warning for 64-bit vectors

2021-09-01 Thread H.J. Lu via Gcc-patches
On Mon, Aug 30, 2021 at 6:49 AM Jakub Jelinek  wrote:
>
> On Sun, Aug 29, 2021 at 12:11:23PM -0700, H.J. Lu wrote:
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -1840,6 +1840,54 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> > Argument info to initialize */
> >cfun->machine->arg_reg_available = (cum->nregs > 0);
> >  }
> >
> > +/* Return the single 64-bit vector type of TYPE.  */
> > +
> > +static const_tree
> > +single_m64_base_type (const_tree type)
> > +{
> > +  if ((TREE_CODE (type) == RECORD_TYPE
> > +   || TREE_CODE (type) == UNION_TYPE
> > +   || TREE_CODE (type) == QUAL_UNION_TYPE)
> > +  && int_size_in_bytes (type) == 8)
> > +{
> > +  const_tree field;
> > +  const_tree first_field = nullptr;
> > +
> > +  for (field = TYPE_FIELDS (type);
> > +field;
> > +field = DECL_CHAIN (field))
> > + if (TREE_CODE (field) == FIELD_DECL)
> > +   {
> > + if (TREE_TYPE (field) == error_mark_node)
> > +   continue;
> > +
> > + /* Skip if structure has more than one field.  */
> > + if (first_field)
> > +   return nullptr;
> > +
> > + first_field = field;
> > +   }
> > +
> > +  /* Skip if structure doesn't have any fields.  */
> > +  if (!first_field)
> > + return nullptr;
>
> Is this an attempt to emulate compute_record_mode or something else?
> How should it treat zero-width bitfields (either the C ones kept
> in the structures or C++ ones formerly removed from them and now no longer)?
> compute_record_mode actually has more complicated details...

I will submit the v3 patch.


-- 
H.J.


Re: [PATCH 1/2]middle-end Teach CSE to be able to do vector extracts.

2021-09-01 Thread Jeff Law via Gcc-patches




On 8/31/2021 7:29 AM, Tamar Christina wrote:

Hi All,

This patch gets CSE to re-use constants already inside a vector rather than
re-materializing the constant again.

Basically consider the following case:

#include <stdint.h>
#include <arm_neon.h>

uint64_t
test (uint64_t a, uint64x2_t b, uint64x2_t* rt)
{
   uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL};
   uint64_t res = a | arr[0];
   uint64x2_t val = vld1q_u64 (arr);
   *rt = vaddq_u64 (val, b);
   return res;
}

The actual behavior is inconsequential; however, notice that the same constants
are used in the vector (arr and later val) and in the calculation of res.

The code we generate for this however is quite sub-optimal:

test:
 adrp    x2, .LC0
 sub     sp, sp, #16
 ldr     q1, [x2, #:lo12:.LC0]
 mov     x2, 16502
 movk    x2, 0x1023, lsl 16
 movk    x2, 0x4308, lsl 32
 add     v1.2d, v1.2d, v0.2d
 movk    x2, 0x942, lsl 48
 orr     x0, x0, x2
 str     q1, [x1]
 add     sp, sp, 16
 ret
.LC0:
 .xword  667169396713799798
 .xword  667169396713799798

Essentially we materialize the same constant twice.  The reason for this is
because the front-end lowers the constant extracted from arr[0] quite early on.
If you look into the result of fre you'll find

:
   arr[0] = 667169396713799798;
   arr[1] = 667169396713799798;
   res_7 = a_6(D) | 667169396713799798;
   _16 = __builtin_aarch64_ld1v2di (&arr);
   _17 = VIEW_CONVERT_EXPR(_16);
   _11 = b_10(D) + _17;
   *rt_12(D) = _11;
   arr ={v} {CLOBBER};
   return res_7;

Which makes sense for further optimization.  However come expand time if the
constant isn't representable in the target arch it will be assigned to a
register again.

(insn 8 5 9 2 (set (reg:V2DI 99)
 (const_vector:V2DI [
 (const_int 667169396713799798 [0x942430810234076]) repeated x2
 ])) "cse.c":7:12 -1
  (nil))
...
(insn 14 13 15 2 (set (reg:DI 103)
 (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1
  (nil))
(insn 15 14 16 2 (set (reg:DI 102 [ res ])
 (ior:DI (reg/v:DI 96 [ a ])
 (reg:DI 103))) "cse.c":8:12 -1
  (nil))

And since it's out of the immediate range of the scalar instruction used
combine won't be able to do anything here.

This will then trigger the re-materialization of the constant twice.

To fix this, this patch extends CSE to be able to generate an extract for a
constant from another vector, or to make a vector for a constant by duplicating
another constant.
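
As a source-level analogue of that idea (not the RTL transformation itself), the sketch below builds the constant once as a vector and reads lane 0 for the scalar operation. It uses the GCC/Clang `vector_size` extension instead of the NEON intrinsics so that it is not tied to aarch64; `toy_v2di`, `TOY_K` and `toy_test` are hypothetical names, and the compiler remains free to fold or re-materialize the constant as its cost model sees fit.

```c
#include <stdint.h>

/* Two 64-bit lanes, via the GCC/Clang vector extension.  */
typedef uint64_t toy_v2di __attribute__ ((vector_size (16)));

#define TOY_K UINT64_C (0x0942430810234076)

uint64_t
toy_test (uint64_t a, toy_v2di b, toy_v2di *rt)
{
  /* Materialize the constant once, as a vector.  */
  toy_v2di val = { TOY_K, TOY_K };
  *rt = val + b;

  /* Reuse lane 0 of the vector rather than building the scalar
     constant a second time.  */
  return a | val[0];
}
```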

Whether this transformation is done or not depends entirely on the costing for
the target for the different constants and operations.

I initially also investigated doing this in PRE, but PRE requires at least 2 BB
to work and does not currently have any way to remove redundancies within a
single BB and it did not look easy to support.

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cse.c (find_sets_in_insn): Register constants in sets.
(cse_insn): Try materializing using vec_dup.

Looks good to me.

If you can turn that example into a test, even if it's just in the 
aarch64 directory, that would be helpful


Thanks,
Jeff



[committed] [PR tree-optimization/102152] Call reduce_vector_comparison_to_scalar_comparison earlier

2021-09-01 Thread Jeff Law via Gcc-patches
As noted in the PR, we can get an ICE after the introduction of code to 
reduce a vector comparison to a scalar.  The problem is we left the 
operand cache in an inconsistent state because we called the new 
function too late. This is trivially fixed by making the transformation 
before we call update_stmt_if_modified.


The irony here is the whole point of calling 
reduce_vector_comparison_to_scalar_comparison when we did was to expose 
these kinds of secondary opportunities.  In this particular case we 
collapsed the test to a comparison of constants (thus no SSA operands).


Anyway, this fixes the problem in the obvious way.  This may all end up 
being moot if I can twiddle Richi's match.pd pattern to work.  It 
doesn't work as-written due to a couple issues that I haven't worked 
totally through yet.  It seemed better to get the regression fixed 
immediately rather than wait for the match.pd work.


Installed on the trunk after bootstrap & regression testing on x86 and 
verifying it addresses the aarch64 issue.


Jeff
commit 165446a1e81f5bb9597289e783af9ee67e1fe5ba
Author: Jeff Law 
Date:   Wed Sep 1 19:13:58 2021 -0400

Call reduce_vector_comparison_to_scalar_comparison earlier

As noted in the PR, we can get an ICE after the introduction of code to 
reduce a vector comparison to a scalar.  The problem is we left the operand 
cache in an inconsistent state because we called the new function too late.   
This is trivially fixed by making the transformation before we call 
update_stmt_if_modified.

The irony here is the whole point of calling 
reduce_vector_comparison_to_scalar_comparison when we did was to expose these 
kinds of secondary opportunities.  In this particular case we collapsed the 
test to a comparison of constants (thus no SSA operands).

Anyway, this fixes the problem in the obvious way.  This may all end up 
being moot if I can twiddle Richi's match.pd pattern to work.  It doesn't work 
as-written due to a couple issues that I haven't worked totally through yet.

Installed on the trunk after bootstrap & regression testing on x86 and 
verifying it addresses the aarch64 issue.

gcc/
PR tree-optimization/102152
* tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Reduce a
vector comparison to a scalar comparison before calling
update_stmt_if_modified.

gcc/testsuite/
PR tree-optimization/102152
* gcc.dg/pr102152.c: New test

diff --git a/gcc/testsuite/gcc.dg/pr102152.c b/gcc/testsuite/gcc.dg/pr102152.c
new file mode 100644
index 00000000000..4e0c1f5a3d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102152.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-loop-vectorize -fno-tree-fre" } */
+/* { dg-additional-options "-march=armv8-a+sve" { target aarch64-*-* } } */
+
+
+
+signed char i;
+
+void
+foo (void)
+{
+  for (i = 0; i < 6; i += 5)
+;
+}
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a5245b33de6..49d8f96408f 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1990,14 +1990,14 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator *si,
   print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 }
 
-  update_stmt_if_modified (stmt);
-  opt_stats.num_stmts++;
-
   /* STMT may be a comparison of uniform vectors that we can simplify
  down to a comparison of scalars.  Do that transformation first
  so that all the scalar optimizations from here onward apply.  */
   reduce_vector_comparison_to_scalar_comparison (stmt);
 
+  update_stmt_if_modified (stmt);
+  opt_stats.num_stmts++;
+
   /* Const/copy propagate into USES, VUSES and the RHS of VDEFs.  */
   cprop_into_stmt (stmt, m_evrp_range_analyzer);
 


Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Martin Sebor via Gcc-patches

On 9/1/21 3:39 PM, Jason Merrill wrote:

On 9/1/21 4:33 PM, Martin Sebor wrote:

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
 int a[2][2];
 return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
 int a[2][2];
 return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.
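
A minimal, compilable contrast between the mistaken test and the intended one may help here. The function names are made up for illustration, and a compiler may well warn on the first function, which is the point of the enhancement.

```c
/* Mistaken form: a[0] is the first row, an int[2] that decays to a
   pointer to its first element.  That pointer is never null, so the
   comparison is always false.  */
int
toy_row_equals_null (void)
{
  int a[2][2] = { { 0, 0 }, { 0, 0 } };
  return a[0] == 0;
}

/* Intended form: compare the value of the first element.  */
int
toy_first_element_is_zero (void)
{
  int a[2][2] = { { 0, 0 }, { 0, 0 } };
  return a[0][0] == 0;
}
```

Even with every element set to zero, the first function returns 0 and the second returns 1.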

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
 return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+ || TREE_CODE (cop) == COMPONENT_REF)
+    {
+  unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+  cop = TREE_OPERAND (cop, opno);
+    }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the 
FIELD_DECL, which doesn't have an address of its own.


This is because the address of a field is never null, regardless of
what the P in &P->M points to.


True, though I'd change "invalid" to "undefined" in the comment for 
decl_with_nonnull_addr_p.



(With the caveat mentioned in
the comment further up about the pointer used to access the member
being nonnull.)  So this is diagnosed:

   extern struct { int m; } *p;
   bool b = &p->m == 0;

Using handled_component_p() in a loop would prevent that.


Would it?  p isn't declared weak.


Maybe I misunderstood.  This loop:

  while (handled_component_p (cop))
cop = TREE_OPERAND (cop, 0);

would unwrap the COMPONENT_REF from cop and terminate with it set
to INDIRECT_REF for which decl_with_nonnull_addr_p() would return
false.  But if you meant to keep the body as is and just change
the condition, that would work.  If you think that's better,
e.g., because it would handle more cases, I'm all for it.




For array_refs, the loop gets us the decl to mention in the warning.
But this should work too and looks cleaner:

   cop = TREE_OPERAND (cop, 0);

   /* Get the outermost array.  */
   while (TREE_CODE (cop) == ARRAY_REF)
 cop = TREE_OPERAND (cop, 0);

   /* Get the member (its address is never null).  */
   if (TREE_CODE (cop) == COMPONENT_REF)
 cop = TREE_OPERAND (cop, 1);

Do you prefer the above instead?


Sure.  OK with that change and the comment tweak above.


Thanks.  Are you approving the whole patch (including the C FE
changes) or just the C++ part?

Martin
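
The two peeling strategies discussed in this message can be compared on a toy tree. This is only a sketch: the `toy_` node type and codes mimic the names of GCC's tree codes but are not GCC's tree API, and operand 1 of the toy COMPONENT_REF plays the role of the FIELD_DECL.

```c
#include <stddef.h>

/* Toy tree codes; names mirror GCC's, the representation does not.  */
enum toy_code
{
  TOY_VAR_DECL,
  TOY_FIELD_DECL,
  TOY_INDIRECT_REF,
  TOY_ARRAY_REF,
  TOY_COMPONENT_REF,
  TOY_ADDR_EXPR
};

struct toy_tree
{
  enum toy_code code;
  const struct toy_tree *op[2];
};

/* The variant agreed on above: strip array references, then step to
   the member of a component reference.  */
const struct toy_tree *
toy_address_base (const struct toy_tree *addr)
{
  const struct toy_tree *cop = addr->op[0];

  /* Get the outermost array.  */
  while (cop->code == TOY_ARRAY_REF)
    cop = cop->op[0];

  /* Get the member (its address is never null).  */
  if (cop->code == TOY_COMPONENT_REF)
    cop = cop->op[1];

  return cop;
}

/* The alternative that always follows operand 0, as a loop over
   handled components with that body would.  */
const struct toy_tree *
toy_strip_operand0 (const struct toy_tree *addr)
{
  const struct toy_tree *cop = addr->op[0];

  while (cop->code == TOY_ARRAY_REF || cop->code == TOY_COMPONENT_REF)
    cop = cop->op[0];

  return cop;
}
```

For a tree modelling `&p->m`, the first function ends on the field node, while the second ends on the indirection through `p`, which matches the behavior described above for the operand-0 loop.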


Re: [PATCH] c++, abi: Set DECL_FIELD_ABI_IGNORED on C++ zero width bitfields [PR102024]

2021-09-01 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 01, 2021 at 05:32:07PM -0400, Jason Merrill wrote:
> On 8/31/21 5:15 AM, Richard Biener wrote:
> > On Tue, 31 Aug 2021, Jakub Jelinek wrote:
> > 
> > > On Tue, Aug 31, 2021 at 09:57:44AM +0200, Richard Biener wrote:
> > > > Just to clarify - in the C++ FE these fields are meaningful for
> > > > layout purposes but they are only supposed to influence layout
> > > > but not ABI (but why does the C++ FE say that?)
> 
> The code to remove zero-length bit-fields was copied from the C front end
> when G++ was first created.  It was removed from the C front end by Joseph's
> r84279.  The last thread for that patch is
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2004-July/thread.html#142984
> 
> The patch and that thread contain no discussion of zero-length bit-fields,
> nor is there one in any of the testcases added by the patch.
> 
> I guess the assumption both before and after that patch was that zero-length
> bit-fields would not affect ABI, whether or not they appeared in
> TYPE_FIELDS.  Are the targets where ABI is affected all new since then?

Ah, thanks for the archeology.  So it indeed seems like in theory an ABI change
between GCC 3.4 and 4.0 for C then on some of the targets like x86_64 which
already existed in 3.2-ish era.  I actually couldn't see a difference
between C and C++ in that era on x86_64, e.g. 3.3 C and C++ both work as
C and C++ now, as if the zero width bitfields aren't removed.
Before the PR42217 fix the C++ FE wasn't really removing the zero width
bitfields properly, so it is actually 4.5/4.4-ish regression for C++.

> > Anyway, I'm not stuck to whatever naming we choose but the situation
> > is complicated enough that we want some more elaborate docs in tree.h
> > I'll leave the final ACK to Jason (unless he's on vacation).
> 
> DECL_FIELD_FOR_LAYOUT sounds better to me.

Just for the zero-width bitfields, or also for the class FIELD_DECLs on
which we set DECL_FIELD_ABI_IGNORED now?
I could e.g. keep DECL_FIELD_ABI_IGNORED for the class ones, by
making
#define DECL_FIELD_ABI_IGNORED(field) \
  (!DECL_BIT_FIELD (field) \
   && (FIELD_DECL_CHECK (field)->decl_common.decl_flag_0))
add SET_DECL_FIELD_ABI_IGNORED macro and similarly define
DECL_FIELD_FOR_LAYOUT and SET_DECL_FIELD_FOR_LAYOUT (just it would do
DECL_BIT_FIELD (field) check in that case).
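
A toy version of the idea, two accessors sharing one flag bit and told apart by whether the field is a bit-field, might look as follows. The struct, its member names and the `TOY_` macros are hypothetical stand-ins, not the tree.h definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a FIELD_DECL.  */
struct toy_field
{
  bool bit_field;   /* plays the role of DECL_BIT_FIELD */
  bool decl_flag_0; /* the single shared flag bit */
};

/* Same bit, two meanings, selected by the bit-field property.  */
#define TOY_FIELD_ABI_IGNORED(field) \
  (!(field)->bit_field && (field)->decl_flag_0)
#define TOY_FIELD_FOR_LAYOUT(field) \
  ((field)->bit_field && (field)->decl_flag_0)

bool
toy_abi_ignored (const struct toy_field *f)
{
  return TOY_FIELD_ABI_IGNORED (f);
}

bool
toy_for_layout (const struct toy_field *f)
{
  return TOY_FIELD_FOR_LAYOUT (f);
}
```

With this split no extra flag bit is spent, at the cost of every reader having to go through the macros rather than the raw flag.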

Jakub



Re: C++ patch ping

2021-09-01 Thread Jason Merrill via Gcc-patches

On 9/1/21 4:11 PM, Jakub Jelinek wrote:

On Wed, Sep 01, 2021 at 03:25:17PM -0400, Jason Merrill wrote:

On 8/30/21 3:11 AM, Jakub Jelinek wrote:

Hi!

I'd like to ping the following patches

libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575621.html
together with your
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577602.html
incremental patch (successfully tested on x86_64-linux and i686-linux).


OK, thanks.


Thanks, committed both patches.


My reply to that patch approved it with a suggestion for a tweak to
ucn_valid_in_identifier.  Quoting it here:


I might check invalid_start_flags first, and return 1 if not set, then
check all the other flags when not pedantic, and finally return 2 if
nothing matches.  OK with or without this change.


Sorry for missing this, didn't scroll down enough.

I don't think something like:
   if (CPP_OPTION (pfile, cxx23_identifiers))
 invalid_start_flags = NXX23;
   else if (CPP_OPTION (pfile, c11_identifiers))
 invalid_start_flags = N11;
   else if (CPP_OPTION (pfile, c99))
 invalid_start_flags = N99;
   else
 invalid_start_flags = 0;

   /* In C99, UCN digits may not begin identifiers.  In C11 and C++11,
  UCN combining characters may not begin identifiers.  */
   if ((ucnranges[mn].flags & invalid_start_flags) == 0)
 return 1;

   /* If not -pedantic, accept as character that may
  begin an identifier a union of characters allowed
  at that position in each of the character sets.  */
   if (!CPP_PEDANTIC (pfile)
   && ((ucnranges[mn].flags & (C99 | N99)) == C99
   || (ucnranges[mn].flags & CXX) != 0
   || (ucnranges[mn].flags & (C11 | N11)) == C11
   || (ucnranges[mn].flags & (CXX23 | NXX23)) == CXX23))
 return 1;

   return 2;
would work, e.g. for C++98 invalid_start_flags is 0, so it would return
always 1, while the previous patch returned 2 for non-pedantic if the char
wasn't in the CXX set but was e.g. in the C99 set that wasn't allowed
as the first char (i.e. in & (C99 | N99) == (C99 | N99) set) etc.
While all C99 | N99 characters are C11 | 0, e.g.
\u0304 (and many others) are not in C99 at all, not in CXX, and in
C11 | N11 and in CXX23 | NXX23.  So they are never valid as start
characters.  There are also some characters like
\u1dfa which are not in C99 at all, not in CXX, not in CXX23 and in
C11 | N11, so again not valid as start character in any of the pedantic
modes.  IMHO we want to return 2 for them in non-pedantic.
And testing first
   if (ucnranges[mn].flags & invalid_start_flags)
 return 2;
and then doing the if !CPP_PEDANTIC stuff wouldn't work either, e.g.
\U0001d18b is in CXX23 | NXX23 and in C11 | 0, so we IMHO want to return
1 for that (allowed as start character in -pedantic -std=c++20, disallowed
as start character in -pedantic -std=c++23) but we would return 2
in -std=c++23 mode.
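
The first objection can be replayed on a small model of the reordered checks. The flag names follow the quoted code, but the bit values, the `toy_` prefix and the function signature are assumptions for illustration; the real table and options live in libcpp.

```c
/* Hypothetical bit values; only the names follow the quoted code.  */
enum
{
  TOY_C99 = 1,
  TOY_N99 = 2,
  TOY_CXX = 4,
  TOY_C11 = 8,
  TOY_N11 = 16,
  TOY_CXX23 = 32,
  TOY_NXX23 = 64
};

/* Model of the reordered checks quoted above.  FLAGS describes the
   character; INVALID_START_FLAGS is 0, TOY_N99, TOY_N11 or TOY_NXX23
   depending on the language mode.  Returns 1 if the character may
   start an identifier, 2 if it may only continue one.  */
int
toy_reordered_check (unsigned flags, unsigned invalid_start_flags,
                     int pedantic)
{
  if ((flags & invalid_start_flags) == 0)
    return 1;

  if (!pedantic
      && ((flags & (TOY_C99 | TOY_N99)) == TOY_C99
          || (flags & TOY_CXX) != 0
          || (flags & (TOY_C11 | TOY_N11)) == TOY_C11
          || (flags & (TOY_CXX23 | TOY_NXX23)) == TOY_CXX23))
    return 1;

  return 2;
}
```

For a character flagged only as C11 | N11 (the \u1dfa case), a mode with no invalid-start flags yields 1 from this model, whereas the same character in a C11-style mode yields 2; the message above argues that 2 is wanted in both.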


Fair enough.  Go ahead without changes, then.

Jason



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Jason Merrill via Gcc-patches

On 9/1/21 4:33 PM, Martin Sebor wrote:

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
 int a[2][2];
 return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
 int a[2][2];
 return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
 return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+ || TREE_CODE (cop) == COMPONENT_REF)
+    {
+  unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+  cop = TREE_OPERAND (cop, opno);
+    }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the FIELD_DECL, 
which doesn't have an address of its own.


This is because the address of a field is never null, regardless of
what the P in &P->M points to.


True, though I'd change "invalid" to "undefined" in the comment for 
decl_with_nonnull_addr_p.



(With the caveat mentioned in
the comment further up about the pointer used to access the member
being nonnull.)  So this is diagnosed:

   extern struct { int m; } *p;
   bool b = &p->m == 0;

Using handled_component_p() in a loop would prevent that.


Would it?  p isn't declared weak.


For array_refs, the loop gets us the decl to mention in the warning.
But this should work too and looks cleaner:

   cop = TREE_OPERAND (cop, 0);

   /* Get the outermost array.  */
   while (TREE_CODE (cop) == ARRAY_REF)
 cop = TREE_OPERAND (cop, 0);

   /* Get the member (its address is never null).  */
   if (TREE_CODE (cop) == COMPONENT_REF)
 cop = TREE_OPERAND (cop, 1);

Do you prefer the above instead?


Sure.  OK with that change and the comment tweak above.

Jason



Re: [PATCH] c++, abi: Set DECL_FIELD_ABI_IGNORED on C++ zero width bitfields [PR102024]

2021-09-01 Thread Jason Merrill via Gcc-patches

On 8/31/21 5:15 AM, Richard Biener wrote:

On Tue, 31 Aug 2021, Jakub Jelinek wrote:


On Tue, Aug 31, 2021 at 09:57:44AM +0200, Richard Biener wrote:

Just to clarify - in the C++ FE these fields are meaningful for
layout purposes but they are only supposed to influence layout
but not ABI (but why does the C++ FE say that?)


The code to remove zero-length bit-fields was copied from the C front 
end when G++ was first created.  It was removed from the C front end by 
Joseph's r84279.  The last thread for that patch is


https://gcc.gnu.org/pipermail/gcc-patches/2004-July/thread.html#142984

The patch and that thread contain no discussion of zero-length 
bit-fields, nor is there one in any of the testcases added by the patch.


I guess the assumption both before and after that patch was that 
zero-length bit-fields would not affect ABI, whether or not they 
appeared in TYPE_FIELDS.  Are the targets where ABI is affected all new 
since then?



and thus the
'DECL_FIELD_ABI_IGNORED' is a good term to use?  But we still want
to have the backends decide whether to actually follow this advice
and we do expect some to not do this?


The removal of zero-width bitfields was added (after structure layout)
by
https://gcc.gnu.org/legacy-ml/gcc-patches/1999-12/msg00589.html
https://gcc.gnu.org/legacy-ml/gcc-patches/1999-12/msg00641.html
The comment about it was:
/* Delete all zero-width bit-fields from the list of fields.  Now
that we have layed out the type they are no longer important.  */
The only spot I see zero-width bit-fields mentioned in the Itanium ABI is:

empty class
   A class with no non-static data members other than empty data members,
   no unnamed bit-fields other than zero-width bit-fields, no virtual functions,
   no virtual base classes, and no non-empty non-virtual proper base classes.

nearly empty class
   A class that contains a virtual pointer, but no other data except
   (possibly) virtual bases. In particular, it:
- has no non-static data members and no non-zero-width unnamed bit-fields,
- has no direct base classes that are not either empty, nearly empty, or
  virtual,
- has at most one non-virtual, nearly empty direct base class, and
- has no proper base class that is empty, not morally virtual, and at an
  offset other than zero.
   Such classes may be primary base classes even if virtual, sharing a
   virtual pointer with the derived class.

and the removal of remove_zero_width_bit_fields I believe didn't change
anything on that, e.g. is_empty_class uses CLASSTYPE_EMPTY_P flag whose
computation takes:
   if (DECL_C_BIT_FIELD (field)
   && integer_zerop (DECL_BIT_FIELD_REPRESENTATIVE (field)))
 /* We don't treat zero-width bitfields as making a class
non-empty.  */
 ;
into account (that is still before the bit-fields are finalized so
width is stored differently, and it is necessary before the
former remove_zero_width_bit_fields call).

The flag for these zero-width bitfields is a good name for the case
where a target decides to keep the old GCC 11 ABI of not ignoring them
for C and ignoring them for C++, in other cases it can be a little bit
confusing, but I think we could define another macro with the same
value for it if we find a good name for it (dunno what it would be though).
But even if we have another name, if we reuse the flag we need to take
it into account in the target code, and using a different flag would be a
waste of the precious bits.
Perhaps just clarify in tree.h above the DECL_FIELD_ABI_IGNORED the cases
in which it is set?


Yeah, I think it conflates the C++ [Itanium] ABI and the psABI for
calling conventions.  The 'ABI' in DECL_FIELD_ABI_IGNORED refers
to the psABI as far as I understand the situation, but then it
might still be important for the psABI when dealing with
(non-)homogenous aggregates ...

So _maybe_ DECL_FIELD_FOR_LAYOUT might capture the bits better - the
field is present for layout (and possibly ABI), but it doesn't carry
any data so it doesn't have to be passed across function boundary
for example.

Anyway, I'm not attached to whatever naming we choose, but the situation
is complicated enough that we want some more elaborate docs in tree.h.
I'll leave the final ACK to Jason (unless he's on vacation).


DECL_FIELD_FOR_LAYOUT sounds better to me.

Jason



Re: [PATCH v3] md/define_c_enum: support value assignation

2021-09-01 Thread Andrew Pinski via Gcc-patches
On Tue, Aug 31, 2021 at 4:22 AM YunQiang Su  wrote:
>
> Currently, the enums from define_c_enum and define_enum can only
> take consecutive values starting from 0.
>
> In fact we can support the behaviour just like C, aka like
>   (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
> then we can get
>   enum mips_isa {
> MIPS_ISA_MIPS1 = 1,
> MIPS_ISA_MIPS2 = 2,
> MIPS_ISA_MIPS32 = 32,
> MIPS_ISA_MIPS32R2 = 33
>   };
>
> gcc/ChangeLog:
> * read-md.c (md_reader::handle_enum): support value assignation.
> * doc/md.texi: record define_c_enum value assignation support.
> ---
>  gcc/doc/md.texi |  4 
>  gcc/read-md.c   | 21 +
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index f8047aefc..2b41cb7fb 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -11074,6 +11074,8 @@ The syntax is as follows:
>  (define_c_enum "@var{name}" [
>@var{value0}
>@var{value1}
> +  (@var{value32} 32)
> +  @var{value33}
>@dots{}
>@var{valuen}
>  ])
> @@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
>  enum @var{name} @{
>@var{value0} = 0,
>@var{value1} = 1,
> +  @var{value32} = 32,
> +  @var{value33} = 33,
>@dots{}
>@var{valuen} = @var{n}
>  @};
> diff --git a/gcc/read-md.c b/gcc/read-md.c
> index bb419e0f6..0fbe924d1 100644
> --- a/gcc/read-md.c
> +++ b/gcc/read-md.c
> @@ -902,7 +902,8 @@ void
>  md_reader::handle_enum (file_location loc, bool md_p)
>  {
>char *enum_name, *value_name;
> -  struct md_name name;
> +  unsigned int cur_value;
> +  struct md_name name, value;
>struct enum_type *def;
>struct enum_value *ev;
>void **slot;
> @@ -928,6 +929,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
>*slot = def;
>  }
>
> +  cur_value = def->num_values;
>require_char_ws ('[');
>
>while ((c = read_skip_spaces ()) != ']')
> @@ -937,8 +939,18 @@ md_reader::handle_enum (file_location loc, bool md_p)
>   error_at (loc, "unterminated construct");
>   exit (1);
> }
> -  unread_char (c);
> -  read_name (&name);
> +  if (c == '(')
> +   {
> + read_name (&name);
> + read_name (&value);
> + require_char_ws (')');
> + cur_value = atoi(value.string);

We really should be avoiding adding atoi.  Yes there are uses already
in the source but https://gcc.gnu.org/PR44574 exists to track those
uses.
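
One possible replacement, as a minimal sketch only: parse the value with
strtoul and reject anything that is not a plain decimal number.  The
helper name parse_enum_value is hypothetical and is not part of
read-md.c; how the caller reports the error is left open.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of read-md.c.  Parse STR as an
   unsigned decimal number and store it in *OUT.  Return false for
   empty input, a sign or leading space, trailing garbage, or
   overflow, all of which atoi would silently accept or mangle.  */
bool
parse_enum_value (const char *str, unsigned int *out)
{
  char *end;
  unsigned long v;

  /* strtoul itself accepts leading whitespace and a sign; insist on
     a digit first so that "-1" is rejected.  */
  if (*str < '0' || *str > '9')
    return false;

  errno = 0;
  v = strtoul (str, &end, 10);
  if (*end != '\0')
    return false;		/* Trailing characters, e.g. "12x".  */
  if (errno == ERANGE || v > UINT_MAX)
    return false;		/* Does not fit in unsigned int.  */

  *out = (unsigned int) v;
  return true;
}
```

With something like this, the patch could call parse_enum_value
(value.string, &cur_value) and raise an error when it returns false.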

Thanks,
Andrew


> +   }
> +  else
> +   {
> + unread_char (c);
> + read_name (&name);
> +   }
>
>ev = XNEW (struct enum_value);
>ev->next = 0;
> @@ -954,11 +966,12 @@ md_reader::handle_enum (file_location loc, bool md_p)
>   ev->name = value_name;
> }
>ev->def = add_constant (get_md_constants (), value_name,
> - md_decimal_string (def->num_values), def);
> + md_decimal_string (cur_value), def);
>
>*def->tail_ptr = ev;
>def->tail_ptr = &ev->next;
>def->num_values++;
> +  cur_value++;
>  }
>  }
>
> --
> 2.30.2
>


Re: [PATCH] c++: Fix ICE with nullptr comparison (GCC 11) [PR101592]

2021-09-01 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 01, 2021 at 05:18:09PM -0400, Marek Polacek wrote:
> On trunk, PR101592 was fixed by r12-2537, but that change shouldn't be
> backported to GCC 11.  In the PR Jakub suggested this fix, so here it
> is, after the usual testing.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for 11?
> 
>   PR c++/101592
> 
> gcc/ChangeLog:
> 
>   * fold-const.c (make_range_step): Return NULL_TREE for NULLPTR_TYPE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/warn/Wlogical-op-3.C: New test.

Ok, thanks.

Jakub



[PATCH] c++: Fix ICE with nullptr comparison (GCC 11) [PR101592]

2021-09-01 Thread Marek Polacek via Gcc-patches
On trunk, PR101592 was fixed by r12-2537, but that change shouldn't be
backported to GCC 11.  In the PR Jakub suggested this fix, so here it
is, after the usual testing.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for 11?

PR c++/101592

gcc/ChangeLog:

* fold-const.c (make_range_step): Return NULL_TREE for NULLPTR_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wlogical-op-3.C: New test.

Co-authored-by: Jakub Jelinek 
---
 gcc/fold-const.c  |  3 ++-
 gcc/testsuite/g++.dg/warn/Wlogical-op-3.C | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wlogical-op-3.C

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 39f120db38c..a1d08c74025 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -5003,7 +5003,8 @@ make_range_step (location_t loc, enum tree_code code, 
tree arg0, tree arg1,
 being not equal to zero; "out" is leaving it alone.  */
   if (low == NULL_TREE || high == NULL_TREE
  || ! integer_zerop (low) || ! integer_zerop (high)
- || TREE_CODE (arg1) != INTEGER_CST)
+ || TREE_CODE (arg1) != INTEGER_CST
+ || TREE_CODE (arg0_type) == NULLPTR_TYPE)
return NULL_TREE;
 
   switch (code)
diff --git a/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C 
b/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C
new file mode 100644
index 000..4b0bc22af4d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C
@@ -0,0 +1,12 @@
+// PR c++/101592
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -Wlogical-op" }
+
+decltype(nullptr) foo ();
+
+bool
+bar ()
+{
+  return foo () > nullptr
+|| foo () < nullptr;
+}

base-commit: 051040f0642cfd002d31f655a70aef50e6f44d25
-- 
2.31.1



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Koning, Paul via Gcc-patches



> On Sep 1, 2021, at 3:35 PM, Iain Sandoe  wrote:
> 
> 
> Hi Paul,
> 
>> ...
>> If so, then I would think that ignoring it for this patch as well is 
>> reasonable.  If in a given target a pointer that C thinks of as NULL is in 
>> fact a valid object pointer, then all sorts of optimizations are incorrect.  
>> If the target really cares, it can use a different representation for the 
>> null pointer.  (Does GCC give us a way to do that?)  For example, pdp11 
>> could use the all-ones bit pattern to represent an invalid pointer.
> 
> regardless of whether GCC supports it or not - trying to use a non-0 NULL 
> pointer is likely to break massive amounts of code in the wild.

It depends on what you mean by "non-0 NULL pointer".  The constant written as 0 
in pointer context doesn't represent the all-zeroes bit pattern but rather 
whatever is a null pointer on that target.  Most code would not notice that.  
The two places I can think of where this would break are (a) if you cast a 
pointer to int or look at it via a pointer/int union and expect to see integer 
zero, and (b) if you initialize pointers by using bzero.  The former seems 
rather unlikely, the latter is somewhat common.  Can GCC detect the bzero case? 
 It would make a good check for -Wpedantic on the usual platforms that use all 
zero bits as NULL.
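
To make case (b) concrete, here is a small sketch; the struct and
function names are made up for illustration.  The first initializer is
portable, the second relies on the null pointer being all zero bits,
and the last function checks that assumption on the current target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative type only.  */
struct node
{
  struct node *next;
  int value;
};

/* Portable: the constant NULL is converted to whatever bit pattern
   the target uses for a null pointer.  */
void
node_init (struct node *n)
{
  n->next = NULL;
  n->value = 0;
}

/* Relies on the null pointer being all zero bits, which C does not
   promise.  This is the bzero-style initialization discussed above.  */
void
node_init_zero_bits (struct node *n)
{
  memset (n, 0, sizeof *n);
}

/* Return true if this target represents a null object pointer as all
   zero bits.  */
bool
null_is_all_zero_bits (void)
{
  void *p = NULL;
  unsigned char zeros[sizeof p] = { 0 };
  return memcmp (&p, zeros, sizeof p) == 0;
}
```

On the usual platforms null_is_all_zero_bits returns true and both
initializers behave the same; on a target with a different null
representation only node_init would be correct.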

> It might, OTOH, be possible to use a non-0 special value to represent the 
> valid 0 address-use (providing that there is somewhere in the address space 
> you can steal that from).

That would be really ugly, because every pointer reference would have to do the 
address translation at run time.

paul



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Martin Sebor via Gcc-patches

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
     int a[2][2];
     return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
     int a[2][2];
     return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
     return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+ || TREE_CODE (cop) == COMPONENT_REF)
+    {
+  unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+  cop = TREE_OPERAND (cop, opno);
+    }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the FIELD_DECL, 
which doesn't have an address of its own.


This is because the address of a field is never null, regardless of
what the P in &P->M points to.  (With the caveat mentioned in
the comment further up about the pointer used to access the member
being nonnull.)  So this is diagnosed:

  extern struct { int m; } *p;
  bool b = &p->m == 0;

Using handled_component_p() in a loop would prevent that.

For array_refs, the loop gets us the decl to mention in the warning.
But this should work too and looks cleaner:

  cop = TREE_OPERAND (cop, 0);

  /* Get the outermost array.  */
  while (TREE_CODE (cop) == ARRAY_REF)
    cop = TREE_OPERAND (cop, 0);

  /* Get the member (its address is never null).  */
  if (TREE_CODE (cop) == COMPONENT_REF)
    cop = TREE_OPERAND (cop, 1);

Do you prefer the above instead?
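
For illustration, the same walk on a toy tree model.  Everything named
toy_* is made up and much simpler than GCC's real tree nodes, and I'm
assuming here that the first TREE_OPERAND call strips the enclosing
ADDR_EXPR.

```c
#include <stddef.h>

/* Toy stand-ins for tree codes; hypothetical, not GCC's.  */
enum toy_code
{
  TOY_ADDR_EXPR,
  TOY_ARRAY_REF,
  TOY_COMPONENT_REF,
  TOY_INDIRECT_REF,
  TOY_VAR_DECL,
  TOY_FIELD_DECL
};

/* Toy node: op[0] is the object being referenced; for a
   COMPONENT_REF, op[1] is the field.  */
struct toy_tree
{
  enum toy_code code;
  const char *name;
  const struct toy_tree *op[2];
};

/* Given an address expression, return the node to mention in the
   warning: the outermost array, or the member being addressed.  */
const struct toy_tree *
toy_addressed_decl (const struct toy_tree *cop)
{
  /* Assumed to strip the ADDR_EXPR itself.  */
  cop = cop->op[0];

  /* Get the outermost array.  */
  while (cop->code == TOY_ARRAY_REF)
    cop = cop->op[0];

  /* Get the member (its address is never null).  */
  if (cop->code == TOY_COMPONENT_REF)
    cop = cop->op[1];

  return cop;
}
```

For &a[0][1] this ends at the declaration of a, and for &p->m it ends
at the field m without looking at what p points to.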

Martin


Re: [PATCH] Generate XXSPLTIDP on power10.

2021-09-01 Thread Michael Meissner via Gcc-patches
On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> Please do two separate patches.  The first that adds the instruction
> (with a bit pattern, i.e. integer, input), and perhaps a second pattern
> that has an fp as input and uses it if the constant is valid for the
> insn (survives being converted to SP and back to DP (or the other way
> around), and is not denormal).  That can be two patches if you want,
> but :-)

Well we already have the two stages for the built-in.  But the point of the
work is to make it work without the explicit builtin.

> Having the integer intermediate step not only makes the code hugely less
> complicated, but it also allows, e.g.,


> ===
> typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
> v2u64 f(void)
> {
> v2u64 x = { 0x8000, 0x8000 };
> return x;
> }
> ===
> 
> to be optimised properly.
> 
> The second part is letting the existing code use such FP (and integer!)
> constants.

It is somewhat harder to do this as integer values, because you have to check
whether all of the bits are correct, including skipping the SF denormal bits.

I tend to think in real life, the only win will be -0.0
(i.e. 0x8000 that you mentioned).  I doubt many V2DI constants will
be appropriate candidates for this.
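
As a rough model of the validity test being discussed (the value must
survive DP -> SP -> DP and must not be an SF denormal), here is a
standalone sketch on host doubles.  The function name is hypothetical
and this is not the code in the patch, which works on RTL constants.

```c
#include <float.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not GCC code.  Return true if D can be
   represented exactly as a single precision value that is not
   subnormal.  */
bool
dp_survives_sp_round_trip (double d)
{
  float f = (float) d;		/* DP -> SP.  */
  double back = (double) f;	/* SP -> DP.  */
  float mag = f < 0 ? -f : f;

  /* Reject SF denormals: nonzero but below the smallest normal.  */
  if (mag != 0.0f && mag < FLT_MIN)
    return false;

  /* Compare bit patterns so that -0.0 and 0.0 stay distinct.  */
  return memcmp (&back, &d, sizeof d) == 0;
}
```

With this model 1.0 and -0.0 qualify, while 0.1 (not exact in SP) and
0x1p-140 (exact, but an SF denormal) do not.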

> 
> On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote:
> > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> > +(define_constraint "eF"
> > +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> > +  (match_operand 0 "xxspltidp_operand"))
> 
> vector *or float*.  It should allow all vectors, not just FP ones.
> 
> > +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> > +;; loaded via the ISA 3.1 XXSPLTIDP instruction.
> > +(define_predicate "xxspltidp_operand"
> > +  (match_code "const_double,const_vector,vec_duplicate")
> > +{
> > +  HOST_WIDE_INT value = 0;
> > +  return xxspltidp_constant_p (op, mode, &value);
> > +})
> 
> Don't do that.  Factor the code properly.  A predicate function should
> never have side effects.
> 
> Since this is the only place you want to convert the value to its bit
> pattern, you should just do that here.
> 
> (Btw, initialising the value (although the function always writes it) is
> not defensive programming, it is hiding bugs.  IMNSHO :-) )

And avoiding warnings.

> 
> > +bool
> > +xxspltidp_constant_p (rtx op,
> > + machine_mode mode,
> > + HOST_WIDE_INT *constant_ptr)
> > +{
> > +  *constant_ptr = 0;
> 
> And a second time, too!  Don't do either.
> 
> > +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> > +return false;
> 
> This is the wrong place to test these.  It belongs in the caller.
> 
> > +  if (CONST_VECTOR_P (op))
> > +   {
> > + element = CONST_VECTOR_ELT (op, 0);
> > + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> > +   return false;
> > +   }
> 
> const_vec_duplicate_p

Ok.

> (But you actually should check if the bit pattern is valid, nothing
> more, nothing less).
> 
> > +  /* Don't return true for 0.0 since that is easy to create without
> > + XXSPLTIDP.  */
> > +  if (element == CONST0_RTX (mode))
> > +return false;
> 
> Don't do that.  Instead have whatever decides what insn to use choose
> more directly.
> 
> > +/* Whether a permute type instruction is a prefixed instruction.  This is
> > +   called from the prefixed attribute processing.  */
> > +
> > +bool
> > +prefixed_permute_p (rtx_insn *insn)
> 
> What does this have to do with this patch?

This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx}
instructions are prefixed by default, but unlike paddi, they don't want a
leading 'p' in the instruction.

The alternative is for each of the moves to add an explicit prefixed attribute,
instead of letting the default be set.


> > +{
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +return false;
> > +
> > +  rtx dest = SET_DEST (set);
> > +  rtx src = SET_SRC (set);
> > +  machine_mode mode = GET_MODE (dest);
> > +
> > +  if (!REG_P (dest) && !SUBREG_P (dest))
> > +return false;
> > +
> > +  switch (mode)
> > +{
> > +case DFmode:
> > +case SFmode:
> > +case V2DFmode:
> > +  return xxspltidp_operand (src, mode);
> 
> ??!!??
> 
> That is not a permute insn at all.
> 
> Perhaps you mean it is executed in the PM pipe on current
> implementations (all one of-em).  That does not make it a permute insn.
> It is not a good idea to call insns that do not have semantics similar
> to permutations "permute".

I'll come up with a different name.  But because it includes SPLAT-ing to the
other double word, I would tend to believe these instructions will always be a
permute class instruction.

> 
> > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> > @@ -8051,20 +8057,21 @@ (define_insn "*mov_hardfloat32"
> > @

Re: C++ patch ping

2021-09-01 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 01, 2021 at 03:25:17PM -0400, Jason Merrill wrote:
> On 8/30/21 3:11 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > I'd like to ping the following patches
> > 
> > libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575621.html
> > together with your
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577602.html
> > incremental patch (successfully tested on x86_64-linux and i686-linux).
> 
> OK, thanks.

Thanks, committed both patches.

> My reply to that patch approved it with a suggestion for a tweak to
> ucn_valid_in_identifier.  Quoting it here:
>
> > I might check invalid_start_flags first, and return 1 if not set, then
> > check all the other flags when not pedantic, and finally return 2 if
> > nothing matches.  OK with or without this change.

Sorry for missing this, didn't scroll down enough.

I don't think something like:
  if (CPP_OPTION (pfile, cxx23_identifiers))
    invalid_start_flags = NXX23;
  else if (CPP_OPTION (pfile, c11_identifiers))
    invalid_start_flags = N11;
  else if (CPP_OPTION (pfile, c99))
    invalid_start_flags = N99;
  else
    invalid_start_flags = 0;

  /* In C99, UCN digits may not begin identifiers.  In C11 and C++11,
     UCN combining characters may not begin identifiers.  */
  if ((ucnranges[mn].flags & invalid_start_flags) == 0)
    return 1;

  /* If not -pedantic, accept as character that may
     begin an identifier a union of characters allowed
     at that position in each of the character sets.  */
  if (!CPP_PEDANTIC (pfile)
      && ((ucnranges[mn].flags & (C99 | N99)) == C99
	  || (ucnranges[mn].flags & CXX) != 0
	  || (ucnranges[mn].flags & (C11 | N11)) == C11
	  || (ucnranges[mn].flags & (CXX23 | NXX23)) == CXX23))
    return 1;

  return 2;
would work, e.g. for C++98 invalid_start_flags is 0, so it would return
always 1, while the previous patch returned 2 for non-pedantic if the char
wasn't in the CXX set but was e.g. in the C99 set that wasn't allowed
as the first char (i.e. in & (C99 | N99) == (C99 | N99) set) etc.
While all C99 | N99 characters are C11 | 0, e.g.
\u0304 (and many others) are not in C99 at all, not in CXX, and in
C11 | N11 and in CXX23 | NXX23.  So they are never valid as start
characters.  There are also some characters like
\u1dfa which are not in C99 at all, not in CXX, not in CXX23 and in
C11 | N11, so again not valid as start character in any of the pedantic
modes.  IMHO we want to return 2 for them in non-pedantic.
And testing first
  if (ucnranges[mn].flags & invalid_start_flags)
    return 2;
and then doing the if !CPP_PEDANTIC stuff wouldn't work either, e.g.
\U0001d18b is in CXX23 | NXX23 and in C11 | 0, so we IMHO want to return
1 for that (allowed as start character in -pedantic -std=c++20, disallowed
as start character in -pedantic -std=c++23) but we would return 2
in -std=c++23 mode.
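
To make the three examples checkable, here is a standalone sketch of
both orderings.  The bit values of the flags are made up (only the
names come from the code above), the function names are hypothetical,
and for the second ordering only the early exit is modeled.  Return
value 1 stands for "may start an identifier" and 2 for "valid, but not
at the start".

```c
#include <stdbool.h>

/* Hypothetical bit values; only the flag names are from libcpp.  */
enum
{
  C99 = 1, CXX = 2, C11 = 4, N99 = 8, N11 = 16, CXX23 = 32, NXX23 = 64
};

/* True if at least one character set allows a character with FLAGS
   at the start of an identifier.  */
bool
start_ok_in_some_set (unsigned flags)
{
  return ((flags & (C99 | N99)) == C99
	  || (flags & CXX) != 0
	  || (flags & (C11 | N11)) == C11
	  || (flags & (CXX23 | NXX23)) == CXX23);
}

/* First ordering: return 1 right away when no invalid-start flag for
   the current mode is set.  */
int
variant_flags_first (unsigned flags, unsigned invalid_start_flags,
		     bool pedantic)
{
  if ((flags & invalid_start_flags) == 0)
    return 1;
  if (!pedantic && start_ok_in_some_set (flags))
    return 1;
  return 2;
}

/* Second ordering: true if the character is rejected as a start
   character before the non-pedantic union is even consulted.  */
bool
variant_reject_first_rejects (unsigned flags, unsigned invalid_start_flags)
{
  return (flags & invalid_start_flags) != 0;
}
```

With \u0304 as C11 | N11 | CXX23 | NXX23 and invalid_start_flags of 0
(C++98), the first ordering returns 1 although no set allows the
character at the start.  With \U0001d18b as C11 | CXX23 | NXX23 and
invalid_start_flags of NXX23, the second ordering rejects it although
C11 allows it.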

Jakub



Re: [PATCH] Generate XXSPLTIDP on power10.

2021-09-01 Thread Michael Meissner via Gcc-patches
On Tue, Aug 31, 2021 at 05:52:48PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote:
> > On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote:
> > > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> > > > Generate XXSPLTIDP on power10.
> > > > 
> > > > I have added a temporary switch (-mxxspltidp) to control whether or not 
> > > > the
> > > > XXSPLTIDP instruction is generated.
> > > 
> > > How temporary?  
> > 
> > Until we decide we no longer need to disable the option to do tests.  
> > Probably
> > at the end of stage1.
> 
> Don't do it at all please.  If it is useful to disable some new strategy
> for generating constants, a (temporary or at least undocumented) flag
> for that can be handy.  But a flag to disable separate insns is a
> liability, it makes the compiler much more fragile, makes changing the
> compiler hard because of all the surprises hidden.
> 
> > > > (xxspltidp_operand): New predicate.
> > > 
> > > Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
> > > V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
> > > but defer..
> > 
> > This is the convention I've used for adding other instructions like 
> > xxspltib.
> 
> The only reason it is a good idea here is because of the strange
> behaviour this insn has with single precision subnormals.  In general
> a better name here would be something like "sf_as_int_operand".  The
> insn should probably not allow anything else than bit patterns, not
> floating point constants, have a separate pattern for that (that can
> then forward to the integer one).
> 
> > This way we have just one place that centralizes the knowledge about the
> > instruction.
> 
> That one place should be the define_insn for it.

But you need the predicate and constraint for this to work.  That is what I was
talking about.  The define_insn is simple with this current method.  Otherwise,
the define_insn has to reparse the RTL.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Andreas Schwab
On Sep 01 2021, Iain Sandoe via Gcc-patches wrote:

> I wonder what things like m68k do that have vector tables at 0 …

Vector 0 is the reset stack pointer, if that isn't ROM you have a
problem.  On the Atari, the MCU redirects physical addresses 0-7 to the
system ROM.  Then there is the VBR on m68010+.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Add MIPS Linux support to gcc.misc-tests/linkage.c (testsuite/51748)

2021-09-01 Thread Andrew Pinski via Gcc-patches
On Wed, Sep 1, 2021 at 2:46 AM YunQiang Su  wrote:
>
> Richard Sandiford via Gcc-patches
> wrote on Wed, Sep 1, 2021 at 4:55 PM:
> >
> > apinski--- via Gcc-patches  writes:
> > > From: Andrew Pinski 
> > >
> > > This adds MIPS Linux support to gcc.misc-tests/linkage.exp.  Basically
> > > copying what was done for MIPS IRIX and changing the options to be 
> > > correct.
> > >
> > > OK?
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR testsuite/51748
> > >   * gcc.misc-tests/linkage.exp: Add mips*-linux-* support.
> >
> > OK, thanks.  Searching for any match for 64 seems surprisingly general,
> > but it's what other cases do and has obviously stood the test of time.
> >
>
> syq@XX:~$ gcc -mips64r2 -mabi=64 -c zz.c && file zz.o
> zz.o: ELF 64-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped
> syq@XX:~$ gcc -mips64r2 -mabi=32 -c zz.c && file zz.o
> zz.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
> not stripped
> syq@XX:~$ gcc -mips64r2 -mabi=n32 -c zz.c && file zz.o
> zz.o: ELF 32-bit LSB relocatable, MIPS, N32 MIPS64 rel2 version 1
> (SYSV), not stripped
>
> At first glance, I also thought the code was wrong, but after some
> checking, it does work.


Right, and the order of the if statements matters; they are not mutually
exclusive either.  So for N32, all three will match and that is ok.  For
o32, the first two might match, while for N64, only the first will match.
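
A small C model of the same last-match-wins logic, using substring
search in place of the Tcl glob patterns.  The function name is made
up for illustration; the three patterns are the ones from the patch.

```c
#include <string.h>

/* Hypothetical model of the three checks in linkage.exp.  Each test
   that matches overwrites the previous result, so the last matching
   pattern decides.  */
const char *
mips_native_cflags (const char *file_string)
{
  const char *cflags = "";

  if (strstr (file_string, "64"))
    cflags = "-mabi=64";	/* Also matches "MIPS64" in 32-bit objects.  */
  if (strstr (file_string, "ELF 32"))
    cflags = "-mabi=32";
  if (strstr (file_string, "N32"))
    cflags = "-mabi=n32";

  return cflags;
}
```

Fed the three "file" outputs quoted above, this yields -mabi=64,
-mabi=32 and -mabi=n32 respectively.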

Thanks,
Andrew Pinski

>
> > Richard
> >
> > > ---
> > >  gcc/testsuite/gcc.misc-tests/linkage.exp | 12 
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/gcc/testsuite/gcc.misc-tests/linkage.exp 
> > > b/gcc/testsuite/gcc.misc-tests/linkage.exp
> > > index afed2b811c9..2cb109e776e 100644
> > > --- a/gcc/testsuite/gcc.misc-tests/linkage.exp
> > > +++ b/gcc/testsuite/gcc.misc-tests/linkage.exp
> > > @@ -38,6 +38,18 @@ if { [isnative] && ![is_remote host] } then {
> > >
> > >   # Need to ensure ABI for native compiler matches gcc
> > >   set native_cflags ""
> > > + if  [istarget "mips*-linux*"] {
> > > + set file_string [exec file "linkage-x.o"]
> > > + if [ string match "*64*" $file_string ] {
> > > + set native_cflags "-mabi=64"
> > > + }
> > > + if [ string match "*ELF 32*" $file_string ] {
> > > + set native_cflags "-mabi=32"
> > > + }
> > > + if [ string match "*N32*" $file_string ] {
> > > + set native_cflags "-mabi=n32"
> > > + }
> > > + }
> > >   if  [istarget "sparc*-sun-solaris2*"] {
> > >   set file_string [exec file "linkage-x.o"]
> > >   if [ string match "*64*" $file_string ] {


Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-09-01 Thread Thomas Schwinge
Hi!

On 2021-06-23T13:47:08-0600, Martin Sebor via Gcc-patches 
 wrote:
> On 6/22/21 5:28 PM, David Malcolm wrote:
>> On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:
>>> On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:
>>>> The attached patch introduces the suppress_warning(),
>>>> warning_suppressed(), and copy_no_warning() APIs [etc.]

Martin, great work on this!

I was a bit surprised to see this key on 'location_t's -- but indeed it
appears to do the right thing.

I now had a bit of a deep dive into some aspects of this, in context of
 "gcc/sparseset.h:215:20: error: suggest
parentheses around assignment used as truth value [-Werror=parentheses]"
that I recently filed.  This seems difficult to reproduce, but I'm still
able to reliably reproduce it in one specific build
configuration/directory/machine/whatever.  Initially, we all quickly
assumed that it'd be some GC issue -- but "alas", it's not, at least not
directly.  (But I'll certainly assume that some GC aspects are involved
which make this issue come and go across different GCC sources revisions,
and difficult to reproduce.)

First, two pieces of cleanup:

> --- /dev/null
> +++ b/gcc/warning-control.cc

> +template 
> +void copy_warning (ToType to, FromType from)
> +{
> +  const key_type_t to_key = convert_to_key (to);
> +
> +  if (nowarn_spec_t *from_map = get_nowarn_spec (from))
> +{
> +  /* If there's an entry in the map the no-warning bit must be set.  */
> +  gcc_assert (get_no_warning_bit (from));
> +
> +  if (!nowarn_map)
> + nowarn_map = xint_hash_map_t::create_ggc (32);

OK to push "Simplify 'gcc/diagnostic-spec.h:nowarn_map' setup", see
attached?  If we've just read something from the map, we can be sure that
it exists.  ;-)

> --- /dev/null
> +++ b/gcc/diagnostic-spec.h

> +typedef location_t key_type_t;
> +typedef int_hash  xint_hash_t;
> +typedef hash_map xint_hash_map_t;
> +
> +/* A mapping from the location of an expression to the warning spec
> +   set for it.  */
> +extern GTY(()) xint_hash_map_t *nowarn_map;

More on that data structure setup in a later email; here I'd like to
"Clarify 'key_type_t' to 'location_t' as used for
'gcc/diagnostic-spec.h:nowarn_map'", see attached.  OK to push?  To make
it obvious what exactly the key type is.  No change in behavior.

Why is this relevant?  Via current 'int_hash',
we create a 'int_hash' using "spare" value '0' for 'Empty' marker, and
"spare" value 'UINT_MAX' for 'Deleted' marker.  Now, the latter is
unlikely to ever trigger (but still not correct -- patch in testing), but
the former triggers very much so: value '0' is, per 'gcc/input.h':

#define UNKNOWN_LOCATION ((location_t) 0)

..., and there are no safe-guards in the code here, so we'll happily put
key 'UNKNOWN_LOCATION' into the 'nowarn_map', and all the
'UNKNOWN_LOCATION' entries share (replace?) one single warning
disposition (problem!), and at the same time that key value is also used
as the 'Empty' marker (problem!).  I have not tried to understand why
this doesn't cause much greater breakage, but propose to fix this as per
the attached "Don't maintain a warning spec for
'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' [PR101574]".  OK to push?
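
To illustrate the general hazard (this is a toy, not GCC's hash_map or
int_hash; all toy_* names are made up): in an open-addressing table
that uses key value 0 as its 'Empty' marker, storing key 0 leaves the
slot looking empty, so the entry can neither be found again nor is it
protected from being overwritten.

```c
#include <stddef.h>

#define TOY_EMPTY 0u		/* Same value as UNKNOWN_LOCATION.  */
#define TOY_SIZE 8u

struct toy_slot
{
  unsigned key;
  int value;
};

/* Zero-initialized, so every slot starts out 'Empty'.  */
static struct toy_slot toy_table[TOY_SIZE];

/* Insert or update KEY using linear probing.  No safeguard against
   KEY == TOY_EMPTY, and no handling of a full table (toy only).  */
void
toy_put (unsigned key, int value)
{
  unsigned i = key % TOY_SIZE;
  while (toy_table[i].key != TOY_EMPTY && toy_table[i].key != key)
    i = (i + 1) % TOY_SIZE;
  toy_table[i].key = key;
  toy_table[i].value = value;
}

/* Return a pointer to the value stored for KEY, or NULL.  Probing
   stops at the first slot that looks empty.  */
int *
toy_get (unsigned key)
{
  unsigned i = key % TOY_SIZE;
  while (toy_table[i].key != TOY_EMPTY)
    {
      if (toy_table[i].key == key)
	return &toy_table[i].value;
      i = (i + 1) % TOY_SIZE;
    }
  return NULL;
}
```

After toy_put (0, 42), toy_get (0) returns NULL, and a later
toy_put (8, 7) silently reuses the same slot.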

Leaving aside that for 'UNKNOWN_LOCATION' -- per my understanding, at
least, as per above -- the current implementation isn't doing the right
thing anyway, Richard had in

toyed with the idea that we for "UNKNOWN_LOCATION create a new location
with the source location being still UNKNOWN but with the appropriate
ad-hoc data to disable the warning".  On the other hand, we have Martin's
initial goal,
,
that he'd like to "work toward providing locations for all
expressions/statements".  (I agree -- and get rid of "location wrapper"
nodes at the same time...)  So there certainly is follow-on work to be
done re 'UNKNOWN_LOCATION's, but that's orthogonal to the issue I'm
fixing here.  (Just mentioning all this for context.)

I'm reasonably confident that my changes are doing the right things in
general, but please carefully review, especially here:

  - 'gcc/warning-control.cc:suppress_warning' functions: is it correct to
conditionalize on '!RESERVED_LOCATION_P' the 'suppress_warning_at'
calls and 'supp' update?  Or, should instead 'suppress_warning_at'
handle the case of '!RESERVED_LOCATION_P'?  (How?)

  - 'gcc/diagnostic-spec.c:copy_warning' and
'gcc/warning-control.cc:copy_warning': is the rationale correct for
the 'gcc_checking_assert (!from_spec)': "If we cannot set no-warning
dispositions for 'to', ascertain that we don't have any for 'from'.
Otherwise, we'd lose these."?  If the rationale is correct, then
observing that in 'gcc/warning-control.cc:copy_warning' this
currently "triggers during GCC build" is something to be l

Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Iain Sandoe via Gcc-patches
Hi Paul,

> On 1 Sep 2021, at 20:28, Koning, Paul via Gcc-patches 
>  wrote:
> 
>> On Sep 1, 2021, at 3:08 PM, Jeff Law via Gcc-patches 
>>  wrote:
>> On 9/1/2021 12:57 PM, Koning, Paul wrote:
>>> 
>>>> On Sep 1, 2021, at 1:35 PM, Jeff Law via Gcc-patches 
>>>>  wrote:
>>>> 
>>>> Generally OK.  There's some C++ front-end bits that Jason ought to take a 
>>>> quick looksie at.   Second, how does this interact with targets that allow 
>>>> objects at address 0?   We have a few targets like that and that makes me 
>>>> wonder if we should be suppressing some, if not all, of these warnings for 
>>>> targets that turn on -fno-delete-null-pointer-checks?
>>> But in C, the pointer constant 0 represents the null (invalid) pointer, not 
>>> the actual address zero necessarily.
>>> 
>>> If a target supports objects at address zero, how does it represent the 
>>> pointer value 0 (which we usually refer to as NULL)?  Is the issue simply 
>>> ignored?  It seems to me it is in pdp11, which I would guess is one of the 
>>> targets for which objects at address 0 make sense.
>> The issue is ignored to the best of my knowledge.
> 
> If so, then I would think that ignoring it for this patch as well is 
> reasonable.  If in a given target a pointer that C thinks of as NULL is in 
> fact a valid object pointer, then all sorts of optimizations are incorrect.  
> If the target really cares, it can use a different representation for the 
> null pointer.  (Does GCC give us a way to do that?)  For example, pdp11 could 
> use the all-ones bit pattern to represent an invalid pointer.

regardless of whether GCC supports it or not - trying to use a non-0 NULL 
pointer is likely to break massive amounts of code in the wild.

It might, OTOH, be possible to use a non-0 special value to represent the valid 
0 address-use (providing that there is somewhere in the address space you can 
steal that from).

I wonder what things like m68k do that have vector tables at 0 …

Iain



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Koning, Paul via Gcc-patches



> On Sep 1, 2021, at 3:08 PM, Jeff Law via Gcc-patches 
>  wrote:
> 
> 
> 
> On 9/1/2021 12:57 PM, Koning, Paul wrote:
>> 
>>> On Sep 1, 2021, at 1:35 PM, Jeff Law via Gcc-patches 
>>>  wrote:
>>> 
>>> Generally OK.  There's some C++ front-end bits that Jason ought to take a 
>>> quick looksie at.   Second, how does this interact with targets that allow 
>>> objects at address 0?   We have a few targets like that and that makes me 
>>> wonder if we should be suppressing some, if not all, of these warnings for 
>>> targets that turn on -fno-delete-null-pointer-checks?
>> But in C, the pointer constant 0 represents the null (invalid) pointer, not 
>> the actual address zero necessarily.
>> 
>> If a target supports objects at address zero, how does it represent the 
>> pointer value 0 (which we usually refer to as NULL)?  Is the issue simply 
>> ignored?  It seems to me it is in pdp11, which I would guess is one of the 
>> targets for which objects at address 0 make sense.
> The issue is ignored to the best of my knowledge.

If so, then I would think that ignoring it for this patch as well is 
reasonable.  If in a given target a pointer that C thinks of as NULL is in fact 
a valid object pointer, then all sorts of optimizations are incorrect.  If the 
target really cares, it can use a different representation for the null 
pointer.  (Does GCC give us a way to do that?)  For example, pdp11 could use 
the all-ones bit pattern to represent an invalid pointer.

paul



Re: C++ patch ping

2021-09-01 Thread Jason Merrill via Gcc-patches

On 8/30/21 3:11 AM, Jakub Jelinek wrote:

Hi!

I'd like to ping the following patches

libcpp: __VA_OPT__ p1042r1 placemarker changes [PR101488]
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575621.html
together with your
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577602.html
incremental patch (successfully tested on x86_64-linux and i686-linux).


OK, thanks.


libcpp, v2: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode 
Standard Annex 31 [PR100977]
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576854.html
The incremental patch for splitting tokens at unsupported characters
withdrawn, the above is the base patch.


My reply to that patch approved it with a suggestion for a tweak to 
ucn_valid_in_identifier.  Quoting it here:



@@ -998,6 +998,28 @@ ucn_valid_in_identifier (cpp_reader *pfi
  nst->previous = c;
nst->prev_class = ucnranges[mn].combine;
 
+  if (!CPP_PEDANTIC (pfile))
+{
+  /* If not -pedantic, accept as character that may
+ begin an identifier a union of characters allowed
+ at that position in each of the character sets.  */
+  if ((ucnranges[mn].flags & (C99 | N99)) == C99
+  || (ucnranges[mn].flags & CXX) != 0
+  || (ucnranges[mn].flags & (C11 | N11)) == C11
+  || (ucnranges[mn].flags & (CXX23 | NXX23)) == CXX23)
+return 1;
+  return 2;
+}
+
+  if (CPP_OPTION (pfile, cxx23_identifiers))
+invalid_start_flags = NXX23;
+  else if (CPP_OPTION (pfile, c11_identifiers))
+invalid_start_flags = N11;
+  else if (CPP_OPTION (pfile, c99))
+invalid_start_flags = N99;
+  else
+invalid_start_flags = 0;
+
/* In C99, UCN digits may not begin identifiers.  In C11 and C++11,
   UCN combining characters may not begin identifiers.  */
if (ucnranges[mn].flags & invalid_start_flags)


I might check invalid_start_flags first, and return 1 if not set, then check all the other flags when not pedantic, and finally return 2 if nothing matches.  OK with or without this change. 
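
To make the suggested ordering concrete, here is a minimal standalone sketch.  It is not the libcpp code: the flag values are made up for illustration, start_class is a hypothetical name, and the return values (1 for "may begin an identifier", 2 for "may only continue one") are assumed to mirror what the quoted hunk returns.

```c
/* Hypothetical flag bits for illustration only; the real values are
   defined in libcpp and are not reproduced here.  */
enum
{
  C99 = 1, N99 = 2, CXX = 4, C11 = 8, N11 = 16, CXX23 = 32, NXX23 = 64
};

/* Return 1 if a character with FLAGS may begin an identifier, 2 if it
   may only continue one.  INVALID_START_FLAGS is the N* bit chosen for
   the active language mode (0 if none applies).  */
static int
start_class (unsigned flags, int pedantic, unsigned invalid_start_flags)
{
  /* Step 1: not marked as an invalid start in the current mode.  */
  if (!(flags & invalid_start_flags))
    return 1;

  /* Step 2: when not pedantic, accept the union over all character
     sets, using the same tests as the quoted hunk.  */
  if (!pedantic
      && ((flags & (C99 | N99)) == C99
          || (flags & CXX) != 0
          || (flags & (C11 | N11)) == C11
          || (flags & (CXX23 | NXX23)) == CXX23))
    return 1;

  /* Step 3: nothing matched.  */
  return 2;
}
```

With this ordering invalid_start_flags has to be computed before the !CPP_PEDANTIC test rather than after it, which is the only structural difference from the hunk as posted.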


Jason



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Jason Merrill via Gcc-patches

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
     int a[2][2];
     return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
     int a[2][2];
     return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
     return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+|| TREE_CODE (cop) == COMPONENT_REF)
+   {
+ unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+ cop = TREE_OPERAND (cop, opno);
+   }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the FIELD_DECL, 
which doesn't have an address of its own.
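
A toy model of point 2, assuming nothing beyond what is said above: in a COMPONENT_REF, operand 0 is the containing object and operand 1 is the FIELD_DECL, so a walk towards the base object should follow operand 0 for both reference kinds.  The toy_* types and functions below are hypothetical stand-ins, not GCC's tree API, and they model only two of the codes that handled_component_p accepts.

```c
#include <stddef.h>

/* Toy stand-in for tree codes; only the two from the quoted loop.  */
enum toy_code
{
  TOY_VAR_DECL, TOY_FIELD_DECL, TOY_ARRAY_REF, TOY_COMPONENT_REF
};

struct toy_node
{
  enum toy_code code;
  /* op[0]: the containing object; op[1]: the index or the field.  */
  struct toy_node *op[2];
};

/* Hypothetical analogue of handled_component_p for the toy codes.  */
static int
toy_handled_component_p (const struct toy_node *t)
{
  return t->code == TOY_ARRAY_REF || t->code == TOY_COMPONENT_REF;
}

/* Strip references down to the base object, always via operand 0.  */
static const struct toy_node *
toy_base_object (const struct toy_node *t)
{
  while (toy_handled_component_p (t))
    t = t->op[0];
  return t;
}
```

Following operand 1 of the COMPONENT_REF instead would end the walk on the field node, which is the concern raised in point 2.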


Jason



[pushed] c++: Add test for fixed PR [PR101592]

2021-09-01 Thread Marek Polacek via Gcc-patches
Fixed by my c++/99701 patch.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101592

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wlogical-op-3.C: New test.
---
 gcc/testsuite/g++.dg/warn/Wlogical-op-3.C | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wlogical-op-3.C

diff --git a/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C 
b/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C
new file mode 100644
index 000..50b09d57be5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wlogical-op-3.C
@@ -0,0 +1,12 @@
+// PR c++/101592
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -Wlogical-op" }
+
+decltype(nullptr) foo ();
+
+bool
+bar ()
+{
+  return foo () > nullptr // { dg-error "ordered comparison" }
+|| foo () < nullptr; // { dg-error "ordered comparison" }
+}

base-commit: fbb334a6acc5cc5d8944712daeda8089ef1d7fd2
-- 
2.31.1



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Jeff Law via Gcc-patches




On 9/1/2021 12:57 PM, Koning, Paul wrote:



On Sep 1, 2021, at 1:35 PM, Jeff Law via Gcc-patches  
wrote:

Generally OK.  There's some C++ front-end bits that Jason ought to take a quick 
looksie at.   Second, how does this interact with targets that allow objects at 
address 0?   We have a few targets like that and that makes me wonder if we 
should be suppressing some, if not all, of these warnings for targets that turn 
on -fno-delete-null-pointer-checks?

But in C, the pointer constant 0 represents the null (invalid) pointer, not the 
actual address zero necessarily.

If a target supports objects at address zero, how does it represent the pointer 
value 0 (which we usually refer to as NULL)?  Is the issue simply ignored?  It 
seems to me it is in pdp11, which I would guess is one of the targets for which 
objects at address 0 make sense.

The issue is ignored to the best of my knowledge.

Jeff


Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Koning, Paul via Gcc-patches



> On Sep 1, 2021, at 1:35 PM, Jeff Law via Gcc-patches 
>  wrote:
> 
> Generally OK.  There's some C++ front-end bits that Jason ought to take a 
> quick looksie at.   Second, how does this interact with targets that allow 
> objects at address 0?   We have a few targets like that and that makes me 
> wonder if we should be suppressing some, if not all, of these warnings for 
> targets that turn on -fno-delete-null-pointer-checks?

But in C, the pointer constant 0 represents the null (invalid) pointer, not the 
actual address zero necessarily.

If a target supports objects at address zero, how does it represent the pointer 
value 0 (which we usually refer to as NULL)?  Is the issue simply ignored?  It 
seems to me it is in pdp11, which I would guess is one of the targets for which 
objects at address 0 make sense.
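
A small illustration of the first point, as a sketch rather than anything from the patch (zero_constant_is_null is a made-up name): the constant 0 in a pointer context yields the null pointer, and comparisons against it are defined in terms of that null pointer, whatever bit pattern the target uses for it.

```c
#include <stddef.h>

/* Return 1 when a pointer initialized from the integer constant 0
   compares equal to NULL and also tests false.  The C language
   requires both, independent of the null pointer's representation.  */
static int
zero_constant_is_null (void)
{
  int *p = 0;
  return p == NULL && !p;
}
```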

paul



Re: [PATCH] libiberty, configure, Darwin: Avoid detecting deprecated sbrk.

2021-09-01 Thread Jeff Law via Gcc-patches




On 8/30/2021 1:29 PM, Iain Sandoe wrote:

Hi,

Darwin provides an implementation of sbrk, which is detected by the
libiberty configuration process.

However, (like most of the BSD-derivatives) sbrk/brk are deprecated on
Darwin which leads to build-time warnings.  It seems that the configure
process does not see the deprecation warnings as a reason for excluding
the function.

Darwin should use the malloc-based implementation.

This patch works around the issue by removing sbrk from the functions
searched (for Darwin only, although it’s likely that other BSD-ish ports
might wish to do the same).
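
For readers who have not looked at xmalloc.c, the shape of the guard is roughly as follows.  This is only a sketch: it assumes configure exposes the result as a HAVE_SBRK macro, and current_break_or_null is a made-up helper, not something in libiberty.

```c
#include <stddef.h>

#ifdef HAVE_SBRK
#include <unistd.h>   /* declares sbrk on systems where it was found */
#endif

/* Return the current program break, or NULL when sbrk was not found
   (or was deliberately not searched for) at configure time.  */
static void *
current_break_or_null (void)
{
#ifdef HAVE_SBRK
  return sbrk (0);
#else
  return NULL;
#endif
}
```

With sbrk dropped from the searched functions on Darwin, only the #else branch is compiled there, so the deprecated declaration is never referenced.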

Open to more elegant solutions, of course.
Tested on powerpc-, i686- and x86_64-darwin, and on x86_64- and powerpc64-linux.

OK for master?
thanks Iain

Signed-off-by: Iain Sandoe 

libiberty/ChangeLog:

* configure: Regenerate.
* configure.ac: Do not search for sbrk on Darwin.
* xmalloc.c: Do not declare sbrk unless it has been found
by configure.

OK
jeff



Re: [PATCH] warn for more impossible null pointer tests

2021-09-01 Thread Jeff Law via Gcc-patches




On 8/31/2021 8:08 PM, Martin Sebor via Gcc-patches wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

  int f (void)
  {
    int a[2][2];
    return &a == 0;
  }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

  int f (void)
  {
    int a[2][2];
    return a[0] == 0;
  }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

  int g (int i)
  {
    return a[0] + i == 0;
  }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c. That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, except under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.


gcc-102103.diff

Enhance -Waddress to detect more suspicious expressions.

Resolves:
PR c/102103 - missing warning comparing array address to null


gcc/ChangeLog:

* doc/invoke.texi (-Waddress): Update.

gcc/c-family/ChangeLog:

* c-common.c (decl_with_nonnull_addr_p): Handle members.

gcc/c/ChangeLog:

* c-typeck.c (maybe_warn_for_null_address): New function.
(build_binary_op): Call it.

gcc/cp/ChangeLog:

* typeck.c (warn_for_null_address): Enhance.
(cp_build_binary_op): Call it also for member pointers.

gcc/fortran/ChangeLog:

* array.c: Remove an unnecessary test.
* trans-array.c: Same.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning.
* g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast.
* gcc.dg/Waddress.c: Expect a warning.
* c-c++-common/Waddress-3.c: New test.
* c-c++-common/Waddress-4.c: New test.
* g++.dg/warn/Waddress-5.C: New test.

Generally OK.  There are some C++ front-end bits that Jason ought to take 
a quick looksie at.   Second, how does this interact with targets that 
allow objects at address 0?   We have a few targets like that and that 
makes me wonder if we should be suppressing some, if not all, of these 
warnings for targets that turn on -fno-delete-null-pointer-checks?


Jeff


Re: DWARF for extern variable

2021-09-01 Thread Indu Bhagat via Gcc-patches

On 8/24/21 12:55 AM, Richard Biener wrote:

On Mon, Aug 23, 2021 at 11:18 PM Indu Bhagat via Gcc-patches
 wrote:


Hello,

What is the expected DWARF for extern variable in the following cases? I
am seeing that the DWARF generated is different with gcc8.4.1 vs gcc-trunk.


Testcase 2
--
extern const char a[];
const char a[] = "testme";

Testcase 2 Behavior

- Both gcc-trunk and gcc8.4.1 generate two DW_TAG_variable DIEs (the
defining decl holds the reference to the non-defining decl via
DW_AT_specification)
- But gcc8.4.1 does not generate any DWARF for the type of the defining
decl (const char[7]) but gcc-trunk does.

## DWARF for testcase 2 with gcc-trunk is as follows:
<...>
   <1><22>: Abbrev Number: 2 (DW_TAG_array_type)
  <23>   DW_AT_type: <0x39>
  <27>   DW_AT_sibling : <0x2d>
   <2><2b>: Abbrev Number: 5 (DW_TAG_subrange_type)
   <2><2c>: Abbrev Number: 0
   <1><2d>: Abbrev Number: 1 (DW_TAG_const_type)
  <2e>   DW_AT_type: <0x22>
   <1><32>: Abbrev Number: 3 (DW_TAG_base_type)
  <33>   DW_AT_byte_size   : 1
  <34>   DW_AT_encoding: 6(signed char)
  <35>   DW_AT_name: (indirect string, offset: 0x2035): char
   <1><39>: Abbrev Number: 1 (DW_TAG_const_type)
  <3a>   DW_AT_type: <0x32>
   <1><3e>: Abbrev Number: 6 (DW_TAG_variable)
  <3f>   DW_AT_name: a
  <41>   DW_AT_decl_file   : 1
  <42>   DW_AT_decl_line   : 1
  <43>   DW_AT_decl_column : 19
  <44>   DW_AT_type: <0x2d>
  <48>   DW_AT_external: 1
  <48>   DW_AT_declaration : 1
   <1><48>: Abbrev Number: 2 (DW_TAG_array_type)
  <49>   DW_AT_type: <0x39>
  <4d>   DW_AT_sibling : <0x58>
   <2><51>: Abbrev Number: 7 (DW_TAG_subrange_type)
  <52>   DW_AT_type: <0x5d>
  <56>   DW_AT_upper_bound : 6
   <2><57>: Abbrev Number: 0
   <1><58>: Abbrev Number: 1 (DW_TAG_const_type)
  <59>   DW_AT_type: <0x48>
   <1><5d>: Abbrev Number: 3 (DW_TAG_base_type)
  <5e>   DW_AT_byte_size   : 8
  <5f>   DW_AT_encoding: 7(unsigned)
  <60>   DW_AT_name: (indirect string, offset: 0x2023): long unsigned int
   <1><64>: Abbrev Number: 8 (DW_TAG_variable)
  <65>   DW_AT_specification: <0x3e>
  <69>   DW_AT_decl_line   : 2
  <6a>   DW_AT_decl_column : 12
  <6b>   DW_AT_type: <0x58>


I suppose having both a DW_AT_specification and a DW_AT_type
is somewhat at odds.  It's likely because the definition specifies
the size of the array while the specification does not.  Not sure
what should be best done here.

Richard.


Hmm, I thought the DWARF generated by gcc-trunk for testcase 2 is 
coherent and matches the source: DW_AT_type of the defining declaration 
correctly specifies the type as const char[7], while 
DW_AT_specification points to the non-defining decl (of type 
const char[], with no size info).


The DWARF generated by gcc-8.4.1, however, does seem to be missing 
information. It should have the information for the defining decl and 
hence the size info, i.e., DW_AT_type pointing to an array with a 
DW_TAG_subrange_type with attribute DW_AT_upper_bound = 6, like above. 
Isn't it?
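
For reference, the source-level view of testcase 2 can be checked with sizeof.  This is a small sketch (size_of_a is a made-up helper): six characters plus the terminating NUL give seven elements, which matches an upper bound of 6.

```c
#include <stddef.h>

extern const char a[];       /* non-defining decl: incomplete type const char[] */
const char a[] = "testme";   /* definition: completes the type to const char[7] */

/* After the definition the array type is complete, so sizeof is usable.  */
static size_t
size_of_a (void)
{
  return sizeof a;
}
```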


Indu




  <6f>   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
   <1><79>: Abbrev Number: 0

## DWARF for testcase 2 with gcc8.4.1 is as follows:
   <1><21>: Abbrev Number: 2 (DW_TAG_array_type)
  <22>   DW_AT_type: <0x38>
  <26>   DW_AT_sibling : <0x2c>
   <2><2a>: Abbrev Number: 3 (DW_TAG_subrange_type)
   <2><2b>: Abbrev Number: 0
   <1><2c>: Abbrev Number: 4 (DW_TAG_const_type)
  <2d>   DW_AT_type: <0x21>
   <1><31>: Abbrev Number: 5 (DW_TAG_base_type)
  <32>   DW_AT_byte_size   : 1
  <33>   DW_AT_encoding: 6(signed char)
  <34>   DW_AT_name: (indirect string, offset: 0x1e04): char
   <1><38>: Abbrev Number: 4 (DW_TAG_const_type)
  <39>   DW_AT_type: <0x31>
   <1><3d>: Abbrev Number: 6 (DW_TAG_variable)
  <3e>   DW_AT_name: a
  <40>   DW_AT_decl_file   : 1
  <41>   DW_AT_decl_line   : 1
  <42>   DW_AT_decl_column : 19
  <43>   DW_AT_type: <0x2c>
  <47>   DW_AT_external: 1
  <47>   DW_AT_declaration : 1
   <1><47>: Abbrev Number: 5 (DW_TAG_base_type)
  <48>   DW_AT_byte_size   : 8
  <49>   DW_AT_encoding: 7(unsigned)
  <4a>   DW_AT_name: (indirect string, offset: 0x1df2): long unsigned int
   <1><4e>: Abbrev Number: 7 (DW_TAG_variable)
  <4f>   DW_AT_specification: <0x3d>
  <53>   DW_AT_decl_line   : 2
  <54>   DW_AT_decl_column : 12
  <55>   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
   <1><5f>: Abbrev Number: 0

Thanks
Indu




[PATCH 18/18] rs6000: Add escape-newline support for builtins files

2021-09-01 Thread Bill Schmidt via Gcc-patches
2021-08-19  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def (VEC_INIT_V16QI): Use
escape-newline support.
(VEC_INIT_V4SI): Likewise.
(VEC_INIT_V8HI): Likewise.
(PACK_V1TI): Likewise.
(DIVDEU): Likewise.
(VFIRSTMISMATCHOREOSINDEX_V16QI): Likewise.
(VFIRSTMISMATCHOREOSINDEX_V8HI): Likewise.
(VFIRSTMISMATCHOREOSINDEX_V4SI): Likewise.
(CMPRB2): Likewise.
(VSTDCP): Likewise.
(VSIEDP): Likewise.
(FMAF128_ODD): Likewise.
(VSCEQPUO): Likewise.
(VSIEQP): Likewise.
(VSIEQPF): Likewise.
(VSTDCQP): Likewise.
(PACK_TD): Likewise.
(TABORTDC): Likewise.
(TABORTDCI): Likewise.
(SE_LXVRBX): Likewise.
(SE_LXVRHX): Likewise.
(SE_LXVRWX): Likewise.
(SE_LXVRDX): Likewise.
(VREPLACE_UN_UV2DI): Likewise.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_ELT_UV2DI): Likewise.
(VREPLACE_ELT_V2DI): Likewise.
(ZE_LXVRBX): Likewise.
(ZE_LXVRHX): Likewise.
(ZE_LXVRWX): Likewise.
(ZE_LXVRDX): Likewise.
(CFUGED): Likewise.
(CNTLZDM): Likewise.
(CNTTZDM): Likewise.
(PDEPD): Likewise.
(PEXTD): Likewise.
(PMXVBF16GER2): Likewise.
(PMXVBF16GER2_INTERNAL): Likewise.
(PMXVBF16GER2NN): Likewise.
(PMXVBF16GER2NN_INTERNAL): Likewise.
(PMXVBF16GER2NP): Likewise.
(PMXVBF16GER2NP_INTERNAL): Likewise.
(PMXVBF16GER2PN): Likewise.
(PMXVBF16GER2PN_INTERNAL): Likewise.
(PMXVBF16GER2PP): Likewise.
(PMXVBF16GER2PP_INTERNAL): Likewise.
(PMXVF16GER2): Likewise.
(PMXVF16GER2_INTERNAL): Likewise.
(PMXVF16GER2NN): Likewise.
(PMXVF16GER2NN_INTERNAL): Likewise.
(PMXVF16GER2NP): Likewise.
(PMXVF16GER2NP_INTERNAL): Likewise.
(PMXVF16GER2PN): Likewise.
(PMXVF16GER2PN_INTERNAL): Likewise.
(PMXVF16GER2PP): Likewise.
(PMXVF16GER2PP_INTERNAL): Likewise.
(PMXVF32GER_INTERNAL): Likewise.
(PMXVF32GERNN): Likewise.
(PMXVF32GERNN_INTERNAL): Likewise.
(PMXVF32GERNP): Likewise.
(PMXVF32GERNP_INTERNAL): Likewise.
(PMXVF32GERPN): Likewise.
(PMXVF32GERPN_INTERNAL): Likewise.
(PMXVF32GERPP): Likewise.
(PMXVF32GERPP_INTERNAL): Likewise.
(PMXVF64GER): Likewise.
(PMXVF64GER_INTERNAL): Likewise.
(PMXVF64GERNN): Likewise.
(PMXVF64GERNN_INTERNAL): Likewise.
(PMXVF64GERNP): Likewise.
(PMXVF64GERNP_INTERNAL): Likewise.
(PMXVF64GERPN): Likewise.
(PMXVF64GERPN_INTERNAL): Likewise.
(PMXVF64GERPP): Likewise.
(PMXVF64GERPP_INTERNAL): Likewise.
(PMXVI16GER2): Likewise.
(PMXVI16GER2_INTERNAL): Likewise.
(PMXVI16GER2PP): Likewise.
(PMXVI16GER2PP_INTERNAL): Likewise.
(PMXVI16GER2S): Likewise.
(PMXVI16GER2S_INTERNAL): Likewise.
(PMXVI16GER2SPP): Likewise.
(PMXVI16GER2SPP_INTERNAL): Likewise.
(PMXVI4GER8): Likewise.
(PMXVI4GER8_INTERNAL): Likewise.
(PMXVI4GER8PP): Likewise.
(PMXVI4GER8PP_INTERNAL): Likewise.
(PMXVI8GER4): Likewise.
(PMXVI8GER4_INTERNAL): Likewise.
(PMXVI8GER4PP): Likewise.
(PMXVI8GER4PP_INTERNAL): Likewise.
(PMXVI8GER4SPP): Likewise.
(PMXVI8GER4SPP_INTERNAL): Likewise.
* config/rs6000/rs6000-gen-builtins.c (MAXLINES): New macro.
(lines): New variable.
(lastline): Likewise.
(real_line_pos): New function.
(diag): Change signature.
(bif_diag): Change signature; support escape-newline handling.
(ovld_diag): Likewise.
(fatal): Move earlier.
(consume_whitespace): Adjust diag call.
(advance_line): Add escape-newline handling; call fatal.
(safe_inc_pos): Adjust diag call.
(match_identifier): Likewise.
(match_integer): Likewise.
(match_to_right_bracket): Call fatal instead of diag; adjust diag
call.
(match_basetype): Adjust diag calls.
(match_bracketed_pair): Likewise.
(match_const_restriction): Likewise.
(match_type): Likewise.
(parse_args): Likewise.
(parse_bif_attrs): Likewise.
(complete_vector_type): Likewise.
(complete_base_type): Likewise.
(parse_prototype): Likewise.
(parse_bif_entry): Likewise.
(parse_bif_stanza): Likewise.
(parse_ovld_entry): Likewise.
(parse_ovld_stanza): Likewise.
(main): Allocate buffers for lines[].
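
As a rough illustration of what escape-newline handling means for an input file, the sketch below splices backslash-newline pairs out of a buffer.  It is not the code from rs6000-gen-builtins.c: splice_escaped_newlines is a hypothetical helper, and how the generator actually stores lines and maps diagnostics back to physical lines is not shown in this message.

```c
#include <stddef.h>

/* Remove every backslash-newline pair from S in place and return how
   many pairs were removed.  A caller could use the count to keep
   physical line numbers accurate in diagnostics (an assumption, not
   something taken from the patch).  */
static size_t
splice_escaped_newlines (char *s)
{
  char *out = s;
  size_t removed = 0;

  while (*s)
    {
      if (s[0] == '\\' && s[1] == '\n')
        {
          s += 2;       /* skip the escape and the newline */
          removed++;
        }
      else
        *out++ = *s++;  /* copy everything else unchanged */
    }
  *out = '\0';
  return removed;
}
```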
---
 gcc/config/rs6000/rs6000-builtin-new.def | 288 +++
 gcc/config/rs6000/rs6000-gen-builtins.c  | 280 +-
 2 files changed, 358 insertions(+), 210 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
b/gc

[PATCH 16/18] rs6000: Test case adjustments

2021-09-01 Thread Bill Schmidt via Gcc-patches
2021-07-19  Bill Schmidt  

gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Adjust.
* gcc.target/powerpc/byte-in-set-2.c: Adjust.
* gcc.target/powerpc/cmpb-2.c: Adjust.
* gcc.target/powerpc/cmpb32-2.c: Adjust.
* gcc.target/powerpc/crypto-builtin-2.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-longlong.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-misc-invalid.c: Adjust.
* gcc.target/powerpc/int_128bit-runnable.c: Adjust.
* gcc.target/powerpc/p8vector-builtin-8.c: Adjust.
* gcc.target/powerpc/pr80315-1.c: Adjust.
* gcc.target/powerpc/pr80315-2.c: Adjust.
* gcc.target/powerpc/pr80315-3.c: Adjust.
* gcc.target/powerpc/pr80315-4.c: Adjust.
* gcc.target/powerpc/pr88100.c: Adjust.
* gcc.target/powerpc/pragma_misc9.c: Adjust.
* gcc.target/powerpc/pragma_power8.c: Adjust.
* gcc.target/powerpc/pragma_power9.c: Adjust.
* gcc.target/powerpc/test_fpscr_drn_builtin_error.c: Adjust.
* gcc.target/powerpc/test_fpscr_rn_builtin_error.c: Adjust.
* gcc.target/powerpc/test_mffsl.c: Adjust.
* gcc.target/powerpc/vec-gnb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-all-nez-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-any-eqz-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-cmpnez-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-xst-len-12.c: Adjust.
* gcc.target/powerpc/vsu/vec-xst-len-13.c: Adjust.
---
 .../gcc.target/powerpc/bfp/scalar-extract-exp-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-extract-sig-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-2.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-5.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-8.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-2.c |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-3.c |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-5.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c  |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c|  2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c  | 14 +++---
 .../powerpc/fold-vec-splat-floatdouble.c   |  4 ++--
 .../gcc.target/powerpc/fold-vec-splat-longlong.c   | 10 +++---
 .../powerpc/fold-vec-splat-misc-invalid.c  |  8 
 .../gcc.target/powerpc/int_128bit-runnable.c   |  6 +++---
 .../gcc.target/powerpc/p8vector-builtin-8.c|  1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c | 12 ++--
 gcc/testsuite/gcc.target/powerpc/pragma_misc9.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pragma_power8.c   |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma_power9.c   |  3 +++
 .../powerpc/test_fpscr_drn_builtin_error.c |  4 ++--
 .../powerpc/test_fpscr_rn_builtin_error.c  | 12 ++--
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c  |  3 ++-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c   |  2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c|  2 +-
 36 files changed, 65 insertions(+), 62 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
index 922180675fc..53b67c95cf9 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
@@ -14,7 +14,7 @@ get_exponent (double *p)
 {
   double source = *p;
 
-  return scalar_extract_exp (source);  /* { dg-error 
"'__builtin_vec_scalar_extract_exp' is not supported in this compiler 
configuration" } */
+  return scalar_extract_exp (source);  /* { dg-error 
"'__builtin_

[PATCH 17/18] rs6000: Enable the new builtin support

2021-09-01 Thread Bill Schmidt via Gcc-patches
2021-03-05  Bill Schmidt  

gcc/
* config/rs6000/rs6000-gen-builtins.c (write_init_file):
Initialize new_builtins_are_live to 1.
---
 gcc/config/rs6000/rs6000-gen-builtins.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c 
b/gcc/config/rs6000/rs6000-gen-builtins.c
index 7f711210aff..fdef65fe1d4 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -2791,7 +2791,7 @@ write_init_file (void)
   fprintf (init_file, "#include \"rs6000-builtins.h\"\n");
   fprintf (init_file, "\n");
 
-  fprintf (init_file, "int new_builtins_are_live = 0;\n\n");
+  fprintf (init_file, "int new_builtins_are_live = 1;\n\n");
 
   fprintf (init_file, "tree rs6000_builtin_decls_x[RS6000_OVLD_MAX];\n\n");
 
-- 
2.27.0



[PATCH 15/18] rs6000: Update altivec.h for automated interfaces

2021-09-01 Thread Bill Schmidt via Gcc-patches
2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/altivec.h: Delete a number of #defines that are
now superfluous.  Alphabetize.  Include rs6000-vecdefines.h.
Include some synonyms.
---
 gcc/config/rs6000/altivec.h | 519 +++-
 1 file changed, 38 insertions(+), 481 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 5b631c7ebaf..9dfa285ccd1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -55,32 +55,36 @@
 #define __CR6_LT   2
 #define __CR6_LT_REV   3
 
-/* Synonyms.  */
+#include "rs6000-vecdefines.h"
+
+/* Deprecated interfaces.  */
+#define vec_lvx vec_ld
+#define vec_lvxl vec_ldl
+#define vec_stvx vec_st
+#define vec_stvxl vec_stl
 #define vec_vaddcuw vec_addc
 #define vec_vand vec_and
 #define vec_vandc vec_andc
-#define vec_vrfip vec_ceil
 #define vec_vcmpbfp vec_cmpb
 #define vec_vcmpgefp vec_cmpge
 #define vec_vctsxs vec_cts
 #define vec_vctuxs vec_ctu
 #define vec_vexptefp vec_expte
-#define vec_vrfim vec_floor
-#define vec_lvx vec_ld
-#define vec_lvxl vec_ldl
 #define vec_vlogefp vec_loge
 #define vec_vmaddfp vec_madd
 #define vec_vmhaddshs vec_madds
-#define vec_vmladduhm vec_mladd
 #define vec_vmhraddshs vec_mradds
+#define vec_vmladduhm vec_mladd
 #define vec_vnmsubfp vec_nmsub
 #define vec_vnor vec_nor
 #define vec_vor vec_or
-#define vec_vpkpx vec_packpx
 #define vec_vperm vec_perm
-#define vec_permxor __builtin_vec_vpermxor
+#define vec_vpkpx vec_packpx
 #define vec_vrefp vec_re
+#define vec_vrfim vec_floor
 #define vec_vrfin vec_round
+#define vec_vrfip vec_ceil
+#define vec_vrfiz vec_trunc
 #define vec_vrsqrtefp vec_rsqrte
 #define vec_vsel vec_sel
 #define vec_vsldoi vec_sld
@@ -91,440 +95,53 @@
 #define vec_vspltisw vec_splat_s32
 #define vec_vsr vec_srl
 #define vec_vsro vec_sro
-#define vec_stvx vec_st
-#define vec_stvxl vec_stl
 #define vec_vsubcuw vec_subc
 #define vec_vsum2sws vec_sum2s
 #define vec_vsumsws vec_sums
-#define vec_vrfiz vec_trunc
 #define vec_vxor vec_xor
 
+/* For _ARCH_PWR8.  Always define to support #pragma GCC target.  */
+#define vec_vclz vec_cntlz
+#define vec_vgbbd vec_gb
+#define vec_vmrgew vec_mergee
+#define vec_vmrgow vec_mergeo
+#define vec_vpopcntu vec_popcnt
+#define vec_vrld vec_rl
+#define vec_vsld vec_sl
+#define vec_vsrd vec_sr
+#define vec_vsrad vec_sra
+
+/* For _ARCH_PWR9.  Always define to support #pragma GCC target.  */
+#define vec_extract_fp_from_shorth vec_extract_fp32_from_shorth
+#define vec_extract_fp_from_shortl vec_extract_fp32_from_shortl
+#define vec_vctz vec_cnttz
+
+/* Synonyms.  */
 /* Functions that are resolved by the backend to one of the
typed builtins.  */
-#define vec_vaddfp __builtin_vec_vaddfp
-#define vec_addc __builtin_vec_addc
-#define vec_adde __builtin_vec_adde
-#define vec_addec __builtin_vec_addec
-#define vec_vaddsws __builtin_vec_vaddsws
-#define vec_vaddshs __builtin_vec_vaddshs
-#define vec_vaddsbs __builtin_vec_vaddsbs
-#define vec_vavgsw __builtin_vec_vavgsw
-#define vec_vavguw __builtin_vec_vavguw
-#define vec_vavgsh __builtin_vec_vavgsh
-#define vec_vavguh __builtin_vec_vavguh
-#define vec_vavgsb __builtin_vec_vavgsb
-#define vec_vavgub __builtin_vec_vavgub
-#define vec_ceil __builtin_vec_ceil
-#define vec_cmpb __builtin_vec_cmpb
-#define vec_vcmpeqfp __builtin_vec_vcmpeqfp
-#define vec_cmpge __builtin_vec_cmpge
-#define vec_vcmpgtfp __builtin_vec_vcmpgtfp
-#define vec_vcmpgtsw __builtin_vec_vcmpgtsw
-#define vec_vcmpgtuw __builtin_vec_vcmpgtuw
-#define vec_vcmpgtsh __builtin_vec_vcmpgtsh
-#define vec_vcmpgtuh __builtin_vec_vcmpgtuh
-#define vec_vcmpgtsb __builtin_vec_vcmpgtsb
-#define vec_vcmpgtub __builtin_vec_vcmpgtub
-#define vec_vcfsx __builtin_vec_vcfsx
-#define vec_vcfux __builtin_vec_vcfux
-#define vec_cts __builtin_vec_cts
-#define vec_ctu __builtin_vec_ctu
-#define vec_cpsgn __builtin_vec_copysign
-#define vec_double __builtin_vec_double
-#define vec_doublee __builtin_vec_doublee
-#define vec_doubleo __builtin_vec_doubleo
-#define vec_doublel __builtin_vec_doublel
-#define vec_doubleh __builtin_vec_doubleh
-#define vec_expte __builtin_vec_expte
-#define vec_float __builtin_vec_float
-#define vec_float2 __builtin_vec_float2
-#define vec_floate __builtin_vec_floate
-#define vec_floato __builtin_vec_floato
-#define vec_floor __builtin_vec_floor
-#define vec_loge __builtin_vec_loge
-#define vec_madd __builtin_vec_madd
-#define vec_madds __builtin_vec_madds
-#define vec_mtvscr __builtin_vec_mtvscr
-#define vec_reve __builtin_vec_vreve
-#define vec_vmaxfp __builtin_vec_vmaxfp
-#define vec_vmaxsw __builtin_vec_vmaxsw
-#define vec_vmaxsh __builtin_vec_vmaxsh
-#define vec_vmaxsb __builtin_vec_vmaxsb
-#define vec_vminfp __builtin_vec_vminfp
-#define vec_vminsw __builtin_vec_vminsw
-#define vec_vminsh __builtin_vec_vminsh
-#define vec_vminsb __builtin_vec_vminsb
-#define vec_mradds __builtin_vec_mradds
-#define vec_vmsumshm __builtin_vec_vmsumshm
-#define vec_vmsumu

[PATCH 14/18] rs6000: Debug support

2021-09-01 Thread Bill Schmidt via Gcc-patches
2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
(def_builtin): Change debug formatting for easier parsing and
include more information.
(rs6000_init_builtins): Add dump of autogenerated builtins.
(altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
completeness.
---
 gcc/config/rs6000/rs6000-call.c | 191 +++-
 1 file changed, 185 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b9ce3f28f9a..b6f669f06a5 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -8880,6 +8880,106 @@ rs6000_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
 
 /* Builtins.  */
 
+/* Debug utility to translate a type node to a single token.  */
+static
+const char *rs6000_debug_type (tree type)
+{
+  if (type == void_type_node)
+return "void";
+  else if (type == long_integer_type_node)
+return "long";
+  else if (type == long_unsigned_type_node)
+return "ulong";
+  else if (type == long_long_integer_type_node)
+return "longlong";
+  else if (type == long_long_unsigned_type_node)
+return "ulonglong";
+  else if (type == bool_V2DI_type_node)
+return "vbll";
+  else if (type == bool_V4SI_type_node)
+return "vbi";
+  else if (type == bool_V8HI_type_node)
+return "vbs";
+  else if (type == bool_V16QI_type_node)
+return "vbc";
+  else if (type == bool_int_type_node)
+return "bool";
+  else if (type == dfloat64_type_node)
+return "_Decimal64";
+  else if (type == double_type_node)
+return "double";
+  else if (type == intDI_type_node)
+return "sll";
+  else if (type == intHI_type_node)
+return "ss";
+  else if (type == ibm128_float_type_node)
+return "__ibm128";
+  else if (type == opaque_V4SI_type_node)
+return "opaque";
+  else if (POINTER_TYPE_P (type))
+return "void*";
+  else if (type == intQI_type_node || type == char_type_node)
+return "sc";
+  else if (type == dfloat32_type_node)
+return "_Decimal32";
+  else if (type == float_type_node)
+return "float";
+  else if (type == intSI_type_node || type == integer_type_node)
+return "si";
+  else if (type == dfloat128_type_node)
+return "_Decimal128";
+  else if (type == long_double_type_node)
+return "longdouble";
+  else if (type == intTI_type_node)
+return "sq";
+  else if (type == unsigned_intDI_type_node)
+return "ull";
+  else if (type == unsigned_intHI_type_node)
+return "us";
+  else if (type == unsigned_intQI_type_node)
+return "uc";
+  else if (type == unsigned_intSI_type_node)
+return "ui";
+  else if (type == unsigned_intTI_type_node)
+return "uq";
+  else if (type == unsigned_V1TI_type_node)
+return "vuq";
+  else if (type == unsigned_V2DI_type_node)
+return "vull";
+  else if (type == unsigned_V4SI_type_node)
+return "vui";
+  else if (type == unsigned_V8HI_type_node)
+return "vus";
+  else if (type == unsigned_V16QI_type_node)
+return "vuc";
+  else if (type == V16QI_type_node)
+return "vsc";
+  else if (type == V1TI_type_node)
+return "vsq";
+  else if (type == V2DF_type_node)
+return "vd";
+  else if (type == V2DI_type_node)
+return "vsll";
+  else if (type == V4SF_type_node)
+return "vf";
+  else if (type == V4SI_type_node)
+return "vsi";
+  else if (type == V8HI_type_node)
+return "vss";
+  else if (type == pixel_V8HI_type_node)
+return "vp";
+  else if (type == pcvoid_type_node)
+return "voidc*";
+  else if (type == float128_type_node)
+return "_Float128";
+  else if (type == vector_pair_type_node)
+return "__vector_pair";
+  else if (type == vector_quad_type_node)
+return "__vector_quad";
+  else
+return "unknown";
+}
+
 static void
 def_builtin (const char *name, tree type, enum rs6000_builtins code)
 {
@@ -8908,7 +9008,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
   /* const function, function only depends on the inputs.  */
   TREE_READONLY (t) = 1;
   TREE_NOTHROW (t) = 1;
-  attr_string = ", const";
+  attr_string = "= const";
 }
   else if ((classify & RS6000_BTC_PURE) != 0)
 {
@@ -8916,7 +9016,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 external state.  */
   DECL_PURE_P (t) = 1;
   TREE_NOTHROW (t) = 1;
-  attr_string = ", pure";
+  attr_string = "= pure";
 }
   else if ((classify & RS6000_BTC_FP) != 0)
 {
@@ -8930,12 +9030,12 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
{
  DECL_PURE_P (t) = 1;
  DECL_IS_NOVOPS (t) = 1;
- attr_string = ", fp, pure";
+ attr_string = "= fp, pure";
}
   else
{
  TREE_READONLY (t) = 1;
- attr_string = ", fp, const";
+ attr_string = "= fp, const";
}
 }
   els

[PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x

2021-09-01 Thread Bill Schmidt via Gcc-patches
There are a few leftover places where we use the old rs6000_builtin_decls
array, but we need to use rs6000_builtin_decls_x instead when the new
builtins infrastructure is in play.
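
For readers skimming the diff, the three call sites touched here all follow
the same select-by-flag shape.  The following is only a minimal standalone
sketch of that shape, not code from the patch: decl_t, the two enums, both
tables, and select_decl are hypothetical stand-ins for GCC's tree decls,
builtin codes, and decl arrays.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for GCC's `tree` function decls.
using decl_t = const char *;

// Hypothetical stand-ins for the old and new builtin enumerations.
enum old_code { OLD_MFFS, OLD_MTFSF, OLD_COUNT };
enum new_code { NEW_MFFS, NEW_MTFSF, NEW_COUNT };

// Two parallel tables, one per infrastructure (assumed layout).
static decl_t old_decls[OLD_COUNT] = { "old:mffs", "old:mtfsf" };
static decl_t new_decls[NEW_COUNT] = { "new:mffs", "new:mtfsf" };

// Return the decl from whichever table is currently live.  Each
// infrastructure has its own code, so the caller passes both.
static decl_t
select_decl (bool new_live, old_code o, new_code n)
{
  assert (o < OLD_COUNT && n < NEW_COUNT);
  return new_live ? new_decls[n] : old_decls[o];
}
```

With the flag set, select_decl (true, OLD_MFFS, NEW_MFFS) yields the entry
from the new table; with it clear, the old table is consulted, so the
pre-existing behavior is unchanged.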

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
rs6000_builtin_decls_x when appropriate.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
---
 gcc/config/rs6000/rs6000.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 52c78c7500c..fa86b797b0d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -22681,12 +22681,16 @@ rs6000_builtin_reciprocal (tree fndecl)
   if (!RS6000_RECIP_AUTO_RSQRTE_P (V2DFmode))
return NULL_TREE;
 
+  if (new_builtins_are_live)
+   return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
   return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
 
 case VSX_BUILTIN_XVSQRTSP:
   if (!RS6000_RECIP_AUTO_RSQRTE_P (V4SFmode))
return NULL_TREE;
 
+  if (new_builtins_are_live)
+   return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_4SF];
   return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_4SF];
 
 default:
@@ -25275,7 +25279,10 @@ add_condition_to_bb (tree function_decl, tree version_decl,
 
   tree bool_zero = build_int_cst (bool_int_type_node, 0);
   tree cond_var = create_tmp_var (bool_int_type_node);
-  tree predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+  tree predicate_decl
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[(int) RS6000_BIF_CPU_SUPPORTS]
+   : rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS]);
   const char *arg_str = rs6000_clone_map[clone_isa].name;
   tree predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
   gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
@@ -27915,8 +27922,14 @@ rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   return;
 }
 
-  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
-  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree mffs
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[RS6000_BIF_MFFS]
+   : rs6000_builtin_decls[RS6000_BUILTIN_MFFS]);
+  tree mtfsf
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[RS6000_BIF_MTFSF]
+   : rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]);
   tree call_mffs = build_call_expr (mffs, 0);
 
   /* Generates the equivalent of feholdexcept (&fenv_var)
-- 
2.27.0



[PATCH 12/18] rs6000: Update rs6000_builtin_decl

2021-09-01 Thread Bill Schmidt via Gcc-patches
Create a new version of this function that uses the new infrastructure,
and particularly checks for supported builtins the new way.

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New
function.
(rs6000_builtin_decl): Call it.
---
 gcc/config/rs6000/rs6000-call.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e34f6ce8745..b9ce3f28f9a 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -16236,11 +16236,31 @@ rs6000_init_builtins (void)
 }
 }
 
+static tree
+rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;
+
+  if (fcode >= RS6000_OVLD_MAX)
+return error_mark_node;
+
+  if (!rs6000_new_builtin_is_supported (fcode))
+{
+  rs6000_invalid_new_builtin (fcode);
+  return error_mark_node;
+}
+
+  return rs6000_builtin_decls_x[code];
+}
+
 /* Returns the rs6000 builtin decl for CODE.  */
 
 tree
 rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+return rs6000_new_builtin_decl (code, initialize_p);
+
   HOST_WIDE_INT fnmask;
 
   if (code >= RS6000_BUILTIN_COUNT)
-- 
2.27.0



[PATCH 11/18] rs6000: Builtin expansion, part 6

2021-09-01 Thread Bill Schmidt via Gcc-patches
Provide replacements for htm_spr_num and htm_expand_builtin.  No logic
changes are intended here, as usual.  Much code was factored out into
rs6000_expand_new_builtin, so the new version of htm_expand_builtin is
a little tidier.

Also implement the support for the "endian" and "32bit" attributes,
which is straightforward.  These just do icode substitution.
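
As a rough illustration of what "icode substitution" means here, the sketch
below swaps a builtin's default instruction pattern for an alternative when
an attribute applies to the current target.  It is a model under stated
assumptions, not the code from this patch: the enum values, bif_model, and
substitute_icode are hypothetical, and the conditions (substitute on
big-endian for "endian", on 32-bit targets for "32bit") are assumed.

```cpp
#include <cassert>

// Hypothetical instruction codes; the names are illustrative only.
enum insn_code_model
{
  IC_ELEMREV_LOAD,  // default pattern of an "endian" builtin
  IC_PLAIN_LOAD,    // its substitute
  IC_WIDE_FORM,     // default pattern of a "32bit" builtin
  IC_NARROW_FORM    // its substitute
};

// Hypothetical per-builtin record with the two attribute bits.
struct bif_model
{
  insn_code_model icode;      // default pattern
  bool is_endian;             // models the "endian" attribute
  bool is_32bit;              // models the "32bit" attribute
  insn_code_model alt_icode;  // pattern used when the attribute applies
};

// Return the pattern to expand with: the default, unless one of the
// attributes is set and the matching target condition holds.
static insn_code_model
substitute_icode (const bif_model &bif, bool big_endian, bool target_32bit)
{
  insn_code_model icode = bif.icode;
  if (bif.is_32bit && target_32bit)
    icode = bif.alt_icode;
  if (bif.is_endian && big_endian)
    icode = bif.alt_icode;
  return icode;
}
```

The point of the design is that the rest of the expander never needs to know
about the attribute; it simply proceeds with whatever icode comes back.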

2021-09-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_htm_spr_num): New function.
(new_htm_expand_builtin): Implement.
(rs6000_expand_new_builtin): Handle 32-bit and endian cases.
---
 gcc/config/rs6000/rs6000-call.c | 202 
 1 file changed, 202 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8956eefd95..e34f6ce8745 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15140,11 +15140,171 @@ new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
   return target;
 }
 
+/* Return the appropriate SPR number associated with the given builtin.  */
+static inline HOST_WIDE_INT
+new_htm_spr_num (enum rs6000_gen_builtins code)
+{
+  if (code == RS6000_BIF_GET_TFHAR
+  || code == RS6000_BIF_SET_TFHAR)
+return TFHAR_SPR;
+  else if (code == RS6000_BIF_GET_TFIAR
+  || code == RS6000_BIF_SET_TFIAR)
+return TFIAR_SPR;
+  else if (code == RS6000_BIF_GET_TEXASR
+  || code == RS6000_BIF_SET_TEXASR)
+return TEXASR_SPR;
+  gcc_assert (code == RS6000_BIF_GET_TEXASRU
+ || code == RS6000_BIF_SET_TEXASRU);
+  return TEXASRU_SPR;
+}
+
 /* Expand the HTM builtin in EXP and store the result in TARGET.  */
 static rtx
 new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
tree exp, rtx target)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  bool nonvoid = TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node;
+
+  if (!TARGET_POWERPC64
+  && (fcode == RS6000_BIF_TABORTDC
+ || fcode == RS6000_BIF_TABORTDCI))
+{
+  error ("builtin %qs is only valid in 64-bit mode", bifaddr->bifname);
+  return const0_rtx;
+}
+
+  rtx op[MAX_HTM_OPERANDS], pat;
+  int nopnds = 0;
+  tree arg;
+  call_expr_arg_iterator iter;
+  insn_code icode = bifaddr->icode;
+  bool uses_spr = bif_is_htmspr (*bifaddr);
+  rtx cr = NULL_RTX;
+
+  if (uses_spr)
+icode = rs6000_htm_spr_icode (nonvoid);
+  const insn_operand_data *insn_op = &insn_data[icode].operand[0];
+
+  if (nonvoid)
+{
+  machine_mode tmode = (uses_spr) ? insn_op->mode : E_SImode;
+  if (!target
+ || GET_MODE (target) != tmode
+ || (uses_spr && !insn_op->predicate (target, tmode)))
+   target = gen_reg_rtx (tmode);
+  if (uses_spr)
+   op[nopnds++] = target;
+}
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+{
+  if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
+   return const0_rtx;
+
+  insn_op = &insn_data[icode].operand[nopnds];
+  op[nopnds] = expand_normal (arg);
+
+  if (!insn_op->predicate (op[nopnds], insn_op->mode))
+   {
+ if (!strcmp (insn_op->constraint, "n"))
+   {
+ int arg_num = (nonvoid) ? nopnds : nopnds + 1;
+ if (!CONST_INT_P (op[nopnds]))
+   error ("argument %d must be an unsigned literal", arg_num);
+ else
+   error ("argument %d is an unsigned literal that is "
+  "out of range", arg_num);
+ return const0_rtx;
+   }
+ op[nopnds] = copy_to_mode_reg (insn_op->mode, op[nopnds]);
+   }
+
+  nopnds++;
+}
+
+  /* Handle the builtins for extended mnemonics.  These accept
+ no arguments, but map to builtins that take arguments.  */
+  switch (fcode)
+{
+case RS6000_BIF_TENDALL:  /* Alias for: tend. 1  */
+case RS6000_BIF_TRESUME:  /* Alias for: tsr. 1  */
+  op[nopnds++] = GEN_INT (1);
+  break;
+case RS6000_BIF_TSUSPEND: /* Alias for: tsr. 0  */
+  op[nopnds++] = GEN_INT (0);
+  break;
+default:
+  break;
+}
+
+  /* If this builtin accesses SPRs, then pass in the appropriate
+ SPR number and SPR regno as the last two operands.  */
+  if (uses_spr)
+{
+  machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;
+  op[nopnds++] = gen_rtx_CONST_INT (mode, new_htm_spr_num (fcode));
+}
+  /* If this builtin accesses a CR, then pass in a scratch
+ CR as the last operand.  */
+  else if (bif_is_htmcr (*bifaddr))
+{
+  cr = gen_reg_rtx (CCmode);
+  op[nopnds++] = cr;
+}
+
+  switch (nopnds)
+{
+case 1:
+  pat = GEN_FCN (icode) (op[0]);
+  break;
+case 2:
+  pat = GEN_FCN (icode) (op[0], op[1]);
+  break;
+case 3:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+  break;
+case 4:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+  break;
+default:
+  gcc_unreachable ();
+}
+  if (!pat)
+  

[PATCH 10/18] rs6000: Builtin expansion, part 5

2021-09-01 Thread Bill Schmidt via Gcc-patches
Replace mma_expand_builtin.  There are no significant logic changes,
just adjustments to use the new infrastructure and clean up formatting.

2021-09-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_mma_expand_builtin):
Implement.
---
 gcc/config/rs6000/rs6000-call.c | 103 
 1 file changed, 103 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 33153a5657c..a8956eefd95 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15034,6 +15034,109 @@ static rtx
 new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
rs6000_gen_builtins fcode)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  tree arg;
+  call_expr_arg_iterator iter;
+  const struct insn_operand_data *insn_op;
+  rtx op[MAX_MMA_OPERANDS];
+  unsigned nopnds = 0;
+  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
+  machine_mode tmode = VOIDmode;
+
+  if (!void_func)
+{
+  tmode = insn_data[icode].operand[0].mode;
+  if (!target
+ || GET_MODE (target) != tmode
+ || !insn_data[icode].operand[0].predicate (target, tmode))
+   target = gen_reg_rtx (tmode);
+  op[nopnds++] = target;
+}
+  else
+target = const0_rtx;
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+{
+  if (arg == error_mark_node)
+   return const0_rtx;
+
+  rtx opnd;
+  insn_op = &insn_data[icode].operand[nopnds];
+  if (TREE_CODE (arg) == ADDR_EXPR
+ && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+   opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+  else
+   opnd = expand_normal (arg);
+
+  if (!insn_op->predicate (opnd, insn_op->mode))
+   {
+ if (!strcmp (insn_op->constraint, "n"))
+   {
+ if (!CONST_INT_P (opnd))
+   error ("argument %d must be an unsigned literal", nopnds);
+ else
+   error ("argument %d is an unsigned literal that is "
+  "out of range", nopnds);
+ return const0_rtx;
+   }
+ opnd = copy_to_mode_reg (insn_op->mode, opnd);
+   }
+
+  /* Some MMA instructions have INOUT accumulator operands, so force
+their target register to be the same as their input register.  */
+  if (!void_func
+ && nopnds == 1
+ && !strcmp (insn_op->constraint, "0")
+ && insn_op->mode == tmode
+ && REG_P (opnd)
+ && insn_data[icode].operand[0].predicate (opnd, tmode))
+   target = op[0] = opnd;
+
+  op[nopnds++] = opnd;
+}
+
+  rtx pat;
+  switch (nopnds)
+{
+case 1:
+  pat = GEN_FCN (icode) (op[0]);
+  break;
+case 2:
+  pat = GEN_FCN (icode) (op[0], op[1]);
+  break;
+case 3:
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == RS6000_BIF_ASSEMBLE_PAIR_V_INTERNAL && !WORDS_BIG_ENDIAN)
+   std::swap (op[1], op[2]);
+  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+  break;
+case 4:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+  break;
+case 5:
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == RS6000_BIF_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
+   {
+ std::swap (op[1], op[4]);
+ std::swap (op[2], op[3]);
+   }
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+  break;
+case 6:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+  break;
+case 7:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+  break;
+default:
+  gcc_unreachable ();
+}
+  if (!pat)
+return NULL_RTX;
+  emit_insn (pat);
+
   return target;
 }
 
-- 
2.27.0



[PATCH 09/18] rs6000: Builtin expansion, part 4

2021-09-01 Thread Bill Schmidt via Gcc-patches
Consolidate into elemrev_icode some logic that is scattered throughout
the old altivec_expand_builtin.  Also replace functions for handling
special load and store built-ins:
= ldv_expand_builtin replaces altivec_expand_lv_builtin
= lxvrse_expand_builtin and lxvrze_expand_builtin replace
  altivec_expand_lxvr_builtin
= stv_expand_builtin replaces altivec_expand_stv_builtin

In all cases, there are no logic changes except that some code was
already factored out into rs6000_expand_new_builtin.
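
One detail of ldv_expand_builtin in the diff is that, for LVX, the address is
ANDed with -16 so the RTL reflects the 16-byte truncation of the effective
address.  The arithmetic can be checked with a small standalone sketch;
lvx_effective_address is a hypothetical helper and plain integers stand in
for RTL here.

```cpp
#include <cassert>
#include <cstdint>

// In two's complement, -16 has every bit set except the low four, so
// ANDing with it is the same as clearing the low four address bits.
static_assert (static_cast<std::uint64_t> (-16) == ~std::uint64_t (15),
               "-16 and ~15 are the same 64-bit mask");

// Sum base and offset, then round down to a 16-byte boundary.
static std::uint64_t
lvx_effective_address (std::uint64_t base, std::uint64_t offset)
{
  return (base + offset) & static_cast<std::uint64_t> (-16);
}
```

For example, a base of 0x1000 with offset 0x1f sums to 0x101f, which is
truncated to 0x1010; an already aligned address is left unchanged.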

2021-09-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (elemrev_icode): Implement.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 245 
 1 file changed, 245 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 5032e947a8e..33153a5657c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14754,12 +14754,142 @@ new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
 static insn_code
 elemrev_icode (rs6000_gen_builtins fcode)
 {
+  switch (fcode)
+{
+default:
+  gcc_unreachable ();
+
+case RS6000_BIF_ST_ELEMREV_V1TI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v1ti
+   : CODE_FOR_vsx_st_elemrev_v1ti;
+
+case RS6000_BIF_ST_ELEMREV_V2DF:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v2df
+   : CODE_FOR_vsx_st_elemrev_v2df;
+
+case RS6000_BIF_ST_ELEMREV_V2DI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v2di
+   : CODE_FOR_vsx_st_elemrev_v2di;
+
+case RS6000_BIF_ST_ELEMREV_V4SF:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v4sf
+   : CODE_FOR_vsx_st_elemrev_v4sf;
+
+case RS6000_BIF_ST_ELEMREV_V4SI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v4si
+   : CODE_FOR_vsx_st_elemrev_v4si;
+
+case RS6000_BIF_ST_ELEMREV_V8HI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v8hi
+   : CODE_FOR_vsx_st_elemrev_v8hi;
+
+case RS6000_BIF_ST_ELEMREV_V16QI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_store_v16qi
+   : CODE_FOR_vsx_st_elemrev_v16qi;
+
+case RS6000_BIF_LD_ELEMREV_V2DF:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v2df
+   : CODE_FOR_vsx_ld_elemrev_v2df;
+
+case RS6000_BIF_LD_ELEMREV_V1TI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v1ti
+   : CODE_FOR_vsx_ld_elemrev_v1ti;
+
+case RS6000_BIF_LD_ELEMREV_V2DI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v2di
+   : CODE_FOR_vsx_ld_elemrev_v2di;
+
+case RS6000_BIF_LD_ELEMREV_V4SF:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v4sf
+   : CODE_FOR_vsx_ld_elemrev_v4sf;
+
+case RS6000_BIF_LD_ELEMREV_V4SI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v4si
+   : CODE_FOR_vsx_ld_elemrev_v4si;
+
+case RS6000_BIF_LD_ELEMREV_V8HI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v8hi
+   : CODE_FOR_vsx_ld_elemrev_v8hi;
+
+case RS6000_BIF_LD_ELEMREV_V16QI:
+  return BYTES_BIG_ENDIAN
+   ? CODE_FOR_vsx_load_v16qi
+   : CODE_FOR_vsx_ld_elemrev_v16qi;
+}
+  gcc_unreachable ();
   return (insn_code) 0;
 }
 
 static rtx
 ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
 {
+  rtx pat, addr;
+  bool blk = (icode == CODE_FOR_altivec_lvlx
+ || icode == CODE_FOR_altivec_lvlxl
+ || icode == CODE_FOR_altivec_lvrx
+ || icode == CODE_FOR_altivec_lvrxl);
+
+  if (target == 0
+  || GET_MODE (target) != tmode
+  || !insn_data[icode].operand[0].predicate (target, tmode))
+target = gen_reg_rtx (tmode);
+
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  /* For LVX, express the RTL accurately by ANDing the address with -16.
+ LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
+ so the raw address is fine.  */
+  if (icode == CODE_FOR_altivec_lvx_v1ti
+  || icode == CODE_FOR_altivec_lvx_v2df
+  || icode == CODE_FOR_altivec_lvx_v2di
+  || icode == CODE_FOR_altivec_lvx_v4sf
+  || icode == CODE_FOR_altivec_lvx_v4si
+  || icode == CODE_FOR_altivec_lvx_v8hi
+  || icode == CODE_FOR_altivec_lvx_v16qi)
+{
+  rtx rawaddr;
+  if (op[0] == const0_rtx)
+   rawaddr = op[1];
+  else
+   {
+ op[0] = copy_to_mode_reg (Pmode, op[0]);
+ rawaddr = gen_rtx_PLUS (Pmode, op[1], op[0]);
+   }
+  addr = gen_rtx_AND (Pmode, rawaddr, gen_rtx_CONST_INT (Pmode, -16));
+  addr = gen_rtx_MEM (blk ? BLKmode : tmode, addr);
+
+  emit_insn (gen_rtx_SET (target, addr));
+}
+  else
+{
+  if (op[0] == const0_rtx)
+   addr = gen_rtx_MEM (blk ? BLKmode : tmode, op[1]);
+  else
+   {
+ op[0] = copy_to_mode_reg (Pmode, op[0]);
+ addr = gen_rtx

[PATCH 08/18] rs6000: Builtin expansion, part 3

2021-09-01 Thread Bill Schmidt via Gcc-patches
Implement the replacement for cpu_expand_builtin.  There are no logic
changes here, just changes to use the new built-in function names and
clean up some formatting.
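
To make the emitted sequences easier to follow, here is a plain-integer model
of what the expansion in the diff computes.  It is a sketch only: the function
names and parameters are hypothetical, and it assumes the eqsi3 pattern
produces 1 for equal operands and 0 otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Models the __builtin_cpu_supports sequence: AND the TCB hwcap word
// with the feature mask, compare the result with zero, then XOR with 1
// so that "some masked bit set" becomes 1.
static std::uint32_t
cpu_supports_model (std::uint32_t tcb_hwcap, std::uint32_t mask)
{
  std::uint32_t scratch1 = tcb_hwcap & mask;
  std::uint32_t scratch2 = (scratch1 == 0) ? 1u : 0u;  // assumed eqsi3 result
  return scratch2 ^ 1u;
}

// Models the __builtin_cpu_is sequence: the platform word in the TCB is
// compared with the table's cpuid plus a bias (the role played by
// _DL_FIRST_PLATFORM in the patch).
static std::uint32_t
cpu_is_model (std::uint32_t tcb_platform, std::uint32_t cpuid,
              std::uint32_t first_platform)
{
  return tcb_platform == cpuid + first_platform ? 1u : 0u;
}
```

Both models return 0 or 1, matching the SImode boolean the builtins produce.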

2021-09-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_cpu_expand_builtin):
Implement.
---
 gcc/config/rs6000/rs6000-call.c | 102 
 1 file changed, 102 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 3e0ab42317b..5032e947a8e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14646,6 +14646,108 @@ static rtx
 new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
tree exp ATTRIBUTE_UNUSED, rtx target)
 {
+  /* __builtin_cpu_init () is a nop, so expand to nothing.  */
+  if (fcode == RS6000_BIF_CPU_INIT)
+return const0_rtx;
+
+  if (target == 0 || GET_MODE (target) != SImode)
+target = gen_reg_rtx (SImode);
+
+#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
+  tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+  /* Target clones creates an ARRAY_REF instead of STRING_CST, convert it back
+ to a STRING_CST.  */
+  if (TREE_CODE (arg) == ARRAY_REF
+  && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+  && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+  && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+arg = TREE_OPERAND (arg, 0);
+
+  if (TREE_CODE (arg) != STRING_CST)
+{
+  error ("builtin %qs only accepts a string argument",
+rs6000_builtin_info_x[(size_t) fcode].bifname);
+  return const0_rtx;
+}
+
+  if (fcode == RS6000_BIF_CPU_IS)
+{
+  const char *cpu = TREE_STRING_POINTER (arg);
+  rtx cpuid = NULL_RTX;
+  for (size_t i = 0; i < ARRAY_SIZE (cpu_is_info); i++)
+   if (strcmp (cpu, cpu_is_info[i].cpu) == 0)
+ {
+   /* The CPUID value in the TCB is offset by _DL_FIRST_PLATFORM.  */
+   cpuid = GEN_INT (cpu_is_info[i].cpuid + _DL_FIRST_PLATFORM);
+   break;
+ }
+  if (cpuid == NULL_RTX)
+   {
+ /* Invalid CPU argument.  */
+ error ("cpu %qs is an invalid argument to builtin %qs",
+cpu, rs6000_builtin_info_x[(size_t) fcode].bifname);
+ return const0_rtx;
+   }
+
+  rtx platform = gen_reg_rtx (SImode);
+  rtx address = gen_rtx_PLUS (Pmode,
+ gen_rtx_REG (Pmode, TLS_REGNUM),
+ GEN_INT (TCB_PLATFORM_OFFSET));
+  rtx tcbmem = gen_const_mem (SImode, address);
+  emit_move_insn (platform, tcbmem);
+  emit_insn (gen_eqsi3 (target, platform, cpuid));
+}
+  else if (fcode == RS6000_BIF_CPU_SUPPORTS)
+{
+  const char *hwcap = TREE_STRING_POINTER (arg);
+  rtx mask = NULL_RTX;
+  int hwcap_offset;
+  for (size_t i = 0; i < ARRAY_SIZE (cpu_supports_info); i++)
+   if (strcmp (hwcap, cpu_supports_info[i].hwcap) == 0)
+ {
+   mask = GEN_INT (cpu_supports_info[i].mask);
+   hwcap_offset = TCB_HWCAP_OFFSET (cpu_supports_info[i].id);
+   break;
+ }
+  if (mask == NULL_RTX)
+   {
+ /* Invalid HWCAP argument.  */
+ error ("%s %qs is an invalid argument to builtin %qs",
+"hwcap", hwcap,
+rs6000_builtin_info_x[(size_t) fcode].bifname);
+ return const0_rtx;
+   }
+
+  rtx tcb_hwcap = gen_reg_rtx (SImode);
+  rtx address = gen_rtx_PLUS (Pmode,
+ gen_rtx_REG (Pmode, TLS_REGNUM),
+ GEN_INT (hwcap_offset));
+  rtx tcbmem = gen_const_mem (SImode, address);
+  emit_move_insn (tcb_hwcap, tcbmem);
+  rtx scratch1 = gen_reg_rtx (SImode);
+  emit_insn (gen_rtx_SET (scratch1,
+ gen_rtx_AND (SImode, tcb_hwcap, mask)));
+  rtx scratch2 = gen_reg_rtx (SImode);
+  emit_insn (gen_eqsi3 (scratch2, scratch1, const0_rtx));
+  emit_insn (gen_rtx_SET (target,
+ gen_rtx_XOR (SImode, scratch2, const1_rtx)));
+}
+  else
+gcc_unreachable ();
+
+  /* Record that we have expanded a CPU builtin, so that we can later
+ emit a reference to the special symbol exported by LIBC to ensure we
+ do not link against an old LIBC that doesn't support this feature.  */
+  cpu_builtin_p = true;
+
+#else
+  warning (0, "builtin %qs needs GLIBC (2.23 and newer) that exports hardware "
+  "capability bits", rs6000_builtin_info_x[(size_t) fcode].bifname);
+
+  /* For old LIBCs, always return FALSE.  */
+  emit_move_insn (target, GEN_INT (0));
+#endif /* TARGET_LIBC_PROVIDES_HWCAP_IN_TCB */
+
   return target;
 }
 
-- 
2.27.0



[PATCH 07/18] rs6000: Builtin expansion, part 2

2021-09-01 Thread Bill Schmidt via Gcc-patches
Implement rs6000_invalid_new_builtin, which issues the appropriate error
message when a builtin is used without being enabled.  Also implement
rs6000_expand_ldst_mask, which just factors out the code that handles
ALTIVEC_BUILTIN_MASK_FOR_LOAD in the old rs6000_expand_builtin.  Finally,
ensure the variable altivec_builtin_mask_for_load is initialized.

2021-09-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin):
Implement.
(rs6000_expand_ldst_mask): Likewise.
(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.
---
 gcc/config/rs6000/rs6000-call.c | 101 +++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 583efc9e98e..3e0ab42317b 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11671,6 +11671,75 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 static void
 rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
 {
+  size_t uns_fncode = (size_t) fncode;
+  const char *name = rs6000_builtin_info_x[uns_fncode].bifname;
+
+  switch (rs6000_builtin_info_x[uns_fncode].enable)
+{
+case ENB_P5:
+  error ("%qs requires the %qs option", name, "-mcpu=power5");
+  break;
+case ENB_P6:
+  error ("%qs requires the %qs option", name, "-mcpu=power6");
+  break;
+case ENB_ALTIVEC:
+  error ("%qs requires the %qs option", name, "-maltivec");
+  break;
+case ENB_CELL:
+  error ("%qs is only valid for the cell processor", name);
+  break;
+case ENB_VSX:
+  error ("%qs requires the %qs option", name, "-mvsx");
+  break;
+case ENB_P7:
+  error ("%qs requires the %qs option", name, "-mcpu=power7");
+  break;
+case ENB_P7_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power7", "-m64", "-mpowerpc64");
+  break;
+case ENB_P8:
+  error ("%qs requires the %qs option", name, "-mcpu=power8");
+  break;
+case ENB_P8V:
+  error ("%qs requires the %qs option", name, "-mpower8-vector");
+  break;
+case ENB_P9:
+  error ("%qs requires the %qs option", name, "-mcpu=power9");
+  break;
+case ENB_P9_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power9", "-m64", "-mpowerpc64");
+  break;
+case ENB_P9V:
+  error ("%qs requires the %qs option", name, "-mpower9-vector");
+  break;
+case ENB_IEEE128_HW:
+  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
+  break;
+case ENB_DFP:
+  error ("%qs requires the %qs option", name, "-mhard-dfp");
+  break;
+case ENB_CRYPTO:
+  error ("%qs requires the %qs option", name, "-mcrypto");
+  break;
+case ENB_HTM:
+  error ("%qs requires the %qs option", name, "-mhtm");
+  break;
+case ENB_P10:
+  error ("%qs requires the %qs option", name, "-mcpu=power10");
+  break;
+case ENB_P10_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power10", "-m64", "-mpowerpc64");
+  break;
+case ENB_MMA:
+  error ("%qs requires the %qs option", name, "-mmma");
+  break;
+default:
+case ENB_ALWAYS:
+  gcc_unreachable ();
+};
 }
 
 /* Target hook for early folding of built-ins, shamelessly stolen
@@ -14542,7 +14611,34 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 rtx
 rs6000_expand_ldst_mask (rtx target, tree arg0)
  {
-  return target;
+  int icode2 = BYTES_BIG_ENDIAN
+? (int) CODE_FOR_altivec_lvsr_direct
+: (int) CODE_FOR_altivec_lvsl_direct;
+  machine_mode tmode = insn_data[icode2].operand[0].mode;
+  machine_mode mode = insn_data[icode2].operand[1].mode;
+  rtx op, addr, pat;
+
+  gcc_assert (TARGET_ALTIVEC);
+
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (arg0)));
+  op = expand_expr (arg0, NULL_RTX, Pmode, EXPAND_NORMAL);
+  addr = memory_address (mode, op);
+  /* We need to negate the address.  */
+  op = gen_reg_rtx (GET_MODE (addr));
+  emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
+  op = gen_rtx_MEM (mode, op);
+
+  if (target == 0
+  || GET_MODE (target) != tmode
+  || !insn_data[icode2].operand[0].predicate (target, tmode))
+target = gen_reg_rtx (tmode);
+
+  pat = GEN_FCN (icode2) (target, op);
+  if (!pat)
+return 0;
+  emit_insn (pat);
+
+   return target;
  }
 
 /* Expand the CPU builtin in FCODE and store the result in TARGET.  */
@@ -15351,6 +15447,9 @@ rs6000_init_builtins (void)
 
   if (new_builtins_are_live)
 {
+  altivec_builtin_mask_for_load
+   = rs6000_builtin_decls_x[RS6000_BIF_MASK_FOR_LOAD];
+
 #ifdef SUBTARGET_INIT_BUILTINS
   SUBTARGET_INIT_BUILTINS;
 #endif
-- 
2.27.0



[PATCH 06/18] rs6000: Builtin expansion, part 1

2021-09-01 Thread Bill Schmidt via Gcc-patches
This patch and the subsequent five patches form the meat of the improvements
for this patch series.  We develop a replacement for rs6000_expand_builtin
and its supporting functions, which are inefficient and difficult to
maintain.  This patch implements rs6000_expand_new_builtin, and creates
stubs for the support functions that subsequent patches will fill out.

Differences between the old and new support in this patch include:
 - Make use of the new builtin data structures, directly looking up
   a function's information rather than searching for the function
   multiple times;
 - Test for enablement of builtins at expand time, to support #pragma
   target changes within a compilation unit;
 - Use the builtin function attributes (e.g., bif_is_cpu) to control
   special handling;
 - Refactor common code into one place; and
 - Provide common error handling in one place for operands that are
   restricted to specific values or ranges.
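
To make the first two points concrete, here is a minimal sketch of a direct table lookup combined with an enablement test done at expand time.  It is not the code from this series: the enum values, the table, `target_state`, and `check_builtin` are all hypothetical stand-ins, and only the wording of the diagnostic follows the messages in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the generated tables; these are NOT the real
// rs6000 definitions.
enum bif_enable { ENB_ALWAYS, ENB_P8, ENB_P9 };
enum bif_id { BIF_NONE, BIF_A, BIF_B, BIF_C, BIF_MAX };

struct bif_info
{
  const char *name;
  bif_enable enable;
};

// Indexed directly by bif_id, so a lookup is a single array access rather
// than a search.
static const bif_info bif_table[BIF_MAX] = {
  { "none", ENB_ALWAYS },
  { "__builtin_a", ENB_ALWAYS },
  { "__builtin_b", ENB_P8 },
  { "__builtin_c", ENB_P9 },
};

// Target state as it stands when the call is expanded.  It may differ from
// the state at initialization if a #pragma target intervened.
struct target_state
{
  int cpu_level;  // 8 models a power8-like target, 9 a power9-like one.
};

static bool
bif_is_enabled (bif_enable enable, const target_state &ts)
{
  switch (enable)
    {
    case ENB_ALWAYS:
      return true;
    case ENB_P8:
      return ts.cpu_level >= 8;
    case ENB_P9:
      return ts.cpu_level >= 9;
    }
  return false;
}

// Return an empty string if the builtin may be expanded, otherwise a
// diagnostic in the style used by the patch.
static std::string
check_builtin (bif_id id, const target_state &ts)
{
  const bif_info &info = bif_table[id];
  if (bif_is_enabled (info.enable, ts))
    return "";
  const char *opt = info.enable == ENB_P8 ? "-mcpu=power8" : "-mcpu=power9";
  return "'" + std::string (info.name) + "' requires the '" + opt + "' option";
}
```

Because the check runs per call with the current target state, the same builtin can be rejected in one function and accepted in another within a single compilation unit.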

Note that these six patches must be pushed together, because otherwise
unused parameter warnings in the stub functions will prevent bootstrap.
If preferred, I can flag them unused to remove this restriction.

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
forward decl.
(rs6000_invalid_new_builtin): New stub function.
(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
(rs6000_expand_ldst_mask): New stub function.
(new_cpu_expand_builtin): Likewise.
(elemrev_icode): Likewise.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
(new_mma_expand_builtin): Likewise.
(new_htm_expand_builtin): Likewise.
(rs6000_expand_new_builtin): New function.
---
 gcc/config/rs6000/rs6000-call.c | 432 
 1 file changed, 432 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 558f06cfd6c..583efc9e98e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, 
machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
 static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
@@ -11664,6 +11665,14 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 error ("%qs is not supported with the current options", name);
 }
 
+/* Raise an error message for a builtin function that is called without the
+   appropriate target options being set.  */
+
+static void
+rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
+{
+}
+
 /* Target hook for early folding of built-ins, shamelessly stolen
from ia64.c.  */
 
@@ -14234,6 +14243,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
subtarget ATTRIBUTE_UNUSED,
   machine_mode mode ATTRIBUTE_UNUSED,
   int ignore ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+return rs6000_expand_new_builtin (exp, target, subtarget, mode, ignore);
+
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_builtins fcode
 = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
@@ -14526,6 +14538,426 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
subtarget ATTRIBUTE_UNUSED,
   gcc_unreachable ();
 }
 
+/* Expand ALTIVEC_BUILTIN_MASK_FOR_LOAD.  */
+rtx
+rs6000_expand_ldst_mask (rtx target, tree arg0)
+ {
+  return target;
+ }
+
+/* Expand the CPU builtin in FCODE and store the result in TARGET.  */
+static rtx
+new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
+   tree exp ATTRIBUTE_UNUSED, rtx target)
+{
+  return target;
+}
+
+static insn_code
+elemrev_icode (rs6000_gen_builtins fcode)
+{
+  return (insn_code) 0;
+}
+
+static rtx
+ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
+{
+  return target;
+}
+
+static rtx
+lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
+  machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+lxvrze_expand_builtin (rtx target, insn_code icode, rtx *op,
+  machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+stv_expand_builtin (insn_code icode, rtx *op,
+   machine_mode tmode, machine_mode smode)
+{
+  return NULL_RTX;
+}
+
+/* Expand the MMA built-in in EXP.  */
+static rtx
+new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
+   rs6000_gen_builtins fcode)
+{
+  return target;
+}
+
+/* Expand the HTM builtin in EXP and store the result in TARGET.  */
+static rtx
+new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
+   tree exp, rtx target)
+

[PATCH 05/18] rs6000: Support for vectorizing built-in functions

2021-09-01 Thread Bill Schmidt via Gcc-patches
This patch just duplicates a couple of functions and adjusts them to use the
new builtin names.  There's no logical change otherwise.

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
(rs6000_new_builtin_vectorized_function): New function.
(rs6000_new_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_vectorized_function): Call
rs6000_new_builtin_vectorized_function.
(rs6000_builtin_md_vectorized_function): Call
rs6000_new_builtin_md_vectorized_function.
---
 gcc/config/rs6000/rs6000.c | 253 +
 1 file changed, 253 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b7ea1483da5..52c78c7500c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -78,6 +78,7 @@
 #include "case-cfn-macros.h"
 #include "ppc-auxv.h"
 #include "rs6000-internal.h"
+#include "rs6000-builtins.h"
 #include "opts.h"
 
 /* This file should be included last.  */
@@ -5501,6 +5502,251 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
loop *loop)
   return nunroll;
 }
 
+/* Returns a function decl for a vectorized version of the builtin function
+   with builtin function code FN and the result vector type TYPE, or NULL_TREE
+   if it is not available.  */
+
+static tree
+rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
+   tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
+combined_fn_name (combined_fn (fn)),
+GET_MODE_NAME (TYPE_MODE (type_out)),
+GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+  || TREE_CODE (type_in) != VECTOR_TYPE)
+return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+  switch (fn)
+{
+CASE_CFN_COPYSIGN:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF];
+  break;
+CASE_CFN_CEIL:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VRFIP];
+  break;
+CASE_CFN_FLOOR:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VRFIM];
+  break;
+CASE_CFN_FMA:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP];
+  break;
+CASE_CFN_TRUNC:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIZ];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIZ];
+  if (VECTOR_UNIT_ALTIV

[PATCH 03/18] rs6000: Handle gimple folding of target built-ins

2021-09-01 Thread Bill Schmidt via Gcc-patches
This is another patch that looks bigger than it really is.  Because we
have a new namespace for the builtins, allowing us to have both the old
and new builtin infrastructure supported at once, we need versions of
these functions that use the new builtin namespace.  Otherwise the code is
unchanged.

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
New forward decl.
(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
(rs6000_new_builtin_valid_without_lhs): New function.
(rs6000_gimple_fold_new_mma_builtin): Likewise.
(rs6000_gimple_fold_new_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 1165 +++
 1 file changed, 1165 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2c68aa3580c..eae4e15df1e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, 
machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
 /* Hash table to keep track of the argument types for builtin functions.  */
@@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator 
*gsi)
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
+  if (new_builtins_are_live)
+return rs6000_gimple_fold_new_builtin (gsi);
+
   gimple *stmt = gsi_stmt (*gsi);
   tree fndecl = gimple_call_fndecl (stmt);
   gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
@@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return false;
 }
 
+/*  Helper function to sort out which built-ins may be valid without having
+a LHS.  */
+static bool
+rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
+ tree fndecl)
+{
+  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
+return true;
+
+  switch (fn_code)
+{
+case RS6000_BIF_STVX_V16QI:
+case RS6000_BIF_STVX_V8HI:
+case RS6000_BIF_STVX_V4SI:
+case RS6000_BIF_STVX_V4SF:
+case RS6000_BIF_STVX_V2DI:
+case RS6000_BIF_STVX_V2DF:
+case RS6000_BIF_STXVW4X_V16QI:
+case RS6000_BIF_STXVW4X_V8HI:
+case RS6000_BIF_STXVW4X_V4SF:
+case RS6000_BIF_STXVW4X_V4SI:
+case RS6000_BIF_STXVD2X_V2DF:
+case RS6000_BIF_STXVD2X_V2DI:
+  return true;
+default:
+  return false;
+}
+}
+
 /* Check whether a builtin function is supported in this target
configuration.  */
 bool
@@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum 
rs6000_gen_builtins fncode)
   gcc_unreachable ();
 }
 
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+   __vector_quad arguments into pass-by-value arguments, leading to more
+   efficient code generation.  */
+static bool
+rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
+   rs6000_gen_builtins fn_code)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  size_t fncode = (size_t) fn_code;
+
+  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
+return false;
+
+  /* Each call that can be gimple-expanded has an associated built-in
+ function that it will expand into.  If this one doesn't, we have
+ already expanded it!  */
+  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
+return false;
+
+  bifdata *bd = &rs6000_builtin_info_x[fncode];
+  unsigned nopnds = bd->nargs;
+  gimple_seq new_seq = NULL;
+  gimple *new_call;
+  tree new_decl;
+
+  /* Compatibility built-ins; we used to call these
+ __builtin_mma_{dis,}assemble_pair, but now we call them
+ __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
+  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
+fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
+  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
+fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
+
+  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
+  || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
+{
+  /* This is an MMA disassemble built-in function.  */
+  push_gimplify_context (true);
+  unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
+  tree dst_ptr = gimple_call_arg (stmt, 0);
+  tree src_ptr = gimple_call_arg (stmt, 1);
+  tree src_type = TREE_TYPE (src_ptr);
+  tree src = create_tmp_reg_or_ssa_name (TREE_TYPE (src_type));
+  gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+  /* If we are not disassembling an accumulator/pair or our destination is
+another accumulator/pair, then just copy the entire thing as is.  */
+  if ((fncode == RS6000_BIF_DISASSEMBLE_ACC
+  && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+ || (fncode == RS6000_BIF_DI

[PATCH 04/18] rs6000: Handle some recent MMA builtin changes

2021-09-01 Thread Bill Schmidt via Gcc-patches
Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
__builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
I had been using to automate gimple folding of MMA builtins.  Previously,
every MMA function that could be folded had an associated internal function
that it was folded into.  The LXVP/STXVP builtins are just folded directly
into memory operations.

Instead of relying on this pattern, this patch adds a new attribute to
builtins called "mmaint," which is set for all MMA builtins that have an
associated internal builtin.  The naming convention that adds _INTERNAL to
the builtin index name remains.
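
As a rough illustration of how such an attribute can gate the folding, here is a small sketch.  The bit values, the `bifdata_sketch` type, and `folds_via_internal_builtin` are assumptions made for this example; the generated header assigns its own bits, and the assumption that LXVP carries `mma` without `mmaint` is inferred from the description above rather than copied from the .def file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; the generated header picks its own values.
constexpr std::uint32_t bif_mma_bit = 1u << 0;
constexpr std::uint32_t bif_mmaint_bit = 1u << 1;

// Simplified stand-in for one entry of the builtin info table.
struct bifdata_sketch
{
  const char *name;
  std::uint32_t bifattrs;
};

static inline bool
bif_is_mma (const bifdata_sketch &b)
{
  return (b.bifattrs & bif_mma_bit) != 0;
}

static inline bool
bif_is_mmaint (const bifdata_sketch &b)
{
  return (b.bifattrs & bif_mmaint_bit) != 0;
}

// An MMA builtin is folded into its associated _INTERNAL builtin only when
// the mmaint attribute says that such a builtin exists.
static bool
folds_via_internal_builtin (const bifdata_sketch &b)
{
  return bif_is_mma (b) && bif_is_mmaint (b);
}
```

With this shape, a builtin such as LXVP can stay in the [mma] stanza without the folding code assuming that an _INTERNAL counterpart exists.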

The rest of the patch is just duplicating Peter's patch, using the new
builtin infrastructure.

2021-08-23  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
(ASSEMBLE_PAIR): Likewise.
(BUILD_ACC): Likewise.
(DISASSEMBLE_ACC): Likewise.
(DISASSEMBLE_PAIR): Likewise.
(PMXVBF16GER2): Likewise.
(PMXVBF16GER2NN): Likewise.
(PMXVBF16GER2NP): Likewise.
(PMXVBF16GER2PN): Likewise.
(PMXVBF16GER2PP): Likewise.
(PMXVF16GER2): Likewise.
(PMXVF16GER2NN): Likewise.
(PMXVF16GER2NP): Likewise.
(PMXVF16GER2PN): Likewise.
(PMXVF16GER2PP): Likewise.
(PMXVF32GER): Likewise.
(PMXVF32GERNN): Likewise.
(PMXVF32GERNP): Likewise.
(PMXVF32GERPN): Likewise.
(PMXVF32GERPP): Likewise.
(PMXVF64GER): Likewise.
(PMXVF64GERNN): Likewise.
(PMXVF64GERNP): Likewise.
(PMXVF64GERPN): Likewise.
(PMXVF64GERPP): Likewise.
(PMXVI16GER2): Likewise.
(PMXVI16GER2PP): Likewise.
(PMXVI16GER2S): Likewise.
(PMXVI16GER2SPP): Likewise.
(PMXVI4GER8): Likewise.
(PMXVI4GER8PP): Likewise.
(PMXVI8GER4): Likewise.
(PMXVI8GER4PP): Likewise.
(PMXVI8GER4SPP): Likewise.
(XVBF16GER2): Likewise.
(XVBF16GER2NN): Likewise.
(XVBF16GER2NP): Likewise.
(XVBF16GER2PN): Likewise.
(XVBF16GER2PP): Likewise.
(XVF16GER2): Likewise.
(XVF16GER2NN): Likewise.
(XVF16GER2NP): Likewise.
(XVF16GER2PN): Likewise.
(XVF16GER2PP): Likewise.
(XVF32GER): Likewise.
(XVF32GERNN): Likewise.
(XVF32GERNP): Likewise.
(XVF32GERPN): Likewise.
(XVF32GERPP): Likewise.
(XVF64GER): Likewise.
(XVF64GERNN): Likewise.
(XVF64GERNP): Likewise.
(XVF64GERPN): Likewise.
(XVF64GERPP): Likewise.
(XVI16GER2): Likewise.
(XVI16GER2PP): Likewise.
(XVI16GER2S): Likewise.
(XVI16GER2SPP): Likewise.
(XVI4GER8): Likewise.
(XVI4GER8PP): Likewise.
(XVI8GER4): Likewise.
(XVI8GER4PP): Likewise.
(XVI8GER4SPP): Likewise.
(XXMFACC): Likewise.
(XXMTACC): Likewise.
(XXSETACCZ): Likewise.
(ASSEMBLE_PAIR_V): Likewise.
(BUILD_PAIR): Likewise.
(DISASSEMBLE_PAIR_V): Likewise.
(LXVP): New.
(STXVP): New.
* config/rs6000/rs6000-call.c
(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
RS6000_BIF_STXVP.
* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
(parse_bif_attrs): Handle ismmaint.
(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
(write_bif_static_init): Handle ismmaint.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 145 ---
 gcc/config/rs6000/rs6000-call.c  |  38 +-
 gcc/config/rs6000/rs6000-gen-builtins.c  |  38 +++---
 3 files changed, 135 insertions(+), 86 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
b/gcc/config/rs6000/rs6000-builtin-new.def
index a8c6b9e988f..1966516551e 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -129,6 +129,7 @@
 ;   mma  Needs special handling for MMA
 ;   quad MMA instruction using a register quad as an input operand
 ;   pair MMA instruction using a register pair as an input operand
+;   mmaint   MMA instruction expanding to internal call at GIMPLE time
 ;   no32bit  Not valid for TARGET_32BIT
 ;   32bitRequires different handling for TARGET_32BIT
 ;   cpu  This is a "cpu_is" or "cpu_supports" builtin
@@ -3584,415 +3585,421 @@
 
 [mma]
   void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
-ASSEMBLE_ACC nothing {mma}
+ASSEMBLE_ACC nothing {mma,mmaint}
 
   v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
 ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
 
   void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
-ASSEMBLE_PAIR nothing {mma}
+ASSEMBLE_PAIR nothing {mma,mmaint}
 
   v256 __builtin_mma_assemble_pair_internal (vuc, vuc);
 ASSEMBLE_PAIR_INTERNAL vsx_assemble_pair {mma}
 
   void __builtin_mma_build

[PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza

2021-09-01 Thread Bill Schmidt via Gcc-patches
I over-restricted use of __builtin_mffsl, since I was unaware that it
automatically uses mffs when mffsl is not available.  Paul Clarke pointed
this out in discussion of his SSE 4.1 compatibility patches.

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def (__builtin_mffsl): Move from
[power9] to [always].
---
 gcc/config/rs6000/rs6000-builtin-new.def | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
b/gcc/config/rs6000/rs6000-builtin-new.def
index 6a28d5189f8..a8c6b9e988f 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -208,6 +208,12 @@
   double __builtin_mffs ();
 MFFS rs6000_mffs {}
 
+; Although the mffsl instruction is only available on POWER9 and later
+; processors, this builtin automatically falls back to mffs on older
+; platforms.  Thus it appears here in the [always] stanza.
+  double __builtin_mffsl ();
+MFFSL rs6000_mffsl {}
+
 ; This thing really assumes long double == __ibm128, and I'm told it has
 ; been used as such within libgcc.  Given that __builtin_pack_ibm128
 ; exists for the same purpose, this should really not be used at all.
@@ -2784,9 +2790,6 @@
   signed long long __builtin_darn_raw ();
 DARN_RAW darn_raw {}
 
-  double __builtin_mffsl ();
-MFFSL rs6000_mffsl {}
-
   const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
 TSTSFI_EQ_DD dfptstsfi_eq_dd {}
 
-- 
2.27.0



[PATCHv5 00/18] Replace the Power target-specific builtin machinery

2021-09-01 Thread Bill Schmidt via Gcc-patches
Hi!

Original patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html

V2 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html

V3 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573020.html

V4 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576284.html

Thanks for all the reviews so far!  We're into the home stretch.  I needed
to rebase this series again in order to pick up some changes from upstream.

Patch 01/18 is a reposting of V4 patch 19/34, addressing some of the
comments.  Full refactoring of this stuff will be done later, after this
patch series can burn in a little.  This wasn't yet formally approved.

Patch 02/18 is new, and is a minor bug fix.

Patches 03/18 through 17/18 correspond to V4 patches 20/34 through 34/34.
These were adjusted for upstream changes, and I did some formatting
cleanups.  I also provided better descriptions for some of the patches.

Patch 18/18 is new, and improves the parser to handle escape-newline
input.  With that in place, it cleans up all the long lines in the
input files.
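
For readers unfamiliar with the idea, escape-newline handling amounts to joining a physical line that ends in a backslash with the line that follows it.  The sketch below shows one way to do that; `join_escaped_newlines` is a hypothetical helper written for this note, not the parser code in patch 18/18, and the sample builtin in the usage note is made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join physical lines that end in a backslash with the following line,
// producing logical lines.  The backslash itself is dropped.
static std::vector<std::string>
join_escaped_newlines (const std::vector<std::string> &physical)
{
  std::vector<std::string> logical;
  std::string pending;
  bool continuing = false;

  for (const std::string &line : physical)
    {
      if (!line.empty () && line.back () == '\\')
        {
          // Keep everything except the trailing backslash and wait for
          // the continuation.
          pending.append (line, 0, line.size () - 1);
          continuing = true;
        }
      else
        {
          logical.push_back (pending + line);
          pending.clear ();
          continuing = false;
        }
    }

  // A backslash on the last line of input has nothing to join with; emit
  // what was collected rather than dropping it.
  if (continuing)
    logical.push_back (pending);

  return logical;
}
```

Given the two physical lines `  void __builtin_foo (int, \` and `int);`, the helper yields the single logical line `  void __builtin_foo (int, int);`.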

Bootstrapped and tested on powerpc64le-linux-gnu (P10) and
powerpc64-linux-gnu (32- and 64-bit, P8).  There are no regressions for
little endian.  There are a small handful of big-endian regressions that
have crept in, and I'll post patches for those after I work through them.
But no need to hold up reviews on the rest of this in the meantime.

Thanks again for all of the helpful reviews so far!

Bill


Bill Schmidt (18):
  rs6000: Handle overloads during program parsing
  rs6000: Move __builtin_mffsl to the [always] stanza
  rs6000: Handle gimple folding of target built-ins
  rs6000: Handle some recent MMA builtin changes
  rs6000: Support for vectorizing built-in functions
  rs6000: Builtin expansion, part 1
  rs6000: Builtin expansion, part 2
  rs6000: Builtin expansion, part 3
  rs6000: Builtin expansion, part 4
  rs6000: Builtin expansion, part 5
  rs6000: Builtin expansion, part 6
  rs6000: Update rs6000_builtin_decl
  rs6000: Miscellaneous uses of rs6000_builtins_decl_x
  rs6000: Debug support
  rs6000: Update altivec.h for automated interfaces
  rs6000: Test case adjustments
  rs6000: Enable the new builtin support
  rs6000: Add escape-newline support for builtins files

 gcc/config/rs6000/altivec.h   |  519 +--
 gcc/config/rs6000/rs6000-builtin-new.def  |  442 ++-
 gcc/config/rs6000/rs6000-c.c  | 1088 ++
 gcc/config/rs6000/rs6000-call.c   | 3132 +++--
 gcc/config/rs6000/rs6000-gen-builtins.c   |  312 +-
 gcc/config/rs6000/rs6000.c|  272 +-
 .../powerpc/bfp/scalar-extract-exp-2.c|2 +-
 .../powerpc/bfp/scalar-extract-sig-2.c|2 +-
 .../powerpc/bfp/scalar-insert-exp-2.c |2 +-
 .../powerpc/bfp/scalar-insert-exp-5.c |2 +-
 .../powerpc/bfp/scalar-insert-exp-8.c |2 +-
 .../powerpc/bfp/scalar-test-neg-2.c   |2 +-
 .../powerpc/bfp/scalar-test-neg-3.c   |2 +-
 .../powerpc/bfp/scalar-test-neg-5.c   |2 +-
 .../gcc.target/powerpc/byte-in-set-2.c|2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c |2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c |   14 +-
 .../powerpc/fold-vec-splat-floatdouble.c  |4 +-
 .../powerpc/fold-vec-splat-longlong.c |   10 +-
 .../powerpc/fold-vec-splat-misc-invalid.c |8 +-
 .../gcc.target/powerpc/int_128bit-runnable.c  |6 +-
 .../gcc.target/powerpc/p8vector-builtin-8.c   |1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c|   12 +-
 .../gcc.target/powerpc/pragma_misc9.c |2 +-
 .../gcc.target/powerpc/pragma_power8.c|2 +
 .../gcc.target/powerpc/pragma_power9.c|3 +
 .../powerpc/test_fpscr_drn_builtin_error.c|4 +-
 .../powerpc/test_fpscr_rn_builtin_error.c |   12 +-
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |3 +-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c|2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c|2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c |2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c|2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |2 +-
 42 files changed, 4803 insertions(+), 1089 deletions(-)

-- 
2.27.0



[PATCH 01/18] rs6000: Handle overloads during program parsing

2021-09-01 Thread Bill Schmidt via Gcc-patches
Although this patch looks quite large, the changes are fairly minimal.
Most of it is duplicating the large function that does the overload
resolution using the automatically generated data structures instead of
the old hand-generated ones.  This doesn't make the patch terribly easy to
review, unfortunately.  Just be aware that generally we aren't changing
the logic and functionality of overload handling.
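
For reviewers who want the general shape in mind before reading the diff, overload resolution here boils down to walking the candidate instances of an overloaded name and picking the first one whose parameter types are compatible with the arguments.  The sketch below is a simplified model under that assumption: `type_kind`, `overload_instance`, and `resolve_overload` are hypothetical, and the compatibility rule only mimics the "any two integral types match" shortcut in rs6000_new_builtin_type_compatible (the real function also handles pointers and defers to lang_hooks.types_compatible_p).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified type model.
enum type_kind { TK_INT, TK_LONG, TK_FLOAT, TK_V4SI, TK_V4SF };

static bool
is_integral (type_kind t)
{
  return t == TK_INT || t == TK_LONG;
}

// Any two integral types are treated as compatible; everything else must
// match exactly in this sketch.
static bool
type_compatible (type_kind t, type_kind u)
{
  if (is_integral (t) && is_integral (u))
    return true;
  return t == u;
}

// One candidate instance of an overloaded builtin.
struct overload_instance
{
  int bif_id;
  std::vector<type_kind> params;
};

// Return the bif_id of the first instance whose parameters accept ARGS,
// or -1 if no instance matches.
static int
resolve_overload (const std::vector<overload_instance> &chain,
                  const std::vector<type_kind> &args)
{
  for (const overload_instance &inst : chain)
    {
      if (inst.params.size () != args.size ())
        continue;

      bool all_match = true;
      for (std::size_t i = 0; i < args.size (); i++)
        if (!type_compatible (args[i], inst.params[i]))
          {
            all_match = false;
            break;
          }

      if (all_match)
        return inst.bif_id;
    }
  return -1;
}
```

A return of -1 corresponds to the case where the real code reports an invalid parameter combination.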

2021-08-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
(altivec_resolve_new_overloaded_builtin): New forward decl.
(rs6000_new_builtin_type_compatible): New function.
(altivec_resolve_overloaded_builtin): Call
altivec_resolve_new_overloaded_builtin.
(altivec_build_new_resolved_builtin): New function.
(altivec_resolve_new_overloaded_builtin): Likewise.
* config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
Likewise.
* config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
name of rs6000_new_builtin_is_supported.
---
 gcc/config/rs6000/rs6000-c.c| 1088 +++
 gcc/config/rs6000/rs6000-call.c |   53 ++
 gcc/config/rs6000/rs6000-gen-builtins.c |2 +-
 3 files changed, 1142 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index afcb5bb6e39..aafb4e6a98f 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -35,6 +35,9 @@
 #include "langhooks.h"
 #include "c/c-tree.h"
 
+#include "rs6000-builtins.h"
+
+static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
 
 
 /* Handle the machine specific pragma longcall.  Its syntax is
@@ -811,6 +814,30 @@ is_float128_p (tree t)
  && t == long_double_type_node));
 }
   
+static bool
+rs6000_new_builtin_type_compatible (tree t, tree u)
+{
+  if (t == error_mark_node)
+return false;
+
+  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
+return true;
+
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && is_float128_p (t) && is_float128_p (u))
+return true;
+
+  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
+{
+  t = TREE_TYPE (t);
+  u = TREE_TYPE (u);
+  if (TYPE_READONLY (u))
+   t = build_qualified_type (t, TYPE_QUAL_CONST);
+}
+
+  return lang_hooks.types_compatible_p (t, u);
+}
+
 static inline bool
 rs6000_builtin_type_compatible (tree t, int id)
 {
@@ -927,6 +954,10 @@ tree
 altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
void *passed_arglist)
 {
+  if (new_builtins_are_live)
+return altivec_resolve_new_overloaded_builtin (loc, fndecl,
+  passed_arglist);
+
   vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
   unsigned int nargs = vec_safe_length (arglist);
   enum rs6000_builtins fcode
@@ -1930,3 +1961,1060 @@ altivec_resolve_overloaded_builtin (location_t loc, 
tree fndecl,
 return error_mark_node;
   }
 }
+
+/* Build a tree for a function call to an Altivec non-overloaded builtin.
+   The overloaded builtin that matched the types and args is described
+   by DESC.  The N arguments are given in ARGS, respectively.
+
+   Actually the only thing it does is calling fold_convert on ARGS, with
+   a small exception for vec_{all,any}_{ge,le} predicates. */
+
+static tree
+altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
+   tree ret_type,
+   rs6000_gen_builtins bif_id,
+   rs6000_gen_builtins ovld_id)
+{
+  tree argtypes = TYPE_ARG_TYPES (fntype);
+  tree arg_type[MAX_OVLD_ARGS];
+  tree fndecl = rs6000_builtin_decls_x[bif_id];
+  tree call;
+
+  for (int i = 0; i < n; i++)
+arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);
+
+  /* The AltiVec overloading implementation is overall gross, but this
+ is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
+ are completely different for floating-point vs. integer vector
+ types, because the former has vcmpgefp, but the latter should use
+ vcmpgtXX.
+
+ In practice, the second and third arguments are swapped, and the
+ condition (LT vs. EQ, which is recognizable by bit 1 of the first
+ argument) is reversed.  Patch the arguments here before building
+ the resolved CALL_EXPR.  */
+  if (n == 3
+  && ovld_id == RS6000_OVLD_VEC_CMPGE_P
+  && bif_id != RS6000_BIF_VCMPGEFP_P
+  && bif_id != RS6000_BIF_XVCMPGEDP_P)
+{
+  std::swap (args[1], args[2]);
+  std::swap (arg_type[1], arg_type[2]);
+
+  args[0] = fold_build2 (BIT_XOR_EXPR, TREE_TYPE (args[0]), args[0],
+build_int_cst (NULL_TREE, 2));
+}
+
+  /* If the number of arguments to an overloaded function increases,
+ we must expand this switch.  */
+  gcc_assert (MAX_OVLD_ARGS <= 4);
+
+  switch (

Re: [PATCH] Fix arm target build with inhibit_libc

2021-09-01 Thread Jeff Law via Gcc-patches




On 9/1/2021 1:00 AM, Christophe Lyon via Gcc-patches wrote:
> On Wed, Sep 1, 2021 at 7:09 AM Sebastian Huber <
> sebastian.hu...@embedded-brains.de> wrote:
>
>> On 30/08/2021 14:01, Sebastian Huber wrote:
>>> Do not declare abort in "libgcc/unwind-arm-common.inc" since it is
>>> already provided by "tsystem.h".  It fixes the following build error:
>>>
>>> In file included from libgcc/config/arm/unwind-arm.c:144:
>>> libgcc/unwind-arm-common.inc:55:24: error: macro "abort" passed 1
>>> arguments, but takes just 0
>>>  55 | extern void abort (void);
>>>
>>> libgcc/
>>>
>>>    * unwind-arm-common.inc (abort): Remove.
>>
>> Could someone please have a look at this patch. Currently, the arm build
>> with inhibit_libc is broken.

OK.  Or just wrap it with #ifndef abort if removing it causes other 
problems.
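
In case it helps, the alternative would look roughly like the sketch below.  The `#define` is only a stand-in for whatever header provides abort as a zero-argument macro in the failing configuration; `count_trap` and `call_abort_once` are hypothetical names used so the fragment can be compiled on its own.

```cpp
#include <cassert>

static int trap_count = 0;

static void
count_trap ()
{
  ++trap_count;
}

// Stand-in for a system header that supplies abort as a function-like
// macro taking no arguments (an assumption for this sketch).
#define abort() count_trap ()

// Guarded declaration: skipped whenever abort is already a macro, which
// avoids the "passed 1 arguments, but takes just 0" error.
#ifndef abort
extern void abort (void);
#endif

static int
call_abort_once ()
{
  abort ();  // Expands to count_trap () through the macro above.
  return trap_count;
}

// Drop the stand-in macro so that later code is unaffected.
#undef abort
```

Removing the declaration outright is simpler; the guard is only worth having if some configuration still needs the declaration.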


Jeff



[PING] [PATCH] Jit, testsuite: Amend expect processing to tolerate more platforms.

2021-09-01 Thread Iain Sandoe
Since this post I’ve tested this on more platforms (including cfarm machines
with dejagnu-1.5.1 and tcl 8.5). 

If there’s concern about applying it everywhere, I could make a second version
of fixed_host_execute and have that called conditionally on Darwin.

The Jit testsuite is unusable without this, and really I cannot enable Jit
reliably on Darwin since one failure mode is a deadlocked expect instance
so that the testsuite never completes until that is killed manually.

thanks
Iain

> On 19 Aug 2021, at 19:59, Iain Sandoe  wrote:
> 
> Hi,
> 
> Preface:
> 
> this is the last patch for now in my series - with this applied Darwin
> reports the same results as Linux (at least, for modern x86_64
> platform versions).
> 
> Note
> a)  that the expect expression in {fixed}host_execute seems to depend
> on the assumption that the dejagnu.h output is used by the testcase
> and that the executable’s output can be seen to end with the totals
> produced there (which might in itself be erroneous, see 3).
> 
> b) the main GCC testsuite processing does not do this; rather the expect
> expression is somewhat simple and the output from the executable
> is copied into a secondary buffer, which is then processed by
> prune expressions and then to find the requisite matches (so most
> of the issues seen below do not occur there).
> 
>  patch discussion
> 
> The current 'fixed_host_execute' implementation fails on Darwin
> platforms for a number of reasons:
> 
> 1/ If the sub-process spawn fails (e.g. because of missing or
>   malformed params); rather than reporting the fail output into the
>   match stream, as indicated by the expect manual, it terminates
>   the script.
> 
> - We fix this by (a) checking that the executable is valid as well
>   as existing (b) we put the spawn into a catch block and report
>   a failure.
> 
> 2/ There is no recovery path at all for a buffer-full case (and we
>   do see buffer-full events with the default sizes).
> 
> - Added by the patch here, however it is not as sophisticated as
>   the methods used by dejagnu internally.  Here we set the process
>   to be "nowait" and then close the connection - with the intent
>   that this will terminate the spawned process.
> 
> 3/  The expect logic assumes that 'Totals:' is a valid indicator
>for the end of the spawned process output.  This is not true
>even for the default dejagnu header (there are a number of
>additional reporting lines after).  In addition to this, there
>are some tests that intentionally produce more output after
>the totals report (and there are tests that do not use that
>mechanism at all).
> 
>The effect is that we might arrive at the "wait" for the spawned
>process to finish - but that process might not have completed
>all its output.  For Darwin, at least, that causes a deadlock
>between expect and the spawnee - the latter is doing a
>non-cancellable write and the former is waiting for the latter to
>terminate.  For some reason this does not seem to affect Linux;
>perhaps the pty implementation allows the write(s) to
>proceed even though there is no reader.
> 
> -  This is fixed by modifying the loop termination condition to be
>either EOF (which will be the 'correct' condition) or a timeout
>which would represent an error either in the runtime or in the
>parsing of the output.  As added precautions, we only try to
>wait if there is a correctly-spawned process, and we are also
>specific about which process we are waiting for.
> 
> 4/  Darwin appears to have a bug in either the tcl or termios
>'cooking' code that occasionally inserts an additional CR char
>into the stream - thus '\n' => '\r\r\n' instead of '\r\n'. The
>original program output is correct (it only contains a single
>\n) - the additional character is being inserted somewhere in
>the translations applied before the output reaches expect.
> 
>The logic of this expect implementation does not tolerate single
>\r or \n characters (it will fail with a timeout or buffer-full
>if that occurs).
> 
> -  This is fixed by having a line-end match that is adjusted for
>Darwin.
> 
> 5/  The default buffer size does seem to be too small in some cases
>noting that GCC uses 1 as the match buffer size and the
>default is 2000.
> 
> -  Fixed by increasing the size to 8192.
> 
> 6/  There is a somewhat arbitrary dumping of output where we match
>^$prefix\tSOMETHING... and then process the something.  This
>essentially allows the match to start at any place in the buffer
>following any collection of non-line-end chars.
> 
> -  Fixed by amending the match for 'general' lines to accommodate
>these cases, and reporting such lines to the log.  At least this
>should allow debugging of any cases where output that should be
>recognized is being dropped.
> 
> tested on i686, x86_64-darwin, x86_64,powerpc64-linux,
> OK for master?
>
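
The tolerant line-end match described in item 4 can be illustrated in C.  The
patch itself changes Tcl/expect code that is not shown in this message, so the
rule used here (one or more CR characters followed by LF) and the function
name line_end_length are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the line-end sequence starting at S, or 0 if S does
   not start with one.  One or more CR characters followed by LF are
   accepted, so both "\r\n" and "\r\r\n" count as a single line end.  A bare
   CR or a bare LF is not treated as a line end.  */
static size_t
line_end_length (const char *s)
{
  size_t n = 0;

  while (s[n] == '\r')
    n++;
  if (n == 0 || s[n] != '\n')
    return 0;
  return n + 1;
}
```

Treating the doubled CR as part of one terminator keeps the line-oriented
matching in step with the output instead of timing out on an unexpected
character.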

[pushed] Objective-C, NeXT: Fix messenging non-aggregate return-in-memory.

2021-09-01 Thread Iain Sandoe
Hi

When a method returns a type that the platform ABI says should be
returned in memory, and that is done by a hidden 'sret' parameter,
the message send calls must be adjusted to inform the runtime that
the sret parameter is present.  As reported in the PR, this is not
working for non-aggregate types that use this mechanism.  The fix
here is to adjust the logic such that all return values that flag
'in memory' are considered to use the mechanism *unless* they
provide a struct_value_rtx *and* the return object is an aggregate.
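
The change in logic can be modelled with plain booleans.  This is only a
sketch: the three flags stand in for the results of
targetm.calls.return_in_memory, targetm.calls.struct_value_rtx and the
RECORD_TYPE/UNION_TYPE test, the function names are invented, and the null
and void return type checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Previous condition: only aggregates returned in memory on targets
   without a struct_value_rtx selected the sret messenger.  */
static bool
old_uses_stret_messenger (bool returned_in_memory, bool has_struct_value_rtx,
                          bool is_aggregate)
{
  return !has_struct_value_rtx && is_aggregate && returned_in_memory;
}

/* Revised condition: anything returned in memory selects the sret
   messenger, unless the target provides a struct_value_rtx and the
   returned object is an aggregate.  */
static bool
new_uses_stret_messenger (bool returned_in_memory, bool has_struct_value_rtx,
                          bool is_aggregate)
{
  return returned_in_memory && !(has_struct_value_rtx && is_aggregate);
}
```

The two predicates differ for a scalar returned in memory, which is the case
reported in the PR.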

tested across *darwin* and on x86_64, powerpc64-linux,
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

PR objc/101718 - Objective-C frontend emits wrong code to call methods 
returning scalar types returned in memory

PR objc/101718

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.c (build_v2_build_objc_method_call):
Revise for cases where scalar objects use an sret parameter.
(next_runtime_abi_02_build_objc_method_call): Likewise.
---
 gcc/objc/objc-next-runtime-abi-02.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index ce831fc34ae..9c35738a95c 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -1739,15 +1739,16 @@ build_v2_build_objc_method_call (int super, tree method_prototype,
   /* Param list + 2 slots for object and selector.  */
   vec_alloc (parms, nparm + 2);
 
-  /* If we are returning a struct in memory, and the address
- of that memory location is passed as a hidden first
- argument, then change which messenger entry point this
- expr will call.  NB: Note that sender_cast remains
- unchanged (it already has a struct return type).  */
-  if (!targetm.calls.struct_value_rtx (0, 0)
-  && (TREE_CODE (ret_type) == RECORD_TYPE
- || TREE_CODE (ret_type) == UNION_TYPE)
-  && targetm.calls.return_in_memory (ret_type, 0))
+  /* If we are returning an item that must be returned in memory, and the
+ target ABI does this by an invisible pointer provided as the first arg,
+ we need to adjust the message signature to include this.  The second
+ part of this excludes targets that provide some alternate scheme for
+ structure returns.  */
+  if (ret_type && !VOID_TYPE_P (ret_type)
+  && targetm.calls.return_in_memory (ret_type, 0)
+  && !(targetm.calls.struct_value_rtx (0, 0)
+  && (TREE_CODE (ret_type) == RECORD_TYPE
+  || TREE_CODE (ret_type) == UNION_TYPE)))
 {
   if (super)
sender = umsg_id_super2_stret_fixup_decl;
@@ -1849,10 +1850,12 @@ next_runtime_abi_02_build_objc_method_call (location_t loc,
 ? TREE_VALUE (TREE_TYPE (method_prototype))
 : objc_object_type;
 
-  if (!targetm.calls.struct_value_rtx (0, 0)
-  && (TREE_CODE (ret_type) == RECORD_TYPE
- || TREE_CODE (ret_type) == UNION_TYPE)
-  && targetm.calls.return_in_memory (ret_type, 0))
+  /* See comment for the fixup version above.  */
+  if (ret_type && !VOID_TYPE_P (ret_type)
+  && targetm.calls.return_in_memory (ret_type, 0)
+  && !(targetm.calls.struct_value_rtx (0, 0)
+  && (TREE_CODE (ret_type) == RECORD_TYPE
+  || TREE_CODE (ret_type) == UNION_TYPE)))
 {
   if (super)
message_func_decl = umsg_id_super2_stret_fixup_decl;
-- 



[pushed] C-family: Add attribute 'unavailable'.

2021-09-01 Thread Iain Sandoe
Hi,

The patch was approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562757.html
and here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560604.html

subject to a documentation nit.

I ran out of time before stage3 ended .. and just got around to this now.

re-tested on x86_64, powerpc64, powerpc64le, aarch64-linux, aix and solaris
plus across the Darwin range, did a make pdf in gcc and checked the
result visually,

pushed to master with the nit fixed, as attached, thanks,
Iain



0001-C-family-Add-attribute-unavailable.patch
Description: Binary data


Re: [PATCH]AArch64 RFC: Don't cost all scalar operations during vectorization if scalar will fuse

2021-09-01 Thread Richard Biener via Gcc-patches
On Tue, Aug 31, 2021 at 4:50 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Tamar Christina  writes:
> > Hi All,
> >
> > As the vectorizer has improved over time in capabilities it has started
> > over-vectorizing.  This has caused regressions on the order of 1-7x on
> > libraries that Arm produces.
> >
> > The vector costs actually do make a lot of sense and I don't think that
> > they are wrong.  I think that the costs for the scalar code are wrong.
> >
> > In particular the costing doesn't take into account that scalar operations
> > can/will fuse as this happens in RTL.  Because of this the costs for the
> > scalars always end up being higher.
> >
> > As an example the loop in PR 97984:
> >
> > void x (long * __restrict a, long * __restrict b)
> > {
> >   a[0] *= b[0];
> >   a[1] *= b[1];
> >   a[0] += b[0];
> >   a[1] += b[1];
> > }
> >
> > generates:
> >
> > x:
> > ldp x2, x3, [x0]
> > ldr x4, [x1]
> > ldr q1, [x1]
> > mul x2, x2, x4
> > ldr x4, [x1, 8]
> > fmov d0, x2
> > ins v0.d[1], x3
> > mul x1, x3, x4
> > ins v0.d[1], x1
> > add v0.2d, v0.2d, v1.2d
> > str q0, [x0]
> > ret
> >
> > On an actual loop the prologue costs would make the loop too expensive so we
> > produce the scalar output, but with SLP there are no loop overhead costs so
> > we end up trying to vectorize this.  Because SLP discovery is started from
> > the stores we will end up vectorizing and costing the add but not the MUL.
> >
> > To counter this the patch adjusts the costing when it finds an operation
> > that can be fused and discounts the cost of the "other" operation being
> > fused in.
> >
> > The attached testcase shows that even when we discount it we still get
> > vectorized code when profitable to do so, e.g. SVE.
> >
> > This happens as well with other operations such as scalar operations where
> > shifts can be fused in or for e.g. bfxil.  As such sending this for
> > feedback.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master? If the approach is acceptable I can add support for more.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >   PR target/97984
> >   * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Check for fusing
> >   madd.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/97984
> >   * gcc.target/aarch64/pr97984-1.c: New test.
> >   * gcc.target/aarch64/pr97984-2.c: New test.
> >   * gcc.target/aarch64/pr97984-3.c: New test.
> >   * gcc.target/aarch64/pr97984-4.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 
> > 4cd4b037f2606e515ad8f4669d2cd13a509dd0a4..329b556311310d86aaf546d7b395a3750a9d57d4
> >  100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -15536,6 +15536,39 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void 
> > *data, int count,
> >   stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
> > vectype, stmt_cost);
> >
> > +  /* Scale costs if operation is fusing.  */
> > +  if (stmt_info && kind == scalar_stmt)
> > +  {
> > + if (gassign *stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info)))
> > +   {
> > + switch (gimple_assign_rhs_code (stmt))
> > + {
> > + case PLUS_EXPR:
> > + case MINUS_EXPR:
> > +   {
> > + /* Check if operation can fuse into MSUB or MADD.  */
> > + tree rhs1 = gimple_assign_rhs1 (stmt);
> > + if (gassign *stmt1 = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (rhs1)))
> > +   if (gimple_assign_rhs_code (stmt1) == MULT_EXPR)
> > + {
> > +   stmt_cost = 0;
> > +   break;
> > +}
> > + tree rhs2 = gimple_assign_rhs2 (stmt);
> > + if (gassign *stmt2 = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (rhs2)))
> > +   if (gimple_assign_rhs_code (stmt2) == MULT_EXPR)
> > + {
> > +   stmt_cost = 0;
> > +   break;
> > + }
> > +   }
> > +   break;
> > + default:
> > +   break;
> > + }
> > +   }
> > +  }
> > +
>
> The difficulty with this is that we can also use MLA-type operations
> for SVE, and for Advanced SIMD if the mode is not DI.  It's not just
> a scalar thing.
>
> We already take the combination into account (via aarch64_multiply_add_p)
> when estimating issue rates.  But we don't take it into account for
> latencies because of the reason above: if the multiplications are
> vectorisable, then the combination applies to both the scalar and
> the vector code, so the adjustments cancel out.  (Admittedly that
> decision predates the special Advance

Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-09-01 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 2:52 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Sep 1, 2021 at 8:28 AM Hongtao Liu  wrote:
> >>
> >> On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
> >>  wrote:
> >> >
> >> > On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
> >> > >
> >> > > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
> >> > >  wrote:
> >> > > >
> >> > > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  
> >> > > > wrote:
> >> > > > >
> >> > > > >   When gimple simplification tries to combine op and vec_cond_expr
> >> > > > > into cond_op, it doesn't check whether the mask type matches.  This
> >> > > > > causes an ICE when expanding cond_op with a mismatched mode.
> >> > > > >   This patch adds a function named
> >> > > > > cond_vectorized_internal_fn_supported_p that checks the mask type
> >> > > > > in addition to what vectorized_internal_fn_supported_p checks.
> >> > > > >
> >> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >> > > > >   Ok for trunk?
> >> > > > >
> >> > > > > gcc/ChangeLog:
> >> > > > >
> >> > > > > PR middle-end/102080
> >> > > > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): 
> >> > > > > New functions.
> >> > > > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): 
> >> > > > > New declaration.
> >> > > > > * match.pd: Check the type of mask while generating 
> >> > > > > cond_op in
> >> > > > > gimple simplication.
> >> > > > >
> >> > > > > gcc/testsuite/ChangeLog:
> >> > > > >
> >> > > > > PR middle-end/102080
> >> > > > > * gcc.target/i386/pr102080.c: New test.
> >> > > > > ---
> >> > > > >  gcc/internal-fn.c| 22 
> >> > > > > ++
> >> > > > >  gcc/internal-fn.h|  1 +
> >> > > > >  gcc/match.pd | 24 
> >> > > > > 
> >> > > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> >> > > > >  4 files changed, 55 insertions(+), 8 deletions(-)
> >> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> >> > > > >
> >> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> >> > > > > index 1360a00f0b9..8b2b65db1a7 100644
> >> > > > > --- a/gcc/internal-fn.c
> >> > > > > +++ b/gcc/internal-fn.c
> >> > > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> >> > > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> >> > > > >  }
> >> > > > >
> >> > > > > +/* Check cond_op for vector modes since 
> >> > > > > vectorized_internal_fn_supported_p
> >> > > > > +   doesn't check if mask type matches.  */
> >> > > > > +bool
> >> > > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree 
> >> > > > > type,
> >> > > > > +tree mask_type)
> >> > > > > +{
> >> > > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> >> > > > > +return false;
> >> > > > > +
> >> > > > > +  machine_mode mask_mode;
> >> > > > > +  machine_mode vmode = TYPE_MODE (type);
> >> > > > > +  int size1, size2;
> >> > > > > +  if (VECTOR_MODE_P (vmode)
> >> > > > > +  && targetm.vectorize.get_mask_mode 
> >> > > > > (vmode).exists(&mask_mode)
> >> > > > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> >> > > > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant 
> >> > > > > (&size2)
> >> > > > > +  && size1 != size2)
> >> > > >
> >> > > > Why do we check for equal size rather than just mode equality which
> >> > > I originally thought  TYPE_MODE of vector(8)  was
> >> > > not QImode, Changed the patch to check mode equality.
> >> > > Update patch.
> >> >
> >> > Looking at all this it seems the match.pd patterns should not have
> >> > used vectorized_internal_fn_supported_p but
> >> > direct_internal_fn_supported_p
> >> > which is equivalent here because we're always working with vector modes?
>
> Yeah, looks like it.
>
> >> > And then shouldn't we look at the actual optab whether the mask mode 
> >> > matches
> >> > the expectation rather than going around via the target hook which may 
> >> > not have
> >> > enough context to decide which mask mode to use?
> >> How about this?
> >>
> >> +/* Return true if target supports cond_op with data TYPE and
> >> +   mask MASK_TYPE.  */
> >> +bool
> >> +cond_internal_fn_supported_p (internal_fn ifn, tree type,
> >> +   tree mask_type)
> >> +{
> >> +  tree_pair types = tree_pair (type, type);
> >> +  optab tmp = direct_internal_fn_optab (ifn, types);
> >> +  machine_mode vmode = TYPE_MODE (type);
> >> +  insn_code icode = direct_optab_handler (tmp, vmode);
> >> +  if (icode == CODE_FOR_nothing)
> >> +return false;
> >> +
> >> +  machine_mode mask_mode = TYPE_MODE (mask_type);
> >> +  /* Can't create rtx and use insn_operand_matches here.  */
> >> +  return insn_data[icode].operand[0].mode == vmode
> >> +&& insn_data[icode].operand[1].mode == mask_mode;
> >> +}
> >> +

[pushed] coroutines : Add a missed begin/finish else clause to the codegen.

2021-09-01 Thread Iain Sandoe
Hi,

Minor code-gen correction.

tested on x86_64, powerpc64 - linux, x86_64-darwin,
pushed to master as trivial/obvious, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (build_actor_fn): Add begin/finish clauses
to the initial test in the actor function.
---
 gcc/cp/coroutines.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 3bb33cc9eb9..ceb3d3be75e 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2331,6 +2331,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
   finish_switch_stmt (destroy_dispatcher);
 
   finish_then_clause (lsb_if);
+  begin_else_clause (lsb_if);
 
   tree dispatcher = begin_switch_stmt ();
   finish_switch_cond (rat, dispatcher);
@@ -2368,6 +2369,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 
   /* Insert the prototype dispatcher.  */
   finish_switch_stmt (dispatcher);
+  finish_else_clause (lsb_if);
 
   finish_if_stmt (lsb_if);
 
-- 



[pushed] coroutines: No cleanups on goto statements.

2021-09-01 Thread Iain Sandoe
Hi,

Minor cleanup: this is a statement, not an expression, so we do not
need to use finish_expr_stmt here.

tested on x86_64, powerpc64-linux, x86_64-darwin,
pushed to master as trivial/obvious, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (await_statement_walker): Use build_stmt and
add_stmt instead of build1 and finish_expr_stmt.
---
 gcc/cp/coroutines.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 25269d9e51a..3bb33cc9eb9 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3824,8 +3824,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
   the parameter to return_value().  */
if (!maybe_await_stmt)
  maybe_await_stmt = tsi_stmt_ptr (tsi_last (ret_list));
-   expr = build1_loc (loc, GOTO_EXPR, void_type_node, awpts->fs_label);
-   finish_expr_stmt (expr);
+   TREE_USED (awpts->fs_label) = 1;
+   add_stmt (build_stmt (loc, GOTO_EXPR, awpts->fs_label));
*stmt = pop_stmt_list (ret_list);
/* Once this is complete, we will have processed subtrees.  */
*do_subtree = 0;
-- 



[committed] libphobos: Don't add zlib when ENABLE_LIBDRUNTIME_ONLY

2021-09-01 Thread Iain Buclaw via Gcc-patches
Hi,

The D run-time library does not depend on zlib, so only include it in
the library when Phobos is being built as well.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

* src/Makefile.am: Don't add zlib when ENABLE_LIBDRUNTIME_ONLY.
* src/Makefile.in: Regenerate.
---
 libphobos/src/Makefile.am | 4 
 libphobos/src/Makefile.in | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libphobos/src/Makefile.am b/libphobos/src/Makefile.am
index f97ddccaca8..9f6251009f6 100644
--- a/libphobos/src/Makefile.am
+++ b/libphobos/src/Makefile.am
@@ -45,8 +45,12 @@ libgphobos_la_SOURCES = $(ALL_PHOBOS_SOURCES)
 libgphobos_la_LIBTOOLFLAGS =
 libgphobos_la_LDFLAGS = -Wc,-nophoboslib,-dstartfiles,-B../libdruntime/gcc \
 -version-info $(libtool_VERSION)
+if ENABLE_LIBDRUNTIME_ONLY
+libgphobos_la_LIBADD = ../libdruntime/libgdruntime_convenience.la
+else
 libgphobos_la_LIBADD = \
 ../libdruntime/libgdruntime_convenience.la $(LIBZ)
+endif
 libgphobos_la_DEPENDENCIES = \
 ../libdruntime/libgdruntime_convenience.la libgphobos.spec
 
diff --git a/libphobos/src/Makefile.in b/libphobos/src/Makefile.in
index 4f76e1077d5..f8b76486e6e 100644
--- a/libphobos/src/Makefile.in
+++ b/libphobos/src/Makefile.in
@@ -504,9 +504,10 @@ libgphobos_la_LIBTOOLFLAGS =
 libgphobos_la_LDFLAGS = -Wc,-nophoboslib,-dstartfiles,-B../libdruntime/gcc \
 -version-info $(libtool_VERSION)
 
-libgphobos_la_LIBADD = \
-../libdruntime/libgdruntime_convenience.la $(LIBZ)
+@ENABLE_LIBDRUNTIME_ONLY_FALSE@libgphobos_la_LIBADD = \
+@ENABLE_LIBDRUNTIME_ONLY_FALSE@../libdruntime/libgdruntime_convenience.la $(LIBZ)
 
+@ENABLE_LIBDRUNTIME_ONLY_TRUE@libgphobos_la_LIBADD = ../libdruntime/libgdruntime_convenience.la
 libgphobos_la_DEPENDENCIES = \
 ../libdruntime/libgdruntime_convenience.la libgphobos.spec
 
-- 
2.30.2



[committed] libphobos: Update comment for DRUNTIME_OS_SOURCES

2021-09-01 Thread Iain Buclaw via Gcc-patches
Hi,

This patch updates the comment for DRUNTIME_OS_SOURCES to reflect new
conditionals that have been added since it was introduced.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

* m4/druntime/os.m4: Update comment for DRUNTIME_OS_SOURCES.
---
 libphobos/m4/druntime/os.m4 | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libphobos/m4/druntime/os.m4 b/libphobos/m4/druntime/os.m4
index 351558dbcda..15cde3b04b8 100644
--- a/libphobos/m4/druntime/os.m4
+++ b/libphobos/m4/druntime/os.m4
@@ -54,9 +54,10 @@ AC_DEFUN([DRUNTIME_OS_DETECT],
 
 # DRUNTIME_OS_SOURCES
 # ---
-# Detect target OS and add DRUNTIME_OS_AIX DRUNTIME_OS_DARWIN
-# DRUNTIME_OS_FREEBSD DRUNTIME_OS_LINUX DRUNTIME_OS_MINGW
-# DRUNTIME_OS_SOLARIS DRUNTIME_OS_OPENBSD conditionals.
+# Detect target OS and add DRUNTIME_OS_AIX DRUNTIME_OS_ANDROID
+# DRUNTIME_OS_DARWIN DRUNTIME_OS_DRAGONFLYBSD DRUNTIME_OS_FREEBSD
+# DRUNTIME_OS_LINUX DRUNTIME_OS_MINGW DRUNTIME_OS_NETBSD
+# DRUNTIME_OS_OPENBSD DRUNTIME_OS_SOLARIS conditionals.
 # If the system is posix, add DRUNTIME_OS_POSIX conditional.
 AC_DEFUN([DRUNTIME_OS_SOURCES],
 [
-- 
2.30.2



Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-09-01 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Sep 1, 2021 at 8:28 AM Hongtao Liu  wrote:
>>
>> On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
>>  wrote:
>> >
>> > On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
>> > >
>> > > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
>> > >  wrote:
>> > > >
>> > > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  wrote:
>> > > > >
>> > > > >   When gimple simplification tries to combine op and vec_cond_expr
>> > > > > into cond_op, it doesn't check whether the mask type matches.  This
>> > > > > causes an ICE when expanding cond_op with a mismatched mode.
>> > > > >   This patch adds a function named
>> > > > > cond_vectorized_internal_fn_supported_p that checks the mask type
>> > > > > in addition to what vectorized_internal_fn_supported_p checks.
>> > > > >
>> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>> > > > >   Ok for trunk?
>> > > > >
>> > > > > gcc/ChangeLog:
>> > > > >
>> > > > > PR middle-end/102080
>> > > > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): 
>> > > > > New functions.
>> > > > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): 
>> > > > > New declaration.
>> > > > > * match.pd: Check the type of mask while generating cond_op 
>> > > > > in
>> > > > > gimple simplication.
>> > > > >
>> > > > > gcc/testsuite/ChangeLog:
>> > > > >
>> > > > > PR middle-end/102080
>> > > > > * gcc.target/i386/pr102080.c: New test.
>> > > > > ---
>> > > > >  gcc/internal-fn.c| 22 ++
>> > > > >  gcc/internal-fn.h|  1 +
>> > > > >  gcc/match.pd | 24 
>> > > > > 
>> > > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
>> > > > >  4 files changed, 55 insertions(+), 8 deletions(-)
>> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
>> > > > >
>> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> > > > > index 1360a00f0b9..8b2b65db1a7 100644
>> > > > > --- a/gcc/internal-fn.c
>> > > > > +++ b/gcc/internal-fn.c
>> > > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
>> > > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
>> > > > >  }
>> > > > >
>> > > > > +/* Check cond_op for vector modes since 
>> > > > > vectorized_internal_fn_supported_p
>> > > > > +   doesn't check if mask type matches.  */
>> > > > > +bool
>> > > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
>> > > > > +tree mask_type)
>> > > > > +{
>> > > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
>> > > > > +return false;
>> > > > > +
>> > > > > +  machine_mode mask_mode;
>> > > > > +  machine_mode vmode = TYPE_MODE (type);
>> > > > > +  int size1, size2;
>> > > > > +  if (VECTOR_MODE_P (vmode)
>> > > > > +  && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
>> > > > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
>> > > > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
>> > > > > +  && size1 != size2)
>> > > >
>> > > > Why do we check for equal size rather than just mode equality which
>> > > I originally thought  TYPE_MODE of vector(8)  was
>> > > not QImode, Changed the patch to check mode equality.
>> > > Update patch.
>> >
>> > Looking at all this it seems the match.pd patterns should not have
>> > used vectorized_internal_fn_supported_p but direct_internal_fn_supported_p
>> > which is equivalent here because we're always working with vector modes?

Yeah, looks like it.

>> > And then shouldn't we look at the actual optab whether the mask mode 
>> > matches
>> > the expectation rather than going around via the target hook which may not 
>> > have
>> > enough context to decide which mask mode to use?
>> How about this?
>>
>> +/* Return true if target supports cond_op with data TYPE and
>> +   mask MASK_TYPE.  */
>> +bool
>> +cond_internal_fn_supported_p (internal_fn ifn, tree type,
>> +   tree mask_type)
>> +{
>> +  tree_pair types = tree_pair (type, type);
>> +  optab tmp = direct_internal_fn_optab (ifn, types);
>> +  machine_mode vmode = TYPE_MODE (type);
>> +  insn_code icode = direct_optab_handler (tmp, vmode);
>> +  if (icode == CODE_FOR_nothing)
>> +return false;
>> +
>> +  machine_mode mask_mode = TYPE_MODE (mask_type);
>> +  /* Can't create rtx and use insn_operand_matches here.  */
>> +  return insn_data[icode].operand[0].mode == vmode
>> +&& insn_data[icode].operand[1].mode == mask_mode;
>> +}
>> +
>
> Yeah, something like that, though the operand[0].mode test should be
> redundant.  I think we should assert or have a whitelist
> for the internal functions we support being queried this way.
> Not sure if we can directly access the 'cond_binary/cond_ternary'
> classification used in internal-fn.def, that would be best.
>
> Richard, what are your thoughts abou

Re: [PATCH] graph output: use better colors for edges

2021-09-01 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 11:10 AM Martin Liška  wrote:
>
> This patch improves coloring of graph dumps, as can be seen here:
> https://splichal.eu/tmp/example.svg
>
> Ready to be installed once it finishes tests?

OK

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * graph.c (draw_cfg_node_succ_edges): Do not color fallthru
>   edges and rather use colors for TRUE and FALSE edges.
> ---
>   gcc/graph.c | 9 +
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/graph.c b/gcc/graph.c
> index ce8de33ffe1..9acd1d5b95e 100644
> --- a/gcc/graph.c
> +++ b/gcc/graph.c
> @@ -133,10 +133,11 @@ draw_cfg_node_succ_edges (pretty_printer *pp, int funcdef_no, basic_block bb)
>   weight = 10;
> }
> else if (e->flags & EDGE_FALLTHRU)
> -   {
> - color = "blue";
> - weight = 100;
> -   }
> +   weight = 100;
> +  else if (e->flags & EDGE_TRUE_VALUE)
> +   color = "forestgreen";
> +  else if (e->flags & EDGE_FALSE_VALUE)
> +   color = "darkorange";
>
> if (e->flags & EDGE_ABNORMAL)
> color = "red";
> --
> 2.33.0
>
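
For reference, the colour selection in the hunk above can be modelled as a
small standalone function.  This is a sketch only: the DEMO_EDGE_* values are
invented, the "black" default is an assumption about the surrounding code,
and the earlier branches of the if/else chain that are not visible in the
hunk are omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values; GCC defines the real EDGE_* flags elsewhere.  */
enum
{
  DEMO_EDGE_FALLTHRU = 1 << 0,
  DEMO_EDGE_TRUE_VALUE = 1 << 1,
  DEMO_EDGE_FALSE_VALUE = 1 << 2,
  DEMO_EDGE_ABNORMAL = 1 << 3
};

/* Pick the colour for an edge, following the order of tests in the patch.  */
static const char *
demo_edge_color (int flags)
{
  const char *color = "black";  /* Assumed default.  */

  if (flags & DEMO_EDGE_FALLTHRU)
    ;  /* Fallthru edges keep the default colour and only gain weight.  */
  else if (flags & DEMO_EDGE_TRUE_VALUE)
    color = "forestgreen";
  else if (flags & DEMO_EDGE_FALSE_VALUE)
    color = "darkorange";

  /* Abnormal edges override whatever was chosen above.  */
  if (flags & DEMO_EDGE_ABNORMAL)
    color = "red";

  return color;
}
```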


Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-09-01 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 8:28 AM Hongtao Liu  wrote:
>
> On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
>  wrote:
> >
> > On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
> > >
> > > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
> > >  wrote:
> > > >
> > > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  wrote:
> > > > >
> > > > >   When gimple simplifcation try to combine op and vec_cond_expr to 
> > > > > cond_op,
> > > > > it doesn't check if mask type matches. It causes an ICE when expand 
> > > > > cond_op
> > > > > with mismatched mode.
> > > > >   This patch add a function named 
> > > > > cond_vectorized_internal_fn_supported_p
> > > > >  to additionally check mask type than 
> > > > > vectorized_internal_fn_supported_p.
> > > > >
> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > > >   Ok for trunk?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR middle-end/102080
> > > > > * internal-fn.c (cond_vectorized_internal_fn_supported_p): 
> > > > > New functions.
> > > > > * internal-fn.h (cond_vectorized_internal_fn_supported_p): 
> > > > > New declaration.
> > > > > * match.pd: Check the type of mask while generating cond_op in
> > > > > gimple simplication.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR middle-end/102080
> > > > > * gcc.target/i386/pr102080.c: New test.
> > > > > ---
> > > > >  gcc/internal-fn.c| 22 ++
> > > > >  gcc/internal-fn.h|  1 +
> > > > >  gcc/match.pd | 24 
> > > > > 
> > > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16 
> > > > >  4 files changed, 55 insertions(+), 8 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
> > > > >
> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> > > > > index 1360a00f0b9..8b2b65db1a7 100644
> > > > > --- a/gcc/internal-fn.c
> > > > > +++ b/gcc/internal-fn.c
> > > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
> > > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
> > > > >  }
> > > > >
> > > > > +/* Check cond_op for vector modes since 
> > > > > vectorized_internal_fn_supported_p
> > > > > +   doesn't check if mask type matches.  */
> > > > > +bool
> > > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
> > > > > +tree mask_type)
> > > > > +{
> > > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
> > > > > +return false;
> > > > > +
> > > > > +  machine_mode mask_mode;
> > > > > +  machine_mode vmode = TYPE_MODE (type);
> > > > > +  int size1, size2;
> > > > > +  if (VECTOR_MODE_P (vmode)
> > > > > +  && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
> > > > > +  && GET_MODE_SIZE (mask_mode).is_constant (&size1)
> > > > > +  && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
> > > > > +  && size1 != size2)
> > > >
> > > > Why do we check for equal size rather than just mode equality which
> > > I originally thought  TYPE_MODE of vector(8)  was
> > > not QImode, Changed the patch to check mode equality.
> > > Update patch.
> >
> > Looking at all this it seems the match.pd patterns should have not
> > used vectorized_internal_fn_supported_p but direct_internal_fn_supported_p
> > which is equivalent here because we're always working with vector modes?
> >
> > And then shouldn't we look at the actual optab whether the mask mode matches
> > the expectation rather than going around via the target hook which may not
> > have enough context to decide which mask mode to use?
> How about this?
>
> +/* Return true if target supports cond_op with data TYPE and
> +   mask MASK_TYPE.  */
> +bool
> +cond_internal_fn_supported_p (internal_fn ifn, tree type,
> +   tree mask_type)
> +{
> +  tree_pair types = tree_pair (type, type);
> +  optab tmp = direct_internal_fn_optab (ifn, types);
> +  machine_mode vmode = TYPE_MODE (type);
> +  insn_code icode = direct_optab_handler (tmp, vmode);
> +  if (icode == CODE_FOR_nothing)
> +return false;
> +
> +  machine_mode mask_mode = TYPE_MODE (mask_type);
> +  /* Can't create rtx and use insn_operand_matches here.  */
> +  return insn_data[icode].operand[0].mode == vmode
> +&& insn_data[icode].operand[1].mode == mask_mode;
> +}
> +

Yeah, something like that, though the operand[0].mode test should be
redundant.  I think we should assert or have a whitelist
for the internal functions we support being queried this way.
Not sure if we can directly access the 'cond_binary/cond_ternary'
classification used in internal-fn.def; that would be best.
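
As a rough illustration of checking the mask operand mode recorded for the
instruction pattern itself, here is a toy model.  It is only a sketch:
Mode, InsnPattern and cond_op_supported are hypothetical names invented for
this example and are not GCC's real data structures or API.

```cpp
#include <cassert>

// Hypothetical stand-ins for machine modes; not GCC's machine_mode enum.
enum class Mode { DataVec, BitMask, VecMask };

// Toy model of one instruction pattern: whether it exists, and the modes it
// expects for operand 0 (the result) and operand 1 (the mask).
struct InsnPattern {
  bool exists;
  Mode operand_mode[2];
};

// Returns true only if the pattern exists and its mask operand has exactly
// the mode of the mask we intend to pass.
inline bool cond_op_supported(const InsnPattern &pat, Mode mask_mode) {
  return pat.exists && pat.operand_mode[1] == mask_mode;
}
```

In this model a pattern that expects a BitMask mask rejects a VecMask mask
even though the data mode is supported, which is the kind of mismatch the
thread is trying to catch before expansion.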

Richard, what are your thoughts about all this?

Thanks,
Richard.

> Update patch
>
> >
> > In any case if the approach of the patch is correct shouldn't it do
> >
> >   if (VECTOR_MODE_P (vmode)
> >   && (!targetm.v

[PATCH 8/8] coroutines: Make the continue handle visible to debug.

2021-09-01 Thread Iain Sandoe


When we have a suspend method that returns a coroutine handle
we transfer (hopefully symmetrically, i.e. with a tailcall) to
that new coroutine instead of returning to our resumer.

This adds the variable to the outer block for the actor function
which means that '_Coro_actor_continue' is visible to debug.

Contributory to PR 99215.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (build_actor_fn): Make _Coro_actor_continue
visible to debug.
---
 gcc/cp/coroutines.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 395e5c488e5..b32c5dc5e55 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2148,6 +2148,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
 NULL_TREE);
   
   BIND_EXPR_VARS (actor_bind) = continuation;
+  BLOCK_VARS (top_block) = BIND_EXPR_VARS (actor_bind) ;
 
   /* Link in the block associated with the outer scope of the re-written
  function body.  */
-- 



[PATCH 7/8] coroutines: Make proxy vars for the function arg copies.

2021-09-01 Thread Iain Sandoe


This adds top level proxy variables for the coroutine frame
copies of the original function args.  These are then available
in the debugger to refer to the frame copies.  We rewrite the
function body to use the copies, since the original parms will
no longer be in scope when the coroutine is running.
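
The effect of the proxies can be pictured with a small stand-alone model.
This is a sketch under assumptions, not the code the patch generates:
Frame, make_frame and resume_body are hypothetical names, and the real
frame is built by the compiler rather than written by hand.

```cpp
#include <cassert>

// Hypothetical frame layout: the coroutine frame holds a copy of parameter n.
struct Frame {
  int n_copy;
};

// Ramp-like step: copy the incoming parameter into the frame.
inline Frame make_frame(int n) { return Frame{n}; }

// Body-like step: n is a proxy naming the frame copy, so every use of n in
// the body reads and writes the frame rather than the original parameter.
inline int resume_body(Frame &frame) {
  int &n = frame.n_copy;
  n += 1;
  return n;
}
```

Calling resume_body twice on the same frame yields successive values,
showing that the state lives in the frame copy and survives between
resumptions.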

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (struct param_info): Add copy_var.
(build_actor_fn): Use simplified param references.
(register_param_uses): Likewise.
(rewrite_param_uses): Likewise.
(analyze_fn_parms): New function.
(coro_rewrite_function_body): Add proxies for the fn
parameters to the outer bind scope of the rewritten code.
(morph_fn_to_coro): Use simplified version of param ref.
---
 gcc/cp/coroutines.cc | 247 ---
 1 file changed, 117 insertions(+), 130 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index aacf352f1c9..395e5c488e5 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1964,6 +1964,7 @@ transform_await_wrapper (tree *stmt, int *do_subtree, 
void *d)
 struct param_info
 {
   tree field_id; /* The name of the copy in the coroutine frame.  */
+  tree copy_var; /* The local var proxy for the frame copy.  */
   vec *body_uses; /* Worklist of uses, void if there are none.  */
   tree frame_type;   /* The type used to represent this parm in the frame.  */
   tree orig_type;/* The original type of the parm (not as passed).  */
@@ -2169,36 +2170,6 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   /* Declare the continuation handle.  */
   add_decl_expr (continuation);
 
-  /* Re-write param references in the body, no code should be generated
- here.  */
-  if (DECL_ARGUMENTS (orig))
-{
-  tree arg;
-  for (arg = DECL_ARGUMENTS (orig); arg != NULL; arg = DECL_CHAIN (arg))
-   {
- bool existed;
- param_info &parm = param_uses->get_or_insert (arg, &existed);
- if (!parm.body_uses)
-   continue; /* Wasn't used in the original function body.  */
-
- tree fld_ref = lookup_member (coro_frame_type, parm.field_id,
-   /*protect=*/1, /*want_type=*/0,
-   tf_warning_or_error);
- tree fld_idx = build3_loc (loc, COMPONENT_REF, parm.frame_type,
-actor_frame, fld_ref, NULL_TREE);
-
- /* We keep these in the frame as a regular pointer, so convert that
-  back to the type expected.  */
- if (parm.pt_ref)
-   fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx);
-
- int i;
- tree *puse;
- FOR_EACH_VEC_ELT (*parm.body_uses, i, puse)
-   *puse = fld_idx;
-   }
-}
-
   /* Re-write local vars, similarly.  */
   local_vars_transform xform_vars_data
 = {actor, actor_frame, coro_frame_type, loc, local_var_uses};
@@ -3772,11 +3743,11 @@ struct param_frame_data
   bool param_seen;
 };
 
-/* A tree-walk callback that records the use of parameters (to allow for
-   optimizations where handling unused parameters may be omitted).  */
+/* A tree walk callback that rewrites each parm use to the local variable
+   that represents its copy in the frame.  */
 
 static tree
-register_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
+rewrite_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
 {
   param_frame_data *data = (param_frame_data *) d;
 
@@ -3784,7 +3755,7 @@ register_param_uses (tree *stmt, int *do_subtree 
ATTRIBUTE_UNUSED, void *d)
   if (TREE_CODE (*stmt) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (*stmt))
 {
   tree t = DECL_VALUE_EXPR (*stmt);
-  return cp_walk_tree (&t, register_param_uses, d, NULL);
+  return cp_walk_tree (&t, rewrite_param_uses, d, NULL);
 }
 
   if (TREE_CODE (*stmt) != PARM_DECL)
@@ -3798,16 +3769,87 @@ register_param_uses (tree *stmt, int *do_subtree 
ATTRIBUTE_UNUSED, void *d)
   param_info &parm = data->param_uses->get_or_insert (*stmt, &existed);
   gcc_checking_assert (existed);
 
-  if (!parm.body_uses)
+  *stmt = parm.copy_var;
+  return NULL_TREE;
+}
+
+/* Build up a set of info that determines how each param copy will be
+   handled.  */
+
+static hash_map *analyze_fn_parms (tree orig)
+{
+  if (!DECL_ARGUMENTS (orig))
+return NULL;
+
+  hash_map *param_uses = new hash_map;
+
+  /* Build a hash map with an entry for each param.
+ The key is the param tree.
+ Then we have an entry for the frame field name.
+ Then a cache for the field ref when we come to use it.
+ Then a tree list of the uses.
+ The second two entries start out empty - and only get populated
+ when we see uses.  */
+  bool lambda_p = LAMBDA_FUNCTION_P (orig);
+
+  unsigned no_name_parm = 0;
+  for (tree arg = DECL_ARGUMENTS (orig); arg != NULL; arg = DECL_CHAIN (arg))
 {

[PATCH] tree-optimization/93491 - avoid PRE of trapping calls across exits

2021-09-01 Thread Richard Biener via Gcc-patches
This makes us avoid PREing calls that could trap across other
calls that might not return.  The PR88087 testcase has exactly
such a case, so I've refactored the testcase to contain a valid PRE.
I've also adjusted PRE to not consider pure calls possibly
not returning in line with what we do elsewhere.
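
The rule can be modelled in isolation as follows.  This is a simplified
sketch, not the code in tree-ssa-pre.c: the FLAG_* constants, Call,
may_not_return and pre_candidates are hypothetical names, and the real
EXP_GEN condition also looks at virtual operands, which this model omits.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's ECF_* call flags; values are illustrative.
enum : unsigned {
  FLAG_CONST = 1u << 0,
  FLAG_PURE = 1u << 1,
  FLAG_LOOPING_CONST_OR_PURE = 1u << 2
};

struct Call {
  unsigned flags;
  bool may_trap;
  bool can_throw_external;
};

// A call is treated as possibly not returning unless it is const or pure,
// is not marked as looping, and cannot throw externally.
inline bool may_not_return(const Call &c) {
  return !(c.flags & (FLAG_CONST | FLAG_PURE))
         || (c.flags & FLAG_LOOPING_CONST_OR_PURE)
         || c.can_throw_external;
}

// For each call in a block, decide whether it may be offered to PRE.  The
// "seen" state is updated only after the call itself has been judged, so a
// call never blocks itself, only the calls that follow it.
inline std::vector<bool> pre_candidates(const std::vector<Call> &block) {
  std::vector<bool> result;
  bool seen_may_not_return = false;
  for (const Call &c : block) {
    result.push_back(!(c.may_trap && seen_may_not_return));
    if (may_not_return(c))
      seen_may_not_return = true;
  }
  return result;
}
```

With a block shaped like the new pr93491.c testcase, a plain call followed
by a trapping const call, the second call is rejected; with the order
reversed both remain candidates.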

Note we don't have a good idea whether a function always returns
normally or whether its body is known to never trap.  That's
something IPA could compute.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-01  Richard Biener  

PR tree-optimization/93491
* tree-ssa-pre.c (compute_avail): Set BB_MAY_NOTRETURN
after processing the stmt itself.  Do not consider
pure functions possibly not returning.  Properly avoid
adding possibly trapping calls to EXP_GEN when there's
a preceding possibly not returning call.
* tree-ssa-sccvn.c (vn_reference_may_trap): Conservatively
not handle calls.

* gcc.dg/torture/pr93491.c: New testcase.
* gcc.dg/tree-ssa/pr88087.c: Change to valid PRE opportunity.
---
 gcc/testsuite/gcc.dg/torture/pr93491.c  | 24 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr88087.c | 18 +++---
 gcc/tree-ssa-pre.c  | 33 +++--
 gcc/tree-ssa-sccvn.c|  1 +
 4 files changed, 60 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr93491.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr93491.c 
b/gcc/testsuite/gcc.dg/torture/pr93491.c
new file mode 100644
index 000..2cb4c0ca7af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr93491.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+extern void exit (int);
+
+__attribute__((noipa))
+void f(int i)
+{
+  exit(i);
+}
+
+__attribute__((const,noipa))
+int g(int i)
+{
+  return 1 / i;
+}
+
+int main()
+{
+  while (1)
+{
+  f(0);
+  f(g(0));
+}
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c
index d0061b61aed..c48dba5bf21 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c
@@ -1,17 +1,17 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fno-code-hoisting -fdump-tree-pre-stats" } */
 
 int f();
 int d;
-void c()
+void c(int x)
 {
-  for (;;)
-{
-  f();
-  int (*fp)() __attribute__((const)) = (void *)f;
-  d = fp();
-}
+  int (*fp)() __attribute__((const)) = (void *)f;
+  if (x)
+d = fp ();
+  int tem = fp ();
+  f();
+  d = tem;
 }
 
-/* We shouldn't ICE and hoist the const call of fp out of the loop.  */
+/* We shouldn't ICE and PRE the const call.  */
 /* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index ebe95cc6c73..769aadb2315 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3957,6 +3957,7 @@ compute_avail (function *fun)
 
   /* Now compute value numbers and populate value sets with all
 the expressions computed in BLOCK.  */
+  bool set_bb_may_notreturn = false;
   for (gimple_stmt_iterator gsi = gsi_start_bb (block); !gsi_end_p (gsi);
   gsi_next (&gsi))
{
@@ -3965,6 +3966,12 @@ compute_avail (function *fun)
 
  stmt = gsi_stmt (gsi);
 
+ if (set_bb_may_notreturn)
+   {
+ BB_MAY_NOTRETURN (block) = 1;
+ set_bb_may_notreturn = false;
+   }
+
  /* Cache whether the basic-block has any non-visible side-effect
 or control flow.
 If this isn't a call or it is the last stmt in the
@@ -3976,10 +3983,12 @@ compute_avail (function *fun)
 that forbids hoisting possibly trapping expressions
 before it.  */
  int flags = gimple_call_flags (stmt);
- if (!(flags & ECF_CONST)
+ if (!(flags & (ECF_CONST|ECF_PURE))
  || (flags & ECF_LOOPING_CONST_OR_PURE)
  || stmt_can_throw_external (fun, stmt))
-   BB_MAY_NOTRETURN (block) = 1;
+   /* Defer setting of BB_MAY_NOTRETURN to avoid it
+  influencing the processing of the call itself.  */
+   set_bb_may_notreturn = true;
}
 
  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_DEF)
@@ -4030,11 +4039,16 @@ compute_avail (function *fun)
/* If the value of the call is not invalidated in
   this block until it is computed, add the expression
   to EXP_GEN.  */
-   if (!gimple_vuse (stmt)
-   || gimple_code
-(SSA_NAME_DEF_STMT (gimple_vuse (stmt))) == GIMPLE_PHI
-   || gimple_bb (SSA_NAME_DEF_STMT
-   (gimple_vuse (stmt))) != block)
+   if ((!gimple_vuse (stmt)
+|| gimple_code
+ (SSA_NAME_D

[PATCH 6/8] coroutines: Convert implementation variables to debug-friendly form.

2021-09-01 Thread Iain Sandoe


The user might well wish to inspect some of the state that represents
the implementation of the coroutine machine.

In particular:
  The promise object.
  The function pointers for the resumer and destroyer.
  The current resume index (suspend point).
  The handle that represents this coroutine (the 'self handle').
  Whether the coroutine frame is allocated and needs to be freed.

These variables are given names that can be 'well-known' and advertised
in debug documentation - they are placed in the implementation namespace
and all begin with _Coro_.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (transform_await_expr): Use debug-friendly
names for coroutine implementation.
(build_actor_fn): Likewise.
(build_destroy_fn): Likewise.
(coro_rewrite_function_body): Likewise.
(morph_fn_to_coro): Likewise.
---
 gcc/cp/coroutines.cc | 214 +++
 1 file changed, 94 insertions(+), 120 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 3b46aac4dc5..aacf352f1c9 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1906,7 +1906,6 @@ transform_await_expr (tree await_expr, await_xform_data 
*xform)
   /* So, on entry, we have:
  in : CO_AWAIT_EXPR (a, e_proxy, o, awr_call_vector, mode)
  We no longer need a [it had diagnostic value, maybe?]
- We need to replace the promise proxy in all elements
  We need to replace the e_proxy in the awr_call.  */
 
   tree coro_frame_type = TREE_TYPE (xform->actor_frame);
@@ -1932,16 +1931,6 @@ transform_await_expr (tree await_expr, await_xform_data 
*xform)
   TREE_OPERAND (await_expr, 1) = as;
 }
 
-  /* Now do the self_handle.  */
-  data.from = xform->self_h_proxy;
-  data.to = xform->real_self_h;
-  cp_walk_tree (&await_expr, replace_proxy, &data, NULL);
-
-  /* Now do the promise.  */
-  data.from = xform->promise_proxy;
-  data.to = xform->real_promise;
-  cp_walk_tree (&await_expr, replace_proxy, &data, NULL);
-
   return await_expr;
 }
 
@@ -2128,10 +2117,9 @@ coro_get_frame_dtor (tree coro_fp, tree orig, tree 
frame_size,
 
 static void
 build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
-   tree orig, hash_map *param_uses,
-   hash_map *local_var_uses,
-   vec *param_dtor_list, tree resume_fn_field,
-   tree resume_idx_field, unsigned body_count, tree frame_size)
+   tree orig, hash_map *local_var_uses,
+   vec *param_dtor_list,
+   tree resume_idx_var, unsigned body_count, tree frame_size)
 {
   verify_stmt_tree (fnbody);
   /* Some things we inherit from the original function.  */
@@ -2216,8 +2204,8 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
 = {actor, actor_frame, coro_frame_type, loc, local_var_uses};
   cp_walk_tree (&fnbody, transform_local_var_uses, &xform_vars_data, NULL);
 
-  tree rat_field = lookup_member (coro_frame_type, coro_resume_index_field, 1, 
0,
- tf_warning_or_error);
+  tree rat_field = lookup_member (coro_frame_type, coro_resume_index_field,
+ 1, 0, tf_warning_or_error);
   tree rat = build3 (COMPONENT_REF, short_unsigned_type_node, actor_frame,
 rat_field, NULL_TREE);
 
@@ -2319,14 +2307,8 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   tree r = build_stmt (loc, LABEL_EXPR, actor_begin_label);
   add_stmt (r);
 
-  /* actor's version of the promise.  */
-  tree ap_m = lookup_member (coro_frame_type, get_identifier 
("_Coro_promise"), 1, 0,
-tf_warning_or_error);
-  tree ap = build_class_member_access_expr (actor_frame, ap_m, NULL_TREE, 
false,
-   tf_warning_or_error);
-
   /* actor's coroutine 'self handle'.  */
-  tree ash_m = lookup_member (coro_frame_type, get_identifier 
("_Coro_self_handle"), 1,
+  tree ash_m = lookup_member (coro_frame_type, coro_self_handle_field, 1,
  0, tf_warning_or_error);
   tree ash = build_class_member_access_expr (actor_frame, ash_m, NULL_TREE,
 false, tf_warning_or_error);
@@ -2347,36 +2329,13 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
  decide where to put things.  */
 
   await_xform_data xform
-= {actor, actor_frame, promise_proxy, ap, self_h_proxy, ash};
+= {actor, actor_frame, NULL_TREE, NULL_TREE, self_h_proxy, ash};
 
   /* Transform the await expressions in the function body.  Only do each
  await tree once!  */
   hash_set pset;
   cp_walk_tree (&fnbody, transform_await_wrapper, &xform, &pset);
 
-  /* Now replace the promise proxy with its real value.  */
-  proxy_replace p_data;
-  p_data.from = promise_proxy;
-  p_data.to = ap;
-  cp_walk_tree (&fnbody, replace_proxy, &p_dat

[PATCH 5/8] coroutines: Define and populate accessors for debug state.

2021-09-01 Thread Iain Sandoe


This is an efficiency measure and repeats the pattern used for
other identifiers used in the coroutine implementation.

In support of debugging, the user might well need to look at some
of the variables that the implementation manipulates in lowering
the coroutines.  This defines the identifiers for these and populates
them on demand (avoiding repeated identifier calls).
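
The pattern of building identifiers once and reusing them can be sketched
outside the compiler like this.  It is only a model: IdentifierTable and
CoroIdentifiers are hypothetical types, and the get_identifier member here
merely imitates the role of GCC's function of the same name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Toy interning table: each distinct spelling is stored once and every
// lookup is counted so the saving is observable.
struct IdentifierTable {
  std::unordered_set<std::string> pool;
  unsigned lookups = 0;

  const std::string *get_identifier(const std::string &spelling) {
    ++lookups;
    // Element addresses in std::unordered_set stay valid across rehashing.
    return &*pool.insert(spelling).first;
  }
};

// Cached identifier, filled in the first time init is called.
struct CoroIdentifiers {
  const std::string *resume_index = nullptr;

  void init(IdentifierTable &table) {
    if (!resume_index)
      resume_index = table.get_identifier("_Coro_resume_index");
  }
};
```

After the first init call, later uses read the cached pointer and no longer
touch the table, which is the saving the patch description refers to.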

Contributory to debug support (PR 99215)

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc: Add identifiers for implementation
variables that we want to expose to debug.
(coro_init_identifiers): Initialize implementation names.
(coro_promise_type_found_p): Use pre-built identifiers.
(build_actor_fn): Likewise.
(build_destroy_fn): Likewise.
---
 gcc/cp/coroutines.cc | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 081e1a46c63..3b46aac4dc5 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -215,7 +215,17 @@ static GTY(()) tree coro_await_ready_identifier;
 static GTY(()) tree coro_await_suspend_identifier;
 static GTY(()) tree coro_await_resume_identifier;
 
-/* Create the identifiers used by the coroutines library interfaces.  */
+/* Accessors for the coroutine frame state used by the implementation.  */
+
+static GTY(()) tree coro_resume_fn_field;
+static GTY(()) tree coro_destroy_fn_field;
+static GTY(()) tree coro_promise_field;
+static GTY(()) tree coro_frame_needs_free_field;
+static GTY(()) tree coro_resume_index_field;
+static GTY(()) tree coro_self_handle_field;
+
+/* Create the identifiers used by the coroutines library interfaces and
+   the implementation frame state.  */
 
 static void
 coro_init_identifiers ()
@@ -241,6 +251,14 @@ coro_init_identifiers ()
   coro_await_ready_identifier = get_identifier ("await_ready");
   coro_await_suspend_identifier = get_identifier ("await_suspend");
   coro_await_resume_identifier = get_identifier ("await_resume");
+
+  /* Coroutine state frame field accessors.  */
+  coro_resume_fn_field = get_identifier ("_Coro_resume_fn");
+  coro_destroy_fn_field = get_identifier ("_Coro_destroy_fn");
+  coro_promise_field = get_identifier ("_Coro_promise");
+  coro_frame_needs_free_field = get_identifier ("_Coro_frame_needs_free");
+  coro_resume_index_field = get_identifier ("_Coro_resume_index");
+  coro_self_handle_field = get_identifier ("_Coro_self_handle");
 }
 
 /* Trees we only need to set up once.  */
@@ -513,12 +531,12 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
   /* Build a proxy for a handle to "self" as the param to
 await_suspend() calls.  */
   coro_info->self_h_proxy
-   = build_lang_decl (VAR_DECL, get_identifier ("_Coro_self_handle"),
+   = build_lang_decl (VAR_DECL, coro_self_handle_field,
   coro_info->handle_type);
 
   /* Build a proxy for the promise so that we can perform lookups.  */
   coro_info->promise_proxy
-   = build_lang_decl (VAR_DECL, get_identifier ("_Coro_promise"),
+   = build_lang_decl (VAR_DECL, coro_promise_field,
   coro_info->promise_type);
 
   /* Note where we first saw a coroutine keyword.  */
@@ -2198,8 +2216,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
 = {actor, actor_frame, coro_frame_type, loc, local_var_uses};
   cp_walk_tree (&fnbody, transform_local_var_uses, &xform_vars_data, NULL);
 
-  tree resume_idx_name = get_identifier ("_Coro_resume_index");
-  tree rat_field = lookup_member (coro_frame_type, resume_idx_name, 1, 0,
+  tree rat_field = lookup_member (coro_frame_type, coro_resume_index_field, 1, 
0,
  tf_warning_or_error);
   tree rat = build3 (COMPONENT_REF, short_unsigned_type_node, actor_frame,
 rat_field, NULL_TREE);
@@ -2462,7 +2479,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
 
   /* We will need to know which resume point number should be encoded.  */
   tree res_idx_m
-= lookup_member (coro_frame_type, resume_idx_name,
+= lookup_member (coro_frame_type, coro_resume_index_field,
 /*protect=*/1, /*want_type=*/0, tf_warning_or_error);
   tree resume_pt_number
 = build_class_member_access_expr (actor_frame, res_idx_m, NULL_TREE, false,
@@ -2504,8 +2521,7 @@ build_destroy_fn (location_t loc, tree coro_frame_type, 
tree destroy,
 
   tree destr_frame = build1 (INDIRECT_REF, coro_frame_type, destr_fp);
 
-  tree resume_idx_name = get_identifier ("_Coro_resume_index");
-  tree rat_field = lookup_member (coro_frame_type, resume_idx_name, 1, 0,
+  tree rat_field = lookup_member (coro_frame_type, coro_resume_index_field, 1, 
0,
  tf_warning_or_error);
   tree rat = build3 (COMPONENT_REF, short_unsigned_type_node, destr_frame,
 rat_field, NULL_TREE)

[PATCH 4/8] coroutines: Make some of the artificial names more debugger-friendly.

2021-09-01 Thread Iain Sandoe


Some of the compiler-generated entries are of interest to a
user debugging - keep variables in the implementation namespace
but avoid using periods as separators (which is not compatible
with visible symbols for some assemblers).

Partial improvement to debugging (PR 99215).

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (coro_promise_type_found_p): Rename variable
to make it suitable for debugging.
(build_actor_fn): Likewise.
(build_destroy_fn): Likewise.
(register_local_var_uses): Likewise.
(coro_rewrite_function_body): Likewise.
(morph_fn_to_coro): Likewise.
---
 gcc/cp/coroutines.cc | 59 ++--
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index a12714ea67e..081e1a46c63 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -513,12 +513,12 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
   /* Build a proxy for a handle to "self" as the param to
 await_suspend() calls.  */
   coro_info->self_h_proxy
-   = build_lang_decl (VAR_DECL, get_identifier ("self_h.proxy"),
+   = build_lang_decl (VAR_DECL, get_identifier ("_Coro_self_handle"),
   coro_info->handle_type);
 
   /* Build a proxy for the promise so that we can perform lookups.  */
   coro_info->promise_proxy
-   = build_lang_decl (VAR_DECL, get_identifier ("promise.proxy"),
+   = build_lang_decl (VAR_DECL, get_identifier ("_Coro_promise"),
   coro_info->promise_type);
 
   /* Note where we first saw a coroutine keyword.  */
@@ -2198,7 +2198,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
 = {actor, actor_frame, coro_frame_type, loc, local_var_uses};
   cp_walk_tree (&fnbody, transform_local_var_uses, &xform_vars_data, NULL);
 
-  tree resume_idx_name = get_identifier ("__resume_at");
+  tree resume_idx_name = get_identifier ("_Coro_resume_index");
   tree rat_field = lookup_member (coro_frame_type, resume_idx_name, 1, 0,
  tf_warning_or_error);
   tree rat = build3 (COMPONENT_REF, short_unsigned_type_node, actor_frame,
@@ -2303,13 +2303,13 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   add_stmt (r);
 
   /* actor's version of the promise.  */
-  tree ap_m = lookup_member (coro_frame_type, get_identifier ("__p"), 1, 0,
+  tree ap_m = lookup_member (coro_frame_type, get_identifier 
("_Coro_promise"), 1, 0,
 tf_warning_or_error);
   tree ap = build_class_member_access_expr (actor_frame, ap_m, NULL_TREE, 
false,
tf_warning_or_error);
 
   /* actor's coroutine 'self handle'.  */
-  tree ash_m = lookup_member (coro_frame_type, get_identifier ("__self_h"), 1,
+  tree ash_m = lookup_member (coro_frame_type, get_identifier 
("_Coro_self_handle"), 1,
  0, tf_warning_or_error);
   tree ash = build_class_member_access_expr (actor_frame, ash_m, NULL_TREE,
 false, tf_warning_or_error);
@@ -2343,12 +2343,12 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   p_data.to = ap;
   cp_walk_tree (&fnbody, replace_proxy, &p_data, NULL);
 
-  /* The rewrite of the function adds code to set the __resume field to
+  /* The rewrite of the function adds code to set the resume_fn field to
  nullptr when the coroutine is done and also the index to zero when
  calling an unhandled exception.  These are represented by two proxies
  in the function, so rewrite them to the proper frame access.  */
   tree resume_m
-= lookup_member (coro_frame_type, get_identifier ("__resume"),
+= lookup_member (coro_frame_type, get_identifier ("_Coro_resume_fn"),
 /*protect=*/1, /*want_type=*/0, tf_warning_or_error);
   tree res_x = build_class_member_access_expr (actor_frame, resume_m, 
NULL_TREE,
   false, tf_warning_or_error);
@@ -2381,7 +2381,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   /* Here deallocate the frame (if we allocated it), which we will have at
  present.  */
   tree fnf_m
-= lookup_member (coro_frame_type, get_identifier ("__frame_needs_free"), 1,
+= lookup_member (coro_frame_type, get_identifier 
("_Coro_frame_needs_free"), 1,
 0, tf_warning_or_error);
   tree fnf2_x = build_class_member_access_expr (actor_frame, fnf_m, NULL_TREE,
false, tf_warning_or_error);
@@ -2504,7 +2504,7 @@ build_destroy_fn (location_t loc, tree coro_frame_type, 
tree destroy,
 
   tree destr_frame = build1 (INDIRECT_REF, coro_frame_type, destr_fp);
 
-  tree resume_idx_name = get_identifier ("__resume_at");
+  tree resume_idx_name = g

[PATCH 3/8] coroutines: Support for debugging implementation state.

2021-09-01 Thread Iain Sandoe


Some of the state that is associated with the implementation
is of interest to a user debugging a coroutine.  In particular,
items such as the promise object and the current suspend point.

These variables live in the coroutine frame, but we can inject
proxies for them into the outermost bind expression of the
coroutine.  Such variables are automatically moved into the
coroutine frame (if they need to persist across a suspend
expression).  Placing the proxies this way allows the user to
inspect them by name in the debugger.

To implement this, we ensure that (at the outermost scope) the
frame entries are not mangled (coroutine frame variables are
usually mangled with scope nesting information so that they do
not clash).  We can safely avoid doing this for the outermost
scope so that we can map frame entries directly to the variables.
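
The naming rule can be summarised by a small stand-alone function.  This is
a sketch that mirrors the format strings in the diff below for named
variables only; frame_entry_name is a hypothetical helper, not a function
in coroutines.cc.

```cpp
#include <cassert>
#include <string>

// Outermost scope (nest depth 0): keep the name unchanged so it maps
// directly to the variable.  Deeper scopes: append depth and bind index so
// identically named locals in nested scopes do not clash in the frame.
inline std::string frame_entry_name(const std::string &name,
                                    unsigned nest_depth,
                                    unsigned bind_index) {
  if (nest_depth == 0)
    return name;
  return name + "_" + std::to_string(nest_depth) + "_"
         + std::to_string(bind_index);
}
```

For example, a variable named x at depth 2 with bind index 3 becomes x_2_3,
while _Coro_promise at depth 0 keeps its name.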

This is a partial contribution to debug support (PR 99215).

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (register_local_var_uses): Do not mangle
frame entries for the outermost scope.  Record the outer
scope as nesting depth 0.
---
 gcc/cp/coroutines.cc | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index b8501032969..a12714ea67e 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3885,8 +3885,6 @@ register_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
 
   if (TREE_CODE (*stmt) == BIND_EXPR)
 {
-  lvd->bind_indx++;
-  lvd->nest_depth++;
   tree lvar;
   for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL;
   lvar = DECL_CHAIN (lvar))
@@ -3925,11 +3923,17 @@ register_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
continue;
 
  /* Make names depth+index unique, so that we can support nested
-scopes with identically named locals.  */
+scopes with identically named locals and still be able to
+identify them in the coroutine frame.  */
  tree lvname = DECL_NAME (lvar);
  char *buf;
- if (lvname != NULL_TREE)
-   buf = xasprintf ("__%s.%u.%u", IDENTIFIER_POINTER (lvname),
+ /* The outermost bind scope contains the artificial variables that
+we inject to implement the coro state machine.  We want to be able
+to inspect these in debugging.  */
+ if (lvname != NULL_TREE && lvd->nest_depth == 0)
+   buf = xasprintf ("%s", IDENTIFIER_POINTER (lvname));
+ else if (lvname != NULL_TREE)
+   buf = xasprintf ("%s_%u_%u", IDENTIFIER_POINTER (lvname),
 lvd->nest_depth, lvd->bind_indx);
  else
buf = xasprintf ("_D%u.%u.%u", DECL_UID (lvar), lvd->nest_depth,
@@ -3942,6 +3946,8 @@ register_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
  /* We don't walk any of the local var sub-trees, they won't contain
 any bind exprs.  */
}
+  lvd->bind_indx++;
+  lvd->nest_depth++;
   cp_walk_tree (&BIND_EXPR_BODY (*stmt), register_local_var_uses, d, NULL);
   *do_subtree = 0; /* We've done this.  */
   lvd->nest_depth--;
-- 



[PATCH 2/8] coroutines: Add a helper for creating local vars.

2021-09-01 Thread Iain Sandoe


This is primarily code factoring, but we take this opportunity
to rename some of the implementation variables (which we intend
to expose to debugging) so that they are in the implementation
namespace.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (coro_build_artificial_var): New.
(build_actor_fn): Use var builder, rename vars to use
implementation namespace.
(coro_rewrite_function_body): Likewise.
(morph_fn_to_coro): Likewise.
---
 gcc/cp/coroutines.cc | 69 +++-
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 2d68098f242..b8501032969 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1474,6 +1474,29 @@ coro_build_cvt_void_expr_stmt (tree expr, location_t loc)
   return coro_build_expr_stmt (t, loc);
 }
 
+/* Helpers to build an artificial var, with location LOC, NAME and TYPE, in
+   CTX, and with initializer INIT.  */
+
+static tree
+coro_build_artificial_var (location_t loc, tree name, tree type, tree ctx,
+  tree init)
+{
+  tree res = build_lang_decl (VAR_DECL, name, type);
+  DECL_SOURCE_LOCATION (res) = loc;
+  DECL_CONTEXT (res) = ctx;
+  DECL_ARTIFICIAL (res) = true;
+  DECL_INITIAL (res) = init;
+  return res;
+}
+
+static tree
+coro_build_artificial_var (location_t loc, const char *name, tree type,
+  tree ctx, tree init)
+{
+  return coro_build_artificial_var (loc, get_identifier (name),
+   type, ctx, init);
+}
+
 /* Helpers for label creation:
1. Create a named label in the specified context.  */
 
@@ -2113,12 +2136,10 @@ build_actor_fn (location_t loc, tree coro_frame_type, 
tree actor, tree fnbody,
   tree top_block = make_node (BLOCK);
   BIND_EXPR_BLOCK (actor_bind) = top_block;
 
-  tree continuation = build_lang_decl (VAR_DECL,
-  get_identifier ("actor.continue"),
-  void_coro_handle_type);
-  DECL_ARTIFICIAL (continuation) = 1;
-  DECL_IGNORED_P (continuation) = 1;
-  DECL_CONTEXT (continuation) = actor;
+  tree continuation = coro_build_artificial_var (loc, "_Coro_actor_continue",
+void_coro_handle_type, actor,
+NULL_TREE);
+  
   BIND_EXPR_VARS (actor_bind) = continuation;
 
   /* Link in the block associated with the outer scope of the re-written
@@ -4069,12 +4090,11 @@ coro_rewrite_function_body (location_t fn_start, tree 
fnbody, tree orig,
 fn_start, NULL, /*musthave=*/true);
   /* Create and initialize the initial-await-resume-called variable per
 [dcl.fct.def.coroutine] / 5.3.  */
-  tree i_a_r_c = build_lang_decl (VAR_DECL, get_identifier ("i_a_r_c"),
- boolean_type_node);
-  DECL_ARTIFICIAL (i_a_r_c) = true;
+  tree i_a_r_c = coro_build_artificial_var (fn_start, "_Coro_i_a_r_c",
+   boolean_type_node, orig,
+   boolean_false_node);
   DECL_CHAIN (i_a_r_c) = var_list;
   var_list = i_a_r_c;
-  DECL_INITIAL (i_a_r_c) = boolean_false_node;
   add_decl_expr (i_a_r_c);
   /* Start the try-catch.  */
   tree tcb = build_stmt (fn_start, TRY_BLOCK, NULL_TREE, NULL_TREE);
@@ -4459,8 +4479,10 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   add_stmt (ramp_bind);
   tree ramp_body = push_stmt_list ();
 
-  tree coro_fp = build_lang_decl (VAR_DECL, get_identifier ("coro.frameptr"),
- coro_frame_ptr);
+  tree zeroinit = build1_loc (fn_start, CONVERT_EXPR,
+ coro_frame_ptr, integer_zero_node);
+  tree coro_fp = coro_build_artificial_var (fn_start, "_Coro_frameptr",
+   coro_frame_ptr, orig, zeroinit);
   tree varlist = coro_fp;
 
   /* To signal that we need to cleanup copied function args.  */
@@ -4478,21 +4500,19 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
 
   /* Signal that we need to clean up the promise object on exception.  */
   tree coro_promise_live
-   = build_lang_decl (VAR_DECL, get_identifier ("coro.promise.live"),
- boolean_type_node);
-  DECL_ARTIFICIAL (coro_promise_live) = true;
+= coro_build_artificial_var (fn_start, "_Coro_promise_live",
+boolean_type_node, orig, boolean_false_node);
   DECL_CHAIN (coro_promise_live) = varlist;
   varlist = coro_promise_live;
-  DECL_INITIAL (coro_promise_live) = boolean_false_node;
+
   /* When the get-return-object is in the RETURN slot, we need to arrange for
  cleanup on exception.  */
   tree coro_gro_live
-   = build_lang_decl (VAR_DECL, get_identifier ("coro.gro.live"),
- boolean_

[PATCH 1/8] coroutines : Use DECL_VALUE_EXPR instead of rewriting vars.

2021-09-01 Thread Iain Sandoe
Hi,

Variables that need to persist over suspension expressions
must be preserved by being copied into the coroutine frame.

The initial implementations do this manually in the transform
code.  However, that has various disadvantages - including
that the debug connections are lost between the original var
and the frame copy.

The revised implementation makes use of DECL_VALUE_EXPRs to
contain the frame offset expressions, so that the original
var names are preserved in the code.
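
As a loose C analogy only (the front end does not work through the
preprocessor, and the names coro_frame, step_rewritten and step_value_expr
are invented for this sketch), the difference between rewriting uses and
attaching a value expression looks roughly like this:

```c
/* Hypothetical stand-in for a coroutine frame holding one local.  */
struct coro_frame
{
  int resume_index;
  int counter;
};

/* "Rewriting" style: every use of the local has been replaced by an
   explicit frame->field reference, so the name `counter' is gone.  */
int
step_rewritten (struct coro_frame *frame)
{
  frame->counter += 1;
  return frame->counter;
}

/* "Value expression" style: the body still mentions `counter', and the
   name merely stands for the frame slot.  The macro plays the role that
   DECL_VALUE_EXPR plays in the compiler (an analogy, not the mechanism).  */
int
step_value_expr (struct coro_frame *frame)
{
#define counter (frame->counter)
  counter += 1;
  return counter;
#undef counter
}
```

Both functions update the same storage; only the second keeps the
original variable name visible at the point of use.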

This process is also applied to the function parms which are
always copied to the frame.  In this case the decls need to be
copied since they are used in two different contexts during
the re-write (in the building of the ramp function, and in
the actor function itself).

This will assist in improvement of debugging (PR 99215).

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (transform_local_var_uses): Record
frame offset expressions as DECL_VALUE_EXPRs instead of
rewriting them.
---
 gcc/cp/coroutines.cc | 105 +++
 1 file changed, 5 insertions(+), 100 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index ceb3d3be75e..2d68098f242 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1974,8 +1974,7 @@ transform_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
   local_vars_transform *lvd = (local_vars_transform *) d;
 
   /* For each var in this bind expr (that has a frame id, which means it was
- accessed), build a frame reference for each and then walk the bind expr
- statements, substituting the frame ref for the original var.  */
+ accessed), build a frame reference and add it as the DECL_VALUE_EXPR.  */
 
   if (TREE_CODE (*stmt) == BIND_EXPR)
 {
@@ -1991,13 +1990,9 @@ transform_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
  /* Re-write the variable's context to be in the actor func.  */
  DECL_CONTEXT (lvar) = lvd->context;
 
-   /* For capture proxies, this could include the decl value expr.  */
-   if (local_var.is_lambda_capture || local_var.has_value_expr_p)
- {
-   tree ve = DECL_VALUE_EXPR (lvar);
-   cp_walk_tree (&ve, transform_local_var_uses, d, NULL);
+ /* For capture proxies, this could include the decl value expr.  */
+ if (local_var.is_lambda_capture || local_var.has_value_expr_p)
continue; /* No frame entry for this.  */
- }
 
  /* TODO: implement selective generation of fields when vars are
 known not-used.  */
@@ -2011,103 +2006,13 @@ transform_local_var_uses (tree *stmt, int *do_subtree, 
void *d)
  tree fld_idx = build3_loc (lvd->loc, COMPONENT_REF, TREE_TYPE (lvar),
 lvd->actor_frame, fld_ref, NULL_TREE);
  local_var.field_idx = fld_idx;
-   }
-  /* FIXME: we should be able to do this in the loop above, but (at least
-for range for) there are cases where the DECL_INITIAL contains
-forward references.
-So, now we've built the revised var in the frame, substitute uses of
-it in initializers and the bind expr body.  */
-  for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL;
-  lvar = DECL_CHAIN (lvar))
-   {
- /* we need to walk some of the decl trees, which might contain
-references to vars replaced at a higher level.  */
- cp_walk_tree (&DECL_INITIAL (lvar), transform_local_var_uses, d,
-   NULL);
- cp_walk_tree (&DECL_SIZE (lvar), transform_local_var_uses, d, NULL);
- cp_walk_tree (&DECL_SIZE_UNIT (lvar), transform_local_var_uses, d,
-   NULL);
+ SET_DECL_VALUE_EXPR (lvar, fld_idx);
+ DECL_HAS_VALUE_EXPR_P (lvar) = true;
}
   cp_walk_tree (&BIND_EXPR_BODY (*stmt), transform_local_var_uses, d, 
NULL);
-
-  /* Now we have processed and removed references to the original vars,
-we can drop those from the bind - leaving capture proxies alone.  */
-  for (tree *pvar = &BIND_EXPR_VARS (*stmt); *pvar != NULL;)
-   {
- bool existed;
- local_var_info &local_var
-   = lvd->local_var_uses->get_or_insert (*pvar, &existed);
- gcc_checking_assert (existed);
-
- /* Leave lambda closure captures alone, we replace the *this
-pointer with the frame version and let the normal process
-deal with the rest.
-Likewise, variables with their value found elsewhere.
-Skip past unused ones too.  */
- if (local_var.is_lambda_capture
-|| local_var.has_value_expr_p
-|| local_var.field_id == NULL_TREE)
-   {
- pvar = &DECL_CHAIN (*pvar);
- continue;
-   }
-
- /* Discard this one, we replaced it.  */
- *pvar = DECL_CHAIN (*pvar);
-   }
-
   *do_subtree = 0; /* We've done the body a

[PATCH 0/8] coroutines: Use DECL_VALUE_EXPRs to assist in debug [PR99215]

2021-09-01 Thread Iain Sandoe
Hi,

As discussed with Jason at WG21 in Prague last year, this amends the
way in which the coroutine frame copies of variables are represented in
the front end lowering.  Instead of rewriting each use into an explicit
pointer->field reference, we now apply a DECL_VALUE_EXPR to each variable
(where that value is the pointer->field).

Although this was initially targeted at improving the debug experience,
it in fact simplifies and shortens the code, and makes subsequent
bug-fixes easier to implement.  Therefore, I'm posting this series first.

In addition to the change applied to local variables:

  - there is an implementation of a mechanism to make the original
function arguments debug-visible.  These are actually copied into
the frame by the ramp function, so we create proxy vars for them 
and attach a DECL_VALUE_EXPR for the frame copy.

  - It is likely that someone debugging a coroutine will want to inspect
parts of the implementation state (especially items such as the
suspend index and the promise object).  Although these are not a
direct part of the user's source code, we expose them as synthetic
variables in the implementation name-space _Coro_xx.

  Patches to follow:
  coroutines: Use DECL_VALUE_EXPR instead of rewriting vars.
  coroutines: Add a helper for creating local vars.
  coroutines: Support for debugging implementation state.
  coroutines: Make some of the artificial names more debugger-friendly.
  coroutines: Define and populate accessors for debug state.
  coroutines: Convert implementation variables to debug-friendly form.
  coroutines: Make proxy vars for the function arg copies.
  coroutines: Make the continue handle visible to debug.

 gcc/cp/coroutines.cc | 701 +++
 1 file changed, 304 insertions(+), 397 deletions(-)

This has been tested quite a bit over some months on x86_64-darwin,
x86_64-linux and powerpc64-linux.
OK for master?
Backports?

thanks,
Iain

***
I've tested with GDB and LLDB that we now have visibility of the coro
state, local variables and function parameters.  However the debug
experience is still a little "jumpy" in respect of stepping through the
code - this is because synthesized expressions default to using
'input_location' as their location - which is typically pointing to the
closing brace of the function.  So I think there is more that can
be done for PR99215, but it's a different kind of problem from the one
being solved with the DECL_VALUE_EXPRs.



Re: [PATCH] tree-optimization/102139 - fix SLP DR base alignment

2021-09-01 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 1 Sep 2021, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
>> >  wrote:
>> >>
>> >> When doing whole-function SLP we have to make sure the recorded
>> >> base alignments we compute as the maximum alignment seen for a
>> >> base anywhere in the function is actually valid at the point
>> >> we want to make use of it.
>> 
>> Ah, yeah, the danger of optimisations that silently rely on the
>> then-current restrictions :-(
>
> Yeah.
>
>> >> To make this work we now record the stmt the alignment was derived
>> >> from in addition to the DRs innermost behavior and we use a
>> >> dominance check to verify the recorded info is valid when doing
>> >> BB vectorization.
>> >>
>> >> Note this leaves a small(?) hole for the case where we have sth
>> >> like
>> >>
>> >> unaligned DR
>> >> call (); // does not return
>> >> aligned DR
>> >>
>> >> since we'll derive an aligned access for the earlier DR but the
>> >> later DR is never actually reached since the call does not
>> >> return.  To plug this hole one option (for the easy backporting)
>> >> would be to simply not use the base-alignment recording at all.
>> >> Alternatively we'd have to store the dataref grouping 'id' somewhere
>> >> in the DR itself and use that to handle this particular case.
>> >
>> > It turns out this isn't too difficult so the following is a patch adjusted
>> > to cover that case together with a testcase.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > OK?
>> 
>> TBH I know nothing about this group id scheme, so I'm not really
>> in a position to comment.  But it LGTM from the (few) bits I do understand.
>> 
>> I guess we're leaving the same easter egg for loop optimisation if
>> we ever support early exits, but I'm not sure what to do about that.
>
> We're currently not recording alignment from masked DRs
> (aka DR_IS_CONDITIONAL_IN_STMT), I suppose we'd need to mark
> all stmts after early exits in this way then since in the end
> they will have to be masked on the early exit.
>
> So we _might_ be fine there ...

Yeah, for a pure SVE-like implementation that's probably true.  But we
also have the option of vectorising an early exit by branching if the
condition is true for *any* element, then handling the remaining
elements with an epilogue.
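
A scalar C sketch of that shape, purely for illustration: the name
find_first, the chunk size VF and the inner lane loop (standing in for a
vector compare plus an any-lane test) are all assumptions, not anything
the vectoriser emits today.

```c
#include <stddef.h>

#define VF 4  /* Assumed vectorisation factor for the sketch.  */

/* Return the index of the first element equal to KEY, or N if none.  */
size_t
find_first (const int *a, size_t n, int key)
{
  size_t i = 0;

  /* Main loop: test a whole chunk at once and branch out as soon as
     the exit condition holds for *any* lane.  */
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (size_t lane = 0; lane < VF; lane++)
        any |= (a[i + lane] == key);
      if (any)
        break;
    }

  /* Epilogue: handle the chunk that triggered the exit, or the
     leftover tail, one element at a time.  */
  for (; i < n; i++)
    if (a[i] == key)
      return i;
  return n;
}
```

Nothing after the exit point is executed speculatively here, which is
why this form would not need the following statements to be masked.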

It would be quite a different approach from what we're doing now,
and wouldn't necessarily require up-front if-conversion.  But the
point is that it's theoretically possible, just as whole-function
SLP was theoretically possible (but seemingly some way off) when
this code was written :-)

Thanks,
Richard


Re: [PATCH] vectorizer: Fix up vectorization using WIDEN_MINUS_EXPR [PR102124]

2021-09-01 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> Hi!
>
> The following testcase is miscompiled on aarch64-linux at -O3 since the
> introduction of WIDEN_MINUS_EXPR.
> The problem is if the inner type (half_type) is unsigned and the result
> type in which the subtraction is performed (type) has precision more than
> twice as large as the inner type's precision.
> For other widening operations like WIDEN_{PLUS,MULT}_EXPR, if half_type
> is unsigned, the addition/multiplication result in itype is also unsigned
> and needs to be zero-extended to type.
> But subtraction is special, even when half_type is unsigned, the subtraction
> behaves as signed (also regardless of whether the result type is signed or
> unsigned), 0xfeU - 0xffU is -1 or 0xffffffffU, not 0x0000ffff.
>
> I think it is better not to use mixed signedness of types in
> WIDEN_MINUS_EXPR (have unsigned vector of operands and signed result
> vector), so this patch instead adds another cast to make sure we always
> sign-extend the result from itype to type if type is wider than itype.
>
> Bootstrapped/regtested on aarch64-linux, x86_64-linux and i686-linux, ok
> for trunk/11.3?
>
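
To make the arithmetic concrete, here is a minimal C sketch of the two
ways a 16-bit widened difference can be extended to 32 bits.  The function
names are made up for illustration and are not part of the patch; the
conversion to int16_t relies on GCC's documented modulo behaviour for
out-of-range signed conversions.

```c
#include <stdint.h>

/* Subtract in the intermediate 16-bit type, then zero-extend to 32 bits.
   This models the behaviour that gives the wrong answer for a < b.  */
uint32_t
widen_minus_zero_extend (uint8_t a, uint8_t b)
{
  uint16_t narrow = (uint16_t) ((uint16_t) a - (uint16_t) b);
  return (uint32_t) narrow;
}

/* Subtract in the intermediate 16-bit type, reinterpret the result as
   signed, then sign-extend to 32 bits.  This models the extra cast.  */
uint32_t
widen_minus_sign_extend (uint8_t a, uint8_t b)
{
  uint16_t narrow = (uint16_t) ((uint16_t) a - (uint16_t) b);
  return (uint32_t) (int32_t) (int16_t) narrow;
}
```

For a >= b both functions agree; they only differ when the 16-bit
difference wraps around.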
> 2021-08-31  Jakub Jelinek  
>
>   PR tree-optimization/102124
>   * tree-vect-patterns.c (vect_recog_widen_op_pattern): For ORIG_CODE
>   MINUS_EXPR, if itype is unsigned with smaller precision than type,
>   add an extra cast to signed variant of itype to ensure sign-extension.
>
>   * gcc.dg/torture/pr102124.c: New test.

LGTM.  I was wondering whether it would be better to add a new
non-default parameter to select this behaviour, so that any new
callers have to think about it too.  It would also be more consistent
with the shift_p parameter.

Either way's fine though, so the patch is OK as-is.

Thanks,
Richard

> --- gcc/tree-vect-patterns.c.jj   2021-08-17 21:05:07.0 +0200
> +++ gcc/tree-vect-patterns.c  2021-08-30 11:54:03.651474845 +0200
> @@ -1268,11 +1268,31 @@ vect_recog_widen_op_pattern (vec_info *v
>/* Check target support  */
>tree vectype = get_vectype_for_scalar_type (vinfo, half_type);
>tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree ctype = itype;
> +  tree vecctype = vecitype;
> +  if (orig_code == MINUS_EXPR
> +  && TYPE_UNSIGNED (itype)
> +  && TYPE_PRECISION (type) > TYPE_PRECISION (itype))
> +{
> +  /* Subtraction is special, even if half_type is unsigned and no matter
> +  whether type is signed or unsigned, if type is wider than itype,
> +  we need to sign-extend from the widening operation result to the
> +  result type.
> +  Consider half_type unsigned char, operand 1 0xfe, operand 2 0xff,
> +  itype unsigned short and type either int or unsigned int.
> +  Widened (unsigned short) 0xfe - (unsigned short) 0xff is
> +  (unsigned short) 0xffff, but for type int we want the result -1
> +  and for type unsigned int 0xffffffff rather than 0xffff.  */
> +  ctype = build_nonstandard_integer_type (TYPE_PRECISION (itype), 0);
> +  vecctype = get_vectype_for_scalar_type (vinfo, ctype);
> +}
> +
>enum tree_code dummy_code;
>int dummy_int;
>auto_vec dummy_vec;
>if (!vectype
>|| !vecitype
> +  || !vecctype
>|| !supportable_widening_operation (vinfo, wide_code, last_stmt_info,
> vecitype, vectype,
> &dummy_code, &dummy_code,
> @@ -1291,8 +1311,12 @@ vect_recog_widen_op_pattern (vec_info *v
>gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> oprnd[0], oprnd[1]);
>  
> +  if (vecctype != vecitype)
> +pattern_stmt = vect_convert_output (vinfo, last_stmt_info, ctype,
> + pattern_stmt, vecitype);
> +
>return vect_convert_output (vinfo, last_stmt_info,
> -   type, pattern_stmt, vecitype);
> +   type, pattern_stmt, vecctype);
>  }
>  
>  /* Try to detect multiplication on widened inputs, converting MULT_EXPR
> --- gcc/testsuite/gcc.dg/torture/pr102124.c.jj2021-08-30 
> 12:08:05.838649133 +0200
> +++ gcc/testsuite/gcc.dg/torture/pr102124.c   2021-08-30 12:07:52.669834031 
> +0200
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/102124 */
> +
> +int
> +foo (const unsigned char *a, const unsigned char *b, unsigned long len)
> +{
> +  int ab, ba; 
> +  unsigned long i;
> +  for (i = 0, ab = 0, ba = 0; i < len; i++)
> +{
> +  ab |= a[i] - b[i];
> +  ba |= b[i] - a[i];
> +}   
> +  return (ab | ba) >= 0;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned char a[32] = { 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 
> 'a', 'a', 'a', 'a', 'a', 'a' };
> +  unsigned char b[32] = { 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 
> 'a', 'a', 'a', 'a', 'a', 'a' };
> +  unsigned char c[32] = { 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 
> 'b', 'b', 'b', 'b', 'b', 'b' };
> +  if (!foo (a, b,

Re: [PATCH] tree-optimization/102139 - fix SLP DR base alignment

2021-09-01 Thread Richard Biener via Gcc-patches
On Wed, 1 Sep 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
> >  wrote:
> >>
> >> When doing whole-function SLP we have to make sure the recorded
> >> base alignments we compute as the maximum alignment seen for a
> >> base anywhere in the function is actually valid at the point
> >> we want to make use of it.
> 
> Ah, yeah, the danger of optimisations that silently rely on the
> then-current restrictions :-(

Yeah.

> >> To make this work we now record the stmt the alignment was derived
> >> from in addition to the DRs innermost behavior and we use a
> >> dominance check to verify the recorded info is valid when doing
> >> BB vectorization.
> >>
> >> Note this leaves a small(?) hole for the case where we have sth
> >> like
> >>
> >> unaligned DR
> >> call (); // does not return
> >> aligned DR
> >>
> >> since we'll derive an aligned access for the earlier DR but the
> >> later DR is never actually reached since the call does not
> >> return.  To plug this hole one option (for the easy backporting)
> >> would be to simply not use the base-alignment recording at all.
> >> Alternatively we'd have to store the dataref grouping 'id' somewhere
> >> in the DR itself and use that to handle this particular case.
> >
> > It turns out this isn't too difficult so the following is a patch adjusted
> > to cover that case together with a testcase.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> TBH I know nothing about this group id scheme, so I'm not really
> in a position to comment.  But it LGTM from the (few) bits I do understand.
> 
> I guess we're leaving the same easter egg for loop optimisation if
> we ever support early exits, but I'm not sure what to do about that.

We're currently not recording alignment from masked DRs
(aka DR_IS_CONDITIONAL_IN_STMT), I suppose we'd need to mark
all stmts after early exits in this way then since in the end
they will have to be masked on the early exit.

So we _might_ be fine there ...

Pushed.

Thanks,
Richard.

> Thanks,
> Richard
> 
> >
> > Thanks,
> > Richard.
> >
> > 2021-08-31  Richard Biener  
> >
> > PR tree-optimization/102139
> > * tree-vectorizer.h (vec_base_alignments): Adjust hash-map
> > type to record a std::pair of the stmt-info and the innermost
> > loop behavior.
> > (dr_vec_info::group): New member.
> > * tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
> > (vect_compute_data_ref_alignment): Verify the recorded
> > base alignment can be used.
> > (data_ref_pair): Remove.
> > (dr_group_sort_cmp): Adjust.
> > (vect_analyze_data_ref_accesses): Store the group-ID in the
> > dr_vec_info and operate on a vector of dr_vec_infos.
> >
> > * gcc.dg/torture/pr102139.c: New testcase.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] bswap: Fix up bswap_view_convert handling [PR102141]

2021-09-01 Thread Richard Biener via Gcc-patches
On Wed, 1 Sep 2021, Jakub Jelinek wrote:

> Hi!
> 
> bswap_view_convert is used twice in spots where gsi_insert_before is the
> right thing, but in the last one it wants to insert preparation stmts
> for the VIEW_CONVERT_EXPR emitted with gsi_insert_after, where at the
> gsi we still need to insert bswap_stmt and maybe mask_stmt whose lhs
> the preparation stmts will use.
> So, this patch adds a BEFORE argument to the function and emits the
> preparation statements before or after depending on that.
> 
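
A toy C model of the two insertion modes may help to see why the "after"
case wants GSI_NEW_STMT.  Everything here (stmt_seq, stmt_iter and the two
helpers) is invented for the sketch and only mimics how the real iterator
moves; it is not GCC's gimple_stmt_iterator API.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STMTS 16  /* Sketch only: callers must not exceed this.  */

struct stmt_seq
{
  int stmts[MAX_STMTS];
  size_t count;
};

struct stmt_iter
{
  struct stmt_seq *seq;
  size_t pos;  /* Index of the statement the iterator points at.  */
};

/* Insert STMT before the iterator; the iterator keeps pointing at the
   same statement as before (the GSI_SAME_STMT behaviour).  */
void
insert_before_same (struct stmt_iter *it, int stmt)
{
  struct stmt_seq *s = it->seq;
  memmove (&s->stmts[it->pos + 1], &s->stmts[it->pos],
           (s->count - it->pos) * sizeof (int));
  s->stmts[it->pos] = stmt;
  s->count++;
  it->pos++;
}

/* Insert STMT after the iterator and move the iterator onto the new
   statement (the GSI_NEW_STMT behaviour).  */
void
insert_after_new (struct stmt_iter *it, int stmt)
{
  struct stmt_seq *s = it->seq;
  size_t at = it->pos + 1;
  memmove (&s->stmts[at + 1], &s->stmts[at],
           (s->count - at) * sizeof (int));
  s->stmts[at] = stmt;
  s->count++;
  it->pos = at;
}
```

In both modes a run of preparation statements ends up in emission order;
inserting after without advancing the iterator would reverse them.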
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2021-09-01  Jakub Jelinek  
> 
>   PR tree-optimization/102141
>   * gimple-ssa-store-merging.c (bswap_view_convert): Add BEFORE
>   argument.  If false, emit stmts after gsi instead of before, and
>   with GSI_NEW_STMT.
>   (bswap_replace): Adjust callers.  When converting output of bswap,
>   emit VIEW_CONVERT preparation stmts after a copy of gsi instead
>   of before it.
> 
>   * gcc.dg/pr102141.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj 2021-08-23 11:54:03.319505682 +0200
> +++ gcc/gimple-ssa-store-merging.c2021-08-31 12:26:08.347127224 +0200
> @@ -1020,7 +1020,8 @@ public:
> first.  */
>  
>  static tree
> -bswap_view_convert (gimple_stmt_iterator *gsi, tree type, tree val)
> +bswap_view_convert (gimple_stmt_iterator *gsi, tree type, tree val,
> + bool before)
>  {
>gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (val))
> || POINTER_TYPE_P (TREE_TYPE (val)));
> @@ -1032,12 +1033,18 @@ bswap_view_convert (gimple_stmt_iterator
> gimple *g
>   = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
>  NOP_EXPR, val);
> -   gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +   if (before)
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +   else
> + gsi_insert_after (gsi, g, GSI_NEW_STMT);
> val = gimple_assign_lhs (g);
>   }
>tree itype = build_nonstandard_integer_type (prec, 1);
>gimple *g = gimple_build_assign (make_ssa_name (itype), NOP_EXPR, val);
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  if (before)
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  else
> + gsi_insert_after (gsi, g, GSI_NEW_STMT);
>val = gimple_assign_lhs (g);
>  }
>return build1 (VIEW_CONVERT_EXPR, type, val);
> @@ -1165,7 +1172,8 @@ bswap_replace (gimple_stmt_iterator gsi,
> gimple_set_vuse (load_stmt, n->vuse);
> gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);
> if (conv_code == VIEW_CONVERT_EXPR)
> - val_tmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), val_tmp);
> + val_tmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), val_tmp,
> +   true);
> gimple_assign_set_rhs_with_ops (&gsi, conv_code, val_tmp);
> update_stmt (cur_stmt);
>   }
> @@ -1209,7 +1217,7 @@ bswap_replace (gimple_stmt_iterator gsi,
> if (!is_gimple_val (src))
>   return NULL_TREE;
> if (conv_code == VIEW_CONVERT_EXPR)
> - src = bswap_view_convert (&gsi, TREE_TYPE (tgt), src);
> + src = bswap_view_convert (&gsi, TREE_TYPE (tgt), src, true);
> g = gimple_build_assign (tgt, conv_code, src);
>   }
>else if (cur_stmt)
> @@ -1296,14 +1304,13 @@ bswap_replace (gimple_stmt_iterator gsi,
>/* Convert the result if necessary.  */
>if (!useless_type_conversion_p (TREE_TYPE (tgt), bswap_type))
>  {
> -  gimple *convert_stmt;
> -
>tmp = make_temp_ssa_name (bswap_type, NULL, "bswapdst");
>tree atmp = tmp;
> +  gimple_stmt_iterator gsi2 = gsi;
>if (conv_code == VIEW_CONVERT_EXPR)
> - atmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), tmp);
> -  convert_stmt = gimple_build_assign (tgt, conv_code, atmp);
> -  gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
> + atmp = bswap_view_convert (&gsi2, TREE_TYPE (tgt), tmp, false);
> +  gimple *convert_stmt = gimple_build_assign (tgt, conv_code, atmp);
> +  gsi_insert_after (&gsi2, convert_stmt, GSI_SAME_STMT);
>  }
>  
>gimple_set_lhs (mask_stmt ? mask_stmt : bswap_stmt, tmp);
> --- gcc/testsuite/gcc.dg/pr102141.c.jj2021-08-31 12:38:22.271783023 
> +0200
> +++ gcc/testsuite/gcc.dg/pr102141.c   2021-08-31 12:38:01.805071490 +0200
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/102141 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +unsigned int __attribute__((__vector_size__ (4))) v;
> +
> +void
> +foo (unsigned long long x)
> +{
> +  v &= (unsigned) (x >> 56 | x >> 40 & 0xff00);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH v3] md/define_c_enum: support value assignation

2021-09-01 Thread Richard Sandiford via Gcc-patches
YunQiang Su  writes:
> Currently, the enums from define_c_enum and define_enum can only
> have consecutive values starting from 0.
>
> In fact we can support C-like behaviour, e.g.
>   (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
> then we can get
>   enum mips_isa {
> MIPS_ISA_MIPS1 = 1,
> MIPS_ISA_MIPS2 = 2,
> MIPS_ISA_MIPS32 = 32,
> MIPS_ISA_MIPS32R2 = 33
>   };
>
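
The numbering rule described above can be sketched in plain C as follows.
The struct enum_entry and assign_enum_values names are hypothetical and
only model the cur_value bookkeeping; they are not the read-md.c code.

```c
#include <stddef.h>

/* One entry of a define_enum-style list: a name with an optional
   explicit value (hypothetical type for this sketch).  */
struct enum_entry
{
  const char *name;
  int has_value;       /* Nonzero if an explicit value was given.  */
  unsigned int value;  /* Only meaningful when has_value is set.  */
};

/* Assign a value to each of the N entries, C-enum style: an explicit
   value resets the counter, every other entry takes previous + 1.
   START is the value used if the first entry has no explicit value.  */
void
assign_enum_values (const struct enum_entry *entries, size_t n,
                    unsigned int start, unsigned int *out)
{
  unsigned int cur_value = start;
  for (size_t i = 0; i < n; i++)
    {
      if (entries[i].has_value)
        cur_value = entries[i].value;
      out[i] = cur_value;
      cur_value++;
    }
}
```

Feeding it the mips_isa list from the example, (mips1 1) mips2 (mips32 32)
mips32r2, gives 1, 2, 32 and 33.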
> gcc/ChangeLog:
>   * read-md.c (md_reader::handle_enum): support value assignation.
>   * doc/md.texi: record define_c_enum value assignation support.
> ---
>  gcc/doc/md.texi |  4 
>  gcc/read-md.c   | 21 +
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index f8047aefc..2b41cb7fb 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -11074,6 +11074,8 @@ The syntax is as follows:
>  (define_c_enum "@var{name}" [
>@var{value0}
>@var{value1}
> +  (@var{value32} 32)
> +  @var{value33}
>@dots{}
>@var{valuen}
>  ])
> @@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
>  enum @var{name} @{
>@var{value0} = 0,
>@var{value1} = 1,
> +  @var{value32} = 32,
> +  @var{value33} = 33,
>@dots{}
>@var{valuen} = @var{n}
>  @};
> diff --git a/gcc/read-md.c b/gcc/read-md.c
> index bb419e0f6..0fbe924d1 100644
> --- a/gcc/read-md.c
> +++ b/gcc/read-md.c
> @@ -902,7 +902,8 @@ void
>  md_reader::handle_enum (file_location loc, bool md_p)
>  {
>char *enum_name, *value_name;
> -  struct md_name name;
> +  unsigned int cur_value;
> +  struct md_name name, value;
>struct enum_type *def;
>struct enum_value *ev;
>void **slot;
> @@ -928,6 +929,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
>*slot = def;
>  }
>  
> +  cur_value = def->num_values;
>require_char_ws ('[');
>  
>while ((c = read_skip_spaces ()) != ']')
> @@ -937,8 +939,18 @@ md_reader::handle_enum (file_location loc, bool md_p)
> error_at (loc, "unterminated construct");
> exit (1);
>   }
> -  unread_char (c);
> -  read_name (&name);
> +  if (c == '(')
> + {
> +   read_name (&name);
> +   read_name (&value);
> +   require_char_ws (')');
> +   cur_value = atoi(value.string);

Minor formatting nit, sorry, but: should be a space after atoi.

OK for trunk with that change, thanks.

Richard

> + }
> +  else
> + {
> +   unread_char (c);
> +   read_name (&name);
> + }
>  
>ev = XNEW (struct enum_value);
>ev->next = 0;
> @@ -954,11 +966,12 @@ md_reader::handle_enum (file_location loc, bool md_p)
> ev->name = value_name;
>   }
>ev->def = add_constant (get_md_constants (), value_name,
> -   md_decimal_string (def->num_values), def);
> +   md_decimal_string (cur_value), def);
>  
>*def->tail_ptr = ev;
>def->tail_ptr = &ev->next;
>def->num_values++;
> +  cur_value++;
>  }
>  }


[PATCH]AArch64 Make use of FADDP in simple reductions.

2021-09-01 Thread Tamar Christina via Gcc-patches
Hi All,

This is a respin of an older patch which was never reviewed upstream by a
maintainer.  It's been updated to fit the current GCC codegen.

This patch adds a pattern to support the (F)ADDP (scalar) instruction.

Before the patch, the C code

typedef float v4sf __attribute__((vector_size (16)));

float
foo1 (v4sf x)
{
  return x[0] + x[1];
}

generated:

foo1:
dup s1, v0.s[1]
fadd    s0, s1, s0
ret

After patch:
foo1:
faddp   s0, v0.2s
ret

The double case is now handled by SLP but the remaining cases still need help
from combine.  I have kept the integer and floating-point patterns separate
because the integer one only supports V2DI, and sharing it with the float one
would have required defining a few new iterators for just a single use.

I provide support for when both elements are subregs as a different pattern
as there's no way to tell reload that the two registers must be equal with
just constraints.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_faddp_scalar,
*aarch64_addp_scalarv2di, *aarch64_faddp_scalar2,
*aarch64_addp_scalar2v2di): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/scalar_faddp.c: New test.
* gcc.target/aarch64/simd/scalar_faddp2.c: New test.
* gcc.target/aarch64/simd/scalar_addp.c: New test.

Co-authored-by: Tamar Christina 

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
6814dae079c9ff40aaa2bb625432bf9eb8906b73..b49f8b79b11cbb1888c503d9a9384424f44bde05
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3414,6 +3414,70 @@ (define_insn "aarch64_faddp"
   [(set_attr "type" "neon_fp_reduc_add_")]
 )
 
+;; For the case where both operands are a subreg we need to use a
+;; match_dup since reload cannot enforce that the registers are
+;; the same with a constraint in this case.
+(define_insn "*aarch64_faddp_scalar2"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (plus:
+ (vec_select:
+   (match_operator: 1 "subreg_lowpart_operator"
+ [(match_operand:VHSDF 2 "register_operand" "w")])
+   (parallel [(match_operand 3 "const_int_operand" "n")]))
+ (match_dup: 2)))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (, INTVAL (operands[3])) == 1"
+  "faddp\t%0, %2.2"
+  [(set_attr "type" "neon_fp_reduc_add_")]
+)
+
+(define_insn "*aarch64_faddp_scalar"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (plus:
+ (vec_select:
+   (match_operand:VHSDF 1 "register_operand" "w")
+   (parallel [(match_operand 2 "const_int_operand" "n")]))
+ (match_operand: 3 "register_operand" "1")))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (, INTVAL (operands[2])) == 1
+   && SUBREG_P (operands[3]) && !SUBREG_P (operands[1])
+   && subreg_lowpart_p (operands[3])"
+  "faddp\t%0, %1.2"
+  [(set_attr "type" "neon_fp_reduc_add_")]
+)
+
+;; For the case where both operands are a subreg we need to use a
+;; match_dup since reload cannot enforce that the registers are
+;; the same with a constraint in this case.
+(define_insn "*aarch64_addp_scalar2v2di"
+  [(set (match_operand:DI 0 "register_operand" "=w")
+   (plus:DI
+ (vec_select:DI
+   (match_operator:DI 1 "subreg_lowpart_operator"
+ [(match_operand:V2DI 2 "register_operand" "w")])
+   (parallel [(match_operand 3 "const_int_operand" "n")]))
+ (match_dup:DI 2)))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (2, INTVAL (operands[3])) == 1"
+  "addp\t%d0, %2.2d"
+  [(set_attr "type" "neon_reduc_add_long")]
+)
+
+(define_insn "*aarch64_addp_scalarv2di"
+  [(set (match_operand:DI 0 "register_operand" "=w")
+   (plus:DI
+ (vec_select:DI
+   (match_operand:V2DI 1 "register_operand" "w")
+   (parallel [(match_operand 2 "const_int_operand" "n")]))
+ (match_operand:DI 3 "register_operand" "1")))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (2, INTVAL (operands[2])) == 1
+   && SUBREG_P (operands[3]) && !SUBREG_P (operands[1])
+   && subreg_lowpart_p (operands[3])"
+  "addp\t%d0, %1.2d"
+  [(set_attr "type" "neon_reduc_add_long")]
+)
+
 (define_insn "aarch64_reduc_plus_internal"
  [(set (match_operand:VDQV 0 "register_operand" "=w")
(unspec:VDQV [(match_operand:VDQV 1 "register_operand" "w")]
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/scalar_addp.c 
b/gcc/testsuite/gcc.target/aarch64/simd/scalar_addp.c
new file mode 100644
index 
..ab904ca6a6392a3a068615f68e6b76c0716344ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/scalar_addp.c
@@ -0,0 +1,11 @@
+/* { dg-do assemble } */
+/* { dg-additional-options "-save-temps -O1 -std=c99" } */
+
+typedef long long v2di __attribute__((vector_size (16)));
+
+long long
+foo (v2di x)
+{
+  return x[1] + x[0

Re: [PATCH] tree-optimization/102139 - fix SLP DR base alignment

2021-09-01 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
>  wrote:
>>
>> When doing whole-function SLP we have to make sure the recorded
>> base alignments we compute as the maximum alignment seen for a
>> base anywhere in the function is actually valid at the point
>> we want to make use of it.

Ah, yeah, the danger of optimisations that silently rely on the
then-current restrictions :-(

>> To make this work we now record the stmt the alignment was derived
>> from in addition to the DRs innermost behavior and we use a
>> dominance check to verify the recorded info is valid when doing
>> BB vectorization.
>>
>> Note this leaves a small(?) hole for the case where we have sth
>> like
>>
>> unaligned DR
>> call (); // does not return
>> aligned DR
>>
>> since we'll derive an aligned access for the earlier DR but the
>> later DR is never actually reached since the call does not
>> return.  To plug this hole one option (for the easy backporting)
>> would be to simply not use the base-alignment recording at all.
>> Alternatively we'd have to store the dataref grouping 'id' somewhere
>> in the DR itself and use that to handle this particular case.
>
> It turns out this isn't too difficult so the following is a patch adjusted
> to cover that case together with a testcase.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?

TBH I know nothing about this group id scheme, so I'm not really
in a position to comment.  But it LGTM from the (few) bits I do understand.

I guess we're leaving the same easter egg for loop optimisation if
we ever support early exits, but I'm not sure what to do about that.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 2021-08-31  Richard Biener  
>
> PR tree-optimization/102139
> * tree-vectorizer.h (vec_base_alignments): Adjust hash-map
> type to record a std::pair of the stmt-info and the innermost
> loop behavior.
> (dr_vec_info::group): New member.
> * tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
> (vect_compute_data_ref_alignment): Verify the recorded
> base alignment can be used.
> (data_ref_pair): Remove.
> (dr_group_sort_cmp): Adjust.
> (vect_analyze_data_ref_accesses): Store the group-ID in the
> dr_vec_info and operate on a vector of dr_vec_infos.
>
> * gcc.dg/torture/pr102139.c: New testcase.


[PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-01 Thread Richard Biener via Gcc-patches
This fixes the CFG walk order of fill_always_executed_in to use
RPO order rather than the dominator based order computed by
get_loop_body_in_dom_order.  That fixes correctness issues with
unordered dominator children.

The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
its for-iteration mode is a good match for the algorithm.
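
As a rough illustration of why an RPO walk suits this, here is a minimal
sketch of a plain DFS-based reverse post-order over a toy CFG.  It is not
rev_post_order_and_mark_dfs_back_seme; struct cfg, dfs_post_order and
compute_rpo are hypothetical names made up for the example.  The property
it shows is that every block is visited after all of its predecessors,
except along back edges.

```c
#include <assert.h>

enum { MAX_BLOCKS = 8, MAX_SUCCS = 2 };

/* Toy CFG (hypothetical, not GCC's data structures): blocks are
   numbered 0..n_blocks-1 and each has at most MAX_SUCCS successors.  */
struct cfg
{
  int n_blocks;
  int n_succs[MAX_BLOCKS];
  int succs[MAX_BLOCKS][MAX_SUCCS];
};

/* Depth-first walk from BB, appending each block to POST once all of
   its successors have been processed.  */
static void
dfs_post_order (const struct cfg *g, int bb, int *visited,
                int *post, int *n_post)
{
  visited[bb] = 1;
  for (int i = 0; i < g->n_succs[bb]; i++)
    {
      int dest = g->succs[bb][i];
      if (!visited[dest])
        dfs_post_order (g, dest, visited, post, n_post);
    }
  post[(*n_post)++] = bb;
}

/* Store the reverse post-order of the blocks reachable from ENTRY in
   RPO and return how many blocks were written.  */
int
compute_rpo (const struct cfg *g, int entry, int *rpo)
{
  int visited[MAX_BLOCKS] = { 0 };
  int post[MAX_BLOCKS];
  int n_post = 0;

  assert (g->n_blocks <= MAX_BLOCKS && entry >= 0 && entry < g->n_blocks);
  dfs_post_order (g, entry, visited, post, &n_post);
  for (int i = 0; i < n_post; i++)
    rpo[i] = post[n_post - 1 - i];
  return n_post;
}
```

For a loop with header 1, a diamond 2/3 joining in latch 4, and exit 5
(edges 0->1, 1->2, 1->3, 2->4, 3->4, 4->1, 4->5), compute_rpo yields
0 1 3 2 4 5: both arms of the diamond come before the join block, which
a walk over unordered dominator children does not guarantee.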

Xionghu, I've tried to only fix the CFG walk order issue and not
change anything else with this so we have a more correct base
to work against.  The code still walks inner loop bodies
up to loop depth times and thus is quadratic in the loop depth.

Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
have any comments I plan to push this and then revisit what we
were circling around.

Richard.

2021-09-01  Richard Biener  

PR tree-optimization/102155
* tree-ssa-loop-im.c (fill_always_executed_in_1): Iterate
over a part of the RPO array and do not recurse here.
Dump blocks marked as always executed.
(fill_always_executed_in): Walk over the RPO array and
process loops whose header we run into.
(loop_invariant_motion_in_fun): Compute the first RPO
using rev_post_order_and_mark_dfs_back_seme in iteration
order and pass that to fill_always_executed_in.
---
 gcc/tree-ssa-loop-im.c | 136 ++---
 1 file changed, 73 insertions(+), 63 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..f3706dcdb8a 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3025,77 +3025,74 @@ do_store_motion (void)
 /* Fills ALWAYS_EXECUTED_IN information for basic blocks of LOOP, i.e.
for each such basic block bb records the outermost loop for that execution
of its header implies execution of bb.  CONTAINS_CALL is the bitmap of
-   blocks that contain a nonpure call.  */
+   blocks that contain a nonpure call.  The blocks of LOOP start at index
+   START of the RPO array of size N.  */
 
 static void
-fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
+fill_always_executed_in_1 (function *fun, class loop *loop,
+  int *rpo, int start, int n, sbitmap contains_call)
 {
-  basic_block bb = NULL, *bbs, last = NULL;
-  unsigned i;
-  edge e;
+  basic_block last = NULL;
   class loop *inn_loop = loop;
 
-  if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
+  for (int i = start; i < n; i++)
 {
-  bbs = get_loop_body_in_dom_order (loop);
-
-  for (i = 0; i < loop->num_nodes; i++)
-   {
- edge_iterator ei;
- bb = bbs[i];
-
- if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
-   last = bb;
+  basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i]);
+  /* Stop when we iterated over all blocks in this loop.  */
+  if (!flow_bb_inside_loop_p (loop, bb))
+   break;
 
- if (bitmap_bit_p (contains_call, bb->index))
-   break;
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+   last = bb;
 
- FOR_EACH_EDGE (e, ei, bb->succs)
-   {
- /* If there is an exit from this BB.  */
- if (!flow_bb_inside_loop_p (loop, e->dest))
-   break;
- /* Or we enter a possibly non-finite loop.  */
- if (flow_loop_nested_p (bb->loop_father,
- e->dest->loop_father)
- && ! finite_loop_p (e->dest->loop_father))
-   break;
-   }
- if (e)
-   break;
+  if (bitmap_bit_p (contains_call, bb->index))
+   break;
 
- /* A loop might be infinite (TODO use simple loop analysis
-to disprove this if possible).  */
- if (bb->flags & BB_IRREDUCIBLE_LOOP)
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   {
+ /* If there is an exit from this BB.  */
+ if (!flow_bb_inside_loop_p (loop, e->dest))
break;
-
- if (!flow_bb_inside_loop_p (inn_loop, bb))
+ /* Or we enter a possibly non-finite loop.  */
+ if (flow_loop_nested_p (bb->loop_father,
+ e->dest->loop_father)
+ && ! finite_loop_p (e->dest->loop_father))
break;
+   }
+  if (e)
+   break;
 
- if (bb->loop_father->header == bb)
-   {
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
-   break;
+  /* A loop might be infinite (TODO use simple loop analysis
+to disprove this if possible).  */
+  if (bb->flags & BB_IRREDUCIBLE_LOOP)
+   break;
 
- /* In a loop that is always entered we may proceed anyway.
-But record that we entered it and stop once we leave it.  */
- inn_loop = bb->loop_father;
-   }
-   }
+  if (!flow_bb_inside_loop_p (inn_loop, bb))
+   break;
 
-  while (1)
+  if (bb->loop_father->header == bb)

Re: [PATCH] libstdc++-v3: Check for TLS support on mingw

2021-09-01 Thread Jonathan Wakely via Gcc-patches
On Wed, 1 Sept 2021 at 02:44, Jonathan Yong <10wa...@gmail.com> wrote:
>
> On 8/31/21 9:02 AM, Jonathan Wakely wrote:
> > It looks like my questions about this patch never got an answer, and
> > it never got applied.
> >
> > Could somebody say whether TLS is enabled for native *-*-mingw*
> > builds? If it is, then we definitely need to add GCC_CHECK_TLS to the
> > cross-compiler config too.
> >
> > For a linux-hosted x86_64-w64-mingw32 cross compiler I see TLS is not 
> > enabled:
> >
> > /* Define to 1 if the target supports thread-local storage. */
> > /* #undef _GLIBCXX_HAVE_TLS */
> >
> >
> >
> >
> > On Mon, 19 Feb 2018 at 08:59, Hugo Beauzée-Luyssen  wrote:
> >>
> >> libstdc++-v3: Check for TLS support on mingw
> >>
> >> 2018-02-16  Hugo Beauzée-Luyssen  
> >>
> >>  * crossconfig.m4: Check for TLS support on mingw
> >>  * configure: regenerate
> >>
> >> Index: libstdc++-v3/crossconfig.m4
> >> ===
> >> --- libstdc++-v3/crossconfig.m4 (revision 257730)
> >> +++ libstdc++-v3/crossconfig.m4 (working copy)
> >> @@ -197,6 +197,7 @@ case "${host}" in
> >>   GLIBCXX_CHECK_LINKER_FEATURES
> >>   GLIBCXX_CHECK_MATH_SUPPORT
> >>   GLIBCXX_CHECK_STDLIB_SUPPORT
> >> +GCC_CHECK_TLS
> >>   ;;
> >> *-netbsd*)
> >>   SECTION_FLAGS='-ffunction-sections -fdata-sections'
>
> According to MSYS2 native from
> https://mirror.msys2.org/mingw/ucrt64/mingw-w64-ucrt-x86_64-gcc-10.3.0-5-any.pkg.tar.zst:
>
> x86_64-w64-mingw32/bits/c++config.h:#define _GLIBCXX_HAVE_TLS 1
>
> So yes.

Thanks! I'll test the patch on a cross-compiler and apply it soon then.

(Thanks also to LH for the answer)


Re: Simplify 'gcc/tree.c:walk_tree_1' handling of 'OMP_CLAUSE' (was: Fix PR 25886. Convert OMP_CLAUSE_* into sub-codes.)

2021-09-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 31, 2021 at 08:51:16PM +0200, Thomas Schwinge wrote:
>   gcc/
>   * tree.c (walk_tree_1) : Simplify.

:

And you don't mention the omp_clause_num_ops and build_omp_clause comment
changes in the ChangeLog.
Otherwise LGTM, thanks.

Jakub



Re: [PATCH] Add MIPS Linux support to gcc.misc-tests/linkage.c (testsuite/51748)

2021-09-01 Thread YunQiang Su
Richard Sandiford via Gcc-patches 
wrote on Wed, Sep 1, 2021 at 4:55 PM:
>
> apinski--- via Gcc-patches  writes:
> > From: Andrew Pinski 
> >
> > This adds MIPS Linux support to gcc.misc-tests/linkage.exp.  Basically
> > copying what was done for MIPS IRIX and changing the options to be correct.
> >
> > OK?
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR testsuite/51748
> >   * gcc.misc-tests/linkage.exp: Add mips*-linux-* support.
>
> OK, thanks.  Searching for any match for 64 seems surprisingly general,
> but it's what other cases do and has obviously stood the test of time.
>

syq@XX:~$ gcc -mips64r2 -mabi=64 -c zz.c && file zz.o
zz.o: ELF 64-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped
syq@XX:~$ gcc -mips64r2 -mabi=32 -c zz.c && file zz.o
zz.o: ELF 32-bit LSB relocatable, MIPS, MIPS64 rel2 version 1 (SYSV),
not stripped
syq@XX:~$ gcc -mips64r2 -mabi=n32 -c zz.c && file zz.o
zz.o: ELF 32-bit LSB relocatable, MIPS, N32 MIPS64 rel2 version 1
(SYSV), not stripped

At first glance, I also thought the code was wrong.  But after some
checking, it does work.
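
To spell out why the broad "*64*" match is harmless, here is a small C
sketch of the same check sequence.  It only mirrors the three Tcl
"string match" tests from the patch as substring searches;
mips_native_cflags is a hypothetical helper, not part of the testsuite.
The checks are not exclusive, so the last one that matches wins.

```c
#include <assert.h>
#include <string.h>

/* Mirror of the three checks in linkage.exp, as a sketch.  Every
   check that matches overwrites the previous result, so "ELF 32"
   overrides the "64" that comes from "MIPS64", and "N32" overrides
   "ELF 32".  */
const char *
mips_native_cflags (const char *file_output)
{
  const char *cflags = "";

  assert (file_output != NULL);
  if (strstr (file_output, "64"))
    cflags = "-mabi=64";
  if (strstr (file_output, "ELF 32"))
    cflags = "-mabi=32";
  if (strstr (file_output, "N32"))
    cflags = "-mabi=n32";
  return cflags;
}
```

Feeding it the three "file" outputs above gives -mabi=64, -mabi=32 and
-mabi=n32 respectively, even though all three lines contain "64".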

> Richard
>
> > ---
> >  gcc/testsuite/gcc.misc-tests/linkage.exp | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.misc-tests/linkage.exp b/gcc/testsuite/gcc.misc-tests/linkage.exp
> > index afed2b811c9..2cb109e776e 100644
> > --- a/gcc/testsuite/gcc.misc-tests/linkage.exp
> > +++ b/gcc/testsuite/gcc.misc-tests/linkage.exp
> > @@ -38,6 +38,18 @@ if { [isnative] && ![is_remote host] } then {
> >
> >   # Need to ensure ABI for native compiler matches gcc
> >   set native_cflags ""
> > + if  [istarget "mips*-linux*"] {
> > + set file_string [exec file "linkage-x.o"]
> > + if [ string match "*64*" $file_string ] {
> > + set native_cflags "-mabi=64"
> > + }
> > + if [ string match "*ELF 32*" $file_string ] {
> > + set native_cflags "-mabi=32"
> > + }
> > + if [ string match "*N32*" $file_string ] {
> > + set native_cflags "-mabi=n32"
> > + }
> > + }
> >   if  [istarget "sparc*-sun-solaris2*"] {
> >   set file_string [exec file "linkage-x.o"]
> >   if [ string match "*64*" $file_string ] {


[PATCH] graph output: use better colors for edges

2021-09-01 Thread Martin Liška

This patch improves coloring of graph dumps, as can be seen here:
https://splichal.eu/tmp/example.svg

Ready to be installed once it finishes tests?
Thanks,
Martin

gcc/ChangeLog:

* graph.c (draw_cfg_node_succ_edges): Do not color fallthru
  edges and rather use colors for TRUE and FALSE edges.
---
 gcc/graph.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index ce8de33ffe1..9acd1d5b95e 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -133,10 +133,11 @@ draw_cfg_node_succ_edges (pretty_printer *pp, int funcdef_no, basic_block bb)
  weight = 10;
}
   else if (e->flags & EDGE_FALLTHRU)
-   {
- color = "blue";
- weight = 100;
-   }
+   weight = 100;
+  else if (e->flags & EDGE_TRUE_VALUE)
+   color = "forestgreen";
+  else if (e->flags & EDGE_FALSE_VALUE)
+   color = "darkorange";
 
   if (e->flags & EDGE_ABNORMAL)

color = "red";
--
2.33.0



Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-09-01 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 01, 2021 at 09:06:31AM +0200, Christophe Lyon wrote:
> > >   * gfortran.dg/gomp/target-device-ancestor-4.f90: New test.
> >
> 
> The last new test fails on aarch64:
>  /gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90:7:15: Error:
> Sorry, 'reverse_offload' clause at (1) on REQUIRES directive is not yet
> supported
> compiler exited with status 1
> PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O   (test for
> errors, line 7)
> XFAIL: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  sorry,
> unimplemented: 'ancestor' not yet supported (test for warnings, line 9)
> PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  (test for excess
> errors)
> gfortran.dg/gomp/target-device-ancestor-4.f90   -O  : dump file does not
> exist
> UNRESOLVED: gfortran.dg/gomp/target-device-ancestor-4.f90   -O
> scan-tree-dump original "pragma omp target [^\n\r)]*device\\(ancestor:1\\)"

It is UNRESOLVED everywhere.  Unlike the C/C++ FEs that emit the original
dump even if there are errors/sorry during parsing, the Fortran FE doesn't
do that.
So I think either the dg-final should be xfailed or removed for now.

Jakub



Re: [PATCH] Add MIPS Linux support to gcc.misc-tests/linkage.c (testsuite/51748)

2021-09-01 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> This adds MIPS Linux support to gcc.misc-tests/linkage.exp.  Basically
> copying what was done for MIPS IRIX and changing the options to be correct.
>
> OK?
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/51748
>   * gcc.misc-tests/linkage.exp: Add mips*-linux-* support.

OK, thanks.  Searching for any match for 64 seems surprisingly general,
but it's what other cases do and has obviously stood the test of time.

Richard

> ---
>  gcc/testsuite/gcc.misc-tests/linkage.exp | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.misc-tests/linkage.exp b/gcc/testsuite/gcc.misc-tests/linkage.exp
> index afed2b811c9..2cb109e776e 100644
> --- a/gcc/testsuite/gcc.misc-tests/linkage.exp
> +++ b/gcc/testsuite/gcc.misc-tests/linkage.exp
> @@ -38,6 +38,18 @@ if { [isnative] && ![is_remote host] } then {
>  
>   # Need to ensure ABI for native compiler matches gcc
>   set native_cflags ""
> + if  [istarget "mips*-linux*"] {
> + set file_string [exec file "linkage-x.o"]
> + if [ string match "*64*" $file_string ] {
> + set native_cflags "-mabi=64"
> + }
> + if [ string match "*ELF 32*" $file_string ] {
> + set native_cflags "-mabi=32"
> + }
> + if [ string match "*N32*" $file_string ] {
> + set native_cflags "-mabi=n32"
> + }
> + }
>   if  [istarget "sparc*-sun-solaris2*"] {
>   set file_string [exec file "linkage-x.o"]
>   if [ string match "*64*" $file_string ] {


Re: [PATCH] Fix target/101934: aarch64 memset code creates unaligned stores for -mstrict-align

2021-09-01 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> The problem here is the aarch64_expand_setmem code did not check
> STRICT_ALIGNMENT if it is creating an overlapping store.
> This patch adds that check and the testcase works.
>
> gcc/ChangeLog:
>
>   PR target/101934
>   * config/aarch64/aarch64.c (aarch64_expand_setmem):
>   Check STRICT_ALIGNMENT before creating an overlapping
>   store.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/101934
>   * gcc.target/aarch64/memset-strict-align-1.c: New test.

OK, thanks.
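
For reference, the store counts the new test scans for can be reproduced
with a small sketch.  This is not the aarch64_expand_setmem code;
split_stores and count_stores_overlapping are hypothetical helpers, and
the overlapping variant is a simplification that only covers the 15-byte
example from the source comment.

```c
#include <assert.h>
#include <stddef.h>

/* Split N bytes into power-of-two sized stores, largest first and
   never overlapping, starting at MAX_SIZE bytes per store.  The sizes
   are written to OUT (capacity CAP) and the store count is returned.  */
size_t
split_stores (size_t n, size_t max_size, size_t *out, size_t cap)
{
  size_t count = 0;

  assert (max_size != 0 && (max_size & (max_size - 1)) == 0);
  for (size_t size = max_size; size >= 1 && n > 0; size /= 2)
    while (n >= size)
      {
        assert (count < cap);
        out[count++] = size;
        n -= size;
      }
  return count;
}

/* Simplified model of the overlapping strategy: use SIZE-byte stores
   only, and let the last one be shifted back so that it ends exactly
   at byte N.  */
size_t
count_stores_overlapping (size_t n, size_t size)
{
  assert (size != 0 && n >= size);
  return n / size + (n % size != 0);
}
```

With a 32-byte maximum (one stp of two q registers), split_stores (95,
32, ...) gives 32, 32, 16, 8, 4, 2, 1, which lines up with the two
"stp q", one "str q", and the single xzr/wzr/strh/strb stores in the
test.  For 15 bytes, the non-overlapping split needs 4 stores (8 + 4 +
2 + 1) while two overlapping 8-byte stores suffice, which is the case
-mstrict-align now has to avoid.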

Richard

> ---
>  gcc/config/aarch64/aarch64.c  |  4 +--
>  .../aarch64/memset-strict-align-1.c   | 28 +++
>  2 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3213585a588..26d59ba1e13 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23566,8 +23566,8 @@ aarch64_expand_setmem (rtx *operands)
>/* Do certain trailing copies as overlapping if it's going to be
>cheaper.  i.e. less instructions to do so.  For instance doing a 15
>byte copy it's more efficient to do two overlapping 8 byte copies than
> -  8 + 4 + 2 + 1.  */
> -  if (n > 0 && n < copy_limit / 2)
> +  8 + 4 + 2 + 1.  Only do this when -mstrict-align is not supplied.  */
> +  if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
>   {
> next_mode = smallest_mode_for_size (n, MODE_INT);
> int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> diff --git a/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> new file mode 100644
> index 000..5cdc8a44968
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -mstrict-align" } */
> +
> +struct s { char x[95]; };
> +void foo (struct s *);
> +void bar (void) { struct s s1 = {}; foo (&s1); }
> +
> +/* memset (s1 = {}, sizeof = 95) should be expanded out
> +   such that there are no overlap stores when -mstrict-align
> +   is in use.
> +   so 2 pair 16 bytes stores (64 bytes).
> +   1 16 byte stores
> +   1 8 byte store
> +   1 4 byte store
> +   1 2 byte store
> +   1 1 byte store
> +   */
> +
> +/* { dg-final { scan-assembler-times "stp\tq" 2 } } */
> +/* { dg-final { scan-assembler-times "str\tq" 1 } } */
> +/* { dg-final { scan-assembler-times "str\txzr" 1 } } */
> +/* { dg-final { scan-assembler-times "str\twzr" 1 } } */
> +/* { dg-final { scan-assembler-times "strh\twzr" 1 } } */
> +/* { dg-final { scan-assembler-times "strb\twzr" 1 } } */
> +
> +/* Also one store pair for the frame-pointer and the LR. */
> +/* { dg-final { scan-assembler-times "stp\tx" 1 } } */
> +


[PATCH] C++: add type checking for static local vector variable in template

2021-09-01 Thread wangpc via Gcc-patches
From: wangpc 

---
 gcc/cp/pt.c|  8 +++-
 .../aarch64/sve/static-var-in-template.C   | 18 ++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f0aa626ab723..988f4cb1e73f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14731,7 +14731,13 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 even if its underlying type is not.  */
  TYPE_DEPENDENT_P_VALID (TREE_TYPE (r)) = false;
  }
-
+/* We should verify static local variable's type
+since vector type does not have a fixed size.  */
+if (TREE_STATIC (t)
+  &&!verify_type_context (input_location, TCTX_STATIC_STORAGE, type))
+{
+  RETURN (error_mark_node);
+}
layout_decl (r, 0);
   }
   break;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index ..26d397ca565d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+template <int N>
+void f()
+{
+int i = 0;
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
-- 
2.33.0.windows.1



[PATCH] bswap: Fix up bswap_view_convert handling [PR102141]

2021-09-01 Thread Jakub Jelinek via Gcc-patches
Hi!

bswap_view_convert is used twice in spots where gsi_insert_before is the
right thing, but in the last one it wants to insert preparation stmts
for the VIEW_CONVERT_EXPR emitted with gsi_insert_after, where at the
gsi we still need to insert bswap_stmt and maybe mask_stmt whose lhs
the preparation stmts will use.
So, this patch adds a BEFORE argument to the function and emits the
preparation statements before or after depending on that.
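
To illustrate the ordering issue, here is a toy model of the two
insertion modes.  It is only a sketch on a plain array; struct seq,
insert_before_same and insert_after_new are hypothetical stand-ins for
a statement sequence, gsi_insert_before with GSI_SAME_STMT and
gsi_insert_after with GSI_NEW_STMT, and the cursor is just an index.

```c
#include <assert.h>
#include <string.h>

enum { MAX_STMTS = 16 };

/* Toy statement sequence (hypothetical, not GCC's gimple_seq).  */
struct seq
{
  const char *stmt[MAX_STMTS];
  int len;
};

/* Insert S before the statement at *POS.  The cursor keeps pointing
   at the same statement, as with GSI_SAME_STMT.  */
void
insert_before_same (struct seq *q, int *pos, const char *s)
{
  assert (q->len < MAX_STMTS && *pos >= 0 && *pos < q->len);
  memmove (&q->stmt[*pos + 1], &q->stmt[*pos],
           (q->len - *pos) * sizeof q->stmt[0]);
  q->stmt[*pos] = s;
  q->len++;
  (*pos)++;
}

/* Insert S after the statement at *POS.  The cursor moves to the new
   statement, as with GSI_NEW_STMT, so repeated calls keep their
   relative order.  */
void
insert_after_new (struct seq *q, int *pos, const char *s)
{
  assert (q->len < MAX_STMTS && *pos >= 0 && *pos < q->len);
  memmove (&q->stmt[*pos + 2], &q->stmt[*pos + 1],
           (q->len - *pos - 1) * sizeof q->stmt[0]);
  q->stmt[*pos + 1] = s;
  q->len++;
  (*pos)++;
}
```

Starting from a one-statement sequence and a copy of the cursor, two
insert_after_new calls through the copy produce cur_stmt, prep1, prep2,
while the original cursor still points at cur_stmt, which is where the
bswap_stmt and mask_stmt defining the value used by prep1 still have to
go.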

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-01  Jakub Jelinek  

PR tree-optimization/102141
* gimple-ssa-store-merging.c (bswap_view_convert): Add BEFORE
argument.  If false, emit stmts after gsi instead of before, and
with GSI_NEW_STMT.
(bswap_replace): Adjust callers.  When converting output of bswap,
	emit VIEW_CONVERT preparation stmts after a copy of gsi instead
of before it.

* gcc.dg/pr102141.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2021-08-23 11:54:03.319505682 +0200
+++ gcc/gimple-ssa-store-merging.c  2021-08-31 12:26:08.347127224 +0200
@@ -1020,7 +1020,8 @@ public:
first.  */
 
 static tree
-bswap_view_convert (gimple_stmt_iterator *gsi, tree type, tree val)
+bswap_view_convert (gimple_stmt_iterator *gsi, tree type, tree val,
+   bool before)
 {
   gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (val))
  || POINTER_TYPE_P (TREE_TYPE (val)));
@@ -1032,12 +1033,18 @@ bswap_view_convert (gimple_stmt_iterator
  gimple *g
= gimple_build_assign (make_ssa_name (pointer_sized_int_node),
   NOP_EXPR, val);
- gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ if (before)
+   gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ else
+   gsi_insert_after (gsi, g, GSI_NEW_STMT);
  val = gimple_assign_lhs (g);
}
   tree itype = build_nonstandard_integer_type (prec, 1);
   gimple *g = gimple_build_assign (make_ssa_name (itype), NOP_EXPR, val);
-  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  if (before)
+   gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  else
+   gsi_insert_after (gsi, g, GSI_NEW_STMT);
   val = gimple_assign_lhs (g);
 }
   return build1 (VIEW_CONVERT_EXPR, type, val);
@@ -1165,7 +1172,8 @@ bswap_replace (gimple_stmt_iterator gsi,
  gimple_set_vuse (load_stmt, n->vuse);
  gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);
  if (conv_code == VIEW_CONVERT_EXPR)
-   val_tmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), val_tmp);
+   val_tmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), val_tmp,
+ true);
  gimple_assign_set_rhs_with_ops (&gsi, conv_code, val_tmp);
  update_stmt (cur_stmt);
}
@@ -1209,7 +1217,7 @@ bswap_replace (gimple_stmt_iterator gsi,
  if (!is_gimple_val (src))
return NULL_TREE;
  if (conv_code == VIEW_CONVERT_EXPR)
-   src = bswap_view_convert (&gsi, TREE_TYPE (tgt), src);
+   src = bswap_view_convert (&gsi, TREE_TYPE (tgt), src, true);
  g = gimple_build_assign (tgt, conv_code, src);
}
   else if (cur_stmt)
@@ -1296,14 +1304,13 @@ bswap_replace (gimple_stmt_iterator gsi,
   /* Convert the result if necessary.  */
   if (!useless_type_conversion_p (TREE_TYPE (tgt), bswap_type))
 {
-  gimple *convert_stmt;
-
   tmp = make_temp_ssa_name (bswap_type, NULL, "bswapdst");
   tree atmp = tmp;
+  gimple_stmt_iterator gsi2 = gsi;
   if (conv_code == VIEW_CONVERT_EXPR)
-   atmp = bswap_view_convert (&gsi, TREE_TYPE (tgt), tmp);
-  convert_stmt = gimple_build_assign (tgt, conv_code, atmp);
-  gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
+   atmp = bswap_view_convert (&gsi2, TREE_TYPE (tgt), tmp, false);
+  gimple *convert_stmt = gimple_build_assign (tgt, conv_code, atmp);
+  gsi_insert_after (&gsi2, convert_stmt, GSI_SAME_STMT);
 }
 
   gimple_set_lhs (mask_stmt ? mask_stmt : bswap_stmt, tmp);
--- gcc/testsuite/gcc.dg/pr102141.c.jj  2021-08-31 12:38:22.271783023 +0200
+++ gcc/testsuite/gcc.dg/pr102141.c 2021-08-31 12:38:01.805071490 +0200
@@ -0,0 +1,11 @@
+/* PR tree-optimization/102141 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int __attribute__((__vector_size__ (4))) v;
+
+void
+foo (unsigned long long x)
+{
+  v &= (unsigned) (x >> 56 | x >> 40 & 0xff00);
+}

Jakub



[PATCH] tree-optimization/102149 - add testcase for fixed bug

2021-09-01 Thread Richard Biener via Gcc-patches
This adds the testcase from the PR.

Pushed.

2021-09-01  Richard Biener  

PR tree-optimization/102149
* gcc.dg/torture/pr102149.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr102149.c | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr102149.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr102149.c b/gcc/testsuite/gcc.dg/torture/pr102149.c
new file mode 100644
index 000..34a8c213133
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr102149.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-vect-cost-model" } */
+
+int a[8];
+int *b = &a[6];
+char c;
+int main()
+{
+  int d = 7;
+  for (; d >= 0; d--)
+{
+  *b = 1;
+  c = a[d] >> 3;
+  a[d] = c;
+}
+  if (a[6] != 1)
+__builtin_abort ();
+  return 0;
+}
-- 
2.31.1


Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-09-01 Thread Christophe Lyon via Gcc-patches
On Mon, Aug 30, 2021 at 8:27 AM Jakub Jelinek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Aug 25, 2021 at 12:14:09PM +0200, Marcel Vollweiler wrote:
> > Add support for device-modifiers for 'omp target device'.
> >
> > 'device_num' and 'ancestor' are now parsed on target device constructs
> for C,
> > C++, and Fortran (see OpenMP specification 5.0, p. 170). When 'ancestor'
> is
> >  used, then 'sorry, not supported' is output. Moreover, the restrictions
> for
> > 'ancestor' are implemented (see OpenMP specification 5.0, p. 174f).
> >
> > gcc/c/ChangeLog:
> >
> >   * c-parser.c (c_parser_omp_clause_device): Parse device-modifiers
> 'device_num'
> >   and 'ancestor' in 'target device' clauses.
> >
> > gcc/cp/ChangeLog:
> >
> >   * parser.c (cp_parser_omp_clause_device): Parse device-modifiers
> 'device_num'
> >   and 'ancestor' in 'target device' clauses.
> >   * semantics.c (finish_omp_clauses): Error handling. Constant
> device ids must
> >   evaluate to '1' if 'ancestor' is used.
> >
> > gcc/fortran/ChangeLog:
> >
> >   * gfortran.h: Add variable for 'ancestor' in struct
> gfc_omp_clauses.
> >   * openmp.c (gfc_match_omp_clauses): Parse device-modifiers
> 'device_num'
> > and 'ancestor' in 'target device' clauses.
> >   * trans-openmp.c (gfc_trans_omp_clauses): Set
> OMP_CLAUSE_DEVICE_ANCESTOR.
> >
> > gcc/ChangeLog:
> >
> >   * gimplify.c (gimplify_scan_omp_clauses): Error handling.
> 'ancestor' only
> >   allowed on target constructs and only with particular other
> clauses.
> >   * omp-expand.c (expand_omp_target): Output of 'sorry, not
> supported' if
> >   'ancestor' is used.
> >   * omp-low.c (check_omp_nesting_restrictions): Error handling. No
> nested OpenMP
> > structs when 'ancestor' is used.
> >   (scan_omp_1_stmt): No usage of OpenMP runtime routines in a target
> region when
> >   'ancestor' is used.
> >   * tree-pretty-print.c (dump_omp_clause): Append 'ancestor'.
> >   * tree.h (OMP_CLAUSE_DEVICE_ANCESTOR): Define macro.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * c-c++-common/gomp/target-device-1.c: New test.
> >   * c-c++-common/gomp/target-device-2.c: New test.
> >   * c-c++-common/gomp/target-device-ancestor-1.c: New test.
> >   * c-c++-common/gomp/target-device-ancestor-2.c: New test.
> >   * c-c++-common/gomp/target-device-ancestor-3.c: New test.
> >   * c-c++-common/gomp/target-device-ancestor-4.c: New test.
> >   * gfortran.dg/gomp/target-device-1.f90: New test.
> >   * gfortran.dg/gomp/target-device-2.f90: New test.
> >   * gfortran.dg/gomp/target-device-ancestor-1.f90: New test.
> >   * gfortran.dg/gomp/target-device-ancestor-2.f90: New test.
> >   * gfortran.dg/gomp/target-device-ancestor-3.f90: New test.
> >   * gfortran.dg/gomp/target-device-ancestor-4.f90: New test.
>

The last new test fails on aarch64:
 /gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90:7:15: Error:
Sorry, 'reverse_offload' clause at (1) on REQUIRES directive is not yet
supported
compiler exited with status 1
PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O   (test for
errors, line 7)
XFAIL: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  sorry,
unimplemented: 'ancestor' not yet supported (test for warnings, line 9)
PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  (test for excess
errors)
gfortran.dg/gomp/target-device-ancestor-4.f90   -O  : dump file does not
exist
UNRESOLVED: gfortran.dg/gomp/target-device-ancestor-4.f90   -O
scan-tree-dump original "pragma omp target [^\n\r)]*device\\(ancestor:1\\)"

Can you fix it?

Thanks,

Christophe


> Ok, thanks.
>
> Jakub
>
>

