Re: [Patch] x86: Enable GCC support for Intel Hreset extension

2020-10-13 Thread Uros Bizjak via Gcc-patches
On Tue, Oct 13, 2020 at 10:49 AM Hongyu Wang  wrote:
>
> Hi:
>
> This patch adds support for the Intel HRESET instruction.
>
> HRESET provides a hint to the processor to selectively reset the prediction
> history of the current logical processor.
>
> For more details, please refer to 
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Bootstrap ok, regression test on i386/x86 backend is ok.
>
> OK for master?
>
> gcc/
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect HRESET.
> * common/config/i386/i386-common.c (OPTION_MASK_ISA2_HRESET_SET,
> OPTION_MASK_ISA2_HRESET_UNSET): New macros.
> (ix86_handle_option): Handle -mhreset.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_HRESET.
> * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
> for hreset.
> * config.gcc: Add hresetintrin.h.
> * config/i386/hresetintrin.h: New header file.
> * config/i386/x86gprintrin.h: Include hresetintrin.h.
> * config/i386/cpuid.h (bit_HRESET): New.
> * config/i386/i386-builtin.def: Add new builtin.
> * config/i386/i386-expand.c (ix86_expand_builtin):
> Handle new builtin.
> * config/i386/i386-c.c (ix86_target_macros_internal): Define
> __HRESET__.
> * config/i386/i386-options.c (isa2_opts): Add -mhreset.
> (ix86_valid_target_attribute_inner_p): Handle hreset.
> * config/i386/i386.h (TARGET_HRESET, TARGET_HRESET_P,
> PTA_HRESET): New.
> (PTA_ALDERLAKE): Add PTA_HRESET.
> * config/i386/i386.opt: Add option -mhreset.
> * config/i386/i386.md (UNSPECV_HRESET): New unspec.
> (hreset): New define_insn.
> * doc/invoke.texi: Document -mhreset.
> * doc/extend.texi: Document hreset.
>
> gcc/testsuite/
>
> * gcc.target/i386/hreset-1.c: New test.
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/sse-12.c: Update -mhreset.
> * gcc.target/i386/sse-13.c: Likewise.
> * gcc.target/i386/sse-14.c: Likewise.
> * gcc.target/i386/sse-22.c: Likewise.
> * gcc.target/i386/sse-23.c: Likewise.
> * g++.dg/other/i386-2.C: Likewise.
> * g++.dg/other/i386-3.C: Likewise.

The patch doesn't include all testsuite changes.

Otherwise OK.

Thanks,
Uros.
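Since the patch adds `bit_HRESET` to `cpuid.h` and HRESET detection to `cpuinfo.h`, a run-time check in user code might look like the sketch below. It assumes an x86 GCC or Clang toolchain whose `<cpuid.h>` provides `__get_cpuid_count`; the bit position (CPUID leaf 7, sub-leaf 1, EAX bit 22), the fallback `#define` and the function names are assumptions for illustration. The sketch only queries CPUID and never executes HRESET; `__HRESET__` is the macro the patch predefines under `-mhreset`.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Fallback for older cpuid.h headers (assumed bit position).  */
#ifndef bit_HRESET
#define bit_HRESET (1u << 22)   /* CPUID.(EAX=7,ECX=1):EAX[22] */
#endif

/* Returns 1 if the CPU reports HRESET, 0 if not, -1 on non-x86 targets.  */
int
cpu_has_hreset (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* Leaf 7, sub-leaf 0 returns the highest valid sub-leaf in EAX.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
    return 0;
  if (!__get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx))
    return 0;
  return (eax & bit_HRESET) != 0;
#else
  return -1;
#endif
}

/* Returns 1 if this translation unit was compiled with -mhreset.  */
int
built_with_mhreset (void)
{
#ifdef __HRESET__
  return 1;
#else
  return 0;
#endif
}
```

The compile-time macro and the run-time CPUID bit answer different questions: whether the intrinsic is available to the compiler, and whether the running CPU implements it.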


Re: [Patch] x86: Enable support for Intel UINTR extension

2020-10-13 Thread Uros Bizjak via Gcc-patches
On Tue, Oct 13, 2020 at 10:30 AM Hongyu Wang  wrote:
>
> Hi:
>
> This patch adds support for the User Interrupt (UINTR) instructions.
>
> This feature defines user interrupts as new events in the architecture.  They 
> are delivered to software operating in 64-bit mode with CPL = 3 without any 
> change to segmentation state.
>
> For more details, please refer to 
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Bootstrap ok, regression test on i386/x86 backend is ok.
>
> OK for master?
>
> gcc/
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect UINTR.
> * common/config/i386/i386-common.c (OPTION_MASK_ISA2_UINTR_SET,
> OPTION_MASK_ISA2_UINTR_UNSET): New.
> (ix86_handle_option): Handle -muintr.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_UINTR.
> * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
> for uintr.
> * config.gcc: Add uintrintrin.h to extra_headers.
> * config/i386/uintrintrin.h: New.
> * config/i386/cpuid.h (bit_UINTR): New.
> * config/i386/driver-i386.c (host_detect_local_cpu): Detect UINTR.
> * config/i386/i386-builtin-types.def: Add new types.
> * config/i386/i386-builtin.def: Add new builtins.
> * config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins): Add
> __builtin_ia32_testui.
> * config/i386/i386-builtins.h (ix86_builtins): Add
> IX86_BUILTIN_TESTUI.
> * config/i386/i386-c.c (ix86_target_macros_internal): Define
> __UINTR__.
> * config/i386/i386-expand.c (ix86_expand_special_args_builtin):
> Handle UINT8_FTYPE_VOID.
> (ix86_expand_builtin): Handle IX86_BUILTIN_TESTUI.
> * config/i386/i386-options.c (isa2_opts): Add -muintr.
> (ix86_valid_target_attribute_inner_p): Handle UINTR.
> (ix86_option_override_internal): Add TARGET_64BIT check for UINTR.
> * config/i386/i386.h (TARGET_UINTR, TARGET_UINTR_P, PTA_UINTR): New.
> (PTA_SAPPHIRERAPIDS): Add PTA_UINTR.
> * config/i386/i386.opt: Add -muintr.
> * config/i386/i386.md
> (define_int_iterator UINTR_UNSPECV): New.
> (define_int_attr uintr_unspecv): New.
> (uintr_, uintr_senduipi, testui):
> New define_insn patterns.
> * config/i386/x86gprintrin.h: Include uintrintrin.h.
> * doc/invoke.texi: Document -muintr.
> * doc/extend.texi: Document uintr.
>
> gcc/testsuite/
>
> * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> * gcc.target/i386/uintr-1.c: New test.
> * gcc.target/i386/uintr-2.c: Ditto.
> * gcc.target/i386/uintr-3.c: Ditto.
> * gcc.target/i386/uintr-4.c: Ditto.
> * gcc.target/i386/uintr-5.c: Ditto.

Please also add -muintr to g++.dg/other/i386-{2,3}.C and
gcc.target/i386/sse-{12,13,14,22,23}.c. This will test the new intrinsics
header.

OK with the above change.

Thanks,
Uros.
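A comparable run-time check for UINTR might look like this sketch. It assumes `<cpuid.h>` with `__get_cpuid_count` and that UINTR is enumerated in CPUID leaf 7, sub-leaf 0, EDX bit 5 (the fallback `#define` is an assumption). Because the patch restricts `-muintr` to 64-bit targets, the hypothetical `cpu_has_uintr` reports "unknown" (-1) for any other build; `__UINTR__` is the macro the patch predefines under `-muintr`.

```c
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Fallback for older cpuid.h headers (assumed bit position).  */
#ifndef bit_UINTR
#define bit_UINTR (1u << 5)   /* CPUID.(EAX=7,ECX=0):EDX[5] */
#endif

/* 1 = CPU reports UINTR, 0 = it does not, -1 = not a 64-bit x86 build.  */
int
cpu_has_uintr (void)
{
#if defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (edx & bit_UINTR) != 0;
#else
  /* -muintr is only accepted for 64-bit targets.  */
  return -1;
#endif
}

/* Returns 1 if this translation unit was compiled with -muintr.  */
int
built_with_muintr (void)
{
#ifdef __UINTR__
  return 1;
#else
  return 0;
#endif
}
```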


Re: [RFC][gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

2020-10-13 Thread Richard Biener
On Tue, 13 Oct 2020, Tom de Vries wrote:

> On 10/12/20 9:15 AM, Richard Biener wrote:
> > On Fri, 9 Oct 2020, Tom de Vries wrote:
> > 
> >> Hi,
> >>
> >> The function gimple_can_duplicate_bb_p currently always returns true.
> >>
> >> The presence of can_duplicate_bb_p in tracer.c however suggests that
> >> there are cases when bb's indeed cannot be duplicated.
> >>
> >> Move the implementation of can_duplicate_bb_p to gimple_can_duplicate_bb_p.
> >>
> >> Bootstrapped and reg-tested on x86_64-linux.
> >>
> >> Build x86_64-linux with nvptx accelerator and tested libgomp.
> >>
> >> No issues found.
> >>
> >> As corner-case check, bootstrapped and reg-tested a patch that makes
> >> gimple_can_duplicate_bb_p always return false, resulting in
> >> PR97333 - "[gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate]
> >> ICE in duplicate_block, at cfghooks.c:1093".
> >>
> >> Any comments?
> > 
> > In principle it's correct to move this to the CFG hook since there
> > now seem to be stmts that cannot be duplicated and thus we need
> > to implement can_duplicate_bb_p.
> > 
> > Some minor things below...
> > 
> >> Thanks,
> >> - Tom
> >>
> >> [gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p
> >>
> >> gcc/ChangeLog:
> >>
> >> 2020-10-09  Tom de Vries  
> >>
> >>* tracer.c (cached_can_duplicate_bb_p): Use can_duplicate_block_p
> >>instead of can_duplicate_bb_p.
> >>(can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p): Move ...
> >>* tree-cfg.c: ... here.
> >>* tracer.c (can_duplicate_bb_p): Move ...
> >>* tree-cfg.c (gimple_can_duplicate_bb_p): here.
> >>* tree-cfg.h (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p):
> >>Declare.
> >>
> >> ---
> >>  gcc/tracer.c   | 61 
> >> +-
> >>  gcc/tree-cfg.c | 54 ++-
> >>  gcc/tree-cfg.h |  2 ++
> >>  3 files changed, 56 insertions(+), 61 deletions(-)
> >>
> >> diff --git a/gcc/tracer.c b/gcc/tracer.c
> >> index e1c2b9527e5..16b46c65b14 100644
> >> --- a/gcc/tracer.c
> >> +++ b/gcc/tracer.c
> >> @@ -84,65 +84,6 @@ bb_seen_p (basic_block bb)
> >>return bitmap_bit_p (bb_seen, bb->index);
> >>  }
> >>  
> >> -/* Return true if gimple stmt G can be duplicated.  */
> >> -static bool
> >> -can_duplicate_insn_p (gimple *g)
> >> -{
> >> -  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
> >> - duplicated as part of its group, or not at all.
> >> - The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
> >> - group, so the same holds there.  */
> >> -  if (is_gimple_call (g)
> >> -  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
> >> -|| gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> >> -|| gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> >> -|| gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
> >> -|| gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
> >> -return false;
> >> -
> >> -  return true;
> >> -}
> >> -
> >> -/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
> >> -static bool
> >> -can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
> >> -{
> >> -  if (bb->index < NUM_FIXED_BLOCKS)
> >> -return false;
> >> -
> >> -  if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
> >> -{
> >> -  /* A transaction is a single entry multiple exit region.  It
> >> -   must be duplicated in its entirety or not at all.  */
> >> -  if (gimple_code (g) == GIMPLE_TRANSACTION)
> >> -  return false;
> >> -
> >> -  /* An IFN_UNIQUE call must be duplicated as part of its group,
> >> -   or not at all.  */
> >> -  if (is_gimple_call (g)
> >> -&& gimple_call_internal_p (g)
> >> -&& gimple_call_internal_unique_p (g))
> >> -  return false;
> >> -}
> >> -
> >> -  return true;
> >> -}
> >> -
> >> -/* Return true if BB can be duplicated.  */
> >> -static bool
> >> -can_duplicate_bb_p (const_basic_block bb)
> >> -{
> >> -  if (!can_duplicate_bb_no_insn_iter_p (bb))
> >> -return false;
> >> -
> >> -  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
> >> -   !gsi_end_p (gsi); gsi_next (&gsi))
> >> -if (!can_duplicate_insn_p (gsi_stmt (gsi)))
> >> -  return false;
> >> -
> >> -  return true;
> >> -}
> >> -
> >>  static sbitmap can_duplicate_bb;
> >>  
> >>  /* Cache VAL as value of can_duplicate_bb_p for BB.  */
> >> @@ -167,7 +108,7 @@ cached_can_duplicate_bb_p (const_basic_block bb)
> >>return false;
> >>  }
> >>  
> >> -  return can_duplicate_bb_p (bb);
> >> +  return can_duplicate_block_p (bb);
> >>  }
> >>  
> >>  /* Return true if we should ignore the basic block for purposes of tracing.  */
> >> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >> index 5caf3b62d69..a5677859ffc 100644
> >> --- a/gcc/tree-cfg.c
> >> +++ b/gcc/tree-cfg.c
> >> @@ -6208,11 +6208,63 @@ gimple_split_block_before_cond_jump (basic_block bb)
> >>  }
> >>
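The check being moved into tree-cfg.c has two layers: a cheap test on the block index and its last statement, and a per-statement scan. The self-contained C model below shows only that structure; the statement kinds, struct layouts and function names are hypothetical simplifications, not GCC's GIMPLE API, with blocks 0 and 1 standing in for ENTRY and EXIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GIMPLE statement kinds.  */
enum stmt_kind
{
  STMT_PLAIN,        /* ordinary statement */
  STMT_TRANSACTION,  /* like GIMPLE_TRANSACTION */
  STMT_UNIQUE_CALL,  /* like an IFN_UNIQUE internal call */
  STMT_SIMT_CALL     /* like IFN_GOMP_SIMT_ENTER_ALLOC, _EXIT, ... */
};

struct stmt { enum stmt_kind kind; };

struct block
{
  int index;                 /* 0 and 1 play the role of ENTRY/EXIT */
  const struct stmt *stmts;
  size_t n_stmts;
};

#define NUM_FIXED_BLOCKS 2

/* Per-statement test (cf. can_duplicate_insn_p).  */
static bool
stmt_duplicable_p (const struct stmt *s)
{
  return s->kind != STMT_SIMT_CALL;
}

/* Cheap test that never walks the statement list
   (cf. can_duplicate_bb_no_insn_iter_p).  */
static bool
block_duplicable_no_iter_p (const struct block *bb)
{
  if (bb->index < NUM_FIXED_BLOCKS)
    return false;
  if (bb->n_stmts > 0)
    {
      enum stmt_kind last = bb->stmts[bb->n_stmts - 1].kind;
      /* Transactions and IFN_UNIQUE groups are copied whole or not at all.  */
      if (last == STMT_TRANSACTION || last == STMT_UNIQUE_CALL)
        return false;
    }
  return true;
}

/* Full test (cf. gimple_can_duplicate_bb_p after the move).  */
bool
block_duplicable_p (const struct block *bb)
{
  assert (bb != NULL);
  if (!block_duplicable_no_iter_p (bb))
    return false;
  for (size_t i = 0; i < bb->n_stmts; i++)
    if (!stmt_duplicable_p (&bb->stmts[i]))
      return false;
  return true;
}
```

Keeping the cheap test separate lets a caller such as the tracer's cache reject blocks early without a statement walk.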

Re: [Patch] collect-utils.c, lto-wrapper + mkoffload: Improve -save-temps filename

2020-10-13 Thread Tom de Vries
On 10/13/20 9:37 PM, Tobias Burnus wrote:
> This patch avoids putting some more files into /tmp/cc* when
> -save-temps has been specified.
> 

Very nice.

> For my testcase, it now generates:
> a.lto_wrapper_args
> a.offload_args

> a.xnvptx-none.args
> a.xnvptx-none.gcc_args
> a.xamdgcn-amdhsa.gcc_args
> a.xamdgcn-amdhsa.gccnative_args

I'd prefer it if nvptx had the same suffixes as gcn, that is, gcc_args
and gccnative_args.  The ".args" is a bit too nondescript for me.

Thanks,
- Tom

> a.xamdgcn-amdhsa.ld_args
> 
> 
> This patch adds an additional argument to collect-utils.c's
> collect_execute (and its wrapper fork_execute) which, if not NULL,
> is used in 'concat (dumppfx, atsuffix, NULL);'.
> 
> This patch adds a suffix to gcc/config/gcn/mkoffload.c,
> gcc/config/nvptx/mkoffload.c and gcc/lto-wrapper.c.
> 
> It does not (yet) add a suffix to gcc/collect2.c and
> gcc/config/i386/intelmic-mkoffload.c but just passes
> NULL; for intelmic it is not a work item as it does
> not use '@' files at all.
> 
> Hopefully, no file is written twice
> with the same name (or otherwise overwritten) and
> the file names make sense.
> 
> OK?
> 
> Tobias
> 
> PS: There is still cceBdzZk.ofldlist (via lto-plugin/lto-plugin.c),
> and @/tmp/cc* in calls to lto1 and collect2. And collect2.c
> passes NULL also when use_atfile is true.
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter
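As an illustration of the naming scheme only: with -save-temps, the @-file name becomes the dump prefix followed by a per-tool suffix, in the spirit of `concat (dumppfx, atsuffix, NULL)`. The helper below is hypothetical; it assumes a dump prefix such as `a.` and a suffix such as `xnvptx-none.gcc_args`, and returns NULL when no suffix is given, which is presumably the case where the caller keeps using a /tmp/cc* file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper in the spirit of concat (dumppfx, atsuffix, NULL):
   join the -save-temps dump prefix and a per-tool suffix into a freshly
   malloc'd string.  Returns NULL when ATSUFFIX is NULL or on allocation
   failure.  */
char *
save_temps_args_name (const char *dumppfx, const char *atsuffix)
{
  assert (dumppfx != NULL);
  if (atsuffix == NULL)
    return NULL;
  size_t a = strlen (dumppfx), b = strlen (atsuffix);
  char *name = malloc (a + b + 1);
  if (name == NULL)
    return NULL;
  memcpy (name, dumppfx, a);
  memcpy (name + a, atsuffix, b + 1);   /* also copies the terminating NUL */
  return name;
}
```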


Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.

2020-10-13 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
> >   For rtx like
> >   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> >(parallel [(const_int 0) (const_int 1)]))
> >  it could be simplified as inner.
>
> You could even simplify any vec_select of a subreg of X to just a
> vec_select of X, by changing the selection vector a bit (well, only do

Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add the corresponding offset to the selection.

> this if that is a constant vector, I suppose).  Not just for paradoxical
> subregs either, just for *all* subregs.
>

Yes, and only when X has the same inner mode and more elements.

> > gcc/ChangeLog
> > PR rtl-optimization/97249
> > * simplify-rtx.c (simplify_binary_operation_1): Simplify
> > vec_select of paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog
> >
> > * gcc.target/i386/pr97249-1.c: New test.
>
> > +   /* For cases like
> > +  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> > +   (parallel [(const_int 0) (const_int 1)])).
> > +  return inner directly.  */
> > +   if (GET_CODE (trueop0) == SUBREG
> > +   && paradoxical_subreg_p (trueop0)
> > +   && mode == GET_MODE (XEXP (trueop0, 0))
> > +   && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> > +   && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> > +   && l0 % l1 == 0)
>
> Why this?  Why does the number of elements of the input have to divide
> that of the output?
>

Removed; I also added the condition from my comment above.

> > + {
> > +   gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> > +   unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> > +   unsigned HOST_WIDE_INT sel = 0;
> > +   int i = 0;
> > +   for (;i != l1; i++)
>
>   for (int i = 0; i != l1; i++)
>
> > + {
> > +   rtx j = XVECEXP (trueop1, 0, i);
> > +   if (!CONST_INT_P (j))
> > + break;
> > +   sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> > + }
> > +   /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> > +   if (sel == expect)
> > + return XEXP (trueop0, 0);
> > + }
> >   }
>
> If you just handle the much more generic case, all the other vec_select
> simplifications can be done as well, not just this one.
>

Yes, changed. Also, the selection must stay within the elements of X.
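The conditions discussed so far (same inner mode, offset added when SUBREG_BYTE is non-zero, selection within the elements of X) can be modelled on plain integer arrays. The helpers below are hypothetical and not part of simplify-rtx.c; they assume equal element sizes and treat SUBREG_BYTE divided by the element size as an element offset, ignoring endianness subtleties. Checking the remapped selector position by position also rejects reorderings such as [1, 0], which a set-of-indices bitmask would accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: remap the selector of
   (vec_select (subreg (x) SUBREG_BYTE) SEL) onto X itself.
   Returns false if any remapped index falls outside the X_NUNITS
   elements of X.  */
bool
remap_vec_select (const unsigned *sel, size_t n, unsigned subreg_byte,
                  unsigned elt_size, unsigned x_nunits, unsigned *out)
{
  assert (elt_size != 0 && subreg_byte % elt_size == 0);
  unsigned offset = subreg_byte / elt_size;
  for (size_t i = 0; i < n; i++)
    {
      /* Written this way to avoid unsigned overflow in sel[i] + offset.  */
      if (offset >= x_nunits || sel[i] >= x_nunits - offset)
        return false;
      out[i] = sel[i] + offset;
    }
  return true;
}

/* True if SEL is exactly 0, 1, ..., X_NUNITS - 1 in order, in which case
   the vec_select is X itself.  */
bool
is_identity_select (const unsigned *sel, size_t n, unsigned x_nunits)
{
  if (n != x_nunits)
    return false;
  for (size_t i = 0; i < n; i++)
    if (sel[i] != i)
      return false;
  return true;
}
```

For the original example, (subreg:V4SI (x:V2SI) 0) with selector [0, 1] remaps to [0, 1] on X, an identity, so the whole expression folds to X.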

> > +/* PR target/97249  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -O3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
>
> I don't know enough about the x86 backend to know if this is exactly
> what you need in the testsuite.  I do know a case of backslashitis when
> I see one though -- you might want to use {} instead of "", and perhaps
> \m and \M and \s etc.  And to make sure things are on one line, don't do
> all that nastiness with [^\n], just start the RE with (?n) :-)
>

Yes, changed; it's much cleaner with (?n) and {}.

>
> Segher

Updated patch.

-- 
BR,
Hongtao
From df71eb46e394e5b778c69e9e8f25b301997e365d Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
 X.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of a subreg of X to a vec_select of X when
	available.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c| 44 +++
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 
 2 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..8a10b6cf4d5 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		return subop1;
 		}
 	}
+
+	  /* Simplify vec_select of a subreg of X to just a vec_select of X
+	 when available.  */
+	  int l2;
+	  if (GET_CODE (trueop0) == SUBREG
+	  && (GET_MODE_INNER (mode)
+		  == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))))
+	  && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	  && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	  && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
+		  .is_constant (&l2)
+	  && known_le (l1, l2))
+	{
+	  unsigned HOST_WIDE_INT subreg_offset = 0;
+	  gcc_assert (known_eq (X

libgo patch committed: Set signal PC on NetBSD

2020-10-13 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Nikhil Benesch sets the signal PC field on NetBSD.
The NetBSD libc provides an architecture-independent macro that can
extract the PC from a ucontext struct.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
7e5aeda340d71a84fbd15504e848a949b2a00d5a
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 80e702fc3f5..2c7a9bde825 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-5e76d81ec120e05a59e6c7d173ddf8a3de466bd0
+6cb7b9e924d84125f21f4a2a96aa0d59466056fe
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index b429fdb2403..d30d1603adc 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -229,6 +229,8 @@ getSiginfo(siginfo_t *info, void *context 
__attribute__((unused)))
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
 #elif defined(__aarch64__) && defined(__linux__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.pc;
+#elif defined(__NetBSD__)
+   ret.sigpc = _UC_MACHINE_PC(((ucontext_t*)(context)));
 #endif
 
if (ret.sigpc == 0) {

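The NetBSD branch slots into the same per-platform pattern as the existing Linux branches. A standalone sketch of that pattern, assuming a POSIX system with `<ucontext.h>`: install an SA_SIGINFO handler, raise a signal, and read the PC from the handler's third argument. `_UC_MACHINE_PC` is the NetBSD macro the patch uses; the glibc x86-64 `REG_RIP` branch and all function names are assumptions for illustration, and 0 means "PC not available", mirroring the `ret.sigpc == 0` check in go-signal.c.

```c
#define _GNU_SOURCE   /* exposes REG_RIP in glibc's <sys/ucontext.h> */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

static volatile sig_atomic_t handler_ran;
static volatile uintptr_t last_pc;

/* Per-platform PC extraction; 0 means "not available".  */
static uintptr_t
context_pc (void *context)
{
  ucontext_t *uc = (ucontext_t *) context;
#if defined(__NetBSD__)
  return (uintptr_t) _UC_MACHINE_PC (uc);
#elif defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
  return (uintptr_t) uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__aarch64__)
  return (uintptr_t) uc->uc_mcontext.pc;
#else
  (void) uc;
  return 0;
#endif
}

static void
on_signal (int sig, siginfo_t *info, void *context)
{
  (void) sig;
  (void) info;
  last_pc = context_pc (context);
  handler_ran = 1;
}

/* Install an SA_SIGINFO handler, raise SIGUSR1 and report the sampled PC.
   Returns 0 if the handler ran, -1 otherwise.  */
int
sample_pc (uintptr_t *pc)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = on_signal;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGUSR1, &sa, NULL) != 0)
    return -1;
  handler_ran = 0;
  raise (SIGUSR1);
  *pc = last_pc;
  return handler_ran ? 0 : -1;
}
```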

libgo patch committed: Fix NUL terminator check in socket length

2020-10-13 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Nikhil Benesch backports a fix for *BSD Unix
sockets from the master sources.  Some BSDs do not include the NUL
terminator in the reported socket length.  This ports the upstream
bugfix for the issue (https://golang.org/issue/6627).  This was likely
missed during the usual upstream merge because the gc and gccgo socket
implementations have diverged quite a bit.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
1f15bcd13f4019744deec3140b575101fbd651ba
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 8f71939862b..80e702fc3f5 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-fef8afc1876f4a1d5e9a8fd54c21bf5917966e10
+5e76d81ec120e05a59e6c7d173ddf8a3de466bd0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/socket_bsd.go b/libgo/go/syscall/socket_bsd.go
index f62457f2bdb..40637bc7818 100644
--- a/libgo/go/syscall/socket_bsd.go
+++ b/libgo/go/syscall/socket_bsd.go
@@ -52,13 +52,19 @@ func (sa *RawSockaddrUnix) setLen(n int) {
 }
 
 func (sa *RawSockaddrUnix) getLen() (int, error) {
-   if sa.Len < 3 || sa.Len > SizeofSockaddrUnix {
+   if sa.Len < 2 || sa.Len > SizeofSockaddrUnix {
return 0, EINVAL
}
-   n := int(sa.Len) - 3 // subtract leading Family, Len, terminating NUL.
+
+   // Some BSDs include the trailing NUL in the length, whereas
+   // others do not. Work around this by subtracting the leading
+   // family and len. The path is then scanned to see if a NUL
+   // terminator still exists within the length.
+   n := int(sa.Len) - 2 // subtract leading Family, Len
for i := 0; i < n; i++ {
if sa.Path[i] == 0 {
-   // found early NUL; assume Len is overestimating.
+   // found early NUL; assume Len included the NUL
+   // or was overestimating.
n = i
break
}

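The new length rule can be exercised outside Go. This C sketch mirrors `getLen` on a hypothetical struct laid out like the BSD `sockaddr_un` (length byte, family byte, path; the 104-byte path size is an assumption): it subtracts only the two header bytes and lets an embedded NUL end the path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of RawSockaddrUnix on BSD.  */
#define SUN_PATH_MAX 104
#define SIZEOF_SOCKADDR_UNIX (2 + SUN_PATH_MAX)

struct raw_sockaddr_unix
{
  unsigned char len;
  unsigned char family;
  char path[SUN_PATH_MAX];
};

/* Returns the path length, or -1 where the Go code returns EINVAL.  */
int
sockaddr_unix_path_len (const struct raw_sockaddr_unix *sa)
{
  assert (sa != NULL);
  if (sa->len < 2 || sa->len > SIZEOF_SOCKADDR_UNIX)
    return -1;
  int n = sa->len - 2;          /* subtract only the family and len bytes */
  for (int i = 0; i < n; i++)
    if (sa->path[i] == '\0')
      return i;                 /* len included the NUL, or overestimated */
  return n;
}
```

Under the old rule (subtracting 3), a reported length of 4 for the two-byte path "/s" would have yielded 1, dropping the last character.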

Re: [PATCH v2] PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

2020-10-13 Thread Kito Cheng via Gcc-patches
Thanks for reviewing that, committed to trunk :)

On Tue, Oct 13, 2020 at 5:38 PM Eric Botcazou  wrote:
>
> > Do you mind having a review for that?
>
> Sorry for missing the v2 patch; yes, it looks good to me.
>
> --
> Eric Botcazou
>
>


[Patch] x86: Enable GCC support for Intel AVX-VNNI extension

2020-10-13 Thread Hongyu Wang via Gcc-patches
Hi:

This patch adds support for the Intel AVX-VNNI instructions.

AVX-VNNI is the VEX-encoded equivalent of AVX512-VNNI. The
instructions are the same, but in assembly they take an extra {vex} prefix
to distinguish them from the AVX512-VNNI instructions.
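Both encodings perform the same per-lane arithmetic. A scalar model of VPDPBUSD (four unsigned bytes times four signed bytes per 32-bit lane, summed into the accumulator with wrap-around) may help when reading the new runtime tests; the function names are illustrative and the saturating VPDPBUSDS variant is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit lane of VPDPBUSD: A holds unsigned bytes, B signed bytes.
   The non-saturating form wraps modulo 2^32.  */
int32_t
dpbusd_lane (int32_t acc, const uint8_t a[4], const int8_t b[4])
{
  uint32_t r = (uint32_t) acc;
  for (int i = 0; i < 4; i++)
    r += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  return (int32_t) r;   /* GCC defines this conversion as modular */
}

/* Apply the lane operation across N lanes (4 * N bytes per input).  */
void
dpbusd (int32_t *acc, const uint8_t *a, const int8_t *b, size_t n)
{
  assert (acc != NULL && a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    acc[i] = dpbusd_lane (acc[i], a + 4 * i, b + 4 * i);
}
```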

For more details, please refer to
https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrap ok, regression test on i386/x86 backend is ok.

OK for master?

2020-10-13  Hongtao Liu  
Hongyu Wang  

gcc/
* common/config/i386/cpuinfo.h (get_available_features):
Detect AVXVNNI.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA2_AVXVNNI_SET,
OPTION_MASK_ISA2_AVXVNNI_UNSET, OPTION_MASK_ISA2_AVX2_UNSET):
New.
(ix86_handle_option): Handle -mavxvnni, unset avxvnni when
avx2 is disabled.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVXVNNI.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
for avxvnni.
* config.gcc: Add avxvnniintrin.h.
* config/i386/avx512vnniintrin.h: Remove 128/256 bit non-mask
intrinsics.
* config/i386/avxvnniintrin.h: New header file.
* config/i386/cpuid.h (bit_AVXVNNI): New.
* config/i386/i386-builtins.c (def_builtin): Handle AVXVNNI mask
for unified builtin.
* config/i386/i386-builtin.def (BDESC): Adjust AVX512VNNI
builtins for AVXVNNI.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVXVNNI__.
* config/i386/i386-expand.c (ix86_expand_builtin): Handle bisa
for AVXVNNI to support unified intrinsic name, since there is no
dependency between AVX512VNNI and AVXVNNI.
* config/i386/i386-options.c (isa2_opts): Add -mavxvnni.
(ix86_valid_target_attribute_inner_p): Handle avxvnni.
(ix86_valid_target_attribute_inner_p): Ditto.
* config/i386/i386.h (TARGET_AVXVNNI, TARGET_AVXVNNI_P,
PTA_AVXVNNI): New.
(PTA_SAPPHIRERAPIDS): Add PTA_AVXVNNI.
(PTA_ALDERLAKE): Likewise.
* config/i386/i386.md ("isa"): Add avxvnni, avx512vnnivl.
("enabled"): Adjust for avxvnni and avx512vnnivl.
* config/i386/i386.opt: Add option -mavxvnni.
* config/i386/immintrin.h: Include avxvnniintrin.h.
* config/i386/sse.md (vpdpbusd_): Adjust for AVXVNNI.
(vpdpbusds_): Likewise.
(vpdpwssd_): Likewise.
(vpdpwssds_): Likewise.
(vpdpbusd_v16si): New.
(vpdpbusds_v16si): Likewise.
(vpdpwssd_v16si): Likewise.
(vpdpwssds_v16si): Likewise.
* doc/invoke.texi: Document -mavxvnni.
* doc/extend.texi: Document avxvnni.
* doc/sourcebuild.texi: Document target avxvnni.

gcc/testsuite/

* gcc.target/i386/avx512vl-vnni-1.c: Rename to ...
* gcc.target/i386/avx512vl-vnni-1a.c: ... this.
* gcc.target/i386/avx512vl-vnni-1b.c: New test.
* gcc.target/i386/avx512vl-vnni-2.c: Ditto.
* gcc.target/i386/avx512vl-vnni-3.c: Ditto.
* gcc.target/i386/avx-vnni-1.c: Ditto.
* gcc.target/i386/avx-vnni-2.c: Ditto.
* gcc.target/i386/avx-vnni-3.c: Ditto.
* gcc.target/i386/avx-vnni-4.c: Ditto.
* gcc.target/i386/avx-vnni-5.c: Ditto.
* gcc.target/i386/avx-vnni-6.c: Ditto.
* gcc.target/i386/avx-vpdpbusd-2.c: Ditto.
* gcc.target/i386/avx-vpdpbusds-2.c: Ditto.
* gcc.target/i386/avx-vpdpwssd-2.c: Ditto.
* gcc.target/i386/avx-vpdpwssds-2.c: Ditto.
* gcc.target/i386/vnni_inline_error.c: Ditto.
* gcc.target/i386/avx512vnnivl-builtin.c: Ditto.
* gcc.target/i386/avxvnni-builtin.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/pr83488-3.c: Adjust.
* gcc.target/i386/sse-12.c: Add -mavxvnni.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
* lib/target-supports.exp (check_effective_target_avxvnni):
New proc.

-- 
Regards,

Hongyu, Wang
From c297f790f7f6579d2c65e74e3c976fdb0e535193 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 13 Oct 2020 16:16:16 +0800
Subject: [PATCH] Support Intel AVX VNNI


libgo patch committed: Ensure uniqueness of type descriptors on AIX

2020-10-13 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot ensures the uniqueness of type
descriptors on AIX.  On AIX, type descriptors can be duplicated if
one is declared in the shared libgo and another in the Go program
being compiled.  Unlike the GNU/Linux linker, the AIX linker is not
able to merge them.  One solution would be to always load libgo
first, but that would require a substantial mechanism in the GCC
core.  Instead, this patch ensures that the duplication is not
visible to the end user.

In reflect and internal/reflectlite, rtypes are compared by name
and not only by address.
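Outside Go, the same three-step comparison (pointer, then the cheap kind/hash filters, then the full string) might be sketched in C as follows; `struct type_desc` is a hypothetical stand-in for `rtype`, with `name` playing the role of `String()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for rtype.  */
struct type_desc
{
  unsigned char kind;
  unsigned int hash;
  const char *name;   /* plays the role of rtype.String() */
};

/* Same order of tests as the AIX rtypeEqual.  */
bool
type_desc_equal (const struct type_desc *t1, const struct type_desc *t2)
{
  if (t1 == t2)
    return true;                 /* same descriptor, or both NULL */
  if (t1 == NULL || t2 == NULL)
    return false;
  if (t1->kind != t2->kind || t1->hash != t2->hash)
    return false;                /* cheap filters before the string compare */
  return strcmp (t1->name, t2->name) == 0;
}
```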

In reflect, the toType() function uses a canonicalization map so that
rtypes with the same rtype.String() return the same Type.  This
cannot be done in internal/reflectlite because it needs the sync
package.  For now, however, this does not matter, as
internal/reflectlite is not widely used.
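The canonicalization-map idea amounts to an interning table keyed by the type's string: the first descriptor registered for a name becomes the canonical one, and later duplicates map to it. This C version is hypothetical, fixed-size and not thread-safe; the Go implementation needs the sync package for thread safety, which is why internal/reflectlite does not do it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: only the canonical name matters here.  */
struct desc
{
  const char *name;
};

#define CANON_MAX 64

static const struct desc *canon[CANON_MAX];
static size_t canon_len;

/* Return the canonical descriptor for D's name, registering D if the
   name has not been seen.  Linear search, fixed size, not thread-safe.  */
const struct desc *
canonicalize (const struct desc *d)
{
  for (size_t i = 0; i < canon_len; i++)
    if (strcmp (canon[i]->name, d->name) == 0)
      return canon[i];
  if (canon_len < CANON_MAX)
    canon[canon_len++] = d;
  /* D is now canonical (or, if the table was full, left uncanonicalized).  */
  return d;
}
```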

This fixes golang.org/issue/39276.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
a462dfb2de1716ece90bca93ec804ff3110e0e65
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 930339e9b44..8f71939862b 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-2563706e4ead80d6906d66ae23c8915c360583ad
+fef8afc1876f4a1d5e9a8fd54c21bf5917966e10
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index e3f08a2df39..76cdc8ef217 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -973,6 +973,12 @@ endif
 # Also use -fno-inline to get better results from the memory profiler.
 runtime_pprof_check_GOCFLAGS = -static-libgo -fno-inline
 
+if LIBGO_IS_AIX
+# reflect tests must be done with -static-libgo. Otherwise,
+# there will be a duplication of the canonicalization map.
+reflect_check_GOCFLAGS = -static-libgo -Wl,-bbigtoc
+endif
+
 if HAVE_STATIC_LINK
 # Use -static for the syscall tests if possible, because otherwise when
 # running as root the re-execs ignore LD_LIBRARY_PATH.
diff --git a/libgo/go/internal/reflectlite/eqtype.go b/libgo/go/internal/reflectlite/eqtype.go
new file mode 100644
index 000..a03cf1c55f9
--- /dev/null
+++ b/libgo/go/internal/reflectlite/eqtype.go
@@ -0,0 +1,12 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//+build !aix !gccgo
+
+package reflectlite
+
+// rtypeEqual returns true if both types are identical.
+func rtypeEqual(t1, t2 *rtype) bool {
+   return t1 == t2
+}
diff --git a/libgo/go/internal/reflectlite/eqtype_aix_gccgo.go b/libgo/go/internal/reflectlite/eqtype_aix_gccgo.go
new file mode 100644
index 000..38b507fd827
--- /dev/null
+++ b/libgo/go/internal/reflectlite/eqtype_aix_gccgo.go
@@ -0,0 +1,26 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//+build aix,gccgo
+
+// AIX linker isn't able to merge identical type descriptors coming from
+// different objects. Thus, two rtypes might have two different pointers
+// even if they are the same. Thus, instead of pointer equality, string
+// field is checked.
+
+package reflectlite
+
+// rtypeEqual returns true if both types are identical.
+func rtypeEqual(t1, t2 *rtype) bool {
+   switch {
+   case t1 == t2:
+   return true
+   case t1 == nil || t2 == nil:
+   return false
+   case t1.kind != t2.kind || t1.hash != t2.hash:
+   return false
+   default:
+   return t1.String() == t2.String()
+   }
+}
diff --git a/libgo/go/internal/reflectlite/type.go b/libgo/go/internal/reflectlite/type.go
index e700a554e41..1609a06a53e 100644
--- a/libgo/go/internal/reflectlite/type.go
+++ b/libgo/go/internal/reflectlite/type.go
@@ -539,7 +539,7 @@ func implements(T, V *rtype) bool {
for j := 0; j < len(v.methods); j++ {
tm := &t.methods[i]
vm := &v.methods[j]
-   if *vm.name == *tm.name && (vm.pkgPath == tm.pkgPath || (vm.pkgPath != nil && tm.pkgPath != nil && *vm.pkgPath == *tm.pkgPath)) && toType(vm.typ).common() == toType(tm.typ).common() {
+   if *vm.name == *tm.name && (vm.pkgPath == tm.pkgPath || (vm.pkgPath != nil && tm.pkgPath != nil && *vm.pkgPath == *tm.pkgPath)) && rtypeEqual(toType(vm.typ).common(), toType(tm.typ).common()) {
if i++; i >= len(t.methods) {
return true
}
@@ -556,7 +556,7 @@ func implements(T, V *rtype) bool {
for j := 0; j < len(v.methods); j++ {
tm := &t.methods[i]
vm := &v.methods[j]
-   if *vm.name == *tm.name && (vm.pkgPath == tm.pkgPath || 
(vm.pkgPath != nil && tm.pkgPath != nil && *vm.pkg

Re: [PATCH] PR fortran/97408 - Diagnose non-constant KIND argument to intrinsics

2020-10-13 Thread Tobias Burnus

On 10/13/20 10:17 PM, Harald Anlauf wrote:


The KIND argument to intrinsics must be a compile-time constant.
Improve check so that the proper diagnostic is emitted.


-  if (!gfc_check_init_expr (k))
+  if (!gfc_check_init_expr (k) || k->expr_type == EXPR_VARIABLE)


I think the real question is why the following is regarded as an
initialization expression:
  t = true;
…
  if (gfc_check_iter_variable (e))
break;

Or, worded differently: if
  integer, parameter :: A(*) = [(i, i=1,5)]
is valid, why should
  integer, parameter :: B(*) = [integer :: (int(i, kind=i), i=1,2)]
be invalid?

And, indeed, the Intel Fortran compiler does accept the code:
https://godbolt.org/z/EKbTf1
(Flang, like gfortran, does not.)

Thus, the first question should be whether that is valid code
according to the Fortran standard or not.

Tobias



Re: [PATCH] openmp: Add support for omp_get_supported_active_levels

2020-10-13 Thread Kwok Cheung Yeung

Now committed to trunk with the suggested fixes. Thanks for the quick review.

Kwok

On 13/10/2020 7:36 pm, Jakub Jelinek wrote:


I'd suggest to
#define gomp_supported_active_levels INT_MAX
in libgomp.h and leave out the const variable.  Another possibility is an
enumerator, but we don't include limits.h in libgomp.h.

The OMP_5.0 symbol version already shipped in GCC 9, so we should
never add any further symbols to it.
Thus it needs to be added to the OMP_5.0.1 symbol version instead (which is new
in GCC 11).

Otherwise LGTM.


[PATCH] PR fortran/97408 - Diagnose non-constant KIND argument to intrinsics

2020-10-13 Thread Harald Anlauf
While looking at some other PR, I found the urgent need for a rather
obvious improvement to compile-time diagnostics:
The KIND argument to intrinsics must be a compile-time constant.

Regtested on x86_64-pc-linux-gnu.

OK for master?

Thanks,
Harald


PR fortran/97408 - Diagnose non-constant KIND argument to intrinsics

The KIND argument to intrinsics must be a compile-time constant.
Improve check so that the proper diagnostic is emitted.

gcc/fortran/ChangeLog:

* check.c (kind_check): Enhance check for non-constant KIND
arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/kind_2.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 1e64fab3401..fa795538a7c 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -646,7 +646,7 @@ kind_check (gfc_expr *k, int n, bt type)
   if (!scalar_check (k, n))
 return false;

-  if (!gfc_check_init_expr (k))
+  if (!gfc_check_init_expr (k) || k->expr_type == EXPR_VARIABLE)
 {
   gfc_error ("%qs argument of %qs intrinsic at %L must be a constant",
 		 gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic,
diff --git a/gcc/testsuite/gfortran.dg/kind_2.f90 b/gcc/testsuite/gfortran.dg/kind_2.f90
new file mode 100644
index 000..f3e5b7503ef
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/kind_2.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! PR97408 - Diagnose non-constant KIND argument to intrinsics
+
+program p
+  implicit none
+  integer :: i
+  integer, parameter :: lk(1) = [ 4 ]
+  print *, (int (1 , lk(i)), i=1,1) ! { dg-error "must be a constant" }
+  print *, (real(1 , lk(i)), i=1,1) ! { dg-error "must be a constant" }
+  print *, (cmplx   (1, kind=lk(i)), i=1,1) ! { dg-error "must be a constant" }
+  print *, (logical (.true., lk(i)), i=1,1) ! { dg-error "must be a constant" }
+end


Re: [PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.

2020-10-13 Thread Segher Boessenkool
Hi!

On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote:
>   For rtx like
>   (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
>(parallel [(const_int 0) (const_int 1)]))
>  it could be simplified as inner.

You could even simplify any vec_select of a subreg of X to just a
vec_select of X, by changing the selection vector a bit (well, only do
this if that is a constant vector, I suppose).  Not just for paradoxical
subregs either, just for *all* subregs.

> gcc/ChangeLog
> PR rtl-optimization/97249
> * simplify-rtx.c (simplify_binary_operation_1): Simplify
> vec_select of paradoxical subreg.
> 
> gcc/testsuite/ChangeLog
> 
> * gcc.target/i386/pr97249-1.c: New test.

> +   /* For cases like
> +  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
> +   (parallel [(const_int 0) (const_int 1)])).
> +  return inner directly.  */
> +   if (GET_CODE (trueop0) == SUBREG
> +   && paradoxical_subreg_p (trueop0)
> +   && mode == GET_MODE (XEXP (trueop0, 0))
> +   && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
> +   && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +   && l0 % l1 == 0)

Why this?  Why does the number of elements of the input have to divide
that of the output?

> + {
> +   gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
> +   unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
> +   unsigned HOST_WIDE_INT sel = 0;
> +   int i = 0;
> +   for (;i != l1; i++)

  for (int i = 0; i != l1; i++)

> + {
> +   rtx j = XVECEXP (trueop1, 0, i);
> +   if (!CONST_INT_P (j))
> + break;
> +   sel |= HOST_WIDE_INT_1U << UINTVAL (j);
> + }
> +   /* ??? Need to simplify XEXP (trueop0, 0) here.  */
> +   if (sel == expect)
> + return XEXP (trueop0, 0);
> + }
>   }

If you just handle the much more generic case, all the other vec_select
simplifications can be done as well, not just this one.

> +/* PR target/97249  */
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */

I don't know enough about the x86 backend to know if this is exactly
what you need in the testsuite.  I do know a case of backslashitis when
I see one though -- you might want to use {} instead of "", and perhaps
\m and \M and \s etc.  And to make sure things are on one line, don't do
all that nastiness with [^\n], just start the RE with (?n) :-)


Segher


[Patch] collect-utils.c, lto-wrapper + mkoffload: Improve -save-temps filename

2020-10-13 Thread Tobias Burnus

This patch avoids putting some more files into /tmp/cc* when
-save-temps has been specified.

For my testcase, it now generates:
a.lto_wrapper_args
a.offload_args
a.xnvptx-none.args
a.xnvptx-none.gcc_args
a.xamdgcn-amdhsa.gcc_args
a.xamdgcn-amdhsa.gccnative_args
a.xamdgcn-amdhsa.ld_args


This patch adds an additional argument to collect-utils.c's
collect_execute (and its wrapper fork_execute) which, if not NULL,
is used in 'concat (dumppfx, atsuffix, NULL);'.

This patch passes such a suffix in gcc/config/gcn/mkoffload.c,
gcc/config/nvptx/mkoffload.c and gcc/lto-wrapper.c.

It does not (yet) add a suffix to gcc/collect2.c and
gcc/config/i386/intelmic-mkoffload.c but just passes
NULL; for intelmic it is not a work item as it does
not use '@' files at all.
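The naming rule for the response file comes down to a small decision. Here is a C++ sketch with a hypothetical helper, response_file_name. The "a." prefix used in the checks is an assumed dumppfx chosen only for illustration. The temporary-file fallback (make_temp_file in GCC) is represented only by an empty result.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the naming logic added to collect_execute: with
// -save-temps and a non-NULL suffix, the response file is <dumppfx><atsuffix>.
// An empty result means "fall back to a temporary file" (GCC uses
// make_temp_file for that; it is not modelled here).
static std::string response_file_name(bool save_temps, const char* dumppfx,
                                      const char* atsuffix)
{
  if (!save_temps || !atsuffix)
    return std::string();
  return std::string(dumppfx) + atsuffix;
}
```

Passing NULL, as collect2.c and intelmic-mkoffload.c do, always takes the fallback path.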

Hopefully, there is no file which is written twice
with the same name (or otherwise overwritten), and
the file names do make sense.

OK?

Tobias

PS: There is still cceBdzZk.ofldlist (via lto-plugin/lto-plugin.c),
and @/tmp/cc* in calls to lto1 and collect2. And collect2.c
passes NULL also when use_atfile is true.

collect-utils.c, lto-wrapper + mkoffload: Improve -save-temps filename

gcc/ChangeLog:

	* collect-utils.c (collect_execute, fork_execute): Add at-file suffix
	argument.
	* collect-utils.h (collect_execute, fork_execute): Update prototype.
	* collect2.c (maybe_run_lto_and_relink, do_link, main, do_dsymutil):
	Update calls by passing NULL.
	* config/i386/intelmic-mkoffload.c (compile_for_target,
	generate_host_descr_file, prepare_target_image, main): Likewise.
	* config/gcn/mkoffload.c (compile_native, main): Pass at-file suffix.
	* config/nvptx/mkoffload.c (compile_native, main): Likewise.
	* lto-wrapper.c (compile_offload_image): Likewise.

 gcc/collect-utils.c  | 13 +
 gcc/collect-utils.h  |  4 ++--
 gcc/collect2.c   | 17 +
 gcc/config/gcn/mkoffload.c   |  7 ---
 gcc/config/i386/intelmic-mkoffload.c | 12 ++--
 gcc/config/nvptx/mkoffload.c |  4 ++--
 gcc/lto-wrapper.c| 13 +
 7 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/gcc/collect-utils.c b/gcc/collect-utils.c
index d4fa2c3d345..095db8d7547 100644
--- a/gcc/collect-utils.c
+++ b/gcc/collect-utils.c
@@ -104,7 +104,8 @@ do_wait (const char *prog, struct pex_obj *pex)
 
 struct pex_obj *
 collect_execute (const char *prog, char **argv, const char *outname,
-		 const char *errname, int flags, bool use_atfile)
+		 const char *errname, int flags, bool use_atfile,
+		 const char *atsuffix)
 {
   struct pex_obj *pex;
   const char *errmsg;
@@ -126,7 +127,10 @@ collect_execute (const char *prog, char **argv, const char *outname,
   /* Note: we assume argv contains at least one element; this is
  checked above.  */
 
-  response_file = make_temp_file ("");
+  if (!save_temps || !atsuffix)
+	response_file = make_temp_file ("");
+  else
+	response_file = concat (dumppfx, atsuffix, NULL);
 
   f = fopen (response_file, "w");
 
@@ -202,12 +206,13 @@ collect_execute (const char *prog, char **argv, const char *outname,
 }
 
 void
-fork_execute (const char *prog, char **argv, bool use_atfile)
+fork_execute (const char *prog, char **argv, bool use_atfile,
+	  const char *atsuffix)
 {
   struct pex_obj *pex;
 
   pex = collect_execute (prog, argv, NULL, NULL,
-			 PEX_LAST | PEX_SEARCH, use_atfile);
+			 PEX_LAST | PEX_SEARCH, use_atfile, atsuffix);
   do_wait (prog, pex);
 }
 
diff --git a/gcc/collect-utils.h b/gcc/collect-utils.h
index 6ff7d9d96df..482225764a9 100644
--- a/gcc/collect-utils.h
+++ b/gcc/collect-utils.h
@@ -27,10 +27,10 @@ extern void fatal_signal (int);
 
 extern struct pex_obj *collect_execute (const char *, char **,
 	const char *, const char *,
-	int, bool);
+	int, bool, const char *);
 extern int collect_wait (const char *, struct pex_obj *);
 extern void do_wait (const char *, struct pex_obj *);
-extern void fork_execute (const char *, char **, bool);
+extern void fork_execute (const char *, char **, bool, const char *);
 extern void utils_cleanup (bool);
 
 
diff --git a/gcc/collect2.c b/gcc/collect2.c
index 6d074a79e91..3a43a5a61aa 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -644,7 +644,7 @@ maybe_run_lto_and_relink (char **lto_ld_argv, char **object_lst,
 
   /* Run the LTO back end.  */
   pex = collect_execute (prog, lto_c_argv, NULL, NULL, PEX_SEARCH,
-			 at_file_supplied);
+			 at_file_supplied, NULL);
   {
 	int c;
 	FILE *stream;
@@ -727,7 +727,8 @@ maybe_run_lto_and_relink (char **lto_ld_argv, char **object_lst,
 
   /* Run the linker again, this time replacing the object files
  optimized by the LTO with the temporary file generated by the LTO.  */
-  fork_exec

Re: [PATCH] combine: Fix up simplify_shift_const_1 for nested ROTATEs [PR97386]

2020-10-13 Thread Segher Boessenkool
On Tue, Oct 13, 2020 at 07:03:15PM +0200, Jakub Jelinek wrote:
> On Tue, Oct 13, 2020 at 11:47:10AM -0500, Segher Boessenkool wrote:
> > On Tue, Oct 13, 2020 at 09:44:25AM +0200, Jakub Jelinek wrote:
> > > The following testcases are miscompiled (the first one since my improvements
> > > to rotate discovery on GIMPLE, the other one for many years) because
> > > combiner optimizes nested ROTATEs with narrowing SUBREG in between (i.e.
> > > the outer rotate is performed in shorter precision than the inner one) to
> > > just one ROTATE of the rotated constant.  While that (under certain
> > > conditions) can work for shifts, it can't work for rotates where we can only
> > > do that with rotates of the same precision.
> > 
> > > OT, on the other side I wonder why the code doesn't handle ROTATERT.  While
> > 
> > ROTATERT is a relatively recent invention, and isn't handled in many
> > places.
> 
> Depends on how we define recent.
> I see ROTATERT already in 1994 code ;)

Wow, it is even in the 1991 original revision already.  I just never
saw it used, I guess :-)  What is new is that we canonicalize to
ROTATERT from ROTATE, if that makes the constant rotate amount smaller
(and both forms are allowed at all) (which contradicts the
documentation btw: the docs say that rotatert by a constant is not
canonical!)


Segher


Re: Typo in comment in stl_algo.h's any_of

2020-10-13 Thread Jonathan Wakely via Gcc-patches

On 13/10/20 19:23 +0100, Nuno Lopes via Libstdc++ wrote:

Hi,

There's a typo in the comment of any_of in stl_algo.h.
Here's a trivial patch:

--- stl_algo.h.old  2020-10-13 19:16:48.836304600 +0100
+++ stl_algo.h  2020-10-13 19:17:27.357511100 +0100
@@ -471,7 +471,7 @@
{ return __last == _GLIBCXX_STD_A::find_if(__first, __last, __pred); }

  /**
-   *  @brief  Checks that a predicate is false for at least an element
+   *  @brief  Checks that a predicate is true for at least an element
   *  of a sequence.
   *  @ingroup non_mutating_algorithms
   *  @param  __first   An input iterator.


Thanks, applied to trunk.

I also changed "an element" to "one element".
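For reference, here is a minimal sketch of the behaviour the corrected comment describes, using hypothetical helper names. std::any_of returns true if the predicate holds for at least one element, and false for an empty range.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

static bool is_even(int n) { return n % 2 == 0; }

// True iff is_even holds for at least one element of v (false when v is empty).
static bool has_even(const std::vector<int>& v)
{
  return std::any_of(v.begin(), v.end(), is_even);
}
```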


commit 5204cc561a8d3c1a671969715ceb507ece8edef7
Author: Nuno Lopes 
Date:   Tue Oct 13 20:21:55 2020

libstdc++: Fix doxygen comment for std::any_of

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (any_of): Fix incorrect description
in comment.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 2478b5857c1..621c6331422 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -471,7 +471,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return __last == _GLIBCXX_STD_A::find_if(__first, __last, __pred); }
 
   /**
-   *  @brief  Checks that a predicate is false for at least an element
+   *  @brief  Checks that a predicate is true for at least one element
*  of a sequence.
*  @ingroup non_mutating_algorithms
*  @param  __first   An input iterator.


Re: [PATCH] openmp: Add support for omp_get_supported_active_levels

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 13, 2020 at 07:05:10PM +0100, Kwok Cheung Yeung wrote:
> --- a/libgomp/env.c
> +++ b/libgomp/env.c
> @@ -73,7 +73,8 @@ struct gomp_task_icv gomp_global_icv = {
>.target_data = NULL
>  };
>  
> -unsigned long gomp_max_active_levels_var = INT_MAX;
> +const unsigned long gomp_supported_active_levels = INT_MAX;
> +unsigned long gomp_max_active_levels_var = gomp_supported_active_levels;

This is not valid C, and while gcc currently in GNU mode accepts it (I think
with optimization only?), we shouldn't use that.
It is valid C++ though.

I'd suggest to
#define gomp_supported_active_levels INT_MAX
in libgomp.h and leave out the const variable.  Another possibility is an
enumerator, but we don't include limits.h in libgomp.h.
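Jakub's suggestion, combined with the clamping from the patch's omp_set_max_active_levels, might look like the standalone sketch below. It is not libgomp code. set_max_active_levels is a hypothetical stand-in, and the variables only mirror the patch's names. The macro is a constant expression in both C and C++, which is what allows it to initialize a file-scope object in C.

```cpp
#include <cassert>
#include <climits>

// The supported limit as a macro: a constant expression in both C and C++,
// so it can initialize a file-scope object (a const variable cannot in C).
#define gomp_supported_active_levels INT_MAX

unsigned long gomp_max_active_levels_var = gomp_supported_active_levels;

// Mirrors the clamping in the patch's omp_set_max_active_levels: negative
// values are ignored and values above the supported limit are capped.  With
// an int argument and an INT_MAX limit, the cap cannot currently trigger.
static void set_max_active_levels(int max_levels)
{
  if (max_levels >= 0)
    {
      if (max_levels <= gomp_supported_active_levels)
        gomp_max_active_levels_var = max_levels;
      else
        gomp_max_active_levels_var = gomp_supported_active_levels;
    }
}
```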

> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -172,6 +172,8 @@ OMP_5.0 {
>   omp_display_affinity_;
>   omp_get_affinity_format;
>   omp_get_affinity_format_;
> + omp_get_supported_active_levels;
> + omp_get_supported_active_levels_;

OMP_5.0 symbol version has been shipped already in GCC 9.  So we should
never add any further symbols to it.
Thus it needs to be added to OMP_5.0.1 symbol version instead (which is new
in GCC 11).

Otherwise LGTM.

Jakub



[PATCH] openmp: Add support for omp_get_supported_active_levels

2020-10-13 Thread Kwok Cheung Yeung

Hello

This adds support for the omp_get_supported_active_levels OpenMP runtime 
routine, first introduced in the 5.0 standard. This routine returns the maximum 
level of nested active parallel regions supported on a particular implementation 
of OpenMP, and the current maximum number cannot be set higher than this.


The maximum number in libgomp was initialized to INT_MAX (effectively uncapped), 
so I set the corresponding ICV for omp_get_supported_active_levels to INT_MAX 
too. Attempts to set the maximum higher than this using 
omp_set_max_active_levels or by setting OMP_MAX_ACTIVE_LEVELS will result in the 
max active levels silently being set to the maximum supported (although since 
omp_set_max_active_levels takes an int and the maximum supported is INT_MAX, 
this is effectively impossible at the moment).


Okay for trunk?

Thanks

Kwok
commit aa519103d7eeaeed825fd358e9532bf51f4be0a9
Author: Kwok Cheung Yeung 
Date:   Wed Oct 7 09:34:32 2020 -0700

openmp: Add support for the omp_get_supported_active_levels runtime library 
routine

This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation.  The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.

2020-10-13  Kwok Cheung Yeung  

libgomp/
* env.c (gomp_supported_active_levels): New global variable.
(gomp_max_active_levels_var): Initialize to
gomp_supported_active_levels.
(initialize_env): Limit gomp_max_active_levels_var to be at most
equal to gomp_supported_active_levels.
* fortran.c (omp_get_supported_active_levels): Add ialias_redirect.
(omp_get_supported_active_levels_): New.
* icv.c (omp_set_max_active_levels): Limit gomp_max_active_levels_var
to at most equal to gomp_supported_active_levels.
(omp_get_supported_active_levels): New.
* libgomp.h (gomp_supported_active_levels): New.
* libgomp.map (OMP_5.0): Add omp_get_supported_active_levels and
omp_get_supported_active_levels_.
* libgomp.texi (omp_get_supported_active_levels): New.
(omp_set_max_active_levels): Update.  Add reference to
omp_get_supported_active_levels.
* omp.h.in (omp_get_supported_active_levels): New.
* omp_lib.f90.in (omp_get_supported_active_levels): New.
* omp_lib.h.in (omp_get_supported_active_levels): New.
* testsuite/libgomp.c/lib-2.c (main): Check omp_get_max_active_levels
against omp_get_supported_active_levels.
* testsuite/libgomp.fortran/lib4.f90 (lib4): Likewise.

diff --git a/libgomp/env.c b/libgomp/env.c
index c0c4730..539d7a9 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -73,7 +73,8 @@ struct gomp_task_icv gomp_global_icv = {
   .target_data = NULL
 };
 
-unsigned long gomp_max_active_levels_var = INT_MAX;
+const unsigned long gomp_supported_active_levels = INT_MAX;
+unsigned long gomp_max_active_levels_var = gomp_supported_active_levels;
 bool gomp_cancel_var = false;
 int gomp_max_task_priority_var = 0;
 #ifndef HAVE_SYNC_BUILTINS
@@ -1369,6 +1370,8 @@ initialize_env (void)
   parse_int ("OMP_MAX_TASK_PRIORITY", &gomp_max_task_priority_var, true);
   parse_unsigned_long ("OMP_MAX_ACTIVE_LEVELS", &gomp_max_active_levels_var,
   true);
+  if (gomp_max_active_levels_var > gomp_supported_active_levels)
+gomp_max_active_levels_var = gomp_supported_active_levels;
   gomp_def_allocator = parse_allocator ();
   if (parse_unsigned_long ("OMP_THREAD_LIMIT", &thread_limit_var, false))
 {
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 9d838b3..029dec1 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -63,6 +63,7 @@ ialias_redirect (omp_get_schedule)
 ialias_redirect (omp_get_thread_limit)
 ialias_redirect (omp_set_max_active_levels)
 ialias_redirect (omp_get_max_active_levels)
+ialias_redirect (omp_get_supported_active_levels)
 ialias_redirect (omp_get_level)
 ialias_redirect (omp_get_ancestor_thread_num)
 ialias_redirect (omp_get_team_size)
@@ -418,6 +419,12 @@ omp_get_max_active_levels_ (void)
 }
 
 int32_t
+omp_get_supported_active_levels_ (void)
+{
+  return omp_get_supported_active_levels ();
+}
+
+int32_t
 omp_get_level_ (void)
 {
   return omp_get_level ();
diff --git a/libgomp/icv.c b/libgomp/icv.c
index 3c16abb..1bb46ab 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -116,7 +116,12 @@ void
 omp_set_max_active_levels (int max_levels)
 {
   if (max_levels >= 0)
-gomp_max_active_levels_var = max_levels;
+{
+  if (max_levels <= gomp_supported_active_levels)
+   gomp_max_active_levels_var = max_levels;
+  else
+   gomp_max_active_levels_var = gomp_supported_active_levels;
+}
 }
 
 int
@@ -126,6 +131,12 @@ omp_get_max_active_levels 

[PATCH] x86: Add missing intrinsics [PR95483]

2020-10-13 Thread Sunil K Pandey via Gcc-patches
Tested on x86-64.

gcc/ChangeLog:

* config/i386/avx2intrin.h (_mm_broadcastsi128_si256): New intrinsics.
(_mm_broadcastsd_pd): Ditto.
* config/i386/avx512bwintrin.h (_mm512_loadu_epi16): New intrinsics.
(_mm512_storeu_epi16): Ditto.
(_mm512_loadu_epi8): Ditto.
(_mm512_storeu_epi8): Ditto.
* config/i386/avx512dqintrin.h (_mm_reduce_round_sd): New intrinsics.
(_mm_mask_reduce_round_sd): Ditto.
(_mm_maskz_reduce_round_sd): Ditto.
(_mm_reduce_round_ss): Ditto.
(_mm_mask_reduce_round_ss): Ditto.
(_mm_maskz_reduce_round_ss): Ditto.
(_mm512_reduce_round_pd): Ditto.
(_mm512_mask_reduce_round_pd): Ditto.
(_mm512_maskz_reduce_round_pd): Ditto.
(_mm512_reduce_round_ps): Ditto.
(_mm512_mask_reduce_round_ps): Ditto.
(_mm512_maskz_reduce_round_ps): Ditto.
* config/i386/avx512erintrin.h
(_mm_mask_rcp28_round_sd): New intrinsics.
(_mm_maskz_rcp28_round_sd): Ditto.
(_mm_mask_rcp28_round_ss): Ditto.
(_mm_maskz_rcp28_round_ss): Ditto.
(_mm_mask_rsqrt28_round_sd): Ditto.
(_mm_maskz_rsqrt28_round_sd): Ditto.
(_mm_mask_rsqrt28_round_ss): Ditto.
(_mm_maskz_rsqrt28_round_ss): Ditto.
(_mm_mask_rcp28_sd): Ditto.
(_mm_maskz_rcp28_sd): Ditto.
(_mm_mask_rcp28_ss): Ditto.
(_mm_maskz_rcp28_ss): Ditto.
(_mm_mask_rsqrt28_sd): Ditto.
(_mm_maskz_rsqrt28_sd): Ditto.
(_mm_mask_rsqrt28_ss): Ditto.
(_mm_maskz_rsqrt28_ss): Ditto.
* config/i386/avx512fintrin.h (_mm_mask_sqrt_sd): New intrinsics.
(_mm_maskz_sqrt_sd): Ditto.
(_mm_mask_sqrt_ss): Ditto.
(_mm_maskz_sqrt_ss): Ditto.
(_mm_mask_scalef_sd): Ditto.
(_mm_maskz_scalef_sd): Ditto.
(_mm_mask_scalef_ss): Ditto.
(_mm_maskz_scalef_ss): Ditto.
(_mm_mask_cvt_roundsd_ss): Ditto.
(_mm_maskz_cvt_roundsd_ss): Ditto.
(_mm_mask_cvt_roundss_sd): Ditto.
(_mm_maskz_cvt_roundss_sd): Ditto.
(_mm_mask_cvtss_sd): Ditto.
(_mm_maskz_cvtss_sd): Ditto.
(_mm_mask_cvtsd_ss): Ditto.
(_mm_maskz_cvtsd_ss): Ditto.
(_mm512_cvtsi512_si32): Ditto.
(_mm_cvtsd_i32): Ditto.
(_mm_cvtss_i32): Ditto.
(_mm_cvti32_sd): Ditto.
(_mm_cvti32_ss): Ditto.
(_mm_cvtsd_i64): Ditto.
(_mm_cvtss_i64): Ditto.
(_mm_cvti64_sd): Ditto.
(_mm_cvti64_ss): Ditto.
* config/i386/avx512vlbwintrin.h (_mm256_storeu_epi8): New intrinsics.
(_mm_storeu_epi8): Ditto.
(_mm256_loadu_epi16): Ditto.
(_mm_loadu_epi16): Ditto.
(_mm256_loadu_epi8): Ditto.
(_mm_loadu_epi8): Ditto.
(_mm256_storeu_epi16): Ditto.
(_mm_storeu_epi16): Ditto.
* config/i386/avx512vlintrin.h (_mm256_load_epi64): New intrinsics.
(_mm_load_epi64): Ditto.
(_mm256_load_epi32): Ditto.
(_mm_load_epi32): Ditto.
(_mm256_store_epi32): Ditto.
(_mm_store_epi32): Ditto.
(_mm256_loadu_epi64): Ditto.
(_mm_loadu_epi64): Ditto.
(_mm256_loadu_epi32): Ditto.
(_mm_loadu_epi32): Ditto.
(_mm256_mask_cvt_roundps_ph): Ditto.
(_mm256_maskz_cvt_roundps_ph): Ditto.
(_mm_mask_cvt_roundps_ph): Ditto.
(_mm_maskz_cvt_roundps_ph): Ditto.
* config/i386/avxintrin.h (_mm256_cvtsi256_si32): New intrinsics.
* config/i386/emmintrin.h (_mm_loadu_si32): New intrinsics.
(_mm_loadu_si16): Ditto.
(_mm_storeu_si32): Ditto.
(_mm_storeu_si16): Ditto.
* config/i386/i386-builtin-types.def
(V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT): Add new type.
(V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT): Ditto.
(V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT): Ditto.
(V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT): Ditto.
* config/i386/i386-builtin.def
(__builtin_ia32_cvtsd2ss_mask_round): New builtin.
(__builtin_ia32_cvtss2sd_mask_round): Ditto.
(__builtin_ia32_rcp28sd_mask_round): Ditto.
(__builtin_ia32_rcp28ss_mask_round): Ditto.
(__builtin_ia32_rsqrt28sd_mask_round): Ditto.
(__builtin_ia32_rsqrt28ss_mask_round): Ditto.
(__builtin_ia32_reducepd512_mask_round): Ditto.
(__builtin_ia32_reduceps512_mask_round): Ditto.
(__builtin_ia32_reducesd_mask_round): Ditto.
(__builtin_ia32_reducess_mask_round): Ditto.
* config/i386/i386-expand.c
(ix86_expand_round_builtin): Expand round builtin for new type.
(V8DF_FTYPE_V8DF_INT_V8DF_UQI_INT)
(V16SF_FTYPE_V16SF_INT_V16SF_UHI_INT)
(V4SF_FTYPE_V4SF_V2DF_V4SF_UQI_INT)
(V2DF_FTYPE_V2DF_V4SF_V2DF_UQI_INT)
* config/i386/mmintrin.h ()
Define datatype __m32 and __m16.
Define datatype __m32_u and __m16_u.
* config/i386/sse.md: Adjust pattern.
(reducep): Adjust.
  

Re: [PATCH] combine: Fix up simplify_shift_const_1 for nested ROTATEs [PR97386]

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 13, 2020 at 11:47:10AM -0500, Segher Boessenkool wrote:
> On Tue, Oct 13, 2020 at 09:44:25AM +0200, Jakub Jelinek wrote:
> > The following testcases are miscompiled (the first one since my improvements
> > to rotate discovery on GIMPLE, the other one for many years) because
> > combiner optimizes nested ROTATEs with narrowing SUBREG in between (i.e.
> > the outer rotate is performed in shorter precision than the inner one) to
> > just one ROTATE of the rotated constant.  While that (under certain
> > conditions) can work for shifts, it can't work for rotates where we can only
> > do that with rotates of the same precision.
> 
> > OT, on the other side I wonder why the code doesn't handle ROTATERT.  While
> 
> ROTATERT is a relatively recent invention, and isn't handled in many
> places.

Depends on how we define recent.
I see ROTATERT already in 1994 code ;)

Jakub



Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-10-13 Thread Martin Sebor via Gcc-patches

On 10/13/20 3:46 AM, Christophe Lyon wrote:

On Tue, 29 Sep 2020 at 00:02, Martin Sebor via Gcc-patches
 wrote:


On 9/25/20 11:17 PM, Jason Merrill wrote:

On 9/22/20 4:05 PM, Martin Sebor wrote:

The rebased and retested patches are attached.

On 9/21/20 3:17 PM, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html

(I'm working on rebasing the patch on top of the latest trunk, which
has changed some of the same code, but it'd be helpful to get a
go-ahead on the substance of the changes.  I don't expect the rebase to
require any substantive modifications.)

Martin

On 9/14/20 4:01 PM, Martin Sebor wrote:

On 9/4/20 11:14 AM, Jason Merrill wrote:

On 9/3/20 2:44 PM, Martin Sebor wrote:

On 9/1/20 1:22 PM, Jason Merrill wrote:

On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area, the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.


The C++ changes are OK.


Thank you for looking at the rest as well.




-compute_objsize (tree ptr, int ostype, access_ref *pref,
-bitmap *visited, const vr_values *rvals /* = NULL */)
+compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap *visited,
+const vr_values *rvals)


This reformatting seems unnecessary, and I prefer to keep the
comment about the default argument.


This overload doesn't take a default argument.  (There was a stray
declaration of a similar function at the top of the file that had
one.  I've removed it.)


Ah, true.


-  if (!size || TREE_CODE (size) != INTEGER_CST)
-   return false;

  >...

You change some failure cases in compute_objsize to return
success with a maximum range, while others continue to return
failure. This needs commentary about the design rationale.


This is too much for a comment in the code but the background is
this: compute_objsize initially returned the object size as a
constant.  Recently, I have enhanced it to return a range to improve
warnings for allocated objects.  With that, a failure can be turned
into success by having the function set the range to that of the
largest object.  That
should simplify the function's callers and could even improve
the detection of some invalid accesses.  Once this change is made
it might even be possible to change its return type to void.
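The design change described here turns "size unknown" from a failure into the widest possible range. It can be sketched as follows. size_range, object_size_or_fail, object_size_range and the PTRDIFF_MAX bound are all assumptions made for illustration. They are not GCC's access_ref or compute_objsize interfaces.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical range type; GCC's access_ref carries much more than this.
struct size_range
{
  std::uint64_t min;
  std::uint64_t max;
};

// Assumed upper bound on any object size, for illustration only.
constexpr std::uint64_t max_object_size = PTRDIFF_MAX;

// Old style: report failure when the size is not a known constant.
static bool object_size_or_fail(bool size_known, std::uint64_t size,
                                std::uint64_t* out)
{
  if (!size_known)
    return false;
  *out = size;
  return true;
}

// New style: never fail; an unknown size (e.g. a VLA) becomes the widest
// possible range, so callers can treat every result uniformly.
static size_range object_size_range(bool size_known, std::uint64_t size)
{
  if (size_known)
    return {size, size};
  return {0, max_object_size};
}
```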

The change that caught your eye is necessary to make the function
a drop-in replacement for the C++ front end code which makes this
same assumption.  Without it, a number of test cases that exercise
VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:

#include <new>

void f (int n)
{
  char a[n];
  new (a - 1) int ();
}

Changing any of the other places isn't necessary for existing tests
to pass (and I didn't want to introduce too much churn).  But I do
want to change the rest of the function along the same lines at some
point.


Please do change the other places to be consistent; better to have
more churn than to leave the function half-updated.  That can be a
separate patch if you prefer, but let's do it now rather than later.


I've made most of these changes in the other patch (also attached).
I'm quite happy with the result but it turned out to be a lot more
work than either of us expected, mostly due to the amount of testing.

I've left a couple of failing cases in place mainly as reminders
to handle them better (which means I also didn't change the caller
to avoid testing for failures).  I've also added TODO notes with
reminders to handle some of the new codes more completely.




+  special_array_member sam{ };


sam is always set by component_ref_size, so I don't think it's
necessary to initialize it at the declaration.


I find initializing pass-by-pointer local variables helpful but
I don't insist on it.




@@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
tree last_type = TREE_TYPE (last);
if (TREE_CODE (last_type) != ARRAY_TYPE
|| TYPE_SIZE (last_type))
-return size;
+return size ? size : TYPE_SIZE_UNIT (type);


This change seems to violate the comment for the function.


By my reading (and writing) the change is covered by the first
sentence:

 Returns the size of the object

Re: [PATCH] combine: Fix up simplify_shift_const_1 for nested ROTATEs [PR97386]

2020-10-13 Thread Segher Boessenkool
Hi!

On Tue, Oct 13, 2020 at 09:44:25AM +0200, Jakub Jelinek wrote:
> The following testcases are miscompiled (the first one since my improvements
> to rotate discovery on GIMPLE, the other one for many years) because
> combiner optimizes nested ROTATEs with narrowing SUBREG in between (i.e.
> the outer rotate is performed in shorter precision than the inner one) to
> just one ROTATE of the rotated constant.  While that (under certain
> conditions) can work for shifts, it can't work for rotates where we can only
> do that with rotates of the same precision.
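The problem Jakub describes can be checked with plain integer arithmetic. The C++ sketch below uses hypothetical helper names and is not combine.c code. It rotates a 32-bit value, narrows it to 16 bits and rotates again, then compares that with a single 32-bit rotate by the summed count. It does the same for left shifts.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left within 32 and 16 bits.  Counts must be in [1, width - 1] so
// that no shift is by the full width (undefined behaviour).
static std::uint32_t rotl32(std::uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

static std::uint16_t rotl16(std::uint16_t x, unsigned n)
{
  // x is promoted to int; the result fits before truncation back to 16 bits.
  return static_cast<std::uint16_t>((x << n) | (x >> (16 - n)));
}

// Inner rotate in 32 bits, narrowing to 16 bits, outer rotate in 16 bits.
static std::uint16_t nested_rotate(std::uint32_t x, unsigned a, unsigned b)
{
  return rotl16(static_cast<std::uint16_t>(rotl32(x, a)), b);
}

// The invalid merge: a single 32-bit rotate by a + b, then narrowing.
static std::uint16_t merged_rotate(std::uint32_t x, unsigned a, unsigned b)
{
  return static_cast<std::uint16_t>(rotl32(x, a + b));
}

// The same two forms with left shifts, where merging is harmless.
static std::uint16_t nested_shift(std::uint32_t x, unsigned a, unsigned b)
{
  std::uint16_t inner = static_cast<std::uint16_t>(x << a);
  return static_cast<std::uint16_t>(inner << b);
}

static std::uint16_t merged_shift(std::uint32_t x, unsigned a, unsigned b)
{
  return static_cast<std::uint16_t>(x << (a + b));
}
```

Take 0x12345678 with counts 8 and 4:

- The nested rotate gives 0x8127, but the merged rotate gives 0x8123.
- Both shift versions give 0x8000.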

> OT, on the other side I wonder why the code doesn't handle ROTATERT.  While

ROTATERT is a relatively recent invention, and isn't handled in many
places.  I still think we would be better off without it, fwiw (anything
that can be expressed in RTL using ROTATERT can be expressed just as
easily with just ROTATE; it would help if we made ROTATE defined for
*all* rhs values though).

> earlier the code canonicalizes ROTATERT to ROTATE with the adjusted
> (constant) count, I mean if the inner op is ROTATERT, why can't it
> decanonicalize the outer one back to ROTATERT and treat it like that?

Many targets only support the canonical form, so this will likely
regress generated code quality?

>   PR rtl-optimization/97386
>   * combine.c (simplify_shift_const_1): Don't optimize nested ROTATEs if
>   they have different modes.
> 
>   * gcc.c-torture/execute/pr97386-1.c: New test.
>   * gcc.c-torture/execute/pr97386-2.c: New test.

The patch is fine, okay for trunk, thanks!


Segher


[committed] libstdc++: Update C++20 status documentation

2020-10-13 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document some API changes
and deprecations.
* doc/xml/manual/intro.xml: Document LWG 2499.
* doc/xml/manual/status_cxx2020.xml: Update status.
* doc/html/*: Regenerate.

Committed to trunk.
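This commit documents two user-visible API changes, both exercised in the small sketch below:

- Extraction into character buffers after LWG 2499.
- std::string::reserve no longer shrinking the string.

read_word and capacity_after_reserve are hypothetical helpers, and only standard library calls are used. The no-shrink behaviour of reserve depends on the library version, so it is described in a comment rather than asserted.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Reads one whitespace-delimited word into a 4-byte buffer.  std::setw bounds
// the extraction to width - 1 characters plus the terminating null in every
// language mode; in C++20 mode the array overload (LWG 2499 / P0487R1) also
// falls back to the array bound when no width is set.
static void read_word(const char* input, char (&buf)[4])
{
  std::istringstream in(input);
  in >> std::setw(static_cast<int>(sizeof buf)) >> buf;
}

// reserve(n) with n above the current capacity grows the buffer.  Per the
// evolution notes, with GCC 11 a smaller n no longer shrinks it;
// shrink_to_fit() is the (non-binding) way to ask for that.
static std::string::size_type capacity_after_reserve(std::string::size_type n)
{
  std::string s;
  s.reserve(n);
  return s.capacity();
}
```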

commit 0e0beddd7fb4c0d2157c7f0c7d3f39e9533bb323
Author: Jonathan Wakely 
Date:   Tue Oct 13 17:40:43 2020

libstdc++: Update C++20 status documentation

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document some API changes
and deprecations.
* doc/xml/manual/intro.xml: Document LWG 2499.
* doc/xml/manual/status_cxx2020.xml: Update status.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml b/libstdc++-v3/doc/xml/manual/evolution.xml
index 625202b9a06..38f11b0300d 100644
--- a/libstdc++-v3/doc/xml/manual/evolution.xml
+++ b/libstdc++-v3/doc/xml/manual/evolution.xml
@@ -972,6 +972,42 @@ now defaults to zero.
   be used instead.
 
 
+
+  Experimental C++2a support improved, with new headers
+  ,
+  ,
+  ,
+  ,
+  ,
+  ,
+  and
+  
+  added.
+
+
+
+
+11
+
+
+  The --enable-cheaders=c_std configuration
+  was deprecated.
+
+
+
+  When compiling as C++20, the operator>> overloads
+  for extracting strings into character buffers only work with arrays,
+  not raw pointers.
+
+
+
+  std::string::reserve(n) will no longer reduce
+  the string's capacity.
+  Calling reserve() with no arguments is equivalent
+  to shrink_to_fit(), but is deprecated.
+  shrink_to_fit() should be used instead.
+
+
 
 
 
diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 76e55980324..3e7843f58c1 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1140,6 +1140,14 @@ requirements of the license of GCC.
 ill-formed.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2499">2499:
+   operator>>(basic_istream&, CharT*) makes it hard to avoid buffer overflows
+   
+
+Replace operator>>(basic_istream&, CharT*)
+ and other overloads writing through pointers.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2537">2537:
Constructors for priority_queue taking allocators
 should call make_heap
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index b9ad03c720f..e633365ab40 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -1086,7 +1086,6 @@ or any notes about the implementation.
 
 
 
-  
 Add shift to  

   
 http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0769r2.pdf";>
@@ -1168,13 +1167,12 @@ or any notes about the implementation.
 
 
 
-  
 Fixing operator>>(basic_istream&, CharT*) (LWG 2499) 
   
 http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0487r1.html";>
 P0487R1 
   
-   
+   11.1 
   
 
 


Re: [PATCH] libstdc++: Implement C++20 features for

2020-10-13 Thread Jonathan Wakely via Gcc-patches

On 09/10/20 16:28 -0700, Thomas Rodgers via Libstdc++ wrote:


Jonathan Wakely writes:


On 07/10/20 18:15 -0700, Thomas Rodgers wrote:

@@ -500,6 +576,40 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  }
#endif

+#if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
+  basic_istringstream(ios_base::openmode __mode, const allocator_type& __a)
+  : __istream_type(), _M_stringbuf(__mode | ios_base::in, __a)
+  { this->init(&_M_stringbuf); }


All these & operators need to be std::__addressof(_M_stringbuf)
instead. _M_stringbuf potentially depends on program-defined types
(the traits and allocator classes) which means user namespaces are
considered for ADL and they could define an operator& that gets used.
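To illustrate the ADL concern, here is a minimal standalone sketch (C++14). The names `lib`, `holder`, `my_traits`, `naive_address` and `real_address` are hypothetical; inside libstdc++ the reserved `std::__addressof` plays the role that `std::addressof` plays here.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

namespace lib {
  // Stand-in for a library class template (like basic_stringbuf) whose
  // template arguments may be program-defined types.
  template<typename Traits>
  struct holder { int value = 0; };

  // Unqualified unary &: operator lookup includes ADL at instantiation time.
  template<typename Traits>
  auto naive_address(holder<Traits>& h) { return &h; }

  // std::addressof never calls a user-provided operator&.
  template<typename Traits>
  holder<Traits>* real_address(holder<Traits>& h) { return std::addressof(h); }
}

namespace user {
  struct my_traits {};
  struct not_a_pointer {};

  // my_traits is a template argument of lib::holder<my_traits>, so namespace
  // user is an associated namespace and ADL finds this overload for &h.
  not_a_pointer operator&(lib::holder<my_traits>&) { return {}; }
}
```

With `lib::holder<user::my_traits> h;`, the expression inside `naive_address` resolves to the overload in `user`, so it does not even return a pointer, while `real_address(h)` still yields the object's real address.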



+
+  explicit basic_istringstream(__string_type&& __str,
+  ios_base::openmode __mode = ios_base::in )
+  : __istream_type(), _M_stringbuf(std::move(__str), __mode | ios_base::in)
+  { this->init(&_M_stringbuf); }
+
+  template
+   basic_istringstream(const basic_string<_CharT, _Traits, _SAlloc>& __str,
+   const allocator_type& __a)
+   : basic_istringstream(__str, ios_base::in, __a)
+   { }
+
+  using __sv_type = basic_string_view;


This typedef seems to only be used once. Might as well just use
basic_string_view directly in the return type
of view().

Similarly in basic_ostringstream and basic_stringstream.


diff --git a/libstdc++-v3/src/c++20/Makefile.in b/libstdc++-v3/src/c++20/Makefile.in
new file mode 100644
index 000..0e2de19ae59
diff --git a/libstdc++-v3/src/c++20/sstream-inst.cc 
b/libstdc++-v3/src/c++20/sstream-inst.cc
new file mode 100644
index 000..c419176ae8e
--- /dev/null
+++ b/libstdc++-v3/src/c++20/sstream-inst.cc
@@ -0,0 +1,111 @@
+// Explicit instantiation file.
+
+// Copyright (C) 1997-2020 Free Software Foundation, Inc.


Just 2020 here.


+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+//
+// ISO C++ 14882:
+//
+
+#ifndef _GLIBCXX_USE_CXX11_ABI
+// Instantiations in this file use the new SSO std::string ABI unless included
+// by another file which defines _GLIBCXX_USE_CXX11_ABI=0.


This copy&pasted comment is misleading now if we're not actually going
to include it from another file to generate the old ABI symbols.

I think just define it unconditionally and add a comment saying that
these new symbols are only defined for the SSO string ABI.


+# define _GLIBCXX_USE_CXX11_ABI 1
+#endif
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+template basic_stringbuf::basic_stringbuf(const allocator_type&);
+template basic_stringbuf::basic_stringbuf(ios_base::openmode,
+const allocator_type&);
+template basic_stringbuf::basic_stringbuf(__string_type&&,
+ios_base::openmode);
+template basic_stringbuf::basic_stringbuf(basic_stringbuf&&,
+const allocator_type&);
+template basic_stringbuf::allocator_type
+basic_stringbuf::get_allocator() const noexcept;
+template basic_stringbuf::__sv_type


Looks like this would be a bit simpler if it just used string_view
here, not basic_stringbuf::__sv_type, and wstring_view below
for the wchar_t specializations.

And you could use allocator instead of
basic_stringbuf::allocator_type.

That looks a little cleaner to me, but it's a matter of opinion.

That would be necessary anyway for the basic_*stringstream types if
they don't have the __sv_type any more.
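The simpler spellings suggested above name the same types as the nested typedefs when the default allocator is used. A quick compile-time check of that assumption, using only standard names (C++17, for `<string_view>`); `__sv_type` is internal to the patch and is deliberately not referenced:

```cpp
#include <memory>
#include <sstream>
#include <string_view>
#include <type_traits>

// With the default allocator, the nested allocator_type is std::allocator<CharT>.
static_assert(std::is_same<std::basic_stringbuf<char>::allocator_type,
                           std::allocator<char>>::value,
              "stringbuf allocator_type is std::allocator<char>");
static_assert(std::is_same<std::basic_stringbuf<wchar_t>::allocator_type,
                           std::allocator<wchar_t>>::value,
              "wstringbuf allocator_type is std::allocator<wchar_t>");

// string_view and wstring_view are the char and wchar_t specializations.
static_assert(std::is_same<std::basic_string_view<char>, std::string_view>::value,
              "string_view is basic_string_view<char>");
static_assert(std::is_same<std::basic_string_view<wchar_t>, std::wstring_view>::value,
              "wstring_view is basic_string_view<wchar_t>");
```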



diff --git a/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc b/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc
new file mode 100644
index 000..d93141fc232
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc
@@ -0,0 +1,85 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or mo

Re: [RFA,PATCH] Bail in bounds_of_var_in_loop if no step found.

2020-10-13 Thread Aldy Hernandez via Gcc-patches




On 10/13/20 6:02 PM, Richard Biener wrote:

On October 13, 2020 5:17:48 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:

[Neither Andrew nor I are familiar with the SCEV code.  We treat it as a
black box :).  So we could use a SCEV expert here.]

In bounds_of_var_in_loop, evolution_part_in_loop_num is returning NULL:

   step = evolution_part_in_loop_num (chrec, loop->num);


It means that Var doesn't vary in the loop.
That is, chrec isn't a polynomial chrec.


That's what I thought, but it is:

(gdb) p chrec
$6 = 
(gdb) dd chrec
{0, +, 1}_2
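For readers not used to SCEV notation: `{0, +, 1}_2` is an affine chrec for loop number 2, meaning a value that starts at 0 and increases by 1 on each iteration of loop 2. A simplified model of that reading (the struct and function are hypothetical, not GCC's tree-based representation):

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of an affine polynomial chrec {base, +, step}_loop.
struct affine_chrec {
  std::int64_t base;  // value on entry to the loop (CHREC_LEFT in GCC)
  std::int64_t step;  // increment per iteration (CHREC_RIGHT in GCC)
  unsigned loop;      // number of the loop the evolution belongs to
};

// Value of the variable after `iter` completed iterations of loop c.loop.
std::int64_t chrec_value_at(const affine_chrec& c, std::int64_t iter) {
  return c.base + c.step * iter;
}
```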

evolution_part_in_loop_num() is returning NULL deep in chrec_component_in_loop_num():


    default:
=>    if (right)
        return NULL_TREE;
      else
        return chrec;

Do you have any suggestions?

Thanks.
Aldy



Re: [r11-3641 Regression] FAIL: gcc.dg/torture/pta-ptrarith-1.c -Os scan-tree-dump alias "ESCAPED = {[^\n}]* i f [^\n}]*}" on Linux/x86_64 (-m32 -march=cascadelake)

2020-10-13 Thread Segher Boessenkool
Hi!

On Tue, Oct 13, 2020 at 11:58:11AM +0200, Christophe Lyon wrote:
> On Mon, 12 Oct 2020 at 18:43, Segher Boessenkool
>  wrote:
> >
> > On Mon, Oct 12, 2020 at 03:26:38PM +0200, Christophe Lyon wrote:
> > > That's why I kept the reporting part manual on my side: once you know
> > > which commit introduced a failure/regression (either via bisect, or by
> > > some other way), it's not always easy to identify the gcc-patches
> > > message to which you want to reply.
> >
> > But it *should* be: the check-in subject should be in the patch mail, or
> > failing that, at least the changelog entries should be!
> 
> Well, for instance I've just reported that a newly introduced testcase
> is failing on arm, aarch64 and other platforms.
> 
> It's easy to know which commit introduced the problem, since it's a
> new test: r11-3827.
> 
> When looking for the email thread to which I want to send a reply, I
> search my mailbox
> for "Wstringop-overflow-47.c", which points me to a thread titled
> "correct handling of indices into arrays with elements larger than 1
> (PR c++/96511)"
> with several iterations, and several sets of patches.
> 
> The offending commit has "Generalize compute_objsize to return maximum
> size/offset instead of failing (PR middle-end/97023)"
> as title, so it's not obvious that this is really the right thread
> (and since the patches were attached, gmail does not display them
> inline, so I have to open them and check if the one I'm looking for is
> really there)
> 
> It's not super-long to do, but I feel it's more effort than should be
> needed for such a simple case.

This should be easier, yes.  But simply not doing it is pushing this
work onto everyone else!  (Except the people who will just delete these
mails because they aren't useful for them at all in this form.)

> > > It seems some people prefer such regressions reports in bugzilla,
> > > others in gcc-patches@.
> >
> > If it will be resolved quickly, and by just telling the author, email is
> > fine of course.  Otherwise, you need bugzilla.
> >
> In the above case, I was tempted to open a bugzilla, I would have had
> to dig less in my email archives, but since many targets are concerned,
> I hope it's obvious enough that the fix will be easy. YMMV.

And it differs per case; it needs human judgment.

> > > > *Actually* following up to the patch mail could be useful (but you can
> > > > then just point to the bugzilla).  Sending spam to gcc-patches@ is not
> > > > useful for most users of the list.
> >
> > ^^^ Still my main point.


Segher


Re: [PATCH 2/X] libsanitizer: Only build libhwasan when targeting AArch64

2020-10-13 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
> index 52503f1a880ba08b515b8a429ac44a262873f74b..fb55ae9762e9ac6531087a258e1291b5635fcd3e 100644
> --- a/libsanitizer/configure.tgt
> +++ b/libsanitizer/configure.tgt
> @@ -61,6 +61,7 @@ case "${target}" in
>   LSAN_SUPPORTED=yes
>   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_aarch64.lo
>   fi
> + HWASAN_SUPPORTED=yes
>   ;;
>x86_64-*-darwin1[2-9]* | i?86-*-darwin1[2-9]*)
>   TSAN_SUPPORTED=no

It might be worth having a comment here to emphasise that hwasan is
supported for both ILP32 and LP64.

OK with or without that change in combination with the other patches.

Thanks,
Richard


Re: [RFA,PATCH] Bail in bounds_of_var_in_loop if no step found.

2020-10-13 Thread Richard Biener via Gcc-patches
On October 13, 2020 5:17:48 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:
>[Neither Andrew nor I are familiar with the SCEV code.  We treat it as a
>black box :).  So we could use a SCEV expert here.]
>
>In bounds_of_var_in_loop, evolution_part_in_loop_num is returning NULL:
>
>   step = evolution_part_in_loop_num (chrec, loop->num);

It means that Var doesn't vary in the loop. 
That is, chrec isn't a polynomial chrec. 

>and we ICE while trying to calculate the range for STEP.
>
>This is for:
>
>(gdb) dd stmt
>qx.0_3 = PHI 
>(gdb) dd var
>qx.0_3
>
>It looks like NULL is a perfectly valid response from
>evolution_part_in_loop_num.  Should we just bail if NULL?
>
>If bailing is the correct solution, the following patch fixes the PR and
>passes tests.
>
>Thanks.
>Aldy
>
>gcc/ChangeLog:
>
>   PR tree-optimization/97396
>   * vr-values.c (bounds_of_var_in_loop): Bail on no step.
>
>gcc/testsuite/ChangeLog:
>
>   * gcc.dg/pr97396.c: New test.
>---
> gcc/testsuite/gcc.dg/pr97396.c | 23 +++
> gcc/vr-values.c|  2 ++
> 2 files changed, 25 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr97396.c
>
>diff --git a/gcc/testsuite/gcc.dg/pr97396.c b/gcc/testsuite/gcc.dg/pr97396.c
>new file mode 100644
>index 000..d992c11f238
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/pr97396.c
>@@ -0,0 +1,23 @@
>+// { dg-do compile }
>+// { dg-options "-O1 -ftree-vrp" }
>+// { dg-additional-options "-m32" { target { i?86-*-* x86_64-*-* } } }
>+
>+unsigned int
>+po (char *os, unsigned int al)
>+{
>+  for (;;)
>+    {
>+      int qx = 0;
>+
>+      while (al < 1)
>+        {
>+          char *cw;
>+
>+          cw = os + qx;
>+          if (cw)
>+            return al + qx;
>+
>+          qx += sizeof *cw;
>+        }
>+    }
>+}
>diff --git a/gcc/vr-values.c b/gcc/vr-values.c
>index da0b249278b..16f6c629f29 100644
>--- a/gcc/vr-values.c
>+++ b/gcc/vr-values.c
>@@ -1827,6 +1827,8 @@ bounds_of_var_in_loop (tree *min, tree *max, range_query *query,
> 
>   init = initial_condition_in_loop_num (chrec, loop->num);
>   step = evolution_part_in_loop_num (chrec, loop->num);
>+  if (step == NULL_TREE)
>+    return false;
> 
>   /* If INIT is an SSA with a singleton range, set INIT to said
>  singleton, otherwise leave INIT alone.  */



Re: [PATCH 1/X] libsanitizer: Tie the hwasan library into our build system

2020-10-13 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review.

Matthew Malcomson  writes:
> This patch tries to tie libhwasan into the GCC build system in the same way
> that the other sanitizer runtime libraries are handled.
>
> libsanitizer/ChangeLog:
>
>   * Makefile.am:  Build libhwasan.
>   * Makefile.in:  Build libhwasan.
>   * asan/Makefile.in:  Build libhwasan.
>   * configure:  Build libhwasan.
>   * configure.ac:  Build libhwasan.
>   * hwasan/Makefile.am: New file.
>   * hwasan/Makefile.in: New file.
>   * hwasan/libtool-version: New file.
>   * interception/Makefile.in: Build libhwasan.
>   * libbacktrace/Makefile.in: Build libhwasan.
>   * libsanitizer.spec.in: Build libhwasan.
>   * lsan/Makefile.in: Build libhwasan.
>   * sanitizer_common/Makefile.in: Build libhwasan.
>   * tsan/Makefile.in: Build libhwasan.
>   * ubsan/Makefile.in: Build libhwasan.

I think this should also update README.gcc and merge.sh.  Could you
try locally merging in a dummy change to the llvm sources with merge.sh,
to make sure it works correctly?

> new file mode 100644
> index ..aaa39b4536a5c5f54910a951470814bbc8a20946
> --- /dev/null
> +++ b/libsanitizer/hwasan/Makefile.am
> @@ -0,0 +1,88 @@
> +AM_CPPFLAGS = -I $(top_srcdir)/include -I $(top_srcdir)
> +
> +# May be used by toolexeclibdir.
> +gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
> +
> +DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DCAN_SANITIZE_UB=0 -DHWASAN_WITH_INTERCEPTORS=1
> +AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros -fno-ipa-icf

I realise this is just taken from the other Makefile.ams, but do
you know the reason behind -fomit-frame-pointer?  I think we should
avoid building aarch64 libraries with that flag unless there's a
specific reason.

Otherwise looks good to me, although I'm definitely not an expert
on this stuff.

Thanks,
Richard


[committed] libstdc++: Remove trailing whitespace from XML docs

2020-10-13 Thread Jonathan Wakely via Gcc-patches
Very boring. Committed to trunk.

commit 16d0b033ca4d3dd1331c58730c7944ae6e648a14
Author: Jonathan Wakely 
Date:   Tue Oct 13 16:43:11 2020

libstdc++: Remove trailing whitespace from XML docs

libstdc++-v3/ChangeLog:

* doc/xml/book.txml: Remove trailing whitespace.
* doc/xml/chapter.txml: Likewise.
* doc/xml/class.txml: Likewise.
* doc/xml/gnu/fdl-1.3.xml: Likewise.
* doc/xml/gnu/gpl-3.0.xml: Likewise.
* doc/xml/manual/abi.xml: Likewise.
* doc/xml/manual/algorithms.xml: Likewise.
* doc/xml/manual/allocator.xml: Likewise.
* doc/xml/manual/appendix_contributing.xml: Likewise.
* doc/xml/manual/appendix_free.xml: Likewise.
* doc/xml/manual/appendix_porting.xml: Likewise.
* doc/xml/manual/atomics.xml: Likewise.
* doc/xml/manual/auto_ptr.xml: Likewise.
* doc/xml/manual/backwards_compatibility.xml: Likewise.
* doc/xml/manual/bitmap_allocator.xml: Likewise.
* doc/xml/manual/build_hacking.xml: Likewise.
* doc/xml/manual/codecvt.xml: Likewise.
* doc/xml/manual/concurrency.xml: Likewise.
* doc/xml/manual/concurrency_extensions.xml: Likewise.
* doc/xml/manual/configure.xml: Likewise.
* doc/xml/manual/containers.xml: Likewise.
* doc/xml/manual/ctype.xml: Likewise.
* doc/xml/manual/debug.xml: Likewise.
* doc/xml/manual/debug_mode.xml: Likewise.
* doc/xml/manual/diagnostics.xml: Likewise.
* doc/xml/manual/documentation_hacking.xml: Likewise.
* doc/xml/manual/evolution.xml: Likewise.
* doc/xml/manual/internals.xml: Likewise.
* doc/xml/manual/intro.xml: Likewise.
* doc/xml/manual/io.xml: Likewise.
* doc/xml/manual/iterators.xml: Likewise.
* doc/xml/manual/locale.xml: Likewise.
* doc/xml/manual/localization.xml: Likewise.
* doc/xml/manual/messages.xml: Likewise.
* doc/xml/manual/mt_allocator.xml: Likewise.
* doc/xml/manual/numerics.xml: Likewise.
* doc/xml/manual/parallel_mode.xml: Likewise.
* doc/xml/manual/policy_data_structures.xml: Likewise.
* doc/xml/manual/prerequisites.xml: Likewise.
* doc/xml/manual/shared_ptr.xml: Likewise.
* doc/xml/manual/spine.xml: Likewise.
* doc/xml/manual/status_cxxtr1.xml: Likewise.
* doc/xml/manual/status_cxxtr24733.xml: Likewise.
* doc/xml/manual/strings.xml: Likewise.
* doc/xml/manual/support.xml: Likewise.
* doc/xml/manual/test.xml: Likewise.
* doc/xml/manual/test_policy_data_structures.xml: Likewise.
* doc/xml/manual/using.xml: Likewise.
* doc/xml/manual/using_exceptions.xml: Likewise.
* doc/xml/manual/utilities.xml: Likewise.
* doc/html/*: Regenerate.
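For illustration only, the change these entries describe amounts to dropping trailing blanks from every line. A minimal sketch of that transformation (a hypothetical helper, not the tool actually used for this commit):

```cpp
#include <cassert>
#include <string>

// Remove trailing spaces and tabs from every line of `text`,
// keeping the '\n' line terminators intact.
std::string strip_trailing_whitespace(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  std::string::size_type start = 0;
  while (start <= text.size()) {
    std::string::size_type nl = text.find('\n', start);
    std::string::size_type end = (nl == std::string::npos) ? text.size() : nl;
    std::string line = text.substr(start, end - start);
    std::string::size_type last = line.find_last_not_of(" \t");
    line.erase(last == std::string::npos ? 0 : last + 1);  // cut the blank tail
    out += line;
    if (nl == std::string::npos)
      break;  // no more line terminators
    out += '\n';
    start = nl + 1;
  }
  return out;
}
```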

diff --git a/libstdc++-v3/doc/xml/book.txml b/libstdc++-v3/doc/xml/book.txml
index 55b050271a5..8e3f3273c34 100644
--- a/libstdc++-v3/doc/xml/book.txml
+++ b/libstdc++-v3/doc/xml/book.txml
@@ -2,7 +2,7 @@
 
 http://docbook.org/ns/docbook"; version="5.0" xml:id="api" xreflabel="Source Level Documentation">
 
- 
+
 
   
 2007
@@ -13,7 +13,7 @@
   
   
 
-  http://www.w3.org/1999/xlink"; 
xlink:href="17_intro/license.html">License 
+  http://www.w3.org/1999/xlink"; 
xlink:href="17_intro/license.html">License
   
 
   
@@ -22,7 +22,7 @@
 
 
   
-
+
 
   
 
diff --git a/libstdc++-v3/doc/xml/chapter.txml 
b/libstdc++-v3/doc/xml/chapter.txml
index b251c84609a..a5a9a3a8230 100644
--- a/libstdc++-v3/doc/xml/chapter.txml
+++ b/libstdc++-v3/doc/xml/chapter.txml
@@ -1,7 +1,7 @@
 
 
 http://docbook.org/ns/docbook"; version="5.0" xml:id="manual.intro" xreflabel="Introduction">
- 
+
 Introduction
   
 ISO C++
@@ -12,33 +12,33 @@
 
 
 Status
-  
+
   
 The GNU C++ ...
   
 
 
 Setup
-  
+
   
 The GNU C++ ...
   
   Next1
-
+
 
   The GNU C++ ...
 
   
   Next2
-
+
 
   The GNU C++ ...
 
-
+  
 
 
 Using
-  
+
   
 The GNU C++ ...
   
diff --git a/libstdc++-v3/doc/xml/class.txml b/libstdc++-v3/doc/xml/class.txml
index be0929b7046..8d96d2cb0fa 100644
--- a/libstdc++-v3/doc/xml/class.txml
+++ b/libstdc++-v3/doc/xml/class.txml
@@ -2,7 +2,7 @@
 
 http://docbook.org/ns/docbook"; version="5.0" xml:id="manual.util.memory.allocator" xreflabel="allocator">
 
- 
+
 allocator
   
 ISO C++
@@ -31,7 +31,7 @@
 
   
 
-   
+  
   
 
 
@@ -41,7 +41,7 @@
   
   
 
-   
+  
   
 
 
@@ -49,19 +49,19 @@
 
 
   Interface Design
-
+
 
 
 
-
- 
+
+
 
   
 
   Selecting Default Allocation Strategy
-
 
- 
+
+
 
 

@@ -77,12 +77,12 @@
   
 
   Disabling Memory Caching
-
 
- 
+
+
 
 
-
+   

   
 

Re: [patch] Rework CPP_BUILTINS_SPEC for powerpc-vxworks

2020-10-13 Thread Segher Boessenkool
Hi!

On Tue, Oct 13, 2020 at 03:35:11PM +0200, Olivier Hainque wrote:
> Here's an updated version which passed in-house build & tests
> with gcc-10 based toolchains for VxWorks 6.9 & 7.2, and with which
> I could build mainline for VxWorks 6.9 including libstdc++ (combined
> with a few other patches, orthogonal to what is proposed here and
> which I'll post shortly).
> 
> Same ChangeLog. Patch hopefully quotable if needed now.

It is, thank you!

The patch looks fine to me now.  Not that you need my approval :-)


Segher


Re: [RFC][gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

2020-10-13 Thread Tom de Vries
On 10/12/20 9:15 AM, Richard Biener wrote:
> On Fri, 9 Oct 2020, Tom de Vries wrote:
> 
>> Hi,
>>
>> The function gimple_can_duplicate_bb_p currently always returns true.
>>
>> The presence of can_duplicate_bb_p in tracer.c however suggests that
>> there are cases where BBs indeed cannot be duplicated.
>>
>> Move the implementation of can_duplicate_bb_p to gimple_can_duplicate_bb_p.
>>
>> Bootstrapped and reg-tested on x86_64-linux.
>>
>> Build x86_64-linux with nvptx accelerator and tested libgomp.
>>
>> No issues found.
>>
>> As corner-case check, bootstrapped and reg-tested a patch that makes
>> gimple_can_duplicate_bb_p always return false, resulting in
>> PR97333 - "[gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate]
>> ICE in duplicate_block, at cfghooks.c:1093".
>>
>> Any comments?
> 
> In principle it's correct to move this to the CFG hook since there
> now seem to be stmts that cannot be duplicated and thus we need
> to implement can_duplicate_bb_p.
> 
> Some minor things below...
> 
>> Thanks,
>> - Tom
>>
>> [gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p
>>
>> gcc/ChangeLog:
>>
>> 2020-10-09  Tom de Vries  
>>
>>  * tracer.c (cached_can_duplicate_bb_p): Use can_duplicate_block_p
>>  instead of can_duplicate_bb_p.
>>  (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p): Move ...
>>  * tree-cfg.c: ... here.
>>  * tracer.c (can_duplicate_bb_p): Move ...
>>  * tree-cfg.c (gimple_can_duplicate_bb_p): here.
>>  * tree-cfg.h (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p):
>>  Declare.
>>
>> ---
>>  gcc/tracer.c   | 61 +-
>>  gcc/tree-cfg.c | 54 ++-
>>  gcc/tree-cfg.h |  2 ++
>>  3 files changed, 56 insertions(+), 61 deletions(-)
>>
>> diff --git a/gcc/tracer.c b/gcc/tracer.c
>> index e1c2b9527e5..16b46c65b14 100644
>> --- a/gcc/tracer.c
>> +++ b/gcc/tracer.c
>> @@ -84,65 +84,6 @@ bb_seen_p (basic_block bb)
>>return bitmap_bit_p (bb_seen, bb->index);
>>  }
>>  
>> -/* Return true if gimple stmt G can be duplicated.  */
>> -static bool
>> -can_duplicate_insn_p (gimple *g)
>> -{
>> -  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
>> - duplicated as part of its group, or not at all.
>> - The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
>> - group, so the same holds there.  */
>> -  if (is_gimple_call (g)
>> -  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
>> -  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
>> -  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
>> -  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
>> -  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
>> -return false;
>> -
>> -  return true;
>> -}
>> -
>> -/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
>> -static bool
>> -can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
>> -{
>> -  if (bb->index < NUM_FIXED_BLOCKS)
>> -return false;
>> -
>> -  if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
>> -{
>> -  /* A transaction is a single entry multiple exit region.  It
>> - must be duplicated in its entirety or not at all.  */
>> -  if (gimple_code (g) == GIMPLE_TRANSACTION)
>> -return false;
>> -
>> -  /* An IFN_UNIQUE call must be duplicated as part of its group,
>> - or not at all.  */
>> -  if (is_gimple_call (g)
>> -  && gimple_call_internal_p (g)
>> -  && gimple_call_internal_unique_p (g))
>> -return false;
>> -}
>> -
>> -  return true;
>> -}
>> -
>> -/* Return true if BB can be duplicated.  */
>> -static bool
>> -can_duplicate_bb_p (const_basic_block bb)
>> -{
>> -  if (!can_duplicate_bb_no_insn_iter_p (bb))
>> -return false;
>> -
>> -  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
>> -   !gsi_end_p (gsi); gsi_next (&gsi))
>> -if (!can_duplicate_insn_p (gsi_stmt (gsi)))
>> -  return false;
>> -
>> -  return true;
>> -}
>> -
>>  static sbitmap can_duplicate_bb;
>>  
>>  /* Cache VAL as value of can_duplicate_bb_p for BB.  */
>> @@ -167,7 +108,7 @@ cached_can_duplicate_bb_p (const_basic_block bb)
>>return false;
>>  }
>>  
>> -  return can_duplicate_bb_p (bb);
>> +  return can_duplicate_block_p (bb);
>>  }
>>  
>>  /* Return true if we should ignore the basic block for purposes of tracing.  */
>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>> index 5caf3b62d69..a5677859ffc 100644
>> --- a/gcc/tree-cfg.c
>> +++ b/gcc/tree-cfg.c
>> @@ -6208,11 +6208,63 @@ gimple_split_block_before_cond_jump (basic_block bb)
>>  }
>>  
>>  
>> +/* Return true if gimple stmt G can be duplicated.  */
>> +bool
>> +can_duplicate_insn_p (gimple *g)
> 
> Does this need to be exported?

Yes, it's still used in tracer.c.  With the renaming, that has become
evident now.

>  Please name it
> can_duplicate_stmt_p.

Done.
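A simplified, hypothetical model of the predicates being moved, showing how the block-level checks (the fixed ENTRY/EXIT blocks, a transaction or IFN_UNIQUE call as the last statement) combine with the per-statement check. The enum, struct and function names are illustrative stand-ins, not GCC's gimple API:

```cpp
#include <cassert>
#include <vector>

// Illustrative statement kinds standing in for the cases the real code checks.
enum class stmt_kind { plain, transaction, unique_call, simt_group_call };

struct block {
  int index;
  std::vector<stmt_kind> stmts;
};

// GCC numbers ENTRY and EXIT as blocks 0 and 1 (NUM_FIXED_BLOCKS == 2).
constexpr int num_fixed_blocks = 2;

// Per-statement check: SIMT group calls must be duplicated as a group or not at all.
bool can_duplicate_stmt(stmt_kind k) {
  return k != stmt_kind::simt_group_call;
}

// Cheap block-level checks that only look at the index and the last statement.
bool can_duplicate_block_no_stmt_iter(const block& bb) {
  if (bb.index < num_fixed_blocks)
    return false;
  if (!bb.stmts.empty()) {
    stmt_kind last = bb.stmts.back();
    if (last == stmt_kind::transaction || last == stmt_kind::unique_call)
      return false;
  }
  return true;
}

// Full check: block-level checks first, then every statement.
bool can_duplicate_block(const block& bb) {
  if (!can_duplicate_block_no_stmt_iter(bb))
    return false;
  for (stmt_kind k : bb.stmts)
    if (!can_duplicate_stmt(k))
      return false;
  return true;
}
```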

>  It's also incomplete gi

[RFA,PATCH] Bail in bounds_of_var_in_loop if no step found.

2020-10-13 Thread Aldy Hernandez via Gcc-patches
[Neither Andrew nor I are familiar with the SCEV code.  We treat it as a
black box :).  So we could use a SCEV expert here.]

In bounds_of_var_in_loop, evolution_part_in_loop_num is returning NULL:

   step = evolution_part_in_loop_num (chrec, loop->num);

and we ICE while trying to calculate the range for STEP.

This is for:

(gdb) dd stmt
qx.0_3 = PHI 
(gdb) dd var
qx.0_3

It looks like NULL is a perfectly valid response from
evolution_part_in_loop_num.  Should we just bail if NULL?

If bailing is the correct solution, the following patch fixes the PR and
passes tests.

Thanks.
Aldy

gcc/ChangeLog:

PR tree-optimization/97396
* vr-values.c (bounds_of_var_in_loop): Bail on no step.

gcc/testsuite/ChangeLog:

* gcc.dg/pr97396.c: New test.
---
 gcc/testsuite/gcc.dg/pr97396.c | 23 +++
 gcc/vr-values.c|  2 ++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr97396.c

diff --git a/gcc/testsuite/gcc.dg/pr97396.c b/gcc/testsuite/gcc.dg/pr97396.c
new file mode 100644
index 000..d992c11f238
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97396.c
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-options "-O1 -ftree-vrp" }
+// { dg-additional-options "-m32" { target { i?86-*-* x86_64-*-* } } }
+
+unsigned int
+po (char *os, unsigned int al)
+{
+  for (;;)
+    {
+      int qx = 0;
+
+      while (al < 1)
+        {
+          char *cw;
+
+          cw = os + qx;
+          if (cw)
+            return al + qx;
+
+          qx += sizeof *cw;
+        }
+    }
+}
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index da0b249278b..16f6c629f29 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1827,6 +1827,8 @@ bounds_of_var_in_loop (tree *min, tree *max, range_query *query,
 
   init = initial_condition_in_loop_num (chrec, loop->num);
   step = evolution_part_in_loop_num (chrec, loop->num);
+  if (step == NULL_TREE)
+    return false;
 
   /* If INIT is an SSA with a singleton range, set INIT to said
  singleton, otherwise leave INIT alone.  */
-- 
2.25.4
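A simplified model of the pattern the patch adopts: treat "no step" as a legitimate answer and bail out instead of using the missing value. This is a hypothetical sketch in C++17; `evolution`, `step_in_loop` and `bounds_in_loop` are illustrative names, and the rule "a step exists only for the chrec's own loop" is a simplification, not SCEV's exact semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Affine evolution {init, +, step}_loop (simplified).
struct evolution { std::int64_t init; std::int64_t step; unsigned loop; };

// Stand-in for evolution_part_in_loop_num: "no step" (nullopt) plays the
// role of NULL_TREE when asked about a loop the evolution does not belong to.
std::optional<std::int64_t> step_in_loop(const evolution& ev, unsigned loop_num) {
  if (ev.loop != loop_num)
    return std::nullopt;
  return ev.step;
}

// Compute [min, max] over `niters` increments, or return false when there
// is no step, rather than dereferencing a missing value.
bool bounds_in_loop(const evolution& ev, unsigned loop_num, std::int64_t niters,
                    std::int64_t* min, std::int64_t* max) {
  std::optional<std::int64_t> step = step_in_loop(ev, loop_num);
  if (!step)
    return false;  // mirrors: if (step == NULL_TREE) return false;
  std::int64_t last = ev.init + *step * niters;
  *min = std::min(ev.init, last);
  *max = std::max(ev.init, last);
  return true;
}
```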



[PUSHED] ranger: Do not save hash slots across calls to hash_table::get_or_insert.

2020-10-13 Thread Aldy Hernandez via Gcc-patches
There's a read of a freed block while accessing the default_slot in
calc_switch_ranges.

  default_slot->intersect (def_range);

It seems the default_slot got swiped from under us, and the valgrind
dump indicates the free came from the get_or_insert in the same
function:

  irange *&slot = m_edge_table->get_or_insert (e, &existed);

So it looks like the later get_or_insert call is freeing the storage
that the previously returned default_slot reference points into.
Looking down the chain from get_or_insert, we see it calls
hash_table<>::expand, which frees the old entries array when it
resizes the table:

  if (!m_ggc)
    Allocator  ::data_free (oentries);
  else
    ggc_free (oentries);

This patch avoids keeping a pointer to the default_slot across multiple
calls to get_or_insert in the loop.
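The same hazard exists with any container that can move its elements when it grows. Here is an analogy using std::vector (hypothetical helper and names; GCC's hash_table<> is a different, open-addressing container) that contrasts the unsafe pattern with the one the patch adopts:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct range { int lo; int hi; };

// One entry per case value, plus a trailing "summary" entry (here simply the
// hull of all case values, standing in for the default range).
std::vector<range> build_case_table(const std::vector<int>& case_values) {
  std::vector<range> table;
  // Unsafe pattern (the bug):
  //   range& summary = table.emplace_back(...);  // reference into `table`
  //   table.push_back(...);                       // may reallocate: dangling
  // Safe pattern (the fix): accumulate in a local, store it once at the end.
  range summary{0, 0};
  bool first = true;
  for (int v : case_values) {
    table.push_back(range{v, v});  // may reallocate `table`
    if (first) {
      summary = range{v, v};
      first = false;
    } else {
      summary.lo = std::min(summary.lo, v);
      summary.hi = std::max(summary.hi, v);
    }
  }
  table.push_back(summary);  // single insertion after the loop
  return table;
}
```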

Pushed.

gcc/ChangeLog:

PR tree-optimization/97379
* gimple-range-edge.cc (outgoing_range::calc_switch_ranges): Do
not save hash slot across calls to hash_table<>::get_or_insert.
---
 gcc/gimple-range-edge.cc | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
index c5ee54fec51..b42dcd6d318 100644
--- a/gcc/gimple-range-edge.cc
+++ b/gcc/gimple-range-edge.cc
@@ -106,19 +106,14 @@ outgoing_range::calc_switch_ranges (gswitch *sw)
   unsigned x, lim;
   lim = gimple_switch_num_labels (sw);
   tree type = TREE_TYPE (gimple_switch_index (sw));
-  
   edge default_edge = gimple_switch_default_edge (cfun, sw);
-  irange *&default_slot = m_edge_table->get_or_insert (default_edge, &existed);
 
-  // This should be the first call into this switch.  For the default
-  // range case, start with varying and intersect each other case from
-  // it.
-
-  gcc_checking_assert (!existed);
-
-  // Allocate an int_range_max for default case.
-  default_slot = m_range_allocator.allocate (255);
-  default_slot->set_varying (type);
+  // This should be the first call into this switch.
+  //
+  // Allocate an int_range_max for the default range case, start with
+  // varying and intersect each other case from it.
+  irange *default_range = m_range_allocator.allocate (255);
+  default_range->set_varying (type);
 
   for (x = 1; x < lim; x++)
 {
@@ -137,7 +132,7 @@ outgoing_range::calc_switch_ranges (gswitch *sw)
   int_range_max def_range (low, high);
   range_cast (def_range, type);
   def_range.invert ();
-  default_slot->intersect (def_range);
+  default_range->intersect (def_range);
 
   // Create/union this case with anything on else on the edge.
   int_range_max case_range (low, high);
@@ -157,6 +152,11 @@ outgoing_range::calc_switch_ranges (gswitch *sw)
   // intrusive than allocating max ranges for each case.
   slot = m_range_allocator.allocate (case_range);
 }
+
+  irange *&slot = m_edge_table->get_or_insert (default_edge, &existed);
+  // This should be the first call into this switch.
+  gcc_checking_assert (!existed);
+  slot = default_range;
 }
 
 
-- 
2.26.2



Re: [PATCH] Practical Improvement to libgcc Complex Divide

2020-10-13 Thread Patrick McGehearty via Gcc-patches

Ping - still need review of version 4 of this patch.
It has been over a month since the last comment.

- patrick



On 9/9/2020 2:13 AM, Richard Biener wrote:

On Tue, Sep 8, 2020 at 8:50 PM Patrick McGehearty via Gcc-patches
 wrote:

(Version 4)

(Added in version 4)
Fixed Changelog entry to include __divsc3, __divdc3, __divxc3, __divtc3.
Revised description to avoid incorrect use of "ulp (units last place)".
Modified float precision case to use double precision when double
precision hardware is available. Otherwise float uses the new algorithm.
Added code to scale subnormal numerator arguments when appropriate.
This change reduces 16 bit errors in double precision by a factor of 140.
Revised results charts to match current version of code.
Added background of tuning approach.

Summary of Purpose

The following patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe [1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A Google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and over the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equal to zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug printouts to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from a check for zero to a check for subnormal reduced the
16 bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).
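The modified "ratio" test can be sketched for the |c| < |d| branch of (a + b*i) / (c + d*i). When ratio = c/d is subnormal, multiplying by it loses precision, so a*ratio is regrouped as c*(a/d), following Baudin and Smith's alternate ordering. The helper name div_small_c is hypothetical, double precision is assumed, and the scaling and NaN handling are omitted.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper for the |c| < |d| branch of (a + b*i) / (c + d*i).
   Baudin and Smith switch to the alternate ordering only when
   ratio == 0; this version switches whenever |ratio| <= DBL_MIN,
   i.e. when ratio is zero or subnormal.  */
static void
div_small_c (double a, double b, double c, double d, double *x, double *y)
{
  double ratio = c / d;
  double denom = (c * ratio) + d;
  if (fabs (ratio) > DBL_MIN)
    {
      /* Normal ratio: ordinary Smith ordering.  */
      *x = ((a * ratio) + b) / denom;
      *y = ((b * ratio) - a) / denom;
    }
  else
    {
      /* Zero or subnormal ratio: never multiply by it directly.  */
      *x = ((c * (a / d)) + b) / denom;
      *y = ((c * (b / d)) - a) / denom;
    }
}
```

With a = 1e300, b = 1, c = 1e-10, d = 1e300 the ratio is about 1e-310, below DBL_MIN, so the second ordering is used.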

Continued examination of the remaining errors and their intermediate
computations led to the various input value tests and the scaling used
to avoid under/overflow. The current patch does not handle some of the
rarest and most extreme combinations of input values, but the random
test data shows only 1 case in 10 million with an error of more
than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Therefore half precision is left unchanged.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.

When both parts of the denominator had exponents roughly small enough
to allow any subnormal values to be shifted to normal values, all input
values could be scaled up without risking unnecessary overflow, giving
a clear improvement in accuracy. Similarly, when either part of the
numerator was subnormal and the other numerator part and both
denominator parts were not too large, scaling could be used to reduce
the risk of computing with subnormals.  The test and scaling values used
all fit within the allowed 

[PATCH][GCC 8] AArch64: Add Neoverse N2 tuning model

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This is the GCC 8 patch for the Neoverse N2 tuning struct.
It sets the AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC tune flag as well.

Bootstrapped and tested on the branch.
Pushing to GCC 8.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64.c (neoversen2_tunings): Define.
* config/aarch64/aarch64-cores.def (neoverse-n2): Use it.


n2-tune-8.patch
Description: n2-tune-8.patch


[PATCH][GCC 9] AArch64: Add Neoverse N2 tuning mode

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This is the GCC 9 patch for the Neoverse N2 tuning struct.
It sets the AARCH64_EXTRA_TUNE_PREFER_ADVSIMD_AUTOVEC tune flag as well.

Bootstrapped and tested on the branch.
Pushing to GCC 9.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64.c (neoversen2_tunings): Define.
* config/aarch64/aarch64-cores.def (neoverse-n2): Use it.


n2-tune-9.patch
Description: n2-tune-9.patch


[PATCH][GCC 10] AArch64: Add Neoverse N2 tuning model

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches
This is the GCC 10 version of the Neoverse N2 tuning struct patch.
It's more or less identical.

Bootstrapped and tested on the branch.
Pushing to GCC 10.
Thanks,
Kyrill

gcc/
* config/aarch64/aarch64.c (neoversen2_tunings): Define.
* config/aarch64/aarch64-cores.def (neoverse-n2): Use it.


n2-tune-10.patch
Description: n2-tune-10.patch


Re: [Patch] libgomp: Add, if existing, -latomic to libgomp.spec --as-needed (was: Re: [RFC] Offloading and automatic linking of libraries)

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 12, 2020 at 05:51:17PM +0200, Tobias Burnus wrote:
> first: *PING*.
> 
> secondly, I think the change to testsuite/lib/libgomp.exp's libgomp_init
> is also needed. (Hence, I now added it.) I have a too new system-installed
> libatomic to be sure that it fails without.

I think for libgomp.spec we should add it solely for the offloading targets;
neither GCC-generated code for OpenMP constructs nor libgomp itself needs
-latomic on the hosts.
Otherwise, if we want to add it --as-needed for all targets that have it, it
should be done in gcc/configure* and gcc/gcc.c (and then the testsuites
adjusted and the change documented in the release notes).
That would really need e.g. packaging changes for distros, because
libatomic-devel wouldn't be optional anymore (and likely the static
version would be needed too).

Jakub



Re: [Patch] lto-wrapper: Use nontemp filename with -save-temps

2020-10-13 Thread Tobias Burnus

On 10/13/20 2:56 PM, Richard Biener wrote:

On Tue, 13 Oct 2020, Tobias Burnus wrote:

This patch generates (for a.out and -save-temps)
the file a.crtoffloadtable.o.


I missed a misuse of concat in my patch :-(

Committed as obvious.

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 8311899eddf91d0d3e3ad931c6bbf2d5a1b445ca
Author: Tobias Burnus 
Date:   Tue Oct 13 15:56:58 2020 +0200

lto-wrapper: Use nontemp filename with -save-temps

gcc/ChangeLog:

* lto-wrapper.c (find_crtoffloadtable): Fix last commit
by adding NULL as last argument to concat.

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 4d3cd7a56f2..4d93313241d 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1043,7 +1043,7 @@ find_crtoffloadtable (int save_temps, const char *dumppfx)
 	if (!save_temps)
 	  crtoffloadtable = make_temp_file (".crtoffloadtable.o");
 	else
-	  crtoffloadtable = concat (dumppfx, "crtoffloadtable.o");
+	  crtoffloadtable = concat (dumppfx, "crtoffloadtable.o", NULL);
 	copy_file (crtoffloadtable, paths[i]);
 	printf ("%s\n", crtoffloadtable);
 	XDELETEVEC (crtoffloadtable);
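libiberty's concat takes a variable number of strings and stops at a NULL argument, which is why the missing terminator was a bug: without it, the callee keeps reading arguments past the last real one (undefined behavior). The sketch below is a hypothetical stand-in named concat_sketch, not libiberty's implementation; it carries the GCC/Clang sentinel attribute so a missing NULL can be diagnosed at compile time.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* The sentinel attribute asks the compiler to check that the last
   argument of every call is a null pointer.  */
static char *concat_sketch (const char *first, ...) __attribute__ ((sentinel));

/* Hypothetical stand-in for libiberty's concat: joins every string
   argument up to the terminating NULL into a freshly malloc'd buffer.  */
static char *
concat_sketch (const char *first, ...)
{
  va_list ap;
  size_t len = 0;
  const char *s;

  /* First pass: measure.  The walk ends only when NULL is read.  */
  va_start (ap, first);
  for (s = first; s != NULL; s = va_arg (ap, const char *))
    len += strlen (s);
  va_end (ap);

  char *result = malloc (len + 1);
  if (result == NULL)
    return NULL;

  /* Second pass: copy each piece after the previous one.  */
  char *p = result;
  va_start (ap, first);
  for (s = first; s != NULL; s = va_arg (ap, const char *))
    {
      size_t n = strlen (s);
      memcpy (p, s, n);
      p += n;
    }
  va_end (ap);
  *p = '\0';
  return result;
}
```

Assuming a dump prefix of "a.", concat_sketch ("a.", "crtoffloadtable.o", (const char *) NULL) yields "a.crtoffloadtable.o"; the cast keeps the terminator a pointer on platforms where NULL is a plain 0.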


Re: [PATCH] arm: Add a couple of extra stack-protector tests

2020-10-13 Thread Christophe Lyon via Gcc-patches
On Tue, 13 Oct 2020 at 15:51, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Wed, 23 Sep 2020 at 20:33, Richard Sandiford
> >  wrote:
> >>
> >> These tests were inspired by the corresponding aarch64 ones that I just
> >> committed.  They already pass.
> >>
> >> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
> >> OK for trunk?
> >>
> >> Richard
> >>
> >>
> >> gcc/testsuite/
> >> * gcc.target/arm/stack-protector-5.c: New test.
> >> * gcc.target/arm/stack-protector-6.c: Likewise.
> >> ---
> >
> > Hi Richard,
> >
> > These new tests fail when compiling for cortex-a15 and cortex-a57...
> > There are 2 "str" instructions generated, the code is much longer than
> > for cortex-a9 for instance.
> >
> > They pass with cortex-a9, cortex-a5 and arm10tdmi.
>
> Gah, thanks for the heads-up.  I've applied the below as obvious
> after testing on arm-linux-gnueabihf and armeb-eabi.
>

Nice, I hadn't thought of that workaround.

> Richard
>


Re: [PATCH] arm: Add a couple of extra stack-protector tests

2020-10-13 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> On Wed, 23 Sep 2020 at 20:33, Richard Sandiford
>  wrote:
>>
>> These tests were inspired by the corresponding aarch64 ones that I just
>> committed.  They already pass.
>>
>> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
>> OK for trunk?
>>
>> Richard
>>
>>
>> gcc/testsuite/
>> * gcc.target/arm/stack-protector-5.c: New test.
>> * gcc.target/arm/stack-protector-6.c: Likewise.
>> ---
>
> Hi Richard,
>
> These new tests fail when compiling for cortex-a15 and cortex-a57...
> There are 2 "str" instructions generated, the code is much longer than
> for cortex-a9 for instance.
>
> They pass with cortex-a9, cortex-a5 and arm10tdmi.

Gah, thanks for the heads-up.  I've applied the below as obvious
after testing on arm-linux-gnueabihf and armeb-eabi.

Richard

From f694a0d2edc025cb54657cb804960f97a31fbda2 Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Tue, 13 Oct 2020 14:50:24 +0100
Subject: [PATCH] [arm] Use -Os for stack-protector-[56].c tests

Using -O2 made the tests subject to LDRD vs. LDM tuning.
The simplest fix seems to be to use -Os, so that LDM is
unequivocally a win.

gcc/testsuite/
* gcc.target/arm/stack-protector-5.c: Use -Os rather than -O2.
* gcc.target/arm/stack-protector-6.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/stack-protector-5.c | 2 +-
 gcc/testsuite/gcc.target/arm/stack-protector-6.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-5.c b/gcc/testsuite/gcc.target/arm/stack-protector-5.c
index b808b11aa3d..ae70b99efc4 100644
--- a/gcc/testsuite/gcc.target/arm/stack-protector-5.c
+++ b/gcc/testsuite/gcc.target/arm/stack-protector-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fstack-protector-all -O2" } */
+/* { dg-options "-fstack-protector-all -Os" } */
 
 void __attribute__ ((noipa))
 f (void)
diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-6.c b/gcc/testsuite/gcc.target/arm/stack-protector-6.c
index f8eec878bd6..2b7e6f72ea0 100644
--- a/gcc/testsuite/gcc.target/arm/stack-protector-6.c
+++ b/gcc/testsuite/gcc.target/arm/stack-protector-6.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target fpic } */
-/* { dg-options "-fstack-protector-all -O2 -fpic" } */
+/* { dg-options "-fstack-protector-all -Os -fpic" } */
 
 #include "stack-protector-5.c"
 
-- 
2.17.1



[PATCH] AArch64: Add Neoverse N2 tuning model

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch adds a tuning structure for Neoverse N2 to allow for further tuning.
For now it's just a deduplication of the Neoverse N1 struct that it was reusing 
but with the SVE width set to 128.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and will push similar patches to the branches where 
-mcpu=neoverse-n2 is supported.

Thanks,
Kyrill

gcc/
* config/aarch64/aarch64.c (neoversen2_tunings): Define.
* config/aarch64/aarch64-cores.def (neoverse-n2): Use it.


n2-tune.patch
Description: n2-tune.patch


Re: [patch] Rework CPP_BUILTINS_SPEC for powerpc-vxworks

2020-10-13 Thread Olivier Hainque
Hi Segher,

> On 5 Oct 2020, at 19:27, Olivier Hainque  wrote:
> 
> I'll post an updated version, thanks for the comments.

Here's an updated version which passed in-house build & tests
with gcc-10 based toolchains for VxWorks 6.9 & 7.2, and with which
I could build mainline for VxWorks 6.9 including libstdc++ (combined
with a few other patches, orthogonal to what is proposed here and
which I'll post shortly).

Same ChangeLog. Patch hopefully quotable if needed now.

Olivier

2020-10-13  Olivier Hainque  

   * config/rs6000/vxworks.h (TARGET_OS_CPP_BUILTINS): Accommodate
   expectations from different versions of VxWorks, for 32 or 64bit
   configurations.


diff --git a/gcc/config/rs6000/vxworks.h b/gcc/config/rs6000/vxworks.h
index 771dddf68bba..60e1ef42390f 100644
--- a/gcc/config/rs6000/vxworks.h
+++ b/gcc/config/rs6000/vxworks.h
@@ -29,17 +29,60 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_OS_CPP_BUILTINS()   \
   do   \
 {  \
-  builtin_define ("__ppc");\
-  builtin_define ("__PPC__");  \
-  builtin_define ("__EABI__"); \
   builtin_define ("__ELF__");  \
+  if (!TARGET_VXWORKS7)\
+   builtin_define ("__EABI__");\
+   \
+  /* CPU macros, based on what the system compilers do.  */\
+  if (!TARGET_VXWORKS7)\
+   {   \
+ builtin_define ("__ppc"); \
+ /* Namespace violation below, but the system headers \
+really depend heavily on this.  */ \
+ builtin_define ("CPU_FAMILY=PPC");\
+   \
+ /* __PPC__ isn't actually emitted by the system compiler \
+prior to vx7 but has been advertised by us for ages.  */   \
+ builtin_define ("__PPC__");   \
+   }   \
+  else \
+   {   \
+ builtin_define ("__PPC__");   \
+ builtin_define ("__powerpc__");   \
+ if (TARGET_64BIT) \
+   {   \
+ builtin_define ("__PPC64__"); \
+ builtin_define ("__powerpc64__"); \
+   }   \
+ else  \
+   {   \
+ builtin_define ("__PPC"); \
+ builtin_define ("__powerpc"); \
+   }   \
+   }   \
+   \
+  /* Asserts for #cpu and #machine.  */\
+  if (TARGET_64BIT)\
+   {   \
+ builtin_assert ("cpu=powerpc64"); \
+ builtin_assert ("machine=powerpc64"); \
+   }   \
+  else \
+   {   \
+ builtin_assert ("cpu=powerpc");   \
+ builtin_assert ("machine=powerpc");   \
+   }   \
+   \
+  /* PowerPC VxWorks specificities.  */\
   if (!TARGET_SOFT_FLOAT)  \
-   builtin_define ("__hardfp");\
+   {   \
+ builtin_define ("__hardfp");  \
+ builtin_define ("_WRS_HARDWARE_FP");  \
+   }   \
\
-  /* C89 namespace violation! */   \
-  builtin_define ("CPU_FAMILY=PPC");   \
-   \
+  /* Common VxWorks and port items.  */\
   VXWORKS_OS_CPP_BUILTINS ();  \
+  TARGET_OS_SYSV_CPP_BUILTINS ();  \
 }  \
   while (0)
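The decision table encoded by the new TARGET_OS_CPP_BUILTINS can be modeled in plain C to see at a glance which macros each configuration gets. This is a hypothetical model, not GCC code: the builtin_assert calls and the VXWORKS_OS_CPP_BUILTINS / TARGET_OS_SYSV_CPP_BUILTINS expansions are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the rs6000 VxWorks macro logic in the patch.
   Writes up to MAX macro names into OUT and returns how many the
   configuration defines in total.  */
static size_t
vxworks_ppc_macros (bool vxworks7, bool is64bit, bool soft_float,
                    const char *out[], size_t max)
{
  size_t n = 0;
#define DEFINE(name) do { if (n < max) out[n] = (name); n++; } while (0)
  DEFINE ("__ELF__");
  if (!vxworks7)
    {
      /* Pre-vx7: mimic the system compiler, plus the long-advertised
         __PPC__.  */
      DEFINE ("__EABI__");
      DEFINE ("__ppc");
      DEFINE ("CPU_FAMILY=PPC");
      DEFINE ("__PPC__");
    }
  else
    {
      DEFINE ("__PPC__");
      DEFINE ("__powerpc__");
      if (is64bit)
        {
          DEFINE ("__PPC64__");
          DEFINE ("__powerpc64__");
        }
      else
        {
          DEFINE ("__PPC");
          DEFINE ("__powerpc");
        }
    }
  if (!soft_float)
    {
      DEFINE ("__hardfp");
      DEFINE ("_WRS_HARDWARE_FP");
    }
#undef DEFINE
  return n;
}
```

A 32-bit hard-float VxWorks 6.9 configuration gets seven names, while a 64-bit soft-float VxWorks 7 configuration gets five.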
 


Re: [arm-perf-staging branch] Add support for -fno-alias

2020-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 13, 2020 at 10:36 AM Tamar Christina via Gcc-patches
 wrote:
>
> Hi,
>
> I am sending some old patches that we have internally since GCC 10 to the Arm 
> Branch but feel free to comment as we will be looking to submit them for GCC 
> 12 to mainline.
>
> This patch adds the option '-fno-alias'. The option makes the compiler treat 
> any pointer being passed as a parameter as if it had the keyword restrict.
> This option makes it easier to check whether using restrict gives a 
> performance boost, without having to change the sources.
> Of course this option can only be used if you know for a fact that all 
> pointers do not alias, or just like with the restrict keyword, things can go 
> very wrong.
>
> The way this patch implements this option is when the option is passed, 
> create a qualified pointer type for any pointer type encountered as a 
> parameter.
> This qualified pointer type will have the TYPE_QUAL_RESTRICT bit set, just as 
> if we had parsed a 'restrict' keyword.
>
> Bootstraped on aarch64-none-linux-gnu, to make sure I didn't break anything 
> obvious in the build.
>
> Is this OK for trunk?

-fno-alias is a very unspecific name.  We've formerly had

fargument-noalias
Common Ignore
Does nothing. Preserved for backward compatibility.

fargument-noalias-global
Common Ignore
Does nothing. Preserved for backward compatibility.

fargument-noalias-anything
Common Ignore
Does nothing. Preserved for backward compatibility.

and you might want to dive into history as to why we removed those.

So - no, please not.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2020-xx-xx  Andre Vieira  
>
> * common.opt (fno-alias): New option.
> * c/cdecl.c (grokdeclarator): When flag_no_alias make all
> pointers passed as parameters restrict.
> * cp/decl.c (grokdeclarator): Likewise.
> * doc/invoke.texi (fno-alias): Document new option.
>
> gcc/testsuite/ChangeLog:
>
> 2020-xx-xx  Andre Vieira  
>
> * gcc.dg/vect/vect-no-alias.c: New test.
> * gcc.dg/vect/noalias.h: New include file used in test.
> * g++.dg/vect/simd-no-alias.cc: New test.
>
> --
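To make concrete what the proposed option would change: it would treat every pointer parameter as if it had been declared restrict, as in the hypothetical function below. The restrict promise lets the compiler assume dst and src never overlap (for example, to vectorize without a runtime overlap check); calling it with overlapping arguments is undefined behavior, which is the "things can go very wrong" case in the patch description.

```c
#include <stddef.h>

/* Hypothetical example: roughly what -fno-alias would implicitly make of
   a declaration like  void add_arrays (int *dst, const int *src, size_t n).
   With restrict, the compiler may assume dst[] and src[] do not overlap.  */
static void
add_arrays (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += src[i];
}
```

A call such as add_arrays (buf + 1, buf, n - 1) would break that promise, and the option would apply the same promise to every such call without any change in the source.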


Re: [PATCH, 2/3, OpenMP] Target mapping changes for OpenMP 5.0, middle-end parts and compiler testcases

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 01, 2020 at 09:16:48PM +0800, Chung-Lin Tang wrote:
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -8350,14 +8350,126 @@ extract_base_bit_offset (tree base, tree *base_ref, 
> poly_int64 *bitposp,
>/* Set *BASE_REF if BASE was a dereferenced reference variable.  */
>if (base_ref && orig_base != base)
>  *base_ref = orig_base;
>  
>return base;
>  }
>  
> +/* Returns true if EXPR is or contains (as a sub-component) BASE_PTR.  */
> +
> +static bool
> +is_or_contains_p (tree expr, tree base_ptr)
> +{
> +  while (expr != base_ptr)
> +if (TREE_CODE (base_ptr) == COMPONENT_REF)
> +  base_ptr = TREE_OPERAND (base_ptr, 0);
> +else
> +  break;
> +  return expr == base_ptr;
> +}
> +
> +/* Implement OpenMP 5.x map ordering rules for target directives. There are
> +   several rules, and with some level of ambiguity, hopefully we can at least
> +   collect the complexity here in one place.  */
> +
> +static void
> +omp_target_reorder_clauses (tree *list_p)
> +{

So, first of all, are you convinced we can sort just the explicit clauses
and leave out the (later on) implicitly added ones?
If it is possible, sure, it will be easier (because we later on need to deal
with the GOMP_MAP_STRUCT sorting too).

> +  vec clauses = vNULL;

Isn't this a memory leak?  Nothing frees the vector.  Perhaps better
  auto_vec clauses;

> +  for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp))
> +clauses.safe_push (*cp);

The rest of the function deals only with OMP_CLAUSE_MAP clauses, wouldn't it
be better to just push to the vec those clauses and keep other clauses just
in *list_p chain?

> +  /* Collect refs to alloc/release/delete maps.  */
> +  vec ard = vNULL;

Again, auto_vec ard;

> +  tree *cp = list_p;
> +  for (unsigned int i = 0; i < clauses.length (); i++)
> +if (clauses[i])
> +  {
> + *cp = clauses[i];
> + cp = &OMP_CLAUSE_CHAIN (clauses[i]);
> +  }
> +  for (unsigned int i = 0; i < ard.length (); i++)
> +{
> +  *cp = ard[i];
> +  cp = &OMP_CLAUSE_CHAIN (ard[i]);
> +}
> +  *cp = NULL_TREE;
> +
> +  /* OpenMP 5.0 requires that pointer variables are mapped before
> + its use as a base-pointer.  */
> +  for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp))
> +if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP)
> +  {
> + tree decl = OMP_CLAUSE_DECL (*cp);
> + gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp);
> + if ((k == GOMP_MAP_ALLOC
> +  || k == GOMP_MAP_TO
> +  || k == GOMP_MAP_FROM
> +  || k == GOMP_MAP_TOFROM)

What about the *ALWAYS* kinds?

> + && (TREE_CODE (decl) == INDIRECT_REF
> + || TREE_CODE (decl) == MEM_REF))
> +   {
> + tree base_ptr = TREE_OPERAND (decl, 0);
> + STRIP_TYPE_NOPS (base_ptr);
> + for (tree *cp2 = &OMP_CLAUSE_CHAIN (*cp); *cp2;
> +  cp2 = &OMP_CLAUSE_CHAIN (*cp2))
> +   if (OMP_CLAUSE_CODE (*cp2) == OMP_CLAUSE_MAP)
> + {
> +   tree decl2 = OMP_CLAUSE_DECL (*cp2);
> +   gomp_map_kind k2 = OMP_CLAUSE_MAP_KIND (*cp2);
> +   if ((k2 == GOMP_MAP_ALLOC
> +|| k2 == GOMP_MAP_TO
> +|| k2 == GOMP_MAP_FROM
> +|| k2 == GOMP_MAP_TOFROM)

Again.

This is O(n^2) too, but due to the is_or_contains_p I'm not sure
if we can avoid it.  Perhaps sort the clauses by uid of the base expressions
and deal with those separately.  Maybe let's ignore it for now.
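The cost under discussion comes from calling is_or_contains_p for every pair of map clauses, and each call walks the COMPONENT_REF chain of BASE_PTR. A simplified model of that walk, using a hypothetical two-field node instead of GCC's tree and pointer identity as the quoted code does:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for GCC's tree: only the node kind
   and the first operand matter for this walk.  */
enum node_code { NODE_DECL, NODE_COMPONENT_REF, NODE_INDIRECT_REF };

struct node
{
  enum node_code code;
  struct node *op0;   /* Containing object for a COMPONENT_REF.  */
};

/* Mirrors the quoted loop: strip COMPONENT_REFs from BASE_PTR until it
   equals EXPR or a non-COMPONENT_REF node is reached.  Each call costs
   O(depth), so a pairwise scan over n clauses is O(n^2 * depth).  */
static bool
is_or_contains_model (const struct node *expr, const struct node *base_ptr)
{
  while (expr != base_ptr)
    if (base_ptr->code == NODE_COMPONENT_REF)
      base_ptr = base_ptr->op0;
    else
      break;
  return expr == base_ptr;
}
```

So for s.a.b the function reports that s and s.a contain it, but an INDIRECT_REF in the chain stops the walk.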

> @@ -8958,25 +9083,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
> /* An "attach/detach" operation on an update directive should
>behave as a GOMP_MAP_ALWAYS_POINTER.  Beware that
>unlike attach or detach map kinds, GOMP_MAP_ALWAYS_POINTER
>depends on the previous mapping.  */
> if (code == OACC_UPDATE
> && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH)
>   OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER);
> -   if (gimplify_expr (pd, pre_p, NULL, is_gimple_lvalue, fb_lvalue)
> -   == GS_ERROR)
> - {
> -   remove = true;
> -   break;
> - }

So what gimplifies those now?

> + if (! (code == OMP_TARGET
> +|| code == OMP_TARGET_DATA
> +|| code == OMP_TARGET_ENTER_DATA
> +|| code == OMP_TARGET_EXIT_DATA))
> +   {

Isn't this just if ((region_type & ORT_ACC) == 0) ?  Or do we want
it for target update too?  Though, we wouldn't talk about more than once in
map clauses then because target update doesn't have those.


Jakub



Re: Fix handling of access ranges in ipa-modref

2020-10-13 Thread Martin Jambor
Hi,

On Sun, Oct 11 2020, Jan Hubicka wrote:
> Hi,
> this patch fixes the range tracking in argument and re-enables it for clones
> (the bug that broke dealII and x264 benchmarks)
>
> It turned out that there was three problems
>  1) for SRA/ipa-cp clones we did not update summarries to represent new
> signature.  This is now done in modref_transform.
> I tested it in ipa-sra testcases and it seems to work fine, but Martin,
> can you please take a look?
>
> Param adjustment interface provides original indexes for new indexes of
> parameters.  I need reverse information: for original index I need new
> that I compute via array map and then do the rewrite.

Looks OK, I only wonder whether the functionality cannot be more
generally useful and so whether computation of the reverse map should
not be made part of ipa_param_adjustments interface.
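The reverse map Jan describes can be built in one pass over the new parameters. In this hypothetical sketch, orig_index[j] is the original index of new parameter j, with -1 meaning "no single original parameter"; original parameters that no new parameter refers to end up as -1. The names are illustrative and not part of ipa_param_adjustments.

```c
/* Hypothetical helper: inverts ORIG_INDEX (new index -> original index,
   -1 for none) into NEW_INDEX (original index -> new index, -1 if the
   original parameter no longer appears).  If several new parameters
   named the same original, the last one would win.  */
static void
build_reverse_map (const int *orig_index, int new_count,
                   int *new_index, int orig_count)
{
  for (int i = 0; i < orig_count; i++)
    new_index[i] = -1;
  for (int j = 0; j < new_count; j++)
    if (orig_index[j] >= 0 && orig_index[j] < orig_count)
      new_index[orig_index[j]] = j;
}
```

For a clone whose new parameters come from original parameters 2 and 0 plus one split-off parameter (orig_index = {2, 0, -1}), original parameters 1 and 3 map to -1.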

>
> Martin, if things passed by references are translated to stuff
> passed by value, we may eliminate the corrsponding acess records,
> because reading function parameters is not considered a side-effect
> by mod-ref.  Is there easy way to detect such change?

Such new parameter will have IPA_PARAM_OP_SPLIT as the operation in its
ipa_adjusted_param and so ipa_param_adjustments::get_original_index will
already return -1 for it, if that is enough.  The way to detect that is
currently only the fact that a pointer argument is split (new IPA SRA no
longer splits a reference into sub-references).  If the old parameter is
no longer available we need to add a flag, I'm afraid.

>
>  2) The propagation via jump functions (in applying inline decision)
> mixed bits and bytes in parameter offsets.
> Martin, it is not clear to me why the offset is in bits - there is no
> way to pass a pointer to non-byte aligned address and it seems that this
> only adds a risk of overflows.

Yeah, when I started writing the first versions of the code I simply
used the same types that get_ref_base_and_extent uses.  I hope it's
all in HOST_WIDE_INTs and so should not really overflow but it wastes
memory.  Converting all of this to bytes is on my TODO list (and new
IPA-SRA already does that in the IPA phase).
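A hypothetical sketch of keeping offsets in a single unit: convert bit offsets to bytes at the boundary and reject anything not byte-aligned, so a byte offset is never added to a bit offset (an error of a factor of 8). BITS_PER_BYTE here is an assumed local constant; GCC's own equivalent is BITS_PER_UNIT.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE 8   /* Assumption; GCC spells this BITS_PER_UNIT.  */

/* Hypothetical helper: converts BIT_OFFSET to bytes in *BYTE_OFFSET.
   Returns false for offsets that are not byte-aligned, which cannot
   come from a pointer argument anyway.  */
static bool
bit_offset_to_bytes (int64_t bit_offset, int64_t *byte_offset)
{
  if (bit_offset % BITS_PER_BYTE != 0)
    return false;
  *byte_offset = bit_offset / BITS_PER_BYTE;
  return true;
}
```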

> There seems to be overflow check missing in
> update_jump_functions_after_inlining, but it seems to me that these days
> this should be poly_int64.

Not sure about this.  When the conversion to poly_ints happened,
get_ref_base_and_extent_hwi was introduced for the purposes of IPA
stuff, so I think even the poly_int authors did think it was the right
thing to use here.  In any case, I'm still having a hard time wrapping my
head around poly_ints, so I'd like to see a testcase before going down
this route.
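Whatever type is chosen, the missing overflow check could look like the hypothetical helpers below, built on GCC's __builtin_add_overflow and __builtin_mul_overflow (also available in Clang). They are not the ipa-prop code. The second helper also shows why bit offsets overflow sooner: a byte offset above INT64_MAX / 8 has no int64_t bit representation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: combine two bit offsets, e.g. an ancestor offset and the
   offset already recorded in a jump function.  Returns false instead of
   silently wrapping when the exact sum does not fit in int64_t.  */
static bool
combine_bit_offsets (int64_t outer_bits, int64_t inner_bits, int64_t *sum_bits)
{
  return !__builtin_add_overflow (outer_bits, inner_bits, sum_bits);
}

/* Hypothetical: byte offset to bit offset, failing on overflow.  */
static bool
bytes_to_bits (int64_t bytes, int64_t *bits)
{
  return !__builtin_mul_overflow (bytes, (int64_t) 8, bits);
}
```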

Thanks,

Martin



Re: [PATCH, 1/3, OpenMP] Target mapping changes for OpenMP 5.0, front-end parts

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Sep 01, 2020 at 09:16:23PM +0800, Chung-Lin Tang wrote:
> this patch set implements parts of the target mapping changes introduced
> in OpenMP 5.0, mainly the attachment requirements for pointer-based
> list items, and the clause ordering.
> 
> The first patch here are the C/C++ front-end changes.
> 
> The entire set of changes has been tested for without regressions for
> the compiler and libgomp. Hope this is ready to commit to master.

Sorry for the delay in patch review and thanks for the standard citations,
that really helps.

> gcc/c-family/
> * c-common.h (c_omp_adjust_clauses): New declaration.
> * c-omp.c (c_omp_adjust_clauses): New function.

Besides the naming, I wonder why it is done in a separate function and so
early, can't what the function does be done either in
{,c_}finish_omp_clauses (provided we'd pass separate ORT_OMP vs.
ORT_OMP_TARGET to it to determine if it is target region vs. anything else),
or perhaps even better during gimplification (gimplify_scan_omp_clauses)?

> gcc/c/
> * c-parser.c (c_parser_omp_target_data): Add use of
> new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as
>   handled map clause kind.
> (c_parser_omp_target_enter_data): Likewise.
> (c_parser_omp_target_exit_data): Likewise.
> (c_parser_omp_target): Likewise.
> * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
> use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
> (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
> same struct field access to co-exist on OpenMP construct.
> 
> gcc/cp/
> * parser.c (cp_parser_omp_target_data): Add use of
> new c_omp_adjust_clauses function. Add GOMP_MAP_ATTACH_DETACH as
> handled map clause kind.
> (cp_parser_omp_target_enter_data): Likewise.
>   (cp_parser_omp_target_exit_data): Likewise.
>   (cp_parser_omp_target): Likewise.
>   * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
>   use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
>   interaction between reference case and attach/detach.
>   (finish_omp_clauses): Adjust bitmap checks to allow struct decl and
>   same struct field access to co-exist on OpenMP construct.

The changelog has some 8 space indented lines.

> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
> + && TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c))) != ARRAY_TYPE)
> +  {
> + tree ptr = OMP_CLAUSE_DECL (c);
> + bool ptr_mapped = false;
> + if (is_target)
> +   {
> + for (tree m = clauses; m; m = OMP_CLAUSE_CHAIN (m))

Isn't this O(n^2) in number of clauses?  I mean, e.g. for the equality
comparisons (but see below) it could be dealt with e.g. using some bitmap
with DECL_UIDs.
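The bitmap idea replaces the nested loop with two linear passes: first record the UIDs of decls mapped with a plain data kind, then test each pointer in O(1). In this hypothetical sketch a bool array stands in for GCC's bitmap type, the clause struct is a simplified stand-in, and only the equality case is covered (not the struct-member case raised below).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified clause: the UID of the mapped decl and whether
   its map kind is one of ALLOC/TO/FROM/TOFROM.  */
struct map_clause
{
  unsigned decl_uid;
  bool plain_data_map;
};

/* Marks every UID mapped with a plain data kind.  MAX_UID is assumed to
   bound all decl_uid values.  Returns a calloc'd array the caller frees,
   or NULL on allocation failure.  */
static bool *
mapped_uid_set (const struct map_clause *clauses, size_t n, unsigned max_uid)
{
  bool *mapped = calloc ((size_t) max_uid + 1, sizeof *mapped);
  if (mapped == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    if (clauses[i].plain_data_map && clauses[i].decl_uid <= max_uid)
      mapped[clauses[i].decl_uid] = true;
  return mapped;
}
```

The second pass then replaces the inner loop with a lookup of mapped[DECL_UID (ptr)].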

> +   if (OMP_CLAUSE_CODE (m) == OMP_CLAUSE_MAP
> +   && OMP_CLAUSE_DECL (m) == ptr

Does it really need to be equality?  I mean it will be for
map(tofrom:ptr) map(tofrom:ptr[:32])
but what about e.g.
map(tofrom:structx) map(tofrom:structx.ptr[:32])
?  It is true that likely we don't parse this yet though.

> +   && (OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_ALLOC
> +   || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TO
> +   || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_FROM
> +   || OMP_CLAUSE_MAP_KIND (m) == GOMP_MAP_TOFROM))

What about the always modified mapping kinds?

> + {
> +   ptr_mapped = true;
> +   break;
> + }
> +
> + if (!ptr_mapped
> + && DECL_P (ptr)
> + && is_global_var (ptr)
> + && lookup_attribute ("omp declare target",
> +  DECL_ATTRIBUTES (ptr)))
> +   ptr_mapped = true;
> +   }
> +
> + /* If the pointer variable was mapped, or if this is not an offloaded
> +target region, adjust the map kind to attach/detach.  */
> + if (ptr_mapped || !is_target)
> +   {
> + OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ATTACH_DETACH);
> + c_common_mark_addressable_vec (ptr);

Though perhaps this is argument why it needs to be done in the FEs and not
during gimplification, because it is hard to mark something addressable at
that point.

> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -13580,16 +13580,17 @@ handle_omp_array_sections (tree c, enum 
> c_omp_region_type ort)
>   break;
> }
>tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
>if (ort != C_ORT_OMP && ort != C_ORT_ACC)
>   OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_POINTER);
>else if (TREE_CODE (t) == COMPONENT_REF)
>   {
> -   gomp_map_kind k = (ort == C_ORT_ACC) ? GOMP_MAP_ATTACH_DETACH
> -  

Re: [PATCH] arm: Add missing vec_cmp and vcond patterns

2020-10-13 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply.

Christophe Lyon  writes:
> On Thu, 1 Oct 2020 at 16:10, Richard Sandiford via Gcc-patches
>  wrote:
>>
>> This patch does several things at once:
>>
>> (1) Add vector compare patterns (vec_cmp and vec_cmpu).
>>
>> (2) Add vector selects between floating-point modes when the
>> values being compared are integers (affects vcond and vcondu).
>>
>> (3) Add vector selects between integer modes when the values being
>> compared are floating-point (affects vcond).
>>
>> (4) Add standalone vector select patterns (vcond_mask).
>>
>> (5) Tweak the handling of compound comparisons with zeros.
>>
>> Unfortunately it proved too difficult (for me) to separate this
>> out into a series of smaller patches, since everything is so
>> inter-related.  Defining only some of the new patterns does
>> not leave things in a happy state.
>>
>> The handling of comparisons is mostly taken from the vcond patterns.
>> This means that it remains non-compliant with IEEE: “quiet” comparisons
>> use signalling instructions.  But that shouldn't matter for floats,
>> since we require -funsafe-math-optimizations to vectorize for them
>> anyway.
>>
>> It remains the case that comparisons and selects aren't implemented
>> at all for HF vectors.  Implementing those feels like separate work.
>>
>> Tested on arm-linux-gnueabihf and arm-eabi (for MVE).  OK to install?
>>
>> Richard
>>
>
> Hi Richard,
>
> This patches enables a few more tests on armeb-linux-gnueabihf
> --with-cpu cortex-a9
> --with-fpu neon-fp16, with these failures:
> gcc.dg/vect/slp-cond-2-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> gcc.dg/vect/slp-cond-2-big-array.c scan-tree-dump-times vect
> "vectorizing stmts using SLP" 3
> gcc.dg/vect/slp-cond-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorizing stmts using SLP" 3
> gcc.dg/vect/slp-cond-2.c scan-tree-dump-times vect "vectorizing
> stmts using SLP" 3
> gcc.dg/vect/vect-cond-10.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 1 loops" 8
> gcc.dg/vect/vect-cond-10.c scan-tree-dump-times vect "vectorized 1 loops" 
> 8
> gcc.dg/vect/vect-cond-8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 1 loops" 5
> gcc.dg/vect/vect-cond-8.c scan-tree-dump-times vect "vectorized 1 loops" 5
> gcc.dg/vect/vect-cond-9.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 1 loops" 10
> gcc.dg/vect/vect-cond-9.c scan-tree-dump-times vect "vectorized 1 loops" 
> 10
>
> I guess this is expected since vectorization does not work well on
> armeb in general?

Yeah, seems like it, unfortunately.  I think if we wanted to fix this,
we should look at supporting the operations disabled in r176050.  Packs
and unpacks seem to be the problem for at least some of the tests above.

Trying to make the results clean with the current (somewhat artificial)
restrictions seems like a dead end.

Thanks,
Richard
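As a reading aid for the pattern list in the quoted patch, here is a scalar model of the semantics, not the arm.md patterns: vec_cmp yields a per-lane mask of all ones (true) or all zeros (false), and vcond_mask selects lane-wise using that mask. Comparing floats and selecting integers, as below, corresponds to item (3). The function names and the 4-lane width are hypothetical.

```c
#include <stdint.h>

#define LANES 4

/* Model of a floating-point vec_cmp (greater-than): each lane of MASK
   becomes -1 (all bits set) when true, 0 when false.  */
static void
model_vec_cmp_gt (const float a[LANES], const float b[LANES],
                  int32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = a[i] > b[i] ? -1 : 0;
}

/* Model of vcond_mask on integer lanes: take X where the mask is set,
   Y elsewhere, as a bitwise select.  */
static void
model_vcond_mask (const int32_t x[LANES], const int32_t y[LANES],
                  const int32_t mask[LANES], int32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (x[i] & mask[i]) | (y[i] & ~mask[i]);
}
```

A vcond is then the composition of the two: compare, then select with the resulting mask.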



Re: [Patch] lto-wrapper: Use nontemp filename with -save-temps

2020-10-13 Thread Richard Biener
On Tue, 13 Oct 2020, Tobias Burnus wrote:

> There are still some @cc... files under /tmp,
> but at lease another file is now at the proper place.
> 
> This patch generates (for a.out and -save-temps)
> the file a.crtoffloadtable.o.
> 
> (I have not fully digested the LTO calls, but I think
> this file is only created once and not by concurrent
> lto-wrapper runs.)
> 
> OK?

OK.

Richard.

> Tobias
> 
> PS: After this patch, there are still some @... files,
> e.g. ccKqLnBY.ofldlist (generated by lto-plugin/lto-plugin.c)
> and those generated by collect-utils.c's fork_execute,
> invoked by config/nvptx/mkoffload.c, config/gcn/mkoffload.c,
> and lto-wrapper.c (with use_atfile = true).
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: Support ofsetted parameters in local modref

2020-10-13 Thread Jan Hubicka
> On Tue, 13 Oct 2020, Jan Hubicka wrote:
> 
> > > > So I implemented my own pattern matching and I think we should switch 
> > > > ipa-prop
> > > > on it incrementally. It is based on similar logic in
> > > > ao_ref_init_from_ptr_and_size.
> > > 
> > > So instead of re-inventing the wheel what about splitting out a
> > > common helper instead?
> > 
> > It was my original idea. However, it is not completely trivial:
> > ao_ref_init_from_ptr_and_size does something slightly different, since it tries
> > to keep the original ref inside the ADDR_EXPR (if one is found) to feed it
> > into the oracle rather than stripping them all.  If refs are present, it does
> > not need to build a MEM_REF.
> 
> You've also coded other subtle differences - is it really better
> to have p_1 with unknown offset for p_3 = p_1 + i_2 instead of
> having p_3?  I suppose the idea is to look for an offsetted
> default def (aka parameter)?

Yes, this was intentional (for modref).
Modref primarily cares about the base pointer (for PTA); if it also knows
the offset, that is better.
> 
> Frankly, I don't like the loop - an unbounded walk over an unoptimized ptr += 4;
> ptr += 4; ... chain, possibly quadratic if it's like
> 
> void foo (void *p, int i)
> {
>   void *p1 = p + i;
>   void *p2 = p1 + i;
>   void *p3 = p2 + i;
>   bar (p1, p2, p3);
> }
> 
> so I wonder if the propagation engine should instead cache
> the discovered base + offset for a (pointer?) SSA name?  I'm not
> sure if continuing this recursion once the offset becomes unknown is
> of any help to ipa-prop; IPA prop is also interested in
> non-pointer parameters, I guess.

I would like to extend ipa-prop to also record ancestor jump functions
without a known offset (to help modref propagate), so in both cases this
should work.

ipa-prop currently handles arith/simple and unary pass-through
separately and only then does the ancestor case.

Concerning the loop, maybe stopping after a few iterations is a better
solution, since early passes should eliminate the chains, but I can add
the cache.
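The alternative of stopping after a few iterations can be pictured with a small sketch. It is a hypothetical toy model, not GCC code: `PtrDef` and `walk_bounded` are invented names, pointers are plain integer indices, and overflow checking is omitted. Each query costs at most `max_steps` steps, so a long unoptimized chain can no longer make the walk quadratic; the price is that a parameter hidden behind a longer chain is not found, which is acceptable if early passes shorten such chains as expected.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not GCC's representation: each pointer SSA name
// is either a parameter or "operand + constant bytes".  Overflow checking
// is omitted for brevity.
struct PtrDef {
  bool is_param;
  int operand;         // adjusted pointer, used only when !is_param
  std::int64_t bytes;  // constant adjustment, used only when !is_param
};

// Follow at most MAX_STEPS adjustments back from NAME, accumulating the
// byte offset in *OFFSET, and return the name reached.  If the limit is
// hit first, the result is not a parameter and the caller can treat the
// argument as unknown.
int walk_bounded(const std::vector<PtrDef>& defs, int name, int max_steps,
                 std::int64_t* offset) {
  *offset = 0;
  for (int step = 0; step < max_steps && !defs[name].is_param; ++step) {
    *offset += defs[name].bytes;
    name = defs[name].operand;
  }
  return name;
}
```

For a parameter adjusted by 4 bytes three times in a row, a limit of 8 reaches the parameter with offset 12, while a limit of 2 stops one step short of it.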

Honza
> 
> > static void
> > ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
> >                                 bool range_known,
> >                                 poly_int64 offset,
> >                                 poly_int64 size,
> >                                 poly_int64 max_size)
> > {
> >   poly_int64 t, extra_offset = 0;
> > 
> >   ref->ref = NULL_TREE;
> >   if (TREE_CODE (ptr) == SSA_NAME)
> >     {
> >       gimple *stmt = SSA_NAME_DEF_STMT (ptr);
> >       if (gimple_assign_single_p (stmt)
> >           && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
> >         ptr = gimple_assign_rhs1 (stmt);
> >       else if (is_gimple_assign (stmt)
> >                && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
> >                && ptrdiff_tree_p (gimple_assign_rhs2 (stmt), &extra_offset))
> >         {
> >           ptr = gimple_assign_rhs1 (stmt);
> >           extra_offset *= BITS_PER_UNIT;
> >         }
> >     }
> > 
> >   if (TREE_CODE (ptr) == ADDR_EXPR)
> >     {
> >       ref->base = get_addr_base_and_unit_offset (TREE_OPERAND (ptr, 0), &t);
> >       if (ref->base)
> >         ref->offset = BITS_PER_UNIT * t;
> >       else
> >         {
> >           range_known = false;
> >           ref->offset = 0;
> >           ref->base = get_base_address (TREE_OPERAND (ptr, 0));
> >         }
> >     }
> >   else
> >     {
> >       gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
> >       ref->base = build2 (MEM_REF, char_type_node,
> >                           ptr, null_pointer_node);
> >       ref->offset = 0;
> >     }
> > 
> > I could add a flag selecting whether I am looking for the ADDR_EXPR or just the base
> > pointer, but that seemed better suited to an incremental change.  The new helper
> > should be immediately useful for ipa-modref, ipa-prop and
> > ipa-polymorphic-call, though.
> > 
> > Honza
> > 
> 


Re: [PATCH] Remove STMT_VINFO_SAME_ALIGN_REFS

2020-10-13 Thread Richard Biener
On Tue, 13 Oct 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > This makes the only consumer of STMT_VINFO_SAME_ALIGN_REFS, the
> > loop peeling for alignment code, use locally computed data and
> > then removes STMT_VINFO_SAME_ALIGN_REFS and its computation.
> > The main benefit is reducing the dependence computation load
> > by not asking for read-read dependences and getting rid of
> > DR related info from stmt_vec_info.
> 
> Nice.  Some very minor comments below…
> 
> > @@ -1130,6 +1129,45 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info)
> >return;
> >  }
> >  
> > +/* Return whether DR_INFO, which is related to DR_PEEL_INFO in
> > +   that it only differs in DR_INIT, is aligned if DR_PEEL_INFO
> > +   is made aligned via peeling.  */
> > +
> > +static bool
> > +vect_dr_aligned_if_related_peeled_dr_is (dr_vec_info *dr_info,
> > +dr_vec_info *dr_peel_info)
> > +{
> > +  if (known_ge (DR_TARGET_ALIGNMENT (dr_peel_info),
> > +   DR_TARGET_ALIGNMENT (dr_info)))
> 
> Wondered here if multiple_p might be clearer/more precise than known_ge,
> but of course non-power-of-2 alignments aren't much use…

Works for me.

> > +{
> > +  poly_offset_int diff
> > +   = (wi::to_poly_offset (DR_INIT (dr_peel_info->dr))
> > +  - wi::to_poly_offset (DR_INIT (dr_info->dr)));
> > +  if (known_eq (diff, 0)
> > + || multiple_p (diff, DR_TARGET_ALIGNMENT (dr_info)))
> > +   return true;
> > +}
> > +  return false;
> > +}
> > +
> > +/* Return whether DR_INFO is aligned if DR_PEEL_INFO is made
> > +   aligned via peeling.  */
> > +
> > +static bool
> > +vect_dr_aligned_if_peeled_dr_is (dr_vec_info *dr_info,
> > +dr_vec_info *dr_peel_info)
> > +{
> > +  if (!operand_equal_p (DR_BASE_ADDRESS (dr_info->dr),
> > +   DR_BASE_ADDRESS (dr_peel_info->dr), 0)
> > +  || !operand_equal_p (DR_OFFSET (dr_info->dr),
> > +  DR_OFFSET (dr_peel_info->dr), 0)
> > +  || !operand_equal_p (DR_STEP (dr_info->dr),
> > +  DR_STEP (dr_peel_info->dr), 0))
> > +return false;
> > +
> > +  return vect_dr_aligned_if_related_peeled_dr_is (dr_info, dr_peel_info);
> > +}
> > +
> >  /* Function vect_update_misalignment_for_peel.
> > Sets DR_INFO's misalignment
> > - to 0 if it has the same alignment as DR_PEEL_INFO,
> > @@ -1146,18 +1184,10 @@ static void
> >  vect_update_misalignment_for_peel (dr_vec_info *dr_info,
> >dr_vec_info *dr_peel_info, int npeel)
> >  {
> > -  unsigned int i;
> > -  vec same_aligned_drs;
> > -  struct data_reference *current_dr;
> > -  stmt_vec_info peel_stmt_info = dr_peel_info->stmt;
> > -
> >/* It can be assumed that if dr_info has the same alignment as dr_peel,
> >   it is aligned in the vector loop.  */
> > -  same_aligned_drs = STMT_VINFO_SAME_ALIGN_REFS (peel_stmt_info);
> > -  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
> > +  if (vect_dr_aligned_if_peeled_dr_is (dr_info, dr_peel_info))
> >  {
> > -  if (current_dr != dr_info->dr)
> > -continue;
> >gcc_assert (!known_alignment_for_access_p (dr_info)
> >   || !known_alignment_for_access_p (dr_peel_info)
> >   || (DR_MISALIGNMENT (dr_info)
> > @@ -1572,6 +1602,43 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
> >return true;
> >  }
> >  
> > +/* Compare two data-references DRA and DRB to group them into chunks
> > +   with related alignment.  */
> > +
> > +static int
> > +dr_align_group_sort_cmp (const void *dra_, const void *drb_)
> > +{
> > +  data_reference_p dra = *(data_reference_p *)const_cast(dra_);
> > +  data_reference_p drb = *(data_reference_p *)const_cast(drb_);
> > +  int cmp;
> > +
> > +  /* Stabilize sort.  */
> > +  if (dra == drb)
> > +return 0;
> > +
> > +  /* Ordering of DRs according to base.  */
> > +  cmp = data_ref_compare_tree (DR_BASE_ADDRESS (dra),
> > +  DR_BASE_ADDRESS (drb));
> > +  if (cmp != 0)
> > +return cmp;
> > +
> > +  /* And according to DR_OFFSET.  */
> > +  cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
> > +  if (cmp != 0)
> > +return cmp;
> > +
> > +  /* And after step.  */
> > +  cmp = data_ref_compare_tree (DR_STEP (dra), DR_STEP (drb));
> > +  if (cmp != 0)
> > +return cmp;
> > +
> > +  /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
> > +  cmp = data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb));
> > +  if (cmp == 0)
> > +return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
> > +  return cmp;
> > +}
> > +
> >  /* Function vect_enhance_data_refs_alignment
> >  
> > This pass will use loop versioning and loop peeling in order to enhance
> > @@ -1666,7 +1733,6 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
> >  opt_result
> >  vect

[Patch] lto-wrapper: Use nontemp filename with -save-temps

2020-10-13 Thread Tobias Burnus

There are still some @cc... files under /tmp,
but at least another file is now in the proper place.

This patch generates (for a.out and -save-temps)
the file a.crtoffloadtable.o.

(I have not fully digested the LTO calls, but I think
this file is only created once and not by concurrent
lto-wrapper runs.)

OK?

Tobias

PS: After this patch, there are still some @... files,
e.g. ccKqLnBY.ofldlist (generated by lto-plugin/lto-plugin.c)
and those generated by collect-utils.c's fork_execute,
invoked by config/nvptx/mkoffload.c, config/gcn/mkoffload.c,
and lto-wrapper.c (with use_atfile = true).

lto-wrapper: Use nontemp filename with -save-temps

gcc/ChangeLog:

	* lto-wrapper.c (find_crtoffloadtable): With -save-temps,
	use non-temp file name utilizing the dump prefix.
	(run_gcc): Update call.

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 82cfa6bd67e..4d3cd7a56f2 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1026,7 +1026,7 @@ copy_file (const char *dest, const char *src)
the copy to the linker.  */
 
 static void
-find_crtoffloadtable (void)
+find_crtoffloadtable (int save_temps, const char *dumppfx)
 {
   char **paths = NULL;
   const char *library_path = getenv ("LIBRARY_PATH");
@@ -1039,7 +1039,11 @@ find_crtoffloadtable (void)
 if (access_check (paths[i], R_OK) == 0)
   {
 	/* The linker will delete the filename we give it, so make a copy.  */
-	char *crtoffloadtable = make_temp_file (".crtoffloadtable.o");
+	char *crtoffloadtable;
+	if (!save_temps)
+	  crtoffloadtable = make_temp_file (".crtoffloadtable.o");
+	else
+	  crtoffloadtable = concat (dumppfx, "crtoffloadtable.o", NULL);
 	copy_file (crtoffloadtable, paths[i]);
 	printf ("%s\n", crtoffloadtable);
 	XDELETEVEC (crtoffloadtable);
@@ -1698,7 +1702,7 @@ cont1:
 
   if (offload_names)
 	{
-	  find_crtoffloadtable ();
+	  find_crtoffloadtable (save_temps, dumppfx);
 	  for (i = 0; offload_names[i]; i++)
 	printf ("%s\n", offload_names[i]);
 	  free_array_of_ptrs ((void **) offload_names, i);


Re: Support offsetted parameters in local modref

2020-10-13 Thread Richard Biener
On Tue, 13 Oct 2020, Jan Hubicka wrote:

> > > So I implemented my own pattern matching and I think we should switch ipa-prop
> > > over to it incrementally. It is based on similar logic in
> > > ao_ref_init_from_ptr_and_size.
> > 
> > So instead of re-inventing the wheel what about splitting out a
> > common helper instead?
> 
> It was my original idea. However, it is not completely trivial:
> ao_ref_init_from_ptr_and_size does something slightly different, since it tries
> to keep the original ref inside the ADDR_EXPR (if one is found) to feed it
> into the oracle rather than stripping them all.  If refs are present, it does
> not need to build a MEM_REF.

You've also coded other subtle differences - is it really better
to have p_1 with unknown offset for p_3 = p_1 + i_2 instead of
having p_3?  I suppose the idea is to look for an offsetted
default def (aka parameter)?

Frankly, I don't like the loop - an unbounded walk over an unoptimized ptr += 4;
ptr += 4; ... chain, possibly quadratic if it's like

void foo (void *p, int i)
{
  void *p1 = p + i;
  void *p2 = p1 + i;
  void *p3 = p2 + i;
  bar (p1, p2, p3);
}

so I wonder if the propagation engine should instead cache
the discovered base + offset for a (pointer?) SSA name?  I'm not
sure if continuing this recursion once the offset becomes unknown is
of any help to ipa-prop; IPA prop is also interested in
non-pointer parameters, I guess.
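A minimal sketch of the caching idea, assuming a toy representation in which each pointer SSA name is a parameter, a constant adjustment of another name, or an adjustment by an unknown amount. `BaseOffsetCache`, `SsaDef` and `DefKind` are hypothetical names, not GCC APIs, and offsets are plain 64-bit byte counts rather than `poly_int64`. Each name is resolved once and memoized, so the `p1`/`p2`/`p3` chain in `foo` is walked once in total instead of once per argument, and the resolution is iterative, so a long chain does not recurse deeply. Like the proposed patch, it continues past an unknown adjustment and reports the parameter with an unknown offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of pointer SSA definitions; GCC's real data
// structures (SSA_NAME_DEF_STMT, POINTER_PLUS_EXPR, ...) are not used here.
enum class DefKind { Param, PlusConst, PlusUnknown };

struct SsaDef {
  DefKind kind;
  int operand;          // SSA index of the adjusted pointer (unused for Param)
  std::int64_t offset;  // byte adjustment, meaningful only for PlusConst
};

struct BaseAndOffset {
  int base;             // SSA index of the parameter reached by the walk
  bool offset_known;
  std::int64_t offset;  // accumulated bytes, valid only if offset_known
};

class BaseOffsetCache {
 public:
  explicit BaseOffsetCache(const std::vector<SsaDef>& defs) : defs_(defs) {}

  // Return base + offset for NAME.  Every SSA name is resolved at most once,
  // so a chain of N adjustments costs O(N) work in total, however many of
  // its names are queried.
  BaseAndOffset lookup(int name) {
    // Walk back until we reach a cached name or a parameter.
    std::vector<int> pending;
    for (int cur = name; cache_.find(cur) == cache_.end();
         cur = defs_[cur].operand) {
      pending.push_back(cur);
      if (defs_[cur].kind == DefKind::Param)
        break;
    }
    // Resolve from the base outwards, so each operand is already cached.
    for (auto it = pending.rbegin(); it != pending.rend(); ++it) {
      const SsaDef& d = defs_[*it];
      BaseAndOffset r{*it, true, 0};
      if (d.kind != DefKind::Param) {
        r = cache_.at(d.operand);
        // Keep walking through an unknown adjustment but drop the offset.
        if (d.kind == DefKind::PlusUnknown || !r.offset_known
            || add_overflows(r.offset, d.offset)) {
          r.offset_known = false;
          r.offset = 0;
        } else {
          r.offset += d.offset;
        }
      }
      cache_[*it] = r;
    }
    return cache_.at(name);
  }

  std::size_t cached() const { return cache_.size(); }

 private:
  static bool add_overflows(std::int64_t a, std::int64_t b) {
    return b > 0 ? a > std::numeric_limits<std::int64_t>::max() - b
                 : a < std::numeric_limits<std::int64_t>::min() - b;
  }

  std::vector<SsaDef> defs_;
  std::unordered_map<int, BaseAndOffset> cache_;
};
```

Querying `p3` first fills the cache for `p`, `p1`, `p2` and `p3`, so later queries for `p1` or `p2` are single lookups.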

> static void
> ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
>                                 bool range_known,
>                                 poly_int64 offset,
>                                 poly_int64 size,
>                                 poly_int64 max_size)
> {
>   poly_int64 t, extra_offset = 0;
> 
>   ref->ref = NULL_TREE;
>   if (TREE_CODE (ptr) == SSA_NAME)
>     {
>       gimple *stmt = SSA_NAME_DEF_STMT (ptr);
>       if (gimple_assign_single_p (stmt)
>           && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
>         ptr = gimple_assign_rhs1 (stmt);
>       else if (is_gimple_assign (stmt)
>                && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
>                && ptrdiff_tree_p (gimple_assign_rhs2 (stmt), &extra_offset))
>         {
>           ptr = gimple_assign_rhs1 (stmt);
>           extra_offset *= BITS_PER_UNIT;
>         }
>     }
> 
>   if (TREE_CODE (ptr) == ADDR_EXPR)
>     {
>       ref->base = get_addr_base_and_unit_offset (TREE_OPERAND (ptr, 0), &t);
>       if (ref->base)
>         ref->offset = BITS_PER_UNIT * t;
>       else
>         {
>           range_known = false;
>           ref->offset = 0;
>           ref->base = get_base_address (TREE_OPERAND (ptr, 0));
>         }
>     }
>   else
>     {
>       gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
>       ref->base = build2 (MEM_REF, char_type_node,
>                           ptr, null_pointer_node);
>       ref->offset = 0;
>     }
> 
> I could add a flag selecting whether I am looking for the ADDR_EXPR or just the base
> pointer, but that seemed better suited to an incremental change.  The new helper
> should be immediately useful for ipa-modref, ipa-prop and
> ipa-polymorphic-call, though.
> 
> Honza
> 



Re: [PATCH] Remove STMT_VINFO_SAME_ALIGN_REFS

2020-10-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This makes the only consumer of STMT_VINFO_SAME_ALIGN_REFS, the
> loop peeling for alignment code, use locally computed data and
> then removes STMT_VINFO_SAME_ALIGN_REFS and its computation.
> The main benefit is reducing the dependence computation load
> by not asking for read-read dependences and getting rid of
> DR related info from stmt_vec_info.

Nice.  Some very minor comments below…

> @@ -1130,6 +1129,45 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info)
>return;
>  }
>  
> +/* Return whether DR_INFO, which is related to DR_PEEL_INFO in
> +   that it only differs in DR_INIT, is aligned if DR_PEEL_INFO
> +   is made aligned via peeling.  */
> +
> +static bool
> +vect_dr_aligned_if_related_peeled_dr_is (dr_vec_info *dr_info,
> +  dr_vec_info *dr_peel_info)
> +{
> +  if (known_ge (DR_TARGET_ALIGNMENT (dr_peel_info),
> + DR_TARGET_ALIGNMENT (dr_info)))

Wondered here if multiple_p might be clearer/more precise than known_ge,
but of course non-power-of-2 alignments aren't much use…
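The remark about `known_ge` versus `multiple_p` can be checked with a simplified model of `vect_dr_aligned_if_related_peeled_dr_is`. The sketch below is hypothetical: `aligned_if_peeled_is` is an invented name, it uses plain byte counts instead of `poly_int`, and it assumes both references share base address, offset and step, differing only in their init. For power-of-two alignments, A >= B already implies that A is a multiple of B, so the two tests agree; they would only diverge for a non-power-of-two alignment such as 24 against 16.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of vect_dr_aligned_if_related_peeled_dr_is
// from the patch: plain byte counts instead of poly_int.  Both refs are
// assumed to share base address, offset and step, differing only in init.
bool aligned_if_peeled_is(std::int64_t init, std::int64_t target_align,
                          std::int64_t peel_init,
                          std::int64_t peel_target_align) {
  // Counterpart of the known_ge test on the two target alignments.
  if (peel_target_align >= target_align) {
    std::int64_t diff = peel_init - init;
    // Counterpart of known_eq (diff, 0) || multiple_p (diff, ...).
    return diff == 0 || diff % target_align == 0;
  }
  return false;
}
```

For example, with a 16-byte target alignment, two refs whose inits differ by 32 bytes become aligned together when one of them is peeled for alignment, while a 4-byte difference does not.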

> +{
> +  poly_offset_int diff
> + = (wi::to_poly_offset (DR_INIT (dr_peel_info->dr))
> +- wi::to_poly_offset (DR_INIT (dr_info->dr)));
> +  if (known_eq (diff, 0)
> +   || multiple_p (diff, DR_TARGET_ALIGNMENT (dr_info)))
> + return true;
> +}
> +  return false;
> +}
> +
> +/* Return whether DR_INFO is aligned if DR_PEEL_INFO is made
> +   aligned via peeling.  */
> +
> +static bool
> +vect_dr_aligned_if_peeled_dr_is (dr_vec_info *dr_info,
> +  dr_vec_info *dr_peel_info)
> +{
> +  if (!operand_equal_p (DR_BASE_ADDRESS (dr_info->dr),
> + DR_BASE_ADDRESS (dr_peel_info->dr), 0)
> +  || !operand_equal_p (DR_OFFSET (dr_info->dr),
> +DR_OFFSET (dr_peel_info->dr), 0)
> +  || !operand_equal_p (DR_STEP (dr_info->dr),
> +DR_STEP (dr_peel_info->dr), 0))
> +return false;
> +
> +  return vect_dr_aligned_if_related_peeled_dr_is (dr_info, dr_peel_info);
> +}
> +
>  /* Function vect_update_misalignment_for_peel.
> Sets DR_INFO's misalignment
> - to 0 if it has the same alignment as DR_PEEL_INFO,
> @@ -1146,18 +1184,10 @@ static void
>  vect_update_misalignment_for_peel (dr_vec_info *dr_info,
>  dr_vec_info *dr_peel_info, int npeel)
>  {
> -  unsigned int i;
> -  vec same_aligned_drs;
> -  struct data_reference *current_dr;
> -  stmt_vec_info peel_stmt_info = dr_peel_info->stmt;
> -
>/* It can be assumed that if dr_info has the same alignment as dr_peel,
>   it is aligned in the vector loop.  */
> -  same_aligned_drs = STMT_VINFO_SAME_ALIGN_REFS (peel_stmt_info);
> -  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
> +  if (vect_dr_aligned_if_peeled_dr_is (dr_info, dr_peel_info))
>  {
> -  if (current_dr != dr_info->dr)
> -continue;
>gcc_assert (!known_alignment_for_access_p (dr_info)
> || !known_alignment_for_access_p (dr_peel_info)
> || (DR_MISALIGNMENT (dr_info)
> @@ -1572,6 +1602,43 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
>return true;
>  }
>  
> +/* Compare two data-references DRA and DRB to group them into chunks
> +   with related alignment.  */
> +
> +static int
> +dr_align_group_sort_cmp (const void *dra_, const void *drb_)
> +{
> +  data_reference_p dra = *(data_reference_p *)const_cast(dra_);
> +  data_reference_p drb = *(data_reference_p *)const_cast(drb_);
> +  int cmp;
> +
> +  /* Stabilize sort.  */
> +  if (dra == drb)
> +return 0;
> +
> +  /* Ordering of DRs according to base.  */
> +  cmp = data_ref_compare_tree (DR_BASE_ADDRESS (dra),
> +DR_BASE_ADDRESS (drb));
> +  if (cmp != 0)
> +return cmp;
> +
> +  /* And according to DR_OFFSET.  */
> +  cmp = data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
> +  if (cmp != 0)
> +return cmp;
> +
> +  /* And after step.  */
> +  cmp = data_ref_compare_tree (DR_STEP (dra), DR_STEP (drb));
> +  if (cmp != 0)
> +return cmp;
> +
> +  /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
> +  cmp = data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb));
> +  if (cmp == 0)
> +return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
> +  return cmp;
> +}
> +
>  /* Function vect_enhance_data_refs_alignment
>  
> This pass will use loop versioning and loop peeling in order to enhance
> @@ -1666,7 +1733,6 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
>  opt_result
>  vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>  {
> -  vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>enum dr_alignment_support supportable_dr_alignment;
>dr_vec_info *first_store = NULL;
> @@ -1680,7 +1746,7 @@ vect_enhanc

Re: Support offsetted parameters in local modref

2020-10-13 Thread Jan Hubicka
> > So I implemented my own pattern matching and I think we should switch ipa-prop
> > over to it incrementally. It is based on similar logic in
> > ao_ref_init_from_ptr_and_size.
> 
> So instead of re-inventing the wheel what about splitting out a
> common helper instead?

It was my original idea. However, it is not completely trivial:
ao_ref_init_from_ptr_and_size does something slightly different, since it tries
to keep the original ref inside the ADDR_EXPR (if one is found) to feed it
into the oracle rather than stripping them all.  If refs are present, it does
not need to build a MEM_REF.

static void
ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
                                bool range_known,
                                poly_int64 offset,
                                poly_int64 size,
                                poly_int64 max_size)
{
  poly_int64 t, extra_offset = 0;

  ref->ref = NULL_TREE;
  if (TREE_CODE (ptr) == SSA_NAME)
    {
      gimple *stmt = SSA_NAME_DEF_STMT (ptr);
      if (gimple_assign_single_p (stmt)
          && gimple_assign_rhs_code (stmt) == ADDR_EXPR)
        ptr = gimple_assign_rhs1 (stmt);
      else if (is_gimple_assign (stmt)
               && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
               && ptrdiff_tree_p (gimple_assign_rhs2 (stmt), &extra_offset))
        {
          ptr = gimple_assign_rhs1 (stmt);
          extra_offset *= BITS_PER_UNIT;
        }
    }

  if (TREE_CODE (ptr) == ADDR_EXPR)
    {
      ref->base = get_addr_base_and_unit_offset (TREE_OPERAND (ptr, 0), &t);
      if (ref->base)
        ref->offset = BITS_PER_UNIT * t;
      else
        {
          range_known = false;
          ref->offset = 0;
          ref->base = get_base_address (TREE_OPERAND (ptr, 0));
        }
    }
  else
    {
      gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
      ref->base = build2 (MEM_REF, char_type_node,
                          ptr, null_pointer_node);
      ref->offset = 0;
    }

I could add a flag selecting whether I am looking for the ADDR_EXPR or just the base
pointer, but that seemed better suited to an incremental change.  The new helper
should be immediately useful for ipa-modref, ipa-prop and
ipa-polymorphic-call, though.

Honza


[Patch, committed] nvptx/mkoffload.c: Add missing fclose

2020-10-13 Thread Tobias Burnus

When looking at nvptx/mkoffload.c, I spotted a missing fclose.

Committed as obvious.

Tobias

commit 35103c6d8e3ee24a2497888bb1f7ef37327ace2b
Author: Tobias Burnus 
Date:   Tue Oct 13 11:54:26 2020 +0200

nvptx/mkoffload.c: Add missing fclose

gcc/ChangeLog:

* config/nvptx/mkoffload.c (main): Add missing fclose (in).

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 4fecb2b9cfe..7c43e617dcf 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -589,16 +589,17 @@ main (int argc, char **argv)
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));
   xputenv (concat ("LIBRARY_PATH=", lpath, NULL));
 
   in = fopen (ptx_name, "r");
   if (!in)
 	fatal_error (input_location, "cannot open intermediate ptx file");
 
   process (in, out);
+  fclose (in);
 }
 
   fclose (out);
 
   compile_native (ptx_cfile_name, outname, collect_gcc, fPIC, fpic);
 
   return 0;
 }



Re: Support offsetted parameters in local modref

2020-10-13 Thread Richard Biener
On Tue, 13 Oct 2020, Jan Hubicka wrote:

> Hi,
> this patch makes local modref track parameters that are passed through with
> an adjustment (via pointer_plus or addr_expr of mem_ref).  I intended to re-use
> the logic in ipa-prop, but it turns out to be weird:
>   if (TREE_CODE (op1) != ADDR_EXPR)
>     return;
>   op1 = TREE_OPERAND (op1, 0);
>   if (TREE_CODE (TREE_TYPE (op1)) != RECORD_TYPE)
>     return;
>   base = get_ref_base_and_extent_hwi (op1, &offset, &size, &reverse);
>   offset_int mem_offset;
>   if (!base
>       || TREE_CODE (base) != MEM_REF
>       || !mem_ref_offset (base).is_constant (&mem_offset))
>     return;
>   offset += mem_offset.to_short_addr () * BITS_PER_UNIT;
>   ssa = TREE_OPERAND (base, 0);
>   if (TREE_CODE (ssa) != SSA_NAME
>       || !SSA_NAME_IS_DEFAULT_DEF (ssa)
>       || offset < 0)
>     return;
> 
>   /* Dynamic types are changed in constructors and destructors.  */
>   index = ipa_get_param_decl_index (info, SSA_NAME_VAR (ssa));
>   if (index >= 0 && param_type && POINTER_TYPE_P (param_type))
>     ipa_set_ancestor_jf (jfunc, offset,  index,
>                          parm_ref_data_pass_through_p (fbi, index, call, ssa));
> 
> First, it does not handle POINTER_PLUS, which is quite common, e.g. for
> 
> test (int *a)
> {
>   test2 (a+1);
> }
> 
> It also restricts itself to access paths that start with a RECORD_TYPE.  This
> does not seem to make sense, since all ADDR_EXPRs of refs are just pointer
> adjustments.  I think it is a leftover from the time when ancestor was meant
> to track the C++ type hierarchy.
> 
> Next, it refuses negative offsets for no apparent reason (they occur
> with multiple inheritance in C++).
> 
> On tramp3d it is also common for the parameter ADDR_EXPR to appear in a
> separate statement rather than in the call itself (i.e. no forward propagation
> is done).
> 
> Finally, it works on bits without an overflow check and uses HOST_WIDE_INT
> instead of poly_int64, which I think is the right datatype for accumulating
> offsets these days.
> 
> So I implemented my own pattern matching and I think we should switch ipa-prop
> over to it incrementally.  It is based on similar logic in
> ao_ref_init_from_ptr_and_size.

So instead of re-inventing the wheel what about splitting out a
common helper instead?

> I support chained statements since they happen
> in tramp3d after early opts.
> 
> Bootstrapped/regtested x86_64-linux.  Also ltobootstrapped and I am waiting
> for cc1plus stats to finish.  OK?
> 
> gcc/ChangeLog:
> 
> 2020-10-13  Jan Hubicka  
> 
>   * ipa-modref.c (merge_call_side_effects): Use
>   unadjusted_ptr_and_unit_offset.
>   * ipa-prop.c (unadjusted_ptr_and_unit_offset): New function.
>   * ipa-prop.h (unadjusted_ptr_and_unit_offset): Declare.
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 4f86b9ccea1..92119eb6fe3 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -531,6 +532,10 @@ merge_call_side_effects (modref_summary *cur_summary,
>for (unsigned i = 0; i < gimple_call_num_args (stmt); i++)
>  {
>tree op = gimple_call_arg (stmt, i);
> +  bool offset_known;
> +  poly_int64 offset;
> +
> +  offset_known = unadjusted_ptr_and_unit_offset (op, &op, &offset);
>if (TREE_CODE (op) == SSA_NAME
> && SSA_NAME_IS_DEFAULT_DEF (op)
> && TREE_CODE (SSA_NAME_VAR (op)) == PARM_DECL)
> @@ -547,15 +552,23 @@ merge_call_side_effects (modref_summary *cur_summary,
> index++;
>   }
> parm_map[i].parm_index = index;
> -   parm_map[i].parm_offset_known = true;
> -   parm_map[i].parm_offset = 0;
> +   parm_map[i].parm_offset_known = offset_known;
> +   parm_map[i].parm_offset = offset;
>   }
>else if (points_to_local_or_readonly_memory_p (op))
>   parm_map[i].parm_index = -2;
>else
>   parm_map[i].parm_index = -1;
>if (dump_file)
> - fprintf (dump_file, " %i", parm_map[i].parm_index);
> + {
> +   fprintf (dump_file, " %i", parm_map[i].parm_index);
> +   if (parm_map[i].parm_offset_known)
> + {
> +   fprintf (dump_file, " offset:");
> +   print_dec ((poly_int64_pod)parm_map[i].parm_offset,
> +  dump_file, SIGNED);
> + }
> + }
>  }
>if (dump_file)
>  fprintf (dump_file, "\n");
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 2d09d913051..fe317f56e02 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -1222,6 +1222,72 @@ load_from_unmodified_param_or_agg (struct ipa_func_body_info *fbi,
>return index;
>  }
>  
> +/* Walk pointer adjustments from OP (such as POINTER_PLUS and ADDR_EXPR)
> +   to find the original pointer.  Initialize RET to the pointer which results from
> +   the walk.
> +   If the offset is known, return true and initialize OFFSET_RET.  */
> +
> +bool
> +unadjusted_ptr_and_unit_offset (tree op, tree *ret, poly_int64 *offset_ret)
> +{
> +  poly_int64 offset = 0;
> +  bool offset_known = true;
> +
> +  while 

[PATCH] Remove STMT_VINFO_SAME_ALIGN_REFS

2020-10-13 Thread Richard Biener
This makes the only consumer of STMT_VINFO_SAME_ALIGN_REFS, the
loop peeling for alignment code, use locally computed data and
then removes STMT_VINFO_SAME_ALIGN_REFS and its computation.
The main benefit is reducing the dependence computation load
by not asking for read-read dependences and getting rid of
DR related info from stmt_vec_info.
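The locally computed data relies on sorting the data references so that those differing only in DR_INIT end up next to each other (`dr_align_group_sort_cmp` in the ChangeLog below). A hypothetical sketch of that grouping, with every tree replaced by an integer key: the patch uses a qsort-style three-way comparator built on `data_ref_compare_tree`, whereas this sketch substitutes `std::sort` with a lexicographic `std::tie` comparison. `ToyDR`, `align_group_less` and `count_groups` are invented names. Breaking ties on the statement UID makes the order total, so the result does not depend on the sort algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical flattened data reference: integer keys stand in for the
// DR_BASE_ADDRESS, DR_OFFSET, DR_STEP and DR_INIT trees.
struct ToyDR {
  int base, offset, step;
  std::int64_t init;
  unsigned stmt_uid;
};

// Order by base, offset and step so refs that differ only in init are
// adjacent, then by init, breaking ties with the statement UID so the
// order is total and hence reproducible.
bool align_group_less(const ToyDR& a, const ToyDR& b) {
  return std::tie(a.base, a.offset, a.step, a.init, a.stmt_uid)
         < std::tie(b.base, b.offset, b.step, b.init, b.stmt_uid);
}

// After sorting, count the runs that share base, offset and step; within a
// run, whether two refs share alignment depends only on their init values
// (and the target alignment).
std::size_t count_groups(const std::vector<ToyDR>& sorted) {
  std::size_t groups = 0;
  for (std::size_t i = 0; i < sorted.size(); ++i) {
    const ToyDR& cur = sorted[i];
    if (i == 0
        || std::tie(cur.base, cur.offset, cur.step)
               != std::tie(sorted[i - 1].base, sorted[i - 1].offset,
                           sorted[i - 1].step))
      ++groups;
  }
  return groups;
}
```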

It also adjusts the auto_vec<> move CTOR/assignment so you
can write

  auto_vec<..> foo = bar.copy ();

and have foo own the generated copy.  I've added an assert
to disallow auto_vec at the RHS since I couldn't think
of a way to statically disallow this (easily circumvented by
the places where we pass an auto_vec to a vec<> & parameter).
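A hypothetical miniature of the ownership rule described above; `toy_vec` and `toy_auto_vec` are invented stand-ins, not GCC's `vec.h`. As with `vec<>`, the non-owning handle has no destructor, so the heap buffer returned by `copy ()` must be adopted or released by someone. The owning type adopts it through a constructor taking an rvalue reference and, like the patch, checks at run time that the source is not using embedded storage, since such a buffer cannot be taken over.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning handle modelled loosely on vec<>: no destructor,
// so a heap buffer it points to must be adopted or freed by its user.
template <typename T>
struct toy_vec {
  T* data = nullptr;
  std::size_t len = 0;
  bool uses_inline_storage = false;  // stands in for embedded auto storage

  // Return a heap-allocated copy; the caller becomes responsible for it.
  toy_vec copy() const {
    toy_vec r;
    if (len) {
      r.data = new T[len];
      for (std::size_t i = 0; i < len; ++i)
        r.data[i] = data[i];
      r.len = len;
    }
    return r;
  }
};

// Hypothetical owning vector that can adopt the buffer of an rvalue toy_vec.
template <typename T>
struct toy_auto_vec : toy_vec<T> {
  toy_auto_vec() = default;

  toy_auto_vec(toy_vec<T>&& other) {
    // An embedded buffer cannot be adopted; checked at run time only.
    assert(!other.uses_inline_storage);
    this->data = other.data;
    this->len = other.len;
    other.data = nullptr;
    other.len = 0;
  }
  toy_auto_vec(toy_auto_vec&& other)
      : toy_auto_vec(static_cast<toy_vec<T>&&>(other)) {}

  toy_auto_vec(const toy_auto_vec&) = delete;
  toy_auto_vec& operator=(const toy_auto_vec&) = delete;

  ~toy_auto_vec() { delete[] this->data; }
};
```

With these definitions, `toy_auto_vec<int> foo = bar.copy ();` leaves `foo` owning a fresh buffer that its destructor frees, while `bar` keeps pointing at its original storage.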

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2020-10-13  Richard Biener  

PR tree-optimization/97382
* tree-vectorizer.h (_stmt_vec_info::same_align_refs): Remove.
(STMT_VINFO_SAME_ALIGN_REFS): Likewise.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Do not
allocate STMT_VINFO_SAME_ALIGN_REFS.
(vec_info::free_stmt_vec_info): Do not release
STMT_VINFO_SAME_ALIGN_REFS.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences):
Do not compute self and read-read dependences.
(vect_dr_aligned_if_related_peeled_dr_is): New helper.
(vect_dr_aligned_if_peeled_dr_is): Likewise.
(vect_update_misalignment_for_peel): Use it instead of
iterating over STMT_VINFO_SAME_ALIGN_REFS.
(dr_align_group_sort_cmp): New function.
(vect_enhance_data_refs_alignment): Count the number of
same aligned refs here and elide uses of STMT_VINFO_SAME_ALIGN_REFS.
(vect_find_same_alignment_drs): Remove.
(vect_analyze_data_refs_alignment): Do not call it.
* vec.h (auto_vec::auto_vec): Adjust CTOR to take
a vec<>&&, assert it isn't using auto storage.
(auto_vec& operator=): Apply a similar change.

* gcc.dg/vect/no-vfa-vect-dv-2.c: Remove same align dump
scanning.
* gcc.dg/vect/vect-103.c: Likewise.
* gcc.dg/vect/vect-91.c: Likewise.
* gfortran.dg/vect/vect-4.f90: Likewise.
---
 gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c |   2 -
 gcc/testsuite/gcc.dg/vect/vect-103.c |   2 -
 gcc/testsuite/gcc.dg/vect/vect-91.c  |   2 -
 gcc/testsuite/gfortran.dg/vect/vect-4.f90|   1 -
 gcc/tree-vect-data-refs.c| 233 +++
 gcc/tree-vectorizer.c|   2 -
 gcc/tree-vectorizer.h|   5 -
 gcc/vec.h|   6 +-
 8 files changed, 143 insertions(+), 110 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
index dcb53701795..8cc69ab22c5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
@@ -75,5 +75,3 @@ int main ()
 
 /* The initialization induction loop (with aligned access) is also vectorized.  */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "accesses have the same alignment." 2 "vect" { target { { vect_aligned_arrays } && {! vect_sizes_32B_16B} } } } } */
-/* { dg-final { scan-tree-dump-times "accesses have the same alignment." 1 "vect" { target { {! vect_aligned_arrays } && {vect_sizes_32B_16B} } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-103.c b/gcc/testsuite/gcc.dg/vect/vect-103.c
index 2a4510482d4..d03562f7cdd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-103.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-103.c
@@ -58,5 +58,3 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "accesses have the same alignment" 1 "vect" } } */
-
diff --git a/gcc/testsuite/gcc.dg/vect/vect-91.c b/gcc/testsuite/gcc.dg/vect/vect-91.c
index 91264d9841d..8983c7da870 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-91.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-91.c
@@ -68,6 +68,4 @@ main3 ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "accesses have the same alignment." 3 "vect" { target { { vect_aligned_arrays } && {! vect_sizes_32B_16B} } } } } */
-/* { dg-final { scan-tree-dump-times "accesses have the same alignment." 2 "vect" { target { {! vect_aligned_arrays } && {vect_sizes_32B_16B} } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 3 "vect" {target { {! vector_alignment_reachable} && {! vect_hw_misalign} } } } } */
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-4.f90 b/gcc/testsuite/gfortran.dg/vect/vect-4.f90
index c2eeafd3900..9c067c69cf3 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-4.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-4.f90
@@ -13,4 +13,3 @@ Y = Y + A * X
 END
 
 ! { dg-final { scan-tree-dump-times "vectoriz

RE: [PATCH][GCC-10 backport] arm: Fix fp16 move patterns for base MVE

2020-10-13 Thread Przemyslaw Wirkus via Gcc-patches
> > Backport of commit 6abd428605e3a279e533fde1cecbc9735ce03b66
> > from master branch.
> >
> > OK for gcc-10 ?

Cherry-picked and applied: commit eb061188276d0ac9ec53fd5619c578a6bce6b129

> Ok.
> Thanks,
> Kyrill


[PATCH] RISC-V: Add support for -mcpu option.

2020-10-13 Thread Kito Cheng
 - The behavior of -mcpu is basically equal to -march plus -mtune, but it
   has lower priority than -march and -mtune.

 - The behavior and available options are in sync with clang, except that we
   don't add a few LLVM-specific values, and we add more SiFive processors to
   the list.

 - -mtune also accepts all available options of -mcpu, and uses that CPU's
   tuning setting.
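The option precedence described in the list above can be modelled in a few lines. This is a hypothetical sketch rather than the code in riscv-common.c or riscv.c: `CpuInfo`, `Resolved`, `resolve`, the CPU name `cpu-a` and its tuning name are invented for illustration, and it needs C++17 for `std::optional`. Unknown -mcpu names are simply ignored here, whereas the ChangeLog says the patch verifies the -mcpu value in `riscv_handle_option`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical CPU table entry; the real entries live in riscv-cores.def.
struct CpuInfo {
  std::string arch;
  std::string tune;
};

struct Resolved {
  std::string arch;
  std::string tune;
};

// Resolve the effective -march/-mtune.  Explicit -march and -mtune win;
// -mcpu only fills in whatever was not given.  An -mtune value that names
// a CPU is translated to that CPU's tuning.  Unknown -mcpu names are
// ignored in this sketch.
Resolved resolve(const std::map<std::string, CpuInfo>& cpus,
                 const std::optional<std::string>& march,
                 const std::optional<std::string>& mtune,
                 const std::optional<std::string>& mcpu,
                 const Resolved& defaults) {
  Resolved r = defaults;
  const CpuInfo* cpu = nullptr;
  if (mcpu) {
    auto it = cpus.find(*mcpu);
    if (it != cpus.end())
      cpu = &it->second;
  }

  if (march)
    r.arch = *march;
  else if (cpu)
    r.arch = cpu->arch;

  if (mtune) {
    auto it = cpus.find(*mtune);
    r.tune = it != cpus.end() ? it->second.tune : *mtune;
  } else if (cpu) {
    r.tune = cpu->tune;
  }
  return r;
}
```

For example, `-mcpu=cpu-a -march=rv32i` keeps rv32i as the ISA string but still takes the tuning from cpu-a, matching the rule that -mcpu has the lowest priority.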

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_cpu_tables): New.
(riscv_arch_str): Return empty string if current_subset_list
is NULL.
(riscv_find_cpu): New.
(riscv_handle_option): Verify option value of -mcpu.
(riscv_expand_arch): Use std::string.
(riscv_default_mtune): New.
(riscv_expand_arch_from_cpu): Ditto.
* config/riscv/riscv-cores.def: New.
* config/riscv/riscv-protos.h (riscv_find_cpu): New.
(riscv_cpu_info): New.
* config/riscv/riscv.c (riscv_tune_info): Rename ...
(riscv_tune_param): ... to this.
(riscv_cpu_info): Rename ...
(riscv_tune_info): ... to this.
(tune_info): Rename ...
(tune_param): ... to this.
(rocket_tune_info): Update data type name.
(sifive_7_tune_info): Ditto.
(optimize_size_tune_info): Ditto.
(riscv_cpu_info_table): Rename ...
(riscv_tune_info_table): ... to this.
(riscv_parse_cpu): Rename ...
(riscv_parse_tune): ... to this, and translate valid -mcpu option to
-mtune option.
(riscv_rtx_costs): Rename tune_info to tune_param.
(riscv_class_max_nregs): Ditto.
(riscv_memory_move_cost): Ditto.
(riscv_init_machine_status): Use value of -mcpu if -mtune is not
given, and rename tune_info to tune_param.
* config/riscv/riscv.h (riscv_expand_arch_from_cpu): New.
(riscv_default_mtune): Ditto.
(EXTRA_SPEC_FUNCTIONS): Add riscv_expand_arch_from_cpu and
riscv_default_mtune.
(OPTION_DEFAULT_SPECS): Handle default value of -march/-mabi.
(DRIVER_SELF_SPECS): Expand -march from -mcpu if -march is not
given.
* config/riscv/riscv.opt (-mcpu): New option.
* config/riscv/t-riscv ($(common_out_file)): Add
riscv-cores.def to dependency.
* doc/invoke.texi (RISC-V Option): Add -mcpu, and update the
description of default value for -mtune and -march.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-1.c: New.
* gcc.target/riscv/mcpu-2.c: Ditto.
* gcc.target/riscv/mcpu-3.c: Ditto.
* gcc.target/riscv/mcpu-4.c: Ditto.
* gcc.target/riscv/mcpu-5.c: Ditto.
* gcc.target/riscv/mcpu-6.c: Ditto.
* gcc.target/riscv/mcpu-7.c: Ditto.
---
 gcc/common/config/riscv/riscv-common.c  | 91 +--
 gcc/config/riscv/riscv-cores.def| 49 +
 gcc/config/riscv/riscv-protos.h | 14 
 gcc/config/riscv/riscv.c| 97 +
 gcc/config/riscv/riscv.h| 25 +--
 gcc/config/riscv/riscv.opt  |  4 +
 gcc/config/riscv/t-riscv|  2 +
 gcc/doc/invoke.texi | 24 +-
 gcc/testsuite/gcc.target/riscv/mcpu-1.c | 18 +
 gcc/testsuite/gcc.target/riscv/mcpu-2.c | 18 +
 gcc/testsuite/gcc.target/riscv/mcpu-3.c | 18 +
 gcc/testsuite/gcc.target/riscv/mcpu-4.c | 18 +
 gcc/testsuite/gcc.target/riscv/mcpu-5.c | 19 +
 gcc/testsuite/gcc.target/riscv/mcpu-6.c | 10 +++
 gcc/testsuite/gcc.target/riscv/mcpu-7.c | 10 +++
 15 files changed, 356 insertions(+), 61 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-cores.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-7.c

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index 82c5154b6118..4b6bdf8685d7 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -60,6 +60,14 @@ riscv_implied_info_t riscv_implied_info[] =
   {NULL, NULL}
 };
 
+static const riscv_cpu_info riscv_cpu_tables[] =
+{
+#define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
+{CORE_NAME, ARCH, TUNE},
+#include "../../../config/riscv/riscv-cores.def"
+{NULL, NULL, NULL}
+};
+
 /* Subset list.  */
 class riscv_subset_list
 {
@@ -604,8 +612,10 @@ fail:
 std::string
 riscv_arch_str (bool version_p)
 {
-  gcc_assert (current_subset_list);
-  return current_subset_list->to_string (version_p);
+  if (current_subset_list)
+    return current_subset_list->to_string (version_p);
+  else
+    return std::string();
 }
 
 /* Parse a RISC-V ISA string into an option mask.  Must clear or set all arch
@@ -653
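
The riscv_cpu_tables hunk above uses the X-macro pattern: riscv-cores.def holds one RISCV_CORE (CORE_NAME, ARCH, TUNE) line per core, and each consumer defines RISCV_CORE to pick out what it needs. The C sketch below imitates that with a list macro instead of a .def file. The core names, arch strings and helper names are made-up examples, not the real contents of riscv-cores.def or the real riscv_find_cpu:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for riscv-cores.def.  The entries are made-up examples, not
   the file's real contents.  Each consumer passes its own RISCV_CORE.  */
#define EXAMPLE_CORES(X)                         \
  X ("example-core-32", "rv32imc", "rocket")     \
  X ("example-core-64", "rv64imafdc", "rocket")

/* Mirrors the {CORE_NAME, ARCH, TUNE} triple used by riscv_cpu_tables.  */
typedef struct
{
  const char *name;
  const char *arch;
  const char *tune;
} cpu_info;

/* First expansion: the full table, terminated by a NULL sentinel.  */
static const cpu_info cpu_table[] =
{
#define RISCV_CORE(CORE_NAME, ARCH, TUNE) {CORE_NAME, ARCH, TUNE},
  EXAMPLE_CORES (RISCV_CORE)
#undef RISCV_CORE
  {NULL, NULL, NULL}
};

/* Second expansion of the same list, keeping only the names.  */
static const char *const cpu_names[] =
{
#define RISCV_CORE(CORE_NAME, ARCH, TUNE) CORE_NAME,
  EXAMPLE_CORES (RISCV_CORE)
#undef RISCV_CORE
};

/* Number of known cores, derived from the names-only expansion.  */
size_t
cpu_count (void)
{
  return sizeof cpu_names / sizeof cpu_names[0];
}

/* Walk the table until the sentinel; return NULL if CPU is unknown.  */
const cpu_info *
find_cpu (const char *cpu)
{
  assert (cpu != NULL);
  for (const cpu_info *p = cpu_table; p->name != NULL; p++)
    if (strcmp (p->name, cpu) == 0)
      return p;
  return NULL;
}
```

In the patch the entries come from an #include of riscv-cores.def rather than from a list macro, but the expansion works the same way: adding a core means adding one line to the .def file.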

RE: [PATCH][GCC-10 backport] arm: Fix ICEs in no-literal-pool.c on MVE [PR97251]

2020-10-13 Thread Przemyslaw Wirkus via Gcc-patches
> > This patch is a backport of PR97251 fix already committed to master.
> >
> > OK for gcc-10 branch ?

Cherry-picked and applied: commit d121b3259b77203e62402024add1538c1bdf5fdf

> Ok.
> Thanks,
> Kyrill


RE: [PATCH][GCC-10 backport] arm: Fix fp16 move patterns for base MVE

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Przemyslaw Wirkus 
> Sent: 13 October 2020 10:56
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Kyrylo Tkachov ; Ramana Radhakrishnan
> 
> Subject: [PATCH][GCC-10 backport] arm: Fix fp16 move patterns for base
> MVE
> 
> Backport of commit 6abd428605e3a279e533fde1cecbc9735ce03b66
> from master branch.
> 
> OK for gcc-10 ?

Ok.
Thanks,
Kyrill

> 
> This patch fixes ICEs in gcc.dg/torture/float16-basic.c for
> -march=armv8.1-m.main+mve -mfloat-abi=hard.  The problem was
> that an fp16 argument was (rightly) being passed in FPRs,
> but the fp16 move patterns only handled GPRs.  LRA then cycled
> trying to look for a way of handling the FPR.
> 
> It looks like there are three related problems here:
> 
> (1) We're using the wrong fp16 move pattern for base MVE.
> *mov_vfp_16 (the pattern we use for +mve.fp)
> works for base MVE too.
> 
> (2) The fp16 MVE load and store patterns are separate from the
> main move patterns.  The loads and stores should instead be
> alternatives of the main move patterns, so that LRA knows
> what to do with pseudo registers that become stack slots.
> 
> (3) The range restrictions for the loads and stores were wrong
> for fp16: we were enforcing a multiple of 4 in [-255*4, 255*4]
> instead of a multiple of 2 in [-255*2, 255*2].
> 
> (2) came from a patch to prevent writeback being used for MVE.
> That patch also added a Uj constraint to enforce the correct
> memory types for MVE.  I think the simplest fix is therefore to merge
> the loads and stores back into the main pattern and extend the Uj
> constraint so that it acts like Um for non-MVE.
> 
> The testcase for that patch was mve-vldstr16-no-writeback.c, whose
> main function is:
> 
> void
> fn1 (__fp16 *pSrc)
> {
>   __fp16 high;
>   __fp16 *pDst = 0;
>   unsigned i;
>   for (i = 0;; i++)
>     if (pSrc[i])
>       pDst[i] = high;
> }
> 
> Fixing (2) causes the store part to fail, not because we're using
> writeback, but because we decide to use GPRs to store high (which is
> uninitialised, and so gets replaced with zero).  This patch therefore
> adds some scan-assembler-nots instead.  (I wondered about changing the
> testcase to initialise high, but that seemed like a bad idea for
> a regression test.)
> 
> For (3): MVE seems to be the only thing to use arm_coproc_mem_operand_wb
> (and its various interfaces) for 16-bit scalars: the Neon patterns only
> use it for 32-bit scalars.
> 
> I've added new tests to try the various FPR alternatives of the
> move patterns.  The range of offsets that GCC uses for FPR loads
> and stores is the intersection of the range allowed for GPRs and
> FPRs, so the tests include GPR<->memory tests as well.
> 
> The fp32 and fp64 tests already pass; they're just there for
> completeness.
> 
> gcc/
>   * config/arm/arm-protos.h (arm_mve_mode_and_operands_type_check):
>   Delete.
>   * config/arm/arm.c (arm_coproc_mem_operand_wb): Use a scale factor
>   of 2 rather than 4 for 16-bit modes.
>   (arm_mve_mode_and_operands_type_check): Delete.
>   * config/arm/constraints.md (Uj): Allow writeback for Neon,
>   but continue to disallow it for MVE.
>   * config/arm/arm.md (*arm32_mov): Add !TARGET_HAVE_MVE.
>   * config/arm/vfp.md (*mov_load_vfp_hf16, *mov_store_vfp_hf16): Fold
>   back into...
>   (*mov_vfp_16): ...here but use Uj for the FPR memory
>   constraints.  Use for base MVE too.
> 
> gcc/testsuite/
>   * gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: Allow
>   the store to use GPRs instead of FPRs.  Add scan-assembler-nots
>   for writeback.
>   * gcc.target/arm/armv8_1m-fp16-move-1.c: New test.
>   * gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
>   * gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.


RE: [PATCH][GCC-10 backport] arm: Fix ICEs in no-literal-pool.c on MVE [PR97251]

2020-10-13 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Przemyslaw Wirkus 
> Sent: 13 October 2020 10:56
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; ni...@redhat.com;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH][GCC-10 backport] arm: Fix ICEs in no-literal-pool.c on MVE
> [PR97251]
> 
> This patch is a backport of PR97251 fix already committed to master.
> 
> OK for gcc-10 branch ?

Ok.
Thanks,
Kyrill

> 
> This patch fixes ICEs when compiling
> gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool.c with
> -mfp16-format=ieee -mfloat-abi=hard -march=armv8.1-m.main+mve
> -mpure-code.
> 
> The existing conditions in the movsf/movdf expanders (as well as the
> no_literal_pool patterns) were too restrictive, requiring
> TARGET_HARD_FLOAT instead of TARGET_VFP_BASE, which caused unrecognised
> insns when compiling this testcase with integer MVE and -mpure-code.
> 
> gcc/:
> 
>   PR target/97251
>   * config/arm/arm.md (movsf): Relax TARGET_HARD_FLOAT to
>   TARGET_VFP_BASE.
>   (movdf): Likewise.
>   * config/arm/vfp.md (no_literal_pool_df_immediate): Likewise.
>   (no_literal_pool_sf_immediate): Likewise.


Re: [r11-3641 Regression] FAIL: gcc.dg/torture/pta-ptrarith-1.c -Os scan-tree-dump alias "ESCAPED = {[^\n}]* i f [^\n}]*}" on Linux/x86_64 (-m32 -march=cascadelake)

2020-10-13 Thread Christophe Lyon via Gcc-patches
On Mon, 12 Oct 2020 at 18:43, Segher Boessenkool
 wrote:
>
> On Mon, Oct 12, 2020 at 03:26:38PM +0200, Christophe Lyon wrote:
> > That's why I kept the reporting part manual on my side: once you know
> > which commit introduced a failure/regression (either via bisect, or by
> > some other way), it's not always easy to identify the gcc-patches
> > message to which you want to reply.
>
> But it *should* be: the check-in subject should be in the patch mail, or
> failing that, at least the changelog entries should be!

Well, for instance I've just reported that a newly introduced testcase
is failing on arm, aarch64 and other platforms.

It's easy to know which commit introduced the problem, since it's a
new test: r11-3827.

When looking for the email thread to which I want to send a reply, I
search my mailbox for "Wstringop-overflow-47.c", which points me to a
thread titled "correct handling of indices into arrays with elements
larger than 1 (PR c++/96511)" with several iterations and several sets
of patches.

The offending commit has "Generalize compute_objsize to return maximum
size/offset instead of failing (PR middle-end/97023)"
as title, so it's not obvious that this is really the right thread
(and since the patches were attached, gmail does not display them
inline, so I have to open them and check if the one I'm looking for is
really there)

It's not super-long to do, but I feel it's more effort than should be
needed for such a simple case.

>
> > > > What do people think about this kind of followups?  Is this appropriate
> > > > for this mailing list?
> > >
> > > Please just use bugzilla.  And report bugs there the way they should be
> > > reported: full command lines, full description of the errors, and
> > > everything else needed to easily reproduce the problem.
> > >
> > It seems some people prefer such regressions reports in bugzilla,
> > others in gcc-patches@.
>
> If it will be resolved quickly, and by just telling the author, email is
> fine of course.  Otherwise, you need bugzilla.
>
In the above case, I was tempted to open a bugzilla (I would have had
to dig less in my email archives), but since many targets are affected,
I hope it's obvious enough that the fix will be easy. YMMV.

> > In general when I report a regression I noticed in the GCC testsuite,
> > I tend to assume that the testname and GCC configure options are
> > sufficient for a usual contributor to reproduce.
> > Not sure if it matches "full" and "easily" in your mind?
>
> Tests are often run with multiple sets of options.  If you give enough
> info that people can reproduce your configuration (hint: most bug
> reports do *not*), all is fine of course.  But in general we *do* need
> all info (as documented in the bug reporting instructions), or we get
> a frustrating "I cannot reproduce this" game.
>
> > With all the automated builds where the build dir is removed from the
> > server at the end whatever the result, it does take time if I have to
> > reproduce the problem manually before reporting.
>
> Yes, and it is *easier* to reproduce for you than for other people!
>
> > > *Actually* following up to the patch mail could be useful (but you can
> > > then just point to the bugzilla).  Sending spam to gcc-patches@ is not
> > > useful for most users of the list.
>
> ^^^ Still my main point.
>
>
> Segher


[PATCH][GCC-10 backport] arm: Fix fp16 move patterns for base MVE

2020-10-13 Thread Przemyslaw Wirkus via Gcc-patches
Backport of commit 6abd428605e3a279e533fde1cecbc9735ce03b66
from master branch.

OK for gcc-10 ?

This patch fixes ICEs in gcc.dg/torture/float16-basic.c for
-march=armv8.1-m.main+mve -mfloat-abi=hard.  The problem was
that an fp16 argument was (rightly) being passed in FPRs,
but the fp16 move patterns only handled GPRs.  LRA then cycled
trying to look for a way of handling the FPR.

It looks like there are three related problems here:

(1) We're using the wrong fp16 move pattern for base MVE.
*mov_vfp_16 (the pattern we use for +mve.fp)
works for base MVE too.

(2) The fp16 MVE load and store patterns are separate from the
main move patterns.  The loads and stores should instead be
alternatives of the main move patterns, so that LRA knows
what to do with pseudo registers that become stack slots.

(3) The range restrictions for the loads and stores were wrong
for fp16: we were enforcing a multiple of 4 in [-255*4, 255*4]
instead of a multiple of 2 in [-255*2, 255*2].
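
A simplified C model of point (3), assuming the offset is encoded as an 8-bit magnitude scaled by the access size. vfp_offset_ok is a hypothetical name; this is not the code of arm_coproc_mem_operand_wb:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model (not GCC's arm_coproc_mem_operand_wb): a VFP load or
   store offset must be a multiple of the access size, and its magnitude
   must fit in an 8-bit immediate scaled by that size.  */
bool
vfp_offset_ok (long offset, long access_size)
{
  assert (access_size > 0);
  if (offset % access_size != 0)
    return false;
  return labs (offset) <= 255 * access_size;
}
```

Applying a scale of 4 to an fp16 access rejects valid offsets such as 6 and accepts out-of-range ones such as 1020; with a scale of 2 both results flip, matching [-255*2, 255*2].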

(2) came from a patch to prevent writeback being used for MVE.
That patch also added a Uj constraint to enforce the correct
memory types for MVE.  I think the simplest fix is therefore to merge
the loads and stores back into the main pattern and extend the Uj
constraint so that it acts like Um for non-MVE.

The testcase for that patch was mve-vldstr16-no-writeback.c, whose
main function is:

void
fn1 (__fp16 *pSrc)
{
  __fp16 high;
  __fp16 *pDst = 0;
  unsigned i;
  for (i = 0;; i++)
    if (pSrc[i])
      pDst[i] = high;
}

Fixing (2) causes the store part to fail, not because we're using
writeback, but because we decide to use GPRs to store high (which is
uninitialised, and so gets replaced with zero).  This patch therefore
adds some scan-assembler-nots instead.  (I wondered about changing the
testcase to initialise high, but that seemed like a bad idea for
a regression test.)

For (3): MVE seems to be the only thing to use arm_coproc_mem_operand_wb
(and its various interfaces) for 16-bit scalars: the Neon patterns only
use it for 32-bit scalars.

I've added new tests to try the various FPR alternatives of the
move patterns.  The range of offsets that GCC uses for FPR loads
and stores is the intersection of the range allowed for GPRs and
FPRs, so the tests include GPR<->memory tests as well.
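
The intersection point can be illustrated in C. The GPR range used below is an assumption for illustration only (the message does not state it); the FPR range is the fp16 one from (3). An offset usable by every alternative of the move pattern has to pass both checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* ASSUMPTION: halfword GPR loads/stores accept byte offsets in
   [-255, 4095] (Thumb-2 LDRH/STRH immediate forms).  Treat this range as
   illustrative rather than as GCC's exact rule.  */
static bool
gpr_hi_offset_ok (long offset)
{
  return offset >= -255 && offset <= 4095;
}

/* fp16 VLDR/VSTR: a multiple of 2 in [-255*2, 255*2].  */
static bool
fpr_hf_offset_ok (long offset)
{
  return offset % 2 == 0 && labs (offset) <= 255 * 2;
}

/* An offset usable by both register classes lies in the intersection.  */
bool
fp16_move_offset_ok (long offset)
{
  return gpr_hi_offset_ok (offset) && fpr_hf_offset_ok (offset);
}
```

Under these assumed ranges the usable fp16 offsets are the even values in [-254, 510], so both the GPR and the FPR limits can be the one that bites.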

The fp32 and fp64 tests already pass; they're just there for
completeness.

gcc/
* config/arm/arm-protos.h (arm_mve_mode_and_operands_type_check):
Delete.
* config/arm/arm.c (arm_coproc_mem_operand_wb): Use a scale factor
of 2 rather than 4 for 16-bit modes.
(arm_mve_mode_and_operands_type_check): Delete.
* config/arm/constraints.md (Uj): Allow writeback for Neon,
but continue to disallow it for MVE.
* config/arm/arm.md (*arm32_mov): Add !TARGET_HAVE_MVE.
* config/arm/vfp.md (*mov_load_vfp_hf16, *mov_store_vfp_hf16): Fold
back into...
(*mov_vfp_16): ...here but use Uj for the FPR memory
constraints.  Use for base MVE too.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: Allow
the store to use GPRs instead of FPRs.  Add scan-assembler-nots
for writeback.
* gcc.target/arm/armv8_1m-fp16-move-1.c: New test.
* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.


mve_move_backport.patch
Description: mve_move_backport.patch


[PATCH][GCC-10 backport] arm: Fix ICEs in no-literal-pool.c on MVE [PR97251]

2020-10-13 Thread Przemyslaw Wirkus via Gcc-patches
This patch is a backport of PR97251 fix already committed to master.

OK for gcc-10 branch ?

This patch fixes ICEs when compiling
gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool.c with
-mfp16-format=ieee -mfloat-abi=hard -march=armv8.1-m.main+mve
-mpure-code.

The existing conditions in the movsf/movdf expanders (as well as the
no_literal_pool patterns) were too restrictive, requiring
TARGET_HARD_FLOAT instead of TARGET_VFP_BASE, which caused unrecognised
insns when compiling this testcase with integer MVE and -mpure-code.

gcc/:

PR target/97251
* config/arm/arm.md (movsf): Relax TARGET_HARD_FLOAT to
TARGET_VFP_BASE.
(movdf): Likewise.
* config/arm/vfp.md (no_literal_pool_df_immediate): Likewise.
(no_literal_pool_sf_immediate): Likewise.


PR97251_backport.patch
Description: PR97251_backport.patch


Re: Support ofsetted parameters in local modref

2020-10-13 Thread Jan Hubicka
> Hi,
> this patch makes local modref track parameters that are passed through with
> an adjustment (via pointer_plus or addr_expr of mem_ref).  I intended to reuse
> the logic in ipa-prop, but it turns out to be weird:
>   if (TREE_CODE (op1) != ADDR_EXPR)
>     return;
>   op1 = TREE_OPERAND (op1, 0);
>   if (TREE_CODE (TREE_TYPE (op1)) != RECORD_TYPE)
>     return;
>   base = get_ref_base_and_extent_hwi (op1, &offset, &size, &reverse);
>   offset_int mem_offset;
>   if (!base
>       || TREE_CODE (base) != MEM_REF
>       || !mem_ref_offset (base).is_constant (&mem_offset))
>     return;
>   offset += mem_offset.to_short_addr () * BITS_PER_UNIT;
>   ssa = TREE_OPERAND (base, 0);
>   if (TREE_CODE (ssa) != SSA_NAME
>       || !SSA_NAME_IS_DEFAULT_DEF (ssa)
>       || offset < 0)
>     return;
> 
>   /* Dynamic types are changed in constructors and destructors.  */
>   index = ipa_get_param_decl_index (info, SSA_NAME_VAR (ssa));
>   if (index >= 0 && param_type && POINTER_TYPE_P (param_type))
>     ipa_set_ancestor_jf (jfunc, offset,  index,
>                          parm_ref_data_pass_through_p (fbi, index, call, ssa));
> 
> First, it does not handle POINTER_PLUS, which is quite common, e.g. for
> 
> test (int *a)
> {
>   test2 (a+1);
> }
> 
> and it also restricts itself to access paths that start with a RECORD_TYPE.
> This does not seem to make sense, since all ADDR_EXPRs of refs are just
> pointer adjustments.  I think it is a leftover from the time when ancestor
> was meant to track the C++ type hierarchy.
> 
> Next, it also refuses negative offsets for a reason that is not very
> apparent (they occur with multiple inheritance in C++).
> 
> On tramp3d it is also common for the parameter ADDR_EXPR to appear in a
> separate statement rather than in the call itself (i.e. no forward
> propagation is done).
> 
> Finally, it works on bits without an overflow check and uses HOST_WIDE_INT
> instead of poly_int64, which I think is the right datatype for accumulating
> offsets these days.
> 
> So I implemented my own pattern matching, and I think we should switch
> ipa-prop to it incrementally.  It is based on similar logic in
> ao_ref_init_from_ptr_and_size.  I support chained statements since they
> occur in tramp3d after early opts.
> 
> Bootstrapped/regtested x86_64-linux.  Also ltobootstrapped and I am waiting
> for cc1plus stats to finish.  OK?

So stats finished, for record:
Alias oracle query stats:
  refs_may_alias_p: 64156070 disambiguations, 74373175 queries
  ref_maybe_used_by_call_p: 142785 disambiguations, 65060092 queries
  call_may_clobber_ref_p: 22991 disambiguations, 28817 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37222 queries
  nonoverlapping_refs_since_match_p: 19427 disambiguations, 55940 must overlaps, 76158 queries
  aliasing_component_refs_p: 54747 disambiguations, 760197 queries
  TBAA oracle: 23613733 disambiguations 55949704 queries
   16074768 are in alias set 0
   10619115 queries asked about the same object
   125 queries asked about the same alias set
   0 access volatile
   3995285 are dependent in the DAG
   1646678 are aritificially in conflict with void *

Modref stats:
  modref use: 11758 disambiguations, 40510 queries
  modref clobber: 1505226 disambiguations, 1825004 queries
  3894419 tbaa queries (2.133924 per modref query)
  621924 base compares (0.340780 per modref query)

So about 8% increase in clobber disambiguation rate. Use disambiguation
seems unchanged.
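
The modref figures above can be rechecked with a few lines of C. Plugging in the counts shows that about 82.5% of the clobber queries and about 29% of the use queries were disambiguated, and that both "per modref query" ratios match dividing by the 1825004 clobber queries. The struct and function names are illustrative, not GCC's:

```c
#include <assert.h>

/* A "N disambiguations, M queries" pair from the alias oracle dump.  */
struct query_stat
{
  long disambiguations;
  long queries;
};

/* Fraction of queries that were disambiguated.  */
double
disambiguation_rate (struct query_stat s)
{
  assert (s.queries > 0);
  return (double) s.disambiguations / (double) s.queries;
}

/* COUNT divided by the number of queries in S.  */
double
per_query (long count, struct query_stat s)
{
  assert (s.queries > 0);
  return (double) count / (double) s.queries;
}
```

For example, per_query (3894419, clobber) with clobber = {1505226, 1825004} gives about 2.133924, the value printed in the dump.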

Honza

> 
> gcc/ChangeLog:
> 
> 2020-10-13  Jan Hubicka  
> 
>   * ipa-modref.c (merge_call_side_effects): Use
>   unadjusted_ptr_and_unit_offset.
>   * ipa-prop.c (unadjusted_ptr_and_unit_offset): New function.
>   * ipa-prop.h (unadjusted_ptr_and_unit_offset): Declare.
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 4f86b9ccea1..92119eb6fe3 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -531,6 +532,10 @@ merge_call_side_effects (modref_summary *cur_summary,
>for (unsigned i = 0; i < gimple_call_num_args (stmt); i++)
>  {
>tree op = gimple_call_arg (stmt, i);
> +  bool offset_known;
> +  poly_int64 offset;
> +
> +  offset_known = unadjusted_ptr_and_unit_offset (op, &op, &offset);
>if (TREE_CODE (op) == SSA_NAME
> && SSA_NAME_IS_DEFAULT_DEF (op)
> && TREE_CODE (SSA_NAME_VAR (op)) == PARM_DECL)
> @@ -547,15 +552,23 @@ merge_call_side_effects (modref_summary *cur_summary,
> index++;
>   }
> parm_map[i].parm_index = index;
> -   parm_map[i].parm_offset_known = true;
> -   parm_map[i].parm_offset = 0;
> +   parm_map[i].parm_offset_known = offset_known;
> +   parm_map[i].parm_offset = offset;
>   }
>else if (points_to_local_or_readonly_memory_p (op))
>   parm_map[i].parm_index = -2;
>else
>   parm_map[i].parm_index = -1;
>if (dump_file)
> - fprintf (dump_file, " %i", parm_map[i].parm_index);
> +   

Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-10-13 Thread Christophe Lyon via Gcc-patches
On Tue, 29 Sep 2020 at 00:02, Martin Sebor via Gcc-patches
 wrote:
>
> On 9/25/20 11:17 PM, Jason Merrill wrote:
> > On 9/22/20 4:05 PM, Martin Sebor wrote:
> >> The rebased and retested patches are attached.
> >>
> >> On 9/21/20 3:17 PM, Martin Sebor wrote:
> >>> Ping:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html
> >>>
> >>> (I'm working on rebasing the patch on top of the latest trunk, which
> >>> has changed some of the same code, but it'd be helpful to get a
> >>> go-ahead on the substance of the changes.  I don't expect the rebase
> >>> to require any substantive modifications.)
> >>>
> >>> Martin
> >>>
> >>> On 9/14/20 4:01 PM, Martin Sebor wrote:
>  On 9/4/20 11:14 AM, Jason Merrill wrote:
> > On 9/3/20 2:44 PM, Martin Sebor wrote:
> >> On 9/1/20 1:22 PM, Jason Merrill wrote:
> >>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:
>  -Wplacement-new handles array indices and pointer offsets the same:
>  by adjusting them by the size of the element.  That's correct for
>  the latter but wrong for the former, causing false positives when
>  the element size is greater than one.
> 
>  In addition, the warning doesn't even attempt to handle arrays of
>  arrays.  I'm not sure if I forgot or if I simply didn't think of
>  it.
> 
>  The attached patch corrects these oversights by replacing most
>  of the -Wplacement-new code with a call to compute_objsize which
>  handles all this correctly (plus more), and is also better tested.
>  But even compute_objsize has bugs: it trips up while converting
>  wide_int to offset_int for some pointer offset ranges.  Since
>  handling the C++ IL required changes in this area the patch also
>  fixes that.
> 
>  For review purposes, the patch affects just the middle end.
>  The C++ diff pretty much just removes code from the front end.
> >>>
> >>> The C++ changes are OK.
> >>
> >> Thank you for looking at the rest as well.
> >>
> >>>
>  -compute_objsize (tree ptr, int ostype, access_ref *pref,
>  -bitmap *visited, const vr_values *rvals /* =
>  NULL */)
>  +compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap
>  *visited,
>  +const vr_values *rvals)
> >>>
> >>> This reformatting seems unnecessary, and I prefer to keep the
> >>> comment about the default argument.
> >>
> >> This overload doesn't take a default argument.  (There was a stray
> >> declaration of a similar function at the top of the file that had
> >> one.  I've removed it.)
> >
> > Ah, true.
> >
>  -  if (!size || TREE_CODE (size) != INTEGER_CST)
>  -   return false;
> >>>  >...
> >>>
> >>> You change some failure cases in compute_objsize to return
> >>> success with a maximum range, while others continue to return
> >>> failure. This needs commentary about the design rationale.
> >>
> >> This is too much for a comment in the code but the background is
> >> this: compute_objsize initially returned the object size as a
> >> constant.  Recently, I have enhanced it to return a range to improve
> >> warnings for allocated objects.  With that, a failure can be turned
> >> into success by having the function set the range to that of the
> >> largest object.  That should simplify the function's callers and
> >> could even improve the detection of some invalid accesses.  Once this
> >> change is made it might even be possible to change its return type
> >> to void.
> >>
> >> The change that caught your eye is necessary to make the function
> >> a drop-in replacement for the C++ front end code which makes this
> >> same assumption.  Without it, a number of test cases that exercise
> >> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:
> >>
> >>void f (int n)
> >>{
> >>  char a[n];
> >>  new (a - 1) int ();
> >>}
> >>
> >> Changing any of the other places isn't necessary for existing tests
> >> to pass (and I didn't want to introduce too much churn).  But I do
> >> want to change the rest of the function along the same lines at some
> >> point.
> >
> > Please do change the other places to be consistent; better to have
> > more churn than to leave the function half-updated.  That can be a
> > separate patch if you prefer, but let's do it now rather than later.
> 
>  I've made most of these changes in the other patch (also attached).
>  I'm quite happy with the result but it turned out to be a lot more
>  work than either of us expected, mostly due to the amount of testing.
> 
>  I've left a couple of failing cases in place mainly as
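
The distinction at issue in this thread (element indices versus byte offsets, including arrays of arrays) comes down to the arithmetic below. This is a generic C illustration of what a correct size check has to compute, not the compute_objsize code; the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [I][J] in a row-major 2-D array with COLS
   columns of ELEM_SIZE-byte elements.  Indices are scaled by the element
   size; a value that is already a byte offset must not be scaled again.  */
size_t
byte_offset_2d (size_t i, size_t j, size_t cols, size_t elem_size)
{
  assert (elem_size > 0);
  return (i * cols + j) * elem_size;
}

/* Bytes left in an OBJ_SIZE-byte object from BYTE_OFFSET onwards
   (0 if the offset is past the end).  */
size_t
bytes_remaining (size_t obj_size, size_t byte_offset)
{
  return byte_offset > obj_size ? 0 : obj_size - byte_offset;
}
```

For int a[3][4], &a[2][3] is (2*4 + 3) * sizeof (int) bytes into a, leaving exactly one int of space; treating the index 11 as a byte count would understate the offset by a factor of sizeof (int).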

Re: [PATCH v2] PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

2020-10-13 Thread Eric Botcazou
> Do you mind having a review for that?

Sorry for missing the v2 patch; yes, it looks good to me.

-- 
Eric Botcazou




Support ofsetted parameters in local modref

2020-10-13 Thread Jan Hubicka
Hi,
this patch makes local modref track parameters that are passed through with
an adjustment (via pointer_plus or addr_expr of mem_ref).  I intended to reuse
the logic in ipa-prop, but it turns out to be weird:
  if (TREE_CODE (op1) != ADDR_EXPR)
    return;
  op1 = TREE_OPERAND (op1, 0);
  if (TREE_CODE (TREE_TYPE (op1)) != RECORD_TYPE)
    return;
  base = get_ref_base_and_extent_hwi (op1, &offset, &size, &reverse);
  offset_int mem_offset;
  if (!base
      || TREE_CODE (base) != MEM_REF
      || !mem_ref_offset (base).is_constant (&mem_offset))
    return;
  offset += mem_offset.to_short_addr () * BITS_PER_UNIT;
  ssa = TREE_OPERAND (base, 0);
  if (TREE_CODE (ssa) != SSA_NAME
      || !SSA_NAME_IS_DEFAULT_DEF (ssa)
      || offset < 0)
    return;

  /* Dynamic types are changed in constructors and destructors.  */
  index = ipa_get_param_decl_index (info, SSA_NAME_VAR (ssa));
  if (index >= 0 && param_type && POINTER_TYPE_P (param_type))
    ipa_set_ancestor_jf (jfunc, offset,  index,
                         parm_ref_data_pass_through_p (fbi, index, call, ssa));

First, it does not handle POINTER_PLUS, which is quite common, e.g. for

test (int *a)
{
  test2 (a+1);
}

and it also restricts itself to access paths that start with a RECORD_TYPE.
This does not seem to make sense, since all ADDR_EXPRs of refs are just
pointer adjustments.  I think it is a leftover from the time when ancestor
was meant to track the C++ type hierarchy.

Next, it also refuses negative offsets for a reason that is not very
apparent (they occur with multiple inheritance in C++).

On tramp3d it is also common for the parameter ADDR_EXPR to appear in a
separate statement rather than in the call itself (i.e. no forward
propagation is done).

Finally, it works on bits without an overflow check and uses HOST_WIDE_INT
instead of poly_int64, which I think is the right datatype for accumulating
offsets these days.

So I implemented my own pattern matching, and I think we should switch
ipa-prop to it incrementally.  It is based on similar logic in
ao_ref_init_from_ptr_and_size.  I support chained statements since they
occur in tramp3d after early opts.
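
A toy C model of the kind of walk described above: follow pointer adjustments back to the unadjusted pointer while summing constant byte offsets, allowing negative offsets and chains spread over several statements. The node representation and function name are hypothetical; the real unadjusted_ptr_and_unit_offset works on GIMPLE trees and accumulates a poly_int64:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the forms mentioned above (hypothetical).  */
enum node_kind
{
  PARM,              /* Default definition of a pointer parameter.  */
  POINTER_PLUS_CST,  /* BASE p+ constant byte offset.  */
  POINTER_PLUS_VAR,  /* BASE p+ non-constant offset.  */
  ADDR_OF_MEMREF     /* &MEM[BASE + constant byte offset].  */
};

struct node
{
  enum node_kind kind;
  long offset;              /* Used by the constant-offset kinds.  */
  const struct node *base;  /* Pointer being adjusted; NULL for PARM.  */
};

/* Follow adjustments from OP back to the unadjusted pointer, storing it
   in *RET.  Constant byte offsets (possibly negative) are summed into
   *OFFSET_RET.  Returns false if any step had a non-constant offset.  */
bool
unadjusted_ptr_and_offset (const struct node *op, const struct node **ret,
                           long *offset_ret)
{
  long offset = 0;
  bool known = true;

  assert (op != NULL);
  while (op->kind != PARM)
    {
      if (op->kind == POINTER_PLUS_VAR)
        known = false;
      else
        offset += op->offset;
      op = op->base;
    }
  *ret = op;
  *offset_ret = offset;
  return known;
}
```

Each node stands for a separate SSA definition, which is what lets the walk see through adjustments made in earlier statements, as in the tramp3d case. A real implementation must also stop at forms it does not recognize and guard against overflow.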

Bootstrapped/regtested x86_64-linux.  Also ltobootstrapped and I am waiting
for cc1plus stats to finish.  OK?

gcc/ChangeLog:

2020-10-13  Jan Hubicka  

* ipa-modref.c (merge_call_side_effects): Use
unadjusted_ptr_and_unit_offset.
* ipa-prop.c (unadjusted_ptr_and_unit_offset): New function.
* ipa-prop.h (unadjusted_ptr_and_unit_offset): Declare.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 4f86b9ccea1..92119eb6fe3 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -531,6 +532,10 @@ merge_call_side_effects (modref_summary *cur_summary,
   for (unsigned i = 0; i < gimple_call_num_args (stmt); i++)
 {
   tree op = gimple_call_arg (stmt, i);
+  bool offset_known;
+  poly_int64 offset;
+
+  offset_known = unadjusted_ptr_and_unit_offset (op, &op, &offset);
   if (TREE_CODE (op) == SSA_NAME
  && SSA_NAME_IS_DEFAULT_DEF (op)
  && TREE_CODE (SSA_NAME_VAR (op)) == PARM_DECL)
@@ -547,15 +552,23 @@ merge_call_side_effects (modref_summary *cur_summary,
  index++;
}
  parm_map[i].parm_index = index;
- parm_map[i].parm_offset_known = true;
- parm_map[i].parm_offset = 0;
+ parm_map[i].parm_offset_known = offset_known;
+ parm_map[i].parm_offset = offset;
}
   else if (points_to_local_or_readonly_memory_p (op))
parm_map[i].parm_index = -2;
   else
parm_map[i].parm_index = -1;
   if (dump_file)
-   fprintf (dump_file, " %i", parm_map[i].parm_index);
+   {
+ fprintf (dump_file, " %i", parm_map[i].parm_index);
+ if (parm_map[i].parm_offset_known)
+   {
+ fprintf (dump_file, " offset:");
+ print_dec ((poly_int64_pod)parm_map[i].parm_offset,
+dump_file, SIGNED);
+   }
+   }
 }
   if (dump_file)
 fprintf (dump_file, "\n");
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 2d09d913051..fe317f56e02 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1222,6 +1222,72 @@ load_from_unmodified_param_or_agg (struct 
ipa_func_body_info *fbi,
   return index;
 }
 
+/* Walk pointer adjustemnts from OP (such as POINTER_PLUS and ADDR_EXPR)
+   to find original pointer.  Initialize RET to the pointer which results from
+   the walk.
+   If offset is known return true and initialize OFFSET_RET.  */
+
+bool
+unadjusted_ptr_and_unit_offset (tree op, tree *ret, poly_int64 *offset_ret)
+{
+  poly_int64 offset = 0;
+  bool offset_known = true;
+
+  while (true)
+{
+  if (TREE_CODE (op) == ADDR_EXPR)
+   {
+ poly_int64 extra_offset = 0;
+ tree base = get_addr_base_and_unit_offset (TREE_OPERAND (op, 0),
+&offset);
+ if (!base)
+   {
+ base = get_base_address (TREE_OPERAND (op, 0));
+   

Re: [PATCH] MIPS/libphobos: Fix switchcontext.S assembly for MIPS I ISA

2020-10-13 Thread Richard Biener
On Mon, 12 Oct 2020, Maciej W. Rozycki wrote:

> On Thu, 8 Oct 2020, Iain Buclaw wrote:
> 
> > >  Noticed in a build of a MIPS I toolchain.  I have no way to run MIPS 
> > > regression-testing right now, however in `libopcodes' the L.D and S.D 
> > > instructions are strict aliases valid for the MIPS II and higher ISAs, 
> > > and 
> > > just to double-check that I have built MIPS32r2 GCC with and without the 
> > > change applied and verified with `objdump' that the respective target 
> > > objects produced are identical.
> > > 
> > >  OK to apply to trunk, and -- as a fatal compilation error -- to backport 
> > > to active release branches?
> > > 
> > 
> > Fine with me, thanks.
> 
>  Applied to trunk, thanks.
> 
>  Jakub, Richard: I should have cc-ed you for the backports to GCC 8/9/10.  
> OK to backport as a fatal build failure fix, or shall we leave this as it 
> stands?  FAOD the L.D and S.D assembly instructions have been supported in 
> binutils as long as the MIPS port has, i.e. from:
> 
> commit 45b1470513cfef2af6fd5532d33a54a840b4600a
> Author: Ian Lance Taylor 
> Date:   Wed Aug 18 19:40:37 1993 +

Sure.

Richard.


Re: [PATCH v2] PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

2020-10-13 Thread Kito Cheng
ping^2

Hi Eric:

Do you mind having a review for that?

thanks :)

On Mon, Oct 5, 2020 at 5:24 PM Kito Cheng  wrote:

> ping.
>
>
> On Fri, Sep 25, 2020 at 2:33 PM Richard Biener  wrote:
>
>> On Fri, 25 Sep 2020, Kito Cheng wrote:
>>
>> > In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, misaligned stores to a
>> > DECL/global variable were handled, but PARALLEL values were not.  Other
>> > parts of this function handle a PARALLEL with the emit_group_* functions,
>> > so I added a check that uses emit_group_store when storing a PARALLEL
>> > value.  I also checked that this change does not break the testcase
>> > (gcc.target/arm/unaligned-argument-3.c) added by the original change.
>> >
>> > For the riscv64 target, struct S {int a; double b;} is returned packed
>> > into a PARALLEL value, which has TImode when misaligned access is
>> > supported.  TImode requires 16-byte alignment, but the struct is only
>> > 8-byte aligned, so the store goes to the misaligned-store handling,
>> > which then tries to generate a move instruction from a PARALLEL value.
>> >
>> > Tested on the following targets with no new regressions:
>> >   - riscv32/riscv64 elf
>> >   - x86_64-linux
>> >   - arm-eabi
>>
>> OK if Eric says so.
>>
>> Thanks,
>> Richard.
>>
>> > v2 changes:
>> >   - Use maybe_emit_group_store instead of emit_group_store.
>> >   - Remove push_temp_slots/pop_temp_slots: emit_group_store only requires
>> > a stack temp slot when dst is a CONCAT or PARALLEL, whereas
>> > maybe_emit_group_store always uses a REG for dst when one is needed.
>> >
>> > gcc/ChangeLog:
>> >
>> >   PR target/96759
>> >   * expr.c (expand_assignment): Handle misaligned stores with
>> >   PARALLEL value.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >   PR target/96759
>> >   * g++.target/riscv/pr96759.C: New.
>> >   * gcc.target/riscv/pr96759.c: New.
>> > ---
>> >  gcc/expr.c   |  2 ++
>> >  gcc/testsuite/g++.target/riscv/pr96759.C |  8 
>> >  gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +
>> >  3 files changed, 23 insertions(+)
>> >  create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C
>> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c
>> >
>> > diff --git a/gcc/expr.c b/gcc/expr.c
>> > index 1a15f24b3979..6eb13a12c8c5 100644
>> > --- a/gcc/expr.c
>> > +++ b/gcc/expr.c
>> > @@ -5168,6 +5168,8 @@ expand_assignment (tree to, tree from, bool nontemporal)
>> >rtx reg, mem;
>> >
>> >reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>> > +  /* Handle PARALLEL.  */
>> > +  reg = maybe_emit_group_store (reg, TREE_TYPE (from));
>> >reg = force_not_mem (reg);
>> >mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> >if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
>> > diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C b/gcc/testsuite/g++.target/riscv/pr96759.C
>> > new file mode 100644
>> > index ..673999a4baf7
>> > --- /dev/null
>> > +++ b/gcc/testsuite/g++.target/riscv/pr96759.C
>> > @@ -0,0 +1,8 @@
>> > +/* { dg-options "-mno-strict-align -std=gnu++17" } */
>> > +/* { dg-do compile } */
>> > +struct S {
>> > +  int a;
>> > +  double b;
>> > +};
>> > +S GetNumbers();
>> > +auto [globalC, globalD] = GetNumbers();
>> > diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c b/gcc/testsuite/gcc.target/riscv/pr96759.c
>> > new file mode 100644
>> > index ..621c39196fca
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/riscv/pr96759.c
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-options "-mno-strict-align" } */
>> > +/* { dg-do compile } */
>> > +
>> > +struct S {
>> > +  int a;
>> > +  double b;
>> > +};
>> > +struct S GetNumbers();
>> > +struct S g;
>> > +
>> > +void foo(){
>> > +  g = GetNumbers();
>> > +}
>> >
>>
>> --
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>> Germany; GF: Felix Imend
>>
>
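
To make the alignment mismatch in the quoted PR 96759 report concrete: on an LP64 target such as riscv64 (x86_64 and aarch64 lay it out the same way), `struct S` occupies 16 bytes, the size of a TImode value, but only requires 8-byte alignment. A minimal C check, assuming such a target; the variable names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* The aggregate used by the PR 96759 test cases.  */
struct S
{
  int a;
  double b;
};

/* On LP64 targets, 4 bytes of padding follow 'a' so that 'b' is 8-byte
   aligned: the struct is 16 bytes (TImode-sized) but only 8-byte aligned,
   so a 16-byte-aligned TImode access cannot be assumed.  */
static const size_t s_size = sizeof (struct S);
static const size_t s_align = _Alignof (struct S);
static const size_t s_b_offset = offsetof (struct S, b);
```

On a 32-bit target such as i386 the numbers differ, which is why the check is stated for LP64 only.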


Re: [PATCH] PR target/96307: Fix KASAN option checking.

2020-10-13 Thread Kito Cheng
ping

On Mon, Oct 5, 2020 at 5:49 PM Kito Cheng  wrote:

>  - Disable KASAN if the target is unsupported and -fasan-shadow-offset= is
>    not given, regardless of whether `--param asan-stack=1` is given.
>
>  - Move the KASAN option-checking testcases to gcc.dg; they are also useful
>    for other targets that do not support ASan.
>
>  - Verified on riscv and x86.
>
> gcc/ChangeLog:
>
> PR target/96307
> * toplev.c (process_options): Remove param_asan_stack checking for
> kasan option checking.
>
> gcc/testsuite/ChangeLog:
>
> PR target/96307
> * gcc.dg/pr96307.c: New.
> * gcc.target/riscv/pr96260.c: Move this test case from here to ...
> * gcc.dg/pr96260.c: ... here.
> * gcc.target/riscv/pr91441.c: Move this test case from here to ...
> * gcc.dg/pr91441.c: ... here.
> * lib/target-supports.exp
> (check_effective_target_no_fsanitize_address): New proc.
> ---
>  .../{gcc.target/riscv => gcc.dg}/pr91441.c|  1 +
>  .../{gcc.target/riscv => gcc.dg}/pr96260.c|  1 +
>  gcc/testsuite/gcc.dg/pr96307.c| 25 +++
>  gcc/testsuite/lib/target-supports.exp | 11 
>  gcc/toplev.c  |  1 -
>  5 files changed, 38 insertions(+), 1 deletion(-)
>  rename gcc/testsuite/{gcc.target/riscv => gcc.dg}/pr91441.c (85%)
>  rename gcc/testsuite/{gcc.target/riscv => gcc.dg}/pr96260.c (77%)
>  create mode 100644 gcc/testsuite/gcc.dg/pr96307.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr91441.c b/gcc/testsuite/gcc.dg/pr91441.c
> similarity index 85%
> rename from gcc/testsuite/gcc.target/riscv/pr91441.c
> rename to gcc/testsuite/gcc.dg/pr91441.c
> index b55df5e7f00c..4f7a8fbec5e9 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr91441.c
> +++ b/gcc/testsuite/gcc.dg/pr91441.c
> @@ -1,5 +1,6 @@
>  /* PR target/91441 */
>  /* { dg-do compile  } */
> +/* { dg-require-effective-target no_fsanitize_address }*/
>  /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address" } */
>
>  int *bar(int *);
> diff --git a/gcc/testsuite/gcc.target/riscv/pr96260.c b/gcc/testsuite/gcc.dg/pr96260.c
> similarity index 77%
> rename from gcc/testsuite/gcc.target/riscv/pr96260.c
> rename to gcc/testsuite/gcc.dg/pr96260.c
> index 229997f877b7..734832f021e3 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr96260.c
> +++ b/gcc/testsuite/gcc.dg/pr96260.c
> @@ -1,5 +1,6 @@
>  /* PR target/96260 */
>  /* { dg-do compile } */
> +/* { dg-require-effective-target no_fsanitize_address }*/
> /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address -fasan-shadow-offset=0x10" } */
>
>  int *bar(int *);
> diff --git a/gcc/testsuite/gcc.dg/pr96307.c b/gcc/testsuite/gcc.dg/pr96307.c
> new file mode 100644
> index ..cd1c17c9661b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr96307.c
> @@ -0,0 +1,25 @@
> +/* PR target/96307 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target no_fsanitize_address }*/
> +/* { dg-additional-options "-fsanitize=kernel-address --param=asan-instrumentation-with-call-threshold=8" } */
> +
> +#include <limits.h>
> +enum a {test1, test2, test3=INT_MAX};
> +enum a a;
> +enum a *b;
> +
> +void reset (void);
> +
> +void
> +t()
> +{
> +  if (a != test2)
> +__builtin_abort ();
> +  if (*b != test2)
> +__builtin_abort ();
> +  reset ();
> +  if (a != test1)
> +__builtin_abort ();
> +  if (*b != test1)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 8314e443c437..e80b71a2110c 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -10552,3 +10552,14 @@ proc check_effective_target_ident_directive {} {
> int i;
>  }]
>  }
> +
> +# Return 1 if the target does not support address sanitizer, 0 otherwise.
> +
> +proc check_effective_target_no_fsanitize_address {} {
> +if ![check_no_compiler_messages fsanitize_address executable {
> +   int main (void) { return 0; }
> +}] {
> +   return 1;
> +}
> +return 0;
> +}
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index a4cb8bb262ed..540e131d963d 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1842,7 +1842,6 @@ process_options (void)
>
>if ((flag_sanitize & SANITIZE_KERNEL_ADDRESS)
>&& (targetm.asan_shadow_offset == NULL
> - && param_asan_stack
>   && !asan_shadow_offset_set_p ()))
>  {
>warning_at (UNKNOWN_LOCATION, 0,
> --
> 2.28.0
>
>
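
The `toplev.c` hunk in the patch above only drops `param_asan_stack` from the condition. A hypothetical boolean model of the condition before and after the change (each parameter stands in for the real check named in its comment; the function names are invented) makes the behavioural difference easy to see:

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: the "KASAN unsupported" branch was taken only when
   --param asan-stack was also nonzero.  */
static bool
kasan_disabled_before (bool kasan_requested,     /* flag_sanitize & SANITIZE_KERNEL_ADDRESS */
                       bool target_has_offset,   /* targetm.asan_shadow_offset != NULL */
                       bool asan_stack,          /* param_asan_stack */
                       bool offset_option_given) /* asan_shadow_offset_set_p () */
{
  return kasan_requested && !target_has_offset && asan_stack
         && !offset_option_given;
}

/* After the patch: asan-stack no longer matters.  */
static bool
kasan_disabled_after (bool kasan_requested, bool target_has_offset,
                      bool offset_option_given)
{
  return kasan_requested && !target_has_offset && !offset_option_given;
}
```

With the old condition, `-fsanitize=kernel-address` without `--param asan-stack=1` on a target lacking a shadow offset slipped past the check; with the new one, KASAN is disabled whenever the target has no shadow offset and `-fasan-shadow-offset=` is absent.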


[Patch] x86: Enable GCC support for Intel Hreset extension

2020-10-13 Thread Hongyu Wang via Gcc-patches
Hi:

This patch adds support for the Intel HRESET instruction.

HRESET provides a hint to the processor to selectively reset the prediction
history of the current logical processor.

For more details, please refer to
https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrap ok, regression test on i386/x86 backend is ok.

OK for master?

gcc/

* common/config/i386/cpuinfo.h (get_available_features):
Detect HRESET.
* common/config/i386/i386-common.c (OPTION_MASK_ISA2_HRESET_SET,
OPTION_MASK_ISA2_HRESET_UNSET): New macros.
(ix86_handle_option): Handle -mhreset.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_HRESET.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
for hreset.
* config.gcc: Add hresetintrin.h
* config/i386/hresetintrin.h: New header file.
* config/i386/x86gprintrin.h: Include hresetintrin.h.
* config/i386/cpuid.h (bit_HRESET): New.
* config/i386/i386-builtin.def: Add new builtin.
* config/i386/i386-expand.c (ix86_expand_builtin):
Handle new builtin.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__HRESET__.
* config/i386/i386-options.c (isa2_opts): Add -mhreset.
(ix86_valid_target_attribute_inner_p): Handle hreset.
* config/i386/i386.h (TARGET_HRESET, TARGET_HRESET_P,
PTA_HRESET): New.
(PTA_ALDERLAKE): Add PTA_HRESET.
* config/i386/i386.opt: Add option -mhreset.
* config/i386/i386.md (UNSPECV_HRESET): New unspec.
(hreset): New define_insn.
* doc/invoke.texi: Document -mhreset.
* doc/extend.texi: Document hreset.

gcc/testsuite/

* gcc.target/i386/hreset-1.c: New test.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Update -mhreset.
* gcc.target/i386/sse-13.c: Likewise.
* gcc.target/i386/sse-14.c: Likewise.
* gcc.target/i386/sse-22.c: Likewise.
* gcc.target/i386/sse-23.c: Likewise.
* g++.dg/other/i386-2.C: Likewise.
* g++.dg/other/i386-3.C: Likewise.
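
A minimal sketch of how user code might call the new intrinsic only when it is enabled, keyed on the `__HRESET__` macro that this patch defines under `-mhreset`. It assumes the intrinsic is `_hreset (unsigned int)` from `hresetintrin.h`, reached through `x86gprintrin.h`; `maybe_hreset` is a hypothetical name, and a runtime CPUID check (for example against `bit_HRESET` from `cpuid.h`) is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>

#ifdef __HRESET__
#include <x86gprintrin.h>  /* includes hresetintrin.h with this patch */
#endif

/* Hypothetical wrapper: request a reset of the prediction-history
   components selected by MASK when the translation unit is built with
   -mhreset, and report whether the instruction was emitted at all.  */
static bool
maybe_hreset (unsigned int mask)
{
#ifdef __HRESET__
  _hreset (mask);
  return true;
#else
  (void) mask;  /* HRESET not enabled for this build */
  return false;
#endif
}
```

Built without `-mhreset`, the wrapper compiles on any target and simply reports that no reset was requested.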

-- 
Regards,

Hongyu, Wang
From 9dbb6bfb28431cd52149e12cc5f359be7fb46c64 Mon Sep 17 00:00:00 2001
From: Hongyu Wang 
Date: Tue, 7 Apr 2020 18:39:53 +
Subject: [PATCH] Enable Intel HRESET Instruction

gcc/

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect HRESET.
	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_HRESET_SET,
	OPTION_MASK_ISA2_HRESET_UNSET): New macros.
	(ix86_handle_option): Handle -mhreset.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_HRESET.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for hreset.
	* config.gcc: Add hresetintrin.h
	* config/i386/hresetintrin.h: New header file.
	* config/i386/x86gprintrin.h: Include hresetintrin.h.
	* config/i386/cpuid.h (bit_HRESET): New.
	* config/i386/i386-builtin.def: Add new builtin.
	* config/i386/i386-expand.c (ix86_expand_builtin):
	Handle new builtin.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__HRESET__.
	* config/i386/i386-options.c (isa2_opts): Add -mhreset.
	(ix86_valid_target_attribute_inner_p): Handle hreset.
	* config/i386/i386.h (TARGET_HRESET, TARGET_HRESET_P,
	PTA_HRESET): New.
	(PTA_ALDERLAKE): Add PTA_HRESET.
	* config/i386/i386.opt: Add option -mhreset.
	* config/i386/i386.md (UNSPECV_HRESET): New unspec.
	(hreset): New define_insn.
	* doc/invoke.texi: Document -mhreset.
	* doc/extend.texi: Document hreset.

gcc/testsuite/

	* gcc.target/i386/hreset-1.c: New test.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/sse-12.c: Update -mhreset.
	* gcc.target/i386/sse-13.c: Likewise.
	* gcc.target/i386/sse-14.c: Likewise.
	* gcc.target/i386/sse-22.c: Likewise.
	* gcc.target/i386/sse-23.c: Likewise.
	* g++.dg/other/i386-2.C: Likewise.
	* g++.dg/other/i386-3.C: Likewise.
---
 gcc/common/config/i386/cpuinfo.h  |  3 ++
 gcc/common/config/i386/i386-common.c  | 15 ++
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h|  1 +
 gcc/config.gcc|  4 +-
 gcc/config/i386/cpuid.h   |  1 +
 gcc/config/i386/hresetintrin.h| 53 +++
 gcc/config/i386/i386-builtin.def  |  3 ++
 gcc/config/i386/i386-c.c  |  3 +-
 gcc/config/i386/i386-expand.c |  8 +++
 gcc/config/i386/i386-options.c|  4 +-
 gcc/config/i386/i386.h|  5 +-
 gcc/config/i386/i386.md   | 11 
 gcc/config/i386/i386.opt  |  4 ++
 gcc/config/i386/x86gprintrin.h|  2 +
 gcc/doc/extend.texi   |  5 ++
 gcc/doc/invoke.texi   |  9 ++--
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 +
 gcc/testsuite/gcc.target/i386/hreset-1.c  | 11 
 19 files chan

[PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.

2020-10-13 Thread Hongtao Liu via Gcc-patches
Hi:
  For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
   (parallel [(const_int 0) (const_int 1)]))
 it can be simplified to inner.

  Bootstrap is ok, regression test on i386 backend is ok.

gcc/ChangeLog
PR rtl-optimization/97249
* simplify-rtx.c (simplify_binary_operation_1): Simplify
vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

* gcc.target/i386/pr97249-1.c: New test.

-- 
BR,
Hongtao
From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 13 Oct 2020 15:35:29 +0800
Subject: [PATCH] Simplify vec_select of paradoxical subreg.

For rtx like
  (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
		   (parallel [(const_int 0) (const_int 1)]))
it can be simplified to inner.

gcc/ChangeLog
	PR rtl-optimization/97249
	* simplify-rtx.c (simplify_binary_operation_1): Simplify
	vec_select of paradoxical subreg.

gcc/testsuite/ChangeLog

	* gcc.target/i386/pr97249-1.c: New test.
---
 gcc/simplify-rtx.c| 27 
 gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..9c397157f28 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 		return subop1;
 		}
 	}
+
+	  /* For cases like
+	 (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
+			  (parallel [(const_int 0) (const_int 1)])).
+	 return inner directly.  */
+	  if (GET_CODE (trueop0) == SUBREG
+	  && paradoxical_subreg_p (trueop0)
+	  && mode == GET_MODE (XEXP (trueop0, 0))
+	  && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+	  && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+	  && l0 % l1 == 0)
+	{
+	  gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+	  unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
+	  unsigned HOST_WIDE_INT sel = 0;
+	  int i = 0;
+	  for (;i != l1; i++)
+		{
+		  rtx j = XVECEXP (trueop1, 0, i);
+		  if (!CONST_INT_P (j))
+		break;
+		  sel |= HOST_WIDE_INT_1U << UINTVAL (j);
+		}
+	  /* ??? Need to simplify XEXP (trueop0, 0) here.  */
+	  if (sel == expect)
+		return XEXP (trueop0, 0);
+	}
 	}
 
   if (XVECLEN (trueop1, 0) == 1
diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c
new file mode 100644
index 000..bc34aa8baa6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c
@@ -0,0 +1,30 @@
+/* PR target/97249  */
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3 -masm=att" } */
+/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */
+
+void
+foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
+{
+for (int i = 0 ; i != 8; i++)
+ p3[i] = p1[i] + p2[i];
+ return;
+}
+
+void
+foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3)
+{
+for (int i = 0 ; i != 4; i++)
+ p3[i] = p1[i] + p2[i];
+ return;
+}
+
+void
+foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3)
+{
+for (int i = 0 ; i != 2; i++)
+  p3[i] = (long long)p1[i] + (long long)p2[i];
+ return;
+}
-- 
2.18.1
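
The selector test that decides whether the vec_select just recovers `inner` can be sketched on plain arrays instead of RTL. This is a hypothetical helper, not GCC code: `sel` holds the lane indices from the `parallel`, `outer_lanes` and `inner_lanes` correspond to `l0` and `l1` in the patch. Note that recording only which indices occur (for example in a bitmask) would also accept a reordering such as `(parallel [(const_int 1) (const_int 0)])`; comparing each index with its position rules that out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if SEL (N lane indices) selects lanes 0..INNER_LANES-1 of a
   paradoxical subreg with OUTER_LANES lanes, in order, i.e. the vec_select
   yields exactly the INNER_LANES-lane inner value.  */
static bool
selects_inner_lowpart (const long *sel, size_t n,
                       size_t outer_lanes, size_t inner_lanes)
{
  if (inner_lanes == 0 || outer_lanes % inner_lanes != 0 || n != inner_lanes)
    return false;
  for (size_t i = 0; i < n; i++)
    if (sel[i] != (long) i)  /* position-by-position comparison */
      return false;
  return true;
}
```

For the V2SI-in-V4SI case from the commit message, `{0, 1}` qualifies, while `{1, 0}` (a lane swap) and `{2, 3}` (the high half) do not.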


[arm-perf-staging branch] Add support for -fno-alias

2020-10-13 Thread Tamar Christina via Gcc-patches
Hi,

I am sending some old patches that we have had internally since GCC 10 to the Arm
branch. Feel free to comment, as we will be looking to submit them to mainline
for GCC 12.

This patch adds the option '-fno-alias'. The option makes the compiler treat
any pointer passed as a parameter (other than pointers to pointers or to
functions) as if it had the keyword restrict.
This option makes it easier to check whether using restrict gives a performance
boost, without having to change the sources.
Of course, this option can only be used if you know for a fact that no pointer
parameters alias; otherwise, just as with the restrict keyword, things can go
very wrong.

When the option is passed, the patch creates a qualified pointer type for any
pointer type encountered as a parameter.
This qualified pointer type has the TYPE_QUAL_RESTRICT bit set, just as if
we had parsed a 'restrict' keyword.
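
At the source level, the intended effect of `-fno-alias` on a declaration can be pictured as follows. The function names are hypothetical; the second definition spells out the `restrict` qualifiers that the option is described as adding implicitly to each pointer parameter:

```c
#include <assert.h>

/* Without restrict, the compiler must assume DST may overlap A or B.  */
static void
add_arrays (int *dst, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}

/* The same function with the qualifiers -fno-alias would add implicitly.
   Calling it with DST overlapping A or B is undefined behaviour, which is
   why the option is only safe when no pointer parameters alias.  */
static void
add_arrays_restrict (int *restrict dst, const int *restrict a,
                     const int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}

/* Per the grokdeclarator condition in the patch, parameters such as
   'int **pp' or 'void (*fn) (void)' would not be restrict-qualified.  */
```

For non-overlapping arguments both versions compute the same result; the qualifiers only give the optimizer more freedom.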

Bootstrapped on aarch64-none-linux-gnu to make sure I didn't break anything
obvious in the build.

Is this OK for trunk?

gcc/ChangeLog:

2020-xx-xx  Andre Vieira  

* common.opt (fno-alias): New option.
* c/c-decl.c (grokdeclarator): When flag_no_alias make all
pointers passed as parameters restrict.
* cp/decl.c (grokdeclarator): Likewise.
* doc/invoke.texi (fno-alias): Document new option.

gcc/testsuite/ChangeLog:

2020-xx-xx  Andre Vieira  

* gcc.dg/vect/vect-no-alias.c: New test.
* gcc.dg/vect/noalias.h: New include file used in test.
* g++.dg/vect/simd-no-alias.cc: New test.

-- 
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index c819fd0d0d54b7147c782850f28e861b2f6b0349..f101ea090f3fd2254a1c9c891644a53fe11f75c0 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -7160,8 +7160,18 @@ grokdeclarator (const struct c_declarator *declarator,
 	type = c_build_pointer_type (type);
 	type_quals = TYPE_UNQUALIFIED;
 	  }
-	else if (type_quals)
-	  type = c_build_qualified_type (type, type_quals);
+	else
+	  {
+	/* If -fno-alias then make all pointers used as parameters
+	   restrict pointers.  */
+	if (flag_no_alias && POINTER_TYPE_P (type)
+		&& !POINTER_TYPE_P (TREE_TYPE (type))
+		&& TREE_CODE (TREE_TYPE (type)) != FUNCTION_TYPE)
+	  type_quals |= TYPE_QUAL_RESTRICT;
+
+	if (type_quals)
+	  type = c_build_qualified_type (type, type_quals);
+	  }
 
 	decl = build_decl (declarator->id_loc,
 			   PARM_DECL, declarator->u.id.id, type);
diff --git a/gcc/common.opt b/gcc/common.opt
index fa9da505fc2766794e731312ef6394f75a940d82..02882ec633a74b3014426ffa8da1485c373415a4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3392,4 +3392,7 @@ fipa-ra
 Common Report Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fno-alias
+Common RejectNegative Var(flag_no_alias) Optimization
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index e3f4b435a4905d38e66215a8343f96429af834a0..0c843b29cfe3f69a2b544f66887087659edb1d32 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -13035,6 +13035,16 @@ grokdeclarator (const cp_declarator *declarator,
 
 if (decl_context == PARM)
   {
+	/* If -fno-alias then make all pointers used as parameters
+	   restrict pointers.  */
+	if (flag_no_alias && POINTER_TYPE_P (type)
+	&& !POINTER_TYPE_P (TREE_TYPE (type))
+	&& TREE_CODE (TREE_TYPE (type)) != FUNCTION_TYPE)
+	  {
+	type = cp_build_qualified_type (type,
+	TYPE_QUALS (type)
+	| TYPE_QUAL_RESTRICT);
+	  }
 	decl = cp_build_parm_decl (NULL_TREE, unqualified_id, type);
 	DECL_ARRAY_PARAMETER_P (decl) = array_parameter_p;
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 54375ebd6799447bf468a56b03d044ac997fb842..44cb584f45f08c8f0a19d7370fc43d44641ba52c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -496,7 +496,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse @gol
 -fno-guess-branch-probability  -fno-inline  -fno-math-errno  -fno-peephole @gol
 -fno-peephole2  -fno-printf-return-value  -fno-sched-interblock @gol
--fno-sched-spec  -fno-signed-zeros @gol
+-fno-sched-spec  -fno-signed-zeros -fno-alias @gol
 -fno-toplevel-reorder  -fno-trapping-math  -fno-zero-initialized-in-bss @gol
 -fomit-frame-pointer  -foptimize-sibling-calls @gol
 -fpartial-inlining  -fpeel-loops  -fpredictive-commoning @gol
@@ -10887,6 +10887,14 @@ example, an @code{unsigned int} can alias an @code{int}, but not a
 @code{void*} or a @code{double}.  A character type may alias any other
 type.
 
+@item -fno-alias
+@opindex fno-alias
+This option is equivalent to adding the @code{restrict} keyword to all pointers
+used as parameters.  This option can be used to assess the performance impact
+of adding restrict to your pointers without having to make code changes.  Be
+aware that the compiler does not actually perform any check to the validity of
+adding restrict and if pointers do alias correc

[Patch] x86: Enable support for Intel UINTR extension

2020-10-13 Thread Hongyu Wang via Gcc-patches
Hi:

This patch adds support for the User Interrupt (UINTR) instructions.

This feature defines user interrupts as new events in the architecture.
They are delivered to software operating in 64-bit mode with CPL = 3
without any change to segmentation state.

For more details, please refer to
https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrap ok, regression test on i386/x86 backend is ok.

OK for master?

gcc/
* common/config/i386/cpuinfo.h (get_available_features):
Detect UINTR.
* common/config/i386/i386-common.c (OPTION_MASK_ISA2_UINTR_SET,
OPTION_MASK_ISA2_UINTR_UNSET): New.
(ix86_handle_option): Handle -muintr.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_UINTR.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
for uintr.
* config.gcc: Add uintrintrin.h to extra_headers.
* config/i386/uintrintrin.h: New.
* config/i386/cpuid.h (bit_UINTR): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect UINTR.
* config/i386/i386-builtin-types.def: Add new types.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins): Add
__builtin_ia32_testui.
* config/i386/i386-builtins.h (ix86_builtins): Add
IX86_BUILTIN_TESTUI.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__UINTR__.
* config/i386/i386-expand.c (ix86_expand_special_args_builtin):
Handle UINT8_FTYPE_VOID.
(ix86_expand_builtin): Handle IX86_BUILTIN_TESTUI.
* config/i386/i386-options.c (isa2_opts): Add -muintr.
(ix86_valid_target_attribute_inner_p): Handle UINTR.
(ix86_option_override_internal): Add TARGET_64BIT check for UINTR.
* config/i386/i386.h (TARGET_UINTR, TARGET_UINTR_P, PTA_UINTR): New.
(PTA_SAPPHIRERAPIDS): Add PTA_UINTR.
* config/i386/i386.opt: Add -muintr.
* config/i386/i386.md
(define_int_iterator UINTR_UNSPECV): New.
(define_int_attr uintr_unspecv): New.
(uintr_, uintr_senduipi, testui):
New define_insn patterns.
* config/i386/x86gprintrin.h: Include uintrintrin.h
* doc/invoke.texi: Document -muintr.
* doc/extend.texi: Document uintr.

gcc/testsuite/

* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/uintr-1.c: New test.
* gcc.target/i386/uintr-2.c: Ditto.
* gcc.target/i386/uintr-3.c: Ditto.
* gcc.target/i386/uintr-4.c: Ditto.
* gcc.target/i386/uintr-5.c: Ditto.

-- 
Regards,

Hongyu, Wang
From 9f15ec4498eb7ff03ba9757592871d03fb05d222 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Mon, 20 May 2019 17:56:41 +0800
Subject: [PATCH] Enable gcc support for UINTR

2020-0X-XX  Hongtao Liu  

gcc/
	* common/config/i386/cpuinfo.h (get_available_features):
	Detect UINTR.
	* common/config/i386/i386-common.c (OPTION_MASK_ISA2_UINTR_SET,
	OPTION_MASK_ISA2_UINTR_UNSET): New.
	(ix86_handle_option): Handle -muintr.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_UINTR.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for uintr.
	* config.gcc: Add uintrintrin.h to extra_headers.
	* config/i386/uintrintrin.h: New.
	* config/i386/cpuid.h (bit_UINTR): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect UINTR.
	* config/i386/i386-builtin-types.def: Add new types.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins): Add
	__builtin_ia32_testui.
	* config/i386/i386-builtins.h (ix86_builtins): Add
	IX86_BUILTIN_TESTUI.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__UINTR__.
	* config/i386/i386-expand.c (ix86_expand_special_args_builtin):
	Handle UINT8_FTYPE_VOID.
	(ix86_expand_builtin): Handle IX86_BUILTIN_TESTUI.
	* config/i386/i386-options.c (isa2_opts): Add -muintr.
	(ix86_valid_target_attribute_inner_p): Handle UINTR.
	(ix86_option_override_internal): Add TARGET_64BIT check for UINTR.
	* config/i386/i386.h (TARGET_UINTR, TARGET_UINTR_P, PTA_UINTR): New.
	(PTA_SAPPHIRERAPIDS): Add PTA_UINTR.
	* config/i386/i386.opt: Add -muintr.
	* config/i386/i386.md
	(define_int_iterator UINTR_UNSPECV): New.
	(define_int_attr uintr_unspecv): New.
	(uintr_, uintr_senduipi, testui):
	New define_insn patterns.
	* config/i386/x86gprintrin.h: Include uintrintrin.h
	* doc/invoke.texi: Document -muintr.
	* doc/extend.texi: Document uintr.

gcc/testsuite/

	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/uintr-1.c: New test.
	* gcc.target/i386/uintr-2.c: Ditto.
	* gcc.target/i386/uintr-3.c: Ditto.
	* gcc.target/i386/uintr-4.c: Ditto.
	* gcc.target/i386/uintr-5.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h  |  2 +
 gcc/common/config/i386/i386-common.c  | 15 
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h|  1 +
 gcc/config.gcc|  4 +-
 gcc/config/i386/cpuid.h   |  1 +
 gcc/config/i386/i386-builtin-types.def|  1 +
 gcc/config/i386/i386-builtin.def  

Re: [PATCH] ASAN: disable -Wno-stringop-overflow for 2 tests

2020-10-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 13, 2020 at 10:11:26AM +0200, Martin Liška wrote:
> --- a/gcc/testsuite/g++.dg/asan/asan_test.C
> +++ b/gcc/testsuite/g++.dg/asan/asan_test.C
> @@ -9,6 +9,7 @@
>  // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } }
>  // { dg-additional-options "-msse2" { target { i?86-*-linux* x86_64-*-linux* i?86-*-freebsd* x86_64-*-freebsd*} } }
>  // { dg-additional-options "-D__NO_INLINE__" { target { *-*-linux-gnu } } }
> +/* { dg-additional-options "-Wno-stringop-overflow" } */

I'd put this one on the dg-options line next to other -Wno-* options.
Otherwise LGTM.

>  // { dg-set-target-env-var ASAN_OPTIONS "handle_segv=2" }
>  // { dg-final { asan-gtest } }
> diff --git a/gcc/testsuite/gcc.dg/asan/pr80166.c b/gcc/testsuite/gcc.dg/asan/pr80166.c
> index 629dd23a31c..5e153b274fa 100644
> --- a/gcc/testsuite/gcc.dg/asan/pr80166.c
> +++ b/gcc/testsuite/gcc.dg/asan/pr80166.c
> @@ -1,5 +1,6 @@
>  /* PR sanitizer/80166 */
>  /* { dg-do run } */
> +/* { dg-additional-options "-Wno-stringop-overflow" } */
>  #include 
>  #include 
> -- 
> 2.28.0

Jakub



[PATCH] ASAN: disable -Wno-stringop-overflow for 2 tests

2020-10-13 Thread Martin Liška

Hey.

This patch disables a warning that correctly identifies a violation
that we want ASAN to catch.

Survives asan.exp on x86_64-linux-gnu.

Ready to be installed?
Thanks,
Martin

gcc/testsuite/ChangeLog:

PR middle-end/97392
* g++.dg/asan/asan_test.C: Disable -Wstringop-overflow.
* gcc.dg/asan/pr80166.c: Likewise.
---
 gcc/testsuite/g++.dg/asan/asan_test.C | 1 +
 gcc/testsuite/gcc.dg/asan/pr80166.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/asan/asan_test.C b/gcc/testsuite/g++.dg/asan/asan_test.C
index ca989a36bbd..5e41702bd70 100644
--- a/gcc/testsuite/g++.dg/asan/asan_test.C
+++ b/gcc/testsuite/g++.dg/asan/asan_test.C
@@ -9,6 +9,7 @@
 // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS=1" { target { ! run_expensive_tests } } }
 // { dg-additional-options "-msse2" { target { i?86-*-linux* x86_64-*-linux* i?86-*-freebsd* x86_64-*-freebsd*} } }
 // { dg-additional-options "-D__NO_INLINE__" { target { *-*-linux-gnu } } }
+/* { dg-additional-options "-Wno-stringop-overflow" } */
 // { dg-set-target-env-var ASAN_OPTIONS "handle_segv=2" }
 // { dg-final { asan-gtest } }
 
diff --git a/gcc/testsuite/gcc.dg/asan/pr80166.c b/gcc/testsuite/gcc.dg/asan/pr80166.c
index 629dd23a31c..5e153b274fa 100644
--- a/gcc/testsuite/gcc.dg/asan/pr80166.c
+++ b/gcc/testsuite/gcc.dg/asan/pr80166.c
@@ -1,5 +1,6 @@
 /* PR sanitizer/80166 */
 /* { dg-do run } */
+/* { dg-additional-options "-Wno-stringop-overflow" } */
 
 #include 
 #include 
--
2.28.0



Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-10-13 Thread Richard Sandiford via Gcc-patches
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Thursday, August 27, 2020 4:08 PM
>> To: xiezhiheng 
>> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng  writes:
>> > I made two separate patches for these two groups for review purposes.
>> >
>> > Note: Patch for min/max intrinsics should be applied before the patch for
>> > rounding intrinsics
>> >
>> > Bootstrapped and tested on aarch64 Linux platform.
>> 
>> Thanks, LGTM.  Pushed to master.
>> 
>> Richard
>
> I made the patch for multiply and multiply accumulator intrinsics.
>
> Note that bfmmlaq intrinsic is special because this instruction ignores the 
> FPCR and does not update the FPSR exception status.
>   
> https://developer.arm.com/docs/ddi0596/h/simd-and-floating-point-instructions-alphabetic-order/bfmmla-bfloat16-floating-point-matrix-multiply-accumulate-into-2x2-matrix
> So I set it to the AUTO_FP flag.
>
> Bootstrapped and tested on aarch64 Linux platform.

Thanks, LGTM.  Pushed to trunk.

Richard


Re: [PATCH v2] arm&aarch64: subdivide the type attribute "alu_shfit_imm"

2020-10-13 Thread Richard Sandiford via Gcc-patches
Thanks, the new patch looks great.  One minor suggestion:

"Qian, Jianhua"  writes:
> @@ -1106,7 +1125,45 @@
>mve_move,\
>mve_store,\
>mve_load"
> -   (const_string "untyped"))
> +   (cond [(eq_attr "autodetect_type" "alu_shift_lsr_op2")
> +(const_string "alu_shift_imm_other")
> +  (eq_attr "autodetect_type" "alu_shift_asr_op2")
> +(const_string "alu_shift_imm_other")

This can be combined into:

+   (cond [(eq_attr "autodetect_type" "alu_shift_lsr_op2,alu_shift_asr_op2")
+(const_string "alu_shift_imm_other")

But I think the patch is good to go as-is.

Thanks,
Richard


Re: [PATCH] Fortran : ICE in build_field PR95614 (2nd attempt)

2020-10-13 Thread Paul Richard Thomas via Gcc-patches
Hi Mark,

OK for master. If you want to backport, that's fine by me but please give
it a few weeks.

Thanks for fixing this.

Paul


On Tue, 13 Oct 2020 at 08:17, Mark Eggleston 
wrote:

> **ping**
>
> previously omitted commit message added
>
> On 29/09/2020 14:03, Mark Eggleston wrote:
> > For review.
> >
> > When the first attempt was committed the result was PR97224 i.e. it
> > broke the build of SPECCPU 2006 Games.
> >
> > I've changed the condition under which the error is produced. It was
> > produced if the local symbol was also found as a global symbol and
> > the type of the symbol was not GSYM_UNKNOWN and not GSYM_COMMON.  This
> > meant that subroutine names in commons in the SPECCPU source code were
> > rejected.
> >
> > The condition now produces an error only if the global symbol is either
> > GSYM_MODULE or GSYM_PROGRAM.
> >
> > The relevant section in the standard (19.3.1 (2)):
> >
> > "Within its scope, a local identifier of an entity of class (1) or
> > class (4) shall not be the same as a global identifier used in that
> > scope unless the global identifier
> >
> >  * is used only as the use-name of a rename in a USE statement,
> >  * is a common block name (19.3.2),
> >  * is an external procedure name that is also a generic name, or
> >  * is an external function name and the inclusive scope is its defining
> >subprogram (19.3.3)."
> >
> > I've added two new test cases for subroutine and function.
> >
> > I'm not certain about the restriction that the external procedure
> > should be a generic name. I have found the earlier standards somewhat
> > confusing on the subject, so I haven't determined whether there should
> > be any standards dependent code.
> >
> [PATCH] Fortran  :  ICE in build_field PR95614
>
> Local identifiers cannot be the same as a module name. The original
> patch by Steve Kargl resulted in name clashes between common block
> names and local identifiers.  A local identifier can be the same as
> a global identifier if that identifier is not a module or a program.
> The original patch was modified to reject global identifiers that
> represent a module or a program.
>
> 2020-09-29  Steven G. Kargl  
>  Mark Eggleston  
>
> gcc/fortran/
>
>  PR fortran/95614
>  * decl.c (gfc_get_common): Use gfc_match_common_name instead
>  of match_common_name.
>  * decl.c (gfc_bind_idents): Use gfc_match_common_name instead
>  of match_common_name.
>  * match.c : Rename match_common_name to gfc_match_common_name.
>  * match.c (gfc_match_common): Use gfc_match_common_name instead
>  of match_common_name.
>  * match.h : Rename match_common_name to gfc_match_common_name.
>  * resolve.c (resolve_common_vars): Check each symbol in a
>  common block has a global symbol.  If there is a global symbol
>  issue an error if the symbol type is a module or a program.
>
> 2020-09-29  Mark Eggleston 
>
> gcc/testsuite/
>
>  PR fortran/95614
>  * gfortran.dg/pr95614_1.f90: New test.
>  * gfortran.dg/pr95614_2.f90: New test.
>  * gfortran.dg/pr95614_3.f90: New test.
>  * gfortran.dg/pr95614_4.f90: New test.
>
> --
> https://www.codethink.co.uk/privacy.html
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[PATCH] combine: Fix up simplify_shift_const_1 for nested ROTATEs [PR97386]

2020-10-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcases are miscompiled (the first one since my improvements
to rotate discovery on GIMPLE, the other one for many years) because the
combiner optimizes nested ROTATEs with a narrowing SUBREG in between (i.e.
the outer rotate is performed in a narrower precision than the inner one)
into just one ROTATE of the rotated constant.  While that can work for
shifts (under certain conditions), it can't work for rotates, where such a
merge is only valid if both rotates have the same precision.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

OT, on the other hand, I wonder why the code doesn't handle ROTATERT.  The
code earlier canonicalizes ROTATERT to ROTATE with an adjusted (constant)
count, but if the inner op is ROTATERT, why can't it de-canonicalize the
outer one back to ROTATERT and treat it like that?

2020-10-13  Jakub Jelinek  

PR rtl-optimization/97386
* combine.c (simplify_shift_const_1): Don't optimize nested ROTATEs if
they have different modes.

* gcc.c-torture/execute/pr97386-1.c: New test.
* gcc.c-torture/execute/pr97386-2.c: New test.

--- gcc/combine.c.jj    2020-08-27 18:42:35.0 +0200
+++ gcc/combine.c   2020-10-12 21:40:00.165359873 +0200
@@ -11003,8 +11003,11 @@ simplify_shift_const_1 (enum rtx_code co
             break;
           /* For ((int) (cstLL >> count)) >> cst2 just give up.  Queuing
              up outer sign extension (often left and right shift) is
-             hardly more efficient than the original.  See PR70429.  */
-          if (code == ASHIFTRT && int_mode != int_result_mode)
+             hardly more efficient than the original.  See PR70429.
+             Similarly punt for rotates with different modes.
+             See PR97386.  */
+          if ((code == ASHIFTRT || code == ROTATE)
+              && int_mode != int_result_mode)
             break;
 
           rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
--- gcc/testsuite/gcc.c-torture/execute/pr97386-1.c.jj  2020-10-12 21:47:46.636649170 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr97386-1.c 2020-10-12 21:47:18.681049897 +0200
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/97386 */
+
+__attribute__((noipa)) unsigned char
+foo (unsigned int c)
+{
+  return __builtin_bswap16 ((unsigned long long) (0xLLU << c | 0xLLU >> ((-c) & 63)));
+}
+
+int
+main ()
+{
+  unsigned char x = foo (0);
+  if (__CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && x != 0xcc)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr97386-2.c.jj  2020-10-12 21:47:50.389595376 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr97386-2.c 2020-10-12 21:47:24.253970009 +0200
@@ -0,0 +1,20 @@
+/* PR rtl-optimization/97386 */
+
+__attribute__((noipa)) unsigned
+foo (int x)
+{
+  unsigned long long a = (0x8000ULL << x) | (0x8000ULL >> (64 - x));
+  unsigned int b = a;
+  return (b << 24) | (b >> 8);
+}
+
+int
+main ()
+{
+  if (__CHAR_BIT__ == 8
+      && __SIZEOF_INT__ == 4
+      && __SIZEOF_LONG_LONG__ == 8
+      && foo (1) != 0x99000199U)
+    __builtin_abort ();
+  return 0;
+}

Jakub



[committed] openmp: Improve composite triangular loop lowering and expansion

2020-10-13 Thread Jakub Jelinek via Gcc-patches
Hi!

This propagates the needed values from the point where the number of
iterations of composite loops is calculated to the places where that
information is needed, so that the more efficient square root discovery
can be used to compute the starting iterator values from the logical
iteration number.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-13  Jakub Jelinek  

* omp-low.c (add_taskreg_looptemp_clauses): For triangular loops
with non-constant number of iterations add another 4 _looptemp_
clauses before the (optional) one for lastprivate.
(lower_omp_for_lastprivate): Skip those clauses when looking for
the lastprivate clause.
(lower_omp_for): For triangular loops with non-constant number of
iterations add another 4 _looptemp_ clauses.
* omp-expand.c (expand_omp_for_init_counts): For triangular loops
with non-constant number of iterations set counts[0],
fd->first_inner_iterations, fd->factor and fd->adjn1 from the newly
added _looptemp_ clauses.
(expand_omp_for_init_vars): Initialize the newly added _looptemp_
clauses.
(find_lastprivate_looptemp): New function.
(expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
expand_omp_taskloop_for_outer): Use it instead of manually skipping
_looptemp_ clauses.

--- gcc/omp-low.c.jj    2020-10-08 11:10:24.109546260 +0200
+++ gcc/omp-low.c   2020-10-12 19:43:10.670666202 +0200
@@ -1919,12 +1919,38 @@ add_taskreg_looptemp_clauses (enum gf_ma
 GIMPLE_OMP_FOR, add one more temporaries for the total number
 of iterations (product of count1 ... countN-1).  */
  if (omp_find_clause (gimple_omp_for_clauses (for_stmt),
-  OMP_CLAUSE_LASTPRIVATE))
-   count++;
- else if (msk == GF_OMP_FOR_KIND_FOR
-  && omp_find_clause (gimple_omp_parallel_clauses (stmt),
-  OMP_CLAUSE_LASTPRIVATE))
-   count++;
+  OMP_CLAUSE_LASTPRIVATE)
+ || (msk == GF_OMP_FOR_KIND_FOR
+ && omp_find_clause (gimple_omp_parallel_clauses (stmt),
+ OMP_CLAUSE_LASTPRIVATE)))
+   {
+ tree temp = create_tmp_var (type);
+ tree c = build_omp_clause (UNKNOWN_LOCATION,
+OMP_CLAUSE__LOOPTEMP_);
+ insert_decl_map (&outer_ctx->cb, temp, temp);
+ OMP_CLAUSE_DECL (c) = temp;
+ OMP_CLAUSE_CHAIN (c) = gimple_omp_taskreg_clauses (stmt);
+ gimple_omp_taskreg_set_clauses (stmt, c);
+   }
+ if (fd.non_rect
+ && fd.last_nonrect == fd.first_nonrect + 1)
+   if (tree v = gimple_omp_for_index (for_stmt, fd.last_nonrect))
+ if (!TYPE_UNSIGNED (TREE_TYPE (v)))
+   {
+ v = gimple_omp_for_index (for_stmt, fd.first_nonrect);
+ tree type2 = TREE_TYPE (v);
+ count++;
+ for (i = 0; i < 3; i++)
+   {
+ tree temp = create_tmp_var (type2);
+ tree c = build_omp_clause (UNKNOWN_LOCATION,
+OMP_CLAUSE__LOOPTEMP_);
+ insert_decl_map (&outer_ctx->cb, temp, temp);
+ OMP_CLAUSE_DECL (c) = temp;
+ OMP_CLAUSE_CHAIN (c) = gimple_omp_taskreg_clauses (stmt);
+ gimple_omp_taskreg_set_clauses (stmt, c);
+   }
+   }
}
   for (i = 0; i < count; i++)
{
@@ -9530,7 +9556,13 @@ lower_omp_for_lastprivate (struct omp_fo
  tree innerc = omp_find_clause (taskreg_clauses,
 OMP_CLAUSE__LOOPTEMP_);
  gcc_assert (innerc);
- for (i = 0; i < fd->collapse; i++)
+ int count = fd->collapse;
+ if (fd->non_rect
+ && fd->last_nonrect == fd->first_nonrect + 1)
+   if (tree v = gimple_omp_for_index (fd->for_stmt, fd->last_nonrect))
+ if (!TYPE_UNSIGNED (TREE_TYPE (v)))
+   count += 4;
+ for (i = 0; i < count; i++)
{
  innerc = omp_find_clause (OMP_CLAUSE_CHAIN (innerc),
OMP_CLAUSE__LOOPTEMP_);
@@ -10453,12 +10485,26 @@ lower_omp_for (gimple_stmt_iterator *gsi
   if (fd.collapse > 1
  && TREE_CODE (fd.loop.n2) != INTEGER_CST)
count += fd.collapse - 1;
+  size_t count2 = 0;
+  tree type2 = NULL_TREE;
   bool taskreg_for
= (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR
   || gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_TASKLOOP);
   tree outerc = NULL, *pc = gimple_omp_for_clauses_ptr (stmt);
   tree simtc = NULL;
   tree clauses = *pc;
+  if (fd.collapse > 1
+ &&

Re: [PATCH] Fortran : Two further previously missed ICEs PR53298

2020-10-13 Thread Mark Eggleston

**ping**

see https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554034.html 
and https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555072.html


OK for master?

On 14/09/2020 08:22, Mark Eggleston wrote:

Second attempt, this time with the patch attached.

For review.

Fixes the two ICEs reported in the PR that remained after the previous fix.

There is a side effect that is manifested in the tree dumps. Instead of

__builtin_free (ptr2->dat.data);

we get

__builtin_free ((void *) ptr2->dat.data);

I do not know the cause of this, but from what I can tell the newly 
inserted cast is harmless.  All the examples I've seen so far have the 
cast, except where the parameter is declared as void *. In the tree 
dumps ptr2 is declared as struct testtype2 *; I do not know where the 
type is declared, so I don't know whether data is declared void * (I 
expect it is).


Is it worth the effort to determine how to remove the extra (void *)?

[PATCH] Fortran  : Two further previously missed ICEs PR53298

There were 3 ICEs with different call stacks in the comments of this
PR.  A previous commit fixed only one of those ICEs.

The ICEs fixed here are in trans-array.c and trans-expr.c.

The first ICE occurred when the array reference is not AR_ELEMENT:
gfc_conv_scalarized_array_ref is called with se and ar, and if se->ss is
NULL the ICE occurs.  If se->ss is NULL there is nothing to do before
the return.

The second ICE occurs in code that did not match its comments. Fixing
the code to match the comments fixes the ICE.  A side effect is that
the tree dumps for finalize_35.f90 and finalize_36.f90 now contain
"__builtin_free ((void *) ptr2->dat.data);"; the "(void *)" was
previously omitted.  The cast is harmless.

2020-09-11  Mark Eggleston 

gcc/fortran/

    PR fortran/53298
    * trans-array.c (gfc_conv_array_ref): In the body of the if
    statement only execute the code before the return if se->ss is
    set.
    * trans-expr.c (gfc_conv_component_ref): Change the if
    expression to match the comments.

2020-09-04  Mark Eggleston 

gcc/testsuite/

    PR fortran/53298
    * gfortran.dg/finalize_35.f90: Handle extra (void *).
    * gfortran.dg/finalize_36.f90: Handle extra (void *).
    * gfortran.dg/pr53298_2.f90: New test.
    * gfortran.dg/pr53298_3.f90: New test.





Re: [PATCH] Fortran : ICE in build_field PR95614 (2nd attempt)

2020-10-13 Thread Mark Eggleston

**ping**

previously omitted commit message added

On 29/09/2020 14:03, Mark Eggleston wrote:

For review.

When the first attempt was committed the result was PR97224 i.e. it 
broke the build of SPECCPU 2006 Games.


I've changed the condition under which the error is produced. It was 
produced if the local symbol was also found as a global symbol and the 
type of the symbol was not GSYM_UNKNOWN and not GSYM_COMMON.  This 
meant that subroutine names in commons in the SPECCPU source code were 
rejected.


The condition now produces an error only if the global symbol is either 
GSYM_MODULE or GSYM_PROGRAM.


The relevant section in the standard (19.3.1 (2)):

"Within its scope, a local identifier of an entity of class (1) or 
class (4) shall not be the same as a global identifier used in that 
scope unless the global identifier


 * is used only as the use-name of a rename in a USE statement,
 * is a common block name (19.3.2),
 * is an external procedure name that is also a generic name, or
 * is an external function name and the inclusive scope is its defining
   subprogram (19.3.3)."

I've added two new test cases for subroutine and function.

I'm not certain about the restriction that the external procedure 
should be a generic name. I have found the earlier standards somewhat 
confusing on the subject, so I haven't determined whether there should 
be any standards dependent code.



[PATCH] Fortran  :  ICE in build_field PR95614

Local identifiers cannot be the same as a module name. The original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers.  A local identifier can be the same as
a global identifier if that identifier is not a module or a program.
The original patch was modified to reject global identifiers that
represent a module or a program.

2020-09-29  Steven G. Kargl  
        Mark Eggleston  

gcc/fortran/

    PR fortran/95614
    * decl.c (gfc_get_common): Use gfc_match_common_name instead
    of match_common_name.
    * decl.c (gfc_bind_idents): Use gfc_match_common_name instead
    of match_common_name.
    * match.c : Rename match_common_name to gfc_match_common_name.
    * match.c (gfc_match_common): Use gfc_match_common_name instead
    of match_common_name.
    * match.h : Rename match_common_name to gfc_match_common_name.
    * resolve.c (resolve_common_vars): Check each symbol in a
    common block has a global symbol.  If there is a global symbol
    issue an error if the symbol type is a module or a program.

2020-09-29  Mark Eggleston 

gcc/testsuite/

    PR fortran/95614
    * gfortran.dg/pr95614_1.f90: New test.
    * gfortran.dg/pr95614_2.f90: New test.
    * gfortran.dg/pr95614_3.f90: New test.
    * gfortran.dg/pr95614_4.f90: New test.




PING^3 [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2020-10-13 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html

BR,
Kewen

on 2020/9/15 3:44 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
> 
> BR,
> Kewen
> 
> on 2020/8/31 1:49 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> I'd like to gently ping this since the IVOPTs part is ready to land.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>
>> BR,
>> Kewen
>>
>> on 2020/5/28 8:19 PM, Kewen.Lin via Gcc-patches wrote:
>>>
>>> gcc/ChangeLog
>>>
>>> 2020-MM-DD  Kewen Lin  
>>>
>>> * cfgloop.h (struct loop): New field estimated_unroll.
>>> * tree-ssa-loop-manip.c (decide_unroll_const_iter): New function.
>>> (decide_unroll_runtime_iter): Likewise.
>>> (decide_unroll_stupid): Likewise.
>>> (estimate_unroll_factor): Likewise.
>>> * tree-ssa-loop-manip.h (estimate_unroll_factor): New declaration.
>>> * tree-ssa-loop.c (tree_average_num_loop_insns): New function.
>>> * tree-ssa-loop.h (tree_average_num_loop_insns): New declaration.
>>>


PING^1 [PATCH v2] rs6000: Use direct move for char/short vector CTOR [PR96933]

2020-10-13 Thread Kewen.Lin via Gcc-patches
Hi,

I'd like to gently ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553555.html

BR,
Kewen

on 2020/9/10 11:19 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As Segher suggested in the PR, for 128bit_direct_move this new
> version uses vector pack insns instead of vector perms with one
> control vector.  The performance evaluation shows that it's on par
> with the previous version for char and better than the previous
> version for short.
> 
> Bootstrapped/regtested again on powerpc64{,le}-linux-gnu P8 and
> powerpc64le-linux-gnu P9.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> 
> 
> gcc/ChangeLog:
> 
>   PR target/96933
>   * config/rs6000/rs6000.c (rs6000_expand_vector_init): Use direct move
>   instructions for vector construction with char/short types.
>   * config/rs6000/rs6000.md (p8_mtvsrwz_v16qisi2): New define_insn.
>   (p8_mtvsrd_v16qidi2): Likewise. 
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/96933
>   * gcc.target/powerpc/pr96933-1.c: New test.
>   * gcc.target/powerpc/pr96933-2.c: New test.
>   * gcc.target/powerpc/pr96933-3.c: New test.
>   * gcc.target/powerpc/pr96933.h: New test.
>