Re: libgo patch committed: Upgrade to Go 1.9 release

2017-09-14 Thread Rainer Orth
Hi Ian,

> On Thu, Sep 14, 2017 at 3:19 PM, Rainer Orth
>  wrote:
>>
>>> I've committed a patch to libgo to upgrade it to the recent Go 1.9 release.
>>>
>>> As usual with these upgrades, the patch is too large to attach here.
>>> I've attached the changes to files that are more or less specific to
>>> gccgo.
>>>
>>> This upgrade required some changes to the gotools Makefile.  And one
>>> test had to be updated.  These patches are also below.
>>>
>>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>>> to mainline.
>>
>> the patch broke Solaris bootstrap:
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_unix.go:240:11: error: 
>> reference to undefined name 'forkExecPipe'
>>   if err = forkExecPipe(p[:]); err != nil {
>>^
>>
>> libgo/go/syscall/forkpipe_bsd.go is needed on Solaris, too.
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/link.go:73:10: 
>> error: use of undefined type 'lifnum'
>>   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | 
>> sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
>>   ^
>> make[8]: *** [Makefile:3349: golang_org/x/net/lif.lo] Error 1
>>
>> The Go 1.9 upgrade patch has
>>
>> @@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
>>
>>  func links(eps []endpoint, name string) ([]Link, error) {
>> var lls []Link
>> -   lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
>> +   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
>>
>> Reverting that allows link.go to compile.
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/internal/poll/fd_unix.go:366:21: error: 
>> reference to undefined identifier 'syscall.ReadDirent'
>>n, err := syscall.ReadDirent(fd.Sysfd, buf)
>>  ^
>>
>> I don't yet see where this comes from on non-Linux systems...
>
> It's in forkpipe_bsd.go.  Does this patch fix the problem?

that's true for forkExecPipe and I had this change in the patch I'd
attached.  But what about syscall.ReadDirent?  I couldn't find that
one...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH v2] [libcc1] Rename C{,P}_COMPILER_NAME and remove triplet from them

2017-09-14 Thread Sergio Durigan Junior
Ping.

On Friday, September 01 2017, I wrote:

> On Wednesday, August 23 2017, Pedro Alves wrote:
>
>> On 08/23/2017 05:17 AM, Sergio Durigan Junior wrote:
>>> Hi there,
>>> 
>>> This is a series of two patches, one for GDB and one for GCC, which aims
>>> to improve the detection and handling of triplets present on compiler
>>> names.  The motivation for this series was mostly the fact that GDB's
>>> "compile" command is broken on Debian unstable, as can be seen here:
>>> 
>>>   
>>> 
>>> The reason for the failure is the fact that Debian compiles GCC using
>>> the --program-{prefix,suffix} options from configure in order to name
>>> the compiler using the full triplet (i.e., Debian's GCC is not merely
>>> named "gcc", but e.g. "x86_64-linux-gnu-gcc-7"), which end up naming the
>>> C_COMPILER_NAME and CP_COMPILER_NAME defines with the specified prefix
>>> and suffix.  Therefore, the regexp being used to match the compiler name
>>> is wrong because it doesn't take into account the fact that the defines
>>> may already contain the triplets.
>>
>> As discussed on IRC, I think the problem is that C_COMPILER_NAME
>> in libcc1 includes the full triplet in the first place.  I think
>> that it shouldn't.  I think that C_COMPILER_NAME should always
>> be "gcc".
>>
>> The problem is in bootstrapping code, before there's a plugin
>> yet -- i.e., in the code that libcc1 uses to find the compiler (which
>> then loads a plugin that libcc1 talks with).
>>
>> Please bear with me while I lay down my rationale, so that we're
>> on the same page.
>>
>> C_COMPILER_NAME seems to include the prefix currently in an attempt
>> to support cross debugging, or more generically, --enable-targets=all
>> in gdb, but the whole thing doesn't really work as intended if
>> C_COMPILER_NAME already includes a target prefix.
>>
>> IIUC the libcc1/plugin design, a single "libcc1.so" (what gdb loads,
>> not the libcc1plugin compiler plugin) should work with any compiler in
>> the PATH, in case you have several in the system.  E.g., one for
>> each arch.
>>
>> Let me expand.
>>
>> The idea is that gdb always dlopens "libcc1.so", by that name exactly.
>> Usually that'll open the libcc1.so installed in the system, e.g.,
>> "/usr/lib64/libcc1.so", which for convenience was originally built from the
>> same source tree as the system's compiler was built.  You could force gdb to
>> load some other libcc1.so, e.g., by tweaking LD_LIBRARY_PATH of course,
>> but you shouldn't need to.
>>
>> libcc1.so is responsible for finding a compiler that targets the
>> architecture of the inferior that the user is debugging in gdb.
>> E.g., say you're cross debugging for arm-none-eabi, on a
>> x86-64 Fedora host.  GDB knows the target inferior's architecture, and passes
>> down to (the system) libcc1 a triplet regex like "arm*-*eabi*" or
>> similar.  libcc1 appends "-" + C_COMPILER_NAME to that regex,
>> generating something like "arm*-*eabi*-gcc", and then looks for binaries
>> in PATH that match that regex.  When one is found, e.g., "arm-none-eabi-gcc",
>> libcc1 forks/execs that compiler, passing it "-fplugin=libcc1plugin".
>> libcc1 then communicates with that compiler's libcc1plugin plugin
>> via a socket.
>>
>> In this scheme, "libcc1.so", the library that gdb loads, has no
>> target-specific logic at all.  It should work with any compiler
>> in the system, for any target/arch.  All it does is marshal the gcc/gdb
>> interface between the gcc plugin and gdb; it is not linked against gcc.
>> That boundary is versioned, and ABI-stable.  So as long as the
>> libcc1.so that gdb loads understands the same version of the gcc/gdb
>> interface API as gdb understands, it all should work.  (The APIs
>> are always extended keeping backward compatibility.)
>>
>> So in this scheme, having the "C_COMPILER_NAME" macro in libcc1
>> include the target prefix for the --target that the plugin that
>> libcc1 is built along with, seems to serve no real purpose, AFAICT.
>> It's just getting in the way.
>>
>> I.e., something like:
>>
>>   "$gdb_specified_triplet_re" + "-" + C_COMPILER_NAME
>>
>> works if C_COMPILER_NAME is exactly "gcc", but not if C_COMPILER_NAME is 
>> already:
>>
>>   "$whatever_triplet_libcc1_happened_to_be_built_with" + "-gcc"
>>
>> because we end up with:
>>
>>   "$gdb_specified_triplet_re" + "-" 
>> "$whatever_triplet_libcc1_happened_to_be_built_with" +  "-gcc"
>>
>> which is the problem case.
>>
>> In sum, I think the libcc1.so (not the plugin) should _not_ have baked
>> in target awareness, and thus C_COMPILER_NAME should always be "gcc", and
>> then libcc1's regex should be adjusted to also tolerate a suffix in
>> the final compiler binary name regex.
>>
>> WDYT?
>
> As I replied before, I agree with Pedro's rationale here and his idea
> actually makes my initial patch much simpler.  By renaming
> C_COMPILER_NAME (and the new CP_COMPILER_NAME) to just "gcc" (or "g++"),
> the 
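
To make the concatenation problem concrete, here is a minimal
standalone sketch (hypothetical triplet regex and compiler names, not
the actual libcc1 code):

#include <stdio.h>

/* A triplet regex as gdb might pass it down (made-up example).  */
#define TRIPLET_RE "arm[^-]*-[^-]*-eabi[^-]*"

int
main (void)
{
  /* C_COMPILER_NAME as Pedro proposes: always plain "gcc".  */
  const char *plain = "gcc";
  /* C_COMPILER_NAME as currently built on Debian: triplet baked in.  */
  const char *baked = "x86_64-linux-gnu-gcc-7";

  /* "arm[^-]*-[^-]*-eabi[^-]*-gcc": matches e.g. arm-none-eabi-gcc.  */
  printf ("%s-%s\n", TRIPLET_RE, plain);
  /* Two triplets end up in the pattern: it matches nothing.  */
  printf ("%s-%s\n", TRIPLET_RE, baked);
  return 0;
}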

Re: [RFC][PACH 3/5] Prevent tree unroller from completely unrolling inner loops if that results in excessive strided-loads in outer loop

2017-09-14 Thread Andrew Pinski
On Thu, Sep 14, 2017 at 6:30 PM, Kugan Vivekanandarajah
 wrote:
> This patch prevents the tree unroller from completely unrolling inner
> loops if that results in excessive strided loads in the outer loop.

Same comments from the RTL version.

Though one more comment here:
+  if (!INDIRECT_REF_P (op)
+      && TREE_CODE (op) != MEM_REF
+      && TREE_CODE (op) != TARGET_MEM_REF)
+    continue;

This does not handle ARRAY_REF which might be/should be handled.
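
A sketch of the extended check (illustrative only, untested):

+  if (!INDIRECT_REF_P (op)
+      && TREE_CODE (op) != MEM_REF
+      && TREE_CODE (op) != TARGET_MEM_REF
+      && TREE_CODE (op) != ARRAY_REF)
+    continue;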


+  if ((loop_father = loop_outer (loop)))

Since you don't use loop_father outside of the if statement use the
following (allowed) format
if (struct loop *loop_father = loop_outer (loop))

Thinking about this more: hw_prefetchers_avail might not be equivalent
to num_slots (PARAM_SIMULTANEOUS_PREFETCHES), but the name does not fit
what it means if I understand your hardware correctly.
Maybe hw_load_non_cacheline_prefetcher_avail, since if I understand the
micro-arch correctly, the prefetchers are not based on the cacheline
being loaded.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  
>
> * config/aarch64/aarch64.c (count_mem_load_streams): New.
> (aarch64_ok_to_unroll): New.
> * doc/tm.texi (ok_to_unroll): Define new target hook.
> * doc/tm.texi.in (ok_to_unroll): Likewise.
> * target.def (ok_to_unroll): Likewise.
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use
>   ok_to_unroll while unrolling.


Re: [RFC][AARCH64][PATCH 2/5]: Add number of hw prefetchers available to cpu_prefetch_tune

2017-09-14 Thread Andrew Pinski
On Thu, Sep 14, 2017 at 6:28 PM, Kugan Vivekanandarajah
 wrote:
> This patch adds number of hw prefetchers available to
> cpu_prefetch_tune so it can be used in loop unrolling decisions.

Can you explain the difference between this and num_slots
(PARAM_SIMULTANEOUS_PREFETCHES)?  Because it seems like they should be
the same here.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  
>
> * config/aarch64/aarch64-protos.h (struct cpu_prefetch_tune): Add
>   new field hw_prefetchers_avail.
> * config/aarch64/aarch64.c: Add values for hw_prefetchers_avail.


[RFC][AARCH64][PATCH 5/5] add aarch64_loop_unroll_adjust to limit partial unrolling in rtl based on strided-loads in loop

2017-09-14 Thread Kugan Vivekanandarajah
This patch adds aarch64_loop_unroll_adjust to limit partial unrolling
in rtl based on strided-loads in loop.

Thanks,
Kugan

gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  

* cfgloop.h (iv_analyze_biv): export.
* loop-iv.c: Likewise.
* config/aarch64/aarch64.c (strided_load_p): New.
(insn_has_strided_load): New.
(count_strided_load_rtl): New.
(aarch64_loop_unroll_adjust): New.
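
As a worked example of the clamp in aarch64_loop_unroll_adjust below
(illustrative numbers, not from the patch): with hw_prefetchers_avail = 7
and count = 3 strided loads in the loop, max_strided_loads / count = 2,
floor_log2 (2) = 1, and n_unroll is clamped to 1 << 1 = 2 copies.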
From 10e02b026784798fff6a3513dc11b1cffb1cf78a Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Wed, 23 Aug 2017 12:35:14 +1000
Subject: [PATCH 5/5] add aarch64_loop_unroll_adjust

---
 gcc/cfgloop.h                |   1 +
 gcc/config/aarch64/aarch64.c | 136 +++
 gcc/loop-iv.c                |   2 +-
 3 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 2308e7a..a3876a2 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -479,6 +479,7 @@ extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode,
 extern rtx get_iv_value (struct rtx_iv *, rtx);
 extern bool biv_p (rtx_insn *, rtx);
 extern void find_simple_exit (struct loop *, struct niter_desc *);
+extern bool iv_analyze_biv (rtx def, struct rtx_iv *iv);
 extern void iv_analysis_done (void);
 
 extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e88bb6c..624a996 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15189,6 +15189,139 @@ aarch64_ok_to_unroll (struct loop *loop, unsigned HOST_WIDE_INT nunroll)
   return true;
 }
 
+/* Return true if X is a strided load.  */
+
+static bool
+strided_load_p (const_rtx x)
+{
+  struct rtx_iv iv;
+  rtx reg;
+
+  if (!MEM_P (x))
+return false;
+
+  reg = XEXP (x, 0);
+  if (REG_P (reg)
+  || UNARY_P (reg))
+{
+  if (!REG_P (reg))
+	reg = XEXP (reg, 0);
+  if (REG_P (reg)
+	  && iv_analyze_biv (reg, &iv))
+	return true;
+}
+  else if (BINARY_P (reg))
+{
+  rtx reg1, reg2;
+  reg1 = XEXP (reg, 0);
+  reg2 = XEXP (reg, 1);
+  if (REG_P (reg1)
+	  && iv_analyze_biv (reg1, &iv))
+	return true;
+  if (REG_P (reg2)
+	  && iv_analyze_biv (reg2, &iv))
+	return true;
+}
+  return false;
+}
+
+
+/* Return true if X INSN is a strided load.  */
+
+static bool
+insn_has_strided_load (rtx_insn *insn)
+{
+  subrtx_iterator::array_type array;
+  if (!INSN_P (insn) || recog_memoized (insn) < 0)
+return false;
+  rtx pat = PATTERN (insn);
+
+  switch (GET_CODE (pat))
+{
+case PARALLEL:
+	{
+	  for (int j = 0; j < XVECLEN (pat, 0); ++j)
+	{
+	  rtx ex = XVECEXP (pat, 0, j);
+	  FOR_EACH_SUBRTX (iter, array, ex, NONCONST)
+		{
+		  const_rtx x = *iter;
+		  if (GET_CODE (x) == SET
+		  && strided_load_p (SET_SRC (x)))
+		return true;
+		}
+	}
+	}
+  break;
+
+case SET:
+  FOR_EACH_SUBRTX (iter, array, SET_SRC (pat), NONCONST)
+	{
+	  const_rtx x = *iter;
+	  if (strided_load_p (x))
+	return true;
+	}
+
+default:
+  break;
+}
+  return false;
+}
+
+/* Count the strided loads in the LOOP.  If there are more strided
+   loads than MAX_STRIDED_LOADS, we don't need to count all of them.
+   This is used to limit the partial unrolling factor to avoid
+   prefetcher collision.  */
+
+static unsigned
+count_strided_load_rtl (struct loop *loop, unsigned max_strided_loads)
+{
+  basic_block *bbs;
+  unsigned count = 0;
+  rtx_insn *insn;
+  iv_analysis_loop_init (loop);
+  bbs = get_loop_body (loop);
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+{
+  FOR_BB_INSNS (bbs[i], insn)
+	{
+	  if (insn_has_strided_load (insn))
+	count ++;
+
+	  if (count > (max_strided_loads / 2))
+	{
+	  free (bbs);
+	  iv_analysis_done ();
+	  return count;
+	}
+	}
+}
+  free (bbs);
+  iv_analysis_done ();
+  return count;
+}
+
+/* Target hook loop_unroll_adjust that limits the partial loop
+   unrolling factor if it would give the outer loop more prefetch
+   streams than the hardware can handle.  */
+
+static unsigned
+aarch64_loop_unroll_adjust (unsigned n_unroll, struct loop *loop)
+{
+  int max_strided_loads;
+  max_strided_loads = aarch64_tune_params.prefetch->hw_prefetchers_avail;
+
+  if (max_strided_loads == -1)
+return n_unroll;
+
+  unsigned count = count_strided_load_rtl (loop, max_strided_loads);
+  if (count > 0)
+n_unroll = 1 << (floor_log2 (max_strided_loads/count));
+
+  return n_unroll;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15620,6 +15753,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_OK_TO_UNROLL
 #define TARGET_OK_TO_UNROLL aarch64_ok_to_unroll
 
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST aarch64_loop_unroll_adjust
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git 

[RFC][PATCH 4/5] Change iv_analyze_result to take const_rtx.

2017-09-14 Thread Kugan Vivekanandarajah
Change iv_analyze_result to take const_rtx. This is just to make the
next patch compile. No functional changes:

Thanks,
Kugan

gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  

* cfgloop.h (iv_analyze_result): Change 2nd param from rtx to
  const_rtx.
* df-core.c (df_find_def): Likewise.
* df.h (df_find_def): Likewise.
* loop-iv.c (iv_analyze_result): Likewise.
From 5d50c51c520d881104d44603514088a19e14e652 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 25 Aug 2017 10:49:50 +1000
Subject: [PATCH 4/5] Change iv_analyze_result to take const_rtx

---
 gcc/cfgloop.h | 2 +-
 gcc/df-core.c | 2 +-
 gcc/df.h  | 2 +-
 gcc/loop-iv.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e7ffa23..2308e7a 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -473,7 +473,7 @@ struct GTY(()) niter_desc
 
 extern void iv_analysis_loop_init (struct loop *);
 extern bool iv_analyze (rtx_insn *, rtx, struct rtx_iv *);
-extern bool iv_analyze_result (rtx_insn *, rtx, struct rtx_iv *);
+extern bool iv_analyze_result (rtx_insn *, const_rtx, struct rtx_iv *);
 extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode,
 			 struct rtx_iv *);
 extern rtx get_iv_value (struct rtx_iv *, rtx);
diff --git a/gcc/df-core.c b/gcc/df-core.c
index 1e84d4d..ecb6b15 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -1951,7 +1951,7 @@ df_bb_regno_last_def_find (basic_block bb, unsigned int regno)
DF is the dataflow object.  */
 
 df_ref
-df_find_def (rtx_insn *insn, rtx reg)
+df_find_def (rtx_insn *insn, const_rtx reg)
 {
   df_ref def;
 
diff --git a/gcc/df.h b/gcc/df.h
index 07fd334..8861cc9 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -979,7 +979,7 @@ extern void df_check_cfg_clean (void);
 #endif
 extern df_ref df_bb_regno_first_def_find (basic_block, unsigned int);
 extern df_ref df_bb_regno_last_def_find (basic_block, unsigned int);
-extern df_ref df_find_def (rtx_insn *, rtx);
+extern df_ref df_find_def (rtx_insn *, const_rtx);
 extern bool df_reg_defined (rtx_insn *, rtx);
 extern df_ref df_find_use (rtx_insn *, rtx);
 extern bool df_reg_used (rtx_insn *, rtx);
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 896fe0b1..745b613 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -1198,7 +1198,7 @@ iv_analyze (rtx_insn *insn, rtx val, struct rtx_iv *iv)
 /* Analyzes definition of DEF in INSN and stores the result to IV.  */
 
 bool
-iv_analyze_result (rtx_insn *insn, rtx def, struct rtx_iv *iv)
+iv_analyze_result (rtx_insn *insn, const_rtx def, struct rtx_iv *iv)
 {
   df_ref adef;
 
-- 
2.7.4



[RFC][PACH 3/5] Prevent tree unroller from completely unrolling inner loops if that results in excessive strided-loads in outer loop

2017-09-14 Thread Kugan Vivekanandarajah
This patch prevents the tree unroller from completely unrolling inner
loops if that results in excessive strided loads in the outer loop.

Thanks,
Kugan

gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  

* config/aarch64/aarch64.c (count_mem_load_streams): New.
(aarch64_ok_to_unroll): New.
* doc/tm.texi (ok_to_unroll): Define new target hook.
* doc/tm.texi.in (ok_to_unroll): Likewise.
* target.def (ok_to_unroll): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use
  ok_to_unroll while unrolling.
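
As a worked example of the limit check in aarch64_ok_to_unroll below
(illustrative numbers, not from the patch): with hw_prefetchers_avail = 7,
an outer loop contributing 2 load streams of its own, an inner loop with
3, and nunroll = 4, the check computes 2 + (4 - 1) * 3 = 11 > 7, so
complete unrolling is refused.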
From 5de245bbf6ba1768e8206a61feb0f42c106a1d94 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 18 Aug 2017 16:41:13 +1000
Subject: [PATCH 3/5] tree unroller limit strided loads

---
 gcc/config/aarch64/aarch64.c | 70 
 gcc/doc/tm.texi  |  4 +++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def   |  8 +
 gcc/tree-ssa-loop-ivcanon.c  |  8 +
 5 files changed, 92 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7d1ee70..e88bb6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -64,6 +64,7 @@
 #include "sched-int.h"
 #include "target-globals.h"
 #include "common/common-target.h"
+#include "tree-scalar-evolution.h"
 #include "selftest.h"
 #include "selftest-rtl.h"
 
@@ -15122,6 +15123,72 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+/* Count the strided loads in the LOOP with respect to OUT_LOOP.
+   If there are more strided loads than MAX_STRIDED_LOADS, we don't
+   need to count all of them.  */
+
+static unsigned
+count_mem_load_streams (struct loop *out_loop,
+			struct loop *loop,
+			unsigned max_strided_loads)
+{
+  basic_block *bbs = get_loop_body (loop);
+  unsigned nbbs = loop->num_nodes;
+  gimple_stmt_iterator gsi;
+  unsigned count = 0;
+
+  for (unsigned i = 0; i < nbbs; i++)
+{
+  bool ok;
+  basic_block bb = bbs[i];
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (!is_gimple_assign (stmt)
+	  || !gimple_vuse (stmt))
+	continue;
+	  tree op = gimple_assign_rhs1 (stmt);
+	  if (!INDIRECT_REF_P (op)
+	  && TREE_CODE (op) != MEM_REF
+	  && TREE_CODE (op) != TARGET_MEM_REF)
+	continue;
+	  op = TREE_OPERAND (op, 0);
+	  tree ev = analyze_scalar_evolution (out_loop, op);
+	  ev = instantiate_parameters (loop, ev);
+	  if (no_evolution_in_loop_p (ev, out_loop->num, &ok) && !ok)
+	count++;
+	  if (count >= max_strided_loads)
+	return count;
+	}
+}
+  return count;
+}
+
+/* Target hook that prevents complete loop unrolling if it would give
+   the outer loop more prefetch streams than the hardware can handle.  */
+
+static bool
+aarch64_ok_to_unroll (struct loop *loop, unsigned HOST_WIDE_INT nunroll)
+{
+  struct loop *loop_father;
+  unsigned loads;
+  unsigned outter_loads;
+
+  if (aarch64_tune_params.prefetch->hw_prefetchers_avail == -1)
+return true;
+
+  if ((loop_father = loop_outer (loop)))
+{
+  unsigned max_strided_loads = aarch64_tune_params.prefetch->hw_prefetchers_avail;
+  loads = count_mem_load_streams (loop_father, loop, max_strided_loads);
+  outter_loads = count_mem_load_streams (loop_father, loop_father, max_strided_loads);
+  if ((outter_loads + (nunroll - 1) * loads) > max_strided_loads)
+	return false;
+}
+  return true;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15550,6 +15617,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
 
+#undef TARGET_OK_TO_UNROLL
+#define TARGET_OK_TO_UNROLL aarch64_ok_to_unroll
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 795e492..45cea4c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11617,6 +11617,10 @@ is required only when the target has special constraints like maximum
 number of memory accesses.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OK_TO_UNROLL (struct loop *@var{loop_info}, unsigned HOST_WIDE_INT @var{nunroll})
+This hook should return @code{false} if the target prefers that the loop not be unrolled.
+@end deftypefn
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 98f2e6b..64dfa51 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8155,6 +8155,8 @@ build_type_attribute_variant (@var{mdecl},
 
 @hook TARGET_LOOP_UNROLL_ADJUST
 
+@hook TARGET_OK_TO_UNROLL
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point 

[RFC][AARCH64][PATCH 2/5]: Add number of hw prefetchers available to cpu_prefetch_tune

2017-09-14 Thread Kugan Vivekanandarajah
This patch adds number of hw prefetchers available to
cpu_prefetch_tune so it can be used in loop unrolling decisions.

Thanks,
Kugan

gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  

* config/aarch64/aarch64-protos.h (struct cpu_prefetch_tune): Add
  new field hw_prefetchers_avail.
* config/aarch64/aarch64.c: Add values for hw_prefetchers_avail.
From 07de7988c4c36a8eb262d53c259dc17d20d3b770 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 25 Aug 2017 10:02:45 +1000
Subject: [PATCH 2/5] Add hw prefetchers to cpu_prefetch_tune

---
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64.c| 18 --
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e397ff4..a182105 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -211,6 +211,7 @@ struct cpu_prefetch_tune
   const int l1_cache_line_size;
   const int l2_cache_size;
   const int default_opt_level;
+  const int hw_prefetchers_avail;
 };
 
 struct tune_params
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d753666..7d1ee70 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -533,7 +533,8 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune exynosm1_prefetch_tune =
@@ -542,7 +543,8 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune qdf24xx_prefetch_tune =
@@ -551,7 +553,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
-  3			/* default_opt_level  */
+  3,			/* default_opt_level  */
+  7			/* hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -560,7 +563,8 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
-  3			/* default_opt_level  */
+  3,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderx_prefetch_tune =
@@ -569,7 +573,8 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
@@ -578,7 +583,8 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const struct tune_params generic_tunings =
-- 
2.7.4



[RFC][PATCH 1/5] Add separate parms for rtl unroller

2017-09-14 Thread Kugan Vivekanandarajah
This patch adds separate params for the rtl unroller so that they can be
tuned independently.  The default values I have are based on some testing
on aarch64.  I am happy to leave them at the current values and set them
in the back-end.

Thanks,
Kugan


gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  

* loop-unroll.c (decide_unroll_constant_iterations): Use new params.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
* params.def (DEFPARAM): Separate and add new params for rtl unroller.
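
For reference, the new params can be overridden on the command line like
the existing ones, e.g. (hypothetical values):

  gcc -O2 -funroll-loops --param max-partial-unroll-times=2 \
      --param max-partial-unrolled-insns=60 foo.c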
From a899caf9f82767de3db556225b28dc52a81d5967 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Mon, 14 Aug 2017 10:12:09 +1000
Subject: [PATCH 1/5] add parms for rtl unroller

---
 gcc/loop-unroll.c | 24 
 gcc/params.def    | 17 +
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 84145bb..871558c 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -360,13 +360,13 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
  unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
   nunroll_by_av
-= PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+= PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
 nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -664,12 +664,12 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
  unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
-  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
+  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
 nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -1158,13 +1158,13 @@ decide_unroll_stupid (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
  unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
   nunroll_by_av
-= PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+= PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
 nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
 nunroll = targetm.loop_unroll_adjust (nunroll, loop);
diff --git a/gcc/params.def b/gcc/params.def
index 805302b..c8b0a2b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -302,6 +302,23 @@ DEFPARAM(PARAM_MAX_PEELED_INSNS,
 	"max-peeled-insns",
 	"The maximum number of insns of a peeled loop.",
 	100, 0, 0)
+
+DEFPARAM(PARAM_MAX_UNROLLEDP_INSNS,
+	 "max-partial-unrolled-insns",
+	 "The maximum number of instructions to consider to unroll in a loop by rtl unroller.",
+	 100, 0, 0)
+/* This parameter limits how many times the loop is unrolled depending
+   on number of insns really executed in each iteration.  */
+DEFPARAM(PARAM_MAX_AVERAGE_UNROLLEDP_INSNS,
+	 "max-partial-average-unrolled-insns",
+	 "The maximum number of instructions to consider to unroll in a loop on average by rtl unroller.",
+	 40, 0, 0)
+/* The maximum number of unrollings of a single loop.  */
+DEFPARAM(PARAM_MAX_UNROLLP_TIMES,
+	"max-partial-unroll-times",
+	"The maximum number of unrollings of a single loop by rtl unroller.",
+	4, 0, 0)
+
 /* The maximum number of peelings of a single loop.  */
 DEFPARAM(PARAM_MAX_PEEL_TIMES,
 	"max-peel-times",
-- 
2.7.4



[RFC][PATCH 0/5] Loop unrolling and memory load streams

2017-09-14 Thread Kugan Vivekanandarajah
While loop unrolling helps to keep the pipeline busy in modern
processors, it can also increase the number of memory streams, resulting
in collisions for the hardware prefetcher that hurt performance.
This patch series tries to detect this and limit the loop unrolling.

Patch 1: Add separate params for the rtl unroller.

Patch 2: Add the number of hw prefetchers available to cpu_prefetch_tune
so it can be used in loop unrolling decisions.

Patch 3: Prevent the tree unroller from completely unrolling inner loops
if that results in excessive strided loads in the outer loop.

Patch 4: Change iv_analyze_result to take const_rtx.  This is just to
make the next patch compile; no functional changes.

Patch 5: Add aarch64_loop_unroll_adjust to limit partial unrolling in
rtl based on strided loads in the loop.

Bootstrapped and tested on aarch64-linux-gnu (with
-funroll-all-loops). Testing on x86_64-linux-gnu ongoing.

Thanks,
Kugan


Re: [PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Alan Modra
On Thu, Sep 14, 2017 at 11:39:54AM -0500, Segher Boessenkool wrote:
> [ pressed send too early ]
> 
> On Thu, Sep 14, 2017 at 10:18:55AM -0500, Pat Haugen wrote:
> > --- gcc/config/rs6000/rs6000.c  (revision 252029)
> > +++ gcc/config/rs6000/rs6000.c  (working copy)
> > @@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
> >  add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
> >if (cfun->machine->split_stack_argp_used)
> >  add_to_hard_reg_set (&set->set, Pmode, 12);
> > +
> > +  /* Make sure the hard reg set doesn't include r2, which was possibly added
> > +     via PIC_OFFSET_TABLE_REGNUM.  */
> > +  if (TARGET_TOC)
> > +    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
> >  }
> 
> And why is the problem in PR51872 no longer there?  Or is it?

This code in rs6000_set_up_by_prologue:

  if (!TARGET_SINGLE_PIC_BASE
  && TARGET_TOC
  && TARGET_MINIMAL_TOC
  && !constant_pool_empty_p ())
add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);

adds r30, the -mminimal-toc toc pointer register set up by the
prologue.  Pat's change removes r2.  Which looks correct to me.

-- 
Alan Modra
Australia Development Lab, IBM


Re: libgo patch committed: Upgrade to Go 1.9 release

2017-09-14 Thread Ian Lance Taylor
On Thu, Sep 14, 2017 at 3:19 PM, Rainer Orth
 wrote:
>
>> I've committed a patch to libgo to upgrade it to the recent Go 1.9 release.
>>
>> As usual with these upgrades, the patch is too large to attach here.
>> I've attached the changes to files that are more or less specific to
>> gccgo.
>>
>> This upgrade required some changes to the gotools Makefile.  And one
>> test had to be updated.  These patches are also below.
>>
>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>> to mainline.
>
> the patch broke Solaris bootstrap:
>
> /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_unix.go:240:11: error: 
> reference to undefined name 'forkExecPipe'
>   if err = forkExecPipe(p[:]); err != nil {
>^
>
> libgo/go/syscall/forkpipe_bsd.go is needed on Solaris, too.
>
> /vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/link.go:73:10: 
> error: use of undefined type 'lifnum'
>   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES 
> | sysLIFC_UNDER_IPMP}
>   ^
> make[8]: *** [Makefile:3349: golang_org/x/net/lif.lo] Error 1
>
> The Go 1.9 upgrade patch has
>
> @@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
>
>  func links(eps []endpoint, name string) ([]Link, error) {
> var lls []Link
> -   lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
> +   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
>
> Reverting that allows link.go to compile.
>
> /vol/gcc/src/hg/trunk/local/libgo/go/internal/poll/fd_unix.go:366:21: error: 
> reference to undefined identifier 'syscall.ReadDirent'
>n, err := syscall.ReadDirent(fd.Sysfd, buf)
>  ^
>
> I don't yet see where this comes from on non-Linux systems...

It's in forkpipe_bsd.go.  Does this patch fix the problem?

Ian
diff --git a/libgo/go/syscall/forkpipe_bsd.go b/libgo/go/syscall/forkpipe_bsd.go
index d4180722..28897bfd 100644
--- a/libgo/go/syscall/forkpipe_bsd.go
+++ b/libgo/go/syscall/forkpipe_bsd.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly netbsd openbsd
+// +build darwin dragonfly netbsd openbsd solaris
 
 package syscall
 
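
For reference, the +build line is a Go build constraint: a file carrying
it is compiled only for the listed GOOS values, which is why adding
solaris pulls this file into the Solaris build.  A minimal sketch of the
mechanism (hypothetical file, not part of the patch):

// +build solaris

package syscall

// Declarations in this file are compiled only when GOOS=solaris
// (pre-Go-1.17 build-tag syntax, as libgo uses here).
const hypotheticalSolarisOnly = true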


Re: [PATCH], Add support for __builtin_{sqrt,fma}f128 on PowerPC ISA 3.0

2017-09-14 Thread Michael Meissner
On Thu, Sep 14, 2017 at 09:54:14AM -0500, Segher Boessenkool wrote:
> On Wed, Sep 13, 2017 at 05:46:00PM -0400, Michael Meissner wrote:
> > This patch adds support on PowerPC ISA 3.0 for the built-in function
> > __builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction 
> > and
> > the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
> > XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.
> > 
> > While I was at it, I changed the documentation so that it no longer 
> > documents
> > the 'q' built-in functions (to mirror libquadmath) but instead just 
> > documents
> > the 'f128' functions that match glibc 2.26 and the technical report that
> > added the _Float128 type.
> > 
> > I changed the tests that used __fabsq to use __fabsf128 instead.
> > 
> > I also added && lp64 to float128-5.c so that it doesn't cause errors when 
> > doing
> > the test for a 32-bit target.  This is due to the fact that if you enable
> > hardware IEEE 128-bit floating point, you eventually will need TImode
> > supported, and that is not supported on 32-bit targets.
> > 
> > I did a bootstrap and make check with subversion id 252033 on a little 
> > endian
> > power8 system.  The subversion id 252033 is one of the last svn ids that
> > bootstrap without additional patches on the PowerPC.  There were no 
> > regressions
> > in this patch, and I verified the 4 new tests were run.  Can I check this 
> > patch
> > into the trunk?
> 
> Yes please.  A few trivial things:
> 
> > * doc/extend.texi (RS/6000 built-in functions): Document the
> > 'f128' IEEE 128-bit floating point built-in functions.  Don't
> > document the older 'q' versions of the functions. Document the
> > built-in IEEE 128-bit floating point square root and fused
> > multiply-add built-ins.
> 
> Dot space space.
> 
> > +/* 1 argument IEEE 128-bit floating point functions that require ISA 3.0
> > +   hardware.  We define both a 'q' version for libquadmath compatibility, 
> > and a
> > +   'f128' for glibc 2.26.  We didn't need this for FABS/COPYSIGN, since the
> > +   machine independent built-in support already defines the F128 versions, 
> >  */
> 
> Dot instead of comma?
> 
> > --- gcc/testsuite/gcc.target/powerpc/float128-5.c   (revision 252730)
> > +++ gcc/testsuite/gcc.target/powerpc/float128-5.c   (working copy)
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target { powerpc*-*-linux* } } } */
> > +/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
> 
> Maybe add a comment why this is -m64 only?

I removed the changes to the documentation that didn't pertain to the functions
I was adding (we can clean that up some other time), and added a -m64 comment.

I checked this into the trunk:

[gcc]
2017-09-14  Michael Meissner  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_1_HW): New macros
to support float128 built-in functions that require the ISA 3.0
hardware.
(BU_FLOAT128_3_HW): Likewise.
(SQRTF128): Add support for the IEEE 128-bit square root and fma
built-in functions.
(FMAF128): Likewise.
(FMAQ): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for built-in functions that need the ISA 3.0 IEEE 128-bit
floating point instructions.
(rs6000_invalid_builtin): Likewise.
(rs6000_builtin_mask_names): Likewise.
* config/rs6000/rs6000.h (MASK_FLOAT128_HW): Likewise.
(RS6000_BTM_FLOAT128_HW): Likewise.
(RS6000_BTM_COMMON): Likewise.
* config/rs6000/rs6000.md (fma<mode>4_hw): Add a generator
function.
* doc/extend.texi (RS/6000 built-in functions): Document the
'f128' IEEE 128-bit floating point built-in functions. Don't
document the older 'q' versions of the functions. Document the
built-in IEEE 128-bit floating point square root and fused
multiply-add built-ins.

[gcc/testsuite]
2017-09-14  Michael Meissner  

* gcc.target/powerpc/abs128-1.c: Use __builtin_fabsf128 instead of
__builtin_fabsq.
* gcc.target/powerpc/float128-5.c: Use __builtin_fabsf128 instead
of __builtin_fabsq.  Prevent the test from running on 32-bit.
* gcc.target/powerpc/float128-fma1.c: New test.
* gcc.target/powerpc/float128-fma2.c: Likewise.
* gcc.target/powerpc/float128-sqrt1.c: Likewise.
* gcc.target/powerpc/float128-sqrt2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 252768)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -667,6 +667,23 @@
 | RS6000_BTC_UNARY),

Re: libgo patch committed: Upgrade to Go 1.9 release

2017-09-14 Thread Rainer Orth
Hi Ian,

> I've committed a patch to libgo to upgrade it to the recent Go 1.9 release.
>
> As usual with these upgrades, the patch is too large to attach here.
> I've attached the changes to files that are more or less specific to
> gccgo.
>
> This upgrade required some changes to the gotools Makefile.  And one
> test had to be updated.  These patches are also below.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

the patch broke Solaris bootstrap:

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_unix.go:240:11: error: 
reference to undefined name 'forkExecPipe'
  if err = forkExecPipe(p[:]); err != nil {
   ^

libgo/go/syscall/forkpipe_bsd.go is needed on Solaris, too.

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/link.go:73:10: error: 
use of undefined type 'lifnum'
  lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | 
sysLIFC_UNDER_IPMP}
  ^
make[8]: *** [Makefile:3349: golang_org/x/net/lif.lo] Error 1

The Go 1.9 upgrade patch has

@@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
 
 func links(eps []endpoint, name string) ([]Link, error) {
var lls []Link
-   lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
+   lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}

Reverting that allows link.go to compile.

/vol/gcc/src/hg/trunk/local/libgo/go/internal/poll/fd_unix.go:366:21: error: 
reference to undefined identifier 'syscall.ReadDirent'
   n, err := syscall.ReadDirent(fd.Sysfd, buf)
 ^

I don't yet see where this comes from on non-Linux systems...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


diff --git a/libgo/go/golang_org/x/net/lif/link.go b/libgo/go/golang_org/x/net/lif/link.go
--- a/libgo/go/golang_org/x/net/lif/link.go
+++ b/libgo/go/golang_org/x/net/lif/link.go
@@ -70,7 +70,7 @@ func Links(af int, name string) ([]Link,
 
 func links(eps []endpoint, name string) ([]Link, error) {
 	var lls []Link
-	lifn := lifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
+	lifn := sysLifnum{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
 	lifc := lifconf{Flags: sysLIFC_NOXMIT | sysLIFC_TEMPORARY | sysLIFC_ALLZONES | sysLIFC_UNDER_IPMP}
 	for _, ep := range eps {
 		lifn.Family = uint16(ep.af)
diff --git a/libgo/go/syscall/forkpipe_bsd.go b/libgo/go/syscall/forkpipe_bsd.go
--- a/libgo/go/syscall/forkpipe_bsd.go
+++ b/libgo/go/syscall/forkpipe_bsd.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly netbsd openbsd
+// +build darwin dragonfly netbsd openbsd solaris
 
 package syscall
 


Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-14 Thread Jakub Jelinek
On Thu, Sep 14, 2017 at 10:32:12PM +0100, Pedro Alves wrote:
> On 09/14/2017 09:26 PM, Jakub Jelinek wrote:
> > +@item c++17
> > +@itemx c++1z
> > +The 2017 ISO C++ standard plus amendments.
> > +The name @samp{c++1z} is deprecated.
> > +
> > +@item gnu++17
> > +@itemx gnu++1z
> > +GNU dialect of @option{-std=c++17}.
> > +The name @samp{gnu++17} is deprecated.
> >  @end table
> 
> I think you meant to say that gnu++1z is deprecated, not gnu++17.

Fixed in my copy, thanks for catching that.

Jakub


Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-14 Thread Pedro Alves
On 09/14/2017 09:26 PM, Jakub Jelinek wrote:
> +@item c++17
> +@itemx c++1z
> +The 2017 ISO C++ standard plus amendments.
> +The name @samp{c++1z} is deprecated.
> +
> +@item gnu++17
> +@itemx gnu++1z
> +GNU dialect of @option{-std=c++17}.
> +The name @samp{gnu++17} is deprecated.
>  @end table

I think you meant to say that gnu++1z is deprecated, not gnu++17.

Thanks,
Pedro Alves



Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-14 Thread Jakub Jelinek
On Thu, Sep 14, 2017 at 02:24:01PM -0700, Mike Stump wrote:
> > --- gcc/doc/invoke.texi.jj  2017-09-12 21:57:57.0 +0200
> > +++ gcc/doc/invoke.texi 2017-09-14 19:32:34.342959968 +0200
> > @@ -1870,15 +1870,15 @@ GNU dialect of @option{-std=c++14}.
> > This is the default for C++ code.
> > The name @samp{gnu++1y} is deprecated.
> > 
> > -@item c++1z
> > -The next revision of the ISO C++ standard, tentatively planned for
> > -2017.  Support is highly experimental, and will almost certainly
> > -change in incompatible ways in future releases.
> > -
> > -@item gnu++1z
> > -GNU dialect of @option{-std=c++1z}.  Support is highly experimental,
> > -and will almost certainly change in incompatible ways in future
> > -releases.
> > +@item c++17
> > +@itemx c++1z
> > +The 2017 ISO C++ standard plus amendments.
> > +The name @samp{c++1z} is deprecated.
> > +
> > +@item gnu++17
> > +@itemx gnu++1z
> > +GNU dialect of @option{-std=c++17}.
> > +The name @samp{gnu++17} is deprecated.
> 
> I'd be tempted to say leave all this, and march 1z -> 2a for the _next_ 
> standard.  2020 or so is a good first stab at the date.

I didn't want to add c++2a and gnu++2a in the same patch; they can be added
incrementally, re-adding the above wording.  Unless somebody else is planning
to do that, I can do that next.

> > -or an unspecified value strictly larger than @code{201402L} for the
> > -experimental languages enabled by @option{-std=c++1z} and
> > -@option{-std=gnu++1z}.
> > +@code{201703L} for the 2017 C++ standard.
> 
> Likewise.

Likewise.

Jakub


Re: [C++ PATCH] Renames/adjustments of 1z to 17

2017-09-14 Thread Mike Stump
On Sep 14, 2017, at 1:26 PM, Jakub Jelinek  wrote:
> 
> Given https://herbsutter.com/2017/09/06/c17-is-formally-approved/
> this patch makes -std=c++17 and -std=gnu++17 the documented options

> --- gcc/doc/invoke.texi.jj2017-09-12 21:57:57.0 +0200
> +++ gcc/doc/invoke.texi   2017-09-14 19:32:34.342959968 +0200
> @@ -1870,15 +1870,15 @@ GNU dialect of @option{-std=c++14}.
> This is the default for C++ code.
> The name @samp{gnu++1y} is deprecated.
> 
> -@item c++1z
> -The next revision of the ISO C++ standard, tentatively planned for
> -2017.  Support is highly experimental, and will almost certainly
> -change in incompatible ways in future releases.
> -
> -@item gnu++1z
> -GNU dialect of @option{-std=c++1z}.  Support is highly experimental,
> -and will almost certainly change in incompatible ways in future
> -releases.
> +@item c++17
> +@itemx c++1z
> +The 2017 ISO C++ standard plus amendments.
> +The name @samp{c++1z} is deprecated.
> +
> +@item gnu++17
> +@itemx gnu++1z
> +GNU dialect of @option{-std=c++17}.
> +The name @samp{gnu++17} is deprecated.

I'd be tempted to say leave all this, and march 1z -> 2a for the _next_ 
standard.  2020 or so is a good first stab at the date.

> -or an unspecified value strictly larger than @code{201402L} for the
> -experimental languages enabled by @option{-std=c++1z} and
> -@option{-std=gnu++1z}.
> +@code{201703L} for the 2017 C++ standard.

Likewise.


Anyway, the testsuite portion is obvious and I reviewed it for correctness and 
Ok.



Re: [PATCH, rs6000 version 2] Add support for vec_xst_len_r() and vec_xl_len_r() builtins

2017-09-14 Thread Carl Love

GCC maintainers:

Here is an updated patch to address the comment from Segher.  The one
comment that was not addressed was:

>> +(define_insn "altivec_lvsl_reg"
>> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
>> +   (unspec:V16QI
>> +   [(match_operand:DI 1 "gpc_reg_operand" "b")]
>> +   UNSPEC_LVSL_REG))]
>> +  "TARGET_ALTIVEC"
>> +  "lvsl %0,0,%1"
>> +  [(set_attr "type" "vecload")])

vecload isn't really the correct type for this, but I see we have the
same on the existing lvsl patterns (it's permute unit on p9; I expect
the same on p8 and older, but please check).

Per our additional discussions Segher said:

> You can leave it as vecload just like the other lvsl's we have, leave
> the cleanup for a later date.

I believe everything else has been addressed.  The patch was retested on
powerpc64le-unknown-linux-gnu (Power 9 LE) and
powerpc64le-unknown-linux-gnu (Power 8 LE) without regressions.

Let me know if there are additional issues that need addressing.
Thanks.

  Carl Love

--

gcc/ChangeLog:

2017-09-14  Carl Love  

* config/rs6000/rs6000-c.c (P9V_BUILTIN_VEC_XL_LEN_R,
P9V_BUILTIN_VEC_XST_LEN_R): Add support for builtins
vector unsigned char vec_xl_len_r (unsigned char *, size_t);
void vec_xst_len_r (vector unsigned char, unsigned char *, size_t);
* config/rs6000/altivec.h (vec_xl_len_r, vec_xst_len_r): Add defines.
* config/rs6000/rs6000-builtin.def (XL_LEN_R, XST_LEN_R): Add
definitions and overloading.
* config/rs6000/rs6000.c (altivec_expand_builtin): Add case
statement for P9V_BUILTIN_XST_LEN_R.
(altivec_init_builtins): Add def_builtin for P9V_BUILTIN_STXVLL.
* config/rs6000/vsx.md (lxvll, stxvll, xl_len_r, xst_len_r): Add
define_expand and define_insn for the instructions and builtins.
* doc/extend.texi: Update the built-in documentation file for the new
built-in functions.
* config/rs6000/altivec.md (altivec_lvsl_reg, altivec_lvsr_reg): Add
define_insn for the instructions

gcc/testsuite/ChangeLog:

2017-09-14  Carl Love  

* gcc.target/powerpc/builtins-5-p9-runnable.c: Add new runnable test file
for the new built-ins and the existing built-ins.
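
A minimal usage sketch of the two new built-ins (my illustration, not
from the testsuite; uses the signatures declared above and assumes ISA
3.0 on a 64-bit target):

#include <altivec.h>
#include <stddef.h>

vector unsigned char
copy_len_r (unsigned char *dst, unsigned char *src, size_t n)
{
  /* Load n bytes right-justified, then store them back the same way.  */
  vector unsigned char v = vec_xl_len_r (src, n);
  vec_xst_len_r (v, dst, n);
  return v;
}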
---
 gcc/config/rs6000/altivec.h                        |   2 +
 gcc/config/rs6000/altivec.md                       |  18 ++
 gcc/config/rs6000/rs6000-builtin.def               |   4 +
 gcc/config/rs6000/rs6000-c.c                       |   8 +
 gcc/config/rs6000/rs6000.c                         |  11 +-
 gcc/config/rs6000/vsx.md                           | 114 
 gcc/doc/extend.texi                                |   4 +
 .../gcc.target/powerpc/builtins-5-p9-runnable.c    | 309 +
 8 files changed, 468 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-5-p9-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index c8e508cf0..94a4db24a 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -467,6 +467,8 @@
 #ifdef _ARCH_PPC64
 #define vec_xl_len __builtin_vec_lxvl
 #define vec_xst_len __builtin_vec_stxvl
+#define vec_xl_len_r __builtin_vec_xl_len_r
+#define vec_xst_len_r __builtin_vec_xst_len_r
 #endif
 
 #define vec_cmpnez __builtin_vec_vcmpnez
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0aa1e3016..3436c0dfd 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2542,6 +2542,15 @@
   DONE;
 })
 
+(define_insn "altivec_lvsl_reg"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
+   (unspec:V16QI
+   [(match_operand:DI 1 "gpc_reg_operand" "b")]
+   UNSPEC_LVSL_REG))]
+  "TARGET_ALTIVEC"
+  "lvsl %0,0,%1"
+  [(set_attr "type" "vecload")])
+
 (define_insn "altivec_lvsl_direct"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(unspec:V16QI [(match_operand:V16QI 1 "memory_operand" "Z")]
@@ -2574,6 +2583,15 @@
   DONE;
 })
 
+(define_insn "altivec_lvsr_reg"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
+   (unspec:V16QI
+   [(match_operand:DI 1 "gpc_reg_operand" "b")]
+   UNSPEC_LVSR_REG))]
+  "TARGET_ALTIVEC"
+  "lvsr %0,0,%1"
+  [(set_attr "type" "vecload")])
+
 (define_insn "altivec_lvsr_direct"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(unspec:V16QI [(match_operand:V16QI 1 "memory_operand" "Z")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 850164a09..8f87ccea4 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2125,6 +2125,7 @@ BU_P9V_OVERLOAD_2 (VIESP, "insert_exp_sp")
 
 /* 2 argument vector functions added in ISA 3.0 (power9).  */
 BU_P9V_64BIT_VSX_2 (LXVL,  "lxvl", CONST,  lxvl)

Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-14 Thread Will Schmidt
On Thu, 2017-09-14 at 09:38 -0500, Bill Schmidt wrote:
> On Sep 14, 2017, at 5:15 AM, Richard Biener  
> wrote:
> > 
> > On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
> >  wrote:
> >> On Sep 13, 2017, at 10:40 AM, Bill Schmidt  
> >> wrote:
> >>> 
> >>> On Sep 13, 2017, at 7:23 AM, Richard Biener  
> >>> wrote:
>  
>  On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>   wrote:
> > Hi,
> > 
> > [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
> > 
> > Folding of vector loads in GIMPLE.
> > 
> > Add code to handle gimple folding for the vec_ld builtins.
> > Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
> > Surrounding
> > comments have been adjusted slightly so they continue to read OK for the
> > existing vec_st code.
> > 
> > The resulting code is specifically verified by the 
> > powerpc/fold-vec-ld-*.c
> > tests which have been posted separately.
> > 
> > For V2 of this patch, I've removed the chunk of code that prohibited the
> > gimple fold from occurring in BE environments.   This had fixed an issue
> > for me earlier during my development of the code, and turns out this was
> > not necessary.  I've sniff-tested after removing that check and it looks
> > OK.
> > 
> >> + /* Limit folding of loads to LE targets.  */
> >> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
> >> +    return false;
> > 
> > I've restarted a regression test on this updated version.
> > 
> > OK for trunk (assuming successful regression test completion)  ?
> > 
> > Thanks,
> > -Will
> > 
> > [gcc]
> > 
> >  2017-09-12  Will Schmidt  
> > 
> >  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
> >  for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
> >  * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> >  Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> > index fbab0a2..bb8a77d 100644
> > --- a/gcc/config/rs6000/rs6000-c.c
> > +++ b/gcc/config/rs6000/rs6000-c.c
> > @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
> > loc, tree fndecl,
> >   convert (TREE_TYPE (stmt), arg0));
> > stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> > return stmt;
> >   }
> > 
> > -  /* Expand vec_ld into an expression that masks the address and
> > - performs the load.  We need to expand this early to allow
> > +  /* Expand vec_st into an expression that masks the address and
> > + performs the store.  We need to expand this early to allow
> >the best aliasing, as by the time we get into RTL we no longer
> >are able to honor __restrict__, for example.  We may want to
> >consider this for all memory access built-ins.
> > 
> >When -maltivec=be is specified, or the wrong number of arguments
> >is provided, simply punt to existing built-in processing.  */
> > -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
> > -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> > -  && nargs == 2)
> > -{
> > -  tree arg0 = (*arglist)[0];
> > -  tree arg1 = (*arglist)[1];
> > -
> > -  /* Strip qualifiers like "const" from the pointer arg.  */
> > -  tree arg1_type = TREE_TYPE (arg1);
> > -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
> > ARRAY_TYPE)
> > -   goto bad;
> > -
> > -  tree inner_type = TREE_TYPE (arg1_type);
> > -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
> > -   {
> > - arg1_type = build_pointer_type (build_qualified_type 
> > (inner_type,
> > -   0));
> > - arg1 = fold_convert (arg1_type, arg1);
> > -   }
> > -
> > -  /* Construct the masked address.  Let existing error handling 
> > take
> > -over if we don't have a constant offset.  */
> > -  arg0 = fold (arg0);
> > -
> > -  if (TREE_CODE (arg0) == INTEGER_CST)
> > -   {
> > - if (!ptrofftype_p (TREE_TYPE (arg0)))
> > -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
> > -
> > - tree arg1_type = TREE_TYPE (arg1);
> > - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
> > -   {
> > - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
> > - tree const0 = build_int_cstu (sizetype, 0);
> > - tree arg1_elt0 = build_array_ref (loc, arg1, const0);

[committed] Fix handling of reference vars in C++ implicit task/taskloop firstprivate (PR c++/81314)

2017-09-14 Thread Jakub Jelinek
Hi!

For firstprivate vars, even when implicit, the privatized entity is
what the reference refers to; if its copy ctor or dtor need instantiation,
doing this at gimplification time is too late, therefore we should handle
it during genericization like we handle non-reference firstprivatized vars.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for backporting.

2017-09-14  Jakub Jelinek  

PR c++/81314
* cp-gimplify.c (omp_var_to_track): Look through references.
(omp_cxx_notice_variable): Likewise.

* testsuite/libgomp.c++/pr81314.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2017-09-01 09:26:24.0 +0200
+++ gcc/cp/cp-gimplify.c	2017-09-14 15:31:54.526100238 +0200
@@ -895,6 +895,8 @@ omp_var_to_track (tree decl)
   tree type = TREE_TYPE (decl);
   if (is_invisiref_parm (decl))
 type = TREE_TYPE (type);
+  else if (TREE_CODE (type) == REFERENCE_TYPE)
+type = TREE_TYPE (type);
   while (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
   if (type == error_mark_node || !CLASS_TYPE_P (type))
@@ -947,6 +949,8 @@ omp_cxx_notice_variable (struct cp_gener
  tree type = TREE_TYPE (decl);
  if (is_invisiref_parm (decl))
type = TREE_TYPE (type);
+ else if (TREE_CODE (type) == REFERENCE_TYPE)
+   type = TREE_TYPE (type);
  while (TREE_CODE (type) == ARRAY_TYPE)
type = TREE_TYPE (type);
  get_copy_ctor (type, tf_none);
--- libgomp/testsuite/libgomp.c++/pr81314.C.jj  2017-09-14 15:51:17.883604562 
+0200
+++ libgomp/testsuite/libgomp.c++/pr81314.C 2017-09-14 15:50:56.0 
+0200
@@ -0,0 +1,38 @@
+// PR c++/81314
+// { dg-do link }
+
+template <int N>
+struct S {
+  S () { s = 0; }
+  S (const S &x) { s = x.s; }
+  ~S () {}
+  int s;
+};
+
+void
+foo (S<2> &x)
+{
+  #pragma omp taskloop
+  for (int i = 0; i < 100; ++i)
+x.s++;
+}
+
+void
+bar (S<3> &x)
+{
+  #pragma omp task
+  x.s++;
+}
+
+int
+main ()
+{
+  S<2> s;
+  S<3> t;
+  #pragma omp parallel
+  #pragma omp master
+  {
+foo (s);
+bar (t);
+  }
+}

Jakub


[C++ PATCH] Fix compile time hog in replace_placeholders (PR sanitizer/81929)

2017-09-14 Thread Jakub Jelinek
Hi!

When the expression that replace_placeholders is called on contains
many SAVE_EXPRs that appear more than once in the tree, we hang walking them
over and over again, while it is sufficient to just walk it without
duplicates (not using cp_walk_tree_without_duplicates, because the callback
can cp_walk_tree* again and we want to use the same pointer set between
those).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-09-14  Jakub Jelinek  

PR sanitizer/81929
* tree.c (struct replace_placeholders_t): Add pset field.
(replace_placeholders_r): Call cp_walk_tree with d->pset as
last argument instead of NULL.  Formatting fix.
(replace_placeholders): Add pset variable, add its address
into data.  Pass &pset instead of NULL to cp_walk_tree.

* g++.dg/ubsan/pr81929.C: New test.

--- gcc/cp/tree.c.jj2017-09-12 09:35:47.0 +0200
+++ gcc/cp/tree.c   2017-09-14 17:38:07.717064412 +0200
@@ -3063,6 +3063,7 @@ struct replace_placeholders_t
 {
   tree obj;/* The object to be substituted for a PLACEHOLDER_EXPR.  */
   bool seen;   /* Whether we've encountered a PLACEHOLDER_EXPR.  */
+  hash_set<tree> *pset;	/* To avoid walking same trees multiple times.  */
 };
 
 /* Like substitute_placeholder_in_expr, but handle C++ tree codes and
@@ -3085,8 +3086,8 @@ replace_placeholders_r (tree* t, int* wa
 case PLACEHOLDER_EXPR:
   {
tree x = obj;
-   for (; !(same_type_ignoring_top_level_qualifiers_p
-(TREE_TYPE (*t), TREE_TYPE (x)));
+   for (; !same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (*t),
+  TREE_TYPE (x));
 x = TREE_OPERAND (x, 0))
  gcc_assert (TREE_CODE (x) == COMPONENT_REF);
*t = x;
@@ -3118,8 +3119,7 @@ replace_placeholders_r (tree* t, int* wa
	  valp = &TARGET_EXPR_INITIAL (*valp);
  }
d->obj = subob;
-   cp_walk_tree (valp, replace_placeholders_r,
- data_, NULL);
+   cp_walk_tree (valp, replace_placeholders_r, data_, d->pset);
d->obj = obj;
  }
*walk_subtrees = false;
@@ -3151,10 +3151,11 @@ replace_placeholders (tree exp, tree obj
 return exp;
 
   tree *tp = &exp;
-  replace_placeholders_t data = { obj, false };
+  hash_set<tree> pset;
+  replace_placeholders_t data = { obj, false, &pset };
   if (TREE_CODE (exp) == TARGET_EXPR)
 tp = &TARGET_EXPR_INITIAL (exp);
-  cp_walk_tree (tp, replace_placeholders_r, &data, NULL);
+  cp_walk_tree (tp, replace_placeholders_r, &data, &pset);
   if (seen_p)
 *seen_p = data.seen;
   return exp;
--- gcc/testsuite/g++.dg/ubsan/pr81929.C.jj 2017-09-14 17:48:09.052611540 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr81929.C2017-09-14 17:49:21.644711332 
+0200
@@ -0,0 +1,14 @@
+// PR sanitizer/81929
+// { dg-do compile }
+// { dg-options "-std=c++14 -fsanitize=undefined" }
+
+struct S { S &operator<< (long); S foo (); S (); };
+
+void
+bar ()
+{
+  static_cast<S &&>(S () << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
+  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
+  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
+  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0).foo ();
+}

Jakub


Re: Rb_tree constructor optimization

2017-09-14 Thread François Dumont
I realized there was no test of the noexcept qualification of the
allocator-extended move constructor.


I added some and found out that the patch was missing a noexcept
qualification at the _Rb_tree level.


Here is the updated patch, fully tested.  OK to commit?

François


On 13/09/2017 21:57, François Dumont wrote:

On 08/09/2017 17:50, Jonathan Wakely wrote:


Since we know __a == __x.get_allocator() we could just do:

 _Rb_tree(_Rb_tree&& __x, _Node_allocator&&, true_type)
noexcept(is_nothrow_move_constructible<_Rb_tree_impl<_Compare>>::value)
 : _M_impl(std::move(__x._M_impl))
 { }

This means we don't need the new constructor.


You want to consider that an always-equal allocator is stateless,
and so that the provided allocator rvalue reference does not need to be
moved.  IMHO you can have an allocator with state that does not
participate in comparison, like some monitoring info.
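
To make that concrete, here is a sketch of such an allocator (invented
for illustration, not from the patch): allocator_traits reports it
always-equal, yet each instance carries a counter that only a real move
of the rvalue argument would carry over.

  #include <cstddef>
  #include <memory>
  #include <type_traits>

  // Instances always compare equal, so containers may treat them as
  // interchangeable; the allocation counter is monitoring info only.
  template<typename T>
  struct counting_alloc
  {
    typedef T value_type;
    typedef std::true_type is_always_equal;  // picked up by allocator_traits

    std::size_t allocations = 0;  // state that comparison ignores

    counting_alloc () = default;
    template<typename U>
    counting_alloc (const counting_alloc<U>&) { }

    T* allocate (std::size_t n)
    { ++allocations; return std::allocator<T> ().allocate (n); }

    void deallocate (T* p, std::size_t n)
    { std::allocator<T> ().deallocate (p, n); }
  };

  template<typename T, typename U>
  bool operator== (const counting_alloc<T>&, const counting_alloc<U>&)
  { return true; }

  template<typename T, typename U>
  bool operator!= (const counting_alloc<T>&, const counting_alloc<U>&)
  { return false; }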


I'm not comfortable with that, and prefer to keep the current behavior,
so I propose this new patch, taking all your other remarks into account.


I changed the noexcept qualification on the [multi]map/set constructors
to just rely on the _Rep_type constructor's noexcept qualification, so
as not to duplicate it.


François






diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h
index 0e8a98a..cdd2e7c 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -235,8 +235,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   map(map&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multimap.h b/libstdc++-v3/include/bits/stl_multimap.h
index 7e3cea4..d32104d 100644
--- a/libstdc++-v3/include/bits/stl_multimap.h
+++ b/libstdc++-v3/include/bits/stl_multimap.h
@@ -232,8 +232,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multimap(multimap&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 517e77e..9ab4ab7 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -244,8 +244,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multiset(multiset&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index e804a7c..6b64bcd 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -248,8 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   set(set&& __x, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__x._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index c2417f1..ebfc3f9 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -704,6 +704,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 	  _Rb_tree_impl(_Rb_tree_impl&&) = default;
 
+	  _Rb_tree_impl(_Rb_tree_impl&& __x, _Node_allocator&& __a)
+	  : _Node_allocator(std::move(__a)),
+	_Base_key_compare(std::move(__x)),
+	_Rb_tree_header(std::move(__x))
+	  { }
+
 	  _Rb_tree_impl(const _Key_compare& __comp, _Node_allocator&& __a)
 	  : _Node_allocator(std::move(__a)), _Base_key_compare(__comp)
 	  { }
@@ -944,10 +950,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Rb_tree(_Rb_tree&&) = default;
 
   _Rb_tree(_Rb_tree&& __x, const allocator_type& __a)
+  noexcept( noexcept(
+	_Rb_tree(declval<_Rb_tree>(), declval<_Node_allocator>())) )
   : _Rb_tree(std::move(__x), _Node_allocator(__a))
   { }
 
-  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a);
+ 

Re: [PATCH, PR81844] Fix condition folding in c_parser_omp_for_loop

2017-09-14 Thread Jakub Jelinek
On Thu, Sep 14, 2017 at 07:34:14PM +, de Vries, Tom wrote:

> --- a/libgomp/testsuite/libgomp.c++/c++.exp
> +++ b/libgomp/testsuite/libgomp.c++/c++.exp
> @@ -22,6 +22,11 @@ dg-init
>  # Turn on OpenMP.
>  lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
>  
> +# Switch into C++ mode.  Otherwise, the libgomp.c-c++-common/*.c
> +# files would be compiled as C files.
> +set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
> +set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
> +
>  set blddir [lookfor_file [get_multilibs] libgomp]
>  
>  
> @@ -47,7 +52,9 @@ if { $blddir != "" } {
>  
>  if { $lang_test_file_found } {
>  # Gather a list of all tests.
> -set tests [lsort [find $srcdir/$subdir *.C]]
> +set tests [lsort [concat \
> +   [find $srcdir/$subdir *.C] \
> +   [find $srcdir/$subdir/../libgomp.c-c++-common *.c]]]
>  
>  if { $blddir != "" } {
>  set ld_library_path "$always_ld_library_path:${blddir}/${lang_library_path}"

I don't see SAVE_GCC_UNDER_TEST being used anywhere after it is set.
Did you mean to set GCC_UNDER_TEST back to SAVE_GCC_UNDER_TEST at the end of
c++.exp?
libgomp.oacc-c++/c++.exp has:
# See above.
set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"

Otherwise LGTM, thanks.

Jakub


Re: [PATCH, PR81844] Fix condition folding in c_parser_omp_for_loop

2017-09-14 Thread de Vries, Tom

> I know we don't have
> libgomp.c-c++-common (maybe we should add that)

Like so?

Ran:
- make check-target-libgomp RUNTESTFLAGS=c.exp=cancel-taskgroup-1.c
- make check-target-libgomp RUNTESTFLAGS=c++.exp=cancel-taskgroup-1.c

Currently running make check-target-libgomp.

OK for trunk if tests pass?

Thanks,
- Tom

Introduce libgomp/testsuite/libgomp.c-c++-common

2017-09-14  Tom de Vries  

	* testsuite/libgomp.c++/cancel-taskgroup-1.C: Remove.
	* testsuite/libgomp.c/cancel-taskgroup-1.c: Move to ...
	* testsuite/libgomp.c-c++-common/cancel-taskgroup-1.c: ... here.
	* testsuite/libgomp.c/c.exp: Include test-cases from
	libgomp.c-c++-common.
	* testsuite/libgomp.c++/c++.exp: Same.  Force c++-mode compilation of .c
	files.

---
 libgomp/testsuite/libgomp.c++/c++.exp  |  9 ++-
 libgomp/testsuite/libgomp.c++/cancel-taskgroup-1.C |  4 --
 .../libgomp.c-c++-common/cancel-taskgroup-1.c  | 70 ++
 libgomp/testsuite/libgomp.c/c.exp  |  4 +-
 libgomp/testsuite/libgomp.c/cancel-taskgroup-1.c   | 70 --
 5 files changed, 81 insertions(+), 76 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/c++.exp b/libgomp/testsuite/libgomp.c++/c++.exp
index 0454f95..146b2ba 100644
--- a/libgomp/testsuite/libgomp.c++/c++.exp
+++ b/libgomp/testsuite/libgomp.c++/c++.exp
@@ -22,6 +22,11 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
+# Switch into C++ mode.  Otherwise, the libgomp.c-c++-common/*.c
+# files would be compiled as C files.
+set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
+set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
+
 set blddir [lookfor_file [get_multilibs] libgomp]
 
 
@@ -47,7 +52,9 @@ if { $blddir != "" } {
 
 if { $lang_test_file_found } {
 # Gather a list of all tests.
-set tests [lsort [find $srcdir/$subdir *.C]]
+set tests [lsort [concat \
+			  [find $srcdir/$subdir *.C] \
+			  [find $srcdir/$subdir/../libgomp.c-c++-common *.c]]]
 
 if { $blddir != "" } {
 set ld_library_path "$always_ld_library_path:${blddir}/${lang_library_path}"
diff --git a/libgomp/testsuite/libgomp.c++/cancel-taskgroup-1.C b/libgomp/testsuite/libgomp.c++/cancel-taskgroup-1.C
deleted file mode 100644
index 4f66859..000
--- a/libgomp/testsuite/libgomp.c++/cancel-taskgroup-1.C
+++ /dev/null
@@ -1,4 +0,0 @@
-// { dg-do run }
-// { dg-set-target-env-var OMP_CANCELLATION "true" }
-
-#include "../libgomp.c/cancel-taskgroup-1.c"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/cancel-taskgroup-1.c b/libgomp/testsuite/libgomp.c-c++-common/cancel-taskgroup-1.c
new file mode 100644
index 000..5a80811
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/cancel-taskgroup-1.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var OMP_CANCELLATION "true" } */
+
+#include <stdlib.h>
+#include <unistd.h>
+
+struct T { struct T *children[2]; int val; };
+
+struct T *
+search (struct T *tree, int val, int lvl)
+{
+  if (tree == NULL || tree->val == val)
+return tree;
+  struct T *ret = NULL;
+  int i;
+  for (i = 0; i < 2; i++)
+#pragma omp task shared(ret) if(lvl < 10)
+{
+  struct T *r = search (tree->children[i], val, lvl + 1);
+  if (r)
+	{
+	  #pragma omp atomic write
+	  ret = r;
+	  #pragma omp cancel taskgroup
+	}
+}
+  #pragma omp taskwait
+  return ret;
+}
+
+struct T *
+searchp (struct T *tree, int val)
+{
+  struct T *ret;
+  #pragma omp parallel shared(ret) firstprivate (tree, val)
+  #pragma omp single
+  #pragma omp taskgroup
+  ret = search (tree, val, 0);
+  return ret;
+}
+
+int
+main ()
+{
+  /* Must be power of two minus 1.  */
+  int size = 0x7;
+  struct T *trees = (struct T *) malloc (size * sizeof (struct T));
+  if (trees == NULL)
+return 0;
+  int i, l = 1, b = 0;
+  for (i = 0; i < size; i++)
+{
+  if (i == l)
+	{
+	  b = l;
+	  l = l * 2 + 1;
+	}
+  trees[i].val = i;
+      trees[i].children[0] = l == size ? NULL : &trees[l + (i - b) * 2];
+      trees[i].children[1] = l == size ? NULL : &trees[l + (i - b) * 2 + 1];
+}
+  for (i = 0; i < 50; i++)
+{
+  int v = random () & size;
+      if (searchp (&trees[0], v) != &trees[v])
+	abort ();
+}
+  free (trees);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/c.exp b/libgomp/testsuite/libgomp.c/c.exp
index 300b921..31bdd57 100644
--- a/libgomp/testsuite/libgomp.c/c.exp
+++ b/libgomp/testsuite/libgomp.c/c.exp
@@ -24,7 +24,9 @@ dg-init
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
 # Gather a list of all tests.
-set tests [lsort [find $srcdir/$subdir *.c]]
+set tests [lsort [concat \
+		  [find $srcdir/$subdir *.c] \
+		  [find $srcdir/$subdir/../libgomp.c-c++-common *.c]]]
 
 set ld_library_path $always_ld_library_path
 append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
diff --git a/libgomp/testsuite/libgomp.c/cancel-taskgroup-1.c b/libgomp/testsuite/libgomp.c/cancel-taskgroup-1.c
deleted file mode 100644
index 5a80811..000
--- 

[committed] Fix crash accessing builtins in sanitizer.def and after (PR jit/82174)

2017-09-14 Thread David Malcolm
Calls to gcc_jit_context_get_builtin_function that accessed builtins
in sanitizer.def and after (or failed to match any builtin) led to
a crash accessing a NULL builtin name.

The entries with the NULL name came from these lines in sanitizer.def:

  /* This has to come before all the sanitizer builtins.  */
  DEF_BUILTIN_STUB(BEGIN_SANITIZER_BUILTINS, (const char *)0)

  [...snip...]

  /* This has to come after all the sanitizer builtins.  */
  DEF_BUILTIN_STUB(END_SANITIZER_BUILTINS, (const char *)0)

This patch updates jit-builtins.c to cope with such entries, fixing the
crash.

Successfully bootstrapped on x86_64-pc-linux-gnu;
takes jit.sum from 9769 to 9789 PASS results.

Committed to trunk as r252769.

gcc/jit/ChangeLog:
PR jit/82174
* jit-builtins.c (matches_builtin): Ignore entries with a NULL
name.

gcc/testsuite/ChangeLog:
PR jit/82174
* jit.dg/test-error-gcc_jit_context_get_builtin_function-unknown-builtin.c:
New test case.
---
 gcc/jit/jit-builtins.c |  5 -
 ..._context_get_builtin_function-unknown-builtin.c | 22 ++
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_context_get_builtin_function-unknown-builtin.c

diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index 7840915..35c4db0 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -68,7 +68,10 @@ matches_builtin (const char *in_name,
 const struct builtin_data& bd)
 {
   const bool debug = 0;
-  gcc_assert (bd.name);
+
+  /* Ignore entries with a NULL name.  */
+  if (!bd.name)
+return false;
 
   if (debug)
 fprintf (stderr, "seen builtin: %s\n", bd.name);
diff --git a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_get_builtin_function-unknown-builtin.c b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_get_builtin_function-unknown-builtin.c
new file mode 100644
index 000..b1e389c
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_get_builtin_function-unknown-builtin.c
@@ -0,0 +1,22 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  gcc_jit_context_get_builtin_function (ctxt,
+   "this_is_not_a_builtin");
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{
+  CHECK_VALUE (result, NULL);
+
+  CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt),
+ "builtin \"this_is_not_a_builtin\" not found");
+}
-- 
1.8.5.3



Re: [PATCH version 2, rs6000] Add builtins to convert from float/double to int/long using current rounding mode

2017-09-14 Thread Michael Meissner
On Wed, Sep 13, 2017 at 06:08:45PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 12, 2017 at 07:17:07PM -0400, Michael Meissner wrote:
> > On Tue, Sep 12, 2017 at 05:41:34PM -0500, Segher Boessenkool wrote:
> > > This needs "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT" I think?  Which
> > > is the same as "TARGET_DF_FPR".  "lrintdi2" also has "TARGET_FPRND"
> > > but that is a mistake I think.  Cc:ing Mike for that.
> > 
> > TARGET_FPRND is for ISA 2.02 (power5+) which added the various round to 
> > integer
> > instructions.  So, unless we are going to stop supporting older machines, 
> > it is
> > needed.
> 
> I think you have this a bit wrong?
> 
> ISA 2.02 added fri*, but that's round to integer, not convert.
> 
> ISA 2.06 added some fc[ft]i* instructions, but most are in base PowerPC
> already.
> 
> And FPRND is enabled at ISA 2.04 and later?

Perhaps; I wasn't paying close attention to the exact instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add support for __builtin_{sqrt,fma}f128 on PowerPC ISA 3.0

2017-09-14 Thread Michael Meissner
On Wed, Sep 13, 2017 at 10:49:43PM +, Joseph Myers wrote:
> On Wed, 13 Sep 2017, Michael Meissner wrote:
> 
> > This patch adds support on PowerPC ISA 3.0 for the built-in function
> > __builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction 
> > and
> > the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
> > XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.
> 
> Is there a reason for these to be architecture-specific rather than 
> generic everywhere _Float128 is supported?  (With the fmaf128 / sqrtf128 
> names available as well as the __builtin_* variants of those.)

I wanted to get the PowerPC stuff in ASAP, so the library people can start
using it.

I do think for at least some of the built-ins (sqrt, fma, lrint, etc.) it makes
sense, and at some point I was going to look at it.

In the grand scheme of things, it is only a temporary measure, and the real end
goal is to enable switching long double to be float128.  However, that will
take multiple releases to get there.

> Full support for _FloatN/_FloatNx variants of all the existing built-in 
> functions might be complicated, and run into potential issues with startup 
> cost of creating large numbers of extra built-in functions (it's 
> desirable, but possibly hard, which is why I excluded it from the initial 
> _FloatN / _FloatNx support patches).  But adding just these two functions 
> to builtins.def and making them fold / expand appropriately ought to be 
> much simpler.  (I realise sqrt goes through internal-fn.def and 
> DEF_INTERNAL_FLT_FN expects a particular set of functions for standard 
> types, so maybe some duplication would be involved to get the built-in 
> function expanded appropriately, i.e. using an insn pattern or a call to 
> an external sqrtf128 function according to whether such an insn pattern is 
> available.  fma ought not to involve much more than adding an extra case 
> where CASE_FLT_FN (BUILT_IN_FMA) is used.)

Yeah, but I wanted to get the easy stuff in there right now before looking at
the machine independent support.

> > While I was at it, I changed the documentation so that it no longer
> > documents the 'q' built-in functions (to mirror libquadmath) but
> > instead just documents the 'f128' functions that match glibc 2.26 and
> > the technical report that added the _Float128 type.
> 
> Those *f128 built-in functions (inf / huge_val / nan / nans / fabs / 
> copysign) are not target-specific; they exist for all _FloatN / _FloatNx 
> types for all targets with such types.  So it doesn't seem appropriate to 
> document them in a target-specific section of the manual, beyond a brief 
> cross-reference to the documentation of the functions as 
> target-independent.

Right now we just document a few of the 'q' functions that were added before
you added the f128 versions.  I was trying to harmonize things.  Originally, I
was going to make both __builtin_sqrtq and __builtin_sqrtf128, but Segher and
David didn't want that.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [patch, fortran, RFC] warn about out-of-bounds errors in DO loops

2017-09-14 Thread Thomas Koenig

Hi Richard,


Is it OK to throw a hard error for this?  Maybe the rules are different
from C and C++, but normally we can't do that for code that's only
invalid if executed.  An unconditional warning would be good though.


I can also issue an unconditional warning; this will even simplify
the code somewhat.  Actually, we do the same for simple out-of-bounds
accesses, so this would be consistent.

I'll rework the patch accordingly, unless somebody else speaks up
with another idea.
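
For reference, the C analogue of such a conditionally-executed overrun
(an invented example, not from the patch) is:

  int a[5];

  int
  f (void)
  {
    int i, s = 0;
    /* The read of a[5] is out of bounds, but only if f is actually
       executed, so a warning rather than a hard error is appropriate.  */
    for (i = 0; i <= 5; i++)
      s += a[i];
    return s;
  }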

Regards

Thomas


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-14 Thread Steve Ellcey
On Thu, 2017-09-14 at 11:53 -0600, Jeff Law wrote:
> 
> 
> And I think that's starting to zero in on the problem --
> WORD_REGISTER_OPERATIONS is zero on aarch64 as you don't get extension
> to word_mode for W form registers.
> 
> I wonder if what needs to happen is somehow look to extend that code
> somehow so that combine and friends know that the value is zero extended
> to 32 bits, even if it's not extended to word_mode.
> 
> Jeff

This might be a good long-term direction to move in, but in the meantime
it sure does seem a lot easier to just generate a subreg.  Here is a
patch that does that; it passes bootstrap, has no regressions, and
fixes the bug in question (and most likely improves other code as
well).

The "LOAD_EXTEND_OP (mode) == ZERO_EXTEND" part of the if
statement is not really necessary since we know this is true on aarch64
but I thought it helped make it clear what we were doing and the
compiler should optimize it away anyway.

OK to checkin this fix while we consider longer term options?

Steve Ellcey
sell...@cavium.com


2017-09-14  Steve Ellcey  

PR target/77729
* config/aarch64/aarch64.md (mov<mode>): Generate subreg for
short loads to reflect that upper bits are zeroed out on load.


2017-09-14  Steve Ellcey  

* gcc.target/aarch64/pr77729.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8cdb06..bca4cf5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -864,6 +864,15 @@
 	(match_operand:SHORT 1 "general_operand" ""))]
   ""
   "
+    if (LOAD_EXTEND_OP (<MODE>mode) == ZERO_EXTEND && MEM_P (operands[1])
+	&& can_create_pseudo_p () && optimize > 0)
+      {
+	/* Generate a subreg of SImode so we know that the upper bits
+	   of the reg are zero and do not need to be masked out later.  */
+	rtx reg = gen_reg_rtx (SImode);
+	emit_insn (gen_zero_extend<mode>si2 (reg, operands[1]));
+	operands[1] = gen_lowpart (<MODE>mode, reg);
+      }
     if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
       operands[1] = force_reg (<MODE>mode, operands[1]);
   "
diff --git a/gcc/testsuite/gcc.target/aarch64/pr77729.c b/gcc/testsuite/gcc.target/aarch64/pr77729.c
index e69de29..2fcda9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr77729.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr77729.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int TrieCase3_v1(const char *string)
+{
+if((string[0] | 32) == 't') {
+if((string[1] | 32) == 'a') {
+if((string[2] | 32) == 'g') {
+return 42;
+}
+}
+}
+return -1;
+}
+
+int TrieCase3_v2(const char *string)
+{
+switch(string[0] | 32) {
+case 't':
+switch(string[1] | 32) {
+case 'a':
+switch(string[2] | 32) {
+case 'g':
+return 42;
+}
+}
+}
+return -1;
+}
+
+/* { dg-final { scan-assembler-not "and" } } */
+/* { dg-final { scan-assembler-not "uxtb" } } */


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-14 Thread Jeff Law
On 09/14/2017 10:33 AM, Steve Ellcey wrote:
> On Thu, 2017-09-14 at 09:03 -0600, Jeff Law wrote:
>> On 09/13/2017 03:46 PM, Steve Ellcey wrote:
>>>  
>>> In arm32 rtl expansion, when reading the QI memory location, I see
>>> these instructions get generated:
>>>
>>> (insn 10 3 11 2 (set (reg:SI 119)
>>> (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0
>>> *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
>>>  (nil))
>>> (insn 11 10 12 2 (set (reg:QI 118)
>>> (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>>>  (nil))
>>>
>>> And in aarch64 rtl expansion I see:
>>>
>>> (insn 10 9 11 (set (reg:QI 81)
>>> (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
>>> A8])) "pr77729.c":3 -1
>>>  (nil))
>>>
>>> Both of these sequences expand to ldrb but in the arm32 case I know
>>> that I set all 32 bits of the register (even though I only want the
>>> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
>>> bits and I don't know anything about the higher bits, meaning I have to
>>> keep the AND instruction to mask out the upper bits on aarch64.
> 
>> It's one of the reasons I discourage subregs -- the number of cases
>> where we can optimize based on the "don't care" semantics are relatively
>> small in my experience and I consistently see cases where the "don't
>> care" property of the subreg turns into "don't know" and suppresses
>> downstream optimizations.
>>
>> It's always a judgment call, but more and more often I find myself
>> pushing towards defining those bits using a zero/sign extension, bit
>> operation or whatever rather than using subregs.
> 
> So if I were loading a QImode to a register (zeroing out the upper
> bits) would you generate something like:
> 
> (truncate:QI (zero_extend:SI (reg:QI)))
> 
>  instead of:
> 
> (subreg:QI (reg:SI))
> 
>>> I think we should change the movqi/movhi expansions on aarch64 to
>>> recognize that the ldrb/ldrh instructions zero out the upper bits in
>>> the register by generating rtl like arm32 does.
> 
>> Is LOAD_EXTEND_OP defined for aarch64?
> 
> Yes, aarch64 defines LOAD_EXTEND_OP to be ZERO_EXTEND.
> 
>> It may also be worth looking at ree.c -- my recollection is that it
>> didn't handle subregs, but it could and probably should.
> 
> I only see a couple of references to subregs in ree.c.  I think they
> both involve searching for all uses of a register.
Right.  But the subreg expressions are also forms of extension -- we
just don't know (or care) if it's zero or sign extension.

We'd start by recognizing the paradoxical subreg in
add_removable_extension as a form of an extension similar to zero_extend
and sign_extend.

When we go to combine the "extension" into the defining insn, we would
test 3 forms

(set (target) (zero_extend (exp)))

(set (target) (sign_extend (exp)))

(set (target) (subreg (exp)))

If any form matches an insn on the target, then we're done.
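
A rough sketch of what the recognition step could look like (ree.c-style
names, but the exact integration point is a guess -- untested):

  /* In add_removable_extension, assuming INSN is a single_set: also
     treat a paradoxical subreg of a register as a candidate "extension"
     alongside zero_extend and sign_extend.  */
  rtx src = SET_SRC (PATTERN (insn));
  if (((GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND)
       && REG_P (XEXP (src, 0)))
      || (GET_CODE (src) == SUBREG
	  && paradoxical_subreg_p (src)
	  && REG_P (SUBREG_REG (src))))
    /* ...record it, and later try the zero_extend, sign_extend and
       subreg forms in turn when merging into the defining insn.  */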


This may require adding some new patterns to aarch64 -- I believe we've
got patterns on x86 to match some of these forms to aid redundant
extension elimination.

It might also be helpful to teach ree about LOAD_EXTEND_OP which would
allow combining one of the extension forms with a memory reference.

Jeff


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-14 Thread Jeff Law
On 09/14/2017 10:33 AM, Steve Ellcey wrote:
> On Thu, 2017-09-14 at 09:03 -0600, Jeff Law wrote:
>> On 09/13/2017 03:46 PM, Steve Ellcey wrote:
>>>  
>>> In arm32 rtl expansion, when reading the QI memory location, I see
>>> these instructions get generated:
>>>
>>> (insn 10 3 11 2 (set (reg:SI 119)
>>> (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0
>>> *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
>>>  (nil))
>>> (insn 11 10 12 2 (set (reg:QI 118)
>>> (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>>>  (nil))
>>>
>>> And in aarch64 rtl expansion I see:
>>>
>>> (insn 10 9 11 (set (reg:QI 81)
>>> (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
>>> A8])) "pr77729.c":3 -1
>>>  (nil))
>>>
>>> Both of these sequences expand to ldrb but in the arm32 case I know
>>> that I set all 32 bits of the register (even though I only want the
>>> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
>>> bits and I don't know anything about the higher bits, meaning I have to
>>> keep the AND instruction to mask out the upper bits on aarch64.
> 
>> It's one of the reasons I discourage subregs -- the number of cases
>> where we can optimize based on the "don't care" semantics are relatively
>> small in my experience and I consistently see cases where the "don't
>> care" property of the subreg turns into "don't know" and suppresses
>> downstream optimizations.
>>
>> It's always a judgment call, but more and more often I find myself
>> pushing towards defining those bits using a zero/sign extension, bit
>> operation or whatever rather than using subregs.
> 
> So if I were loading a QImode to a register (zeroing out the upper
> bits) would you generate something like:
> 
> (truncate:QI (zero_extend:SI (reg:QI)))
On a LOAD_EXTEND_OP target which zero extends and
WORD_REGISTER_OPERATIONS as 1 I'd load memory with just

(set (reg:QI) (mem:QI (whatever)))

If that object is later used elsewhere it probably will be explicitly
sign/zero extended.  But combine ought to be able to eliminate that
explicit extension.

And I think that's starting to zero in on the problem --
WORD_REGISTER_OPERATIONS is zero on aarch64 as you don't get extension
to word_mode for W form registers.

I wonder if what needs to happen is somehow look to extend that code
somehow so that combine and friends know that the value is zero extended
to 32 bits, even if it's not extended to word_mode.

Jeff



Re: [PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Segher Boessenkool
On Thu, Sep 14, 2017 at 11:53:02AM -0500, Pat Haugen wrote:
> On 09/14/2017 11:35 AM, Segher Boessenkool wrote:
> > On Thu, Sep 14, 2017 at 10:18:55AM -0500, Pat Haugen wrote:
> >> --- gcc/config/rs6000/rs6000.c (revision 252029)
> >> +++ gcc/config/rs6000/rs6000.c (working copy)
> >> @@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
> >>      add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
> >>    if (cfun->machine->split_stack_argp_used)
> >>      add_to_hard_reg_set (&set->set, Pmode, 12);
> >> +
> >> +  /* Make sure the hard reg set doesn't include r2, which was possibly added
> >> +     via PIC_OFFSET_TABLE_REGNUM.  */
> >> +  if (TARGET_TOC)
> >> +    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
> >>  }
> > Hrm, can't you simply not add it in the first place?  Just a few lines up?
> 
> As we discussed offline, that is RS6000_PIC_OFFSET_TABLE_REGNUM (i.e.
> r30). PIC_OFFSET_REGNUM is added in shrink-wrap.c:try_shrink_wrapping().

Ah, right, I never can keep the two apart.

> I noticed that the test fails on 32-bit because there's no prologue and
> hence no shrink-wrapping to be performed. Following is updated testcase
> which limits it to 64-bit. Ok?

> --- gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c (working copy)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { lp64 } } } */

You don't need the braces around lp64.

> +/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
> +
> +/* Verify we move the prologue past the TOC reference of 'j' and shrink-wrap
> +   the function. */
> +void bar();
> +int j;
> +void foo(int i)
> +{
> +  j = i;
> +  if (i > 0)
> +{
> +  bar();

If you add   asm ("");   here, it won't do a sibcall on any (sub-)target.
Not that this testcase is relevant elsewhere anyway.

> +}
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */

Okay for trunk, with or without that asm.  Thanks!


Segher


Re: [PATCH] Add comments to struct cgraph_thunk_info

2017-09-14 Thread Jeff Law
On 09/14/2017 02:01 AM, Pierre-Marie de Rodat wrote:
> Hello,
> 
> This commit adds comments to fields in the cgraph_thunk_info structure
> declaration from cgraph.h. They will hopefully answer questions that
> people like myself can ask while discovering the thunk machinery.  I
> also made an assertion stricter in cgraph_node::create_thunk.
> 
> I'm adding Nathan in copy as we discussed this thunk matter at this
> year's Cauldron. :-)
> 
> Bootstrapped and regtested on x86_64-linux.  Ok to commit?  Thank you in
> advance!
> 
> gcc/
> 
>   * cgraph.h (cgraph_thunk_info): Add comments, reorder fields.
>   * cgraph.c (cgraph_node::create_thunk): Adjust comment, make
>   assert for VIRTUAL_* arguments stricter.
The comment additions are fine.  What's the rationale behind the
ordering of the fields?  In general we want the opposite order from what
you did -- going from most strictly aligned to least strictly aligned
minimizes the amount of unused padding.
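
A quick illustration with invented fields (LP64 sizes assumed):

  struct decreasing { void *p; int i; char c1; char c2; };  /* 16 bytes */
  struct increasing { char c1; void *p; char c2; int i; };  /* 24 bytes */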


jeff


libgo patch committed: Upgrade to Go 1.9 release

2017-09-14 Thread Ian Lance Taylor
I've committed a patch to libgo to upgrade it to the recent Go 1.9 release.

As usual with these upgrades, the patch is too large to attach here.
I've attached the changes to files that are more or less specific to
gccgo.

This upgrade required some changes to the gotools Makefile.  And one
test had to be updated.  These patches are also below.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

gotools/:
2017-09-14  Ian Lance Taylor  

* Makefile.am (LIBGOTOOL): Define.
(go_cmd_go_files): Update for Go 1.9 release.
(go$(EXEEXT)): Depend on and link against $(LIBGOTOOL).
(CHECK_ENV): Add definition of shell variable fl.
(check-go-tool): Update for rearrangement of cmd/go sources in Go
1.9 release.  Echo failure message if test fails.
(check-runtime): Echo failure message if test fails.
(check-cgo-test, check-carchive-test): Likewise.
* Makefile.in: Rebuild.


patch.txt.bz2
Description: BZip2 compressed data


Re: Turn CANNOT_CHANGE_MODE_CLASS into a hook

2017-09-14 Thread Jeff Law
On 09/13/2017 01:19 PM, Richard Sandiford wrote:
> This also seemed like a good opportunity to reverse the sense of the
> hook to "can", to avoid the awkward double negative in !CANNOT.
Yea.  The double-negatives can sometimes make code hard to read.


> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?
> 
> Richard
> 
> 
> 2017-09-13  Richard Sandiford  
>   Alan Hayard  
>   David Sherwood  
> 
> gcc/
>   * target.def (can_change_mode_class): New hook.
>   (mode_rep_extended): Refer to it instead of CANNOT_CHANGE_MODE_CLASS.
>   (hard_regno_nregs): Likewise.
>   * hooks.h (hook_bool_mode_mode_reg_class_t_true): Declare.
>   * hooks.c (hook_bool_mode_mode_reg_class_t_true): New function.
>   * doc/tm.texi.in (CANNOT_CHANGE_MODE_CLASS): Replace with...
>   (TARGET_CAN_CHANGE_MODE_CLASS): ...this.
>   (LOAD_EXTEND_OP): Update accordingly.
>   * doc/tm.texi: Regenerate.
>   * doc/rtl.texi: Refer to TARGET_CAN_CHANGE_MODE_CLASS instead of
>   CANNOT_CHANGE_MODE_CLASS.
>   * hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Replace with...
>   (REG_CAN_CHANGE_MODE_P): ...this new macro.
>   * combine.c (simplify_set): Update accordingly.
>   * emit-rtl.c (validate_subreg): Likewise.
>   * recog.c (general_operand): Likewise.
>   * regcprop.c (mode_change_ok): Likewise.
>   * reload1.c (choose_reload_regs): Likewise.
>   (inherit_piecemeal_p): Likewise.
>   * rtlanal.c (simplify_subreg_regno): Likewise.
>   * postreload.c (reload_cse_simplify_set): Use REG_CAN_CHANGE_MODE_P
>   instead of CANNOT_CHANGE_MODE_CLASS.
>   (reload_cse_simplify_operands): Likewise.
>   * reload.c (push_reload): Use targetm.can_change_mode_class
>   instead of CANNOT_CHANGE_MODE_CLASS.
>   (push_reload): Likewise.  Also use REG_CAN_CHANGE_MODE_P instead of
>   REG_CANNOT_CHANGE_MODE_P.
>   * config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/alpha/alpha.c (alpha_can_change_mode_class): New function.
>   (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/arm/arm.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   (arm_can_change_mode_class): New function.
>   * config/arm/neon.md: Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
>   than CANNOT_CHANGE_MODE_CLASS in comments.
>   * config/i386/i386.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/i386/i386-protos.h (ix86_cannot_change_mode_class): Delete.
>   * config/i386/i386.c (ix86_cannot_change_mode_class): Replace with...
>   (ix86_can_change_mode_class): ...this new function, inverting the
>   sense of the return value.
>   (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   * config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/ia64/ia64.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   (ia64_can_change_mode_class): New function.
>   * config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/m32c/m32c-protos.h (m32c_cannot_change_mode_class): Delete.
>   * config/m32c/m32c.c (m32c_cannot_change_mode_class): Replace with...
>   (m32c_can_change_mode_class): ...this new function, inverting the
>   sense of the return value.
>   (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   * config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/mips/mips-protos.h (mips_cannot_change_mode_class): Delete.
>   * config/mips/mips.c (mips_cannot_change_mode_class): Replace with...
>   (mips_can_change_mode_class): ...this new function, inverting the
>   sense of the return value.
>   (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   * config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/msp430/msp430.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   (msp430_can_change_mode_class): New function.
>   * config/nvptx/nvptx.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/nvptx/nvptx.c (nvptx_can_change_mode_class): New function.
>   (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   * config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * config/pa/pa-protos.h (pa_cannot_change_mode_class): Delete.
>   * config/pa/pa.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
>   (pa_cannot_change_mode_class): Replace with...
>   (pa_can_change_mode_class): ...this new function, inverting the
>   sense of the return value.
>   (pa_modes_tieable_p): Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
>   than CANNOT_CHANGE_MODE_CLASS in comments.
>   * config/pdp11/pdp11.h (CANNOT_CHANGE_MODE_CLASS): Delete.
>   * 

Re: [PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Segher Boessenkool
[ pressed send too early ]

On Thu, Sep 14, 2017 at 10:18:55AM -0500, Pat Haugen wrote:
> --- gcc/config/rs6000/rs6000.c	(revision 252029)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
>      add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
>    if (cfun->machine->split_stack_argp_used)
>      add_to_hard_reg_set (&set->set, Pmode, 12);
> +
> +  /* Make sure the hard reg set doesn't include r2, which was possibly added
> +     via PIC_OFFSET_TABLE_REGNUM.  */
> +  if (TARGET_TOC)
> +    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
>  }

And why is the problem in PR51872 no longer there?  Or is it?


Segher


Re: Turn TRULY_NOOP_TRUNCATION into a hook

2017-09-14 Thread Jeff Law
On 09/13/2017 01:21 PM, Richard Sandiford wrote:
> I'm not sure the documentation is correct that outprec is always less
> than inprec, and each non-default implementation tested for the case
> in which it wasn't, but the patch leaves it as-is.
While the non-default implementations may always test for that case, I
don't think it makes much, if any, sense.  It could well be that all the
implementations started from a common base when TRULY_NOOP_TRUNCATION
was added and just got copied over time.

I'd fully support someone doing some instrumentation to verify we're not
seeing outprec >= inprec, then removing those checks independently.
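
Such instrumentation could be as small as asserting the precondition in
the new default hook (sketch only, untested):

  /* hooks.c: temporary check that callers honor outprec < inprec.  */
  bool
  hook_bool_uint_uint_true (unsigned int outprec, unsigned int inprec)
  {
    gcc_checking_assert (outprec < inprec);
    return true;
  }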



> 
> The SH port had a couple of TRULY_NOOP_TRUNCATION tests that were left
> over from the old shmedia port.
shmedia is a subset (or mode) for sh5, right?  If so, then it can just
go away.

> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?
> 
> Richard
> 
> 
> 2017-09-13  Richard Sandiford  
>   Alan Hayard  
>   David Sherwood  
> 
> gcc/
>   * target.def (truly_noop_truncation): New hook.
>   (mode_rep_extended): Refer to TARGET_TRULY_NOOP_TRUNCATION rather
>   than TRULY_NOOP_TRUNCATION.
>   * hooks.h (hook_bool_uint_uint_true): Declare.
>   * hooks.c (hook_bool_uint_uint_true): New function.
>   * doc/tm.texi.in (TRULY_NOOP_TRUNCATION): Replace with...
>   (TARGET_TRULY_NOOP_TRUNCATION): ...this.
>   * doc/tm.texi: Regenerate.
>   * combine.c (make_extraction): Refer to TARGET_TRULY_NOOP_TRUNCATION
>   rather than TRULY_NOOP_TRUNCATION in comments.
>   (simplify_comparison): Likewise.
>   (record_truncated_value): Likewise.
>   * expmed.c (extract_bit_field_1): Likewise.
>   (extract_split_bit_field): Likewise.
>   * convert.c (convert_to_integer_1): Use targetm.truly_noop_truncation
>   instead of TRULY_NOOP_TRUNCATION.
>   * function.c (assign_parm_setup_block): Likewise.
>   * machmode.h (TRULY_NOOP_TRUNCATION_MODES_P): Likewise.
>   * rtlhooks.c: Include target.h.
>   * config/aarch64/aarch64.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/alpha/alpha.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/arc/arc.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/arm/arm.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/avr/avr.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/bfin/bfin.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/c6x/c6x.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/cr16/cr16.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/cris/cris.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/epiphany/epiphany.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/fr30/fr30.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/frv/frv.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/ft32/ft32.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/h8300/h8300.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/i386/i386.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/ia64/ia64.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/iq2000/iq2000.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/lm32/lm32.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/m32c/m32c.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/m32r/m32r.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/m68k/m68k.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/mcore/mcore.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/microblaze/microblaze.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/mips/mips.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/mips/mips.c (mips_truly_noop_truncation): New function.
>   (TARGET_TRULY_NOOP_TRUNCATION): Redefine.
>   * config/mips/mips.md: Refer to TARGET_TRULY_NOOP_TRUNCATION
>   rather than TRULY_NOOP_TRUNCATION in comments.
>   * config/mmix/mmix.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/mn10300/mn10300.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/moxie/moxie.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/msp430/msp430.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/nds32/nds32.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/nios2/nios2.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/nvptx/nvptx.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/pa/pa.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/pdp11/pdp11.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/powerpcspe/powerpcspe.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/riscv/riscv.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/riscv/riscv.md: Refer to TARGET_TRULY_NOOP_TRUNCATION
>   rather than TRULY_NOOP_TRUNCATION in comments.
>   * config/rl78/rl78.h (TRULY_NOOP_TRUNCATION): Delete.
>   * config/rs6000/rs6000.h (TRULY_NOOP_TRUNCATION): Delete.
>   * 

Re: [PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Pat Haugen
On 09/14/2017 11:35 AM, Segher Boessenkool wrote:
> On Thu, Sep 14, 2017 at 10:18:55AM -0500, Pat Haugen wrote:
>> --- gcc/config/rs6000/rs6000.c   (revision 252029)
>> +++ gcc/config/rs6000/rs6000.c   (working copy)
>> @@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
>>      add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
>>    if (cfun->machine->split_stack_argp_used)
>>      add_to_hard_reg_set (&set->set, Pmode, 12);
>> +
>> +  /* Make sure the hard reg set doesn't include r2, which was possibly added
>> +     via PIC_OFFSET_TABLE_REGNUM.  */
>> +  if (TARGET_TOC)
>> +    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
>>  }
> Hrm, can't you simply not add it in the first place?  Just a few lines up?

As we discussed offline, that is RS6000_PIC_OFFSET_TABLE_REGNUM (i.e.
r30). PIC_OFFSET_REGNUM is added in shrink-wrap.c:try_shrink_wrapping().

I noticed that the test fails on 32-bit because there's no prologue and
hence no shrink-wrapping to be performed. Following is updated testcase
which limits it to 64-bit. Ok?

-Pat

Index: gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c
===
--- gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c   (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c   (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify we move the prologue past the TOC reference of 'j' and shrink-wrap
+   the function. */
+void bar();
+int j;
+void foo(int i)
+{
+  j = i;
+  if (i > 0)
+{
+  bar();
+}
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */



Re: [PATCH] Enhance PHI processing in VN

2017-09-14 Thread David Edelsohn
* tree-ssa-sccvn.c (visit_phi): Merge undefined values similar
to VN_TOP.

This seems to have regressed

FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 0" 2
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 2" 1
FAIL: gcc.dg/tree-prof/time-profiler-2.c scan-ipa-dump-times profile "Read tp_first_run: 3" 1

- David


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-09-14 Thread Jan Hubicka
> 
> Well, it's of course the poor-mans solution compared to providing our own
> ifunc-enabled libm ...

One benefit here would be that we could have our own calling convention for
this.  So for floor/ceil we may just declare registers to be preserved (as
they are on all modern AVX-enabled CPUs), which would make the code
size/speed tradeoffs more interesting.
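
On x86 this could conceivably be prototyped today with the existing
no_caller_saved_registers attribute (a sketch only; my_floor is an
invented stand-in, and whether the attribute's save/restore semantics
match what a libm entry point wants is untested):

  /* Callers may keep values in SSE registers live across the call,
     since the callee promises to preserve them.  */
  extern double my_floor (double)
    __attribute__ ((no_caller_saved_registers));

  double
  round_down_sum (const double *a, int n)
  {
    double s = 0.0;
    for (int i = 0; i < n; i++)
      s += my_floor (a[i]);  /* no spills of s or loop state needed */
    return s;
  }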

Honza
> 
> I would expect that for SSE 4.1 the PLT and call overhead is measurable
> and an inline run-time check be quite a bit more efficient.  As you have a
> testcase would it be possible to measure that by hand-editing the assembly
> (or the benchmark source in case it is not fortran...)?
> 
> The whole point of having the inline expansions was to have inline expansions,
> avoiding the need to spill the whole set of SSE regs around such calls.
> 
> > I was just surprised by the glibc check, what would you consider a
> > recent-enough glibc?  Or is the check mainly necessary to ensure we
> > are indeed using glibc and not some other libc (and thus something
> > like we do for TARGET_LIBC_PROVIDES_SSP would do)?
> >
> > I will try to come up with a patch.
> 
> I don't think this is the appropriate solution.  Try disabling the inline
> expansion and run SPEC (without -march=sse4.1 of course).
> 
> I realize that doing the inline-expansion with a runtime check
> is going to be quite tricky and the GCC local IFUNC trick doesn't
> solve the inlining (but we might be able to avoid spilling with some
> IPA RA help and/or attributes?).
> 
> Richard.
> 
> > Thanks,
> >
> > Martin


Re: Turn FUNCTION_ARG_OFFSET into a hook

2017-09-14 Thread Jeff Law
On 09/13/2017 01:22 PM, Richard Sandiford wrote:
> Nice and easy, one definition and one use :-)
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?
> 
> Richard
> 
> 
> 2017-09-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * target.def (function_arg_offset): New hook.
>   * targhooks.h (default_function_arg_offset): Declare.
>   * targhooks.c (default_function_arg_offset): New function.
>   * function.c (locate_and_pad_parm): Use
>   targetm.calls.function_arg_offset instead of FUNCTION_ARG_OFFSET.
>   * doc/tm.texi.in (FUNCTION_ARG_OFFSET): Replace with...
>   (TARGET_FUNCTION_ARG_OFFSET): ...this.
>   * doc/tm.texi: Regenerate.
>   * config/spu/spu.h (FUNCTION_ARG_OFFSET): Delete.
>   * config/spu/spu.c (spu_function_arg_offset): New function.
>   (TARGET_FUNCTION_ARG_OFFSET): Redefine.
>   * system.h (FUNCTION_ARG_OFFSET): Poison.
OK.
jeff


Re: [PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Segher Boessenkool
On Thu, Sep 14, 2017 at 10:18:55AM -0500, Pat Haugen wrote:
> --- gcc/config/rs6000/rs6000.c	(revision 252029)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
>      add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
>    if (cfun->machine->split_stack_argp_used)
>      add_to_hard_reg_set (&set->set, Pmode, 12);
> +
> +  /* Make sure the hard reg set doesn't include r2, which was possibly added
> +     via PIC_OFFSET_TABLE_REGNUM.  */
> +  if (TARGET_TOC)
> +    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
>  }

Hrm, can't you simply not add it in the first place?  Just a few lines up?


Segher


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-14 Thread Steve Ellcey
On Thu, 2017-09-14 at 09:03 -0600, Jeff Law wrote:
> On 09/13/2017 03:46 PM, Steve Ellcey wrote:
> > 
> > In arm32 rtl expansion, when reading the QI memory location, I see
> > these instructions get generated:
> > 
> > (insn 10 3 11 2 (set (reg:SI 119)
> > (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0
> > *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
> >  (nil))
> > (insn 11 10 12 2 (set (reg:QI 118)
> > (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
> >  (nil))
> > 
> > And in aarch64 rtl expansion I see:
> > 
> > (insn 10 9 11 (set (reg:QI 81)
> > (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
> > A8])) "pr77729.c":3 -1
> >  (nil))
> > 
> > Both of these sequences expand to ldrb but in the arm32 case I know
> > that I set all 32 bits of the register (even though I only want the
> > bottom 8 bits), but for aarch64 I only know that I set the bottom 8
> > bits and I don't know anything about the higher bits, meaning I have to
> > keep the AND instruction to mask out the upper bits on aarch64.

> It's one of the reasons I discourage subregs -- the number of cases
> where we can optimize based on the "don't care" semantics are relatively
> small in my experience and I consistently see cases where the "don't
> care" property of the subreg turns into "don't know" and suppresses
> downstream optimizations.
> 
> It's always a judgment call, but more and more often I find myself
> pushing towards defining those bits using a zero/sign extension, bit
> operation or whatever rather than using subregs.

So if I were loading a QImode to a register (zeroing out the upper
bits) would you generate something like:

(truncate:QI (zero_extend:SI (reg:QI)))

 instead of:

(subreg:QI (reg:SI))

> > I think we should change the movqi/movhi expansions on aarch64 to
> > recognize that the ldrb/ldrh instructions zero out the upper bits in
> > the register by generating rtl like arm32 does.

> Is LOAD_EXTEND_OP defined for aarch64?

Yes, aarch64 defines LOAD_EXTEND_OP to be ZERO_EXTEND.

> It may also be worth looking at ree.c -- my recollection is that it
> didn't handle subregs, but it could and probably should.

I only see a couple of references to subregs in ree.c.  I think they
both involve searching for all uses of a register.

Steve Ellcey


[PATCHv2] Add a -Wcast-align=strict warning

2017-09-14 Thread Bernd Edlinger
On 09/04/17 10:07, Bernd Edlinger wrote:
> Hi,
> 
> as you know we have a -Wcast-align warning which works only for
> STRICT_ALIGNMENT targets.  But occasionally it would be nice to be
> able to switch this warning on even for other targets.
> 
> Therefore I would like to add a strict version of this option
> which can be invoked with -Wcast-align=strict.  With the only
> difference that it does not depend on STRICT_ALIGNMENT.
> 
> I used the code from check_effective_target_non_strict_align
> in target-supports.exp for the first version of the test case,
> where we have this:
> 
> return [check_no_compiler_messages non_strict_align assembly {
>   char *y;
>   typedef char __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))) c;
>   c *z;
>   void foo(void) { z = (c *) y; }
> } "-Wcast-align"]
> 
> ... and to my big surprise it did _not_ work for C++ as-is,
> because same_type_p considers differently aligned types identical,
> and therefore cp_build_c_cast tries the conversion first via a
> const_cast, which succeeds but does not emit the cast-align warning
> in this case.
> 
> As a work-around I had to check the alignment in build_const_cast_1
> as well.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 

Hi,

as suggested by Joseph, here is an updated patch that
uses min_align_of_type instead of TYPE_ALIGN.

Is it OK?


Thanks,
Bernd.

gcc:
2017-09-03  Bernd Edlinger  

* common.opt (Wcast-align=strict): New warning option.
* doc/invoke.texi: Document -Wcast-align=strict. 

c:
2017-09-03  Bernd Edlinger  

* c-typeck.c (build_c_cast): Implement -Wcast-align=strict.

cp:
2017-09-03  Bernd Edlinger  

* typeck.c (build_reinterpret_cast_1,
build_const_cast_1): Implement -Wcast-align=strict.

testsuite:
2017-09-03  Bernd Edlinger  

* c-c++-common/Wcast-align.c: New test.
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 251617)
+++ gcc/c/c-typeck.c	(working copy)
@@ -5578,7 +5578,7 @@ build_c_cast (location_t loc, tree type,
 	}
 
   /* Warn about possible alignment problems.  */
-  if (STRICT_ALIGNMENT
+  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
 	  && TREE_CODE (type) == POINTER_TYPE
 	  && TREE_CODE (otype) == POINTER_TYPE
 	  && TREE_CODE (TREE_TYPE (otype)) != VOID_TYPE
@@ -5587,7 +5587,8 @@ build_c_cast (location_t loc, tree type,
 	 restriction is unknown.  */
 	  && !(RECORD_OR_UNION_TYPE_P (TREE_TYPE (otype))
 	   && TYPE_MODE (TREE_TYPE (otype)) == VOIDmode)
-	  && TYPE_ALIGN (TREE_TYPE (type)) > TYPE_ALIGN (TREE_TYPE (otype)))
+	  && min_align_of_type (TREE_TYPE (type))
+	 > min_align_of_type (TREE_TYPE (otype)))
 	warning_at (loc, OPT_Wcast_align,
 		"cast increases required alignment of target type");
 
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 251617)
+++ gcc/common.opt	(working copy)
@@ -564,6 +564,10 @@ Wcast-align
 Common Var(warn_cast_align) Warning
 Warn about pointer casts which increase alignment.
 
+Wcast-align=strict
+Common Var(warn_cast_align,2) Warning
+Warn about pointer casts which increase alignment.
+
 Wcpp
 Common Var(warn_cpp) Init(1) Warning
 Warn when a #warning directive is encountered.
Index: gcc/cp/typeck.c
===
--- gcc/cp/typeck.c	(revision 251617)
+++ gcc/cp/typeck.c	(working copy)
@@ -7265,15 +7265,16 @@ build_reinterpret_cast_1 (tree type, tre
 	   complain))
 	return error_mark_node;
   /* Warn about possible alignment problems.  */
-  if (STRICT_ALIGNMENT && warn_cast_align
-  && (complain & tf_warning)
+  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
+	  && (complain & tf_warning)
 	  && !VOID_TYPE_P (type)
 	  && TREE_CODE (TREE_TYPE (intype)) != FUNCTION_TYPE
 	  && COMPLETE_TYPE_P (TREE_TYPE (type))
 	  && COMPLETE_TYPE_P (TREE_TYPE (intype))
-	  && TYPE_ALIGN (TREE_TYPE (type)) > TYPE_ALIGN (TREE_TYPE (intype)))
+	  && min_align_of_type (TREE_TYPE (type))
+	 > min_align_of_type (TREE_TYPE (intype)))
 	warning (OPT_Wcast_align, "cast from %qH to %qI "
- "increases required alignment of target type", intype, type);
+		 "increases required alignment of target type", intype, type);
 
   /* We need to strip nops here, because the front end likes to
 	 create (int *) for array-to-pointer decay, instead of [0].  */
@@ -7447,6 +7448,14 @@ build_const_cast_1 (tree dst_type, tree
 		 the user is making a potentially unsafe cast.  */
 	  check_for_casting_away_constness (src_type, dst_type,
 		CAST_EXPR, complain);
+	  /* ??? comp_ptr_ttypes_const ignores TYPE_ALIGN.  */
+	  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
+		  && (complain & tf_warning)
+		  && 

Re: [PATCH version 3, rs6000] Add builtins to convert from float/double to int/long using current rounding mode

2017-09-14 Thread Segher Boessenkool
Hi Carl,

On Wed, Sep 13, 2017 at 04:29:01PM -0700, Carl Love wrote:
> -- add "TARGET_SF_FPR && TARGET_FPRND" to the define_insn "lrintsfsi2"
> as mentioned it was missing on the original define_insn for fctiw.

I don't think TARGET_FPRND is correct: this instruction is in the original
PowerPC specification already.  As noted, this is wrong in more patterns,
we can deal with all of them at once later.

> I looked and really couldn't find a macro that really fit for these
> builtins.  I added a new macro for miscellaneous pre ISA 2.04 builtins.
> I called it BU_P1_MISC_1 for lack of a better name.  Open to suggestions
> on a better name.  I really don't know what the ISA numbers are prior to
> 2.04 so called it P1 to catch all builtins prior to isa 2.04.

P7 stands for "POWER7" here, not some ISA version (and POWER1 would be
incorrect, fctiw was introduced on POWER2, and fctid on PowerPC).

BU_FP_1 maybe?

> +#define BU_P1_MISC_1(ENUM, NAME, ATTR, ICODE)
> \
> +  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,   /* ENUM */  \
> + "__builtin_" NAME,  /* NAME */  \
> + RS6000_BTM_ALWAYS,  /* MASK */  \
> + (RS6000_BTC_ ## ATTR/* ATTR */  \
> +  | RS6000_BTC_UNARY),   \
> + CODE_FOR_ ## ICODE) /* ICODE */

I wonder if this needs some test for hard float (try to build a testcase
with -msoft-float, what happens?  Pretty much anything that isn't an ICE
is fine).
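
Something like this should be enough as a smoke test (an untested
sketch, with the builtin name assumed from the patch):

  /* { dg-do compile } */
  /* { dg-options "-msoft-float" } */
  double d;
  long long l;
  void foo (void) { l = __builtin_fctid (d); }

A sorry () or an error message is fine, an ICE is not.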

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c
> @@ -0,0 +1,138 @@
> +/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mcpu=power8" } */

Do you still need the p8vector_ok test if you already have p8vector_hw?

Would this test work on 32-bit if you used "long long" instead of "long"?

The patch looks good with those last few details taken care of :-)


Segher


[PATCH, rs6000] Don't mark the TOC reg as set up in prologue

2017-09-14 Thread Pat Haugen
Revision 235876 inadvertently caused the TOC reg to be marked as set up
in prologue, which prevents shrink-wrapping from moving the prologue
past a TOC reference. The following patch corrects the situation.

Bootstrap/regtest on powerpc64le-linux and powerpc64-linux(-m32/-m64)
with no new regressions. Ok for trunk?

-Pat


2017-09-14  Pat Haugen  

* config/rs6000/rs6000.c (rs6000_set_up_by_prologue): Make sure the TOC
reg (r2) isn't in the set of registers defined in the prologue.


testsuite/ChangeLog:
2017-09-14  Pat Haugen  

* gcc.target/powerpc/r2_shrink-wrap.c: New.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 252029)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -37807,6 +37807,11 @@ rs6000_set_up_by_prologue (struct hard_r
 add_to_hard_reg_set (&set->set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
   if (cfun->machine->split_stack_argp_used)
 add_to_hard_reg_set (&set->set, Pmode, 12);
+
+  /* Make sure the hard reg set doesn't include r2, which was possibly added
+ via PIC_OFFSET_TABLE_REGNUM.  */
+  if (TARGET_TOC)
+    remove_from_hard_reg_set (&set->set, Pmode, TOC_REGNUM);
 }
 
 
Index: gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c
===
--- gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/r2_shrink-wrap.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+void bar();
+int j;
+void foo(int i)
+{
+  j = i;
+  if (i > 0)
+{
+  bar();
+}
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-14 Thread Jeff Law
On 09/13/2017 03:46 PM, Steve Ellcey wrote:
> On Wed, 2017-09-13 at 14:46 -0500, Segher Boessenkool wrote:
>> On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
>>>  
>>> We are usually hesitant to add explicit subreg matching in the MD pattern
>>> (though I don't remember if there's a hard rule against it).
>>> In this case this looks like a missing simplification from combine 
>>> (simplify-rtx) so
>>> I think adding it there would be better.
> 
>> Yes, it probably belongs as a generic simplification in simplify-rtx.c;
>> if there is a reason not to do that, it can be done in combine.c
>> instead.
> 
> Actually, now that I look at it some more and compare it to the arm32
> version (where we do not have this problem) I think the problem starts
> well before combine.
> 
> In arm32 rtl expansion, when reading the QI memory location, I see
> these instructions get generated:
> 
> (insn 10 3 11 2 (set (reg:SI 119)
> (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0 *string_9(D)+0 
> S1 A8]))) "pr77729.c":4 -1
>  (nil))
> (insn 11 10 12 2 (set (reg:QI 118)
> (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>  (nil))
> 
> And in aarch64 rtl expansion I see:
> 
> (insn 10 9 11 (set (reg:QI 81)
> (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) 
> "pr77729.c":3 -1
>  (nil))
> 
> Both of these sequences expand to ldrb but in the arm32 case I know
> that I set all 32 bits of the register (even though I only want the
> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
> bits and I don't know anything about the higher bits, meaning I have to
> keep the AND instruction to mask out the upper bits on aarch64.
It's one of the reasons I discourage subregs -- the number of cases
where we can optimize based on the "don't care" semantics are relatively
small in my experience and I consistently see cases where the "don't
care" property of the subreg turns into "don't know" and suppresses
downstream optimizations.

It's always a judgment call, but more and more often I find myself
pushing towards defining those bits using a zero/sign extension, bit
operation or whatever rather than using subregs.
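
I.e. for the aarch64 load above, expand along the lines of the arm32
sequence quoted earlier (sketch):

  (set (reg:SI tmp) (zero_extend:SI (mem:QI ...)))
  (set (reg:QI dest) (subreg:QI (reg:SI tmp) 0))

so the fact that the upper bits are zero is explicit in the RTL rather
than a "don't care" of the subreg.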


> 
> I think we should change the movqi/movhi expansions on aarch64 to
> recognize that the ldrb/ldrh instructions zero out the upper bits in
> the register by generating rtl like arm32 does.
Is LOAD_EXTEND_OP defined for aarch64?

It may also be worth looking at ree.c -- my recollection is that it
didn't handle subregs, but it could and probably should.

jeff


[PATCH PR82163] Rewrite loop into lcssa form instantly

2017-09-14 Thread Bin Cheng
Hi,
Current pcom implementation rewrites into lcssa form after all loops are
transformed; this is not enough because unrolling of a later loop checks
lcssa form in function tree_transform_and_unroll_loop.
This simple patch rewrites a loop into lcssa form as soon as its
store-store chain is handled.  I think it doesn't affect compilation
time since rewrite_into_loop_closed_ssa_1 is only called for the
store-store chain transformation and only the transformed loop is
rewritten.
Bootstrap and test ongoing on x86_64.  Is it OK if there are no failures?
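
For reference, loop closed ssa form just means that every value defined
in a loop and used after it is routed through a PHI on the loop exit,
e.g. (sketch):

  loop:
    x_1 = ...;
    ...
  exit:
    x_2 = PHI <x_1 (loop)>;
    ... use (x_2);

which is the form tree_transform_and_unroll_loop checks for.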

Thanks,
bin
2017-09-14  Bin Cheng  

PR tree-optimization/82163
* tree-predcom.c (tree_predictive_commoning_loop): Rewrite into
loop closed ssa instantly.  Return boolean true if loop is unrolled.
(tree_predictive_commoning): Return TODO_cleanup_cfg if loop is
unrolled.

gcc/testsuite
2017-09-14  Bin Cheng  

PR tree-optimization/82163
	* gcc.dg/tree-ssa/pr82163.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c
new file mode 100644
index 000..fef2b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, c[4], d, e, f, g;
+
+void h ()
+{
+  for (; a; a++)
+{
+  c[a + 3] = g;
+  if (b)
+c[a] = f;
+  else
+{
+  for (; d; d++)
+c[d + 3] = c[d];
+  for (e = 1; e == 2; e++)
+;
+  if (e)
+break;
+}
+}
+}
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index e7b10cb..ffbe332 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -3014,11 +3014,10 @@ insert_init_seqs (struct loop *loop, vec<chain_p> chains)
   }
 }
 
-/* Performs predictive commoning for LOOP.  Sets bit 1<<0 of return value
-   if LOOP was unrolled; Sets bit 1<<1 of return value if loop closed ssa
-   form was corrupted.  */
+/* Performs predictive commoning for LOOP.  Returns true if LOOP was
+   unrolled.  */
 
-static unsigned
+static bool
 tree_predictive_commoning_loop (struct loop *loop)
 {
   vec<data_reference_p> datarefs;
@@ -3154,7 +3153,13 @@ end: ;
 
   free_affine_expand_cache (_expansions);
 
-  return (unroll ? 1 : 0) | (loop_closed_ssa ? 2 : 0);
+  /* Rewrite loop into loop closed ssa form if necessary.  We can not do it
+ after all loops are transformed because unrolling of later loop checks
+ loop closed ssa form.  */
+  if (loop_closed_ssa)
+rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop);
+
+  return unroll;
 }
 
 /* Runs predictive commoning.  */
@@ -3163,7 +3168,7 @@ unsigned
 tree_predictive_commoning (void)
 {
   struct loop *loop;
-  unsigned ret = 0, changed = 0;
+  bool changed = 0;
 
   initialize_original_copy_tables ();
   FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
@@ -3173,17 +3178,13 @@ tree_predictive_commoning (void)
   }
   free_original_copy_tables ();
 
-  if (changed > 0)
+  if (changed)
 {
   scev_reset ();
-
-  if (changed > 1)
-   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
-
-  ret = TODO_cleanup_cfg;
+  return TODO_cleanup_cfg;
 }
 
-  return ret;
+  return 0;
 }
 
 /* Predictive commoning Pass.  */


Re: [PATCH], Add support for __builtin_{sqrt,fma}f128 on PowerPC ISA 3.0

2017-09-14 Thread Segher Boessenkool
On Wed, Sep 13, 2017 at 05:46:00PM -0400, Michael Meissner wrote:
> This patch adds support on PowerPC ISA 3.0 for the built-in function
> __builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction 
> and
> the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
> XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.
> 
> While I was at it, I changed the documentation so that it no longer documents
> the 'q' built-in functions (to mirror libquadmath) but instead just documented
> the 'f128' functions that matches glibc 2.26 and the technical report that
> added the _FloatF128 date.
> 
> I changed the tests that used __fabsq to use __fabsf128 instead.
> 
> I also added && lp64 to float128-5.c so that it doesn't cause errors when 
> doing
> the test for a 32-bit target.  This is due to the fact that if you enable
> hardware IEEE 128-bit floating point, you eventually will need TImode
> supported, and that is not supported on 32-bit targets.
> 
> I did a bootstrap and make check with subversion id 252033 on a little endian
> power8 system.  The subversion id 252033 is one of the last svn ids that
> bootstrap without additional patches on the PowerPC.  There were no 
> regressions
> in this patch, and I verified the 4 new tests were run.  Can I check this 
> patch
> into the trunk?

Yes please.  A few trivial things:

>   * doc/extend.texi (RS/6000 built-in functions): Document the
>   'f128' IEEE 128-bit floating point built-in functions.  Don't
>   document the older 'q' versions of the functions. Document the
>   built-in IEEE 128-bit floating point square root and fused
>   multiply-add built-ins.

Dot space space.

> +/* 1 argument IEEE 128-bit floating point functions that require ISA 3.0
> +   hardware.  We define both a 'q' version for libquadmath compatibility, 
> and a
> +   'f128' for glibc 2.26.  We didn't need this for FABS/COPYSIGN, since the
> +   machine independent built-in support already defines the F128 versions,  
> */

Dot instead of comma?

> --- gcc/testsuite/gcc.target/powerpc/float128-5.c (revision 252730)
> +++ gcc/testsuite/gcc.target/powerpc/float128-5.c (working copy)
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { powerpc*-*-linux* } } } */
> +/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */

Maybe add a comment why this is -m64 only?

Thanks,


Segher


Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-14 Thread Bill Schmidt
On Sep 14, 2017, at 5:15 AM, Richard Biener  wrote:
> 
> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
>  wrote:
>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt  
>> wrote:
>>> 
>>> On Sep 13, 2017, at 7:23 AM, Richard Biener  
>>> wrote:
 
 On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
  wrote:
> Hi,
> 
> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
> 
> Folding of vector loads in GIMPLE.
> 
> Add code to handle gimple folding for the vec_ld builtins.
> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
> Surrounding
> comments have been adjusted slightly so they continue to read OK for the
> existing vec_st code.
> 
> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
> tests which have been posted separately.
> 
> For V2 of this patch, I've removed the chunk of code that prohibited the
> gimple fold from occurring in BE environments.   This had fixed an issue
> for me earlier during my development of the code, and turns out this was
> not necessary.  I've sniff-tested after removing that check and it looks
> OK.
> 
>> + /* Limit folding of loads to LE targets.  */
>> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>> +return false;
> 
> I've restarted a regression test on this updated version.
> 
> OK for trunk (assuming successful regression test completion)  ?
> 
> Thanks,
> -Will
> 
> [gcc]
> 
>  2017-09-12  Will Schmidt  
> 
>  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>  for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>  * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>  Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
> 
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index fbab0a2..bb8a77d 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
> loc, tree fndecl,
>   convert (TREE_TYPE (stmt), arg0));
> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> return stmt;
>   }
> 
> -  /* Expand vec_ld into an expression that masks the address and
> - performs the load.  We need to expand this early to allow
> +  /* Expand vec_st into an expression that masks the address and
> + performs the store.  We need to expand this early to allow
>the best aliasing, as by the time we get into RTL we no longer
>are able to honor __restrict__, for example.  We may want to
>consider this for all memory access built-ins.
> 
>When -maltivec=be is specified, or the wrong number of arguments
>is provided, simply punt to existing built-in processing.  */
> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> -  && nargs == 2)
> -{
> -  tree arg0 = (*arglist)[0];
> -  tree arg1 = (*arglist)[1];
> -
> -  /* Strip qualifiers like "const" from the pointer arg.  */
> -  tree arg1_type = TREE_TYPE (arg1);
> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
> ARRAY_TYPE)
> -   goto bad;
> -
> -  tree inner_type = TREE_TYPE (arg1_type);
> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
> -   {
> - arg1_type = build_pointer_type (build_qualified_type 
> (inner_type,
> -   0));
> - arg1 = fold_convert (arg1_type, arg1);
> -   }
> -
> -  /* Construct the masked address.  Let existing error handling take
> -over if we don't have a constant offset.  */
> -  arg0 = fold (arg0);
> -
> -  if (TREE_CODE (arg0) == INTEGER_CST)
> -   {
> - if (!ptrofftype_p (TREE_TYPE (arg0)))
> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
> -
> - tree arg1_type = TREE_TYPE (arg1);
> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
> -   {
> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
> - tree const0 = build_int_cstu (sizetype, 0);
> - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
> - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
> -   }
> -
> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
> -  arg1, arg0);
> - tree aligned = fold_build2_loc (loc, 

Re: Add a vect_get_dr_size helper function

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 4:05 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford
>>  wrote:
>>> This patch adds a helper function for getting the number of
>>> bytes accessed by a scalar data reference, which helps when general
>>> modes have a variable size.
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>>> OK to install?
>>
>> Can you put it into tree-data-ref.h?
>
> The idea (which I forgot to say) was to collect the uses within the
> vectoriser into one place so that we can assert in only one place
> that the reference is constant-sized.
>
> A general data_reference can be variable-sized, so the guarantees
> wouldn't be the same elsewhere.
>
> Would putting it in tree-vectorizer.h be OK?

Maybe name it vect_get_scalar_dr_size then?

Ok with that.

Richard.

> Thanks,
> Richard


Re: Store VECTOR_CST_NELTS directly in tree_node

2017-09-14 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 14, 2017 at 1:13 PM, Richard Sandiford
>  wrote:
>> Previously VECTOR_CST_NELTS (t) read the number of elements from
>> TYPE_VECTOR_SUBPARTS (TREE_TYPE (t)).  There were two ways of handling
>> this with variable TYPE_VECTOR_SUBPARTS: either forcibly convert the
>> number to a constant (which is doable) or store the number directly
>> in the VECTOR_CST.  The latter seemed better, since it involves less
>> pointer chasing and since the tree_node u field is otherwise unused
>> for VECTOR_CST.  It would still be easy to switch to the former in
>> future if we need to free up the field for someting else.
>>
>> The patch also changes various bits of VECTOR_CST code to use
>> VECTOR_CST_NELTS instead of TYPE_VECTOR_SUBPARTS when iterating
>> over VECTOR_CST_ELTs.  Also, when the two are checked for equality,
>> the patch prefers to read VECTOR_CST_NELTS (which must be constant)
>> and check against TYPE_VECTOR_SUBPARTS, instead of the other way
>> around.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> OK to install?
>
> Ok but I don't see how this helps the variable TYPE_VECTOR_SUBPARTS case?
> Are there no VECTOR_CSTs for SVE?

Not for SVE in the normal variable-length case, no.  We have different tree
codes for building broadcast and step vectors.

So it's similar in some ways to the scalar_mode stuff: the fact that we
have a VECTOR_CST is "proof" that we have a constant number of elements.

Thanks,
Richard


Re: Add a vect_get_dr_size helper function

2017-09-14 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford
>  wrote:
>> This patch adds a helper function for getting the number of
>> bytes accessed by a scalar data reference, which helps when general
>> modes have a variable size.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> OK to install?
>
> Can you put it into tree-data-ref.h?

The idea (which I forgot to say) was to collect the uses within the
vectoriser into one place so that we can assert in only one place
that the reference is constant-sized.

A general data_reference can be variable-sized, so the guarantees
wouldn't be the same elsewhere.

Would putting it in tree-vectorizer.h be OK?

Thanks,
Richard


[rs6000,patch] fix for fold-vec-ld-longlong.c test (lp64)

2017-09-14 Thread Will Schmidt
Hi, 
  I missed a target lp64 require for the fold-vec-ld-longlong.c test.
I'm now wearing my cone of shame.  :-(

Committing as trivial, momentarily. 

Thanks,
-Will


diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-longlong.c
index 37941af..db4a879 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-longlong.c
@@ -1,9 +1,9 @@
 /* Verify that overloaded built-ins for vec_ld* with long long
inputs produce the right code.  */
 
-/* { dg-do compile } */
+/* { dg-do compile { target lp64 } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mpower8-vector -O2" } */
 
 #include <altivec.h>
 




Re: Make more use of gimple-fold.h in tree-vect-loop.c

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:25 PM, Richard Sandiford
 wrote:
> This patch makes the vectoriser use the gimple-fold.h routines
> in more cases, instead of vect_init_vector.  Later patches want
> to use the same interface to handle variable-length vectors.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vect-loop.c (vectorizable_induction): Use gimple_build instead
> of vect_init_vector.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2017-09-14 11:26:37.599804415 +0100
> +++ gcc/tree-vect-loop.c2017-09-14 11:27:16.962234838 +0100
> @@ -6839,18 +6839,21 @@ vectorizable_induction (gimple *phi,
>  {
>/* iv_loop is the loop to be vectorized. Generate:
>   vec_step = [VF*S, VF*S, VF*S, VF*S]  */
> +  gimple_seq seq = NULL;
>if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> {
>   expr = build_int_cst (integer_type_node, vf);
> - expr = fold_convert (TREE_TYPE (step_expr), expr);
> + expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> }
>else
> expr = build_int_cst (TREE_TYPE (step_expr), vf);
> -  new_name = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> - expr, step_expr);
> -  if (TREE_CODE (step_expr) == SSA_NAME)
> -   new_name = vect_init_vector (phi, new_name,
> -TREE_TYPE (step_expr), NULL);
> +  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
> +  expr, step_expr);
> +  if (seq)
> +   {
> + new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> + gcc_assert (!new_bb);
> +   }
>  }
>
>t = unshare_expr (new_name);
> @@ -6899,6 +6902,7 @@ vectorizable_induction (gimple *phi,
>
>if (ncopies > 1)
>  {
> +  gimple_seq seq = NULL;
>stmt_vec_info prev_stmt_vinfo;
>/* FORNOW. This restriction should be relaxed.  */
>gcc_assert (!nested_in_vect_loop);
> @@ -6907,15 +6911,18 @@ vectorizable_induction (gimple *phi,
>if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> {
>   expr = build_int_cst (integer_type_node, nunits);
> - expr = fold_convert (TREE_TYPE (step_expr), expr);
> + expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> }
>else
> expr = build_int_cst (TREE_TYPE (step_expr), nunits);
> -  new_name = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> - expr, step_expr);
> -  if (TREE_CODE (step_expr) == SSA_NAME)
> -   new_name = vect_init_vector (phi, new_name,
> -TREE_TYPE (step_expr), NULL);
> +  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
> +  expr, step_expr);
> +  if (seq)
> +   {
> + new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> + gcc_assert (!new_bb);
> +   }
> +
>t = unshare_expr (new_name);
>gcc_assert (CONSTANT_CLASS_P (new_name)
>   || TREE_CODE (new_name) == SSA_NAME);


Re: Add LOOP_VINFO_MAX_VECT_FACTOR

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:24 PM, Richard Sandiford
 wrote:
> Epilogue vectorisation uses the vectorisation factor of the main loop
> as the maximum vectorisation factor allowed for correctness.  That makes
> sense as a conservatively correct value, since the chosen vectorisation
> factor will be strictly less than that anyway.  However, once the VF
> itself becomes variable, it's easier to carry across the original
> maximum VF instead.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.
Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info): Add max_vectorization_factor.
> (LOOP_VINFO_MAX_VECT_FACTOR): New macro.
> (LOOP_VINFO_ORIG_VECT_FACTOR): Replace with...
> (LOOP_VINFO_ORIG_MAX_VECT_FACTOR): ...this new macro.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Update
> accordingly.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> max_vectorization_factor.
> (vect_analyze_loop_2): Set LOOP_VINFO_MAX_VECT_FACTOR.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-09-14 11:28:27.080519923 +0100
> +++ gcc/tree-vectorizer.h   2017-09-14 11:30:06.064254417 +0100
> @@ -241,6 +241,10 @@ typedef struct _loop_vec_info : public v
>/* Unrolling factor  */
>int vectorization_factor;
>
> +  /* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
> + if there is no particular limit.  */
> +  unsigned HOST_WIDE_INT max_vectorization_factor;
> +
>/* Unknown DRs according to which loop was peeled.  */
>struct data_reference *unaligned_dr;
>
> @@ -355,6 +359,7 @@ #define LOOP_VINFO_NITERS_ASSUMPTIONS(L)
>  #define LOOP_VINFO_COST_MODEL_THRESHOLD(L) (L)->th
>  #define LOOP_VINFO_VECTORIZABLE_P(L)   (L)->vectorizable
>  #define LOOP_VINFO_VECT_FACTOR(L)  (L)->vectorization_factor
> +#define LOOP_VINFO_MAX_VECT_FACTOR(L)  (L)->max_vectorization_factor
>  #define LOOP_VINFO_PTR_MASK(L) (L)->ptr_mask
>  #define LOOP_VINFO_LOOP_NEST(L)(L)->loop_nest
>  #define LOOP_VINFO_DATAREFS(L) (L)->datarefs
> @@ -400,8 +405,8 @@ #define LOOP_VINFO_NITERS_KNOWN_P(L)
>  #define LOOP_VINFO_EPILOGUE_P(L) \
>(LOOP_VINFO_ORIG_LOOP_INFO (L) != NULL)
>
> -#define LOOP_VINFO_ORIG_VECT_FACTOR(L) \
> -  (LOOP_VINFO_VECT_FACTOR (LOOP_VINFO_ORIG_LOOP_INFO (L)))
> +#define LOOP_VINFO_ORIG_MAX_VECT_FACTOR(L) \
> +  (LOOP_VINFO_MAX_VECT_FACTOR (LOOP_VINFO_ORIG_LOOP_INFO (L)))
>
>  static inline loop_vec_info
>  loop_vec_info_for_loop (struct loop *loop)
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-09-14 11:29:19.649870912 +0100
> +++ gcc/tree-vect-data-refs.c   2017-09-14 11:30:06.063347272 +0100
> @@ -509,7 +509,7 @@ vect_analyze_data_ref_dependences (loop_
>   was applied to original loop.  Therefore we may just get max_vf
>   using VF of original loop.  */
>if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> -*max_vf = LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo);
> +*max_vf = LOOP_VINFO_ORIG_MAX_VECT_FACTOR (loop_vinfo);
>else
>  FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr)
>if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf))
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2017-09-14 11:28:27.079519923 +0100
> +++ gcc/tree-vect-loop.c2017-09-14 11:30:06.064254417 +0100
> @@ -,6 +,7 @@ _loop_vec_info::_loop_vec_info (struct l
>  num_iters_assumptions (NULL_TREE),
>  th (0),
>  vectorization_factor (0),
> +max_vectorization_factor (0),
>  unaligned_dr (NULL),
>  peeling_for_alignment (0),
>  ptr_mask (0),
> @@ -1920,6 +1921,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>  "bad data dependence.\n");
>return false;
>  }
> +  LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf;
>
>ok = vect_determine_vectorization_factor (loop_vinfo);
>if (!ok)


Re: Add a vect_get_dr_size helper function

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford
 wrote:
> This patch adds a helper function for getting the number of
> bytes accessed by a scalar data reference, which helps when general
> modes have a variable size.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Can you put it into tree-data-ref.h?

Ok with that change.
Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vect-data-refs.c (vect_get_dr_size): New function.
> (vect_update_misalignment_for_peel): Use it.
> (vect_enhance_data_refs_alignment): Likewise.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-09-14 11:27:50.350257085 +0100
> +++ gcc/tree-vect-data-refs.c   2017-09-14 11:29:19.649870912 +0100
> @@ -950,6 +950,13 @@ vect_compute_data_ref_alignment (struct
>return true;
>  }
>
> +/* Return the size of the value accessed by DR, which is always constant.  */
> +
> +static unsigned int
> +vect_get_dr_size (struct data_reference *dr)
> +{
> +  return GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
> +}
>
>  /* Function vect_update_misalignment_for_peel.
> Sets DR's misalignment
> @@ -970,8 +977,8 @@ vect_update_misalignment_for_peel (struc
>unsigned int i;
>vec same_aligned_drs;
>struct data_reference *current_dr;
> -  int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
> -  int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
> +  int dr_size = vect_get_dr_size (dr);
> +  int dr_peel_size = vect_get_dr_size (dr_peel);
>stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
>stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel));
>
> @@ -1659,8 +1666,7 @@ vect_enhance_data_refs_alignment (loop_v
>
>vectype = STMT_VINFO_VECTYPE (stmt_info);
>nelements = TYPE_VECTOR_SUBPARTS (vectype);
> -  mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
> -TREE_TYPE (DR_REF (dr))));
> + mis = DR_MISALIGNMENT (dr) / vect_get_dr_size (dr);
>   if (DR_MISALIGNMENT (dr) != 0)
> npeel_tmp = (negative ? (mis - nelements)
>  : (nelements - mis)) & (nelements - 1);
> @@ -1932,8 +1938,7 @@ vect_enhance_data_refs_alignment (loop_v
>   updating DR_MISALIGNMENT values.  The peeling factor is the
>   vectorization factor minus the misalignment as an element
>   count.  */
> -  mis = DR_MISALIGNMENT (dr0);
> -  mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0))));
> + mis = DR_MISALIGNMENT (dr0) / vect_get_dr_size (dr0);
>npeel = ((negative ? mis - nelements : nelements - mis)
>& (nelements - 1));
>  }


Re: Add a vect_worthwhile_without_simd_p helper routine

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:22 PM, Richard Sandiford
 wrote:
> The vectoriser sometimes considers lowering "vector" operations into N
> scalar word operations.  This N needs to be fixed at compile time, so
> the condition guarding it needs to change when variable-lengh vectors
> are added.  This patch puts the condition into a helper routine so that
> there's only one place to update.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vectorizer.h (vect_min_worthwhile_factor): Delete.
> (vect_worthwhile_without_simd_p): Declare.
> * tree-vect-loop.c (vect_worthwhile_without_simd_p): New function.
> (vectorizable_reduction): Use it.
> * tree-vect-stmts.c (vectorizable_shift): Likewise.
> (vectorizable_operation): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-09-14 11:27:50.352072753 +0100
> +++ gcc/tree-vectorizer.h   2017-09-14 11:28:27.080519923 +0100
> @@ -1230,7 +1230,7 @@ extern bool vectorizable_reduction (gimp
>  extern bool vectorizable_induction (gimple *, gimple_stmt_iterator *,
> gimple **, slp_tree);
>  extern tree get_initial_def_for_reduction (gimple *, tree, tree *);
> -extern int vect_min_worthwhile_factor (enum tree_code);
> +extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
>  extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
> stmt_vector_for_cost *,
> stmt_vector_for_cost *,
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2017-09-14 11:27:50.351164919 +0100
> +++ gcc/tree-vect-loop.c2017-09-14 11:28:27.079519923 +0100
> @@ -6030,8 +6030,7 @@ vectorizable_reduction (gimple *stmt, gi
>  dump_printf (MSG_NOTE, "op not supported by target.\n");
>
>if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
> -  || LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> - < vect_min_worthwhile_factor (code))
> + || !vect_worthwhile_without_simd_p (loop_vinfo, code))
>  return false;
>
>if (dump_enabled_p ())
> @@ -6040,8 +6039,7 @@ vectorizable_reduction (gimple *stmt, gi
>
>/* Worthwhile without SIMD support?  */
>if (!VECTOR_MODE_P (TYPE_MODE (vectype_in))
> -  && LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> -< vect_min_worthwhile_factor (code))
> + && !vect_worthwhile_without_simd_p (loop_vinfo, code))
>  {
>if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -6492,6 +6490,18 @@ vect_min_worthwhile_factor (enum tree_co
>  }
>  }
>
> +/* Return true if VINFO indicates we are doing loop vectorization and if
> +   it is worth decomposing CODE operations into scalar operations for
> +   that loop's vectorization factor.  */
> +
> +bool
> +vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  return (loop_vinfo
> + && (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> + >= vect_min_worthwhile_factor (code)));
> +}
>
>  /* Function vectorizable_induction
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2017-09-14 11:27:50.352072753 +0100
> +++ gcc/tree-vect-stmts.c   2017-09-14 11:28:27.080519923 +0100
> @@ -4869,7 +4869,6 @@ vectorizable_shift (gimple *stmt, gimple
>bool scalar_shift_arg = true;
>bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>vec_info *vinfo = stmt_info->vinfo;
> -  int vf;
>
>if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>  return false;
> @@ -4937,11 +4936,6 @@ vectorizable_shift (gimple *stmt, gimple
>return false;
>  }
>
> -  if (loop_vinfo)
> -vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  else
> -vf = 1;
> -
>/* Multiple types in SLP are handled by creating the appropriate number of
>   vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
>   case of SLP.  */
> @@ -5086,8 +5080,8 @@ vectorizable_shift (gimple *stmt, gimple
>   "op not supported by target.\n");
>/* Check only during analysis.  */
>if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
> -  || (vf < vect_min_worthwhile_factor (code)
> -  && !vec_stmt))
> + || (!vec_stmt
> + && !vect_worthwhile_without_simd_p (vinfo, code)))
>  return 

Re: Add a vect_get_num_copies helper routine

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:22 PM, Richard Sandiford
 wrote:
> This patch adds a vectoriser helper routine to calculate how
> many copies of a vector statement we need.  At present this
> is always:
>
>   LOOP_VINFO_VECT_FACTOR (loop_vinfo) / TYPE_VECTOR_SUBPARTS (vectype)
>
> but later patches add other cases.  Another benefit of using
> a helper routine is that it can assert that the division is
> exact (which it must be).
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vectorizer.h (vect_get_num_copies): New function.
> * tree-vect-data-refs.c (vect_get_data_access_cost): Use it.
> * tree-vect-loop.c (vectorizable_reduction): Likewise.
> (vectorizable_induction): Likewise.
> (vectorizable_live_operation): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Likewise.
> (vectorizable_bswap): Likewise.
> (vectorizable_call): Likewise.
> (vectorizable_conversion): Likewise.
> (vectorizable_assignment): Likewise.
> (vectorizable_shift): Likewise.
> (vectorizable_operation): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (vectorizable_condition): Likewise.
> (vectorizable_comparison): Likewise.
> (vect_analyze_stmt): Pass the slp node to vectorizable_live_operation.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-09-14 11:25:32.166167193 +0100
> +++ gcc/tree-vectorizer.h   2017-09-14 11:27:50.352072753 +0100
> @@ -1076,6 +1076,20 @@ unlimited_cost_model (loop_p loop)
>return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
>  }
>
> +/* Return the number of copies needed for loop vectorization when
> +   a statement operates on vectors of type VECTYPE.  This is the
> +   vectorization factor divided by the number of elements in
> +   VECTYPE and is always known at compile time.  */
> +
> +static inline unsigned int
> +vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
> +{
> +  gcc_checking_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> +  % TYPE_VECTOR_SUBPARTS (vectype) == 0);
> +  return (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> + / TYPE_VECTOR_SUBPARTS (vectype));
> +}
> +
>  /* Source location */
>  extern source_location vect_location;
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-09-14 11:25:32.163167193 +0100
> +++ gcc/tree-vect-data-refs.c   2017-09-14 11:27:50.350257085 +0100
> @@ -1181,10 +1181,13 @@ vect_get_data_access_cost (struct data_r
>  {
>gimple *stmt = DR_STMT (dr);
>stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> -  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> -  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */
> +  int ncopies;
> +
> +  if (PURE_SLP_STMT (stmt_info))
> +ncopies = 1;
> +  else
> +ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE 
> (stmt_info));
>
>if (DR_IS_READ (dr))
>  vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2017-09-14 11:27:16.962234838 +0100
> +++ gcc/tree-vect-loop.c2017-09-14 11:27:50.351164919 +0100
> @@ -5683,8 +5683,7 @@ vectorizable_reduction (gimple *stmt, gi
>if (slp_node)
> ncopies = 1;
>else
> -   ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> -  / TYPE_VECTOR_SUBPARTS (vectype_in));
> +   ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
>
>use_operand_p use_p;
>gimple *use_stmt;
> @@ -5980,8 +5979,7 @@ vectorizable_reduction (gimple *stmt, gi
>if (slp_node)
>  ncopies = 1;
>else
> -ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> -   / TYPE_VECTOR_SUBPARTS (vectype_in));
> +ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
>
>gcc_assert (ncopies >= 1);
>
> @@ -6550,7 +6548,7 @@ vectorizable_induction (gimple *phi,
>if (slp_node)
>  ncopies = 1;
>else
> -ncopies = vf / nunits;
> +ncopies = vect_get_num_copies (loop_vinfo, vectype);
>gcc_assert (ncopies >= 1);
>
>/* FORNOW. These restrictions should be relaxed.  */
> @@ -7013,12 +7011,17 @@ vectorizable_live_operation (gimple *stm
>tree lhs, lhs_type, bitsize, vec_bitsize;
>tree vectype = 

Re: Add gimple_build_vector* helpers

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:20 PM, Richard Sandiford
 wrote:
> This patch adds gimple-fold.h equivalents of build_vector and
> build_vector_from_val.  Like the other gimple-fold.h routines
> they always return a valid gimple value and add any new
> statements to a given gimple_seq.  In combination with later
> patches this reduces the number of force_gimple_operands.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * gimple-fold.h (gimple_build_vector_from_val): Declare, and provide
> an inline wrapper that provides a location.
> (gimple_build_vector): Likewise.
> * gimple-fold.c (gimple_build_vector_from_val): New function.
> (gimple_build_vector): Likewise.
> * tree-vect-loop.c (get_initial_def_for_reduction): Use the new
> functions to build the initial value.  Always return a gimple value.
> (get_initial_defs_for_reduction): Likewise.  Only compute
> neutral_vec once.
> (vect_create_epilog_for_reduction): Don't call force_gimple_operand or
> vect_init_vector on the results from get_initial_def(s)_for_reduction.
> (vectorizable_induction): Use gimple_build_vector rather than
> vect_init_vector.
>
> Index: gcc/gimple-fold.h
> ===
> --- gcc/gimple-fold.h   2017-07-08 11:37:46.573465901 +0100
> +++ gcc/gimple-fold.h   2017-09-14 11:26:37.598804415 +0100
> @@ -127,6 +127,21 @@ gimple_convert_to_ptrofftype (gimple_seq
>return gimple_convert_to_ptrofftype (seq, UNKNOWN_LOCATION, op);
>  }
>
> +extern tree gimple_build_vector_from_val (gimple_seq *, location_t, tree,
> + tree);
> +inline tree
> +gimple_build_vector_from_val (gimple_seq *seq, tree type, tree op)
> +{
> +  return gimple_build_vector_from_val (seq, UNKNOWN_LOCATION, type, op);
> +}
> +
> +extern tree gimple_build_vector (gimple_seq *, location_t, tree, vec<tree>);
> +inline tree
> +gimple_build_vector (gimple_seq *seq, tree type, vec<tree> elts)
> +{
> +  return gimple_build_vector (seq, UNKNOWN_LOCATION, type, elts);
> +}
> +
>  extern bool gimple_stmt_nonnegative_warnv_p (gimple *, bool *, int = 0);
>  extern bool gimple_stmt_integer_valued_real_p (gimple *, int = 0);
>
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c   2017-09-14 11:24:42.666088258 +0100
> +++ gcc/gimple-fold.c   2017-09-14 11:26:37.598804415 +0100
> @@ -7058,6 +7058,58 @@ gimple_convert_to_ptrofftype (gimple_seq
>return gimple_convert (seq, loc, sizetype, op);
>  }
>
> +/* Build a vector of type TYPE in which each element has the value OP.
> +   Return a gimple value for the result, appending any new statements
> +   to SEQ.  */
> +
> +tree
> +gimple_build_vector_from_val (gimple_seq *seq, location_t loc, tree type,
> + tree op)
> +{
> +  tree res, vec = build_vector_from_val (type, op);
> +  if (is_gimple_val (vec))
> +return vec;
> +  if (gimple_in_ssa_p (cfun))
> +res = make_ssa_name (type);
> +  else
> +res = create_tmp_reg (type);
> +  gimple *stmt = gimple_build_assign (res, vec);
> +  gimple_set_location (stmt, loc);
> +  gimple_seq_add_stmt_without_update (seq, stmt);
> +  return res;
> +}
> +
> +/* Build a vector of type TYPE in which the elements have the values
> +   given by ELTS.  Return a gimple value for the result, appending any
> +   new instructions to SEQ.  */
> +
> +tree
> +gimple_build_vector (gimple_seq *seq, location_t loc, tree type,
> +		     vec<tree> elts)
> +{
> +  unsigned int nelts = elts.length ();
> +  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> +  for (unsigned int i = 0; i < nelts; ++i)
> +if (!TREE_CONSTANT (elts[i]))
> +  {
> +	vec<constructor_elt, va_gc> *v;
> +   vec_alloc (v, nelts);
> +   for (i = 0; i < nelts; ++i)
> + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, elts[i]);
> +
> +   tree res;
> +   if (gimple_in_ssa_p (cfun))
> + res = make_ssa_name (type);
> +   else
> + res = create_tmp_reg (type);
> +   gimple *stmt = gimple_build_assign (res, build_constructor (type, v));
> +   gimple_set_location (stmt, loc);
> +   gimple_seq_add_stmt_without_update (seq, stmt);
> +   return res;
> +  }
> +  return build_vector (type, elts);
> +}
> +
>  /* Return true if the result of assignment STMT is known to be non-negative.
> If the return value is based on the assumption that signed overflow is
> undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
> Index: gcc/tree-vect-loop.c
> 

Re: Use vec<> for constant permute masks

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:20 PM, Richard Sandiford
 wrote:
> This patch makes can_vec_perm_p & co. take a vec<>, wrapped in new
> typedefs vec_perm_indices and auto_vec_perm_indices.  There are two
> reasons for doing this for SVE:
>
> (1) it means that the number of elements is bundled with the elements
> themselves, and is obviously constant.
>
> (2) it makes it easier to change the "unsigned char" element type to
> something wider.
>
> I'm happy to change the target hooks as a follow-on patch, if this is OK.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * target.h (vec_perm_indices): New typedef.
> (auto_vec_perm_indices): Likewise.
> * optabs-query.h: Include target.h
> (can_vec_perm_p): Take a vec_perm_indices *.
> * optabs-query.c (can_vec_perm_p): Likewise.
> (can_mult_highpart_p): Update accordingly.  Use auto_vec_perm_indices.
> * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> * tree-vect-generic.c (lower_vec_perm): Likewise.
> * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
> (vect_grouped_load_supported): Likewise.
> (vect_shift_permute_load_chain): Likewise.
> (vect_permute_store_chain): Use auto_vec_perm_indices.
> (vect_permute_load_chain): Likewise.
> * fold-const.c (fold_vec_perm): Take vec_perm_indices.
> (fold_ternary_loc): Update accordingly.  Use auto_vec_perm_indices.
> Update uses of can_vec_perm_p.
> * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Replace the
> mode with a number of elements.  Take a vec_perm_indices *.
> (vect_create_epilog_for_reduction): Update accordingly.
> Use auto_vec_perm_indices.
> (have_whole_vector_shift): Likewise.  Update call to can_vec_perm_p.
> * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
> (vect_transform_slp_perm_load): Likewise.
> (vect_schedule_slp_instance): Use auto_vec_perm_indices.
> * tree-vectorizer.h (vect_gen_perm_mask_any): Take a vec_perm_indices.
> (vect_gen_perm_mask_checked): Likewise.
> * tree-vect-stmts.c (vect_gen_perm_mask_any): Take a vec_perm_indices.
> (vect_gen_perm_mask_checked): Likewise.
> (vectorizable_mask_load_store): Use auto_vec_perm_indices.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (perm_mask_for_reverse): Likewise.  Update call to can_vec_perm_p.
> (vectorizable_bswap): Likewise.
>
> Index: gcc/target.h
> ===
> --- gcc/target.h2017-09-11 17:10:58.656085547 +0100
> +++ gcc/target.h2017-09-14 11:25:32.162167193 +0100
> @@ -191,6 +191,14 @@ enum vect_cost_model_location {
>vect_epilogue = 2
>  };
>
> +/* The type to use for vector permutes with a constant permute vector.
> +   Each entry is an index into the concatenated input vectors.  */
> +typedef vec<unsigned char> vec_perm_indices;
> +
> +/* Same, but can be used to construct local permute vectors that are
> +   automatically freed.  */
> +typedef auto_vec<unsigned char, 32> auto_vec_perm_indices;
> +
>  /* The target structure.  This holds all the backend hooks.  */
>  #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
>  #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
> Index: gcc/optabs-query.h
> ===
> --- gcc/optabs-query.h  2017-08-30 12:14:51.272396735 +0100
> +++ gcc/optabs-query.h  2017-09-14 11:25:32.162167193 +0100
> @@ -21,6 +21,7 @@ the Free Software Foundation; either ver
>  #define GCC_OPTABS_QUERY_H
>
>  #include "insn-opinit.h"
> +#include "target.h"
>
>  /* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing
> if the target does not have such an insn.  */
> @@ -165,7 +166,7 @@ enum insn_code can_extend_p (machine_mod
>  enum insn_code can_float_p (machine_mode, machine_mode, int);
>  enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
>  bool can_conditionally_move_p (machine_mode mode);
> -bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
> +bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
>  enum insn_code widening_optab_handler (optab, machine_mode, machine_mode);
>  /* Find a widening optab even if it doesn't widen as much as we want.  */
>  #define find_widening_optab_handler(A,B,C,D) \
> Index: gcc/optabs-query.c
> ===
> --- gcc/optabs-query.c  2017-09-05 20:57:40.745898121 +0100
> +++ gcc/optabs-query.c  2017-09-14 11:25:32.162167193 +0100
> @@ -353,8 

Re: Use vec<> in build_vector

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:14 PM, Richard Sandiford
 wrote:
> This patch makes build_vector take the elements as a vec<> rather
> than a tree *.  This is useful for SVE because it bundles the number
> of elements with the elements themselves, and enforces the fact that
> the number is constant.  Also, I think things like the folds can be used
> with any generic GNU vector, not just those that match machine vectors,
> so the arguments to XALLOCAVEC had no clear limit.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree.h (build_vector): Take a vec<tree> instead of a tree *.
> * tree.c (build_vector): Likewise.
> (build_vector_from_ctor): Update accordingly.
> (build_vector_from_val): Likewise.
> * gimple-fold.c (gimple_fold_stmt_to_constant_1): Likewise.
> * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> * tree-vect-generic.c (add_rshift): Likewise.
> (expand_vector_divmod): Likewise.
> (optimize_vector_constructor): Likewise.
> * tree-vect-slp.c (vect_get_constant_vectors): Likewise.
> (vect_transform_slp_perm_load): Likewise.
> (vect_schedule_slp_instance): Likewise.
> * tree-vect-stmts.c (vectorizable_bswap): Likewise.
> (vectorizable_call): Likewise.
> (vect_gen_perm_mask_any): Likewise.  Add elements in order.
> * expmed.c (make_tree): Likewise.
> * fold-const.c (fold_negate_expr_1): Use auto_vec when building
> a vector passed to build_vector.
> (fold_convert_const): Likewise.
> (exact_inverse): Likewise.
> (fold_ternary_loc): Likewise.
> (fold_relational_const): Likewise.
> (const_binop): Likewise.  Use VECTOR_CST_ELT directly when operating
> on VECTOR_CSTs, rather than going through vec_cst_ctor_to_array.
> (const_unop): Likewise.  Store the reduction accumulator in a
> variable rather than an array.
> (vec_cst_ctor_to_array): Take the number of elements as a parameter.
> (fold_vec_perm): Update calls accordingly.  Use auto_vec for
> the new vector, rather than constructing it after the input arrays.
> (native_interpret_vector): Use auto_vec when building
> a vector passed to build_vector.  Add elements in order.
> * tree-vect-loop.c (get_initial_defs_for_reduction): Use
> auto_vec when building a vector passed to build_vector.
> (vect_create_epilog_for_reduction): Likewise.
> (vectorizable_induction): Likewise.
> (get_initial_def_for_reduction): Likewise.  Fix indentation of
> case statements.
> * config/sparc/sparc.c (sparc_handle_vis_mul8x16): Change n_elts
> to a vec<tree> *.
> (sparc_fold_builtin): Use auto_vec when building a vector
> passed to build_vector.
>
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2017-09-14 11:23:57.004947653 +0100
> +++ gcc/tree.h  2017-09-14 11:24:42.669777533 +0100
> @@ -4026,7 +4026,7 @@ extern tree build_int_cst (tree, HOST_WI
>  extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
>  extern tree build_int_cst_type (tree, HOST_WIDE_INT);
>  extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
> -extern tree build_vector (tree, tree * CXX_MEM_STAT_INFO);
> +extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>  extern tree build_vector_from_val (tree, tree);
>  extern void recompute_constructor_flags (tree);
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2017-09-14 11:23:57.004947653 +0100
> +++ gcc/tree.c  2017-09-14 11:24:42.669777533 +0100
> @@ -1702,18 +1702,20 @@ make_vector (unsigned len MEM_STAT_DECL)
>  }
>
>  /* Return a new VECTOR_CST node whose type is TYPE and whose values
> -   are in a list pointed to by VALS.  */
> +   are given by VALS.  */
>
>  tree
> -build_vector (tree type, tree *vals MEM_STAT_DECL)
> +build_vector (tree type, vec<tree> vals MEM_STAT_DECL)
>  {
> +  unsigned int nelts = vals.length ();
> +  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
>int over = 0;
>unsigned cnt = 0;
> -  tree v = make_vector (TYPE_VECTOR_SUBPARTS (type));
> +  tree v = make_vector (nelts);
>TREE_TYPE (v) = type;
>
>/* Iterate through elements and check for overflow.  */
> -  for (cnt = 0; cnt < TYPE_VECTOR_SUBPARTS (type); ++cnt)
> +  for (cnt = 0; cnt < nelts; ++cnt)
>  {
>tree value = vals[cnt];
>
> @@ -1736,20 +1738,21 @@ build_vector (tree type, tree *vals MEM_
>  tree
>  build_vector_from_ctor (tree 

Re: Store VECTOR_CST_NELTS directly in tree_node

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 1:13 PM, Richard Sandiford
 wrote:
> Previously VECTOR_CST_NELTS (t) read the number of elements from
> TYPE_VECTOR_SUBPARTS (TREE_TYPE (t)).  There were two ways of handling
> this with variable TYPE_VECTOR_SUBPARTS: either forcibly convert the
> number to a constant (which is doable) or store the number directly
> in the VECTOR_CST.  The latter seemed better, since it involves less
> pointer chasing and since the tree_node u field is otherwise unused
> for VECTOR_CST.  It would still be easy to switch to the former in
> future if we need to free up the field for someting else.
>
> The patch also changes various bits of VECTOR_CST code to use
> VECTOR_CST_NELTS instead of TYPE_VECTOR_SUBPARTS when iterating
> over VECTOR_CST_ELTs.  Also, when the two are checked for equality,
> the patch prefers to read VECTOR_CST_NELTS (which must be constant)
> and check against TYPE_VECTOR_SUBPARTS, instead of the other way
> around.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok but I don't see how this helps the variable TYPE_VECTOR_SUBPARTS case?
Are there no VECTOR_CSTs for SVE?

Thanks,
Richard.

> Richard
>
>
> 2017-09-14  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-core.h (tree_base::u): Add an "nelts" field.
> (tree_vector): Use VECTOR_CST_NELTS as the length.
> * tree.c (tree_size): Likewise.
> (make_vector): Initialize VECTOR_CST_NELTS.
> * tree.h (VECTOR_CST_NELTS): Use the u.nelts field.
> * cfgexpand.c (expand_debug_expr): Use VECTOR_CST_NELTS instead of
> TYPE_VECTOR_SUBPARTS.
> * expr.c (const_vector_mask_from_tree): Consistently use "units"
> as the number of units, setting it from VECTOR_CST_NELTS.
> (const_vector_from_tree): Likewise.
> * fold-const.c (negate_expr_p): Use VECTOR_CST_NELTS instead of
> TYPE_VECTOR_SUBPARTS for the number of elements in a VECTOR_CST.
> (fold_negate_expr_1): Likewise.
> (fold_convert_const): Likewise.
> (const_binop): Likewise.  Differentiate the number of output and
> input elements.
> (const_unop): Likewise.
> (fold_ternary_loc): Use VECTOR_CST_NELTS for the number of elements
> in a VECTOR_CST, asserting that it is the same as TYPE_VECTOR_SUBPARTS
> in cases that did the opposite.
>
> Index: gcc/tree-core.h
> ===
> --- gcc/tree-core.h 2017-08-21 10:42:05.815630531 +0100
> +++ gcc/tree-core.h 2017-09-14 11:23:57.004041291 +0100
> @@ -975,6 +975,9 @@ struct GTY(()) tree_base {
>  /* VEC length.  This field is only used with TREE_VEC.  */
>  int length;
>
> +/* Number of elements.  This field is only used with VECTOR_CST.  */
> +unsigned int nelts;
> +
>  /* SSA version number.  This field is only used with SSA_NAME.  */
>  unsigned int version;
>
> @@ -1326,7 +1329,7 @@ struct GTY(()) tree_complex {
>
>  struct GTY(()) tree_vector {
>struct tree_typed typed;
> -  tree GTY ((length ("TYPE_VECTOR_SUBPARTS (TREE_TYPE ((tree)&%h))"))) 
> elts[1];
> +  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
>  };
>
>  struct GTY(()) tree_identifier {
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2017-09-11 17:10:38.700973860 +0100
> +++ gcc/tree.c  2017-09-14 11:23:57.004947653 +0100
> @@ -873,7 +873,7 @@ tree_size (const_tree node)
>
>  case VECTOR_CST:
>return (sizeof (struct tree_vector)
> - + (TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)) - 1) * sizeof 
> (tree));
> + + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
>
>  case STRING_CST:
>return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
> @@ -1696,6 +1696,7 @@ make_vector (unsigned len MEM_STAT_DECL)
>
>TREE_SET_CODE (t, VECTOR_CST);
>TREE_CONSTANT (t) = 1;
> +  VECTOR_CST_NELTS (t) = len;
>
>return t;
>  }
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2017-08-30 12:19:19.721220029 +0100
> +++ gcc/tree.h  2017-09-14 11:23:57.004947653 +0100
> @@ -1026,7 +1026,7 @@ #define TREE_REALPART(NODE) (COMPLEX_CST
>  #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
>
>  /* In a VECTOR_CST node.  */
> -#define VECTOR_CST_NELTS(NODE) (TYPE_VECTOR_SUBPARTS (TREE_TYPE (NODE)))
> +#define VECTOR_CST_NELTS(NODE) (VECTOR_CST_CHECK (NODE)->base.u.nelts)
>  #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
>  #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
>
> Index: gcc/cfgexpand.c
> ===
> --- gcc/cfgexpand.c

Re: [PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 configuration

2017-09-14 Thread Markus Trippelsdorf
On 2017.09.14 at 14:36 +0200, Jakub Jelinek wrote:
> On Thu, Sep 14, 2017 at 12:10:50PM +, Shalnov, Sergey wrote:
> > GCC has the option "mprefer-avx128" to use 128-bit AVX registers instead
> > of 256-bit AVX registers in the auto-vectorizer.
> 
> > This patch enables the command line option "mprefer-avx256" that reduces
> 512-bit register usage in "march=skylake-avx512" mode.  This is the
> initial implementation of the option.  Currently, 512-bit registers might
> appear in some cases.  I plan to continue fixing the cases where
> 512-bit registers appear.  Sergey
> 
> What is the rationale for this?  -mprefer-avx128 has been added because some
> (older) AMD CPUs implement AVX by performing 256-bit ops as two 128-bit uops
> and thus it is faster to emit 128-bit only code.
> Is that the case for any AVX512 implementations too?

You get a huge frequency drop when you run AVX512 code. There are
situations where this more than offsets the potential gains.
Glibc had to disable AVX512 memcpy because of this issue.

-- 
Markus


Re: [PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 configuration

2017-09-14 Thread Jakub Jelinek
On Thu, Sep 14, 2017 at 12:10:50PM +, Shalnov, Sergey wrote:
> GCC has the option "mprefer-avx128" to use 128-bit AVX registers instead
> of 256-bit AVX registers in the auto-vectorizer.

> This patch enables the command line option "mprefer-avx256" that reduces
> 512-bit register usage in "march=skylake-avx512" mode.  This is the
> initial implementation of the option.  Currently, 512-bit registers might
> appear in some cases.  I plan to continue fixing the cases where
> 512-bit registers appear.  Sergey

What is the rationale for this?  -mprefer-avx128 has been added because some
(older) AMD CPUs implement AVX by performing 256-bit ops as two 128-bit uops
and thus it is faster to emit 128-bit only code.
Is that the case for any AVX512 implementations too?

Jakub


Re: [RFC] Make 4-stage PGO bootstrap really working

2017-09-14 Thread Martin Liška
PING^1

On 08/30/2017 11:45 AM, Martin Liška wrote:
> Hi.
> 
> This is a follow-up on something I've just noticed. The main problem we have is
> that an instrumented compiler w/ -fprofile-generate (built in the $OBJDIR/gcc
> subfolder)
> will generate all *.gcda files in the same dir as the *.o files. That's problematic
> because we then have *.gcda files spread between the 'profile' subfolder (because
> the 'profile'
> compiler builds libgcc) and the 'train' subfolder. Eventually, in the 'feedback' stage
> we don't load any *.gcda files :/
> 
> Well, I believe we really need to set -fprofile-generate=$folder to a fixed $folder.
> Then comes a
> second problem: all *.gcda files are created as $folder/$aux_base_name.gcda,
> which makes
> this useless as we get multiple files with the same name:
> 
> $ find . -name expr.c
> ./libcpp/expr.c
> ./gcc/expr.c
> 
> Thus I suggest patch #0001, which appends the full path of the current working dir. Patch
> #0002 sets
> a folder for PGO bootstrap. So far so good with a small exception: 
> conftest.gcda files
> that trigger -Wcoverage-mismatch. Can we remove these before a stage? Do we 
> do a similar
> thing somewhere?
> 
> Thoughts?
> Thanks,
> Martin
> 



[PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2017-09-14 Thread Martin Liška
Hello.

As mentioned at Cauldron 2017, the second step in switch lowering should be a massive
simplification of the code that expands the balanced tree. That code basically
duplicates VRP and DCE, which we can for obvious reasons leave to the real passes.

The patch does that, and introduces a separate pass for -O0 that's responsible
for lowering at the end of the tree pipeline.

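To make the intent concrete, here is a hand-written sketch (mine, not from the
patch) of what the balanced-tree expansion of a small switch looks like; the
simplified emitter just produces the comparison tree and leaves the redundant
range checks for the later VRP and DCE passes to remove:

  int
  dispatch (int x)
  {
    /* switch (x) { case 1: return 10; case 2: return 20;
                    case 4: return 40; case 8: return 80; }
       expands roughly to a binary search over the case values:  */
    if (x == 4)
      return 40;
    else if (x < 4)
      {
        if (x == 1)
          return 10;
        if (x == 2)
          return 20;
      }
    else if (x == 8)
      return 80;
    return 0;
  }
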
There's some small fallout that I would like to discuss:

1) vrp105.c - we no longer catch the threading opportunity in between the
default cases;
adding Patrick, who can probably help explain why the opportunity is skipped with the
expanded tree.

2) uninit-18.c where we currently report:

/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/uninit-18.c:13:12: warning: 
‘tmp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 tmp[5] = 7;/* { dg-bogus "may be used uninitialized" } */

Am I right that the pass uses VRP?

The last question is whether the pass is properly placed in the optimization pipeline?
The patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

2017-09-14  Martin Liska  

* passes.def: Add pass_lower_switch and pass_lower_switch_O0.
* tree-pass.h (make_pass_lower_switch_O0): New function.
* tree-switch-conversion.c (node_has_low_bound): Remove.
(node_has_high_bound): Likewise.
(node_is_bounded): Likewise.
(class pass_lower_switch): Make it a template type and create
two instances.
(pass_lower_switch::execute): Add template argument.
(make_pass_lower_switch): New function.
(make_pass_lower_switch_O0): New function.
(do_jump_if_equal): Remove.
(emit_case_nodes): Simplify to just handle all 3 cases and leave
all the hard work to tree optimization passes.

gcc/testsuite/ChangeLog:

2017-09-14  Martin Liska  

* gcc.dg/tree-ssa/vrp104.c: Adjust dump file that is scanned.
* gcc.dg/tree-prof/update-loopch.c: Likewise.
---
 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-prof/update-loopch.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |   2 +-
 gcc/tree-pass.h|   1 +
 gcc/tree-switch-conversion.c   | 604 +++--
 5 files changed, 72 insertions(+), 541 deletions(-)


diff --git a/gcc/passes.def b/gcc/passes.def
index 00e75d2b55a..bb371d9bde5 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -314,6 +314,7 @@ along with GCC; see the file COPYING3.  If not see
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_simduid_cleanup);
   NEXT_PASS (pass_lower_vector_ssa);
+  NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
   NEXT_PASS (pass_sprintf_length, true);
   NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
@@ -358,6 +359,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Lower remaining pieces of GIMPLE.  */
   NEXT_PASS (pass_lower_complex);
   NEXT_PASS (pass_lower_vector_ssa);
+  NEXT_PASS (pass_lower_switch);
   /* Perform simple scalar cleanup which is constant/copy propagation.  */
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_post_ipa_warn);
@@ -393,7 +395,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vaarg);
   NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_lower_complex_O0);
-  NEXT_PASS (pass_lower_switch);
+  NEXT_PASS (pass_lower_switch_O0);
   NEXT_PASS (pass_sancov_O0);
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
diff --git a/gcc/testsuite/gcc.dg/tree-prof/update-loopch.c b/gcc/testsuite/gcc.dg/tree-prof/update-loopch.c
index 73efc878ec0..15baada1081 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/update-loopch.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/update-loopch.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-ipa-profile-blocks-details -fdump-tree-switchlower-blocks-details" } */
+/* { dg-options "-O2 -fdump-ipa-profile-blocks-details -fdump-tree-switchlower1-blocks-details" } */
 int max = 3;
 int a[8];
 int
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
index 0a952267b29..71fa3bfa2ca 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O2 -fdump-tree-switchlower" }  */
 /* We scan for 2 switches as the dump file reports a transformation,
IL really contains just a single.  */
-/* { dg-final { scan-tree-dump-times "switch" 2 "switchlower" } }  */
+/* { dg-final { scan-tree-dump-times "switch" 2 "switchlower1" } }  */
 
 void foo (void);
 void bar (void);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9f76d822abc..6ae65765431 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -410,6 +410,7 @@ extern gimple_opt_pass *make_pass_strip_predict_hints (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_complex_O0 (gcc::context *ctxt);
 

[PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 configuration

2017-09-14 Thread Shalnov, Sergey
Hi,
GCC has the option "mprefer-avx128" to use 128-bit AVX registers instead of 
256-bit AVX registers in the auto-vectorizer.
This patch enables the command line option "mprefer-avx256", which reduces
512-bit register usage in "march=skylake-avx512" mode.
This is the initial implementation of the option. Currently, 512-bit registers
might still appear in some cases. I plan to continue fixing the cases where
512-bit registers appear.
Sergey

2017-09-14  Sergey Shalnov  sergey.shal...@intel.com
* config/i386/i386.opt (mprefer-avx256): New flag. 
* config/i386/i386.c (ix86_preferred_simd_mode): Prefer 256-bit AVX modes when 
the flag -mprefer-avx256 is on.
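As a purely illustrative example (the kernel and the exact code generation are
my own assumption, not part of the patch), a loop such as the following would
be auto-vectorized with 256-bit ymm registers rather than 512-bit zmm
registers when compiled with -O3 -march=skylake-avx512 -mprefer-avx256:

  /* Hypothetical example: with -mprefer-avx256 the auto-vectorizer
     should pick 32-byte vectors for this loop.  */
  void
  saxpy (float *restrict y, const float *restrict x, float a, int n)
  {
    for (int i = 0; i < n; i++)
      y[i] += a * x[i];
  }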



0001-Option-mprefer-avx256-added-for-Intel-AVX512-configu.patch
Description: 0001-Option-mprefer-avx256-added-for-Intel-AVX512-configu.patch


Re: [PATCH] Enhance PHI processing in VN

2017-09-14 Thread Richard Biener
On Thu, 7 Sep 2017, Richard Biener wrote:

> On Thu, 7 Sep 2017, Richard Biener wrote:
> 
> > 
> > This enhances VN to do the same PHI handling as CCP, meeting
> > undefined and constant to constant.  I've gone a little bit
> > further (and maybe will revisit this again) in also meeting
> > all-undefined to undefined taking one of the undefined args
> > as the value number.  I feel like this might run into
> > the equation issues I mentioned in the other mail so it
> > would be cleaner to invent a "new" undefined value number
> > here -- but I have to make sure to not create too many
> > default-defs or break iteration convergence (default defs are also
> > expensive given they require a decl - sth I want to change as well).
> > 
> > So for now I guess I'll stick with the slightly bogus(?) way also
> > hoping for a testcase that shows the issue this uncovers.
> > 
> > Note it's required to handle
> > 
> > _3 = PHI <_1(D), _2(D)>
> > ..
> > 
> > _4 = PHI <_3, 1>
> > 
> > consistently with
> > 
> > _4 = PHI <_1(D), _2(D), 1>
> > 
> > aka with/without extra forwarders.
> 
> That said, "fallout" is we simplify
> 
> int foo (int b)
> { 
>   int i, j, k;
>   if (b)
> k = i;
>   else
> k = j;
>   if (k == i)
> return 1;
>   else if (k == j)
> return 2;
>   return 0;
> }
> 
> to
> 
>   if (j == i)
> return 1;
>   else
> return 2;
> 
> or even just
> 
>   return 2;
> 
> dependent on the PHI argument order of k = PHI <i, j>.
> 
> Likewise we'd say that either k - i or k - j is zero.
> 
> The complication with PHIs is that they do not always only appear
> in places where uses of the args dominate it but the other way
> around so we can't really invoke the undefined behavior rule
> on a PHI node with undefined args itself.  The question is whether
> we may for PHIs with just undefined args ... but my guess is no
> so I do have to fix the above.
> 
> Anybody can produce a testcase that we'd consider wrong-code?
> (the above examples clearly are not)

After some pondering I decided for PHIs with all undefs there's
really no way things can go wrong.

Thus the following is what I installed.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2017-09-14  Richard Biener  

* tree-ssa-sccvn.c (visit_phi): Merge undefined values similar
to VN_TOP.

* gcc.dg/tree-ssa/ssa-fre-59.c: New testcase.
* gcc.dg/uninit-suppress_2.c: Adjust.
* gcc.dg/tree-ssa/ssa-sccvn-2.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 252062)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -3860,11 +3860,11 @@ visit_reference_op_store (tree lhs, tree
 static bool
 visit_phi (gimple *phi)
 {
-  bool changed = false;
-  tree result;
-  tree sameval = VN_TOP;
-  bool allsame = true;
+  tree result, sameval = VN_TOP, seen_undef = NULL_TREE;
   unsigned n_executable = 0;
+  bool allsame = true;
+  edge_iterator ei;
+  edge e;
 
   /* TODO: We could check for this in init_sccvn, and replace this
  with a gcc_assert.  */
@@ -3873,8 +3873,6 @@ visit_phi (gimple *phi)
 
   /* See if all non-TOP arguments have the same value.  TOP is
  equivalent to everything, so we can ignore it.  */
-  edge_iterator ei;
-  edge e;
   FOR_EACH_EDGE (e, ei, gimple_bb (phi)->preds)
 if (e->flags & EDGE_EXECUTABLE)
   {
@@ -3884,8 +3882,12 @@ visit_phi (gimple *phi)
if (TREE_CODE (def) == SSA_NAME)
  def = SSA_VAL (def);
if (def == VN_TOP)
- continue;
-   if (sameval == VN_TOP)
+ ;
+   /* Ignore undefined defs for sameval but record one.  */
+   else if (TREE_CODE (def) == SSA_NAME
+&& ssa_undefined_value_p (def, false))
+ seen_undef = def;
+   else if (sameval == VN_TOP)
  sameval = def;
else if (!expressions_equal_p (def, sameval))
  {
@@ -3893,30 +3895,39 @@ visit_phi (gimple *phi)
break;
  }
   }
-  
-  /* If none of the edges was executable or all incoming values are
- undefined keep the value-number at VN_TOP.  If only a single edge
- is exectuable use its value.  */
-  if (sameval == VN_TOP
-  || n_executable == 1)
-return set_ssa_val_to (PHI_RESULT (phi), sameval);
 
+
+  /* If none of the edges was executable keep the value-number at VN_TOP,
+ if only a single edge is exectuable use its value.  */
+  if (n_executable <= 1)
+result = seen_undef ? seen_undef : sameval;
+  /* If we saw only undefined values create a new undef SSA name to
+ avoid false equivalences.  */
+  else if (sameval == VN_TOP)
+{
+  gcc_assert (seen_undef);
+  result = seen_undef;
+}
   /* First see if it is equivalent to a phi node in this block.  We prefer
  this as it allows IV elimination - see PRs 66502 and 67167.  */
-  result = vn_phi_lookup (phi);
-  if (result)
-changed = set_ssa_val_to (PHI_RESULT 

[PATCH][x86] Knights Mill -march/-mtune options

2017-09-14 Thread Peryt, Sebastian
Hi,

This patch adds  options -march=/-mtune=knm for Knights Mill.

2017-09-14  Sebastian Peryt  
gcc/

* config.gcc: Support "knm".
* config/i386/driver-i386.c (host_detect_local_cpu): Detect "knm".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_KNM.
* config/i386/i386.c (m_KNM): Define.
(processor_target_table): Add "knm".
(PTA_KNM): Define.
(ix86_option_override_internal): Add "knm".
(ix86_issue_rate): Add PROCESSOR_KNM.
(ix86_adjust_cost): Ditto.
(ia32_multipass_dfa_lookahead): Ditto.
(get_builtin_code_for_version): Handle PROCESSOR_KNM.
(fold_builtin_cpu): Define M_INTEL_KNM.
* config/i386/i386.h (TARGET_KNM): Define.
(processor_type): Add PROCESSOR_KNM.
* config/i386/x86-tune.def: Add m_KNM.
* doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.


gcc/testsuite/

* gcc.target/i386/funcspec-5.c: Test knm.
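For illustration only (not part of the patch, and the helper names below are
made up): once "knm" is known to the CPU detection machinery, run-time
dispatch should work the usual way, e.g.

  /* Hypothetical usage; do_knm_path/do_generic_path are placeholders.  */
  if (__builtin_cpu_is ("knm"))
    do_knm_path ();
  else
    do_generic_path ();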

Is it ok for trunk?

Thanks,
Sebastian




KNM_enabling.patch
Description: KNM_enabling.patch


Make more use of gimple-fold.h in tree-vect-loop.c

2017-09-14 Thread Richard Sandiford
This patch makes the vectoriser use the gimple-fold.h routines
in more cases, instead of vect_init_vector.  Later patches want
to use the same interface to handle variable-length vectors.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-loop.c (vectorizable_induction): Use gimple_build instead
of vect_init_vector.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 11:26:37.599804415 +0100
+++ gcc/tree-vect-loop.c2017-09-14 11:27:16.962234838 +0100
@@ -6839,18 +6839,21 @@ vectorizable_induction (gimple *phi,
 {
   /* iv_loop is the loop to be vectorized. Generate:
  vec_step = [VF*S, VF*S, VF*S, VF*S]  */
+  gimple_seq seq = NULL;
   if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
{
  expr = build_int_cst (integer_type_node, vf);
- expr = fold_convert (TREE_TYPE (step_expr), expr);
+ expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
}
   else
expr = build_int_cst (TREE_TYPE (step_expr), vf);
-  new_name = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
- expr, step_expr);
-  if (TREE_CODE (step_expr) == SSA_NAME)
-   new_name = vect_init_vector (phi, new_name,
-TREE_TYPE (step_expr), NULL);
+  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
+  expr, step_expr);
+  if (seq)
+   {
+ new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+ gcc_assert (!new_bb);
+   }
 }
 
   t = unshare_expr (new_name);
@@ -6899,6 +6902,7 @@ vectorizable_induction (gimple *phi,
 
   if (ncopies > 1)
 {
+  gimple_seq seq = NULL;
   stmt_vec_info prev_stmt_vinfo;
   /* FORNOW. This restriction should be relaxed.  */
   gcc_assert (!nested_in_vect_loop);
@@ -6907,15 +6911,18 @@ vectorizable_induction (gimple *phi,
   if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
{
  expr = build_int_cst (integer_type_node, nunits);
- expr = fold_convert (TREE_TYPE (step_expr), expr);
+ expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
}
   else
expr = build_int_cst (TREE_TYPE (step_expr), nunits);
-  new_name = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
- expr, step_expr);
-  if (TREE_CODE (step_expr) == SSA_NAME)
-   new_name = vect_init_vector (phi, new_name,
-TREE_TYPE (step_expr), NULL);
+  new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
+  expr, step_expr);
+  if (seq)
+   {
+ new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+ gcc_assert (!new_bb);
+   }
+
   t = unshare_expr (new_name);
   gcc_assert (CONSTANT_CLASS_P (new_name)
  || TREE_CODE (new_name) == SSA_NAME);


Add LOOP_VINFO_MAX_VECT_FACTOR

2017-09-14 Thread Richard Sandiford
Epilogue vectorisation uses the vectorisation factor of the main loop
as the maximum vectorisation factor allowed for correctness.  That makes
sense as a conservatively correct value, since the chosen vectorisation
factor will be strictly less than that anyway.  However, once the VF
itself becomes variable, it's easier to carry across the original
maximum VF instead.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (_loop_vec_info): Add max_vectorization_factor.
(LOOP_VINFO_MAX_VECT_FACTOR): New macro.
(LOOP_VINFO_ORIG_VECT_FACTOR): Replace with...
(LOOP_VINFO_ORIG_MAX_VECT_FACTOR): ...this new macro.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Update
accordingly.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
max_vectorization_factor.
(vect_analyze_loop_2): Set LOOP_VINFO_MAX_VECT_FACTOR.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-09-14 11:28:27.080519923 +0100
+++ gcc/tree-vectorizer.h   2017-09-14 11:30:06.064254417 +0100
@@ -241,6 +241,10 @@ typedef struct _loop_vec_info : public v
   /* Unrolling factor  */
   int vectorization_factor;
 
+  /* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
+ if there is no particular limit.  */
+  unsigned HOST_WIDE_INT max_vectorization_factor;
+
   /* Unknown DRs according to which loop was peeled.  */
   struct data_reference *unaligned_dr;
 
@@ -355,6 +359,7 @@ #define LOOP_VINFO_NITERS_ASSUMPTIONS(L)
 #define LOOP_VINFO_COST_MODEL_THRESHOLD(L) (L)->th
 #define LOOP_VINFO_VECTORIZABLE_P(L)   (L)->vectorizable
 #define LOOP_VINFO_VECT_FACTOR(L)  (L)->vectorization_factor
+#define LOOP_VINFO_MAX_VECT_FACTOR(L)  (L)->max_vectorization_factor
 #define LOOP_VINFO_PTR_MASK(L) (L)->ptr_mask
 #define LOOP_VINFO_LOOP_NEST(L)(L)->loop_nest
 #define LOOP_VINFO_DATAREFS(L) (L)->datarefs
@@ -400,8 +405,8 @@ #define LOOP_VINFO_NITERS_KNOWN_P(L)
 #define LOOP_VINFO_EPILOGUE_P(L) \
   (LOOP_VINFO_ORIG_LOOP_INFO (L) != NULL)
 
-#define LOOP_VINFO_ORIG_VECT_FACTOR(L) \
-  (LOOP_VINFO_VECT_FACTOR (LOOP_VINFO_ORIG_LOOP_INFO (L)))
+#define LOOP_VINFO_ORIG_MAX_VECT_FACTOR(L) \
+  (LOOP_VINFO_MAX_VECT_FACTOR (LOOP_VINFO_ORIG_LOOP_INFO (L)))
 
 static inline loop_vec_info
 loop_vec_info_for_loop (struct loop *loop)
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-14 11:29:19.649870912 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-14 11:30:06.063347272 +0100
@@ -509,7 +509,7 @@ vect_analyze_data_ref_dependences (loop_
  was applied to original loop.  Therefore we may just get max_vf
  using VF of original loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
-*max_vf = LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo);
+*max_vf = LOOP_VINFO_ORIG_MAX_VECT_FACTOR (loop_vinfo);
   else
 FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr)
   if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf))
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 11:28:27.079519923 +0100
+++ gcc/tree-vect-loop.c2017-09-14 11:30:06.064254417 +0100
@@ -,6 +,7 @@ _loop_vec_info::_loop_vec_info (struct l
 num_iters_assumptions (NULL_TREE),
 th (0),
 vectorization_factor (0),
+max_vectorization_factor (0),
 unaligned_dr (NULL),
 peeling_for_alignment (0),
 ptr_mask (0),
@@ -1920,6 +1921,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
 "bad data dependence.\n");
   return false;
 }
+  LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf;
 
   ok = vect_determine_vectorization_factor (loop_vinfo);
   if (!ok)


Add a vect_get_dr_size helper function

2017-09-14 Thread Richard Sandiford
This patch adds a helper function for getting the number of
bytes accessed by a scalar data reference, which helps when general
modes have a variable size.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vect-data-refs.c (vect_get_dr_size): New function.
(vect_update_misalignment_for_peel): Use it.
(vect_enhance_data_refs_alignment): Likewise.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-14 11:27:50.350257085 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-14 11:29:19.649870912 +0100
@@ -950,6 +950,13 @@ vect_compute_data_ref_alignment (struct
   return true;
 }
 
+/* Return the size of the value accessed by DR, which is always constant.  */
+
+static unsigned int
+vect_get_dr_size (struct data_reference *dr)
+{
+  return GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
+}
 
 /* Function vect_update_misalignment_for_peel.
Sets DR's misalignment
@@ -970,8 +977,8 @@ vect_update_misalignment_for_peel (struc
   unsigned int i;
   vec<dr_p> same_aligned_drs;
   struct data_reference *current_dr;
-  int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
-  int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
+  int dr_size = vect_get_dr_size (dr);
+  int dr_peel_size = vect_get_dr_size (dr_peel);
   stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
   stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel));
 
@@ -1659,8 +1666,7 @@ vect_enhance_data_refs_alignment (loop_v
 
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   nelements = TYPE_VECTOR_SUBPARTS (vectype);
-  mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
-TREE_TYPE (DR_REF (dr))));
+ mis = DR_MISALIGNMENT (dr) / vect_get_dr_size (dr);
  if (DR_MISALIGNMENT (dr) != 0)
npeel_tmp = (negative ? (mis - nelements)
 : (nelements - mis)) & (nelements - 1);
@@ -1932,8 +1938,7 @@ vect_enhance_data_refs_alignment (loop_v
  updating DR_MISALIGNMENT values.  The peeling factor is the
  vectorization factor minus the misalignment as an element
  count.  */
-  mis = DR_MISALIGNMENT (dr0);
-  mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0))));
+ mis = DR_MISALIGNMENT (dr0) / vect_get_dr_size (dr0);
   npeel = ((negative ? mis - nelements : nelements - mis)
   & (nelements - 1));
 }


Add a vect_worthwhile_without_simd_p helper routine

2017-09-14 Thread Richard Sandiford
The vectoriser sometimes considers lowering "vector" operations into N
scalar word operations.  This N needs to be fixed at compile time, so
the condition guarding it needs to change when variable-lengh vectors
are added.  This patch puts the condition into a helper routine so that
there's only one place to update.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (vect_min_worthwhile_factor): Delete.
(vect_worthwhile_without_simd_p): Declare.
* tree-vect-loop.c (vect_worthwhile_without_simd_p): New function.
(vectorizable_reduction): Use it.
* tree-vect-stmts.c (vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-09-14 11:27:50.352072753 +0100
+++ gcc/tree-vectorizer.h   2017-09-14 11:28:27.080519923 +0100
@@ -1230,7 +1230,7 @@ extern bool vectorizable_reduction (gimp
 extern bool vectorizable_induction (gimple *, gimple_stmt_iterator *,
gimple **, slp_tree);
 extern tree get_initial_def_for_reduction (gimple *, tree, tree *);
-extern int vect_min_worthwhile_factor (enum tree_code);
+extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
 extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
stmt_vector_for_cost *,
stmt_vector_for_cost *,
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 11:27:50.351164919 +0100
+++ gcc/tree-vect-loop.c2017-09-14 11:28:27.079519923 +0100
@@ -6030,8 +6030,7 @@ vectorizable_reduction (gimple *stmt, gi
 dump_printf (MSG_NOTE, "op not supported by target.\n");
 
   if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
-  || LOOP_VINFO_VECT_FACTOR (loop_vinfo)
- < vect_min_worthwhile_factor (code))
+ || !vect_worthwhile_without_simd_p (loop_vinfo, code))
 return false;
 
   if (dump_enabled_p ())
@@ -6040,8 +6039,7 @@ vectorizable_reduction (gimple *stmt, gi
 
   /* Worthwhile without SIMD support?  */
   if (!VECTOR_MODE_P (TYPE_MODE (vectype_in))
-  && LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-< vect_min_worthwhile_factor (code))
+ && !vect_worthwhile_without_simd_p (loop_vinfo, code))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6492,6 +6490,18 @@ vect_min_worthwhile_factor (enum tree_co
 }
 }
 
+/* Return true if VINFO indicates we are doing loop vectorization and if
+   it is worth decomposing CODE operations into scalar operations for
+   that loop's vectorization factor.  */
+
+bool
+vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  return (loop_vinfo
+ && (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ >= vect_min_worthwhile_factor (code)));
+}
 
 /* Function vectorizable_induction
 
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-09-14 11:27:50.352072753 +0100
+++ gcc/tree-vect-stmts.c   2017-09-14 11:28:27.080519923 +0100
@@ -4869,7 +4869,6 @@ vectorizable_shift (gimple *stmt, gimple
   bool scalar_shift_arg = true;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec_info *vinfo = stmt_info->vinfo;
-  int vf;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
 return false;
@@ -4937,11 +4936,6 @@ vectorizable_shift (gimple *stmt, gimple
   return false;
 }
 
-  if (loop_vinfo)
-vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  else
-vf = 1;
-
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
@@ -5086,8 +5080,8 @@ vectorizable_shift (gimple *stmt, gimple
  "op not supported by target.\n");
   /* Check only during analysis.  */
   if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
-  || (vf < vect_min_worthwhile_factor (code)
-  && !vec_stmt))
+ || (!vec_stmt
+ && !vect_worthwhile_without_simd_p (vinfo, code)))
 return false;
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -5095,9 +5089,9 @@ vectorizable_shift (gimple *stmt, gimple
 }
 
   /* Worthwhile without SIMD support?  Check only during analysis.  */
-  if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-  && vf < 

Add a vect_get_num_copies helper routine

2017-09-14 Thread Richard Sandiford
This patch adds a vectoriser helper routine to calculate how
many copies of a vector statement we need.  At present this
is always:

  LOOP_VINFO_VECT_FACTOR (loop_vinfo) / TYPE_VECTOR_SUBPARTS (vectype)

but later patches add other cases.  Another benefit of using
a helper routine is that it can assert that the division is
exact (which it must be).
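
For example (numbers mine, purely illustrative):

  /* With a vectorization factor of 8 and a 4-element vector type,
     each scalar statement needs 8 / 4 = 2 vector copies:
       vect_get_num_copies (loop_vinfo, v4si_type) == 2
     and a non-multiple combination would trip the new assert.  */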

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (vect_get_num_copies): New function.
* tree-vect-data-refs.c (vect_get_data_access_cost): Use it.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_analyze_stmt): Pass the slp node to vectorizable_live_operation.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-09-14 11:25:32.166167193 +0100
+++ gcc/tree-vectorizer.h   2017-09-14 11:27:50.352072753 +0100
@@ -1076,6 +1076,20 @@ unlimited_cost_model (loop_p loop)
   return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
 }
 
+/* Return the number of copies needed for loop vectorization when
+   a statement operates on vectors of type VECTYPE.  This is the
+   vectorization factor divided by the number of elements in
+   VECTYPE and is always known at compile time.  */
+
+static inline unsigned int
+vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
+{
+  gcc_checking_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+  % TYPE_VECTOR_SUBPARTS (vectype) == 0);
+  return (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ / TYPE_VECTOR_SUBPARTS (vectype));
+}
+
 /* Source location */
 extern source_location vect_location;
 
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-14 11:25:32.163167193 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-14 11:27:50.350257085 +0100
@@ -1181,10 +1181,13 @@ vect_get_data_access_cost (struct data_r
 {
   gimple *stmt = DR_STMT (dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */
+  int ncopies;
+
+  if (PURE_SLP_STMT (stmt_info))
+ncopies = 1;
+  else
+ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
 
   if (DR_IS_READ (dr))
 vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 11:27:16.962234838 +0100
+++ gcc/tree-vect-loop.c2017-09-14 11:27:50.351164919 +0100
@@ -5683,8 +5683,7 @@ vectorizable_reduction (gimple *stmt, gi
   if (slp_node)
ncopies = 1;
   else
-   ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-  / TYPE_VECTOR_SUBPARTS (vectype_in));
+   ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
 
   use_operand_p use_p;
   gimple *use_stmt;
@@ -5980,8 +5979,7 @@ vectorizable_reduction (gimple *stmt, gi
   if (slp_node)
 ncopies = 1;
   else
-ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-   / TYPE_VECTOR_SUBPARTS (vectype_in));
+ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
 
   gcc_assert (ncopies >= 1);
 
@@ -6550,7 +6548,7 @@ vectorizable_induction (gimple *phi,
   if (slp_node)
 ncopies = 1;
   else
-ncopies = vf / nunits;
+ncopies = vect_get_num_copies (loop_vinfo, vectype);
   gcc_assert (ncopies >= 1);
 
   /* FORNOW. These restrictions should be relaxed.  */
@@ -7013,12 +7011,17 @@ vectorizable_live_operation (gimple *stm
   tree lhs, lhs_type, bitsize, vec_bitsize;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  int ncopies;
   gimple *use_stmt;
   auto_vec<tree> vec_oprnds;
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
+  if (slp_node)
+ncopies = 1;
+  else
+ncopies = vect_get_num_copies 

Add gimple_build_vector* helpers

2017-09-14 Thread Richard Sandiford
This patch adds gimple-fold.h equivalents of build_vector and
build_vector_from_val.  Like the other gimple-fold.h routines
they always return a valid gimple value and add any new
statements to a given gimple_seq.  In combination with later
patches this reduces the number of force_gimple_operands.
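
A minimal usage sketch (mine, not from the patch; vectype, init_val and the
preheader edge pe are assumed to exist in the caller):

  gimple_seq seq = NULL;
  tree vec_init = gimple_build_vector_from_val (&seq, vectype, init_val);
  if (seq)
    {
      basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
      gcc_assert (!new_bb);
    }
  /* vec_init is now a valid gimple value: an SSA name or a constant.  */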

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* gimple-fold.h (gimple_build_vector_from_val): Declare, and provide
an inline wrapper that provides a location.
(gimple_build_vector): Likewise.
* gimple-fold.c (gimple_build_vector_from_val): New function.
(gimple_build_vector): Likewise.
* tree-vect-loop.c (get_initial_def_for_reduction): Use the new
functions to build the initial value.  Always return a gimple value.
(get_initial_defs_for_reduction): Likewise.  Only compute
neutral_vec once.
(vect_create_epilog_for_reduction): Don't call force_gimple_operand or
vect_init_vector on the results from get_initial_def(s)_for_reduction.
(vectorizable_induction): Use gimple_build_vector rather than
vect_init_vector.

Index: gcc/gimple-fold.h
===
--- gcc/gimple-fold.h   2017-07-08 11:37:46.573465901 +0100
+++ gcc/gimple-fold.h   2017-09-14 11:26:37.598804415 +0100
@@ -127,6 +127,21 @@ gimple_convert_to_ptrofftype (gimple_seq
   return gimple_convert_to_ptrofftype (seq, UNKNOWN_LOCATION, op);
 }
 
+extern tree gimple_build_vector_from_val (gimple_seq *, location_t, tree,
+ tree);
+inline tree
+gimple_build_vector_from_val (gimple_seq *seq, tree type, tree op)
+{
+  return gimple_build_vector_from_val (seq, UNKNOWN_LOCATION, type, op);
+}
+
+extern tree gimple_build_vector (gimple_seq *, location_t, tree, vec<tree>);
+inline tree
gimple_build_vector (gimple_seq *seq, tree type, vec<tree> elts)
+{
+  return gimple_build_vector (seq, UNKNOWN_LOCATION, type, elts);
+}
+
 extern bool gimple_stmt_nonnegative_warnv_p (gimple *, bool *, int = 0);
 extern bool gimple_stmt_integer_valued_real_p (gimple *, int = 0);
 
Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   2017-09-14 11:24:42.666088258 +0100
+++ gcc/gimple-fold.c   2017-09-14 11:26:37.598804415 +0100
@@ -7058,6 +7058,58 @@ gimple_convert_to_ptrofftype (gimple_seq
   return gimple_convert (seq, loc, sizetype, op);
 }
 
+/* Build a vector of type TYPE in which each element has the value OP.
+   Return a gimple value for the result, appending any new statements
+   to SEQ.  */
+
+tree
+gimple_build_vector_from_val (gimple_seq *seq, location_t loc, tree type,
+ tree op)
+{
+  tree res, vec = build_vector_from_val (type, op);
+  if (is_gimple_val (vec))
+return vec;
+  if (gimple_in_ssa_p (cfun))
+res = make_ssa_name (type);
+  else
+res = create_tmp_reg (type);
+  gimple *stmt = gimple_build_assign (res, vec);
+  gimple_set_location (stmt, loc);
+  gimple_seq_add_stmt_without_update (seq, stmt);
+  return res;
+}
+
+/* Build a vector of type TYPE in which the elements have the values
+   given by ELTS.  Return a gimple value for the result, appending any
+   new instructions to SEQ.  */
+
+tree
+gimple_build_vector (gimple_seq *seq, location_t loc, tree type,
+vec<tree> elts)
+{
+  unsigned int nelts = elts.length ();
+  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
+  for (unsigned int i = 0; i < nelts; ++i)
+if (!TREE_CONSTANT (elts[i]))
+  {
+   vec<constructor_elt, va_gc> *v;
+   vec_alloc (v, nelts);
+   for (i = 0; i < nelts; ++i)
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, elts[i]);
+
+   tree res;
+   if (gimple_in_ssa_p (cfun))
+ res = make_ssa_name (type);
+   else
+ res = create_tmp_reg (type);
+   gimple *stmt = gimple_build_assign (res, build_constructor (type, v));
+   gimple_set_location (stmt, loc);
+   gimple_seq_add_stmt_without_update (seq, stmt);
+   return res;
+  }
+  return build_vector (type, elts);
+}
+
 /* Return true if the result of assignment STMT is known to be non-negative.
If the return value is based on the assumption that signed overflow is
undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-09-14 11:25:32.164167193 +0100
+++ gcc/tree-vect-loop.c2017-09-14 11:26:37.599804415 +0100
@@ -4044,33 +4044,18 @@ get_initial_def_for_reduction (gimple *s
 else
   def_for_init = build_int_cst (scalar_type, int_init_val);
 
-/* Create a vector of '0' or '1' 

Use vec<> for constant permute masks

2017-09-14 Thread Richard Sandiford
This patch makes can_vec_perm_p & co. take a vec<>, wrapped in new
typedefs vec_perm_indices and auto_vec_perm_indices.  There are two
reasons for doing this for SVE:

(1) it means that the number of elements is bundled with the elements
themselves, and is obviously constant.

(2) it makes it easier to change the "unsigned char" element type to
something wider.

I'm happy to change the target hooks as a follow-on patch, if this is OK.
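
A usage sketch under the new interface (the reversal selector and the names
nelt and mode are my own illustration):

  auto_vec_perm_indices sel (nelt);
  for (unsigned int i = 0; i < nelt; ++i)
    sel.quick_push (nelt - i - 1);  /* e.g. reverse the elements  */
  if (!can_vec_perm_p (mode, false, &sel))
    return false;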

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.h (vec_perm_indices): New typedef.
(auto_vec_perm_indices): Likewise.
* optabs-query.h: Include target.h
(can_vec_perm_p): Take a vec_perm_indices *.
* optabs-query.c (can_vec_perm_p): Likewise.
(can_mult_highpart_p): Update accordingly.  Use auto_vec_perm_indices.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-generic.c (lower_vec_perm): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_shift_permute_load_chain): Likewise.
(vect_permute_store_chain): Use auto_vec_perm_indices.
(vect_permute_load_chain): Likewise.
* fold-const.c (fold_vec_perm): Take vec_perm_indices.
(fold_ternary_loc): Update accordingly.  Use auto_vec_perm_indices.
Update uses of can_vec_perm_p.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Replace the
mode with a number of elements.  Take a vec_perm_indices *.
(vect_create_epilog_for_reduction): Update accordingly.
Use auto_vec_perm_indices.
(have_whole_vector_shift): Likewise.  Update call to can_vec_perm_p.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Use auto_vec_perm_indices.
* tree-vectorizer.h (vect_gen_perm_mask_any): Take a vec_perm_indices.
(vect_gen_perm_mask_checked): Likewise.
* tree-vect-stmts.c (vect_gen_perm_mask_any): Take a vec_perm_indices.
(vect_gen_perm_mask_checked): Likewise.
(vectorizable_mask_load_store): Use auto_vec_perm_indices.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(perm_mask_for_reverse): Likewise.  Update call to can_vec_perm_p.
(vectorizable_bswap): Likewise.

Index: gcc/target.h
===
--- gcc/target.h2017-09-11 17:10:58.656085547 +0100
+++ gcc/target.h2017-09-14 11:25:32.162167193 +0100
@@ -191,6 +191,14 @@ enum vect_cost_model_location {
   vect_epilogue = 2
 };
 
+/* The type to use for vector permutes with a constant permute vector.
+   Each entry is an index into the concatenated input vectors.  */
+typedef vec<unsigned char> vec_perm_indices;
+
+/* Same, but can be used to construct local permute vectors that are
+   automatically freed.  */
+typedef auto_vec<unsigned char, 32> auto_vec_perm_indices;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/optabs-query.h
===
--- gcc/optabs-query.h  2017-08-30 12:14:51.272396735 +0100
+++ gcc/optabs-query.h  2017-09-14 11:25:32.162167193 +0100
@@ -21,6 +21,7 @@ the Free Software Foundation; either ver
 #define GCC_OPTABS_QUERY_H
 
 #include "insn-opinit.h"
+#include "target.h"
 
 /* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing
if the target does not have such an insn.  */
@@ -165,7 +166,7 @@ enum insn_code can_extend_p (machine_mod
 enum insn_code can_float_p (machine_mode, machine_mode, int);
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
-bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
+bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
 enum insn_code widening_optab_handler (optab, machine_mode, machine_mode);
 /* Find a widening optab even if it doesn't widen as much as we want.  */
 #define find_widening_optab_handler(A,B,C,D) \
Index: gcc/optabs-query.c
===
--- gcc/optabs-query.c  2017-09-05 20:57:40.745898121 +0100
+++ gcc/optabs-query.c  2017-09-14 11:25:32.162167193 +0100
@@ -353,8 +353,7 @@ can_conditionally_move_p (machine_mode m
zeroes; this case is not dealt with here.  */
 
 bool
-can_vec_perm_p (machine_mode mode, bool variable,
-   const unsigned char *sel)
+can_vec_perm_p (machine_mode mode, bool variable, vec_perm_indices *sel)
 {
   machine_mode 

Use vec<> in build_vector

2017-09-14 Thread Richard Sandiford
This patch makes build_vector take the elements as a vec<> rather
than a tree *.  This is useful for SVE because it bundles the number
of elements with the elements themselves, and enforces the fact that
the number is constant.  Also, I think things like the folds can be used
with any generic GNU vector, not just those that match machine vectors,
so the arguments to XALLOCAVEC had no clear limit.
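
A usage sketch with the new signature (the identifiers vectype and nunits are
my own, purely for illustration):

  auto_vec<tree> elts (nunits);
  for (unsigned int i = 0; i < nunits; ++i)
    elts.quick_push (build_int_cst (TREE_TYPE (vectype), i));
  tree vec_cst = build_vector (vectype, elts);  /* length checked vs. type  */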

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (build_vector): Take a vec instead of a tree *.
* tree.c (build_vector): Likewise.
(build_vector_from_ctor): Update accordingly.
(build_vector_from_val): Likewise.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-generic.c (add_rshift): Likewise.
(expand_vector_divmod): Likewise.
(optimize_vector_constructor): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vect_gen_perm_mask_any): Likewise.  Add elements in order.
* expmed.c (make_tree): Likewise.
* fold-const.c (fold_negate_expr_1): Use auto_vec when building
a vector passed to build_vector.
(fold_convert_const): Likewise.
(exact_inverse): Likewise.
(fold_ternary_loc): Likewise.
(fold_relational_const): Likewise.
(const_binop): Likewise.  Use VECTOR_CST_ELT directly when operating
on VECTOR_CSTs, rather than going through vec_cst_ctor_to_array.
(const_unop): Likewise.  Store the reduction accumulator in a
variable rather than an array.
(vec_cst_ctor_to_array): Take the number of elements as a parameter.
(fold_vec_perm): Update calls accordingly.  Use auto_vec for
the new vector, rather than constructing it after the input arrays.
(native_interpret_vector): Use auto_vec when building
a vector passed to build_vector.  Add elements in order.
* tree-vect-loop.c (get_initial_defs_for_reduction): Use
auto_vec when building a vector passed to build_vector.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_induction): Likewise.
(get_initial_def_for_reduction): Likewise.  Fix indentation of
case statements.
* config/sparc/sparc.c (sparc_handle_vis_mul8x16): Change n_elts
to a vec *.
(sparc_fold_builtin): Use auto_vec when building a vector
passed to build_vector.

Index: gcc/tree.h
===
--- gcc/tree.h  2017-09-14 11:23:57.004947653 +0100
+++ gcc/tree.h  2017-09-14 11:24:42.669777533 +0100
@@ -4026,7 +4026,7 @@ extern tree build_int_cst (tree, HOST_WI
 extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
 extern tree build_int_cst_type (tree, HOST_WIDE_INT);
 extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
-extern tree build_vector (tree, tree * CXX_MEM_STAT_INFO);
+extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
 extern void recompute_constructor_flags (tree);
Index: gcc/tree.c
===
--- gcc/tree.c  2017-09-14 11:23:57.004947653 +0100
+++ gcc/tree.c  2017-09-14 11:24:42.669777533 +0100
@@ -1702,18 +1702,20 @@ make_vector (unsigned len MEM_STAT_DECL)
 }
 
 /* Return a new VECTOR_CST node whose type is TYPE and whose values
-   are in a list pointed to by VALS.  */
+   are given by VALS.  */
 
 tree
-build_vector (tree type, tree *vals MEM_STAT_DECL)
+build_vector (tree type, vec<tree> vals MEM_STAT_DECL)
 {
+  unsigned int nelts = vals.length ();
+  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
   int over = 0;
   unsigned cnt = 0;
-  tree v = make_vector (TYPE_VECTOR_SUBPARTS (type));
+  tree v = make_vector (nelts);
   TREE_TYPE (v) = type;
 
   /* Iterate through elements and check for overflow.  */
-  for (cnt = 0; cnt < TYPE_VECTOR_SUBPARTS (type); ++cnt)
+  for (cnt = 0; cnt < nelts; ++cnt)
 {
   tree value = vals[cnt];
 
@@ -1736,20 +1738,21 @@ build_vector (tree type, tree *vals MEM_
 tree
 build_vector_from_ctor (tree type, vec<constructor_elt, va_gc> *v)
 {
-  tree *vec = XALLOCAVEC (tree, TYPE_VECTOR_SUBPARTS (type));
-  unsigned HOST_WIDE_INT idx, pos = 0;
+  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
+  unsigned HOST_WIDE_INT idx;
   tree value;
 
+  auto_vec<tree> vec (nelts);
   

Store VECTOR_CST_NELTS directly in tree_node

2017-09-14 Thread Richard Sandiford
Previously VECTOR_CST_NELTS (t) read the number of elements from
TYPE_VECTOR_SUBPARTS (TREE_TYPE (t)).  There were two ways of handling
this with variable TYPE_VECTOR_SUBPARTS: either forcibly convert the
number to a constant (which is doable) or store the number directly
in the VECTOR_CST.  The latter seemed better, since it involves less
pointer chasing and since the tree_node u field is otherwise unused
for VECTOR_CST.  It would still be easy to switch to the former in
future if we need to free up the field for something else.

The patch also changes various bits of VECTOR_CST code to use
VECTOR_CST_NELTS instead of TYPE_VECTOR_SUBPARTS when iterating
over VECTOR_CST_ELTs.  Also, when the two are checked for equality,
the patch prefers to read VECTOR_CST_NELTS (which must be constant)
and check against TYPE_VECTOR_SUBPARTS, instead of the other way
around.
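
Illustrative only: iterating a VECTOR_CST now reads the length straight from
the node instead of chasing TREE_TYPE:

  for (unsigned int i = 0; i < VECTOR_CST_NELTS (t); ++i)
    {
      tree elt = VECTOR_CST_ELT (t, i);
      /* ... use elt ...  */
    }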

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-09-14  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-core.h (tree_base::u): Add an "nelts" field.
(tree_vector): Use VECTOR_CST_NELTS as the length.
* tree.c (tree_size): Likewise.
(make_vector): Initialize VECTOR_CST_NELTS.
* tree.h (VECTOR_CST_NELTS): Use the u.nelts field.
* cfgexpand.c (expand_debug_expr): Use VECTOR_CST_NELTS instead of
TYPE_VECTOR_SUBPARTS.
* expr.c (const_vector_mask_from_tree): Consistently use "units"
as the number of units, setting it from VECTOR_CST_NELTS.
(const_vector_from_tree): Likewise.
* fold-const.c (negate_expr_p): Use VECTOR_CST_NELTS instead of
TYPE_VECTOR_SUBPARTS for the number of elements in a VECTOR_CST.
(fold_negate_expr_1): Likewise.
(fold_convert_const): Likewise.
(const_binop): Likewise.  Differentiate the number of output and
input elements.
(const_unop): Likewise.
(fold_ternary_loc): Use VECTOR_CST_NELTS for the number of elements
in a VECTOR_CST, asserting that it is the same as TYPE_VECTOR_SUBPARTS
in cases that did the opposite.

Index: gcc/tree-core.h
===
--- gcc/tree-core.h 2017-08-21 10:42:05.815630531 +0100
+++ gcc/tree-core.h 2017-09-14 11:23:57.004041291 +0100
@@ -975,6 +975,9 @@ struct GTY(()) tree_base {
 /* VEC length.  This field is only used with TREE_VEC.  */
 int length;
 
+/* Number of elements.  This field is only used with VECTOR_CST.  */
+unsigned int nelts;
+
 /* SSA version number.  This field is only used with SSA_NAME.  */
 unsigned int version;
 
@@ -1326,7 +1329,7 @@ struct GTY(()) tree_complex {
 
 struct GTY(()) tree_vector {
   struct tree_typed typed;
-  tree GTY ((length ("TYPE_VECTOR_SUBPARTS (TREE_TYPE ((tree)&%h))"))) elts[1];
+  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
 };
 
 struct GTY(()) tree_identifier {
Index: gcc/tree.c
===
--- gcc/tree.c  2017-09-11 17:10:38.700973860 +0100
+++ gcc/tree.c  2017-09-14 11:23:57.004947653 +0100
@@ -873,7 +873,7 @@ tree_size (const_tree node)
 
 case VECTOR_CST:
   return (sizeof (struct tree_vector)
- + (TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)) - 1) * sizeof (tree));
+ + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
 
 case STRING_CST:
   return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
@@ -1696,6 +1696,7 @@ make_vector (unsigned len MEM_STAT_DECL)
 
   TREE_SET_CODE (t, VECTOR_CST);
   TREE_CONSTANT (t) = 1;
+  VECTOR_CST_NELTS (t) = len;
 
   return t;
 }
Index: gcc/tree.h
===
--- gcc/tree.h  2017-08-30 12:19:19.721220029 +0100
+++ gcc/tree.h  2017-09-14 11:23:57.004947653 +0100
@@ -1026,7 +1026,7 @@ #define TREE_REALPART(NODE) (COMPLEX_CST
 #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
 
 /* In a VECTOR_CST node.  */
-#define VECTOR_CST_NELTS(NODE) (TYPE_VECTOR_SUBPARTS (TREE_TYPE (NODE)))
+#define VECTOR_CST_NELTS(NODE) (VECTOR_CST_CHECK (NODE)->base.u.nelts)
 #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
 #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
 
Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2017-09-11 22:30:14.149035751 +0100
+++ gcc/cfgexpand.c 2017-09-14 11:23:57.002228567 +0100
@@ -4921,12 +4921,12 @@ expand_debug_expr (tree exp)
 
 case VECTOR_CST:
   {
-   unsigned i;
+   unsigned i, nelts;
 
-   op0 = gen_rtx_CONCATN
- (mode, rtvec_alloc (TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp))));
+   nelts = VECTOR_CST_NELTS (exp);
+   op0 = gen_rtx_CONCATN (mode, 

Re: [PATCH, PR81844] Fix condition folding in c_parser_omp_for_loop

2017-09-14 Thread Jakub Jelinek
On Mon, Aug 14, 2017 at 10:25:22AM +0200, Tom de Vries wrote:
> 2017-08-14  Tom de Vries  
> 
>   PR c/81844

Please use PR c/81875 instead, now that you've filed it.

>   * c-parser.c (c_parser_omp_for_loop): Fix condition folding.

Fold only operands of cond, not cond itself.
?

>   * testsuite/libgomp.c/pr81805.c: New test.

Wouldn't it be worth to test it also for C++?  I know we don't have
libgomp.c-c++-common (maybe we should add that), so the current
way would be add libgomp.c++/pr81805.C that #includes the other test
source (if you tweak it for C++, it would need #ifdef __cplusplus extern "C" #endif
for abort).

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -15027,7 +15027,24 @@ c_parser_omp_for_loop (location_t loc, c_parser 
> *parser, enum tree_code code,
>  
> cond = cond_expr.value;
> cond = c_objc_common_truthvalue_conversion (cond_loc, cond);
> -   cond = c_fully_fold (cond, false, NULL);
> +   switch (TREE_CODE (cond))

Just do if (COMPARISON_CLASS_P (cond)) instead of the switch?
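
Something like this (my rendering of the suggestion, untested):

  if (COMPARISON_CLASS_P (cond))
    {
      tree op0 = TREE_OPERAND (cond, 0), op1 = TREE_OPERAND (cond, 1);
      TREE_OPERAND (cond, 0) = c_fully_fold (op0, false, NULL);
      TREE_OPERAND (cond, 1) = c_fully_fold (op1, false, NULL);
    }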

> + {
> + case GT_EXPR:
> + case GE_EXPR:
> + case LT_EXPR:
> + case LE_EXPR:
> + case NE_EXPR:
> +   {
> + tree op0 = TREE_OPERAND (cond, 0), op1 = TREE_OPERAND (cond, 1);
> + op0 = c_fully_fold (op0, false, NULL);
> + op1 = c_fully_fold (op1, false, NULL);
> + TREE_OPERAND (cond, 0) = op0;
> + TREE_OPERAND (cond, 1) = op1;
> +   }
> +   break;
> + default:
> +   break;
> + }
> switch (cond_expr.original_code)
>   {
>   case GT_EXPR:

Ok with those changes and sorry for the review delay.

Jakub


Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
 wrote:
> On Sep 13, 2017, at 10:40 AM, Bill Schmidt  
> wrote:
>>
>> On Sep 13, 2017, at 7:23 AM, Richard Biener  
>> wrote:
>>>
>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>>>  wrote:
 Hi,

 [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

 Folding of vector loads in GIMPLE.

 Add code to handle gimple folding for the vec_ld builtins.
 Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
 Surrounding
 comments have been adjusted slightly so they continue to read OK for the
 existing vec_st code.

 The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
 tests which have been posted separately.

 For V2 of this patch, I've removed the chunk of code that prohibited the
 gimple fold from occurring in BE environments.   This had fixed an issue
 for me earlier during my development of the code, and turns out this was
 not necessary.  I've sniff-tested after removing that check and it looks
 OK.

> + /* Limit folding of loads to LE targets.  */
> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
> +return false;

 I've restarted a regression test on this updated version.

 OK for trunk (assuming successful regression test completion)  ?

 Thanks,
 -Will

 [gcc]

   2017-09-12  Will Schmidt  

   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
   for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
   Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.

 diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
 index fbab0a2..bb8a77d 100644
 --- a/gcc/config/rs6000/rs6000-c.c
 +++ b/gcc/config/rs6000/rs6000-c.c
 @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t 
 loc, tree fndecl,
convert (TREE_TYPE (stmt), arg0));
  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
  return stmt;
}

 -  /* Expand vec_ld into an expression that masks the address and
 - performs the load.  We need to expand this early to allow
 +  /* Expand vec_st into an expression that masks the address and
 + performs the store.  We need to expand this early to allow
 the best aliasing, as by the time we get into RTL we no longer
 are able to honor __restrict__, for example.  We may want to
 consider this for all memory access built-ins.

 When -maltivec=be is specified, or the wrong number of arguments
 is provided, simply punt to existing built-in processing.  */
 -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
 -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
 -  && nargs == 2)
 -{
 -  tree arg0 = (*arglist)[0];
 -  tree arg1 = (*arglist)[1];
 -
 -  /* Strip qualifiers like "const" from the pointer arg.  */
 -  tree arg1_type = TREE_TYPE (arg1);
 -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
 ARRAY_TYPE)
 -   goto bad;
 -
 -  tree inner_type = TREE_TYPE (arg1_type);
 -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
 -   {
 - arg1_type = build_pointer_type (build_qualified_type (inner_type,
 -   0));
 - arg1 = fold_convert (arg1_type, arg1);
 -   }
 -
 -  /* Construct the masked address.  Let existing error handling take
 -over if we don't have a constant offset.  */
 -  arg0 = fold (arg0);
 -
 -  if (TREE_CODE (arg0) == INTEGER_CST)
 -   {
 - if (!ptrofftype_p (TREE_TYPE (arg0)))
 -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
 -
 - tree arg1_type = TREE_TYPE (arg1);
 - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
 -   {
 - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
 - tree const0 = build_int_cstu (sizetype, 0);
 - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
 - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
 -   }
 -
 - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
 -  arg1, arg0);
 - tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, 
 addr,
 - build_int_cst (arg1_type, -16));
 -
 - /* Find the built-in to get the return type so we can convert
 -

Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 7:34 PM, Martin Jambor  wrote:
> Hello,
>
> I apologize for not coming back to this, I keep on getting distracted.
> Anyway...
>
> On Tue, Aug 15, 2017 at 02:20:55PM +, Joseph Myers wrote:
>> On Tue, 15 Aug 2017, Martin Jambor wrote:
>>
>> > I am not sure what to do about this, to me it seems that the
>> > -ffp-int-builtin-inexact simply has a wrong default value, at least
>> > for x86_64, as it was added in order not to slow code down but does
>> > exactly that (all of the slowdown of course disappears when
>> > -fno-fp-int-builtin-inexact is used).
>> >
>> > Or is the situation somehow more complex?
>>
>> It's supposed to be that -ffp-int-builtin-inexact allows inexact to be
>> raised, and is on by default, and -fno-fp-int-builtin-inexact is the
>> nondefault option that disallows it from being raised and may result in
>> slower code generation.
>>
>> As I understand it, your issue is actually with inline SSE expansions of
>> certain functions.  Before my patch, those had !flag_trapping_math
>> conditionals.  My patch changed that to the logically correct
>> (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that
>> being the conditions under which the expansion in question is correct.
>> Your problem is that the expansion, though correct under those conditions,
>> is slow compared to an IFUNC implementation of the library function.
>
> ...that is exactly right (modulo the fact that TARGET_ROUND meanwhile
> became TARGET_SSE4_1).
>
>>
>> Maybe that means that expansion should be disabled under some conditions
>> where it is correct but suboptimal.  It should be kept for TARGET_ROUND,
>> because then it's expanding to a single instruction.  But for
>> !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that
>> would satisfy TARGET_ROUND, or for -mtune=generic, and building with
>> recent-enough glibc, the expansion should be avoided as suboptimal, on the
>> expectation that at runtime an IFUNC is likely to be available - or given
>> the size of the generic SSE expansion, maybe it should be avoided more
>> generally than that).
>
> This seems to me the best solution.  SSE 4.1 is 11 years old, we
> should be tuning for it in generic tuning.  That is also the reason
> why I do not think run-time checks for SSE 4.1 or an attempt at an
> internal IFUNC are a good idea (or justified effort).

Well, it's of course the poor man's solution compared to providing our own
ifunc-enabled libm ...

I would expect that for SSE 4.1 the PLT and call overhead is measurable
and an inline run-time check would be quite a bit more efficient.  As you
have a testcase, would it be possible to measure that by hand-editing the
assembly (or the benchmark source, in case it is not Fortran...)?

The whole point of having the inline expansions was to have inline expansions,
avoiding the need to spill the whole set of SSE regs around such calls.

> I was just surprised by the glibc check, what would you consider a
> recent-enough glibc?  Or is the check mainly necessary to ensure we
> are indeed using glibc and not some other libc (and thus something
> like we do for TARGET_LIBC_PROVIDES_SSP would do)?
>
> I will try to come up with a patch.

I don't think this is the appropriate solution.  Try disabling the inline
expansion and run SPEC (without -msse4.1, of course).

I realize that doing the inline-expansion with a runtime check
is going to be quite tricky and the GCC local IFUNC trick doesn't
solve the inlining (but we might be able to avoid spilling with some
IPA RA help and/or attributes?).
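
For concreteness, a minimal sketch of the local-IFUNC idea being referred
to (an illustration, not a proposed patch; assumes GNU ifunc attribute
support, __builtin_cpu_supports, and x86-64):

#include <cmath>

extern "C" {

static double
ceil_sse41 (double x)
{
  double r;
  /* roundsd imm8 = 2 rounds toward +inf; imm8 = 0xA would additionally
     suppress the inexact exception.  */
  __asm__ ("roundsd $2, %1, %0" : "=x" (r) : "x" (x));
  return r;
}

static double
ceil_generic (double x)
{
  return std::ceil (x);
}

/* The resolver runs very early at load time, so it must initialize the
   CPU model data itself.  */
static double (*resolve_my_ceil (void)) (double)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1") ? ceil_sse41 : ceil_generic;
}

double my_ceil (double) __attribute__ ((ifunc ("resolve_my_ceil")));

} /* extern "C" */

Note that this still leaves the call out of line, so the caller must
assume the SSE registers are clobbered, which is exactly the spill cost
the inline expansion avoids.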

Richard.

> Thanks,
>
> Martin
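
As a concrete way to observe the option semantics discussed above (a
sketch, assuming x86-64 and default flags):

#include <cfenv>
#include <cstdio>

int main ()
{
  std::feclearexcept (FE_INEXACT);
  volatile double x = 1.5;
  volatile double r = __builtin_ceil (x);
  /* Whether FE_INEXACT is now set is exactly what the option governs:
     allowed (and typical for the inline SSE2 expansion) with the
     default -ffp-int-builtin-inexact, forbidden under
     -fno-fp-int-builtin-inexact.  */
  std::printf ("ceil (1.5) = %g, inexact raised: %d\n",
               static_cast<double> (r),
               std::fetestexcept (FE_INEXACT) != 0);
  return 0;
}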


Re: [testsuite, sparc] Don't xfail gcc.dg/vect/vect-multitypes-12.c on 32-bit SPARC (PR tree-optimization/80996)

2017-09-14 Thread Rainer Orth
Hi Richard,

> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* xfail ilp32 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { ! sparc*-*-* }  && { ! vect_unpack } } } } } */
>
> merge the sparc line with the last?  That is, remove the { ! sparc*-*-* }
> there?

of course, thanks for noticing.  Installed like this.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-09-14  Rainer Orth  

PR tree-optimization/80996
* gcc.dg/vect/vect-multitypes-12.c: Remove sparc*-*-* handling.

# HG changeset patch
# Parent  b1b31b27e4684d67a402b5172d464e79b8ed3fb6
Don't xfail gcc.dg/vect/vect-multitypes-12.c on 32-bit SPARC (PR tree-optimization/80996)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
@@ -40,6 +40,6 @@ int main (void)
 
 /* bleah */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* xfail ilp32 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { ! sparc*-*-* }  && { ! vect_unpack } } } } } */
 


Re: [PATCH] Make __FUNCTION__ a mergeable string and do not generate symbol entry.

2017-09-14 Thread Martin Liška
On 08/10/2017 09:43 PM, Jason Merrill wrote:
> On 07/14/2017 01:35 AM, Martin Liška wrote:
>> On 05/01/2017 09:13 PM, Jason Merrill wrote:
>>> On Wed, Apr 26, 2017 at 6:58 AM, Martin Liška  wrote:
 On 04/25/2017 01:58 PM, Jakub Jelinek wrote:
> On Tue, Apr 25, 2017 at 01:48:05PM +0200, Martin Liška wrote:
>> Hello.
>>
>> This is patch that was originally installed by Jason and later reverted 
>> due to PR70422.
>> In the later PR Richi suggested a fix for that and Segher verified that 
>> it helped him
>> to survive regression tests. That's reason why I'm resending that.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression 
>> tests.
>>
>> Ready to be installed?
>> Martin
>
>> From a34ce0ef37ae00609c9f3ff98a9cb0b7db6a8bd0 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Thu, 20 Apr 2017 14:56:30 +0200
>> Subject: [PATCH] Make __FUNCTION__ a mergeable string and do not generate
>>   symbol entry.
>>
>> gcc/cp/ChangeLog:
>>
>> 2017-04-20  Jason Merrill  
>>   Martin Liska  
>>   Segher Boessenkool  
>>
>>   PR c++/64266
>>   PR c++/70353
>>   PR bootstrap/70422
>>   Core issue 1962
>>   * decl.c (cp_fname_init): Decay the initializer to pointer.
>>   (cp_make_fname_decl): Set DECL_DECLARED_CONSTEXPR_P,
>>   * pt.c (tsubst_expr) [DECL_EXPR]: Set DECL_VALUE_EXPR,
>>   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P and
>>   DECL_IGNORED_P.  Don't call cp_finish_decl.
>
> If we don't emit those into the debug info, will the debugger be
> able to handle __FUNCTION__ etc. properly?

 No, with the patch the debugger can't handle these. Similar to how clang
 behaves currently. Maybe it can be conditionally enabled with -g3, or -g?

> Admittedly, right now we emit it into debug info only if those decls
> are actually used, say on:
> const char * foo () { return __FUNCTION__; }
> const char * bar () { return ""; }
> we'd emit foo::__FUNCTION__, but not bar::__FUNCTION__, so the debugger
> has to have some handling of it anyway.  But while in functions
> that don't refer to __FUNCTION__ it is always the debugger that needs
> to synthetize those and thus they will be always pointer-equal,
> if there are some uses of it and for other uses the debugger would
> synthetize it, there is the possibility that the debugger synthetized
> string will not be the same object as actually used in the function.

 You're right, currently one has to use a special function to be able to
 print it in the debugger. I believe we've already discussed that; according
 to the spec, the strings don't have to point to the same string.

 Suggestions what we should do with the patch?
>>>
>>> We need to emit debug information for these variables.  From Jim's
>>> description in 70422 it seems that the problem is that the reference
>>> to the string from the debug information is breaking
>>> function_mergeable_rodata_prefix, which relies on
>>> current_function_decl.  It seems to me that its callers should pass
>>> along their decl parameter so that f_m_r_p can use the decl's
>>> DECL_CONTEXT rather than rely on current_function_decl being set
>>> properly.
>>> Jason
>>>
>>
>> Ok, after some time I returned back to it. I followed your advises and
>> changed the function function_mergeable_rodata_prefix. Apart from that,
>> a small rebase was needed.
>>
>> May I ask Jim to test the patch?
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
>> +  DECL_IGNORED_P (decl) = 1;
> 
> As I said before, we do need to emit debug information for these variables, 
> so this is wrong.

Hello.

Sorry for overlooking that.

> 
>> -  section *s = targetm.asm_out.function_rodata_section 
>> (current_function_decl);
>> +  tree decl = current_function_decl;
>> +  if (decl && DECL_CONTEXT (decl)
>> +  && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
>> +    decl = DECL_CONTEXT (decl);
> 
> I don't see how this would help; it still relies on current_function_decl 
> being set correctly, which was the problem we were running into before.

I see, that's what I wanted to discuss with you at the Cauldron, but
eventually I did not find the time.
Well, the problem I see is that we want to make the decision based on
DECL_CONTEXT (fname_decl), but mergeable_string_section
is called with the STRING_CST (which is the VALUE_EXPR of the created
fname_decl):

#0  mergeable_string_section (decl=0x2c2bf1a0, align=64, flags=0) at ../../gcc/varasm.c:796
#1  0x01594ce3 in default_elf_select_section (decl=0x2c2bf1a0, reloc=0, align=64) at ../../gcc/varasm.c:6641
#2  0x0158b649 in get_constant_section (exp=0x2c2bf1a0, align=64) at ../../gcc/varasm.c:3284
#3  
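
For readers following the thread, a small example (an illustration, not
from the patch) of why the backing string is attractive to merge:

#include <cstdio>

/* __FUNCTION__ behaves like an implicit function-local
     static const char __func__[] = "f";
   Both functions below are named "f", so once the string is an ordinary
   mergeable constant its bytes "f\0" can be pooled in .rodata.str1.1
   instead of each function getting a unique f::__FUNCTION__ symbol.  */
const char *f () { return __FUNCTION__; }
namespace n { const char *f () { return __FUNCTION__; } }

int main ()
{
  std::printf ("%s %s\n", f (), n::f ());
  return 0;
}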

[AArch64] Improve LDP/STP generation that requires a base register

2017-09-14 Thread Jackson Woodruff

Hi all,

This patch generalizes the formation of LDP/STP instructions that require
a base register.


Previously, we would only accept address pairs ordered strictly ascending
or descending, and only strictly consecutive loads/stores.


This patch improves on that by accepting all valid orderings of the
loads/stores, and by extending the range that the LDP/STP addresses can
reach.


This patch is based on 
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00741.html


OK for trunk?

Jackson

ChangeLog:

gcc/

2017-08-09  Jackson Woodruff  

* aarch64.c (aarch64_host_wide_int_compare): New.
(aarch64_ldrstr_offset_compare): New.
(aarch64_operands_adjust_ok_for_ldpstp): Change to consider all
load/store orderings.
(aarch64_gen_adjusted_ldpstp): Likewise.

gcc/testsuite

2017-08-09  Jackson Woodruff  

* gcc.target/aarch64/simd/ldp_stp_9: New.
* gcc.target/aarch64/simd/ldp_stp_10: New.
* gcc.target/aarch64/simd/ldp_stp_11: New.
* gcc.target/aarch64/simd/ldp_stp_12: New.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4c5ed9610cb8bbb337bbfcb9260d7fd227c68ce8..e015bc440e0c5e4cd85b6b92a9058bb69ada6fa1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14799,6 +14799,49 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   return true;
 }
 
+int
+aarch64_host_wide_int_compare (const void *x, const void *y)
+{
+  return wi::cmps (* ((const HOST_WIDE_INT *) x),
+		   * ((const HOST_WIDE_INT *) y));
+}
+
+/* Taking X and Y to be pairs of RTX, one pointing to a MEM rtx and the
+   other pointing to a REG rtx containing an offset, compare the offsets
+   of the two pairs.
+
+   Return:
+
+	1 iff offset (X) > offset (Y)
+	0 iff offset (X) == offset (Y)
+	-1 iff offset (X) < offset (Y)
+ */
+int
+aarch64_ldrstr_offset_compare (const void *x, const void *y)
+{
+  const rtx * operands_1 = (const rtx *) x;
+  const rtx * operands_2 = (const rtx *) y;
+  rtx mem_1, mem_2, base, offset_1, offset_2;
+
+  if (GET_CODE (operands_1[0]) == MEM)
+    mem_1 = operands_1[0];
+  else
+    mem_1 = operands_1[1];
+
+  if (GET_CODE (operands_2[0]) == MEM)
+    mem_2 = operands_2[0];
+  else
+    mem_2 = operands_2[1];
+
+  /* Extract the offsets.  */
+  extract_base_offset_in_addr (mem_1, &base, &offset_1);
+  extract_base_offset_in_addr (mem_2, &base, &offset_2);
+
+  gcc_assert (offset_1 != NULL_RTX && offset_2 != NULL_RTX);
+
+  return wi::cmps (INTVAL (offset_1), INTVAL (offset_2));
+}
+
 /* Given OPERANDS of consecutive load/store that can be merged,
swap them if they are not in ascending order.  */
 void
@@ -14859,7 +14902,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
    scalar_mode mode)
 {
   enum reg_class rclass_1, rclass_2, rclass_3, rclass_4;
-  HOST_WIDE_INT offval_1, offval_2, offval_3, offval_4, msize;
+  HOST_WIDE_INT offvals[4], msize;
   rtx mem_1, mem_2, mem_3, mem_4, reg_1, reg_2, reg_3, reg_4;
   rtx base_1, base_2, base_3, base_4, offset_1, offset_2, offset_3, offset_4;
 
@@ -14875,8 +14918,12 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
   mem_4 = operands[7];
   gcc_assert (REG_P (reg_1) && REG_P (reg_2)
 		  && REG_P (reg_3) && REG_P (reg_4));
-  if (REGNO (reg_1) == REGNO (reg_2) || REGNO (reg_3) == REGNO (reg_4))
-	return false;
+
+  /* Do not attempt to merge the loads if the loads clobber each other.  */
+  for (int i = 0; i < 8; i += 2)
+	for (int j = i + 2; j < 8; j += 2)
+	  if (REGNO (operands[i]) == REGNO (operands[j]))
+	return false;
 }
   else
 {
@@ -14918,34 +14965,36 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
   || !rtx_equal_p (base_3, base_4))
 return false;
 
-  offval_1 = INTVAL (offset_1);
-  offval_2 = INTVAL (offset_2);
-  offval_3 = INTVAL (offset_3);
-  offval_4 = INTVAL (offset_4);
+  offvals[0] = INTVAL (offset_1);
+  offvals[1] = INTVAL (offset_2);
+  offvals[2] = INTVAL (offset_3);
+  offvals[3] = INTVAL (offset_4);
   msize = GET_MODE_SIZE (mode);
-  /* Check if the offsets are consecutive.  */
-  if ((offval_1 != (offval_2 + msize)
-   || offval_1 != (offval_3 + msize * 2)
-   || offval_1 != (offval_4 + msize * 3))
-  && (offval_4 != (offval_3 + msize)
-	  || offval_4 != (offval_2 + msize * 2)
-	  || offval_4 != (offval_1 + msize * 3)))
+
+  /* Check if the offsets can be put in the right order to do a ldp/stp.  */
+  qsort (offvals, 4, sizeof (HOST_WIDE_INT), aarch64_host_wide_int_compare);
+
+  if (!(offvals[1] == offvals[0] + msize
+	&& offvals[3] == offvals[2] + msize))
 return false;
 
-  /* Check if the addresses are clobbered by load.  */
-  if (load)
-{
-  if (reg_mentioned_p (reg_1, mem_1)
-	  || reg_mentioned_p (reg_2, mem_2)
-	  || reg_mentioned_p (reg_3, mem_3))
-	return false;
+  /* Check that the offsets are close enough together.  The ldp/stp instructions 
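
The heart of the new check is: sort the four offsets, then require two
adjacent same-distance pairs.  A standalone sketch of that predicate (my
illustration, not the patch; long long stands in for HOST_WIDE_INT):

#include <cstdlib>

/* Comparator in the spirit of aarch64_host_wide_int_compare.  */
static int
cmp_offset (const void *x, const void *y)
{
  long long a = *static_cast<const long long *> (x);
  long long b = *static_cast<const long long *> (y);
  return (a > b) - (a < b);
}

/* Return true if four accesses of size msize at the given offsets can
   be combined into two ldp/stp instructions, in some order.  */
static bool
pairable_p (long long off[4], long long msize)
{
  std::qsort (off, 4, sizeof (long long), cmp_offset);
  return off[1] == off[0] + msize && off[3] == off[2] + msize;
}

int main ()
{
  long long offs[4] = { 32, 0, 16, 48 };
  return pairable_p (offs, 16) ? 0 : 1;  /* pairs (0,16) and (32,48) */
}

The remaining range check then only has to ensure the offsets fit the
instructions' limited immediate range.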

Re: [testsuite, sparc] Don't xfail gcc.dg/vect/vect-multitypes-12.c on 32-bit SPARC (PR tree-optimization/80996)

2017-09-14 Thread Richard Biener
On Thu, 14 Sep 2017, Rainer Orth wrote:

> Since
> 
> 2017-06-02  Richard Biener  
> 
> * tree-vect-loop.c (vect_analyze_loop_operations): Not relevant
> PHIs are ok.
> * tree-vect-stmts.c (process_use): Do not mark backedge defs
> for inductions as relevant.
> 
> gcc.dg/vect/vect-multitypes-12.c XPASSes on 32-bit SPARC:
> 
> XPASS: gcc.dg/vect/vect-multitypes-12.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
> XPASS: gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect "vectorized 1 loops" 1
> 
> Fixed by removing the xfail.  Tested with the appropriate runtest
> invocation on sparc-sun-solaris2.11, i386-pc-solaris2.11, and
> x86_64-pc-linux-gnu.  Ok for mainline?

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* xfail ilp32 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { ! sparc*-*-* }  && { ! vect_unpack } } } } } */

merge the sparc line with the last?  That is, remove the { ! sparc*-*-* }
there?

Ok with that change.

Richard.


[testsuite, sparc] Don't xfail gcc.dg/vect/vect-multitypes-12.c on 32-bit SPARC (PR tree-optimization/80996)

2017-09-14 Thread Rainer Orth
Since

2017-06-02  Richard Biener  

* tree-vect-loop.c (vect_analyze_loop_operations): Not relevant
PHIs are ok.
* tree-vect-stmts.c (process_use): Do not mark backedge defs
for inductions as relevant.

gcc.dg/vect/vect-multitypes-12.c XPASSes on 32-bit SPARC:

XPASS: gcc.dg/vect/vect-multitypes-12.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
XPASS: gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect "vectorized 1 loops" 1

Fixed by removing the xfail.  Tested with the appropriate runtest
invocation on sparc-sun-solaris2.11, i386-pc-solaris2.11, and
x86_64-pc-linux-gnu.  Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-09-14  Rainer Orth  

PR tree-optimization/80996
* gcc.dg/vect/vect-multitypes-12.c: Don't xfail
scan-tree-dump-times on sparc*-*-* ilp32.

# HG changeset patch
# Parent  b1b31b27e4684d67a402b5172d464e79b8ed3fb6
Don't xfail gcc.dg/vect/vect-multitypes-12.c on 32-bit SPARC (PR tree-optimization/80996)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
@@ -40,6 +40,6 @@ int main (void)
 
 /* bleah */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* xfail ilp32 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target sparc*-*-* } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { ! sparc*-*-* }  && { ! vect_unpack } } } } } */
 


Minor tweak to dwarf2out_source_line

2017-09-14 Thread Eric Botcazou
The function contains these lines:

  if (debug_column_info)
fprint_ul (asm_out_file, column);
  else
putc ('0', asm_out_file);

but they are dominated by:

  if (!debug_column_info)
column = 0;


Bootstrapped/regtested on x86_64-suse-linux, applied on mainline as obvious.


2017-09-14  Eric Botcazou  

* dwarf2out.c (dwarf2out_source_line): Remove superfluous test.

-- 
Eric Botcazou

Index: dwarf2out.c
===================================================================
--- dwarf2out.c	(revision 252749)
+++ dwarf2out.c	(working copy)
@@ -26645,10 +26645,7 @@ dwarf2out_source_line (unsigned int line
   putc (' ', asm_out_file);
   fprint_ul (asm_out_file, line);
   putc (' ', asm_out_file);
-  if (debug_column_info)
-	fprint_ul (asm_out_file, column);
-  else
-	putc ('0', asm_out_file);
+  fprint_ul (asm_out_file, column);
 
   if (is_stmt != table->is_stmt)
 	{


[committed] Formatting fixes in the combiner

2017-09-14 Thread Jakub Jelinek
Hi!

While debugging this function I've noticed way too many formatting issues
and fixed them, committed as obvious to trunk:

2017-09-14  Jakub Jelinek  

* combine.c (make_compound_operation_int): Formatting fixes.

--- gcc/combine.c.jj2017-09-12 21:58:06.0 +0200
+++ gcc/combine.c   2017-09-13 14:37:07.521081009 +0200
@@ -7976,8 +7976,8 @@ make_compound_operation_int (scalar_int_
  && (i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
{
  new_rtx = make_compound_operation (XEXP (XEXP (x, 0), 0), next_code);
- new_rtx = make_extraction (mode, new_rtx, 0, XEXP (XEXP (x, 0), 1), 
i, 1,
-0, in_code == COMPARE);
+ new_rtx = make_extraction (mode, new_rtx, 0, XEXP (XEXP (x, 0), 1),
+i, 1, 0, in_code == COMPARE);
}
 
   /* Same as previous, but for (subreg (lshiftrt ...)) in first op.  */
@@ -8016,10 +8016,10 @@ make_compound_operation_int (scalar_int_
{
  /* Apply the distributive law, and then try to make extractions.  */
  new_rtx = gen_rtx_fmt_ee (GET_CODE (XEXP (x, 0)), mode,
-   gen_rtx_AND (mode, XEXP (XEXP (x, 0), 0),
-XEXP (x, 1)),
-   gen_rtx_AND (mode, XEXP (XEXP (x, 0), 1),
-XEXP (x, 1)));
+   gen_rtx_AND (mode, XEXP (XEXP (x, 0), 0),
+XEXP (x, 1)),
+   gen_rtx_AND (mode, XEXP (XEXP (x, 0), 1),
+XEXP (x, 1)));
  new_rtx = make_compound_operation (new_rtx, in_code);
}
 
@@ -8033,9 +8033,9 @@ make_compound_operation_int (scalar_int_
{
  new_rtx = make_compound_operation (XEXP (XEXP (x, 0), 0), next_code);
  new_rtx = make_extraction (mode, new_rtx,
-(GET_MODE_PRECISION (mode)
- - INTVAL (XEXP (XEXP (x, 0), 1))),
-NULL_RTX, i, 1, 0, in_code == COMPARE);
+(GET_MODE_PRECISION (mode)
+ - INTVAL (XEXP (XEXP (x, 0), 1))),
+NULL_RTX, i, 1, 0, in_code == COMPARE);
}
 
   /* On machines without logical shifts, if the operand of the AND is
@@ -8055,8 +8055,10 @@ make_compound_operation_int (scalar_int_
  if ((INTVAL (XEXP (x, 1)) & ~mask) == 0)
SUBST (XEXP (x, 0),
   gen_rtx_ASHIFTRT (mode,
-make_compound_operation
-(XEXP (XEXP (x, 0), 0), next_code),
+make_compound_operation (XEXP (XEXP (x,
+ 0),
+   0),
+ next_code),
 XEXP (XEXP (x, 0), 1)));
}
 
@@ -8066,9 +8068,9 @@ make_compound_operation_int (scalar_int_
 we are in a COMPARE.  */
   else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
new_rtx = make_extraction (mode,
-  make_compound_operation (XEXP (x, 0),
-   next_code),
-  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
+  make_compound_operation (XEXP (x, 0),
+   next_code),
+  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
 
   /* If we are in a comparison and this is an AND with a power of two,
 convert this into the appropriate bit extract.  */
@@ -8119,9 +8121,9 @@ make_compound_operation_int (scalar_int_
  && (nonzero_bits (XEXP (x, 0), mode) & (1 << (mode_width - 1))) == 0)
{
  new_rtx = gen_rtx_ASHIFTRT (mode,
- make_compound_operation (XEXP (x, 0),
-  next_code),
- XEXP (x, 1));
+ make_compound_operation (XEXP (x, 0),
+  next_code),
+ XEXP (x, 1));
  break;
}
 
@@ -8142,9 +8144,9 @@ make_compound_operation_int (scalar_int_
{
  new_rtx = make_compound_operation (XEXP (lhs, 0), next_code);
  new_rtx = make_extraction (mode, new_rtx,
-INTVAL (rhs) - INTVAL (XEXP (lhs, 1)),
-NULL_RTX, mode_width - INTVAL (rhs),
-code == LSHIFTRT, 

[PATCH] Add comments to struct cgraph_thunk_info

2017-09-14 Thread Pierre-Marie de Rodat
Hello,

This commit adds comments to the fields of the cgraph_thunk_info structure
declared in cgraph.h.  They will hopefully answer the questions that
people like me may have while discovering the thunk machinery.  I also
made an assertion stricter in cgraph_node::create_thunk.

I'm adding Nathan in copy as we discussed this thunk matter at this
year's Cauldron. :-)

Bootstrapped and regtested on x86_64-linux.  Ok to commit?  Thank you in
advance!

gcc/

* cgraph.h (cgraph_thunk_info): Add comments, reorder fields.
* cgraph.c (cgraph_node::create_thunk): Adjust comment, make
assert for VIRTUAL_* arguments stricter.
---
 gcc/cgraph.c | 10 +++---
 gcc/cgraph.h | 36 +---
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 69aa6c5bce2..20ab418d410 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -603,7 +603,7 @@ cgraph_node::create_same_body_alias (tree alias, tree decl)
 
 /* Add thunk alias into callgraph.  The alias declaration is ALIAS and it
aliases DECL with an adjustments made into the first parameter.
-   See comments in thunk_adjust for detail on the parameters.  */
+   See comments in struct cgraph_thunk_info for detail on the parameters.  */
 
 cgraph_node *
 cgraph_node::create_thunk (tree alias, tree, bool this_adjusting,
@@ -619,8 +619,12 @@ cgraph_node::create_thunk (tree alias, tree, bool 
this_adjusting,
 node->reset ();
   else
 node = cgraph_node::create (alias);
-  gcc_checking_assert (!virtual_offset
-  || wi::eq_p (virtual_offset, virtual_value));
+
+  /* Make sure that VIRTUAL_OFFSET is in sync with VIRTUAL_VALUE.  */
+  gcc_checking_assert (virtual_offset
+  ? wi::eq_p (virtual_offset, virtual_value)
+  : virtual_value == 0);
+
   node->thunk.fixed_offset = fixed_offset;
   node->thunk.this_adjusting = this_adjusting;
   node->thunk.virtual_value = virtual_value;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 57cdaa45681..372ac9f01aa 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -629,18 +629,40 @@ extern const char * const cgraph_availability_names[];
 extern const char * const ld_plugin_symbol_resolution_names[];
 extern const char * const tls_model_names[];
 
-/* Information about thunk, used only for same body aliases.  */
+/* Sub-structure of cgraph_node.  Holds information about thunk, used only for
+   same body aliases.
+
+   Thunks are basically wrappers around methods which are introduced in case
+   of multiple inheritance in order to adjust the value of the "this" pointer
+   or of the returned value.  */
 
 struct GTY(()) cgraph_thunk_info {
-  /* Information about the thunk.  */
-  HOST_WIDE_INT fixed_offset;
-  HOST_WIDE_INT virtual_value;
-  tree alias;
+  /* Set to true when the alias node (the cgraph_node this struct belongs to)
+ is a thunk.  Access to any other fields is invalid if this is false.  */
+  bool thunk_p;
+
+  /* Nonzero for a "this" adjusting thunk and zero for a result adjusting
+ thunk.  */
   bool this_adjusting;
+
+  /* If true, this thunk is what we call a virtual thunk.  In this case, after
+ the FIXED_OFFSET based adjustment is done, add to the result the offset
+ found in the vtable at: vptr + VIRTUAL_VALUE.  */
   bool virtual_offset_p;
+
+  /* ??? True for special kind of thunks, seems related to instrumentation.  */
   bool add_pointer_bounds_args;
-  /* Set to true when alias node is thunk.  */
-  bool thunk_p;
+
+  /* Offset used to adjust "this".  */
+  HOST_WIDE_INT fixed_offset;
+
+  /* Offset in the virtual table to get the offset to adjust "this".  Valid iff
+ VIRTUAL_OFFSET_P is true.  */
+  HOST_WIDE_INT virtual_value;
+
+  /* Thunk target, i.e. the method that this thunk wraps.  Depending on the
+ TARGET_USE_LOCAL_THUNK_ALIAS_P macro, this may have to be a new alias.  */
+  tree alias;
 };
 
 /* Information about the function collected locally.
-- 
2.14.1
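
As background for the new comments, a minimal example of where a
this-adjusting thunk arises (an illustration; the exact offsets depend on
the ABI):

#include <cstdio>

struct A { virtual void f () { std::puts ("A::f"); } long pad_a = 0; };
struct B { virtual void f () { std::puts ("B::f"); } long pad_b = 0; };
struct C : A, B { void f () override { std::puts ("C::f"); } };

/* Calling through a B* whose object is really a C enters C::f via a
   thunk: the B subobject sits at a nonzero offset inside C, so "this"
   must be adjusted by FIXED_OFFSET (here -sizeof (A) under the Itanium
   C++ ABI) before jumping to C::f.  */
void call_through_b (B *p) { p->f (); }

int main ()
{
  C c;
  call_through_b (&c);   /* prints "C::f", dispatched through the thunk */
  return 0;
}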



Re: [OpenACC] Enable SIMD vectorization on vector loops

2017-09-14 Thread Jakub Jelinek
On Wed, Sep 13, 2017 at 04:20:32PM -0700, Cesar Philippidis wrote:
> 2017-09-13  Cesar Philippidis  
> 
>   gcc/
>   * omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
>   non-SIMT targets in acc vector loops.

Ok, thanks.

Jakub