[PATCH] Fix segfault on verify_dominators error path

2016-05-27 Thread Alan Modra
Committed as obvious.  If the next check also fails, an attempt is
made to print imm_bb->index.

* dominance.c (verify_dominators): Don't segfault on NULL imm_bb.

Index: gcc/dominance.c
===
--- gcc/dominance.c (revision 236843)
+++ gcc/dominance.c (working copy)
@@ -1024,6 +1024,7 @@
{
  error ("dominator of %d status unknown", bb->index);
  err = true;
+ continue;
}
 
   basic_block imm_bb_correct = di.get_idom (bb);

-- 
Alan Modra
Australia Development Lab, IBM


Re: Dominance related breakage, was Re: [PATCH] PR71275 ira.c bb_loop_depth

2016-05-27 Thread Alan Modra
On Thu, May 26, 2016 at 11:04:41PM -0400, Vladimir Makarov wrote:
> On 05/26/2016 10:14 PM, Alan Modra wrote:
> >On Thu, May 26, 2016 at 10:12:14AM -0400, Vladimir Makarov wrote:
> >>On 05/26/2016 07:02 AM, Alan Modra wrote:
> >>>This fixes lack of bb_loop_depth info in some of the early parts of
> >>>ira, which has been the case for quite some time.  All active branches
> >>>return 0 from bb_loop_depth() in update_equiv_regs, but whether that
> >>>actually causes mis-optimization anywhere but trunk is yet to be
> >>>determined.
> >>>
> >>>I played a little with trying to consolidate this loop_optimizer_init
> >>>call with one that occurs a little later, but ran into ICEs.  (We now
> >>>have four calls to loop_optimizer_init in ira.c.)
> >>>
> >>>Bootstrapped and regression tested powerpc64le-linux and x86_64-linux.
> >>>OK to apply?
> >>>
> >>Yes.  Thank you, Alan.
> >Hi Vlad,
> >Sorry to do this to you and others, but the patch (committed as
> >r236789) may be wrong.  I didn't see any problems on trunk but when
> >I backported to gcc-5, I hit an error in stage2 compiling
> >insn-recog.c "dominator of 10 status unknown" from if_after_reload.
> >
> >On gcc-5, the error disappears by adding a call to
> >   free_dominance_info (CDI_DOMINATORS);
> >after the newly added call to loop_optimizer_finalize.
> >
> >I'm not sure yet what is going on.  Does anyone know whether the
> >free_dominance_info call is needed on trunk?
> >
> That is ok.  It is always a discovery.  I am not sure but I think I saw this
> problem when I wrote IRA.
> 
> Looking at the dominance code, it seems to me that it can reuse the previous
> info if it was not cleared.  So I guess free_dominance_info is important.
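
To make the reuse hazard concrete, here is a minimal standalone C sketch
(hypothetical names, not GCC's actual dominance API): an analysis guarded
by an "available" flag is silently reused after the CFG changes unless a
client frees it first, which is what the fix below does via
free_dominance_info.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for cached dominance info.  */
static bool dom_available;
static int idom_of_bb10;              /* cached immediate dominator */

static void compute_dominators (int current_cfg)
{
  idom_of_bb10 = current_cfg;         /* pretend the answer tracks the CFG */
  dom_available = true;
}

static int get_idom (int current_cfg)
{
  if (!dom_available)                 /* stale info short-circuits this */
    compute_dominators (current_cfg);
  return idom_of_bb10;
}

int main (void)
{
  compute_dominators (1);             /* computed for the old CFG */
  /* ... passes change the CFG here ... */
  printf ("%d\n", get_idom (2));      /* prints 1: stale info reused */
  dom_available = false;              /* the free_dominance_info analogue */
  printf ("%d\n", get_idom (2));      /* prints 2: recomputed */
  return 0;
}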

Given your comment, I've committed the following after running
a full regression check.

PR rtl-optimization/71275
* ira.c (ira): Free dominance info.

diff --git a/gcc/ira.c b/gcc/ira.c
index 1b269ea..3c4e3b6 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5188,6 +5188,7 @@ ira (FILE *f)
 add_store_equivs ();
 
   loop_optimizer_finalize ();
+  free_dominance_info (CDI_DOMINATORS);
   end_alias_analysis ();
   free (reg_equiv);
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][AArch64] Use aarch64_fusion_enabled_p to check for insn fusion capabilities

2016-05-27 Thread Evandro Menezes

On 05/27/16 11:59, Kyrill Tkachov wrote:

Hi all,

This patch is a small cleanup that uses the newly introduced
aarch64_fusion_enabled_p predicate to check which fusion opportunities
are enabled for the current target.


Tested on aarch64-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-05-27  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Use
aarch64_fusion_enabled_p to check for fusion capabilities.


LGTM

--
Evandro Menezes



Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-05-27 Thread Evandro Menezes

On 05/25/16 11:16, James Greenhalgh wrote:

On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:

gcc/
 * config/aarch64/aarch64-protos.h
 (tune_params): Add new member "approx_div_modes".
 (aarch64_emit_approx_div): Declare new function.
 * config/aarch64/aarch64.c
 (generic_tunings): New member "approx_div_modes".
 (cortexa35_tunings): Likewise.
 (cortexa53_tunings): Likewise.
 (cortexa57_tunings): Likewise.
 (cortexa72_tunings): Likewise.
 (exynosm1_tunings): Likewise.
 (thunderx_tunings): Likewise.
 (xgene1_tunings): Likewise.
 (aarch64_emit_approx_div): Define new function.
 * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
 * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
 * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
 * doc/invoke.texi (-mlow-precision-div): Describe new option.

My comments from the other two patches around using a structure to
group up the tuning flags and whether we really want the new option
apply here too.

This code has no consumers by default and is only used for
-mlow-precision-div. Is this option likely to be useful to our users in
practice? It might all be more palatable under something like the rs6000's
-mrecip=opt .


I agree.  OK as a follow up?


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 47ccb18..7e99e16 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
[(set_attr "type" "neon_fp_mul_")]
  )
  
-(define_insn "div<mode>3"

+(define_expand "div<mode>3"
+ [(set (match_operand:VDQF 0 "register_operand")
+   (div:VDQF (match_operand:VDQF 1 "general_operand")

What does this relaxation to general_operand give you?


Hold that thought...


+(match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);

...other than the need to do this (sorry if I've missed something obvious).


Hold on...


+})
+
+(define_insn "*div<mode>3"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 589871b..d3e73bf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7604,6 +7612,83 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
return true;
  }
  
+/* Emit the instruction sequence to compute the approximation for a division.  */

Long line, missing details on what the return type means and the meaning of
arguments.


OK


+
+bool
+aarch64_emit_approx_div (rtx quo, rtx num, rtx div)

DIV is ambiguous (divisor, or the RTX or the division itself?) "DIVISOR" is
not much more typing and is clear.


I renamed it to imply the denominator.


+{
+  machine_mode mode = GET_MODE (quo);
+
+  if (!flag_finite_math_only
+  || flag_trapping_math
+  || !flag_unsafe_math_optimizations
+  || optimize_function_for_size_p (cfun)
+  || !(flag_mlow_precision_div
+  || (aarch64_tune_params.approx_div_modes & AARCH64_APPROX_MODE (mode))))

Long line.


OK


+return false;
+
+  /* Estimate the approximate reciprocal.  */
+  rtx xrcp = gen_reg_rtx (mode);
+  switch (mode)
+{
+  case SFmode:
+   emit_insn (gen_aarch64_frecpesf (xrcp, div)); break;
+  case V2SFmode:
+   emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break;
+  case V4SFmode:
+   emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break;
+  case DFmode:
+   emit_insn (gen_aarch64_frecpedf (xrcp, div)); break;
+  case V2DFmode:
+   emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break;
+  default:
+   gcc_unreachable ();
+}

Factor this to get_recpe_type or similar (as was done for get_rsqrts_type).


OK


+
+  /* Iterate over the series twice for SF and thrice for DF.  */
+  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
+  /* Optionally iterate over the series once less for faster performance,
+ while sacrificing the accuracy.  */
+  if (flag_mlow_precision_div)
+iterations--;
+
+  /* Iterate over the series to calculate the approximate reciprocal.  */
+  rtx xtmp = gen_reg_rtx (mode);
+  while (iterations--)
+{
+  switch (mode)
+{
+ case SFmode:
+   emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break;
+ case V2SFmode:
+   emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break;
+ case V4SFmode:
+   emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break;
+ case DFmode:
+   emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break;
+ case V2DFmode:
+   emit_insn (gen_aa
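
For background, a plain-C sketch of the math rather than the emitted
RTL: FRECPE supplies a rough reciprocal seed and each FRECPS step
performs the Newton-Raphson refinement x <- x * (2 - d*x), which roughly
doubles the number of correct bits per iteration; hence two iterations
for SF and three for DF, and one fewer under -mlow-precision-div.

#include <stdio.h>

/* Model of the emitted sequence: seed from a rough estimate (FRECPE's
   role), then refine with x = x * (2 - d*x) (FRECPS's role).  */
static double
recip_newton (double d, double seed, int iterations)
{
  double x = seed;
  while (iterations--)
    x = x * (2.0 - d * x);
  return x;
}

int main (void)
{
  printf ("%.17g\n", recip_newton (3.0, 0.3, 3));        /* ~= 1/3 */
  printf ("%.17g\n", 42.0 * recip_newton (7.0, 0.1, 3)); /* ~= 42/7 */
  return 0;
}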

Re: [PATCH 2/3][AArch64] Emit square root using the Newton series

2016-05-27 Thread Evandro Menezes

On 05/25/16 10:52, James Greenhalgh wrote:

On Wed, Apr 27, 2016 at 04:15:45PM -0500, Evandro Menezes wrote:

gcc/
 * config/aarch64/aarch64-protos.h
 (aarch64_emit_approx_rsqrt): Replace with new function
 "aarch64_emit_approx_sqrt".
 (tune_params): New member "approx_sqrt_modes".
 * config/aarch64/aarch64.c
 (generic_tunings): New member "approx_rsqrt_modes".
 (cortexa35_tunings): Likewise.
 (cortexa53_tunings): Likewise.
 (cortexa57_tunings): Likewise.
 (cortexa72_tunings): Likewise.
 (exynosm1_tunings): Likewise.
 (thunderx_tunings): Likewise.
 (xgene1_tunings): Likewise.
 (aarch64_emit_approx_rsqrt): Replace with new function
 "aarch64_emit_approx_sqrt".
 (aarch64_override_options_after_change_1): Handle new option.
 * config/aarch64/aarch64-simd.md
 (rsqrt<mode>2): Use new function instead.
 (sqrt<mode>2): New expansion and insn definitions.
 * config/aarch64/aarch64.md: Likewise.
 * config/aarch64/aarch64.opt
 (mlow-precision-sqrt): Add new option description.
 * doc/invoke.texi (mlow-precision-sqrt): Likewise.


  

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 50f1d24..437f6af 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -244,6 +244,7 @@ struct tune_params
} autoprefetcher_model;
  
unsigned int extra_tuning_flags;

+  unsigned int approx_sqrt_modes;
unsigned int approx_rsqrt_modes;
  };

This should go in struct recommended in 1/3 too.


OK

  
@@ -396,7 +397,7 @@ void aarch64_register_pragmas (void);

  void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
  void aarch64_save_restore_target_globals (tree);
-void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
  
  /* Initialize builtins for SIMD intrinsics.  */

  void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68381bf..589871b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@
  #include "recog.h"
  #include "diagnostic.h"
  #include "insn-attr.h"
+#include "insn-flags.h"

Can you remember what you needed this include for? I couldn't spot it below
and removing the include didn't seem to cause any trouble to the build.
If it isn't needed, drop it.


I don't recall.  Dropped.


  #include "insn-modes.h"
  #include "alias.h"
  #include "fold-const.h"
@@ -7521,46 +7530,78 @@ get_rsqrts_type (machine_mode mode)
}
  }
  
-/* Emit instruction sequence to compute the reciprocal square root using the

-   Newton-Raphson series.  Iterate over the series twice for SF
-   and thrice for DF.  */
+/* Emit instruction sequence to compute either the approximate square root
+   or its approximate reciprocal.  */

As you are updating this function a new parameter and a return value, please
comment it appropriately. Describing the purpose of the parameter and the
meaning of the return value.


OK


-void
-aarch64_emit_approx_rsqrt (rtx dst, rtx src)
+bool
+aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
  {
-  machine_mode mode = GET_MODE (src);
-  gcc_assert (
-mode == SFmode || mode == V2SFmode || mode == V4SFmode
-   || mode == DFmode || mode == V2DFmode);
-
-  rtx xsrc = gen_reg_rtx (mode);
-  emit_move_insn (xsrc, src);
-  rtx x0 = gen_reg_rtx (mode);
+  machine_mode mode = GET_MODE (dst);
+  machine_mode mmsk = mode_for_vector (int_mode_for_mode (GET_MODE_INNER (mode)),
+                                       GET_MODE_NUNITS (mode));
+
+  if (!flag_finite_math_only
+  || flag_trapping_math
+  || !flag_unsafe_math_optimizations
+  || optimize_function_for_size_p (cfun)
+  || !((recp && (flag_mrecip_low_precision_sqrt
+|| (aarch64_tune_params.approx_rsqrt_modes
+& AARCH64_APPROX_MODE (mode
+  || (!recp && (flag_mlow_precision_sqrt
+|| (aarch64_tune_params.approx_sqrt_modes
+& AARCH64_APPROX_MODE (mode))
+return false;

Can you pull out these sub-clauses and assign them somewhere for clarity.
These expressions are a bit too big to grok at a distance. I'd rather be
reading:

bool use_low_precision_rsqrt_p
   = recp
  && (flag_mrecip_low_precision_sqrt
 || (aarch64_tune_params.approx_rsqrt_modes
 & AARCH64_APPROX_MODE (mode)))
<...>

   !((use_low_precision_sqrt_p)
  || (use_low_precision_rsqrt_p))


OK


-  emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));
+  rtx xmsk = gen_reg_rtx (mmsk);
+  if (!recp)
+/* When calculating the approximate square root, compare the argument with
+   0.0 and create a mask.  */
+emit_insn (gen_rtx_SET (xmsk, gen_rtx_NEG (mmsk, gen_rtx_EQ (mmsk, src,
+  
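
For context on why the mask exists: computing sqrt(x) as x * rsqrt(x)
yields 0 * inf = NaN at x == 0, so the emitted sequence compares the
input with 0.0 and uses the resulting mask to force those lanes back to
zero.  A scalar C stand-in for the idea (an illustration, not the
emitted code):

#include <math.h>
#include <stdio.h>

/* Scalar model of the masked vector sequence.  */
static float
sqrt_via_rsqrt (float x)
{
  if (x == 0.0f)                /* the vector code builds a compare mask */
    return 0.0f;
  float r = 1.0f / sqrtf (x);   /* stand-in for the FRSQRTE/FRSQRTS steps */
  return x * r;
}

int main (void)
{
  printf ("%g %g\n", sqrt_via_rsqrt (0.0f), sqrt_via_rsqrt (2.0f));
  return 0;
}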

Re: [PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-05-27 Thread Evandro Menezes

On 05/25/16 05:15, James Greenhalgh wrote:

On Wed, Apr 27, 2016 at 04:13:33PM -0500, Evandro Menezes wrote:

gcc/
 * config/aarch64/aarch64-protos.h
 (AARCH64_APPROX_MODE): New macro.
 (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
Likewise.
 (tune_params): New member "approx_rsqrt_modes".
 * config/aarch64/aarch64-tuning-flags.def
 (AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
 * config/aarch64/aarch64.c
 (generic_tunings): New member "approx_rsqrt_modes".
 (cortexa35_tunings): Likewise.
 (cortexa53_tunings): Likewise.
 (cortexa57_tunings): Likewise.
 (cortexa72_tunings): Likewise.
 (exynosm1_tunings): Likewise.
 (thunderx_tunings): Likewise.
 (xgene1_tunings): Likewise.
 (use_rsqrt_p): New argument for the mode and use new member from
 "tune_params".
 (aarch64_builtin_reciprocal): Devise mode from builtin.
 (aarch64_optab_supported_p): New argument for the mode.
 * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..50f1d24 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,32 @@ struct cpu_branch_cost
const int unpredictable;  /* Unpredictable branch or optimizing for speed.  
*/
  };
  
+/* Control approximate alternatives to certain FP operators.  */

+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+ + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
+  | AARCH64_APPROX_MODE (V2SFmode) \
+  | AARCH64_APPROX_MODE (V4SFmode))
+#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
+  | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
+ | AARCH64_APPROX_MODE (DFmode) \
+ | AARCH64_APPROX_MODE (V2SFmode))
+#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
+ | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_SCALAR (AARCH64_APPROX_MODE (SFmode) \
+  | AARCH64_APPROX_MODE (DFmode))
+#define AARCH64_APPROX_VECTOR (AARCH64_APPROX_MODE (V2SFmode) \
+  | AARCH64_APPROX_MODE (V4SFmode) \
+  | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_ALL (-1)
+

Thanks for providing these various subsets, but I think they are
unneccesary for the final submission. From what I can see, only
AARCH64_APPROX_ALL and AARCH64_APPROX_NONE are used. Please remove the
rest, they are easy enough to add back if a subtarget wants them.


OK
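
For reference, the bit-per-mode encoding the macro implements is easy to
check standalone; a minimal C sketch with made-up mode numbering (not
GCC's machmode values):

/* Scalar FP modes occupy the low bits; vector FP modes follow after an
   offset equal to the number of scalar FP modes.  */
enum mode { SF, DF, V2SF, V4SF, V2DF };
#define MIN_FLOAT  SF
#define MAX_FLOAT  DF
#define MIN_VFLOAT V2SF
#define MAX_VFLOAT V2DF
#define APPROX_MODE(M)                                          \
  ((MIN_FLOAT <= (M) && (M) <= MAX_FLOAT)                       \
   ? (1u << ((M) - MIN_FLOAT))                                  \
   : (MIN_VFLOAT <= (M) && (M) <= MAX_VFLOAT)                   \
     ? (1u << ((M) - MIN_VFLOAT + MAX_FLOAT - MIN_FLOAT + 1))   \
     : 0u)

int main (void)
{
  /* SP = SF | V2SF | V4SF -> bits 0, 2, 3 -> 0x0d.  */
  unsigned sp = APPROX_MODE (SF) | APPROX_MODE (V2SF) | APPROX_MODE (V4SF);
  return sp == 0x0du ? 0 : 1;
}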


  struct tune_params
  {
const struct cpu_cost_table *insn_extra_cost;
@@ -218,6 +244,7 @@ struct tune_params
} autoprefetcher_model;
  
unsigned int extra_tuning_flags;

+  unsigned int approx_rsqrt_modes;

As we're going to add a few of these, lets follow the approach for some
of the other costs (e.g. branch costs, vector costs) and bury them in a
structure of their own.


OK


  };
  
  #define AARCH64_FUSION_PAIR(x, name) \

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..048c2a3 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,3 @@
   AARCH64_TUNE_ to give an enum name. */
  
  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)

-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-

Did you want to add another way to tune these by command line (not
neccessary now, but as a follow-up)? See how instruction fusion is
handled by the -moverride code for an example.


I prefer your suggestion a la mode of RS6000, something like 
-mapprox=.


Thank you,

--
Evandro Menezes

From 86d7690632d03ec85fd69bfaef8e89c0542518ad Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH 1/3] [AArch64] Add more choices for the reciprocal square root
 approximation

Allow a target to prefer such operation depending on the operation mode.

2016-03-03  Evandro Menezes  

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,ALL}): Likewise.
	(cpu_approx_modes): New structure.
	(tune_params): New member "approx_modes".
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
	* config/aarch64/aarch64.c
	({generic,exynosm1,xgene1}_ap

Re: [PATCH v4] gcov: Runtime configurable destination output

2016-05-27 Thread Nathan Sidwell

On 05/26/16 13:08, Aaron Conole wrote:

The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.


this is ok, thanks.

1) Do you know how to write and format a ChangeLog entry?
2) Are you able to commit the patch yourself (or have someone at RH walk
you through the process)?


nathan
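
For readers following along, the mechanism described boils down to an
environment-variable lookup with a stderr fallback; a minimal standalone
sketch (the variable name GCOV_ERROR_FILE is an assumption here, not
confirmed by the quoted text):

#include <stdio.h>
#include <stdlib.h>

/* Route diagnostics to the file named in the environment, else stderr.  */
static FILE *
gcov_error_file (void)
{
  const char *fname = getenv ("GCOV_ERROR_FILE");
  FILE *f = fname ? fopen (fname, "a") : NULL;
  return f ? f : stderr;   /* fall back when unset or unopenable */
}

int main (void)
{
  fprintf (gcov_error_file (), "profiling: example diagnostic\n");
  return 0;
}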



Re: [PTX] malloc/realloc/free

2016-05-27 Thread Nathan Sidwell

On 05/26/16 13:08, Alexander Monakov wrote:

Hello,

On Thu, 26 May 2016, Nathan Sidwell wrote:


This patch removes the malloc/realloc/free wrappers from libgcc.  I've
implemented  them  completely in C and  put them in the ptx newlib port --
where one expects such functions.


It appears that the new Newlib code doesn't free 'p' on 'realloc (p, 0)';


I pushed an alternative patch.
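
For context, the semantics at issue in a standalone example: many
implementations treat realloc (p, 0) as free (p) (possibly returning
NULL), so a wrapper that skips the free leaks p.

#include <stdlib.h>

int main (void)
{
  void *p = malloc (16);
  void *q = realloc (p, 0);  /* expected to release p; q may be NULL */
  free (q);                  /* safe whether q is NULL or a fresh block */
  return 0;
}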



[wwwdocs] readings.html -- www.fh-jena.de -> www.eah-jena.de

2016-05-27 Thread Gerald Pfeifer
Applied.

Gerald

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.247
diff -u -r1.247 readings.html
--- readings.html   27 May 2016 20:09:51 -  1.247
+++ readings.html   27 May 2016 20:46:56 -
@@ -530,7 +530,7 @@
A few links for your enjoyment.

  
-   <a href="http://www.fh-jena.de/~kleine/history/">Historic Documents in
+   <a href="http://www.eah-jena.de/~kleine/history/">Historic Documents in
    Computer Science</a> by Karl Kleine
  
 


[wwwdocs] projets/gupc.html -- www.gwu.edu now uses https

2016-05-27 Thread Gerald Pfeifer
Applied.

Gerald

Index: gupc.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/gupc.html,v
retrieving revision 1.9
diff -u -r1.9 gupc.html
--- gupc.html   28 Aug 2015 19:50:33 -  1.9
+++ gupc.html   27 May 2016 20:42:59 -
@@ -36,7 +36,7 @@
 Features
 
 
-<a href="http://www.gwu.edu/~upc/docs/upc_specs_1.2.pdf">
+<a href="https://www.gwu.edu/~upc/docs/upc_specs_1.2.pdf">
 UPC 1.2 specification compliant</a>
 <a href="http://upc.gwu.edu/docs/UPC_Coll_Spec_V1.0.pdf">
 UPC collectives library support</a>


[patch] config.gcc FreeBSD ARM

2016-05-27 Thread Andreas Tobler

Hi all,

The FreeBSD ARM people eliminated the extra armv6hf target and moved the
hardfloat 'functionality' into the armv6-*-freebsd11+ target.
This applies / will apply to FreeBSD11 (FreeBSD11 is not released yet;
the planned date is September 2016).  On FreeBSD10, armv6 still has only
soft float.  The armv6hf target is not live on FreeBSD10.


This simplifies life a bit.

I'll commit the attached patch to all the active branches. Regarding the 
gcc-5 branch, do I have permission to apply?


TIA,
Andreas

2016-05-27  Andreas Tobler  

* config.gcc: Move hard float support for arm*hf*-*-freebsd* into
armv6*-*-freebsd* for FreeBSD11*. Eliminate the arm*hf*-*-freebsd*
target.

Index: config.gcc
===
--- config.gcc  (revision 236835)
+++ config.gcc  (working copy)
@@ -1058,13 +1058,11 @@
case $target in
armv6*-*-freebsd*)
tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
+if test $fbsd_major -ge 11; then
+   tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
+fi
;;
esac
-   case $target in
-   arm*hf-*-freebsd*)
-   tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
-   ;;
-   esac
with_tls=${with_tls:-gnu}
;;
 arm*-*-netbsdelf*)


[wwwdocs] Update boehm-gc master sources in codingconventions.html

2016-05-27 Thread Gerald Pfeifer
This updates the upstream web site and contact address for boehm-gc,
based on feedback from Hans.  (Thank you!)

Committed.

Gerald

Index: codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.73
diff -u -r1.73 codingconventions.html
--- codingconventions.html  27 Jun 2015 18:46:13 -  1.73
+++ codingconventions.html  27 May 2016 19:24:15 -
@@ -628,8 +628,9 @@
 essentially maintained in the GCC source tree.
 
 boehm-gc: The master sources are at <a
-href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/">www.hpl.hp.com/personal/Hans_Boehm/gc/</a>.
-Patches should be sent to <a href="mailto:bo...@acm.org">bo...@acm.org</a>,
+href="http://hboehm.info/gc/">http://hboehm.info/gc/</a>.
+Patches should be sent to
+<a href="mailto:bd...@lists.opendylan.org">bd...@lists.opendylan.org</a>,
 but it's acceptable to check them in the GCC source tree before getting
 them installed in the master tree.
 


[PATCH] Fix missing/wrong function declaration in s-osinte-rtems.ads (ada/71317)

2016-05-27 Thread Jan Sommer
Hello,

this patch fixes the build failures of recent gnat compiler version for RTEMS 
targets (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71317).
Attached are patches for trunk, gcc-5-branch and gcc-6-branch.
I don't have write access to the svn, so if the patches pass the review process 
please commit them.

CC is the maintainer of the RTEMS project in case there are some further 
questions.

Best regards,

   Jan

Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 236835)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,10 @@
+2016-05-27  Jan Sommer 
+
+	PR ada/71317
+	* s-osinte-rtems.ads: Fix missing/wrong function declarations:
+	Missing: clock_getres
+	Wrong:   Get_Page_Size
+
 2016-05-06  Eric Botcazou  
 
 	PR ada/70969
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 236835)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -188,6 +188,11 @@ package System.OS_Interface is
   tp   : access timespec) return int;
pragma Import (C, clock_gettime, "clock_gettime");
 
+   function clock_getres
+ (clock_id : clockid_t;
+  res  : access timespec) return int;
+   pragma Import (C, clock_getres, "clock_getres");
+
function To_Duration (TS : timespec) return Duration;
pragma Inline (To_Duration);
 
@@ -291,8 +296,7 @@ package System.OS_Interface is
--  These two functions are only needed to share s-taprop.adb with
--  FSU threads.
 
-   function Get_Page_Size return size_t;
-   function Get_Page_Size return Address;
+   function Get_Page_Size return int;
pragma Import (C, Get_Page_Size, "getpagesize");
--  Returns the size of a page
 
Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 236834)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,10 @@
+2016-05-27  Jan Sommer 
+
+	PR ada/71317
+	* s-osinte-rtems.ads: Fix missing/wrong function declarations:
+	Missing: clock_getres
+	Wrong:   Get_Page_Size
+
 2016-05-06  Eric Botcazou  
 
 	PR ada/70969
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 236834)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -188,6 +188,11 @@ package System.OS_Interface is
   tp   : access timespec) return int;
pragma Import (C, clock_gettime, "clock_gettime");
 
+   function clock_getres
+ (clock_id : clockid_t;
+  res  : access timespec) return int;
+   pragma Import (C, clock_getres, "clock_getres");
+
function To_Duration (TS : timespec) return Duration;
pragma Inline (To_Duration);
 
@@ -291,8 +296,7 @@ package System.OS_Interface is
--  These two functions are only needed to share s-taprop.adb with
--  FSU threads.
 
-   function Get_Page_Size return size_t;
-   function Get_Page_Size return Address;
+   function Get_Page_Size return int;
pragma Import (C, Get_Page_Size, "getpagesize");
--  Returns the size of a page
 
Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 236834)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,10 @@
+2016-05-27  Jan Sommer 
+
+	PR ada/71317
+	* s-osinte-rtems.ads: Fix missing/wrong function declarations:
+	Missing: clock_getres
+	Wrong:   Get_Page_Size
+
 2016-05-20  Eric Botcazou  
 
 	* gcc-interface/decl.c (gnat_to_gnu_entity) :
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 236834)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -188,6 +188,11 @@ package System.OS_Interface is
   tp   : access timespec) return int;
pragma Import (C, clock_gettime, "clock_gettime");
 
+   function clock_getres
+ (clock_id : clockid_t;
+  res  : access timespec) return int;
+   pragma Import (C, clock_getres, "clock_getres");
+
function To_Duration (TS : timespec) return Duration;
pragma Inline (To_Duration);
 
@@ -291,8 +296,7 @@ package System.OS_Interface is
--  These two functions are only needed to share s-taprop.adb with
--  FSU threads.
 
-   function Get_Page_Size return size_t;
-   function Get_Page_Size return Address;
+   function Get_Page_Size return int;
pragma Import (C, Get_Page_Size, "getpagesize");
--  Returns the size of a page
 


[wwwdocs] Update upstream Go repository in doc/sourcebuild.texi

2016-05-27 Thread Gerald Pfeifer
The guys managing code.google.com put a nice redirect in place, 
just following this.

I'll probably push this back to GCC 6 as well in a while.

Committed.

Gerald

2016-05-27  Gerald Pfeifer  

* doc/sourcebuild.texi: New address for upstream Go repository.

Index: doc/sourcebuild.texi
===
--- doc/sourcebuild.texi(revision 236835)
+++ doc/sourcebuild.texi(working copy)
@@ -86,7 +86,7 @@
 
 @item libgo
 The Go runtime library.  The bulk of this library is mirrored from the
-@uref{http://code.google.com/@/p/@/go/, master Go repository}.
+@uref{https://github.com/@/golang/go, master Go repository}.
 
 @item libgomp
 The GNU Offloading and Multi Processing Runtime Library.


[wwwdocs] Remove developer.intel.com link from readings.html

2016-05-27 Thread Gerald Pfeifer
Intel's web team now has this redirect to a general product page
(for hardware).  With such an approach, I don't want to link to
them directly any longer -- people can just google for whatever
the manuals' location of the day may be.

Applied.

Gerald

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.246
diff -u -r1.246 readings.html
--- readings.html   14 Mar 2016 09:59:05 -  1.246
+++ readings.html   27 May 2016 20:08:05 -
@@ -147,9 +147,6 @@
  i386 (i486, i586, i686, i786)
Manufacturer: Intel
 
-  <a href="http://developer.intel.com/products/processor/manuals/index.htm">
-    Intel®64 and IA-32 Architectures Software Developer's Manuals</a>
-
   Some information about optimizing for x86 processors, links to
   x86 manuals and documentation:
 


[wwwdocs] testing/testing-blitz.html -- sourceforge.net uses https

2016-05-27 Thread Gerald Pfeifer
Committed.

Gerald

Index: testing/testing-blitz.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/testing/testing-blitz.html,v
retrieving revision 1.6
diff -u -r1.6 testing-blitz.html
--- testing/testing-blitz.html  1 Nov 2012 18:40:09 -   1.6
+++ testing/testing-blitz.html  27 May 2016 20:04:13 -
@@ -8,13 +8,13 @@
 Blitz++ build and test guide
 
 This page is a guide to building the
-<a href="http://sourceforge.net/projects/blitz/">Blitz++</a> scientific
+<a href="https://sourceforge.net/projects/blitz/">Blitz++</a> scientific
 computing class library as part of GCC integration testing.
 
 Resource usage
 
 The Blitz++ 0.7 distribution, available via the <a
-href="http://sourceforge.net/projects/blitz/files/">Blitz++ download
+href="https://sourceforge.net/projects/blitz/files/">Blitz++ download
 page</a> at SourceForge is a 2.0 MB file.  The uncompressed
 distribution comprises 10.5 MB of source files.  Building and running the
 test suite adds an additional 75 or so MB of object files


[PATCH, rs6000] Add builtin-support for new Power9 vslv and vsrv (vector shift left and right variable) instructions

2016-05-27 Thread Kelvin Nilsen

This patch adds built-in function support for the Power9 vslv and vsrv
instructions.

I have bootstrapped and tested this patch against the trunk on
powerpc64le-unkonwn-linux-gnu with no regressions.  Is this ok for the
trunk?

I have not yet tested against the gcc-6 branch as this patch depends on
infrastructure that has not yet been backported to gcc-6.  Once the
necessary infrastructure is available, is this ok for backporting to
gcc6 following bootstrap and regression testing?

Thanks,
Kelvin


gcc/ChangeLog:

2016-05-27  Kelvin Nilsen  

* config/rs6000/altivec.h (vec_slv): New macro.
(vec_srv): New macro.
* config/rs6000/altivec.md (UNSPEC_VSLV): New value.
(UNSPEC_VSRV): New value.
(vslv): New insn.
(vsrv): New insn.
* config/rs6000/rs6000-builtin.def (vslv): New builtin definition.
(vsrv): New builtin definition.
* config/rs6000/rs6000-c.c (P9V_BUILTIN_VSLV): Macro expansion to
define argument types for new builtin.
(P9V_BUILTIN_VSRV): Macro expansion to define argument types for
new builtin.
* doc/extend.texi: Document the new vec_slv and vec_srv built-in
functions.

gcc/testsuite/ChangeLog:

2016-05-27  Kelvin Nilsen  

* gcc.target/powerpc/vslv-0.c: New test.
* gcc.target/powerpc/vslv-1.c: New test.
* gcc.target/powerpc/vsrv-0.c: New test.
* gcc.target/powerpc/vsrv-1.c: New test.
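
A hedged usage sketch of the new overloads (not part of the patch;
requires a Power9 target, e.g. -mcpu=power9).  Each byte of the first
operand is shifted by the count held in the corresponding lane of the
second operand; see the ISA 3.0 description of vslv/vsrv for the exact
bit-sourcing semantics:

#include <altivec.h>

vector unsigned char
shift_bytes (vector unsigned char v, vector unsigned char counts)
{
  vector unsigned char left  = vec_slv (v, counts);  /* vslv */
  vector unsigned char right = vec_srv (v, counts);  /* vsrv */
  return vec_or (left, right);
}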


Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 236796)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -400,6 +400,9 @@
 #ifdef _ARCH_PPC64
 #define vec_vprtybq __builtin_vec_vprtybq
 #endif
+
+#define vec_slv __builtin_vec_vslv
+#define vec_srv __builtin_vec_vsrv
 #endif
 
 /* Predicates.
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 236796)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -114,6 +114,8 @@
UNSPEC_STVLXL
UNSPEC_STVRX
UNSPEC_STVRXL
+   UNSPEC_VSLV
+   UNSPEC_VSRV
UNSPEC_VMULWHUB
UNSPEC_VMULWLUB
UNSPEC_VMULWHSB
@@ -1631,6 +1633,24 @@
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+(define_insn "vslv"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
+  (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VSLV))]
+  "TARGET_P9_VECTOR"
+  "vslv %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vsrv"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
+  (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VSRV))]
+  "TARGET_P9_VECTOR"
+  "vsrv %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "*altivec_vsl"
   [(set (match_operand:VI2 0 "register_operand" "=v")
 (ashift:VI2 (match_operand:VI2 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 236796)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1749,6 +1749,14 @@ BU_P8V_OVERLOAD_3 (VADDEUQM, "vaddeuqm")
 BU_P8V_OVERLOAD_3 (VSUBECUQ,   "vsubecuq")
 BU_P8V_OVERLOAD_3 (VSUBEUQM,   "vsubeuqm")
 
+/* ISA 3.0 vector overloaded 2-argument functions. */
+BU_P9V_AV_2 (VSLV, "vslv", CONST, vslv)
+BU_P9V_AV_2 (VSRV, "vsrv", CONST, vsrv)
+
+/* ISA 3.0 vector overloaded 2-argument functions. */
+BU_P9V_OVERLOAD_2 (VSLV,   "vslv")
+BU_P9V_OVERLOAD_2 (VSRV,   "vsrv")
+
 
 /* 2 argument extended divide functions added in ISA 2.06.  */
 BU_P7_MISC_2 (DIVWE,   "divwe",CONST,  dive_si)
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 236796)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -4488,6 +4488,13 @@ const struct altivec_builtin_types altivec_overloa
   { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
 RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
 
+  { P9V_BUILTIN_VEC_VSLV, P9V_BUILTIN_VSLV,
+RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+RS6000_BTI_unsigned_V16QI, 0 },
+  { P9V_BUILTIN_VEC_VSRV, P9V_BUILTIN_VSRV,
+RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
+RS6000_BTI_unsigned_V16QI, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 236796)
+++ gcc/doc/extend.texi (working copy)
@@ -14686,8 +14686,8 @@ The @code{__bu

Re: [wwwdocs] Document GCC 6 Solaris changes

2016-05-27 Thread Gerald Pfeifer
On Tue, 19 Apr 2016, Rainer Orth wrote:
>>> [gcc-6/changes.html]
> Btw., I noticed that the subsections of `Operating Systems' are in
> random order.  Shouldn't they be sorted alphabetically?

Yes.  Want to give it a try?  It's surely pre-approved.

Gerald


[wwwdocs] Update to current canonical address for CSiBE

2016-05-27 Thread Gerald Pfeifer
Applied.

Gerald

Index: benchmarks/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/benchmarks/index.html,v
retrieving revision 1.34
diff -u -r1.34 index.html
--- benchmarks/index.html   27 Feb 2016 22:43:31 -  1.34
+++ benchmarks/index.html   27 May 2016 19:33:01 -
@@ -40,7 +40,7 @@
 
 
 Statistics about GCC code size for several targets are available from the
-http://www.inf.u-szeged.hu/csibe/";>GCC Code-Size Benchmark
+http://szeged.github.io/csibe/";>GCC Code-Size Benchmark
 Environment (CSiBE), along with the testbed and measurement scripts.
 
 
Index: projects/cfo.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cfo.html,v
retrieving revision 1.9
diff -u -r1.9 cfo.html
--- projects/cfo.html   29 Dec 2012 00:24:56 -  1.9
+++ projects/cfo.html   27 May 2016 19:33:01 -
@@ -159,8 +159,8 @@
 Preliminary results
 
 The following results have been prepared using the 
-<a href="http://www.csibe.org">CSiBE</a> benchmark with respect to the mainline
-at the last merge (2005-07-11).
+<a href="http://szeged.github.io/csibe/">CSiBE</a> benchmark with respect
+to the mainline at the last merge (2005-07-11).
 
 
 


[wwwdocs] www.akkadia.org now uses https -- gcc-4.0/changes.html

2016-05-27 Thread Gerald Pfeifer
Applied.

Gerald

Index: gcc-4.0/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.0/changes.html,v
retrieving revision 1.67
diff -u -r1.67 changes.html
--- gcc-4.0/changes.html27 Jun 2015 19:26:46 -  1.67
+++ gcc-4.0/changes.html27 May 2016 19:27:22 -
@@ -169,8 +169,8 @@
 never used outside a binary as hidden, one can completely avoid PLT
 indirection overheads during their usage by the compiler. You can find
 out more about the advantages of this at <a
-href="http://www.akkadia.org/drepper/dsohowto.pdf">
-http://www.akkadia.org/drepper/dsohowto.pdf</a>
+href="https://www.akkadia.org/drepper/dsohowto.pdf">
+https://www.akkadia.org/drepper/dsohowto.pdf</a>
 
 The -fvisibility-inlines-hidden option has been added
 which marks all inlineable functions as having hidden ELF visibility,


Re: Record likely upper bounds for loops

2016-05-27 Thread Paolo Carlini

Hi,

On 27/05/2016 19:20, Alexander Monakov wrote:

Hi,

On Fri, 27 May 2016, Jan Hubicka wrote:

Thanks, updated and committed.

This checkin seems to regress gcc.c-torture/execute/20050826-2.c at -Os:

gcc/xgcc -Bgcc/ ../gcc/gcc/testsuite/gcc.c-torture/execute/20050826-2.c -Os \
   -o ./20050826-2.exe

./20050826-2.exe
Aborted

(the previous revision is fine)

I suspect (should double check) this one too in libstdc++-v3:

FAIL: 25_algorithms/is_sorted/1.cc execution test

Paolo.




Re: [PATCH] Fixes to must-tail-call tests

2016-05-27 Thread David Malcolm
On Fri, 2016-05-27 at 13:29 +0100, Thomas Preudhomme wrote:
> Hi Rainer,
> 
> On Wednesday 25 May 2016 11:31:12 Rainer Orth wrote:
> > David Malcolm  writes:
> > > The following fixes the known failures of the must-tail-call
> > > tests.
> > > 
> > > Tested with --target=
> > > * aarch64-unknown-linux-gnu
> > > * ia64-unknown-linux-gnu
> > > * m68k-unknown-linux-gnu
> > > * x86_64-pc-linux-gnu
> > 
> > Even with this patch, there are still failures on
> > sparc-sun-solaris2.12:
> > 
> > FAIL: gcc.dg/plugin/must-tail-call-1.c 
> > -fplugin=./must_tail_call_plugin.so
> > (test for excess errors)
> > 
> > Excess errors:
> > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c:12:10:
> > error: cannot tail-call: target is not able to optimize the call into
> > a sibling call
> > 
> > FAIL: gcc.dg/plugin/must-tail-call-2.c 
> > -fplugin=./must_tail_call_plugin.so
> > (test for excess errors)
> > 
> > Excess errors:
> > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c:32:10:
> > error: cannot tail-call: target is not able to optimize the call into
> > a sibling call


My aim with these tests was to try to cover the various ways in which
mandatory tail-call optimization can fail.

However, this is very target-dependent, and, as written, the test
over-specifies the output.

Sorry about this.

I've run the test on all of the configurations in contrib/config-list.mk
(using the patch kit in
  https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02100.html )

Collated output can be seen here:
  https://dmalcolm.fedorapeople.org/gcc/2016-05-27/must-tail-call-logs.txt
showing all the different error messages across every configuration.

It's not clear to me what the best way forward here is.
We could simply check for
   error: cannot tail-call:
and leave the precise messages we're checking for unspecified.

If we care about checking for the precise messages, one or more
duplicate copies of the test could be provided, filtered by target,
checking for the precise output where this is known.

I'm attaching a patch (sans ChangeLog) which strips the precise
messages in the manner described above.  Retesting on all targets with
this patch, the only remaining failures are of the form:
  must-tail-call-1.c:12:10: error: cannot tail-call: machine
description does not have a sibcall_epilogue instruction pattern
on targets lacking the pattern.

Thoughts?

Sorry again about the breakage

Dave


> Now that the logic is in place, you probably want to add
> sparc-sun-solaris in plugin.exp to the list of architectures where
> tail call plugin tests should be skipped, alongside Thumb-1 ARM targets.
> 
> Best regards,
> 
diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
index c5504f8..c6dfecd 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
@@ -14,7 +14,7 @@ returns_struct (int i)
 int __attribute__((noinline,noclone))
 test_1 (int i)
 {
-  return returns_struct (i * 5).i; /* { dg-error "cannot tail-call: callee returns a structure" } */
+  return returns_struct (i * 5).i; /* { dg-error "cannot tail-call: " } */
 }
 
 int __attribute__((noinline,noclone))
@@ -29,14 +29,14 @@ int __attribute__((noinline,noclone))
 test_2_caller (int i)
 {
   struct box b;
-  return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: callee required more stack slots than the caller" } */
+  return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: " } */
 }
 
 extern void setjmp (void);
 void
 test_3 (void)
 {
-  setjmp (); /* { dg-error "cannot tail-call: callee returns twice" } */
+  setjmp (); /* { dg-error "cannot tail-call: " } */
 }
 
 void
@@ -45,7 +45,7 @@ test_4 (void)
   void nested (void)
   {
   }
-  nested (); /* { dg-error "cannot tail-call: nested function" } */
+  nested (); /* { dg-error "cannot tail-call: " } */
 }
 
 typedef void (fn_ptr_t) (void);
@@ -54,5 +54,5 @@ volatile fn_ptr_t fn_ptr;
 void
 test_5 (void)
 {
-  fn_ptr (); /* { dg-error "cannot tail-call: callee does not return" } */
+  fn_ptr (); /* { dg-error "cannot tail-call: " } */
 }


[PATCH] c/69507 - bogus warning: ISO C does not allow ‘__alignof__ (expression)’

2016-05-27 Thread Martin Sebor

The patch below adjusts the C alignof pedantic warning to avoid
diagnosing the GCC extension (__alignof__) and only diagnose
_Alignof in C99 and prior modes.  This is consistent with how
__attribute__ ((aligned)) and _Alignas is handled (among other
extensions vs standard features).

Martin

PR c/69507 - bogus warning: ISO C does not allow ‘__alignof__ (expression)’

gcc/testsuite/ChangeLog:
2016-05-27  Martin Sebor  

PR c/69507
* gcc.dg/alignof.c: New test.

gcc/c/ChangeLog:
2016-05-27  Martin Sebor  

PR c/69507
* c-parser.c (c_parser_alignof_expression): Avoid diagnosing
__alignof__ (expression).

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 232841)
+++ gcc/c/c-parser.c(working copy)
@@ -7019,9 +7019,10 @@ c_parser_alignof_expression (c_parser *p
   mark_exp_read (expr.value);
   c_inhibit_evaluation_warnings--;
   in_alignof--;
-  pedwarn (start_loc,
-  OPT_Wpedantic, "ISO C does not allow %<%E (expression)%>",
-  alignof_spelling);
+  if (is_c11_alignof)
+   pedwarn (start_loc,
+OPT_Wpedantic, "ISO C does not allow %<%E (expression)%>",
+alignof_spelling);
   ret.value = c_alignof_expr (start_loc, expr.value);
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
Index: gcc/testsuite/gcc.dg/alignof.c
===
--- gcc/testsuite/gcc.dg/alignof.c  (revision 0)
+++ gcc/testsuite/gcc.dg/alignof.c  (working copy)
@@ -0,0 +1,11 @@
+/* PR c/69507 - bogus warning: ISO C does not allow '__alignof__ (expression)'
+ */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -Wno-error -Wpedantic" } */
+
+extern int e;
+
+int a[] = {
+__alignof__ (e),
+_Alignof (e)   /* { dg-warning "ISO C does not allow ._Alignof \\(expression\\)." } */
+};


Re: [PATCH1][PR71252] Fix missing swap to stmt_to_insert

2016-05-27 Thread Richard Biener
On May 27, 2016 4:12:06 PM GMT+02:00, Kugan Vivekanandarajah 
 wrote:
>Hi,
>
>This fixes the missing swap for stmt_to_insert. I tested this with the
>attached test case, which is not valid any more due to some other
>commits. This I believe is an obvious fix, and maybe the test case is
>needed.
>
>I am running bootstrap and regression testing on x86-64-linux gnu. Is
>this OK for trunk if the testing is fine ?

OK.

Richard.

>Thanks,
>
>Kugan
>
>
>gcc/ChangeLog:
>
>2016-05-28 Kugan Vivekanandarajah 
>
>* tree-ssa-reassoc.c (swap_ops_for_binary_stmt): Fix swap such that
>all fields including stmt_to_insert are swapped.
>
>gcc/testsuite/ChangeLog:
>
>2016-05-28 Kugan Vivekanandarajah 
>
>* gcc.dg/tree-ssa/pr71252-2.c: New test.




Re: Record likely upper bounds for loops

2016-05-27 Thread Alexander Monakov
Hi,

On Fri, 27 May 2016, Jan Hubicka wrote:
> Thanks, updated and committed.

This checkin seems to regress gcc.c-torture/execute/20050826-2.c at -Os:

gcc/xgcc -Bgcc/ ../gcc/gcc/testsuite/gcc.c-torture/execute/20050826-2.c -Os \
  -o ./20050826-2.exe  

./20050826-2.exe
Aborted

(the previous revision is fine)

Alexander


Re: [C++ Patch/RFC] PR 60385 and other issues about wrongly named namespaces (eg, c++/68723)

2016-05-27 Thread Paolo Carlini

Hi,

On 27/05/2016 16:56, Jason Merrill wrote:

Let's go with the second patch.


Good. Then I'm going to commit the below after an additional round of 
testing with an updated tree.


Thanks!
Paolo.

///
/cp
2016-05-27  Paolo Carlini  

PR c++/60385
* name-lookup.c (push_namespace): Return bool, false when pushdecl
fails.
* name-lookup.h (push_namespace): Adjust declaration.
* parser.c (cp_parser_namespace_definition): Check push_namespace
return value.

/testsuite
2016-05-27  Paolo Carlini  

PR c++/60385
* g++.dg/parse/namespace13.C: New.
Index: cp/name-lookup.c
===
--- cp/name-lookup.c(revision 236830)
+++ cp/name-lookup.c(working copy)
@@ -3701,9 +3701,10 @@ handle_namespace_attrs (tree ns, tree attributes)
 }
   
 /* Push into the scope of the NAME namespace.  If NAME is NULL_TREE, then we
-   select a name that is unique to this compilation unit.  */
+   select a name that is unique to this compilation unit.  Returns FALSE if
+   pushdecl fails, TRUE otherwise.  */
 
-void
+bool
 push_namespace (tree name)
 {
   tree d = NULL_TREE;
@@ -3777,7 +3778,11 @@ push_namespace (tree name)
TREE_PUBLIC (d) = 0;
   else
TREE_PUBLIC (d) = 1;
-  pushdecl (d);
+  if (pushdecl (d) == error_mark_node)
+   {
+ timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+ return false;
+   }
   if (anon)
{
  /* Clear DECL_NAME for the benefit of debugging back ends.  */
@@ -3795,6 +3800,7 @@ push_namespace (tree name)
   current_namespace = d;
 
   timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+  return true;
 }
 
 /* Pop from the scope of the current namespace.  */
Index: cp/name-lookup.h
===
--- cp/name-lookup.h(revision 236830)
+++ cp/name-lookup.h(working copy)
@@ -312,7 +312,7 @@ extern tree push_inner_scope (tree);
 extern void pop_inner_scope (tree, tree);
 extern void push_binding_level (cp_binding_level *);
 
-extern void push_namespace (tree);
+extern bool push_namespace (tree);
 extern void pop_namespace (void);
 extern void push_nested_namespace (tree);
 extern void pop_nested_namespace (tree);
Index: cp/parser.c
===
--- cp/parser.c (revision 236830)
+++ cp/parser.c (working copy)
@@ -17549,7 +17549,7 @@ cp_parser_namespace_definition (cp_parser* parser)
 }
 
   /* Start the namespace.  */
-  push_namespace (identifier);
+  bool ok = push_namespace (identifier);
 
   /* Parse any nested namespace definition. */
   if (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
@@ -17582,7 +17582,7 @@ cp_parser_namespace_definition (cp_parser* parser)
 
   /* "inline namespace" is equivalent to a stub namespace definition
  followed by a strong using directive.  */
-  if (is_inline)
+  if (is_inline && ok)
 {
   tree name_space = current_namespace;
   /* Set up namespace association.  */
@@ -17610,7 +17610,8 @@ cp_parser_namespace_definition (cp_parser* parser)
 pop_namespace ();
 
   /* Finish the namespace.  */
-  pop_namespace ();
+  if (ok)
+pop_namespace ();
   /* Look for the final `}'.  */
   cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
 }
Index: testsuite/g++.dg/parse/namespace13.C
===
--- testsuite/g++.dg/parse/namespace13.C(revision 0)
+++ testsuite/g++.dg/parse/namespace13.C(working copy)
@@ -0,0 +1,11 @@
+// PR c++/60385
+
+float foo4();   // { dg-message "previous declaration" }
+
+namespace foo4  // { dg-error "redeclared" }
+{
+  struct bar6
+{
+  friend wchar_t bar1();
+};
+}


[PATCH][AArch64] Use aarch64_fusion_enabled_p to check for insn fusion capabilities

2016-05-27 Thread Kyrill Tkachov

Hi all,

This patch is a small cleanup that uses the newly introduced
aarch64_fusion_enabled_p predicate to check which fusion opportunities
are enabled for the current target.

Tested on aarch64-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-05-27  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Use
aarch64_fusion_enabled_p to check for fusion capabilities.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 49cd50b61cf4ba8074a44ae4029316a8af2f793b..8f850c653167d108f899ae9ec5d65938a288aa17 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13070,8 +13070,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
   if (!aarch64_macro_fusion_p ())
 return false;
 
-  if (simple_sets_p
-  && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_MOV_MOVK))
+  if (simple_sets_p && aarch64_fusion_enabled_p (AARCH64_FUSE_MOV_MOVK))
 {
   /* We are trying to match:
  prev (mov)  == (set (reg r0) (const_int imm16))
@@ -13095,8 +13094,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
-  if (simple_sets_p
-  && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_ADRP_ADD))
+  if (simple_sets_p && aarch64_fusion_enabled_p (AARCH64_FUSE_ADRP_ADD))
 {
 
   /*  We're trying to match:
@@ -13121,8 +13119,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
-  if (simple_sets_p
-  && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_MOVK_MOVK))
+  if (simple_sets_p && aarch64_fusion_enabled_p (AARCH64_FUSE_MOVK_MOVK))
 {
 
   /* We're trying to match:
@@ -13150,8 +13147,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 
 }
-  if (simple_sets_p
-  && (aarch64_tune_params.fusible_ops & AARCH64_FUSE_ADRP_LDR))
+  if (simple_sets_p && aarch64_fusion_enabled_p (AARCH64_FUSE_ADRP_LDR))
 {
   /* We're trying to match:
   prev (adrp) == (set (reg r0)
@@ -13182,11 +13178,11 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
-  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC)
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)
&& aarch_crypto_can_dual_issue (prev, curr))
 return true;
 
-  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_BRANCH)
   && any_condjump_p (curr))
 {
   enum attr_type prev_type = get_attr_type (prev);
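
For readers following along, the predicate being adopted is essentially
a wrapper over the tune-flags bitmask test that the hunks above delete;
a simplified standalone sketch (types reduced, not the exact GCC
definition):

struct tune_sketch { unsigned int fusible_ops; };
static struct tune_sketch tune = { 0x3 /* e.g. MOV_MOVK | ADRP_ADD */ };

/* Mirrors the shape of aarch64_fusion_enabled_p: test one fusion-pair
   bit against the current tuning's enabled set.  */
static inline int
fusion_enabled_p (unsigned int op)
{
  return (tune.fusible_ops & op) != 0;
}

int main (void)
{
  return fusion_enabled_p (0x1) ? 0 : 1;
}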


[PATCH][AArch64] Remove aarch64_simd_attr_length_move

2016-05-27 Thread Kyrill Tkachov

Hi all,

I notice that we can do without aarch64_simd_attr_length_move. The move
alternatives for the OI, CI and XI modes that involve memory operands all
use a single load/store, so they are always length 4, whereas the
register-to-register moves have a statically-known length of
(GET_MODE_BITSIZE (mode) / 128) * 4, i.e. 4 bytes for every 128-bit SIMD
move instruction. This is already encoded in the insn_count mode
attribute, so use that when needed.

That way we avoid a call to recog and a switch statement just to get the
length of an insn.
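
Concretely, a standalone check of that arithmetic:

/* One 4-byte instruction per 128-bit vector in the struct mode.  */
static int
simd_reg_move_length (int mode_bits)
{
  return mode_bits / 128 * 4;
}

int main (void)
{
  /* OI = 256 bits -> 8, CI = 384 -> 12, XI = 512 -> 16, matching the
     insn_count mode attribute values used below.  */
  return (simd_reg_move_length (256) == 8
          && simd_reg_move_length (384) == 12
          && simd_reg_move_length (512) == 16) ? 0 : 1;
}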

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2016-05-27  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_simd_attr_length_move): Delete.
* config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_move):
Delete prototype.
* config/aarch64/iterators.md (insn_count): Add descriptive comment.
* config/aarch64/aarch64-simd.md (*aarch64_mov, VSTRUCT modes):
Remove use of aarch64_simd_attr_length_move, set length attribute
directly.
(*aarch64_be_movoi): Likewise.
(*aarch64_be_movci): Likewise.
(*aarch64_be_movxi): Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 6e97c01cf768ec5ca2f18e795a8085b8a247f5b6..b39eac94ae6fdef4c39bd1bab5b832f3a1bef618 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -337,7 +337,6 @@ machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
-int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 8be69357eb086e288ae838dc536be8a2ebe0463b..2ca48aefd3dac120c5feeb0ebda6f258d120384c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4639,7 +4639,7 @@ (define_insn "*aarch64_mov<mode>"
ld1\\t{%S0.16b - %0.16b}, %1"
   [(set_attr "type" "multiple,neon_store_reg_q,\
 		 neon_load_reg_q")
-   (set (attr "length") (symbol_ref "aarch64_simd_attr_length_move (insn)"))]
+   (set_attr "length" ",4,4")]
 )
 
 (define_insn "aarch64_be_ld1"
@@ -4672,7 +4672,7 @@ (define_insn "*aarch64_be_movoi"
stp\\t%q1, %R1, %0
ldp\\t%q0, %R0, %1"
   [(set_attr "type" "multiple,neon_stp_q,neon_ldp_q")
-   (set (attr "length") (symbol_ref "aarch64_simd_attr_length_move (insn)"))]
+   (set_attr "length" "8,4,4")]
 )
 
 (define_insn "*aarch64_be_movci"
@@ -4683,7 +4683,7 @@ (define_insn "*aarch64_be_movci"
|| register_operand (operands[1], CImode))"
   "#"
   [(set_attr "type" "multiple")
-   (set (attr "length") (symbol_ref "aarch64_simd_attr_length_move (insn)"))]
+   (set_attr "length" "12,4,4")]
 )
 
 (define_insn "*aarch64_be_movxi"
@@ -4694,7 +4694,7 @@ (define_insn "*aarch64_be_movxi"
|| register_operand (operands[1], XImode))"
   "#"
   [(set_attr "type" "multiple")
-   (set (attr "length") (symbol_ref "aarch64_simd_attr_length_move (insn)"))]
+   (set_attr "length" "16,4,4")]
 )
 
 (define_split
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2d62c9ba9548aefdb986e97a1419b1565c150b63..49cd50b61cf4ba8074a44ae4029316a8af2f793b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10765,33 +10765,6 @@ aarch64_simd_emit_reg_reg_move (rtx *operands, enum machine_mode mode,
 		  gen_rtx_REG (mode, rsrc + count - i - 1));
 }
 
-/* Compute and return the length of aarch64_simd_mov<mode>, where <mode> is
-   one of VSTRUCT modes: OI, CI or XI.  */
-int
-aarch64_simd_attr_length_move (rtx_insn *insn)
-{
-  machine_mode mode;
-
-  extract_insn_cached (insn);
-
-  if (REG_P (recog_data.operand[0]) && REG_P (recog_data.operand[1]))
-{
-  mode = GET_MODE (recog_data.operand[0]);
-  switch (mode)
-	{
-	case OImode:
-	  return 8;
-	case CImode:
-	  return 12;
-	case XImode:
-	  return 16;
-	default:
-	  gcc_unreachable ();
-	}
-}
-  return 4;
-}
-
 /* Compute and return the length of aarch64_simd_reglist<mode>, where <mode> is
one of VSTRUCT modes: OI, CI, or XI.  */
 int
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d9bd39112c3f4af19781290778babdf919f1c514..43b22d81cda30398564af2f2fcaefceb215ec04c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -715,6 +715,7 @@ (define_mode_attr vp [(V8QI "v") (V16QI "v")
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
 
+;; Sum of lengths of instructions needed to move vector registers of a mode.
 (define_mode_attr insn_count [(OI "8") (CI "12") (XI "16")])
 
 ;; -fpic small model GOT reloc modifers: gotpage_lo15/lo14 for ILP64/32.


Re: [AArch64, 1/6] Reimplement scalar fixed-point intrinsics

2016-05-27 Thread Jiong Wang



On 27/05/16 14:03, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:23:36AM +0100, Jiong Wang wrote:

 * config/aarch64/aarch64-simd-builtins.def: Rename to
 aarch64-builtins.def.

Why? We already have some number of intrinsics in here that are not
strictly SIMD, but I don't see the value in the rename?


Mostly because this builtin infrastructure is handy: I want to implement
some vfp builtins in this .def file instead of implementing those raw
structures inside aarch64-builtins.c.

And there may be more such builtins in the future, so I renamed this
file.


Is this OK?


+(define_int_iterator FCVT_FIXED2F_SCALAR [UNSPEC_SCVTF_SCALAR UNSPEC_UCVTF_SCALAR])

Again, do we need the "SCALAR" versions at all?


That's because for scalar fixed-point conversion, we have two types of
instructions to support this.

  * scalar instruction from vfp
  * scalar variant instruction from simd

One is guarded by TARGET_FLOAT, the other is guarded by TARGET_SIMD, and
their instruction format is different, so I want to keep them in
aarch64.md and aarch64-simd.md seperately.

The other reason is these two use different patterns:

  * vfp scalar support conversion between different size, for example,
SF->DI, DF->SI, so it's using two mode iterators, GPI and GPF, and
is utilizing the product of the two to cover all supported
conversions, sfsi, sfdi, dfsi, dfdi, sisf, sidf, disf, didf.

  * simd scalar only support conversion between same size that single
mode iterator is used to cover sfsi, sisf, dfdi, didf.

For the intrinsics implementation, I used builtins backed by the vfp
scalar instructions instead of the simd scalar ones, which require the
input to sit inside a vector register.


I remember the simd scalar pattern was here because it's needed anyway
by patch [2/6], which extends its modes naturally to vector modes. I was
thinking it's better to keep the simd scalar variant with this scalar
intrinsics enabling patch.

Is this OK?

Thanks.



Re: [AArch64, 3/6] Reimplement frsqrte intrinsics

2016-05-27 Thread Jiong Wang



On 27/05/16 14:24, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:23:48AM +0100, Jiong Wang wrote:

These intrinsics were implemented before the instruction pattern
"aarch64_rsqrte<mode>" was added, so these intrinsics were implemented
through inline assembly.

This migrates the implementation to builtins.

gcc/
2016-05-23  Jiong Wang 

 * config/aarch64/aarch64-builtins.def (rsqrte): New builtins
for modes
 VALLF.
 * config/aarch64/aarch64-simd.md (aarch64_rsqrte_2):
Rename to
"aarch64_rsqrte".
 * config/aarch64/aarch64.c (get_rsqrte_type): Update gen* name.
 * config/aarch64/arm_neon.h (vrsqrts_f32): Remove inline
assembly.  Use
builtin.
 (vrsqrted_f64): Likewise.
 (vrsqrte_f32): Likewise.
 (vrsqrteq_f32): Likewise.
 (vrsqrteq_f64): Likewise.

This ChangeLog is not in the correct form.

It looks like you are missing vrsqrte_f64, could you please add that?


vrsqrte_f64 wasn't cleaned up in this patch because its input type is
float64x1_t, which
caused trouble when fitting it into the aarch64 builtin infrastructure
cleanly.


I might have missed something; I will double-check this.



[PATCH] c++/71306 - bogus -Wplacement-new with an array element

2016-05-27 Thread Martin Sebor

It was pointed out on gcc-help last night that the -Wplacement-new
warning issues a false positive when a placement new expression is
invoked with an operand that is an element of an array of pointers
(to buffers of unknown size).  The attached patch adjusts
the warning so as to avoid this false positive.
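
For instance, code along these lines used to draw the bogus warning (a
minimal sketch distilled from the new test below):

  void* operator new (__SIZE_TYPE__, void *p) { return p; }

  struct S64 { char c [64]; };
  S64* ps2 [2];          // an array of pointers to buffers of unknown size

  void f ()
  {
    new (ps2 [0]) S64;   // false positive before this patch
  }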

The patch also removes the pointless loop that Jakub questioned
below:

  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00050.html

Martin
PR c++/71306 - bogus -Wplacement-new with an array element

gcc/cp/ChangeLog:
2016-05-27  Martin Sebor  

	PR c++/71306
	* init.c (warn_placement_new_too_small): Handle placement new arguments
	that are elements of arrays more carefully.  Remove a pointless loop.

gcc/testsuite/ChangeLog:
2016-05-27  Martin Sebor  

	PR c++/71306
	* g++.dg/warn/Wplacement-new-size-3.C: New test.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 681ca12..9cbd43f 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2375,7 +2375,8 @@ warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
 
   STRIP_NOPS (oper);
 
-  if (TREE_CODE (oper) == ARRAY_REF)
+  if (TREE_CODE (oper) == ARRAY_REF
+  && (addr_expr || TREE_CODE (TREE_TYPE (oper)) == ARRAY_TYPE))
 {
   /* Similar to the offset computed above, see if the array index
 	 is a compile-time constant.  If so, and unless the offset was
@@ -2404,8 +2405,8 @@ warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
   bool compref = TREE_CODE (oper) == COMPONENT_REF;
 
   /* Descend into a struct or union to find the member whose address
- is being used as the agument.  */
-  while (TREE_CODE (oper) == COMPONENT_REF)
+ is being used as the argument.  */
+  if (TREE_CODE (oper) == COMPONENT_REF)
 {
   tree op0 = oper;
   while (TREE_CODE (op0 = TREE_OPERAND (op0, 0)) == COMPONENT_REF);
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-3.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-3.C
new file mode 100644
index 000..f5dd642
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-3.C
@@ -0,0 +1,40 @@
+// PR c++/71306 - bogus -Wplacement-new with an array element
+// { dg-do compile }
+// { dg-options "-Wplacement-new" }
+
+void* operator new (__SIZE_TYPE__, void *p) { return p; }
+
+struct S64 { char c [64]; };
+
+S64 s2 [2];
+S64* ps2 [2];
+S64* ps2_2 [2][2];
+
+void* pv2 [2];
+
+void f ()
+{
+  char a [2][sizeof (S64)];
+
+  new (a) S64;
+  new (a [0]) S64;
+  new (a [1]) S64;
+
+  // Verify there is no warning with buffers of sufficient size.
+  new (&s2 [0]) S64;
+  new (&s2 [1]) S64;
+
+  // ..and no warning with pointers to buffers of unknown size.
+  new (ps2 [0]) S64;
+  new (ps2 [1]) S64;
+
+  // But a warning when using the ps2_2 array itself as opposed
+  // to the pointers its elements might point to.
+  new (ps2_2 [0]) S64;// { dg-warning "placement new" }
+  new (ps2_2 [1]) S64;// { dg-warning "placement new" }
+
+  // ..and no warning again with pointers to buffers of unknown
+  // size.
+  new (pv2 [0]) S64;
+  new (pv2 [1]) S64;
+}


Moving backwards/FSM threader into its own pass

2016-05-27 Thread Jeff Law


It's been my plan since finally wrapping my head around Bodik's thesis 
to revamp how we handle jump threading to use some of the principles 
from his thesis.  In particular, the back substitution and 
simplification model feels like the right long term direction.


Sebastian's FSM threader was the first step on that path (gcc-5). 
Exploiting that threader for more than just FSM loops was the next big 
step (gcc-6).


This patch takes the next step -- disentangling that new jump threading 
code from the old threading code and VRP/DOM.


The key thing to realize here is that the backwards (FSM) jump threader 
does not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to 
do its job.  I.e., it can and should run completely independently of 
DOM/VRP (though one day it may exploit range information that a prior 
VRP pass has computed).
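
For reference, this is the shape of opportunity the backward threader
works on (an illustrative sketch, not a testsuite case):

  extern void f (void);
  extern void g (void);

  void
  example (int cond)
  {
    int state;
    if (cond)
      state = 1;
    else
      state = 2;
    /* ... straight-line code ... */
    if (state == 1)   /* state is known on each incoming path, so each
                         path can be threaded directly to f () or g ().  */
      f ();
    else
      g ();
  }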


By moving the backwards threader into its own pass, we can run it prior 
to DOM/VRP, which allows DOM/VRP to work on a simpler CFG with larger 
extended basic blocks.


The removal of unexecutable paths before VRP also has the nice effect of 
eliminating false positive warnings for some work Aldy is doing 
around out-of-bound array index warnings.


We can remove all the calls to the backwards threader from the old style 
threader.  The way the FSM bits were wired into the old threader caused 
redundant path evaluations.  That can be easily fixed with the FSM bits 
in their own pass.  The net is a 25% reduction in paths examined by the 
FSM threader.


Finally, we ultimately end up threading more jumps.  I don't have the #s 
handy anymore, but when I ran this through my tester there was a clear 
decrease in the number of runtime jumps.


So what are the downsides?

With the threader in its own pass, we end up getting more calls into the 
CFG & SSA verification routines in a checking-enabled compiler.  So the 
compile-time improvement is lost for a checking-enabled compiler.


The backward threader does not combine identical jump threading paths 
with different starting edges into a single jump threading path with 
multiple entry points.  This is primarily a codesize issue, but can have 
a secondary effect on performance.  I know how to fix this and it's on 
the list for gcc-7 along with further cleanups.



Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk momentarily.


Jeff
commit 35bd646a4834a68a49af9ccb5873362a0fc742ae
Author: Jeff Law 
Date:   Fri May 27 10:23:54 2016 -0600

* tree-ssa-threadedge.c: Remove include of tree-ssa-threadbackward.h.
(thread_across_edge): Remove calls to find_jump_threads_backwards.
* passes.def: Add jump threading passes before DOM/VRP.
* tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
argument to a basic block from an edge.  Remove tests which are
handled elsewhere.
(pass_data_thread_jumps, class pass_thread_jumps): New.
(pass_thread_jumps::gate, pass_thread_jumps::execute): New.
(make_pass_thread_jumps): Likewise.
* tree-pass.h (make_pass_thread_jumps): Declare.

* gcc.dg/tree-ssa/pr21417.c: Update expected output.
* gcc.dg/tree-ssa/pr66752-3.c: Likewise.
* gcc.dg/tree-ssa/pr68198.c: Likewise.
* gcc.dg/tree-ssa/pr69196-1.c: Likewise.
* gcc.dg/tree-ssa/pr69270-3.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2g.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-13.c: Likewise.
* gcc.dg/tree-ssa/vrp56.c: Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f04d26d..40bac96 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2016-05-26  Jeff Law  
+
+   * tree-ssa-threadedge.c: Remove include of tree-ssa-threadbackward.h.
+   (thread_across_edge): Remove calls to find_jump_threads_backwards.
+   * passes.def: Add jump threading passes before DOM/VRP.
+   * tree-ssa-threadbackward.c (find_jump_threads_backwards): Change
+   argument to a basic block from an edge.  Remove tests which are
+   handled elsewhere.
+   (pass_data_thread_jumps, class pass_thread_jumps): New.
+   (pass_thread_jumps::gate, pass_thread_jumps::execute): New.
+   (make_pass_thread_jumps): Likewise.
+   * tree-pass.h (make_pass_thread_jumps): Declare.
+
 2016-05-27  Eric Botcazou  
 
* config/visium/visium-protos.h (split_double_move): Rename into...
diff --git a/gcc/passes.def b/gcc/passes.def
index 993ed28..3647e90 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -199,6 +199,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_return_slot);
   NEXT_PASS (pass_fre);
   NEXT_PASS (pass_merge_phi);
+   

Re: [C++ PATCH] PR c++/69855

2016-05-27 Thread Jason Merrill
On Fri, May 27, 2016 at 11:20 AM, Ville Voutilainen
 wrote:
> On 27 May 2016 at 17:46, Jason Merrill  wrote:
>> OK, thanks.

> Should this fix be backported to the gcc6-branch? I have no plans to
> backport it any further than that.

No, the bug isn't a regression and only affects invalid code, so it
isn't a good candidate for backporting.

Jason


Re: Enable loop peeling at -O3

2016-05-27 Thread Sandra Loosemore

On 05/27/2016 07:19 AM, Jan Hubicka wrote:


[snip]

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236815)
+++ doc/invoke.texi (working copy)
@@ -8661,10 +8661,17 @@ the loop is entered.  This usually makes
  @item -fpeel-loops
  @opindex fpeel-loops
  Peels loops for which there is enough information that they do not
-roll much (from profile feedback).  It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis).  It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).

-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and @option{-fprofile-use}.


Do you really mean "or" instead of "and" here?  It looks to me like the 
code part of your patch enables -fpeel-loops unconditionally at -O3 and 
does not check if -fprofile-use is also set.



+
+@item -fpeel-all-loops
+@opindex fpeel-all-loops
+Peel all loops, even if their number of iterations is uncertain when
+the loop is entered.  For loops with large number of iterations this leads
+to wasted code size.

  @item -fmove-loop-invariants
  @opindex fmove-loop-invariants


I think you also need to add the new option -fpeel-all-loops to the 
"Option Summary" section, and -fpeel-loops to the documentation of -O3.


-Sandra



[PATCH2][PR71252] Fix insertion point of stmt_to_insert

2016-05-27 Thread Kugan Vivekanandarajah
Hi Richard,

This fixes the insertion point of stmt_to_insert based on your comments. In
insert_stmt_before_use, I now use find_insert_point such that we
insert the stmt_to_insert after its operands are defined. This means
that we now insert before the use in some cases and after the defining
statement in others.

I also factored out uses of insert_stmt_before_use.

I tested this with:
./build/gcc/f951 cp2k_single_file.f90 -O3 -ffast-math -march=westmere

I am running bootstrap and regression testing on x86-64-linux gnu. Is
this OK for trunk if the testing is fine? I will also test with other
test cases from relevant PRs.


Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-05-28  Kugan Vivekanandarajah  

* gcc.dg/tree-ssa/pr71269.c: New test.

gcc/ChangeLog:

2016-05-28  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (insert_stmt_before_use): Use find_insert_point so
that the inserted stmt will not dominate stmts that define its operands.
(rewrite_expr_tree): Add stmt_to_insert before adding the use stmt.
(rewrite_expr_tree_parallel): Likewise.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
index e69de29..4dceaaa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
@@ -0,0 +1,10 @@
+/* PR middle-end/71269 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int a, b, c;
+void  fn2 (int);
+void fn1 ()
+{
+  fn2 (sizeof 0 + c + a + b + b);
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index c9ed679..8a2154f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1777,16 +1777,6 @@ eliminate_redundant_comparison (enum tree_code opcode,
   return false;
 }
 
-/* If the stmt that defines operand has to be inserted, insert it
-   before the use.  */
-static void
-insert_stmt_before_use (gimple *stmt, gimple *stmt_to_insert)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  gimple_set_uid (stmt_to_insert, gimple_uid (stmt));
-  gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
-}
-
 
 /* Transform repeated addition of same values into multiply with
constant.  */
@@ -3799,6 +3789,29 @@ find_insert_point (gimple *stmt, tree rhs1, tree rhs2)
   return stmt;
 }
 
+/* If the stmt that defines operand has to be inserted, insert it
+   before the use.  */
+static void
+insert_stmt_before_use (gimple *stmt, gimple *stmt_to_insert)
+{
+  gcc_assert (is_gimple_assign (stmt_to_insert));
+  tree rhs1 = gimple_assign_rhs1 (stmt_to_insert);
+  tree rhs2 = gimple_assign_rhs2 (stmt_to_insert);
+  gimple *insert_point = find_insert_point (stmt, rhs1, rhs2);
+  gimple_stmt_iterator gsi = gsi_for_stmt (insert_point);
+  gimple_set_uid (stmt_to_insert, gimple_uid (insert_point));
+
+  /* If the insert point is not stmt, then insert_point would be
+ the point where operand rhs1 or rhs2 is defined. In this case,
+ stmt_to_insert has to be inserted afterwards. This would
+ only happen when the stmt insertion point is flexible. */
+  if (stmt == insert_point)
+gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
+  else
+gsi_insert_after (&gsi, stmt_to_insert, GSI_NEW_STMT);
+}
+
+
 /* Recursively rewrite our linearized statements so that the operators
match those in OPS[OPINDEX], putting the computation in rank
order.  Return new lhs.  */
@@ -3835,6 +3848,12 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
 
+ /* If the stmt that defines operand has to be inserted, insert it
+before the use.  */
+ if (oe1->stmt_to_insert)
+   insert_stmt_before_use (stmt, oe1->stmt_to_insert);
+ if (oe2->stmt_to_insert)
+   insert_stmt_before_use (stmt, oe2->stmt_to_insert);
  /* Even when changed is false, reassociation could have e.g. removed
 some redundant operations, so unless we are just swapping the
 arguments or unless there is no change at all (then we just
@@ -3843,12 +3862,6 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
{
  gimple *insert_point
= find_insert_point (stmt, oe1->op, oe2->op);
- /* If the stmt that defines operand has to be inserted, insert it
-before the use.  */
- if (oe1->stmt_to_insert)
-   insert_stmt_before_use (stmt, oe1->stmt_to_insert);
- if (oe2->stmt_to_insert)
-   insert_stmt_before_use (stmt, oe2->stmt_to_insert);
  lhs = make_ssa_name (TREE_TYPE (lhs));
  stmt
= gimple_build_assign (lhs, gimple_assign_rhs_code (stmt),
@@ -3864,12 +3877,6 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
{
  gcc_checking_assert (find_insert_point (stmt, oe1->op, oe2->op)
   == stmt);
- /* If the stmt that defines operand has to be inserted, insert it
-befor

[gomp4.5] Partial support for Fortran OpenMP doacross loops

2016-05-27 Thread Jakub Jelinek
Hi!

I've committed the following patch to gomp-4_5-branch, which contains
an initial version of doacross Fortran support.  No testcase yet,
as only simple loops (ones with constant 1 or -1 step) work right now;
for non-simple ones (variable step or non-1/-1 step) I'll need to add some
middle-end support, because for those we emit to the middle-end
a loop starting at 0 and with step 1 and thus need to adjust the
depend(sink:) expansion.
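
For reference, here is the C analogue of the construct (Fortran DO loops
map onto the same middle-end expansion; the array and bounds below are
illustrative only):

  void
  doacross (int n, int m, float a[n][m])
  {
    #pragma omp parallel for ordered(2)
    for (int i = 1; i < n; i++)
      for (int j = 1; j < m; j++)
        {
          /* Wait for the iterations this one depends on.  */
          #pragma omp ordered depend(sink: i-1,j) depend(sink: i,j-1)
          a[i][j] = a[i-1][j] + a[i][j-1];
          /* Mark this iteration's result as available.  */
          #pragma omp ordered depend(source)
        }
  }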

2016-05-27  Jakub Jelinek  

* gfortran.h (enum gfc_statement): Add ST_OMP_ORDERED_DEPEND.
(enum gfc_omp_depend_op): Add OMP_DEPEND_SINK_FIRST and
OMP_DEPEND_SINK.
(struct gfc_omp_clauses): Add depend_source field.
* parse.c (decode_omp_directive): If ordered directive has
depend clause as the first of the clauses, use
gfc_match_omp_ordered_depend and ST_OMP_ORDERED_DEPEND instead of
gfc_match_omp_ordered and ST_OMP_ORDERED.
(case_executable): Add ST_OMP_ORDERED_DEPEND case.
(gfc_ascii_statement): Handle ST_OMP_ORDERED_DEPEND.
* st.c (gfc_free_statement): Free omp clauses even for
EXEC_OMP_ORDERED.
* dump-parse-tree.c (show_omp_namelist): Handle OMP_DEPEND_SINK_FIRST
depend_op.
(show_omp_clauses): Handle depend_source.
(show_omp_node): Print clauses for EXEC_OMP_ORDERED.  Allow NULL
c->block for EXEC_OMP_ORDERED.
* trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_DEPEND_SINK_FIRST
depend_op.  Handle orderedc and depend_source.
(gfc_trans_omp_do): Set collapse to orderedc if non-zero.  Fill in
OMP_FOR_ORIG_DECLS for doacross loops.
(gfc_trans_omp_ordered): Translate omp clauses, allow NULL
code->block.
(gfc_split_omp_clauses): Copy orderedc together with ordered.
* frontend-passes.c (gfc_code_walker): Handle EXEC_OMP_ORDERED.
* openmp.c (gfc_match_omp_depend_sink): New function.
(gfc_match_omp_clauses): Parse depend(source) and depend(sink: ...).
(OMP_ORDERED_CLAUSES): Define.
(gfc_match_omp_ordered): Parse clauses.
(gfc_match_omp_ordered_depend): New function.
(resolve_omp_clauses): Require orderedc >= collapse if specified.
Handle depend(sink:) and depend(source) restrictions.  Disallow linear
clause when orderedc is non-zero.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to orderedc
if non-zero.
(resolve_omp_do): Set collapse to orderedc if non-zero.
* match.h (gfc_match_omp_ordered_depend): New prototype.
* match.c (match_exit_cycle): Rename collapse variable to count,
set it to orderedc if non-zero, instead of collapse.

--- gcc/fortran/gfortran.h.jj   2016-05-23 17:20:09.0 +0200
+++ gcc/fortran/gfortran.h  2016-05-25 18:23:54.740764529 +0200
@@ -246,7 +246,7 @@ enum gfc_statement
   ST_OMP_TARGET_ENTER_DATA, ST_OMP_TARGET_EXIT_DATA,
   ST_OMP_TARGET_SIMD, ST_OMP_END_TARGET_SIMD,
   ST_OMP_TASKLOOP, ST_OMP_END_TASKLOOP,
-  ST_OMP_TASKLOOP_SIMD, ST_OMP_END_TASKLOOP_SIMD,
+  ST_OMP_TASKLOOP_SIMD, ST_OMP_END_TASKLOOP_SIMD, ST_OMP_ORDERED_DEPEND,
   ST_PROCEDURE, ST_GENERIC, ST_CRITICAL, ST_END_CRITICAL,
   ST_GET_FCN_CHARACTERISTICS, ST_LOCK, ST_UNLOCK, ST_EVENT_POST,
   ST_EVENT_WAIT,ST_NONE
@@ -1110,7 +1110,9 @@ enum gfc_omp_depend_op
 {
   OMP_DEPEND_IN,
   OMP_DEPEND_OUT,
-  OMP_DEPEND_INOUT
+  OMP_DEPEND_INOUT,
+  OMP_DEPEND_SINK_FIRST,
+  OMP_DEPEND_SINK
 };
 
 enum gfc_omp_map_op
@@ -1255,7 +1257,7 @@ typedef struct gfc_omp_clauses
   bool nowait, ordered, untied, mergeable;
   bool inbranch, notinbranch, defaultmap, nogroup;
   bool sched_simd, sched_monotonic, sched_nonmonotonic;
-  bool simd, threads;
+  bool simd, threads, depend_source;
   enum gfc_omp_cancel_kind cancel;
   enum gfc_omp_proc_bind_kind proc_bind;
   struct gfc_expr *safelen_expr;
--- gcc/fortran/parse.c.jj  2016-05-13 11:49:47.0 +0200
+++ gcc/fortran/parse.c 2016-05-25 16:06:33.694148119 +0200
@@ -831,7 +831,14 @@ decode_omp_directive (void)
   matcho ("master", gfc_match_omp_master, ST_OMP_MASTER);
   break;
 case 'o':
-  matcho ("ordered", gfc_match_omp_ordered, ST_OMP_ORDERED);
+  if (flag_openmp && gfc_match ("ordered depend (") == MATCH_YES)
+   {
+ gfc_current_locus = old_locus;
+ matcho ("ordered", gfc_match_omp_ordered_depend,
+ ST_OMP_ORDERED_DEPEND);
+   }
+  else
+   matcho ("ordered", gfc_match_omp_ordered, ST_OMP_ORDERED);
   break;
 case 'p':
   matchs ("parallel do simd", gfc_match_omp_parallel_do_simd,
@@ -1373,7 +1380,8 @@ next_statement (void)
   case ST_OMP_BARRIER: case ST_OMP_TASKWAIT: case ST_OMP_TASKYIELD: \
   case ST_OMP_CANCEL: case ST_OMP_CANCELLATION_POINT: \
   case ST_OMP_TARGET_UPDATE: case ST_OMP_TARGET_ENTER_DATA: \
-  case ST_OMP_TARGET_EXIT_DATA: case ST_ERROR_STOP: case ST_SYNC_ALL: \
+  case ST_OMP_TARGET_EXIT_DATA: case ST_OMP_ORDERED_DEP

Re: [C++ PATCH] PR c++/69855

2016-05-27 Thread Ville Voutilainen
On 27 May 2016 at 17:46, Jason Merrill  wrote:
> OK, thanks.

Should this fix be backported to the gcc6-branch? I have no plans to
backport it any further than that.

>
> Jason
>
>
> On Fri, May 27, 2016 at 10:43 AM, Ville Voutilainen
>  wrote:
>> On 20 May 2016 at 07:05, Ville Voutilainen  
>> wrote:
>>> On 19 May 2016 at 19:40, Jason Merrill  wrote:
 Any thoughts on doing something similar for extern variable declarations?
>>>
>>> Ah, we diagnose local extern variable declarations that clash with
>>> previous declarations,
>>> but we don't diagnose cases where a subsequent declaration clashes
>>> with a previous
>>> local extern declaration. I'll take a look.
>>
>> As discussed on irc, this requires teaching variable declarations to
>> work with DECL_ANTICIPATED
>> and is thus some amounts of surgery, so the recommendation was to go
>> ahead with this patch.
>> I added a comment to the new code block, an updated patch attached.
>> Changelog as before.
>> Ok for trunk?


[gomp4] backport firstprivate subarray changes

2016-05-27 Thread Cesar Philippidis
This patch backports the recent firstprivate subarray changes I've made
to trunk. Gomp4 has preliminary support for c++ reference types, so I
had to make some adjustments to the original patch to get this.C and
non-scalar-data.C working. Those changes were relatively minor, so I'll
bring them to trunk after I address the remarks Thomas made on my
original patch.

Thomas, I decided to xfail a bunch of kernels tests in gomp4 instead of
removing them so that we can have a better record of what changed. One
of us should investigate why the alias analysis doesn't like the
firstprivate pointer changes.

Cesar
2016-05-27  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: Add xfails.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-2.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-6.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias.c: Likewise.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
	Likewise.
	* g++.dg/goacc/data-1.C: New test.

	libgomp/
	* testsuite/libgomp.oacc-c++/non-scalar-data.C: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: Adjust
	test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/
	kernels-parallel-loop-data-enter-exit.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-14.f90: Likewise.

	Backport trunk r236678:
	2016-05-24  Cesar Philippidis  
	gcc/c/
	* c-parser.c (c_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* c-typeck.c (handle_omp_array_sections_1): Replace bool is_omp
	argument with enum c_omp_region_type ort.
	(handle_omp_array_sections): Likewise.  Update call to
	handle_omp_array_sections_1.
	(c_finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	call to handle_omp_array_sections.


	gcc/cp/
	* parser.c (cp_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* semantics.c (handle_omp_array_sections_1): Replace bool is_omp
	argument with enum c_omp_region_type ort.  Don't privatize OpenACC
	non-static members.
	(handle_omp_array_sections): Replace bool is_omp argument with enum
	c_omp_region_type ort.  Update call to handle_omp_array_sections_1.
	(finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	call to handle_omp_array_sections.

	gcc/
	* gimplify.c (omp_notice_variable): Use zero-length arrays for data
	pointers inside OACC_DATA regions.
	(gimplify_scan_omp_clauses): Prune firstprivate clause associated
	with OACC_DATA, OACC_ENTER_DATA and OACC_EXIT data regions.
	(gimplify_adjust_omp_clauses): Fix typo in comment.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/pcopy.c: Likewise.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/pr70688.c: New test.
	* c-c++-common/goacc/present-1.c: Adjust test.
	* c-c++-common/goacc/reduction-5.c: Likewise.
	* g++.dg/goacc/data-1.C: New test.

	libgomp/
	* oacc-mem.c (acc_malloc): Update handling of shared-memory targets.
	(acc_free): Likewise.
	(acc_memcpy_to_device): Likewise.
	(acc_memcpy_from_device): Likewise.
	(acc_deviceptr): Likewise.
	(acc_hostptr): Likewise.
	(acc_is_present): Likewise.
	(acc_map_data): Likewise.
	(acc_unmap_data): Likewise.
	(present_create_copy): Likewise.
	(delete_copyout): Likewise.
	(update_dev_host): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/enter_exit-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Adjust test so that
	it only runs on nvptx targets.
	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise.
	* test

Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-05-27 Thread Alan Hayward

On 27/05/2016 12:41, "Richard Biener"  wrote:

>On Fri, May 27, 2016 at 11:09 AM, Alan Hayward 
>wrote:
>> This patch is a reworking of the previous vectorize inductions that are
>> live
>> after the loop patch.
>> It now supports SLP and an optimisation has been moved to patch [3/3].
>>
>>
>> Stmts which are live (ie: defined inside a loop and then used after the
>> loop)
>> are not currently supported by the vectorizer.  In many cases
>> vectorization can
>> still occur because the SCEV cprop pass will hoist the definition of the
>> stmt
>> outside of the loop before the vectorizer pass. However, there are
>>various
>> cases SCEV cprop cannot hoist, for example:
>>   for (i = 0; i < n; ++i)
>> {
>>   ret = x[i];
>>   x[i] = i;
>> }
>>return i;
>>
>> Currently stmts are marked live using a bool, and the relevant state
>>using
>> an
>> enum. Both these states are propagated to the definition of all uses of
>>the
>> stmt. Also, a stmt can be live but not relevant.
>>
>> This patch vectorizes a live stmt definition normally within the loop
>>and
>> then
>> after the loop uses BIT_FIELD_REF to extract the final scalar value from
>> the
>> vector.
>>
>> This patch adds a new relevant state (vect_used_only_live) for when a
>>stmt
>> is
>> used only outside the loop. The relevant state is still propagated to
>>all
>> it's
>> uses, but the live bool is not (this ensures that
>> vectorizable_live_operation
>> is only called with stmts that really are live).
>>
>> Tested on x86 and aarch64.
>
>+  /* If STMT is a simple assignment and its inputs are invariant, then
>it can
>+ remain in place, unvectorized.  The original last scalar value that
>it
>+ computes will be used.  */
>+  if (is_simple_and_all_uses_invariant (stmt, loop_vinfo))
> {
>
>so we can't use STMT_VINFO_RELEVANT or so?  I thought we somehow
>mark stmts we don't vectorize in the end.

It's probably worth making clear that this check exists in the current
GCC head - today vectorize_live_operation only supports the
simple+invariant case and the SSE2 case.
My patch simply moved the simple+invariant code into the new function
is_simple_and_all_uses_invariant.

Looking at this again, I think what we really care about is:
*If the stmt is live but not relevant, we need to mark it to ensure it is
vectorised.
*If a stmt is simple+invariant then a live usage of it can either use the
scalar or vectorized version.

So for a live stmt:
*If it is simple+invariant and not relevant, then it is more optimal to
use the
scalar version.
*If it is simple+invariant and relevant then it is more optimal to use the
vectorized version.
*If it is not simple+invariant then we must always use the vectorized
version.

Therefore, the patch as it stands is correct but not optimal. In patch 3/3,
for the code above (vectorize_live_operation), I can change the check to:
if not relevant, then assert that it is not simple+invariant and return
true.
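
For reference, using the vectorized version means the live scalar is the
last lane of the final vector, which is extracted after the loop with a
BIT_FIELD_REF.  In GNU C terms (an illustrative sketch only):

  typedef int v4si __attribute__ ((vector_size (16)));

  int
  last_live (v4si *x, int nv)
  {
    v4si vret;
    for (int i = 0; i < nv; i++)
      vret = x[i];      /* the vectorized 'ret = x[i]' */
    return vret[3];     /* the BIT_FIELD_REF: extract the last lane */
  }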



>
>+  lhs = (is_a <gphi *> (stmt)) ? gimple_phi_result (stmt)
>+   : gimple_get_lhs (stmt);
>+  lhs_type = TREE_TYPE (lhs);
>+
>+  /* Find all uses of STMT outside the loop.  */
>+  auto_vec<gimple *> worklist;
>+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
>+{
>+  basic_block bb = gimple_bb (use_stmt);
>+
>+  if (!flow_bb_inside_loop_p (loop, bb))
>+   worklist.safe_push (use_stmt);
> }
>+  gcc_assert (!worklist.is_empty ());
>
>as we are in loop-closed SSA there should be exactly one such use...?

Yes, I should change this to assert that worklist is of length 1.

>
>+  /* Get the correct slp vectorized stmt.  */
>+  vec_oprnds.create (num_vec);
>+  vect_get_slp_vect_defs (slp_node, &vec_oprnds);
>
>As you look at the vectorized stmt you can directly use the
>SLP_TREE_VEC_STMTS
>array (the stmts lhs, of course), no need to export this function.

Ok, I can change this.

>
>The rest of the changes look ok to me.

Does that include PATCH 1/3?

>
>Thanks,
>Richard.
>
>
>> gcc/
>> * tree-vect-loop.c (vect_analyze_loop_operations): Allow live
>> stmts.
>> (vectorizable_reduction): Check for new relevant state.
>> (vectorizable_live_operation): vectorize live stmts using
>> BIT_FIELD_REF.  Remove special case for gimple assigns stmts.
>> * tree-vect-stmts.c (is_simple_and_all_uses_invariant): New
>> function.
>> (vect_stmt_relevant_p): Check for stmts which are only used
>>live.
>> (process_use): Use of a stmt does not inherit it's live value.
>> (vect_mark_stmts_to_be_vectorized): Simplify relevance
>>inheritance.
>> (vect_analyze_stmt): Check for new relevant state.
>> *tree-vect-slp.c (vect_get_slp_vect_defs): Make global
>> *tree-vectorizer.h (vect_relevant): New entry for a stmt which
>>is
>> used
>> outside the loop, but not inside it.
>>
>> testsuite/
>> * gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize.
>> * testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c: Rem

[visium] Split DImode arithmetical operations

2016-05-27 Thread Eric Botcazou
This makes it so that DImode arithmetical operations are split into a pair of 
SImode operations in order to enable better scheduling.
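
In C terms, the split of an addition looks like this: the low-part add
sets the carry (the LTU condition on the flags register) and the
high-part add consumes it (a sketch; the names are illustrative):

  void
  add64 (unsigned int a_lo, unsigned int a_hi,
         unsigned int b_lo, unsigned int b_hi,
         unsigned int *r_lo, unsigned int *r_hi)
  {
    unsigned int lo = a_lo + b_lo;    /* addsi3_insn_set_flags     */
    unsigned int carry = lo < a_lo;   /* the LTU test on the flags */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;      /* the new *plus_plus_sltu   */
  }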

Tested on visium-elf, applied on the mainline and 6 branch.


2016-05-27  Eric Botcazou  

* config/visium/visium-protos.h (split_double_move): Rename into...
(visium_split_double_move): ...this.
(visium_split_double_add): Declare.
* config/visium/visium.c (split_double_move): Rename into...
(visium_split_double_move): ...this.
(visium_split_double_add): New function.
(visium_expand_copysign): Renumber operands for consistency.
* config/visium/visium.md (DImode move splitter): Adjust to renaming.
(DFmode move splitter): Likewise.
(*adddi3_insn): Split by means of visium_split_double_add.
(*adddi3_insn_flags): Delete.
(*plus_plus_sltu): New insn.
(*subdi3_insn): Split by means of visium_split_double_add.
(subdi3_insn_flags): Delete.
(*minus_minus_sltu): New insn.
(*negdi2_insn): Split by means of visium_split_double_add.
(*negdi2_insn_flags): Delete.

-- 
Eric Botcazou

Index: config/visium/visium-protos.h
===
--- config/visium/visium-protos.h	(revision 236761)
+++ config/visium/visium-protos.h	(working copy)
@@ -49,7 +49,8 @@ extern void visium_split_cbranch (enum r
 extern const char *output_ubranch (rtx, rtx_insn *);
 extern const char *output_cbranch (rtx, enum rtx_code, enum machine_mode, int,
    rtx_insn *);
-extern void split_double_move (rtx *, enum machine_mode);
+extern void visium_split_double_move (rtx *, enum machine_mode);
+extern void visium_split_double_add (enum rtx_code, rtx, rtx, rtx);
 extern void visium_expand_copysign (rtx *, enum machine_mode);
 extern void visium_expand_int_cstore (rtx *, enum machine_mode);
 extern void visium_expand_fp_cstore (rtx *, enum machine_mode);
Index: config/visium/visium.c
===
--- config/visium/visium.c	(revision 236761)
+++ config/visium/visium.c	(working copy)
@@ -2026,7 +2026,7 @@ visium_rtx_costs (rtx x, machine_mode mo
 /* Split a double move of OPERANDS in MODE.  */
 
 void
-split_double_move (rtx *operands, enum machine_mode mode)
+visium_split_double_move (rtx *operands, enum machine_mode mode)
 {
   bool swap = false;
 
@@ -2076,14 +2076,74 @@ split_double_move (rtx *operands, enum m
 }
 }
 
+/* Split a double addition or subtraction of operands.  */
+
+void
+visium_split_double_add (enum rtx_code code, rtx op0, rtx op1, rtx op2)
+{
+  rtx op3 = gen_lowpart (SImode, op0);
+  rtx op4 = gen_lowpart (SImode, op1);
+  rtx op5;
+  rtx op6 = gen_highpart (SImode, op0);
+  rtx op7 = (op1 == const0_rtx ? op1 : gen_highpart (SImode, op1));
+  rtx op8;
+
+  /* If operand #2 is a small constant, then its high part is null.  */
+  if (CONST_INT_P (op2))
+{
+  HOST_WIDE_INT val = INTVAL (op2);
+
+  if (val < 0)
+	{
+	  code = (code == MINUS ? PLUS : MINUS);
+	  val = -val;
+	}
+
+  op5 = gen_int_mode (val, SImode);
+  op8 = const0_rtx;
+}
+  else
+{
+  op5 = gen_lowpart (SImode, op2);
+  op8 = gen_highpart (SImode, op2);
+}
+
+  /* This is the {add,sub,neg}si3_insn_set_flags pattern.  */
+  rtx x;
+  if (op4 == const0_rtx)
+x = gen_rtx_NEG (SImode, op5);
+  else
+x = gen_rtx_fmt_ee (code, SImode, op4, op5);
+  rtx pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  XVECEXP (pat, 0, 0) = gen_rtx_SET (op3, x);
+  rtx flags = gen_rtx_REG (CC_NOOVmode, FLAGS_REGNUM);
+  x = gen_rtx_COMPARE (CC_NOOVmode, shallow_copy_rtx (x), const0_rtx);
+  XVECEXP (pat, 0, 1) = gen_rtx_SET (flags, x);
+  emit_insn (pat);
+
+  /* This is the plus_[plus_]sltu_flags or minus_[minus_]sltu_flags pattern.  */
+  if (op8 == const0_rtx)
+x = op7;
+  else
+x = gen_rtx_fmt_ee (code, SImode, op7, op8);
+  x = gen_rtx_fmt_ee (code, SImode, x, gen_rtx_LTU (SImode, flags, const0_rtx));
+  pat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  XVECEXP (pat, 0, 0) = gen_rtx_SET (op6, x);
+  flags = gen_rtx_REG (CCmode, FLAGS_REGNUM);
+  XVECEXP (pat, 0, 1) = gen_rtx_CLOBBER (VOIDmode, flags);
+  emit_insn (pat);
+
+  visium_flags_exposed = true;
+}
+
 /* Expand a copysign of OPERANDS in MODE.  */
 
 void
 visium_expand_copysign (rtx *operands, enum machine_mode mode)
 {
-  rtx dest = operands[0];
-  rtx op0 = operands[1];
-  rtx op1 = operands[2];
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
   rtx mask = force_reg (SImode, GEN_INT (0x7fff));
   rtx x;
 
@@ -2091,37 +2151,37 @@ visium_expand_copysign (rtx *operands, e
  the FPU on the MCM have a non-standard behavior wrt NaNs.  */
   gcc_assert (mode == SFmode);
 
-  /* First get all the non-sign bits of OP0.  */
-  if (GET_CODE (op0) == CONST_DOUBLE)
+  /* First get all the non-sign bits of op1.  */
+  if (GET_CODE (op1) == CONST_DOUBLE)
 {
- 

Re: [C++ Patch/RFC] PR 60385 and other issues about wrongly named namespaces (eg, c++/68723)

2016-05-27 Thread Jason Merrill

Let's go with the second patch.

Jason


Re: [RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2016-05-27 Thread Ulrich Weigand
Andre Vieira (lists) wrote:
> On 07/04/16 10:30, Andre Vieira (lists) wrote:
> > On 17/03/16 16:33, Andre Vieira (lists) wrote:
> >> On 23/10/15 12:31, Bernd Schmidt wrote:
> >>> On 10/12/2015 11:58 AM, Ulrich Weigand wrote:
> 
>  Index: gcc/configure.ac
>  ===
>  --- gcc/configure.ac(revision 228530)
>  +++ gcc/configure.ac(working copy)
>  @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
>    fi
> 
>    if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
>  -  if test "x$with_headers" != x; then
>  +  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
>    target_header_dir=$with_headers
>  elif test "x$with_sysroot" = x; then
>   
>  target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
> 
> >>>
> >>> I'm missing the beginning of this conversation, but this looks like a
> >>> reasonable change (avoiding target_header_dir=yes for --with-headers).
> >>> So, approved.
> >>>
> >>>
> >>> Bernd
> >>>
> >> Hi there,
> >>
> >> I was wondering why this never made it to trunk. I am currently running
> >> into an issue that this patch would fix.

Seems I never actually checked this in, even though it was approved.
Thanks for the reminder, I've now checked the patch in.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[C++ Patch/RFC] PR 60385 and other issues about wrongly named namespaces (eg, c++/68723)

2016-05-27 Thread Paolo Carlini

Hi,

we have these long-standing issues with code like (c++/60385):

float foo4();

namespace foo4
{
   // ...
}

where the name of the namespace conflicts with an existing declaration. 
Error recovery is currently suboptimal, for example, c++/60385 is about


struct bar6
{
friend wchar_t bar1();
};

inside the namespace: due to the code in do_friend:

  push_nested_namespace (ns);
  decl = pushdecl_namespace_level (decl, /*is_friend=*/true);
  pop_nested_namespace (ns);

we issue a duplicate diagnostic about the wrong name and later we crash in 
pop_namespace (the second time through push_namespace, 
IDENTIFIER_NAMESPACE_VALUE isn't found set for the malformed namespace, 
thus need_new is true, pushdecl is called...).


Now, I'm wondering how far we want to go with error recovery for such 
snippets. Certainly, in analogy with the code at the beginning of 
cp_parser_class_specifier_1, we can completely skip the body of such 
malformed namespaces. That would be the first attached patchlet. Or we 
can go on in cp_parser_namespace_definition but remember that 
push_namespace didn't really succeed and keep things consistent, thus 
avoid crashing in pop_namespace later, as currently happens. That would 
be the second patchlet. Both ideas pass testing and work for c++/68723 too 
(as expected, the first patchlet leads to particularly neat diagnostic 
for the very broken snippet in c++/68723, only the error about the wrong 
namespace name, as for c++/60385).


Thanks!
Paolo.

//


Index: name-lookup.c
===
--- name-lookup.c   (revision 236809)
+++ name-lookup.c   (working copy)
@@ -3685,7 +3685,7 @@ handle_namespace_attrs (tree ns, tree attributes)
 /* Push into the scope of the NAME namespace.  If NAME is NULL_TREE, then we
select a name that is unique to this compilation unit.  */
 
-void
+bool
 push_namespace (tree name)
 {
   tree d = NULL_TREE;
@@ -3759,7 +3759,11 @@ push_namespace (tree name)
TREE_PUBLIC (d) = 0;
   else
TREE_PUBLIC (d) = 1;
-  pushdecl (d);
+  if (pushdecl (d) == error_mark_node)
+   {
+ timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+ return false;
+   }
   if (anon)
{
  /* Clear DECL_NAME for the benefit of debugging back ends.  */
@@ -3777,6 +3781,7 @@ push_namespace (tree name)
   current_namespace = d;
 
   timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+  return true;
 }
 
 /* Pop from the scope of the current namespace.  */
Index: name-lookup.h
===
--- name-lookup.h   (revision 236809)
+++ name-lookup.h   (working copy)
@@ -312,7 +312,7 @@ extern tree push_inner_scope (tree);
 extern void pop_inner_scope (tree, tree);
 extern void push_binding_level (cp_binding_level *);
 
-extern void push_namespace (tree);
+extern bool push_namespace (tree);
 extern void pop_namespace (void);
 extern void push_nested_namespace (tree);
 extern void pop_nested_namespace (tree);
Index: parser.c
===
--- parser.c(revision 236809)
+++ parser.c(working copy)
@@ -17549,7 +17549,11 @@ cp_parser_namespace_definition (cp_parser* parser)
 }
 
   /* Start the namespace.  */
-  push_namespace (identifier);
+  if (!push_namespace (identifier))
+{
+  cp_parser_skip_to_end_of_block_or_statement (parser);
+  return;
+}
 
   /* Parse any nested namespace definition. */
   if (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
Index: name-lookup.c
===
--- name-lookup.c   (revision 236799)
+++ name-lookup.c   (working copy)
@@ -3685,7 +3685,7 @@ handle_namespace_attrs (tree ns, tree attributes)
 /* Push into the scope of the NAME namespace.  If NAME is NULL_TREE, then we
select a name that is unique to this compilation unit.  */
 
-void
+bool
 push_namespace (tree name)
 {
   tree d = NULL_TREE;
@@ -3759,7 +3759,11 @@ push_namespace (tree name)
TREE_PUBLIC (d) = 0;
   else
TREE_PUBLIC (d) = 1;
-  pushdecl (d);
+  if (pushdecl (d) == error_mark_node)
+   {
+ timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+ return false;
+   }
   if (anon)
{
  /* Clear DECL_NAME for the benefit of debugging back ends.  */
@@ -3777,6 +3781,7 @@ push_namespace (tree name)
   current_namespace = d;
 
   timevar_cond_stop (TV_NAME_LOOKUP, subtime);
+  return true;
 }
 
 /* Pop from the scope of the current namespace.  */
Index: name-lookup.h
===
--- name-lookup.h   (revision 236799)
+++ name-lookup.h   (working copy)
@@ -312,7 +312,7 @@ extern tree push_inner_scope (tree);
 extern void pop_inner_scope (tree, tree);
 extern void push_binding_level (cp_binding_level *);
 
-exte

Re: [C++ PATCH] PR c++/69855

2016-05-27 Thread Jason Merrill
OK, thanks.

Jason


On Fri, May 27, 2016 at 10:43 AM, Ville Voutilainen
 wrote:
> On 20 May 2016 at 07:05, Ville Voutilainen  
> wrote:
>> On 19 May 2016 at 19:40, Jason Merrill  wrote:
>>> Any thoughts on doing something similar for extern variable declarations?
>>
>> Ah, we diagnose local extern variable declarations that clash with
>> previous declarations,
>> but we don't diagnose cases where a subsequent declaration clashes
>> with a previous
>> local extern declaration. I'll take a look.
>
> As discussed on irc, this requires teaching variable declarations to
> work with DECL_ANTICIPATED
> and is thus some amounts of surgery, so the recommendation was to go
> ahead with this patch.
> I added a comment to the new code block, an updated patch attached.
> Changelog as before.
> Ok for trunk?


[gomp4] backport gfc_match_omp_clauses restructuring changes

2016-05-27 Thread Cesar Philippidis
This patch backports the gfc_match_omp_clauses restructuring changes
that occurred early this month in trunk to gomp-4_0-branch. Now it's
easier to detect which of our local changes in gomp4 are not present in
trunk yet.

Cesar
2016-05-27  Cesar Philippidis  

	Backport trunk r235922:
	2016-05-05  Jakub Jelinek  

	* openmp.c (gfc_match_omp_clauses): Restructuralize, so that clause
	parsing is done in a big switch based on gfc_peek_ascii_char and
	individual clauses under their first letters are sorted too.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 5916df3..a2a0e4b 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -646,712 +646,779 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
   needs_space = false;
   first = false;
   gfc_gobble_whitespace ();
-  if ((mask & OMP_CLAUSE_ASYNC) && !c->async)
-	if (gfc_match ("async") == MATCH_YES)
-	  {
-	c->async = true;
-	needs_space = false;
-	if (gfc_match (" ( %e )", &c->async_expr) != MATCH_YES)
-	  {
-		c->async_expr = gfc_get_constant_expr (BT_INTEGER,
-		   gfc_default_integer_kind,
-		  &gfc_current_locus);
-		mpz_set_si (c->async_expr->value.integer, GOMP_ASYNC_NOVAL);
-	  }
-	continue;
-	  }
-  if ((mask & OMP_CLAUSE_GANG) && !c->gang)
-	if (gfc_match ("gang") == MATCH_YES)
-	  {
-	c->gang = true;
-	if (match_oacc_clause_gang(c) == MATCH_YES)
+  bool end_colon;
+  gfc_omp_namelist **head;
+  old_loc = gfc_current_locus;
+  char pc = gfc_peek_ascii_char ();
+  switch (pc)
+	{
+	case 'a':
+	  end_colon = false;
+	  head = NULL;
+	  if ((mask & OMP_CLAUSE_ALIGNED)
+	  && gfc_match_omp_variable_list ("aligned (",
+	  &c->lists[OMP_LIST_ALIGNED],
+	  false, &end_colon,
+	  &head) == MATCH_YES)
+	{
+	  gfc_expr *alignment = NULL;
+	  gfc_omp_namelist *n;
+
+	  if (end_colon && gfc_match (" %e )", &alignment) != MATCH_YES)
+		{
+		  gfc_free_omp_namelist (*head);
+		  gfc_current_locus = old_loc;
+		  *head = NULL;
+		  break;
+		}
+	  for (n = *head; n; n = n->next)
+		if (n->next && alignment)
+		  n->expr = gfc_copy_expr (alignment);
+		else
+		  n->expr = alignment;
+	  continue;
+	}
+	  if ((mask & OMP_CLAUSE_ASYNC)
+	  && !c->async
+	  && gfc_match ("async") == MATCH_YES)
+	{
+	  c->async = true;
 	  needs_space = false;
-	else
+	  if (gfc_match (" ( %e )", &c->async_expr) != MATCH_YES)
+		{
+		  c->async_expr
+		= gfc_get_constant_expr (BT_INTEGER,
+	 gfc_default_integer_kind,
+	 &gfc_current_locus);
+		  mpz_set_si (c->async_expr->value.integer, GOMP_ASYNC_NOVAL);
+		}
+	  continue;
+	}
+	  if ((mask & OMP_CLAUSE_AUTO)
+	  && !c->par_auto
+	  && gfc_match ("auto") == MATCH_YES)
+	{
+	  c->par_auto = true;
 	  needs_space = true;
+	  continue;
+	}
+	  break;
+	case 'b':
+	  if ((mask && OMP_CLAUSE_BIND) && c->routine_bind == NULL
+	  && gfc_match ("bind ( %s )", &c->routine_bind) == MATCH_YES)
+	{
+	  c->bind = 1;
+	  continue;
+	}
+	  break;
+	case 'c':
+	  if ((mask & OMP_CLAUSE_COLLAPSE)
+	  && !c->collapse)
+	{
+	  gfc_expr *cexpr = NULL;
+	  match m = gfc_match ("collapse ( %e )", &cexpr);
+
+	  if (m == MATCH_YES)
+		{
+		  int collapse;
+		  const char *p = gfc_extract_int (cexpr, &collapse);
+		  if (p)
+		{
+		  gfc_error_now (p);
+		  collapse = 1;
+		}
+		  else if (collapse <= 0)
+		{
+		  gfc_error_now ("COLLAPSE clause argument not"
+ " constant positive integer at %C");
+		  collapse = 1;
+		}
+		  c->collapse = collapse;
+		  gfc_free_expr (cexpr);
+		  c->acc_collapse = 1;
+		  continue;
+		}
+	}
+	  if ((mask & OMP_CLAUSE_COPY)
+	  && gfc_match ("copy ( ") == MATCH_YES
+	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
+	   OMP_MAP_FORCE_TOFROM))
 	continue;
-	  }
-  if ((mask & OMP_CLAUSE_WORKER) && !c->worker)
-	if (gfc_match ("worker") == MATCH_YES)
-	  {
-	c->worker = true;
-	if (gfc_match (" ( num : %e )", &c->worker_expr) == MATCH_YES
-	|| gfc_match (" ( %e )", &c->worker_expr) == MATCH_YES)
-	  needs_space = false;
-	else
-	  needs_space = true;
+	  if (mask & OMP_CLAUSE_COPYIN)
+	{
+	  if (openacc)
+		{
+		  if (gfc_match ("copyin ( ") == MATCH_YES
+		  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
+		   OMP_MAP_FORCE_TO))
+		continue;
+		}
+	  else if (gfc_match_omp_variable_list ("copyin (",
+		&c->lists[OMP_LIST_COPYIN],
+		true) == MATCH_YES)
+		continue;
+	}
+	  if ((mask & OMP_CLAUSE_COPYOUT)
+	  && gfc_match ("copyout ( ") == MATCH_YES
+	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
+	   OMP_MAP_FORCE_FROM))
 	continue;
-	  }
-  if ((mask & OMP_CLAUSE_VECTOR_LENGTH) && c->vector_length_expr == NULL
-	  && gfc_match ("vector_length ( %

Re: [C++ PATCH] PR c++/69855

2016-05-27 Thread Ville Voutilainen
On 20 May 2016 at 07:05, Ville Voutilainen  wrote:
> On 19 May 2016 at 19:40, Jason Merrill  wrote:
>> Any thoughts on doing something similar for extern variable declarations?
>
> Ah, we diagnose local extern variable declarations that clash with
> previous declarations,
> but we don't diagnose cases where a subsequent declaration clashes
> with a previous
> local extern declaration. I'll take a look.

As discussed on irc, this requires teaching variable declarations to
work with DECL_ANTICIPATED
and is thus some amounts of surgery, so the recommendation was to go
ahead with this patch.
I added a comment to the new code block, an updated patch attached.
Changelog as before.
Ok for trunk?
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index eb128db..568c75e 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -929,6 +929,24 @@ pushdecl_maybe_friend_1 (tree x, bool is_friend)
  DECL_ANTICIPATED (t) = 1;
  DECL_HIDDEN_FRIEND_P (t) = 1;
}
+
+ if (TREE_CODE (x) == FUNCTION_DECL
+ && DECL_LOCAL_FUNCTION_P (x)
+ && !DECL_OMP_DECLARE_REDUCTION_P (x)
+ && !type_dependent_expression_p (x))
+   {
+ /* PR c++/69855, a local function declaration
+is stripped from template info and pushed to
+the local scope as a hidden declaration. This
+allows ill-formed overloads even in other scopes
+to be diagnosed both at the local declaration site
+and after it.  */
+ tree t2 = copy_decl (t);
+ DECL_USE_TEMPLATE (t2) = 0;
+ DECL_TEMPLATE_INFO (t2) = NULL_TREE;
+ DECL_ANTICIPATED (t2) = 1;
+ push_overloaded_decl (t2, PUSH_GLOBAL, is_friend);
+   }
}
 
   if (t != x || DECL_FUNCTION_TEMPLATE_P (t))
diff --git a/gcc/testsuite/g++.dg/overload/69855.C 
b/gcc/testsuite/g++.dg/overload/69855.C
new file mode 100644
index 000..dc2d733
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/69855.C
@@ -0,0 +1,44 @@
+// PR c++/69855
+// { dg-do compile }
+
+int get();
+void f() {
+  char get(); // { dg-error "ambiguating" }
+}
+
+int get2();
+char get2(int);
+void f2() {
+  char get2(); // { dg-error "ambiguating" }
+}
+
+char get3(int);
+void f3() {
+  char get3();
+}
+
+void f4() {
+  char get4();
+}
+int get4(); // { dg-error "ambiguating" }
+
+void get5();
+
+template <typename> struct X
+{
+  void g()
+  {
+int get5(); // { dg-error "ambiguating" }
+  }
+};
+
+
+template <typename> struct X2
+{
+  void g()
+  {
+int get6();
+  }
+};
+
+void get6(); // { dg-error "ambiguating" }
diff --git a/gcc/testsuite/g++.old-deja/g++.law/missed-error2.C 
b/gcc/testsuite/g++.old-deja/g++.law/missed-error2.C
index 42f70ae..26ae87d 100644
--- a/gcc/testsuite/g++.old-deja/g++.law/missed-error2.C
+++ b/gcc/testsuite/g++.old-deja/g++.law/missed-error2.C
@@ -25,9 +25,10 @@ int main() {
foo(4, -37, 14.39, 14.38);
 }
 
-// 971006 we no longer give an error for this since we emit a hard error
-// about the declaration above
-static void foo(int i, int j, double x, double y) { 
+// 971006 we no longer gave an error for this since we emit a hard error
+// about the declaration above, but after the fix for PR c++/69855
+// this declaration emits a diagnostic again
+static void foo(int i, int j, double x, double y) { // { dg-error "extern|static" }
 
std::cout << "Max(int): " << max(i,j) << " Max(double): " <<
 max(x,y) << '\n';
diff --git a/gcc/testsuite/g++.old-deja/g++.pt/crash3.C 
b/gcc/testsuite/g++.old-deja/g++.pt/crash3.C
index 160cbe5..2ba61d9 100644
--- a/gcc/testsuite/g++.old-deja/g++.pt/crash3.C
+++ b/gcc/testsuite/g++.old-deja/g++.pt/crash3.C
@@ -10,7 +10,7 @@ public:
 }
 CVector g() const
 {
-   CVector v();
-   return v;
+   CVector v2();
+   return v2;
 }
 };


Re: Further refinement to -Wswitch-unreachable

2016-05-27 Thread Jason Merrill

On 05/26/2016 02:44 PM, Marek Polacek wrote:

+ if (gimple_code (stmt) == GIMPLE_TRY)
{
+ /* A compiler-generated cleanup or a user-written try block.
+Try to get the first statement in its try-block, for better
+location.  */
+ if ((seq = gimple_try_eval (stmt)))
+   stmt = gimple_seq_first_stmt (seq);


Should this loop?  If there are two variables declared, do we get two 
try blocks?
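
For context, a valid C++ case where the first statement in the switch
body gimplifies to a GIMPLE_TRY (a hypothetical sketch):

  struct S { ~S (); };
  void f (const S &);

  void
  g (int i)
  {
    switch (i)
      {
        f (S ());   /* unreachable; the temporary's cleanup wraps the
                       call in a try/finally, i.e. a GIMPLE_TRY */
      case 0:
        break;
      }
  }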


Jason



[PATCH1][PR71252] Fix missing swap to stmt_to_insert

2016-05-27 Thread Kugan Vivekanandarajah
Hi,

This fixes the missing swap for stmt_to_insert. I tested this with the
attached test case, which is not valid any more due to some other
commits. This I believe is an obvious fix, and maybe the test case is
needed.

I am running bootstrap and regression testing on x86-64-linux gnu. Is
this OK for trunk if the testing is fine?

Thanks,

Kugan


gcc/ChangeLog:

2016-05-28 Kugan Vivekanandarajah 

* tree-ssa-reassoc.c (swap_ops_for_binary_stmt): Fix swap such that
all fields including stmt_to_insert are swapped.

gcc/testsuite/ChangeLog:

2016-05-28 Kugan Vivekanandarajah 

* gcc.dg/tree-ssa/pr71252-2.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71252-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71252-2.c
index e69de29..e621d3e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71252-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71252-2.c
@@ -0,0 +1,9 @@
+/* PR middle-end/71252 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned a;
+int b, c;
+void fn1 ()
+{
+  b = a + c + 3 + c;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index c9ed679..d13be29 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -3763,25 +3763,13 @@ swap_ops_for_binary_stmt (vec<operand_entry *> ops,
   || (stmt && is_phi_for_stmt (stmt, oe3->op)
  && !is_phi_for_stmt (stmt, oe1->op)
  && !is_phi_for_stmt (stmt, oe2->op)))
-{
-  operand_entry temp = *oe3;
-  oe3->op = oe1->op;
-  oe3->rank = oe1->rank;
-  oe1->op = temp.op;
-  oe1->rank= temp.rank;
-}
+std::swap (*oe1, *oe3);
   else if ((oe1->rank == oe3->rank
&& oe2->rank != oe3->rank)
   || (stmt && is_phi_for_stmt (stmt, oe2->op)
   && !is_phi_for_stmt (stmt, oe1->op)
   && !is_phi_for_stmt (stmt, oe3->op)))
-{
-  operand_entry temp = *oe2;
-  oe2->op = oe1->op;
-  oe2->rank = oe1->rank;
-  oe1->op = temp.op;
-  oe1->rank = temp.rank;
-}
+std::swap (*oe1, *oe2);
 }
 
 /* If definition of RHS1 or RHS2 dominates STMT, return the later of those


[gomp4] fix bootstrap failure in oacc_loop_auto_partitions

2016-05-27 Thread Cesar Philippidis
This patch fixes a bootstrap failure that I encountered while
backporting the firstprivate subarray patch to gomp-4_0-branch. I've
applied it to gomp4.

Cesar
2016-05-27  Cesar Philippidis  

	* gcc/omp-low.c (oacc_loop_auto_partitions): Use boolean OR
	when comparing outer_assign and loop->inner.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index dd8789d..200d331 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -19699,7 +19699,7 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
   noisy = false;
 #endif
 
-  if (assign && (!outer_assign | loop->inner))
+  if (assign && (!outer_assign || loop->inner))
 {
   /* Allocate outermost and non-innermost loops at the outermost
 	 non-innermost available level.  */


Re: Enable loop peeling at -O3

2016-05-27 Thread Jan Hubicka
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */
> 
> This should probably be -fdump-tree-ivcanon-details.

Yep, I updated the testcases in my tree.
> 
> > +struct foo {int b; int a[3];} foo;
> > +void add(struct foo *a,int l)
> > +{
> > +  int i;
> > +  for (i=0;i<l;i++)
> > +    a->a[i]++;
> > +}
> > +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 
> > "ivcanon"} } */
> > +/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
> 
> And here scan-tree-dump-times.  But even with that the testcases don't pass 
> for
> me.
It is because the unrolling happens in cunroll. 
> 
> > Index: testsuite/gcc.dg/tree-ssa/peel2.c
> > ===
> > --- testsuite/gcc.dg/tree-ssa/peel2.c   (revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/peel2.c   (working copy)
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
> > +void add(int *a,int l)
> > +{
> > +  int i;
> > +  for (i=0;i<l;i++)
> > +    a[i]++;
> > +}
> > +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 
> > "ivcanon"} } */
> 
> How do you determine "3 times"?  Isn't something missing here?

It is bogus, I meant to test only that the loop gets unrolled.  I should have 
re-run the tests
after changing them :)
I will send an updated patch shortly.

Honza


Re: [AArch64, 4/6] Reimplement frsqrts intrinsics

2016-05-27 Thread Jiong Wang



On 27/05/16 14:25, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:23:53AM +0100, Jiong Wang wrote:

Similar to [3/6], these intrinsics were implemented before the instruction
pattern "aarch64_rsqrts" was added, so these intrinsics were implemented
through inline assembly.

This migrates the implementation to builtins.
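
For reference, frsqrts computes the Newton-Raphson step (3 - a*b) / 2, so
one refinement of an frsqrte estimate looks roughly like this (illustrative
C, not the intrinsic's implementation):

  /* One Newton iteration for 1/sqrt(x) given an estimate e:
     e' = e * (3 - x*e*e) / 2, i.e. e * FRSQRTS (x*e, e) in hardware
     terms; each iteration roughly doubles the number of correct bits.  */
  static inline float
  rsqrt_step (float x, float e)
  {
    return e * ((3.0f - x * e * e) * 0.5f);
  }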

gcc/
2016-05-23  Jiong Wang 

 * config/aarch64/aarch64-builtins.def (rsqrts): New builtins
for modes
 VALLF.
 * config/aarch64/aarch64-simd.md (aarch64_rsqrts_3):
Rename to
"aarch64_rsqrts".
 * config/aarch64/aarch64.c (get_rsqrts_type): Update gen* name.
 * config/aarch64/arm_neon.h (vrsqrtss_f32): Remove inline
assembly.  Use
builtin.
 (vrsqrtsd_f64): Likewise.
 (vrsqrts_f32): Likewise.
 (vrsqrtsq_f32): Likewise.
 (vrsqrtsq_f64): Likewise.

This ChangeLog format is incorrect.

It looks like you're missing vrsqrts_f64, could you please add that?


I couldn't find vrsqrts_f64 before this rewrite patch.



Re: Enable loop peeling at -O3

2016-05-27 Thread Marek Polacek
On Fri, May 27, 2016 at 03:19:29PM +0200, Jan Hubicka wrote:
> Hi,
> this patch enabled -fpeel-loops by default at -O3 and makes it to use likely
> upper bound estimates.  The patch also adds -fpeel-all-loops flag that is
> symmetric to -funroll-all-loops.  Long time ago we used to interpret
> -fpeel-loops this way and blindly peel every loop but this behaviour got lost
> and now we only peel loop we have some evidence for.
> 
> Bootstrapped/regtested x86_64-linux, I am retesting after last minute change
> (adding of the testcase). OK?
> 
> Honza
> 
>   * common.opt (flag_peel_all_loops): New option.
>   * doc/invoke.texi: (-fpeel-loops): Update documentation.
>   (-fpeel-all-loops): Document.
>   * opts.c (default_options): Add OPT_fpeel_loops to -O3+.
>   * toplev.c (process_options): flag_peel_all_loops implies
>   flag_peel_loops.
>   * tree-ssa-lop-ivcanon.c (try_peel_loop): Update comment; handle
>   -fpeel-all-loops, use likely estimates.
> 
>   * gcc.dg/tree-ssa/peel1.c: New testcase.
>   * gcc.dg/tree-ssa/peel2.c: New testcase.
> Index: common.opt
> ===
> --- common.opt(revision 236815)
> +++ common.opt(working copy)
> @@ -1840,6 +1840,10 @@ fpeel-loops
>  Common Report Var(flag_peel_loops) Optimization
>  Perform loop peeling.
>  
> +fpeel-all-loops
> +Common Report Var(flag_peel_all_loops) Optimization
> +Perform loop peeling of all loops.
> +
>  fpeephole
>  Common Report Var(flag_no_peephole,0) Optimization
>  Enable machine specific peephole optimizations.
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 236815)
> +++ doc/invoke.texi   (working copy)
> @@ -8661,10 +8661,17 @@ the loop is entered.  This usually makes
>  @item -fpeel-loops
>  @opindex fpeel-loops
>  Peels loops for which there is enough information that they do not
> -roll much (from profile feedback).  It also turns on complete loop peeling
> -(i.e.@: complete removal of loops with small constant number of iterations).
> +roll much (from profile feedback or static analysis).  It also turns on
> +complete loop peeling (i.e.@: complete removal of loops with small constant
> +number of iterations).
>  
> -Enabled with @option{-fprofile-use}.
> +Enabled with @option{-O3} and @option{-fprofile-use}.
> +
> +@item -fpeel-all-loops
> +@opindex fpeel-all-loops
> +Peel all loops, even if their number of iterations is uncertain when
> +the loop is entered.  For loops with large number of iterations this leads
> +to wasted code size.
>  
>  @item -fmove-loop-invariants
>  @opindex fmove-loop-invariants
> Index: opts.c
> ===
> --- opts.c(revision 236815)
> +++ opts.c(working copy)
> @@ -535,6 +535,7 @@ static const struct default_options defa
>  { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, 
> VECT_COST_MODEL_DYNAMIC },
>  { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
>  { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
> +{ OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
>  
>  /* -Ofast adds optimizations to -O3.  */
>  { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
> Index: testsuite/gcc.dg/tree-ssa/peel1.c
> ===
> --- testsuite/gcc.dg/tree-ssa/peel1.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/peel1.c (working copy)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */

This should probably be -fdump-tree-ivcanon-details.

> +struct foo {int b; int a[3];} foo;
> +void add(struct foo *a,int l)
> +{
> +  int i;
> +  for (i=0;i +a->a[i]++;
> +}
> +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 
> "ivcanon"} } */
> +/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */

And here scan-tree-dump-times.  But even with that the testcases don't pass for
me.

> Index: testsuite/gcc.dg/tree-ssa/peel2.c
> ===
> --- testsuite/gcc.dg/tree-ssa/peel2.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/peel2.c (working copy)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
> +void add(int *a,int l)
> +{
> +  int i;
> +  for (i=0;i +a[i]++;
> +}
> +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 
> "ivcanon"} } */

How do you determine "3 times"?  Isn't something missing here?

> +/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
> Index: toplev.c
> ===
> --- toplev.c  (revision 236815)
> +++ toplev.c  (working copy)
> @@ -1294,6 +1294,9 @@ process_options (void)
>if (flag_unroll_all_loops)
>  flag_unroll_loops = 1;
>  
> +  if (flag_pee

[PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-05-27 Thread Kyrill Tkachov

Hi all,

As mentioned in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00297.html, 
frename-registers registers can be beneficial for aarch64
and the patch at https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01618.html 
resolves the AESE/AESMC fusion issue that it exposed
in the aarch64 backend. So this patch enables the pass for aarch64 at -O2 and 
above.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

P.S. Why is the table holding this information called 
aarch_option_optimization_table rather than
aarch64_option_optimization_table ?

2016-05-27  Kyrylo Tkachov  

* common/config/aarch64/aarch64-common.c
(aarch64_option_optimization_table): Enable -frename-registers at
-O2 and higher.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 08e795934207d015d9fa22c3822930af4a21c93a..91801df731471f1842802370497e498fda62098a 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -50,6 +50,8 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+/* Enable the register renaming pass at -O2 and higher.  */
+{ OPT_LEVELS_2_PLUS, OPT_frename_registers, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 


Re: [AArch64, 5/6] Reimplement fabd intrinsics & merge rtl patterns

2016-05-27 Thread Jiong Wang



On 27/05/16 14:31, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:23:58AM +0100, Jiong Wang wrote:

These intrinsics were implemented before "fabd_3" introduces.
Meanwhile
the patterns "fabd_3" and "*fabd_scalar3" can be merged into a
single "fabd3" using VALLF.

This patch migrate the implementation to builtins backed by this pattern.

gcc/
2016-05-23  Jiong Wang 

 * config/aarch64/aarch64-builtins.def (fabd): New builtins
for modes
 VALLF.
 * config/aarch64/aarch64-simd.md (fabd_3): Extend
modes from VDQF
 to VALLF.
 "*fabd_scalar3): Delete.
 * config/aarch64/arm_neon.h (vabds_f32): Remove inline assembly.
 Use builtin.
 (vabdd_f64): Likewise.
 (vabd_f32): Likewise.
 (vabdq_f32): Likewise.
 (vabdq_f64): Likewise.


This ChangeLog format is wrong.

It looks like you've missed vabd_f64, could you please add that?


vabd_f64 is not there before this patch. so I haven't touched it.




Re: [AArch64, 6/6] Reimplement vpadd intrinsics & extend rtl patterns to all modes

2016-05-27 Thread Jiong Wang



On 27/05/16 14:42, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:24:03AM +0100, Jiong Wang wrote:

These intrinsics was implemented by inline assembly using "faddp"
instruction.
There was a pattern "aarch64_addpv4sf" which supportsV4SF mode only
while we can
extend this pattern to support VDQF mode, then we can reimplement these
intrinsics through builtlins.

gcc/
2016-05-23  Jiong Wang 

 * config/aarch64/aarch64-builtins.def (faddp): New builtins
for modes in VDQF.
 * config/aarch64/aarch64-simd.md (aarch64_faddp): New.
 (arch64_addpv4sf): Delete.
 (reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
 "gen_aarch64_addpv4sf".
 * gcc/config/aarch64/iterators.md (UNSPEC_FADDP): New.
 * config/aarch64/arm_neon.h (vpadd_f32): Remove inline
assembly.  Use
 builtin.
 (vpaddq_f32): Likewise.
 (vpaddq_f64): Likewise.

This ChangeLog format is incorrect.

You've missed vpaddd_f64 and vpadds_f32, could you add those?


vpaddd_f64 is already there without inline assembly.


This patch cleans up those intrinsics with symmetric vector input and 
output.
vpadds_f32 looks to me is doing reduce job the return value is scalar 
instead of vector thus
can't fit well by the touched pattern. I can clean it up with a seperate 
patch. Is this OK?





Thanks,
James





Re: [PATCH][ARM] Tie operand 1 to operand 0 in AESMC pattern when fusing AES/AESMC

2016-05-27 Thread Kyrill Tkachov


On 20/05/16 11:04, Kyrill Tkachov wrote:

Hi all,

The recent -frename-registers change exposed a deficiency in the way we fuse 
AESE/AESMC instruction
pairs in arm.

Basically we want to enforce:
AESE Vn, _
AESMC Vn, Vn

to enable the fusion, but regrename comes along and renames the output Vn 
register in AESMC to something
else, killing the fusion in the hardware.

The solution in this patch is to add an alternative that ties the input and 
output registers in the AESMC pattern
and enable that alternative when the fusion is enabled.

With this patch I've confirmed that the above preferred register sequence is 
kept even with -frename-registers
when tuning for a cpu that enables the fusion and that the chain is broken by 
regrename otherwise and have
seen the appropriate improvement in a proprietary benchmark (that I cannot 
name) that exercises this sequence.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?



Following James's feedback on the AArch64 version, this slightly modified 
version uses the enum type for the argument of the new function.
Is this ok instead?

Thanks,
Kyrill

2016-05-27  Kyrylo Tkachov  

* config/arm/arm.c (arm_fusion_enabled_p): New function.
* config/arm/arm-protos.h (arm_fusion_enabled_p): Declare prototype.
* config/arm/crypto.md (crypto_, CRYPTO_UNARY):
Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cf221d6793eaf0959f2713fe0903a5d8602ec2f4..12a781de13f2f7816cc2b16b04835d87c83f7abb 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -320,6 +320,7 @@ extern int vfp3_const_double_for_bits (rtx);
 
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 	   rtx);
+extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 #endif /* RTX_CODE */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5110d9e989d605a9e2c262e6007b89a1c7dc7080..39a24c06c123b86883134368ef39794abf11898b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29704,6 +29704,13 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
   return false;
 }
 
+/* Return true iff the instruction fusion described by OP is enabled.  */
+bool
+arm_fusion_enabled_p (tune_params::fuse_ops op)
+{
+  return current_tune->fusible_ops & op;
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index c6f17270b1dbaf6dc43eb1e9b8a182dbb0f5a1e1..0f510f069408471fcbf6751f161e984f39929813 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -18,14 +18,27 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; .
 
+
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;AESE Vn, _
+;;AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "crypto_"
-  [(set (match_operand: 0 "register_operand" "=w")
+  [(set (match_operand: 0 "register_operand" "=w,w")
 (unspec: [(match_operand: 1
-   "register_operand" "w")]
+   "register_operand" "0,w")]
  CRYPTO_UNARY))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q1"
-  [(set_attr "type" "")]
+  [(set_attr "type" "")
+   (set_attr_alternative "enabled"
+ [(if_then_else (match_test
+		   "arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)")
+		 (const_string "yes" )
+		 (const_string "no"))
+  (const_string "yes")])]
 )
 
 (define_insn "crypto_"


Re: [AArch64, 6/6] Reimplement vpadd intrinsics & extend rtl patterns to all modes

2016-05-27 Thread James Greenhalgh
On Tue, May 24, 2016 at 09:24:03AM +0100, Jiong Wang wrote:
> These intrinsics was implemented by inline assembly using "faddp"
> instruction.
> There was a pattern "aarch64_addpv4sf" which supportsV4SF mode only
> while we can
> extend this pattern to support VDQF mode, then we can reimplement these
> intrinsics through builtlins.
> 
> gcc/
> 2016-05-23  Jiong Wang 
> 
> * config/aarch64/aarch64-builtins.def (faddp): New builtins
> for modes in VDQF.
> * config/aarch64/aarch64-simd.md (aarch64_faddp): New.
> (arch64_addpv4sf): Delete.
> (reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
> "gen_aarch64_addpv4sf".
> * gcc/config/aarch64/iterators.md (UNSPEC_FADDP): New.
> * config/aarch64/arm_neon.h (vpadd_f32): Remove inline
> assembly.  Use
> builtin.
> (vpaddq_f32): Likewise.
> (vpaddq_f64): Likewise.

This ChangeLog format is incorrect.

You've missed vpaddd_f64 and vpadds_f32, could you add those?

Thanks,
James



Re: [PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled

2016-05-27 Thread Kyrill Tkachov


On 26/05/16 11:17, James Greenhalgh wrote:

On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote:

Hi all,

The recent -frename-registers change exposed a deficiency in the way we fuse
AESE/AESMC instruction pairs in aarch64.

Basically we want to enforce:
 AESE Vn, _
 AESMC Vn, Vn

to enable the fusion, but regrename comes along and renames the output Vn
register in AESMC to something else, killing the fusion in the hardware.

The solution in this patch is to add an alternative that ties the input and
output registers in the AESMC pattern and enable that alternative when the
fusion is enabled.

With this patch I've confirmed that the above preferred register sequence is
kept even with -frename-registers when tuning for a cpu that enables the
fusion and that the chain is broken by regrename otherwise and have seen the
appropriate improvement in a proprietary benchmark (that I cannot name) that
exercises this sequence.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-05-20  Kyrylo Tkachov  

 * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
 * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
 prototype.
 * config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi):
 Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
  bool aarch64_expand_movmem (rtx *);
  bool aarch64_float_const_zero_rtx_p (rtx);
  bool aarch64_function_arg_regno_p (unsigned);
+bool aarch64_fusion_enabled_p (unsigned int);

This argument type should be "enum aarch64_fusion_pairs".


  bool aarch64_gen_movmemqi (rtx *);
  bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
  bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
return false;
  }
  
+/* Return true iff the instruction fusion described by OP is enabled.  */

+
+bool
+aarch64_fusion_enabled_p (unsigned int op)
+{
+  return (aarch64_tune_params.fusible_ops & op) != 0;
+}
+

A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your
new function would be nice.

OK with the change to argument type.


Thanks, here is what I'm committing.
The patch to use the new function in aarch_macro_fusion_pair_p is in testing.

Kyrill


2016-05-27  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
* config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
prototype.
* config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi):
Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.


Thanks,
James



diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 98c81baad827f3c0b0c7054dd97d3d0989af3796..6e97c01cf768ec5ca2f18e795a8085b8a247f5b6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
+bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
 bool aarch64_gen_movmemqi (rtx *);
 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
 bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 59a578f5937a240b325af22021bbd662230ed404..8be69357eb086e288ae838dc536be8a2ebe0463b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5401,13 +5401,25 @@ (define_insn "aarch64_crypto_aesv16qi"
   [(set_attr "type" "crypto_aese")]
 )
 
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;AESE Vn, _
+;;AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "aarch64_crypto_aesv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "w")]
+  [(set (match_operand:V16QI 0 "register_operand" "=w,w")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0,w")]
 	 CRYPTO_AESMC))]
   "TARGET_SIMD && TARGET_CRYPTO"
   "aes\\t%0.16b, %1.16b"
-  [(set_attr "type" "crypto_aesmc")]
+  [(set_attr "type" "crypto_aesmc")
+   (set_attr_alternative "enabled"
+ [(if_then_else (match_test
+		 

Re: [AArch64, 5/6] Reimplement fabd intrinsics & merge rtl patterns

2016-05-27 Thread James Greenhalgh
On Tue, May 24, 2016 at 09:23:58AM +0100, Jiong Wang wrote:
> These intrinsics were implemented before "fabd_3" introduces.
> Meanwhile
> the patterns "fabd_3" and "*fabd_scalar3" can be merged into a
> single "fabd3" using VALLF.
> 
> This patch migrate the implementation to builtins backed by this pattern.
> 
> gcc/
> 2016-05-23  Jiong Wang 
> 
> * config/aarch64/aarch64-builtins.def (fabd): New builtins
> for modes
> VALLF.
> * config/aarch64/aarch64-simd.md (fabd_3): Extend
> modes from VDQF
> to VALLF.
> "*fabd_scalar3): Delete.
> * config/aarch64/arm_neon.h (vabds_f32): Remove inline assembly.
> Use builtin.
> (vabdd_f64): Likewise.
> (vabd_f32): Likewise.
> (vabdq_f32): Likewise.
> (vabdq_f64): Likewise.
> 

This ChangeLog format is wrong.

It looks like you've missed vabd_f64, could you please add that?

> From 9bafb58055d4e379df7b626acd6aa80bdb0d4b22 Mon Sep 17 00:00:00 2001
> From: "Jiong.Wang" 
> Date: Mon, 23 May 2016 12:12:53 +0100
> Subject: [PATCH 5/6] 5
> 
> ---
>  gcc/config/aarch64/aarch64-builtins.def |  3 ++
>  gcc/config/aarch64/aarch64-simd.md  | 23 +++--
>  gcc/config/aarch64/arm_neon.h   | 87 
> -
>  3 files changed, 42 insertions(+), 71 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.def 
> b/gcc/config/aarch64/aarch64-builtins.def
> index 1955d17..40baebe 100644
> --- a/gcc/config/aarch64/aarch64-builtins.def
> +++ b/gcc/config/aarch64/aarch64-builtins.def
> @@ -465,3 +465,6 @@
>  
>/* Implemented by aarch64_rsqrts.  */
>BUILTIN_VALLF (BINOP, rsqrts, 0)
> +
> +  /* Implemented by fabd_3.  */

This comment is incorrect, it should say "Implemented by fabd3.",
without the underscore.

> +  BUILTIN_VALLF (BINOP, fabd, 3)

Thanks,
James



Re: [AArch64, 3/6] Reimplement frsqrte intrinsics

2016-05-27 Thread James Greenhalgh
On Tue, May 24, 2016 at 09:23:48AM +0100, Jiong Wang wrote:
> These intrinsics were implemented before the instruction pattern
> "aarch64_rsqrte" added, that these intrinsics were implemented through
> inline assembly.
> 
> This mirgrate the implementation to builtin.
> 
> gcc/
> 2016-05-23  Jiong Wang 
> 
> * config/aarch64/aarch64-builtins.def (rsqrte): New builtins
> for modes
> VALLF.
> * config/aarch64/aarch64-simd.md (aarch64_rsqrte_2):
> Rename to
> "aarch64_rsqrte".
> * config/aarch64/aarch64.c (get_rsqrte_type): Update gen* name.
> * config/aarch64/arm_neon.h (vrsqrts_f32): Remove inline
> assembly.  Use
> builtin.
> (vrsqrted_f64): Likewise.
> (vrsqrte_f32): Likewise.
> (vrsqrteq_f32): Likewise.
> (vrsqrteq_f64): Likewise.

This ChangeLog is not in the correct form. 

It looks like you are missing vrsqrte_f64, could you please add that?

Thanks,
James



Re: [AArch64, 4/6] Reimplement frsqrts intrinsics

2016-05-27 Thread James Greenhalgh
On Tue, May 24, 2016 at 09:23:53AM +0100, Jiong Wang wrote:
> Similar as [3/6], these intrinsics were implemented before the instruction
> pattern "aarch64_rsqrts" added, that these intrinsics were implemented
> through inline assembly.
> 
> This mirgrate the implementation to builtin.
> 
> gcc/
> 2016-05-23  Jiong Wang 
> 
> * config/aarch64/aarch64-builtins.def (rsqrts): New builtins
> for modes
> VALLF.
> * config/aarch64/aarch64-simd.md (aarch64_rsqrts_3):
> Rename to
> "aarch64_rsqrts".
> * config/aarch64/aarch64.c (get_rsqrts_type): Update gen* name.
> * config/aarch64/arm_neon.h (vrsqrtss_f32): Remove inline
> assembly.  Use
> builtin.
> (vrsqrtsd_f64): Likewise.
> (vrsqrts_f32): Likewise.
> (vrsqrtsq_f32): Likewise.
> (vrsqrtsq_f64): Likewise.

This ChangeLog format is incorrect.

It looks like you're missing vrsqrts_f64, could you please add that?

Thanks,
James



Enable loop peeling at -O3

2016-05-27 Thread Jan Hubicka
Hi,
this patch enabled -fpeel-loops by default at -O3 and makes it to use likely
upper bound estimates.  The patch also adds -fpeel-all-loops flag that is
symmetric to -funroll-all-loops.  Long time ago we used to interpret
-fpeel-loops this way and blindly peel every loop but this behaviour got lost
and now we only peel loop we have some evidence for.

Bootstrapped/regtested x86_64-linux, I am retesting after last minute change
(adding of the testcase). OK?

Honza

* common.opt (flag_peel_all_loops): New option.
* doc/invoke.texi: (-fpeel-loops): Update documentation.
(-fpeel-all-loops): Document.
* opts.c (default_options): Add OPT_fpeel_loops to -O3+.
* toplev.c (process_options): flag_peel_all_loops implies
flag_peel_loops.
* tree-ssa-lop-ivcanon.c (try_peel_loop): Update comment; handle
-fpeel-all-loops, use likely estimates.

* gcc.dg/tree-ssa/peel1.c: New testcase.
* gcc.dg/tree-ssa/peel2.c: New testcase.
Index: common.opt
===
--- common.opt  (revision 236815)
+++ common.opt  (working copy)
@@ -1840,6 +1840,10 @@ fpeel-loops
 Common Report Var(flag_peel_loops) Optimization
 Perform loop peeling.
 
+fpeel-all-loops
+Common Report Var(flag_peel_all_loops) Optimization
+Perform loop peeling of all loops.
+
 fpeephole
 Common Report Var(flag_no_peephole,0) Optimization
 Enable machine specific peephole optimizations.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236815)
+++ doc/invoke.texi (working copy)
@@ -8661,10 +8661,17 @@ the loop is entered.  This usually makes
 @item -fpeel-loops
 @opindex fpeel-loops
 Peels loops for which there is enough information that they do not
-roll much (from profile feedback).  It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis).  It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
 
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and @option{-fprofile-use}.
+
+@item -fpeel-all-loops
+@opindex fpeel-all-loops
+Peel all loops, even if their number of iterations is uncertain when
+the loop is entered.  For loops with large number of iterations this leads
+to wasted code size.
 
 @item -fmove-loop-invariants
 @opindex fmove-loop-invariants
Index: opts.c
===
--- opts.c  (revision 236815)
+++ opts.c  (working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
 
 /* -Ofast adds optimizations to -O3.  */
 { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===
--- testsuite/gcc.dg/tree-ssa/peel1.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c   (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+  int i;
+  for (i=0;ia[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 
"ivcanon"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===
--- testsuite/gcc.dg/tree-ssa/peel2.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c   (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
+void add(int *a,int l)
+{
+  int i;
+  for (i=0;iinner)
 {
   if (dump_file)
@@ -969,12 +971,16 @@ try_peel_loop (struct loop *loop,
   /* Check if there is an estimate on the number of iterations.  */
   npeel = estimated_loop_iterations_int (loop);
   if (npeel < 0)
+npeel = likely_max_loop_iterations_int (loop);
+  if (npeel < 0 && flag_peel_all_loops)
+npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1;
+  if (npeel < 0)
 {
   if (dump_file)
 fprintf (dump_file, "Not peeling: number of iterations is not "
 "estimated\n");
   return false;
 }
   if (maxiter >= 0 && maxiter <= npeel)
 {
   if (dump_file)


Re: [PR71252][PR71269] Fix trunk errors due to stmt_to_insert

2016-05-27 Thread Richard Biener
On Fri, May 27, 2016 at 2:36 PM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 27 May 2016 at 19:56, Richard Biener  wrote:
>> On Thu, May 26, 2016 at 11:32 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Jakub,
>>>
>>>
>>> On 26 May 2016 at 18:18, Jakub Jelinek  wrote:
 On Thu, May 26, 2016 at 02:17:56PM +1000, Kugan Vivekanandarajah wrote:
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -3767,8 +3767,10 @@ swap_ops_for_binary_stmt (vec ops,
>operand_entry temp = *oe3;
>oe3->op = oe1->op;
>oe3->rank = oe1->rank;
> +  oe3->stmt_to_insert = oe1->stmt_to_insert;
>oe1->op = temp.op;
>oe1->rank= temp.rank;
> +  oe1->stmt_to_insert = temp.stmt_to_insert;

 If you want to swap those 3 fields (what about the others?), can't you 
 write
   std::swap (oe1->op, oe3->op);
   std::swap (oe1->rank, oe3->rank);
   std::swap (oe1->stmt_to_insert, oe3->stmt_to_insert);
 instead and drop operand_entry temp = *oe3; ?

>  }
>else if ((oe1->rank == oe3->rank
>   && oe2->rank != oe3->rank)
> @@ -3779,8 +3781,10 @@ swap_ops_for_binary_stmt (vec ops,
>operand_entry temp = *oe2;
>oe2->op = oe1->op;
>oe2->rank = oe1->rank;
> +  oe2->stmt_to_insert = oe1->stmt_to_insert;
>oe1->op = temp.op;
>oe1->rank = temp.rank;
> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>  }

 Similarly.
>>>
>>> Done. Revised patch attached.
>>
>> Your patch only adds a single testcase, please make sure to include
>> _all_ relevant testcases.
>>
>> The swap should simply swap the whole operand, thus
>>
>>  std::swap (*oe1, *oe3);
>>
>
> Thanks for the review.
>
> I will change this.
>
>> it was probably not updated when all the other fields were added.
>>
>> I don't like the find_insert_point changes or the change before
>
> If we insert the stmt_to_insert before the find_insert_point, we can
> end up inserting stmt_to_insert before its argument defining stmt.
> This can be seen with f951 cp2k_single_file.f90 -O3 -ffast-math
> -march=westmere from PR71252. I am attaching the CFG when all the
> insert_stmt_before_use are moved before.

Hmm, but then this effectively means we should have find_insert_point
for inserting to_insert stmts in the first place?  That is, in
insert_stmt_before_use use find_insert_point on the stmt_to_insert
ops?

> I dont understand Fortran and I am not able to reduce a testcase from this.
>
>> build_and_add_sum.
>> Why not move the if (stmt1) insert; if (stmt2) insert; before the if
>> () unconditionally?
>
> In this case also, we dont know where build_and_add_sum will insert
> the new instruction. It may not be stmts[i] before calling
> build_and_add_sum. Therefore we can end up inserting in a wrong place.
> testcase gfortran.dg/pr71252.f90 would ICE.

So split off build_and_add_sum_1 which does not do stmt insertion and
instead treat it like a to_insert stmt at this point (simply insert it
at stmts[i]
or using find_insert_point)?

>>
>> Do we make progress with just the rest of the changes?  If so please split 
>> the
>> patch and include relevant testcases.
>
> I think there are two issues.
> 1. the swap which is obvious
> 2. insertion poing which has some related changes and shows two
> different problems (listed above)
>
> if you prefer, I can send two patches for the above.

Yes please.

> Unfortunately, I am not able to reduce Fortran testcase. Any help from
> anyone is really appreciated.
>
>
> Thanks,
> Kugan
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Kugan


Re: [AArch64, 1/6] Reimplement scalar fixed-point intrinsics

2016-05-27 Thread James Greenhalgh
On Tue, May 24, 2016 at 09:23:36AM +0100, Jiong Wang wrote:
> This patch reimplement scalar intrinsics for conversion between floating-
> point and fixed-point.
> 
> Previously, all such intrinsics are implemented through inline assembly.
> This patch added RTL pattern for these operations that those intrinsics
> can be implemented through builtins.
> 
> gcc/
> 2016-05-23  Jiong Wang
> 
> * config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New
> (TYPES_BINOP_SUS): Likewise.
> (aarch64_simd_builtin_data): Update include file name.
> (aarch64_builtins): Likewise.
> * config/aarch64/aarch64-simd-builtins.def: Rename to
> aarch64-builtins.def.

Why? We already have some number of intrinsics in here that are not
strictly SIMD, but I don't see the value in the rename?

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 223a4cc..d463808 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -75,6 +75,8 @@
>  UNSPEC_CRC32H
>  UNSPEC_CRC32W
>  UNSPEC_CRC32X
> +UNSPEC_FCVTZS_SCALAR
> +UNSPEC_FCVTZU_SCALAR

Why do we need special "scalar" versions of the unspec? The operation is
semantically the same between the scalar and vector versions.

>  UNSPEC_URECPE
>  UNSPEC_FRECPE
>  UNSPEC_FRECPS
> @@ -105,6 +107,7 @@
>  UNSPEC_NOP
>  UNSPEC_PRLG_STK
>  UNSPEC_RBIT
> +UNSPEC_SCVTF_SCALAR
>  UNSPEC_SISD_NEG
>  UNSPEC_SISD_SSHL
>  UNSPEC_SISD_USHL
> @@ -122,6 +125,7 @@
>  UNSPEC_TLSLE24
>  UNSPEC_TLSLE32
>  UNSPEC_TLSLE48
> +UNSPEC_UCVTF_SCALAR

> +(define_int_iterator FCVT_F2FIXED [UNSPEC_FCVTZS UNSPEC_FCVTZU])
> +(define_int_iterator FCVT_FIXED2F [UNSPEC_SCVTF UNSPEC_UCVTF])
> +(define_int_iterator FCVT_F2FIXED_SCALAR [UNSPEC_FCVTZS_SCALAR 
> UNSPEC_FCVTZU_SCALAR])
> +(define_int_iterator FCVT_FIXED2F_SCALAR [UNSPEC_SCVTF_SCALAR 
> UNSPEC_UCVTF_SCALAR])

Again, do we need the "SCALAR" versions at all?

Thanks,
James



Re: [PATCH, ARM] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-27 Thread Kyrill Tkachov


On 27/05/16 13:51, Thomas Preudhomme wrote:

On Tuesday 24 May 2016 18:00:27 Kyrill Tkachov wrote:

Hi Thomas,

Hi Kyrill,



+/* Nonzero if chip supports Thumb.  */
+extern int arm_arch_thumb;
+

Bit of bikeshedding really, but I think a better name would be
arm_arch_thumb1.
This is because we also have the macros TARGET_THUMB and TARGET_THUMB2
where TARGET_THUMB2 means either Thumb-1 or Thumb-2 and a casual reader
might think that arm_arch_thumb means that there is support for either.

Fixed.


Also, please add a simple test that compiles something with -march=armv5
(plus -marm) and checks that __ARM_ARCH_ISA_THUMB is not defined.

Fixed too.

Please find the updated in attachment. ChangeLog entries are now:

*** gcc/ChangeLog ***

2016-05-26  Thomas Preud'homme  

 * config/arm/arm-protos.h (arm_arch_thumb1): Declare.
 * config/arm/arm.c (arm_arch_thumb1): Define.
 (arm_option_override): Initialize arm_arch_thumb1.
 * config/arm/arm.h (arm_arch_thumb1): Declare.
 (TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb to determine if target
 support Thumb-1 ISA.


*** gcc/testsuite/ChangeLog ***

2016-05-26  Thomas Preud'homme  

 * gcc.target/arm/armv5_thumb_isa.c: New test.



Ok.
Thanks,
Kyrill


Given the renaming I've redone the testing and confirmed that the patch still
works as intended.

Best regards,

Thomas




Re: [PATCH, ARM] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-27 Thread Thomas Preudhomme
On Tuesday 24 May 2016 18:00:27 Kyrill Tkachov wrote:
> Hi Thomas,

Hi Kyrill,


> > 
> > +/* Nonzero if chip supports Thumb.  */
> > +extern int arm_arch_thumb;
> > +
> 
> Bit of bikeshedding really, but I think a better name would be
> arm_arch_thumb1.
> This is because we also have the macros TARGET_THUMB and TARGET_THUMB2
> where TARGET_THUMB2 means either Thumb-1 or Thumb-2 and a casual reader
> might think that arm_arch_thumb means that there is support for either.

Fixed.

> 
> Also, please add a simple test that compiles something with -march=armv5
> (plus -marm) and checks that __ARM_ARCH_ISA_THUMB is not defined.

Fixed too.

Please find the updated in attachment. ChangeLog entries are now:

*** gcc/ChangeLog ***

2016-05-26  Thomas Preud'homme  

* config/arm/arm-protos.h (arm_arch_thumb1): Declare.
* config/arm/arm.c (arm_arch_thumb1): Define.
(arm_option_override): Initialize arm_arch_thumb1.
* config/arm/arm.h (arm_arch_thumb1): Declare.
(TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb to determine if target
support Thumb-1 ISA.


*** gcc/testsuite/ChangeLog ***

2016-05-26  Thomas Preud'homme  

* gcc.target/arm/armv5_thumb_isa.c: New test.


Given the renaming I've redone the testing and confirmed that the patch still 
works as intended.

Best regards,

Thomasdiff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d8179c441bb53dced94d2ebf497aad093e4ac600..34fd06a92d99cfcb7ece4da7f1a2957e0225e4fb 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -603,6 +603,9 @@ extern int arm_tune_cortex_a9;
interworking clean.  */
 extern int arm_cpp_interwork;
 
+/* Nonzero if chip supports Thumb 1.  */
+extern int arm_arch_thumb1;
+
 /* Nonzero if chip supports Thumb 2.  */
 extern int arm_arch_thumb2;
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index ad123dde991a3e4c4b9563ee6ebb84981767988f..1df676e7f844513c0b1b80be492965462557e25b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -478,6 +478,9 @@ extern int arm_tune_cortex_a9;
interworking clean.  */
 extern int arm_cpp_interwork;
 
+/* Nonzero if chip supports Thumb 1.  */
+extern int arm_arch_thumb1;
+
 /* Nonzero if chip supports Thumb 2.  */
 extern int arm_arch_thumb2;
 
@@ -2191,9 +2194,8 @@ extern int making_const_table;
 #define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
 
 /* The highest Thumb instruction set version supported by the chip.  */
-#define TARGET_ARM_ARCH_ISA_THUMB 		\
-  (arm_arch_thumb2 ? 2\
-	   : ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
+#define TARGET_ARM_ARCH_ISA_THUMB		\
+  (arm_arch_thumb2 ? 2 : (arm_arch_thumb1 ? 1 : 0))
 
 /* Expands to an upper-case char of the target's architectural
profile.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 71b51439dc7ba5be67671e9fb4c3f18040cce58f..2ceee9071cb6c079c203e5876ea7e4749a255169 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -852,6 +852,9 @@ int arm_tune_cortex_a9 = 0;
interworking clean.  */
 int arm_cpp_interwork = 0;
 
+/* Nonzero if chip supports Thumb 1.  */
+int arm_arch_thumb1;
+
 /* Nonzero if chip supports Thumb 2.  */
 int arm_arch_thumb2;
 
@@ -3170,6 +3173,7 @@ arm_option_override (void)
   arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
   arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
   arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
+  arm_arch_thumb1 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
   arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
   arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
 
diff --git a/gcc/testsuite/gcc.target/arm/armv5_thumb_isa.c b/gcc/testsuite/gcc.target/arm/armv5_thumb_isa.c
new file mode 100644
index ..80a00aec978778e848ea47d1eb00974fe7b0d3f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv5_thumb_isa.c
@@ -0,0 +1,8 @@
+/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-add-options arm_arch_v5 } */
+
+#if __ARM_ARCH_ISA_THUMB
+#error "__ARM_ARCH_ISA_THUMB defined for ARMv5"
+#endif
+
+int foo;


Re: [PATCH, ARM] Remove unused TARGET_ARM_V*M macros

2016-05-27 Thread Kyrill Tkachov


On 27/05/16 13:32, Thomas Preudhomme wrote:

Hi,

TARGET_ARM_V6M and TARGET_ARM_v7M defined in gcc/config/arm/arm.h appears to be
unused. This patch removes them.

ChangeLog entry is obvious:


*** gcc/ChangeLog ***

2016-05-23  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_ARM_V6M): Remove.
 (TARGET_ARM_V7M): Likewise.


See patch in attachment.


Is this ok for trunk?


Ok if testing is clean.
Thanks for the cleanup.

Kyrill


Best regards,

Thomas




Re: [PATCH][Testsuite] Fix mips dsp testsuite mistakes

2016-05-27 Thread Paul Hua
I am wrong, I lost the sta16() in mips dsp manual.


[PATCH, ARM] Remove unused TARGET_ARM_V*M macros

2016-05-27 Thread Thomas Preudhomme
Hi,

TARGET_ARM_V6M and TARGET_ARM_v7M defined in gcc/config/arm/arm.h appears to be 
unused. This patch removes them.

ChangeLog entry is obvious:


*** gcc/ChangeLog ***

2016-05-23  Thomas Preud'homme  

* config/arm/arm.h (TARGET_ARM_V6M): Remove.
(TARGET_ARM_V7M): Likewise.


See patch in attachment.


Is this ok for trunk?

Best regards,

Thomasdiff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 82755b8aa6d6b35bd83644ae1e7c2b50c94a6d70..4bb10003bc378e11a0b90187d6258cfc99fe6830 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2196,9 +2196,6 @@ extern int making_const_table;
 #define TARGET_ARM_ARCH	\
   (arm_base_arch)	\
 
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
-
 /* The highest Thumb instruction set version supported by the chip.  */
 #define TARGET_ARM_ARCH_ISA_THUMB 		\
   (arm_arch_thumb2 ? 2\


Re: [PATCH] Fixes to must-tail-call tests

2016-05-27 Thread Thomas Preudhomme
Hi Rainer,

On Wednesday 25 May 2016 11:31:12 Rainer Orth wrote:
> David Malcolm  writes:
> > The following fixes the known failures of the must-tail-call tests.
> > 
> > Tested with --target=
> > * aarch64-unknown-linux-gnu
> > * ia64-unknown-linux-gnu
> > * m68k-unknown-linux-gnu
> > * x86_64-pc-linux-gnu
> 
> Even with this patch, there are still failures on sparc-sun-solaris2.12:
> 
> FAIL: gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so
> (test for excess errors)
> 
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c:1
> 2:10: error: cannot tail-call: target is not able to optimize the call into
> a sibling call
> 
> FAIL: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so
> (test for excess errors)
> 
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c:3
> 2:10: error: cannot tail-call: target is not able to optimize the call into
> a sibling call

Now that the logic is in place, you probably want to add sparc-sun-solaris in 
plugin.exp to the the list of architecture where tail call plugin tests should 
be skipped, alongside Thumb-1 ARM targets.

Best regards,

Thomas


Re: Record likely upper bounds for loops

2016-05-27 Thread Jan Hubicka
> likely_max_loop_iterations misses a function comment.

Thanks, updatted and comitted.

> 
> Ugh, one more widest_int in struct loop ... (oh well).  Given
> that (on x86_64) sizeof(widest_int) == 40 and sizeof(tree_int_cst) == 24
> (ok, that's cheating, it's with just one HWI for the number) it looks
> appealing to change the storage of these to 'tree' ... (as a followup,
> using uint128_type_node or so or whatever largest integer type a
> target supports).  Another option is to add a GCed wide_int that we
> can "allocate" - you can do this already by having a GTY HWI array
> and length and using wi::from_buffer ().  That way you'd avoid defining
> any tree type.

I am not big firend of using TREEs to represent things that are not exactly
part of IL. (and even in IL I would preffer seeing fewer of them).  For likely
upper bound we can also cap and consider only those bounds that fits in 64bit
type. Others are not useful anyway: loop will very likely not iterate more
than 2^64 times ;)

Honza


[Committed] S/390: Replace rtx_equal_p with reg_overlap_mentioned_p in splitter check.

2016-05-27 Thread Andreas Krebbel
gcc/ChangeLog:

2016-05-27  Andreas Krebbel  

* config/s390/s390.md (2x risbg splitters): Use
reg_overlap_mentioned_p instead of rtx_equal_p.
---
 gcc/ChangeLog   | 5 +
 gcc/config/s390/s390.md | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fd03e8c..ab4e9e8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-05-27  Andreas Krebbel  
+
+   * config/s390/s390.md (2x risbg splitters): Use
+   reg_overlap_mentioned_p instead of rtx_equal_p.
+
 2016-05-27  Kyrylo Tkachov  
 
* config/aarch64/aarch64-modes.def (CC_ZESWP, CC_SESWP): Delete.
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index caf8ed5..f8c61a8 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3926,7 +3926,7 @@
 (ashift:GPR (match_dup 3) (match_dup 4]
 {
   operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1);
-  if (rtx_equal_p (operands[0], operands[3]))
+  if (reg_overlap_mentioned_p (operands[0], operands[3]))
 {
   if (!can_create_pseudo_p ())
FAIL;
@@ -3954,7 +3954,7 @@
  (clobber (reg:CC CC_REGNUM))])]
 {
   operands[5] = GEN_INT ((1UL << UINTVAL (operands[4])) - 1);
-  if (rtx_equal_p (operands[0], operands[3]))
+  if (reg_overlap_mentioned_p (operands[0], operands[3]))
 {
   if (!can_create_pseudo_p ())
FAIL;
-- 
1.9.1



Re: [PATCH][AArch64] Remove aarch64_cannot_change_mode_class

2016-05-27 Thread Wilco Dijkstra
James Greenhalgh wrote:
> Which targets did you check? I'd hope aarch64_be-none-elf in addition to
> aarch64-none-linux-gnu.

I ran aarch64-none-elf, and just checked aarch64_be-none-elf passes too.

> Please refer to PR67609 in your ChangeLog so this gets tracked alongside
> the other fixes in that bug.

Sure.

>
> ChangeLog:
> 2016-05-19  Wilco Dijkstra  
>
>   * gcc/config/aarch64/aarch64.h
>   (CANNOT_CHANGE_MODE_CLASS): Remove.
>   * gcc/config/aarch64/aarch64.c
>   (aarch64_cannot_change_mode_class): Remove function.
>   * cc/config/aarch64/aarch64-protos.h
>   (aarch64_cannot_change_mode_class): Remove.
>



Re: Fix middle-end/71308

2016-05-27 Thread Richard Biener
On Fri, May 27, 2016 at 1:52 PM, Marek Polacek  wrote:
> A stupid mistake of mine.  When I introduced the should_remove_lhs_p call,
> I removed the check for LHS here, because should_remove_lhs_p checks that.
> Well, in this case the LHS was NULL, but gimple_call_fntype (stmt) had void
> type, so we went on and crashed on the SSA_NAME check.  Sorry.  But at least
> we have a test exercising this path.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Yes.

Richard.

> 2016-05-27  Marek Polacek  
>
> PR middle-end/71308
> * gimple-fold.c (gimple_fold_call): Check that LHS is not null.
>
> * g++.dg/torture/pr71308.C: New test.
>
> diff --git gcc/gimple-fold.c gcc/gimple-fold.c
> index d6657e9..600aa72 100644
> --- gcc/gimple-fold.c
> +++ gcc/gimple-fold.c
> @@ -3053,7 +3053,8 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
> inplace)
>   == void_type_node))
> gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
>   /* If the call becomes noreturn, remove the lhs.  */
> - if (gimple_call_noreturn_p (stmt)
> + if (lhs
> + && gimple_call_noreturn_p (stmt)
>   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
>   || should_remove_lhs_p (lhs)))
> {
> diff --git gcc/testsuite/g++.dg/torture/pr71308.C 
> gcc/testsuite/g++.dg/torture/pr71308.C
> index e69de29..ff5cd95 100644
> --- gcc/testsuite/g++.dg/torture/pr71308.C
> +++ gcc/testsuite/g++.dg/torture/pr71308.C
> @@ -0,0 +1,18 @@
> +// PR middle-end/71308
> +// { dg-do compile }
> +
> +class S
> +{
> +  void foo ();
> +  virtual void bar () = 0;
> +  virtual ~S ();
> +};
> +inline void
> +S::foo ()
> +{
> +  bar ();
> +};
> +S::~S ()
> +{
> +  foo ();
> +}
>
> Marek


Re: RFC [1/2] divmod transform

2016-05-27 Thread Richard Biener
On Fri, 27 May 2016, Prathamesh Kulkarni wrote:

> On 27 May 2016 at 15:45, Richard Biener  wrote:
> > On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 May 2016 at 12:52, Richard Biener  wrote:
> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 May 2016 at 19:39, Richard Biener  wrote:
> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 24 May 2016 at 17:42, Richard Biener  wrote:
> >> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >> >
> >> >> >> >> On 23 May 2016 at 17:35, Richard Biener 
> >> >> >> >>  wrote:
> >> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >> >> >> >  wrote:
> >> >> >> >> >> Hi,
> >> >> >> >> >> I have updated my patch for divmod (attached), which was 
> >> >> >> >> >> originally
> >> >> >> >> >> based on Kugan's patch.
> >> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and 
> >> >> >> >> >> TRUNC_MOD_EXPR
> >> >> >> >> >> having same operands to divmod representation, so we can cse 
> >> >> >> >> >> computation of mod.
> >> >> >> >> >>
> >> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> >> >> >> is transformed to:
> >> >> >> >> >> complex_tmp = DIVMOD (a, b);
> >> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >> >> >> >>
> >> >> >> >> >> * New hook divmod_expand_libfunc
> >> >> >> >> >> The rationale for introducing the hook is that different 
> >> >> >> >> >> targets have
> >> >> >> >> >> incompatible calling conventions for divmod libfunc.
> >> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> >> >> >> return quotient and store remainder in argument passed as 
> >> >> >> >> >> pointer,
> >> >> >> >> >> while the arm version takes two arguments and returns both
> >> >> >> >> >> quotient and remainder having mode double the size of the 
> >> >> >> >> >> operand mode.
> >> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> >> >> >> to generate call to target-specific divmod.
> >> >> >> >> >> Ports should define this hook if:
> >> >> >> >> >> a) The port does not have divmod or div insn for the given 
> >> >> >> >> >> mode.
> >> >> >> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned 
> >> >> >> >> >> and
> >> >> >> >> >> are of DImode.
> >> >> >> >> >>
> >> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> >> >> >> cross-tested on arm*-*-*.
> >> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> >> >> >> Does this patch look OK ?
> >> >> >> >> >
> >> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> >> >> >> > index 6b4601b..e4a021a 100644
> >> >> >> >> > --- a/gcc/targhooks.c
> >> >> >> >> > +++ b/gcc/targhooks.c
> >> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, 
> >> >> >> >> > machine_mode,
> >> >> >> >> > machine_mode, optimization_type)
> >> >> >> >> >return true;
> >> >> >> >> >  }
> >> >> >> >> >
> >> >> >> >> > +void
> >> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode 
> >> >> >> >> > mode,
> >> >> >> >> > +  rtx op0, rtx op1,
> >> >> >> >> > +  rtx *quot_p, rtx *rem_p)
> >> >> >> >> >
> >> >> >> >> > functions need a comment.
> >> >> >> >> >
> >> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 
> >> >> >> >> > style?  In that
> >> >> >> >> > case we could avoid the target hook.
> >> >> >> >> Well I would prefer adding the hook because that's more easier -;)
> >> >> >> >> Would it be ok for now to go with the hook ?
> >> >> >> >> >
> >> >> >> >> > +  /* If target overrides expand_divmod_libfunc hook
> >> >> >> >> > +then perform divmod by generating call to the 
> >> >> >> >> > target-specifc divmod
> >> >> >> >> > libfunc.  */
> >> >> >> >> > +  if (targetm.expand_divmod_libfunc != 
> >> >> >> >> > default_expand_divmod_libfunc)
> >> >> >> >> > +   return true;
> >> >> >> >> > +
> >> >> >> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> >> >> >> > +  return (mode == DImode && unsignedp);
> >> >> >> >> >
> >> >> >> >> > I don't understand this - we know optab_libfunc returns 
> >> >> >> >> > non-NULL for 'mode'
> >> >> >> >> > but still restrict this to DImode && unsigned?  Also if
> >> >> >> >> > targetm.expand_divmod_libfunc
> >> >> >> >> > is not the default we expect the target to handle all modes?
> >> >> >> >> Ah indeed, the check for DImode is unnecessary.
> >> >> >> >> However I suppose the check for unsignedp should be there,
> >> >> >> >> since we want to generate call to __udivmoddi4 only if operand is 
> >

Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-05-27 Thread Richard Biener
On Fri, May 27, 2016 at 1:11 PM, Bin.Cheng  wrote:
> On Fri, May 27, 2016 at 11:45 AM, Richard Biener
>  wrote:
>> On Wed, May 25, 2016 at 1:22 PM, Bin Cheng  wrote:
>>> Hi,
>>> As analyzed in PR68303 and PR69710, vectorizer generates duplicated 
>>> computations in loop's pre-header basic block when creating base address 
>>> for vector reference to the same memory object.  Because the duplicated 
>>> code is out of loop, IVOPT fails to track base object for these vector 
>>> references, resulting in missed strength reduction.
>>> It's agreed that vectorizer should be improved to generate optimal 
>>> (IVOPT-friendly) code, the difficult part is we want a generic 
>>> infrastructure.  After investigation, I tried to introduce a generic/simple 
>>> local CSE interface by reusing existing algorithm/data-structure from 
>>> tree-ssa-dom (tree-ssa-scopedtables).  The interface runs local CSE for 
>>> each basic block in a bitmap, customers of this interface only need to 
>>> record basic blocks in the bitmap when necessary.  Note we don't need 
>>> scopedtables' unwinding facility since the interface runs only for single 
>>> basic block, this should be good in terms of compilation time.
>>> Besides CSE issue, this patch also re-associates address expressions in 
>>> vect_create_addr_base_for_vector_ref, specifically, it splits constant 
>>> offset and adds it back near the expression root in IR.  This is necessary 
>>> because GCC only handles re-association for commutative operators in CSE.
>>>
>>> I checked its impact on various test cases.
>>> With this patch, PR68030's generated assembly is reduced from ~750 lines to 
>>> ~580 lines on x86_64, with both pre-header and loop body simplified.  But,
>>> 1) It doesn't fix all the problem on x86_64.  Root cause is computation for 
>>> base address of the first reference is somehow moved outside of loop's 
>>> pre-header, local CSE can't help in this case.  Though 
>>> split_constant_offset can back track ssa def chain, it causes possible 
>>> redundant when there is no CSE opportunities in pre-header.
>>> 2) It causes regression for PR68030 on AArch64.  I think the regression is 
>>> caused by IVOPT issues which are exposed by this patch.  Checks on offset 
>>> validity in get_address_cost is wrong/inaccurate now.  It considers an 
>>> offset as valid if it's within the maximum offset range that backend 
>>> supports.  This is not true, for example, AArch64 requires aligned offset 
>>> additionally.  For example, LDR [base + 2060] is invalid for V4SFmode, 
>>> although "2060" is within the maximum offset range.  Another issue is also 
>>> in get_address_cost.  Inaccurate cost is computed for "base + offset + 
>>> INDEX" address expression.  When register pressure is low, "base+offset" 
>>> can be hoisted out and we can use [base + INDEX] addressing mode, whichhis 
>>> is current behavior.
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.
>>
>> It looks quite straight-forward with the caveat that it has one
>> obvious piece that is not in the order
>> of the complexity of a basic-block.  threadedge_initialize_values
>> creates the SSA value array
> I noticed this too, and think it's better to get rid of this init/fini
> functions by some kind re-design.  I found it's quite weird to call
> threadege_X in tree-vrp.c.  I will keep investigating this.
>> which is zero-initialized (upon use).  That's probably a non-issue for
>> the use you propose for
>> the vectorizer (call cse_bbs once per function).  As Ideally I would
>> like this facility to replace
>> the loop unrollers own propagate_constants_for_unrolling it might
>> become an issue though.
>> In that regard the unroller facility is also more powerful because it
>> handles regions (including
>> PHIs).
> With the current data-structure, I think it's not very hard to extend
> the interface to regions.  I will keep investigating this too.  BTW,
> if it's okay, I tend to do that in following patches.

I'm fine with doing enhancements to this in followup patches (adding Jeff
to CC for his opinions).

>>
>> Note that in particular for SLP vectorization the vectorizer itself
>> may end up creating quite
>> some redundancies so I wonder if it's worth to CSE the vectorized loop
>> body as well
> Maybe.  The next step is condition block created by vectorizer.  Both
> prune_runtime_alias_test_list and generated alias checks are
> sub-optimal now, even worse, somehow computations in alias checks can
> be propagated to loop pre-header.  With help of this interface, alias
> checks (and later code) can be simplified.

Yeah.

Thanks,
Richard.

>> (and given we have PHIs eventually CSE the whole vectorized loop with
>> pre-header as a region...)
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>>
>>> 2016-05-17 Bin Cheng  
>>>
>>> PR tree-optimization/68030
>>> PR tree-optimization/69710
>>> * tree-ssa-dom.c (cse_bbs): New function.
>>> * tree-ssa-dom.h (cse_bbs): N

Re: Record likely upper bounds for loops

2016-05-27 Thread Richard Biener
On Fri, 27 May 2016, Jan Hubicka wrote:

> Hi,
> this patch adds infrastructure to tree-ssa-loop-niter to record likely upper 
> bounds.
> The basic idea is that it is easier to get likely bounds that 100% safe 
> bounds or
> realistic estimates and the bound can be effectively used to trim down 
> optimizations
> that are good idea only for large trip counts. This patch only updates the
> infrastructure. I have two followup patches. First turns current optimizers to
> use the bound (and adds testcase) and second enables loop peeling at -O3 and 
> makes
> us to peel in cases where we would fully unroll if the loop bound was safe.
> This improves some benchmarks, like John the ripper.
> 
> Currently likely upper bounds are derived in two cases
>  1) for arrays we can't prove to be non-trailing
>  2) for array accesses in conditional.  I.e.
> int a[3];
> for (int i = 0; t(); i++)
>   if (q())
>   a[i]++;
> It is easy to add more cases, for example when the only unbounded loopback 
> path contains
> unlikely edge.
> 
> Bootstrapped/regtested x86_64-linux, OK?

likely_max_loop_iterations misses a function comment.

Ugh, one more widest_int in struct loop ... (oh well).  Given
that (on x86_64) sizeof(widest_int) == 40 and sizeof(tree_int_cst) == 24
(ok, that's cheating, it's with just one HWI for the number) it looks
appealing to change the storage of these to 'tree' ... (as a followup,
using uint128_type_node or so or whatever largest integer type a
target supports).  Another option is to add a GCed wide_int that we
can "allocate" - you can do this already by having a GTY HWI array
and length and using wi::from_buffer ().  That way you'd avoid defining
any tree type.

Ok with the comment added.

Thanks,
Richard.

> Honza
> 
>   * cfgloop.c (record_niter_bound): Record likely upper bounds.
>   (likely_max_stmt_executions_int, get_likely_max_loop_iterations,
>   get_likely_max_loop_iterations_int): New.
>   * cfgloop.h (struct loop): Add nb_iterations_likely_upper_bound,
>   any_likely_upper_bound.
>   (get_likely_max_loop_iterations_int, get_likely_max_loop_iterations):
>   Declare.
>   * cfgloopmanip.c (copy_loop_info): Copy likely upper bounds.
>   * loop-unroll.c (unroll_loop_constant_iterations): Update likely
>   upper bound.
>   (unroll_loop_constant_iterations): Likewise.
>   (unroll_loop_runtime_iterations): Likewise.
>   * lto-streamer-in.c (input_cfg): Stream likely upper bounds.
>   * lto-streamer-out.c (output_cfg): Likewise.
>   * tree-ssa-loop-ivcanon.c (try_peel_loop): Update likely upper
>   bounds.
>   (canonicalize_loop_induction_variables): Dump likely upper bounds.
>   * tree-ssa-loop-niter.c (record_estimate): Record likely upper bounds.
>   (likely_max_loop_iterations): New.
>   (likely_max_loop_iterations_int): New.
>   (likely_max_stmt_executions): New.
>   * tree-ssa-loop-niter.h (likely_max_loop_iterations,
>   likely_max_loop_iterations_int, likely_max_stmt_executions_int,
>   likely_max_stmt_executions): Declare.
> Index: cfgloop.c
> ===
> --- cfgloop.c (revision 236762)
> +++ cfgloop.c (working copy)
> @@ -1790,6 +1790,11 @@ record_niter_bound (struct loop *loop, c
>  {
>loop->any_upper_bound = true;
>loop->nb_iterations_upper_bound = i_bound;
> +  if (!loop->any_likely_upper_bound)
> + {
> +   loop->any_likely_upper_bound = true;
> +   loop->nb_iterations_likely_upper_bound = i_bound;
> + }
>  }
>if (realistic
>&& (!loop->any_estimate
> @@ -1798,6 +1803,13 @@ record_niter_bound (struct loop *loop, c
>loop->any_estimate = true;
>loop->nb_iterations_estimate = i_bound;
>  }
> +  if (!realistic
> +  && (!loop->any_likely_upper_bound
> +  || wi::ltu_p (i_bound, loop->nb_iterations_likely_upper_bound)))
> +{
> +  loop->any_likely_upper_bound = true;
> +  loop->nb_iterations_likely_upper_bound = i_bound;
> +}
>  
>/* If an upper bound is smaller than the realistic estimate of the
>   number of iterations, use the upper bound instead.  */
> @@ -1806,6 +1818,11 @@ record_niter_bound (struct loop *loop, c
>&& wi::ltu_p (loop->nb_iterations_upper_bound,
>   loop->nb_iterations_estimate))
>  loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
> +  if (loop->any_upper_bound
> +  && loop->any_likely_upper_bound
> +  && wi::ltu_p (loop->nb_iterations_upper_bound,
> + loop->nb_iterations_likely_upper_bound))
> +loop->nb_iterations_likely_upper_bound = loop->nb_iterations_upper_bound;
>  }
>  
>  /* Similar to get_estimated_loop_iterations, but returns the estimate only
> @@ -1847,6 +1864,25 @@ max_stmt_executions_int (struct loop *lo
>return snit < 0 ? -1 : snit;
>  }
>  
> +/* Returns a likely upper bound on the number of executions of statements

Fix middle-end/71308

2016-05-27 Thread Marek Polacek
A stupid mistake of mine.  When I introduced the should_remove_lhs_p call,
I removed the check for LHS here, because should_remove_lhs_p checks that.
Well, in this case the LHS was NULL, but gimple_call_fntype (stmt) had void
type, so we went on and crashed on the SSA_NAME check.  Sorry.  But at least
we have a test exercising this path.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-27  Marek Polacek  

PR middle-end/71308
* gimple-fold.c (gimple_fold_call): Check that LHS is not null.

* g++.dg/torture/pr71308.C: New test.

diff --git gcc/gimple-fold.c gcc/gimple-fold.c
index d6657e9..600aa72 100644
--- gcc/gimple-fold.c
+++ gcc/gimple-fold.c
@@ -3053,7 +3053,8 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
  == void_type_node))
gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
  /* If the call becomes noreturn, remove the lhs.  */
- if (gimple_call_noreturn_p (stmt)
+ if (lhs
+ && gimple_call_noreturn_p (stmt)
  && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
  || should_remove_lhs_p (lhs)))
{
diff --git gcc/testsuite/g++.dg/torture/pr71308.C gcc/testsuite/g++.dg/torture/pr71308.C
index e69de29..ff5cd95 100644
--- gcc/testsuite/g++.dg/torture/pr71308.C
+++ gcc/testsuite/g++.dg/torture/pr71308.C
@@ -0,0 +1,18 @@
+// PR middle-end/71308
+// { dg-do compile }
+
+class S
+{
+  void foo ();
+  virtual void bar () = 0;
+  virtual ~S ();
+};
+inline void
+S::foo ()
+{
+  bar ();
+};
+S::~S ()
+{
+  foo ();
+}

Marek


Re: RFC [1/2] divmod transform

2016-05-27 Thread Prathamesh Kulkarni
On 27 May 2016 at 15:45, Richard Biener  wrote:
> On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
>
>> On 25 May 2016 at 12:52, Richard Biener  wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 May 2016 at 19:39, Richard Biener  wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 May 2016 at 17:42, Richard Biener  wrote:
>> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 23 May 2016 at 17:35, Richard Biener 
>> >> >> >>  wrote:
>> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> >> >  wrote:
>> >> >> >> >> Hi,
>> >> >> >> >> I have updated my patch for divmod (attached), which was 
>> >> >> >> >> originally
>> >> >> >> >> based on Kugan's patch.
>> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and 
>> >> >> >> >> TRUNC_MOD_EXPR
>> >> >> >> >> having same operands to divmod representation, so we can cse 
>> >> >> >> >> computation of mod.
>> >> >> >> >>
>> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> >> is transformed to:
>> >> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >> >>
>> >> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> >> The rationale for introducing the hook is that different targets 
>> >> >> >> >> have
>> >> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> >> return quotient and store remainder in argument passed as 
>> >> >> >> >> pointer,
>> >> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> >> quotient and remainder having mode double the size of the 
>> >> >> >> >> operand mode.
>> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> >> to generate call to target-specific divmod.
>> >> >> >> >> Ports should define this hook if:
>> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> >> >> are of DImode.
>> >> >> >> >>
>> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> >> Does this patch look OK ?
>> >> >> >> >
>> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> >> > index 6b4601b..e4a021a 100644
>> >> >> >> > --- a/gcc/targhooks.c
>> >> >> >> > +++ b/gcc/targhooks.c
>> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, 
>> >> >> >> > machine_mode,
>> >> >> >> > machine_mode, optimization_type)
>> >> >> >> >return true;
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> > +void
>> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> >> >> > +  rtx op0, rtx op1,
>> >> >> >> > +  rtx *quot_p, rtx *rem_p)
>> >> >> >> >
>> >> >> >> > functions need a comment.
>> >> >> >> >
>> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 
>> >> >> >> > style?  In that
>> >> >> >> > case we could avoid the target hook.
>> >> >> >> Well I would prefer adding the hook because that's easier -;)
>> >> >> >> Would it be ok for now to go with the hook ?
>> >> >> >> >
>> >> >> >> > +  /* If target overrides expand_divmod_libfunc hook
>> >> >> >> > +then perform divmod by generating call to the 
>> >> >> > target-specific divmod
>> >> >> >> > libfunc.  */
>> >> >> >> > +  if (targetm.expand_divmod_libfunc != 
>> >> >> >> > default_expand_divmod_libfunc)
>> >> >> >> > +   return true;
>> >> >> >> > +
>> >> >> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> >> > +  return (mode == DImode && unsignedp);
>> >> >> >> >
>> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL 
>> >> >> >> > for 'mode'
>> >> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> >> > targetm.expand_divmod_libfunc
>> >> >> >> > is not the default we expect the target to handle all modes?
>> >> >> >> Ah indeed, the check for DImode is unnecessary.
>> >> >> >> However I suppose the check for unsignedp should be there,
>> >> >> >> since we want to generate call to __udivmoddi4 only if operand is 
>> >> >> >> unsigned ?
>> >> >> >
>> >> >> > The optab libfunc for sdivmod should be NULL in that case.
>> >> >> Ah indeed, thanks.
>> >> >> >
>> >> >> >> >
>> >> >> >> > That said - I expected the above piece to be simply a 'return 
>> >> >> >> > true;' ;)
>> >> >> >> >
>> >> >> >> > Usually we use so

Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-05-27 Thread Richard Biener
On Fri, May 27, 2016 at 11:09 AM, Alan Hayward  wrote:
> This patch is a reworking of the previous vectorize inductions that are
> live
> after the loop patch.
> It now supports SLP and an optimisation has been moved to patch [3/3].
>
>
> Stmts which are live (ie: defined inside a loop and then used after the
> loop)
> are not currently supported by the vectorizer.  In many cases
> vectorization can
> still occur because the SCEV cprop pass will hoist the definition of the
> stmt
> outside of the loop before the vectorizer pass. However, there are various
> cases SCEV cprop cannot hoist, for example:
>   for (i = 0; i < n; ++i)
> {
>   ret = x[i];
>   x[i] = i;
> }
>return i;
>
> Currently stmts are marked live using a bool, and the relevant state using
> an
> enum. Both these states are propagated to the definition of all uses of the
> stmt. Also, a stmt can be live but not relevant.
>
> This patch vectorizes a live stmt definition normally within the loop and
> then
> after the loop uses BIT_FIELD_REF to extract the final scalar value from
> the
> vector.
>
> This patch adds a new relevant state (vect_used_only_live) for when a stmt
> is
> used only outside the loop. The relevant state is still propagated to all
> its
> uses, but the live bool is not (this ensures that
> vectorizable_live_operation
> is only called with stmts that really are live).
>
> Tested on x86 and aarch64.

+  /* If STMT is a simple assignment and its inputs are invariant, then it can
+ remain in place, unvectorized.  The original last scalar value that it
+ computes will be used.  */
+  if (is_simple_and_all_uses_invariant (stmt, loop_vinfo))
 {

so we can't use STMT_VINFO_RELEVANT or so?  I thought we somehow
mark stmts we don't vectorize in the end.

+  lhs = (is_a <gphi *> (stmt)) ? gimple_phi_result (stmt)
+   : gimple_get_lhs (stmt);
+  lhs_type = TREE_TYPE (lhs);
+
+  /* Find all uses of STMT outside the loop.  */
+  auto_vec<gimple *> worklist;
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+{
+  basic_block bb = gimple_bb (use_stmt);
+
+  if (!flow_bb_inside_loop_p (loop, bb))
+   worklist.safe_push (use_stmt);
 }
+  gcc_assert (!worklist.is_empty ());

as we are in loop-closed SSA there should be exactly one such use...?

+  /* Get the correct slp vectorized stmt.  */
+  vec_oprnds.create (num_vec);
+  vect_get_slp_vect_defs (slp_node, &vec_oprnds);

As you look at the vectorized stmt you can directly use the SLP_TREE_VEC_STMTS
array (the stmts lhs, of course), no need to export this function.

The rest of the changes look ok to me.

Thanks,
Richard.


> gcc/
> * tree-vect-loop.c (vect_analyze_loop_operations): Allow live
> stmts.
> (vectorizable_reduction): Check for new relevant state.
> (vectorizable_live_operation): vectorize live stmts using
> BIT_FIELD_REF.  Remove special case for gimple assign stmts.
> * tree-vect-stmts.c (is_simple_and_all_uses_invariant): New
> function.
> (vect_stmt_relevant_p): Check for stmts which are only used live.
> (process_use): Use of a stmt does not inherit its live value.
> (vect_mark_stmts_to_be_vectorized): Simplify relevance inheritance.
> (vect_analyze_stmt): Check for new relevant state.
> * tree-vect-slp.c (vect_get_slp_vect_defs): Make global.
> * tree-vectorizer.h (vect_relevant): New entry for a stmt which is
> used outside the loop, but not inside it.
>
> testsuite/
> * gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize.
> * gcc.dg/vect/no-scevccp-vect-iv-2.c: Remove xfail.
> * gcc.dg/vect/vect-live-1.c: New test.
> * gcc.dg/vect/vect-live-2.c: New test.
> * gcc.dg/vect/vect-live-3.c: New test.
> * gcc.dg/vect/vect-live-4.c: New test.
> * gcc.dg/vect/vect-live-5.c: New test.
> * gcc.dg/vect/vect-live-slp-1.c: New test.
> * gcc.dg/vect/vect-live-slp-2.c: New test.
> * gcc.dg/vect/vect-live-slp-3.c: New test.
> * gcc.dg/vect/vect-live-slp-4.c: New test.
>
>
>
> Alan.
>


Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

2016-05-27 Thread James Greenhalgh
On Tue, Mar 15, 2016 at 03:31:30PM +, James Greenhalgh wrote:
> On Mon, Mar 07, 2016 at 10:54:25PM -0800, Andrew Pinski wrote:
> > On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix)  
> > wrote:
> > >> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix)  
> > >> wrote:
> > >> > Hi,
> > >> >
> > >> > As discussed in LKML:
> > >> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
> > >>  the
> > >> cost of changing a cache line
> > >> > from shared to exclusive state can be significant on aarch64 cores,
> > >> especially when this is triggered by an exclusive store, since it may
> > >> > result in having to retry the transaction.
> > >> > This patch makes use of the "prfm PSTL1STRM" instruction to 
> > >> > prefetch
> > >> cache lines for write prior to ldxr/stxr loops generated by the ll/sc 
> > >> atomic
> > >> routines.
> > >> > Bootstrapped on AArch64 server, is it OK?
> > >>
> > >>
> > >> I don't think this is a good thing in general.  For an example on 
> > >> ThunderX, the
> > >> prefetch just adds a cycle for no benefit.  This really depends on the
> > >> micro-architecture of the core and how LDXR/STXR are
> > >> implemented.   So after this patch, it will slow down ThunderX.
> > >>
> > >> Thanks,
> > >> Andrew Pinski
> > >>
> > >
> > > Hi Andrew,
> > >
> > >I am not quite clear about the ThunderX micro-arch.  But, Yes, I agree
> > >it depends on the micro-architecture of the core.  As the mentioned
> > >kernel patch is merged upstream, I think the added prefetch instruction
> > >in atomic routines is good for most of AArch64 cores in the market.  If
> > >it does nothing good for ThunderX, then how about adding some checking
> > >here?  I mean disabling the the generation of the prfm if we are tuning
> > >for ThunderX.
> > 
> > No, it does not just do no good; it actually causes worse
> > performance for ThunderX.  How about only doing it for the
> > micro-architecture where it helps and also not do it for generic since
> > it hurts ThunderX so much.
> 
> This should be a GCC 7 patch at this point, which should give us some time
> to talk through whether we want this patch or not.
> 
> How bad is this for ThunderX - upthread you said one cycle penalty, but here
> you suggest it hurts ThunderX more? Note that the prefetch is outside of
> the LDXR/STXR loop.

Hi Andrew,

Did you have any further thoughts on the magnitude of the penalty you
would face on ThunderX? 

Felix, I think if this is going to be expensive for some AArch64 platforms,
then the best way to progress this patch would be to add a new flag to
tune_flags. Something like AARCH64_EXTRA_TUNE_PREFETCH_LDREX. This would
allow targets which want the explicit prefetch to enable it.
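
For concreteness, the guard in the ll/sc expansion path might then look
roughly like this (a sketch only: the flag name is just the suggestion
above, and aarch64_emit_store_prefetch is a hypothetical helper that would
emit the "prfm PSTL1STRM" before the ldxr/stxr loop):

  if (aarch64_tune_params.extra_tuning_flags
      & AARCH64_EXTRA_TUNE_PREFETCH_LDREX)
    aarch64_emit_store_prefetch (mem);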

Thanks,
James



Record likely upper bounds for loops

2016-05-27 Thread Jan Hubicka
Hi,
this patch adds infrastructure to tree-ssa-loop-niter to record likely upper 
bounds.
The basic idea is that it is easier to get likely bounds than 100% safe bounds 
or
realistic estimates and the bound can be effectively used to trim down 
optimizations
that are a good idea only for large trip counts. This patch only updates the
infrastructure. I have two followup patches. First turns current optimizers to
use the bound (and adds testcase) and second enables loop peeling at -O3 and 
makes
us peel in cases where we would fully unroll if the loop bound was safe.
This improves some benchmarks, like John the ripper.
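
To make the intended use concrete, a consumer of the new bound might guard a
size-sensitive transform roughly like this (a sketch only; the threshold and
the surrounding transform are placeholders, likely_max_stmt_executions_int
is from the patch below):

  HOST_WIDE_INT likely = likely_max_stmt_executions_int (loop);
  /* Skip a transform that only pays off for large trip counts.  */
  if (likely != -1 && likely < threshold)
    return false;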

Currently likely upper bounds are derived in two cases
 1) for arrays we can't prove to be non-trailing
 2) for array accesses in conditional.  I.e.
int a[3];
for (int i = 0; t(); i++)
  if (q())
a[i]++;
It is easy to add more cases, for example when the only unbounded loopback path 
contains
an unlikely edge.
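
An example of case 1: when an array may be a trailing array at the end of a
struct, its declared bound is only a likely upper bound, not a safe one
(illustrative sketch):

  struct s { int n; int a[3]; };  /* a[] may be used as a trailing array */
  void f (struct s *p)
  {
    for (int i = 0; t (); i++)
      p->a[i]++;                  /* 3 iterations is likely, not guaranteed */
  }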

Bootstrapped/regtested x86_64-linux, OK?

Honza

* cfgloop.c (record_niter_bound): Record likely upper bounds.
(likely_max_stmt_executions_int, get_likely_max_loop_iterations,
get_likely_max_loop_iterations_int): New.
* cfgloop.h (struct loop): Add nb_iterations_likely_upper_bound,
any_likely_upper_bound.
(get_likely_max_loop_iterations_int, get_likely_max_loop_iterations):
Declare.
* cfgloopmanip.c (copy_loop_info): Copy likely upper bounds.
* loop-unroll.c (unroll_loop_constant_iterations): Update likely
upper bound.
(unroll_loop_constant_iterations): Likewise.
(unroll_loop_runtime_iterations): Likewise.
* lto-streamer-in.c (input_cfg): Stream likely upper bounds.
* lto-streamer-out.c (output_cfg): Likewise.
* tree-ssa-loop-ivcanon.c (try_peel_loop): Update likely upper
bounds.
(canonicalize_loop_induction_variables): Dump likely upper bounds.
* tree-ssa-loop-niter.c (record_estimate): Record likely upper bounds.
(likely_max_loop_iterations): New.
(likely_max_loop_iterations_int): New.
(likely_max_stmt_executions): New.
* tree-ssa-loop-niter.h (likely_max_loop_iterations,
likely_max_loop_iterations_int, likely_max_stmt_executions_int,
likely_max_stmt_executions): Declare.
Index: cfgloop.c
===
--- cfgloop.c   (revision 236762)
+++ cfgloop.c   (working copy)
@@ -1790,6 +1790,11 @@ record_niter_bound (struct loop *loop, c
 {
   loop->any_upper_bound = true;
   loop->nb_iterations_upper_bound = i_bound;
+  if (!loop->any_likely_upper_bound)
+   {
+ loop->any_likely_upper_bound = true;
+ loop->nb_iterations_likely_upper_bound = i_bound;
+   }
 }
   if (realistic
   && (!loop->any_estimate
@@ -1798,6 +1803,13 @@ record_niter_bound (struct loop *loop, c
   loop->any_estimate = true;
   loop->nb_iterations_estimate = i_bound;
 }
+  if (!realistic
+  && (!loop->any_likely_upper_bound
+  || wi::ltu_p (i_bound, loop->nb_iterations_likely_upper_bound)))
+{
+  loop->any_likely_upper_bound = true;
+  loop->nb_iterations_likely_upper_bound = i_bound;
+}
 
   /* If an upper bound is smaller than the realistic estimate of the
  number of iterations, use the upper bound instead.  */
@@ -1806,6 +1818,11 @@ record_niter_bound (struct loop *loop, c
   && wi::ltu_p (loop->nb_iterations_upper_bound,
loop->nb_iterations_estimate))
 loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
+  if (loop->any_upper_bound
+  && loop->any_likely_upper_bound
+  && wi::ltu_p (loop->nb_iterations_upper_bound,
+   loop->nb_iterations_likely_upper_bound))
+loop->nb_iterations_likely_upper_bound = loop->nb_iterations_upper_bound;
 }
 
 /* Similar to get_estimated_loop_iterations, but returns the estimate only
@@ -1847,6 +1864,25 @@ max_stmt_executions_int (struct loop *lo
   return snit < 0 ? -1 : snit;
 }
 
+/* Returns a likely upper bound on the number of executions of statements
+   in the LOOP.  For statements before the loop exit, this exceeds
+   the number of executions of the latch by one.  */
+
+HOST_WIDE_INT
+likely_max_stmt_executions_int (struct loop *loop)
+{
+  HOST_WIDE_INT nit = get_likely_max_loop_iterations_int (loop);
+  HOST_WIDE_INT snit;
+
+  if (nit == -1)
+return -1;
+
+  snit = (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) nit + 1);
+
+  /* If the computation overflows, return -1.  */
+  return snit < 0 ? -1 : snit;
+}
+
 /* Sets NIT to the estimated number of executions of the latch of the
LOOP.  If we have no reliable estimate, the function returns false, 
otherwise
returns true.  */
@@ -1899,6 +1935,40 @@ get_max_loop_iterations_int (struct loop
 return -1;
 
   if (!wi::fits_shwi_p (nit))
+return -1;
+  hwi_nit = nit.to_shwi ();
+
+  return hwi_nit < 0 ? 

Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-05-27 Thread Bin.Cheng
On Fri, May 27, 2016 at 11:45 AM, Richard Biener
 wrote:
> On Wed, May 25, 2016 at 1:22 PM, Bin Cheng  wrote:
>> Hi,
>> As analyzed in PR68030 and PR69710, vectorizer generates duplicated 
>> computations in loop's pre-header basic block when creating base address for 
>> vector reference to the same memory object.  Because the duplicated code is 
>> out of loop, IVOPT fails to track base object for these vector references, 
>> resulting in missed strength reduction.
>> It's agreed that vectorizer should be improved to generate optimal 
>> (IVOPT-friendly) code, the difficult part is we want a generic 
>> infrastructure.  After investigation, I tried to introduce a generic/simple 
>> local CSE interface by reusing existing algorithm/data-structure from 
>> tree-ssa-dom (tree-ssa-scopedtables).  The interface runs local CSE for each 
>> basic block in a bitmap, customers of this interface only need to record 
>> basic blocks in the bitmap when necessary.  Note we don't need scopedtables' 
>> unwinding facility since the interface runs only for a single basic block, 
>> this should be good in terms of compilation time.
>> Besides CSE issue, this patch also re-associates address expressions in 
>> vect_create_addr_base_for_vector_ref, specifically, it splits constant 
>> offset and adds it back near the expression root in IR.  This is necessary 
>> because GCC only handles re-association for commutative operators in CSE.
>>
>> I checked its impact on various test cases.
>> With this patch, PR68030's generated assembly is reduced from ~750 lines to 
>> ~580 lines on x86_64, with both pre-header and loop body simplified.  But,
>> 1) It doesn't fix all the problems on x86_64.  Root cause is that computation
>> for the base address of the first reference is somehow moved outside of the
>> loop's pre-header, so local CSE can't help in this case.  Though
>> split_constant_offset can back track the ssa def chain, it can cause possible
>> redundancy when there are no CSE opportunities in the pre-header.
>> 2) It causes regression for PR68030 on AArch64.  I think the regression is 
>> caused by IVOPT issues which are exposed by this patch.  Checks on offset 
>> validity in get_address_cost are wrong/inaccurate now.  It considers an 
>> offset as valid if it's within the maximum offset range that backend 
>> supports.  This is not true, for example, AArch64 requires aligned offset 
>> additionally.  For example, LDR [base + 2060] is invalid for V4SFmode, 
>> although "2060" is within the maximum offset range.  Another issue is also 
>> in get_address_cost.  Inaccurate cost is computed for "base + offset + 
>> INDEX" address expression.  When register pressure is low, "base+offset" can 
>> be hoisted out and we can use [base + INDEX] addressing mode, which is 
>> current behavior.
>>
>> Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.
>
> It looks quite straight-forward with the caveat that it has one
> obvious piece that is not in the order
> of the complexity of a basic-block.  threadedge_initialize_values
> creates the SSA value array
I noticed this too, and think it's better to get rid of these init/fini
functions by some kind of re-design.  I found it's quite weird to call
threadedge_X in tree-vrp.c.  I will keep investigating this.
> which is zero-initialized (upon use).  That's probably a non-issue for
> the use you propose for
> the vectorizer (call cse_bbs once per function).  Ideally I would
> like this facility to replace
> the loop unroller's own propagate_constants_for_unrolling, it might
> become an issue though.
> In that regard the unroller facility is also more powerful because it
> handles regions (including
> PHIs).
With the current data-structure, I think it's not very hard to extend
the interface to regions.  I will keep investigating this too.  BTW,
if it's okay, I intend to do that in following patches.
>
> Note that in particular for SLP vectorization the vectorizer itself
> may end up creating quite
> some redundancies so I wonder if it's worth to CSE the vectorized loop
> body as well
Maybe.  The next step is the condition block created by the vectorizer.  Both
prune_runtime_alias_test_list and generated alias checks are
sub-optimal now, even worse, somehow computations in alias checks can
be propagated to loop pre-header.  With help of this interface, alias
checks (and later code) can be simplified.

> (and given we have PHIs eventually CSE the whole vectorized loop with
> pre-header as a region...)
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-05-17 Bin Cheng  
>>
>> PR tree-optimization/68030
>> PR tree-optimization/69710
>> * tree-ssa-dom.c (cse_bbs): New function.
>> * tree-ssa-dom.h (cse_bbs): New declaration.
>> * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
>> Re-associate address by splitting constant offset.
>> (vect_create_data_ref_ptr, vect_setup_realignment): Record changed
>> basic block.
>> * tree-vect-l

Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-05-27 Thread Richard Biener
On Wed, May 25, 2016 at 1:22 PM, Bin Cheng  wrote:
> Hi,
> As analyzed in PR68030 and PR69710, vectorizer generates duplicated 
> computations in loop's pre-header basic block when creating base address for 
> vector reference to the same memory object.  Because the duplicated code is 
> out of loop, IVOPT fails to track base object for these vector references, 
> resulting in missed strength reduction.
> It's agreed that vectorizer should be improved to generate optimal 
> (IVOPT-friendly) code, the difficult part is we want a generic 
> infrastructure.  After investigation, I tried to introduce a generic/simple 
> local CSE interface by reusing existing algorithm/data-structure from 
> tree-ssa-dom (tree-ssa-scopedtables).  The interface runs local CSE for each 
> basic block in a bitmap, customers of this interface only need to record 
> basic blocks in the bitmap when necessary.  Note we don't need scopedtables' 
> unwinding facility since the interface runs only for a single basic block; this 
> should be good in terms of compilation time.
> Besides CSE issue, this patch also re-associates address expressions in 
> vect_create_addr_base_for_vector_ref, specifically, it splits constant offset 
> and adds it back near the expression root in IR.  This is necessary because 
> GCC only handles re-association for commutative operators in CSE.
>
> I checked its impact on various test cases.
> With this patch, PR68030's generated assembly is reduced from ~750 lines to 
> ~580 lines on x86_64, with both pre-header and loop body simplified.  But,
> 1) It doesn't fix all the problems on x86_64.  Root cause is that computation
> for the base address of the first reference is somehow moved outside of the
> loop's pre-header, so local CSE can't help in this case.  Though
> split_constant_offset can back track the ssa def chain, it can cause possible
> redundancy when there are no CSE opportunities in the pre-header.
> 2) It causes regression for PR68030 on AArch64.  I think the regression is 
> caused by IVOPT issues which are exposed by this patch.  Checks on offset 
> validity in get_address_cost are wrong/inaccurate now.  It considers an offset 
> as valid if it's within the maximum offset range that backend supports.  This 
> is not true, for example, AArch64 requires aligned offset additionally.  For 
> example, LDR [base + 2060] is invalid for V4SFmode, although "2060" is within 
> the maximum offset range.  Another issue is also in get_address_cost.  
> Inaccurate cost is computed for "base + offset + INDEX" address expression.  
> When register pressure is low, "base+offset" can be hoisted out and we can 
> use [base + INDEX] addressing mode, which is current behavior.
>
> Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.

It looks quite straight-forward with the caveat that it has one
obvious piece that is not in the order
of the complexity of a basic-block.  threadedge_initialize_values
creates the SSA value array
which is zero-initialized (upon use).  That's probably a non-issue for
the use you propose for
the vectorizer (call cse_bbs once per function).  Ideally I would
like this facility to replace
the loop unroller's own propagate_constants_for_unrolling, it might
become an issue though.
In that regard the unroller facility is also more powerful because it
handles regions (including
PHIs).

Note that in particular for SLP vectorization the vectorizer itself
may end up creating quite
some redundancies so I wonder if it's worth to CSE the vectorized loop
body as well
(and given we have PHIs eventually CSE the whole vectorized loop with
pre-header as a region...)

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-05-17 Bin Cheng  
>
> PR tree-optimization/68030
> PR tree-optimization/69710
> * tree-ssa-dom.c (cse_bbs): New function.
> * tree-ssa-dom.h (cse_bbs): New declaration.
> * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
> Re-associate address by splitting constant offset.
> (vect_create_data_ref_ptr, vect_setup_realignment): Record changed
> basic block.
> * tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop): Record
> changed basic block.
> * tree-vectorizer.c (tree-ssa-dom.h): Include header file.
> (changed_bbs): New variable.
> (vectorize_loops): Allocate and free CHANGED_BBS.  Call cse_bbs.
> * tree-vectorizer.h (changed_bbs): New declaration.
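
For readers following the ChangeLog, the usage pattern it implies is roughly
this (a sketch; cse_bbs and changed_bbs come from the patch, the rest is
standard bitmap usage):

  changed_bbs = BITMAP_ALLOC (NULL);
  /* ... vectorize, recording each basic block that gets new code ...  */
  bitmap_set_bit (changed_bbs, bb->index);
  /* ... then run local CSE over just the recorded blocks.  */
  cse_bbs (changed_bbs);
  BITMAP_FREE (changed_bbs);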


Re: RFC [1/2] divmod transform

2016-05-27 Thread Richard Biener
On Wed, 25 May 2016, Prathamesh Kulkarni wrote:

> On 25 May 2016 at 12:52, Richard Biener  wrote:
> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 24 May 2016 at 19:39, Richard Biener  wrote:
> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 24 May 2016 at 17:42, Richard Biener  wrote:
> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 23 May 2016 at 17:35, Richard Biener  
> >> >> >> wrote:
> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >> >> >  wrote:
> >> >> >> >> Hi,
> >> >> >> >> I have updated my patch for divmod (attached), which was 
> >> >> >> >> originally
> >> >> >> >> based on Kugan's patch.
> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and 
> >> >> >> >> TRUNC_MOD_EXPR
> >> >> >> >> having same operands to divmod representation, so we can cse 
> >> >> >> >> computation of mod.
> >> >> >> >>
> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> >> >> is transformed to:
> >> >> >> >> complex_tmp = DIVMOD (a, b);
> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >> >> >>
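
(For concreteness, the source-level pattern this targets is simply, e.g.:

   unsigned q, r;
   void f (unsigned a, unsigned b)
   {
     q = a / b;   /* TRUNC_DIV_EXPR */
     r = a % b;   /* TRUNC_MOD_EXPR, same operands */
   }

 after the transform, one division or one divmod libcall serves both
 results.)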
> >> >> >> >> * New hook divmod_expand_libfunc
> >> >> >> >> The rationale for introducing the hook is that different targets 
> >> >> >> >> have
> >> >> >> >> incompatible calling conventions for divmod libfunc.
> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> >> >> while the arm version takes two arguments and returns both
> >> >> >> >> quotient and remainder having mode double the size of the operand 
> >> >> >> >> mode.
> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> >> >> to generate call to target-specific divmod.
> >> >> >> >> Ports should define this hook if:
> >> >> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> >> >> are of DImode.
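
(As C prototypes, the two calling conventions described above look roughly
like this; the first is libgcc's actual signature, the second is only a
sketch of the ARM-style shape, not a literal libgcc declaration:

   /* libgcc2.c style: quotient returned, remainder stored via pointer.  */
   unsigned long long __udivmoddi4 (unsigned long long a,
                                    unsigned long long b,
                                    unsigned long long *rem);

   /* ARM style: a single value of twice the operand width, packing the
      quotient in one half and the remainder in the other.  */
   unsigned long long udivmodsi4_arm_style (unsigned int a, unsigned int b);)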
> >> >> >> >>
> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> >> >> cross-tested on arm*-*-*.
> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> >> >> Does this patch look OK ?
> >> >> >> >
> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> >> >> > index 6b4601b..e4a021a 100644
> >> >> >> > --- a/gcc/targhooks.c
> >> >> >> > +++ b/gcc/targhooks.c
> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, 
> >> >> >> > machine_mode,
> >> >> >> > machine_mode, optimization_type)
> >> >> >> >return true;
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +void
> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> >> >> > +  rtx op0, rtx op1,
> >> >> >> > +  rtx *quot_p, rtx *rem_p)
> >> >> >> >
> >> >> >> > functions need a comment.
> >> >> >> >
> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 
> >> >> >> > style?  In that
> >> >> >> > case we could avoid the target hook.
> >> >> >> Well I would prefer adding the hook because that's easier -;)
> >> >> >> Would it be ok for now to go with the hook ?
> >> >> >> >
> >> >> >> > +  /* If target overrides expand_divmod_libfunc hook
> >> >> >> > +then perform divmod by generating call to the 
> >> >> >> > target-specific divmod
> >> >> >> > libfunc.  */
> >> >> >> > +  if (targetm.expand_divmod_libfunc != 
> >> >> >> > default_expand_divmod_libfunc)
> >> >> >> > +   return true;
> >> >> >> > +
> >> >> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> >> >> > +  return (mode == DImode && unsignedp);
> >> >> >> >
> >> >> >> > I don't understand this - we know optab_libfunc returns non-NULL 
> >> >> >> > for 'mode'
> >> >> >> > but still restrict this to DImode && unsigned?  Also if
> >> >> >> > targetm.expand_divmod_libfunc
> >> >> >> > is not the default we expect the target to handle all modes?
> >> >> >> Ah indeed, the check for DImode is unnecessary.
> >> >> >> However I suppose the check for unsignedp should be there,
> >> >> >> since we want to generate call to __udivmoddi4 only if operand is 
> >> >> >> unsigned ?
> >> >> >
> >> >> > The optab libfunc for sdivmod should be NULL in that case.
> >> >> Ah indeed, thanks.
> >> >> >
> >> >> >> >
> >> >> >> > That said - I expected the above piece to be simply a 'return 
> >> >> >> > true;' ;)
> >> >> >> >
> >> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if 
> >> >> >> > the target
> >> >> >> > supports a specific operation (for example SImode divmod would use 
> >> >> >> > DImo

Re: [PATCH, PR middle-end/71279] Avoid folding vec_cond_expr into comparison

2016-05-27 Thread Richard Biener
On Fri, May 27, 2016 at 11:02 AM, Ilya Enkovich  wrote:
> Hi,
>
> This patch disables the transformation of VEC_COND_EXPR into a comparison
> which became invalid after the introduction of boolean vectors.
>
> Bootstrapped and regtested for x86_64-pc-linux-gnu.  OK for trunk
> and gcc-6?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-27  Ilya Enkovich  
>
> PR middle-end/71279
> * fold-const.c (fold_ternary_loc): Don't fold VEC_COND_EXPR
> into comparison.
>
> gcc/testsuite/
>
> 2016-05-27  Ilya Enkovich  
>
> PR middle-end/71279
> * gcc.dg/pr71279.c: New test.
>
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 556fc73..5058746 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -11515,9 +11515,9 @@ fold_ternary_loc (location_t loc, enum tree_code 
> code, tree type,
>/* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>  over COND_EXPR in cases such as floating point comparisons.  */
>if (integer_zerop (op1)
> - && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
> -   : (integer_onep (op2)
> -  && !VECTOR_TYPE_P (type)))
> + && code == COND_EXPR
> + && integer_onep (op2)
> + && !VECTOR_TYPE_P (type)
>   && truth_value_p (TREE_CODE (arg0)))
> return pedantic_non_lvalue_loc (loc,
> fold_convert_loc (loc, type,
> diff --git a/gcc/testsuite/gcc.dg/pr71279.c b/gcc/testsuite/gcc.dg/pr71279.c
> new file mode 100644
> index 000..4ecc84b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr71279.c
> @@ -0,0 +1,14 @@
> +/* PR middle-end/71279 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-additional-options "-march=knl" { target { i?86-*-* x86_64-*-* } } } 
> */
> +
> +extern int a, b;
> +long c[1][1][1];
> +long d[1][1];
> +
> +void fn1 ()
> +{
> +  for (int e = 0; e < b; e = e + 1)
> +*(e + **c) = (a && *d[1]) - 1;
> +}


Re: [PATCH] Help PR70729, shuffle LIM and PRE

2016-05-27 Thread Richard Biener
On Thu, 26 May 2016, Christophe Lyon wrote:

> On 18 May 2016 at 12:55, Richard Biener  wrote:
> >
> > The following patch moves LIM before PRE to allow it to cleanup CSE
> > (and copyprop) opportunities LIM exposes.  It also moves the DCE done in
> > the loop pipeline to before it, as otherwise it is no longer executed
> > unconditionally at this point (since we have the no_loop pipeline).
> >
> > The patch requires some testsuite adjustments such as cope with LIM now
> > running before PRE and thus disabling the former and to adjust
> > for better optimization we now do in the two testcases with redundant
> > stores where store motion enables sinking to sink all interesting code
> > out of the innermost loop.
> >
> > It also requires the LIM PHI hoisting cost adjustment patch I am
> > testing separately.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu (with testsuite
> > fallout resulting in the following adjustments).
> >
> > I'm going to re-test before committing.
> >
> > Richard.
> 
> Hi Richard,
> 
> I've noticed that this patch introduces a regression on aarch64/arm targets:
> gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times optimized "&a" 1
> 
> because '&a' now appears twice in the log.
> 
> Actually, this is the only regression on aarch64, but on arm I've also
> noticed regressions on scev-5 and scev-3 (for armv5t for the latter)

See PR71237.

Richard.

> Christophe.
> 
> 
> >
> > 2016-05-18  Richard Biener  
> >
> > PR tree-optimization/70729
> > * passes.def: Move LIM pass before PRE.  Remove no longer
> > required copyprop and move first DCE out of the loop pipeline.
> >
> > * gcc.dg/autopar/outer-6.c: Adjust to avoid redundant store.
> > * gcc.dg/graphite/scop-18.c: Likewise.
> > * gcc.dg/pr41783.c: Disable LIM.
> > * gcc.dg/tree-ssa/loadpre10.c: Likewise.
> > * gcc.dg/tree-ssa/loadpre23.c: Likewise.
> > * gcc.dg/tree-ssa/loadpre24.c: Likewise.
> > * gcc.dg/tree-ssa/loadpre25.c: Likewise.
> > * gcc.dg/tree-ssa/loadpre4.c: Likewise.
> > * gcc.dg/tree-ssa/loadpre8.c: Likewise.
> > * gcc.dg/tree-ssa/ssa-pre-16.c: Likewise.
> > * gcc.dg/tree-ssa/ssa-pre-18.c: Likewise.
> > * gcc.dg/tree-ssa/ssa-pre-20.c: Likewise.
> > * gcc.dg/tree-ssa/ssa-pre-3.c: Likewise.
> > * gfortran.dg/pr42108.f90: Likewise.
> >
> > Index: trunk/gcc/passes.def
> > ===
> > --- trunk.orig/gcc/passes.def   2016-05-18 11:46:56.518134310 +0200
> > +++ trunk/gcc/passes.def2016-05-18 11:47:16.006355920 +0200
> > @@ -243,12 +243,14 @@ along with GCC; see the file COPYING3.
> >NEXT_PASS (pass_cse_sincos);
> >NEXT_PASS (pass_optimize_bswap);
> >NEXT_PASS (pass_laddress);
> > +  NEXT_PASS (pass_lim);
> >NEXT_PASS (pass_split_crit_edges);
> >NEXT_PASS (pass_pre);
> >NEXT_PASS (pass_sink_code);
> >NEXT_PASS (pass_sancov);
> >NEXT_PASS (pass_asan);
> >NEXT_PASS (pass_tsan);
> > +  NEXT_PASS (pass_dce);
> >/* Pass group that runs when 1) enabled, 2) there are loops
> >  in the function.  Make sure to run pass_fix_loops before
> >  to discover/remove loops before running the gate function
> > @@ -257,9 +259,6 @@ along with GCC; see the file COPYING3.
> >NEXT_PASS (pass_tree_loop);
> >PUSH_INSERT_PASSES_WITHIN (pass_tree_loop)
> >   NEXT_PASS (pass_tree_loop_init);
> > - NEXT_PASS (pass_lim);
> > - NEXT_PASS (pass_copy_prop);
> > - NEXT_PASS (pass_dce);
> >   NEXT_PASS (pass_tree_unswitch);
> >   NEXT_PASS (pass_scev_cprop);
> >   NEXT_PASS (pass_record_bounds);
> > Index: trunk/gcc/testsuite/gcc.dg/autopar/outer-6.c
> > ===
> > --- trunk.orig/gcc/testsuite/gcc.dg/autopar/outer-6.c   2016-01-20 
> > 15:36:51.477802338 +0100
> > +++ trunk/gcc/testsuite/gcc.dg/autopar/outer-6.c2016-05-18 
> > 12:40:29.342665450 +0200
> > @@ -24,7 +24,7 @@ void parloop (int N)
> >for (i = 0; i < N; i++)
> >{
> >  for (j = 0; j < N; j++)
> > -  y[i]=x[i][j];
> > +  y[i]+=x[i][j];
> >  sum += y[i];
> >}
> >g_sum = sum;
> > Index: trunk/gcc/testsuite/gcc.dg/graphite/scop-18.c
> > ===
> > --- trunk.orig/gcc/testsuite/gcc.dg/graphite/scop-18.c  2015-09-14 
> > 10:21:31.364089947 +0200
> > +++ trunk/gcc/testsuite/gcc.dg/graphite/scop-18.c   2016-05-18 
> > 12:38:35.673369299 +0200
> > @@ -13,13 +13,13 @@ void test (void)
> >for (i = 0; i < 24; i++)
> >  for (j = 0; j < 24; j++)
> >for (k = 0; k < 24; k++)
> > -A[i][j] = B[i][k] * C[k][j];
> > +A[i][j] += B[i][k] * C[k][j];
> >
> >/* These loops should still be strip mined.  */
> >for (i = 0; i < 1000; i++)
> >  for (j =

Re: [PR71252][PR71269] Fix trunk errors due to stmt_to_insert

2016-05-27 Thread Richard Biener
On Thu, May 26, 2016 at 11:32 AM, Kugan Vivekanandarajah
 wrote:
> Hi Jakub,
>
>
> On 26 May 2016 at 18:18, Jakub Jelinek  wrote:
>> On Thu, May 26, 2016 at 02:17:56PM +1000, Kugan Vivekanandarajah wrote:
>>> --- a/gcc/tree-ssa-reassoc.c
>>> +++ b/gcc/tree-ssa-reassoc.c
>>> @@ -3767,8 +3767,10 @@ swap_ops_for_binary_stmt (vec<operand_entry *> ops,
>>>operand_entry temp = *oe3;
>>>oe3->op = oe1->op;
>>>oe3->rank = oe1->rank;
>>> +  oe3->stmt_to_insert = oe1->stmt_to_insert;
>>>oe1->op = temp.op;
>>>oe1->rank= temp.rank;
>>> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>>
>> If you want to swap those 3 fields (what about the others?), can't you write
>>   std::swap (oe1->op, oe3->op);
>>   std::swap (oe1->rank, oe3->rank);
>>   std::swap (oe1->stmt_to_insert, oe3->stmt_to_insert);
>> instead and drop operand_entry temp = *oe3; ?
>>
>>>  }
>>>else if ((oe1->rank == oe3->rank
>>>   && oe2->rank != oe3->rank)
>>> @@ -3779,8 +3781,10 @@ swap_ops_for_binary_stmt (vec<operand_entry *> ops,
>>>operand_entry temp = *oe2;
>>>oe2->op = oe1->op;
>>>oe2->rank = oe1->rank;
>>> +  oe2->stmt_to_insert = oe1->stmt_to_insert;
>>>oe1->op = temp.op;
>>>oe1->rank = temp.rank;
>>> +  oe1->stmt_to_insert = temp.stmt_to_insert;
>>>  }
>>
>> Similarly.
>
> Done. Revised patch attached.

Your patch only adds a single testcase, please make sure to include
_all_ relevant testcases.

The swap should simply swap the whole operand, thus

 std::swap (*oe1, *oe3);

it was probably not updated when all the other fields were added.

I don't like the find_insert_point changes or the change before
build_and_add_sum.
Why not move the if (stmt1) insert; if (stmt2) insert; before the if
() unconditionally?

Do we make progress with just the rest of the changes?  If so please split the
patch and include relevant testcases.

Thanks,
Richard.

> Thanks,
> Kugan


Re: [PATCH][2/3][AArch64] Keep CTZ components together until after reload

2016-05-27 Thread James Greenhalgh
On Thu, May 26, 2016 at 10:53:07AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> In a similar rationale to patch 1/3 this patch changes the AArch64 backend to
> keep the CTZ expression as a single RTX until after reload when it is split
> into an RBIT and a CLZ instruction.  This enables CTZ-specific optimisations
> in the pre-reload RTL optimisers.
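
(The identity being exploited is ctz (x) == clz (rbit (x)).  A C model of
the post-split sequence, for illustration only:

   #include <stdint.h>

   unsigned ctz_model (uint32_t x)  /* x != 0, as for __builtin_ctz */
   {
     uint32_t r = 0;
     for (int i = 0; i < 32; i++)   /* stand-in for the RBIT instruction */
       r |= ((x >> i) & 1u) << (31 - i);
     return __builtin_clz (r);      /* the CLZ instruction */
   })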
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

Makes sense to me. Thanks for the patch.

OK.

Thanks,
James

> 2016-05-26  Kyrylo Tkachov  
> 
> PR middle-end/37780
> * config/aarch64/aarch64.md (ctz<mode>2): Convert to
> define_insn_and_split.



Re: [PATCH][AArch64] Simplify ashl<mode>3 expander for SHORT modes

2016-05-27 Thread James Greenhalgh
On Wed, Apr 27, 2016 at 03:10:47PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The ashl<mode>3 expander for QI and HI modes is needlessly obfuscated.
> The 2nd operand predicate accepts nonmemory_operand but the expand code
> FAILs if it's not a CONST_INT. We can just demand a const_int_operand in
> the predicate and remove the extra CONST_INT check.
> 
> Looking at git blame, it seems it was written that way as a result of some
> other refactoring a few years back for an unrelated change.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

This is OK.

Thanks for the cleanup,
James

> 2016-04-27  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (ashl<mode>3, SHORT modes):
> Use const_int_operand for operand 2 predicate.  Simplify expand code
> as a result.



Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-27 Thread Richard Biener
On Fri, 27 May 2016, Jan Hubicka wrote:

> Hi,
this is the version of the patch which bootstraps & regtests all languages at x86_64. OK?

The tree-pretty-print.c change is ok.  I think the

> -  if (stride)
> +  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)

should include a comment.

Richard.

> Honza
> 
>   * trans-types.c (gfc_array_range_type): Remove.
>   (gfc_init_types): Do not build gfc_array_range_type.
>   (gfc_get_array_type_bounds): Do not put unrealistic array bounds.
>   * trans-types.h (gfc_array_range_type): Remove.
>   * tree-pretty-print.c (dump_array_domain): Dump empty domain as
>   [0:].
> Index: fortran/trans-types.c
> ===
> --- fortran/trans-types.c (revision 236762)
> +++ fortran/trans-types.c (working copy)
> @@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
>  CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
>  
>  tree gfc_array_index_type;
> -tree gfc_array_range_type;
>  tree gfc_character1_type_node;
>  tree pvoid_type_node;
>  tree prvoid_type_node;
> @@ -945,12 +944,6 @@ gfc_init_types (void)
>  = build_pointer_type (build_function_type_list (void_type_node, 
> NULL_TREE));
>  
>gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
> -  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
> - since this function is called before gfc_init_constants.  */
> -  gfc_array_range_type
> -   = build_range_type (gfc_array_index_type,
> -   build_int_cst (gfc_array_index_type, 0),
> -   NULL_TREE);
>  
>/* The maximum array element size that can be handled is determined
>   by the number of bits available to store this field in the array
> @@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
>  
>/* We define data as an array with the correct size if possible.
>   Much better than doing pointer arithmetic.  */
> -  if (stride)
> +  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)
>  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
> int_const_binop (MINUS_EXPR, stride,
>  build_int_cst (TREE_TYPE 
> (stride), 1)));
>else
> -rtype = gfc_array_range_type;
> +rtype = NULL;
>arraytype = build_array_type (etype, rtype);
>arraytype = build_pointer_type (arraytype);
>if (restricted)
> Index: fortran/trans-types.h
> ===
> --- fortran/trans-types.h (revision 236762)
> +++ fortran/trans-types.h (working copy)
> @@ -24,7 +24,6 @@ along with GCC; see the file COPYING3.
>  #define GFC_BACKEND_H
>  
>  extern GTY(()) tree gfc_array_index_type;
> -extern GTY(()) tree gfc_array_range_type;
>  extern GTY(()) tree gfc_character1_type_node;
>  extern GTY(()) tree ppvoid_type_node;
>  extern GTY(()) tree pvoid_type_node;
> Index: tree-pretty-print.c
> ===
> --- tree-pretty-print.c   (revision 236762)
> +++ tree-pretty-print.c   (working copy)
> @@ -362,7 +362,7 @@ dump_array_domain (pretty_printer *pp, t
>   }
>  }
>else
> -pp_string (pp, "");
> +pp_string (pp, "0:");
>pp_right_bracket (pp);
>  }
>  
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH][3/3] No need to vectorize simple only-live stmts

2016-05-27 Thread Alan Hayward
Statements which are live but not relevant need marking to ensure they are
vectorized.

Live statements which are simple and all uses of them are invariant do not
need to be vectorized.

This patch adds a check to make sure those stmts which pass both the above
checks are not vectorized and then discarded.
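
An example of the kind of statement this covers (illustrative only):

  for (i = 0; i < n; i++)
    tmp = a + b;   /* simple, and a, b are loop-invariant */
  use (tmp);       /* only use is after the loop */

Here the scalar statement computes the same value on every iteration, so its
original last value can be used directly and vectorizing it buys nothing.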

Tested on x86 and aarch64.


gcc/
* tree-vect-stmts.c (vect_stmt_relevant_p): Do not vectorize
non-live relevant stmts which are simple and invariant.

testsuite/
* gcc.dg/vect/vect-live-slp-5.c: Remove dg check.



Alan.



live3.patch
Description: Binary data


[PATCH][2/3] Vectorize inductions that are live after the loop

2016-05-27 Thread Alan Hayward
This patch is a reworking of the previous vectorize inductions that are
live after the loop patch.
It now supports SLP and an optimisation has been moved to patch [3/3].


Stmts which are live (ie: defined inside a loop and then used after the
loop) are not currently supported by the vectorizer.  In many cases
vectorization can still occur because the SCEV cprop pass will hoist the
definition of the stmt outside of the loop before the vectorizer pass.
However, there are various
cases SCEV cprop cannot hoist, for example:
  for (i = 0; i < n; ++i)
    {
      ret = x[i];
      x[i] = i;
    }
  return i;

Currently stmts are marked live using a bool, and the relevant state using
an enum. Both these states are propagated to the definition of all uses of
the stmt. Also, a stmt can be live but not relevant.

This patch vectorizes a live stmt definition normally within the loop and
then after the loop uses BIT_FIELD_REF to extract the final scalar value
from the vector.
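
For example (an illustrative GIMPLE shape; the lane offset depends on the
vector mode, here a V4SI result whose last 32-bit lane holds the final
value):

  # in the loop: vect_ret_1 = ...;   vectorized normally
  # after the loop:
  _final = BIT_FIELD_REF <vect_ret_1, 32, 96>;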

This patch adds a new relevant state (vect_used_only_live) for when a stmt
is used only outside the loop. The relevant state is still propagated to
all its uses, but the live bool is not (this ensures that
vectorizable_live_operation is only called with stmts that really are
live).

Tested on x86 and aarch64.

gcc/
* tree-vect-loop.c (vect_analyze_loop_operations): Allow live
stmts.
(vectorizable_reduction): Check for new relevant state.
(vectorizable_live_operation): vectorize live stmts using
BIT_FIELD_REF.  Remove special case for gimple assign stmts.
* tree-vect-stmts.c (is_simple_and_all_uses_invariant): New
function.
(vect_stmt_relevant_p): Check for stmts which are only used live.
(process_use): Use of a stmt does not inherit its live value.
(vect_mark_stmts_to_be_vectorized): Simplify relevance inheritance.
(vect_analyze_stmt): Check for new relevant state.
* tree-vect-slp.c (vect_get_slp_vect_defs): Make global.
* tree-vectorizer.h (vect_relevant): New entry for a stmt which is
used outside the loop, but not inside it.

testsuite/
* gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize.
* gcc.dg/vect/no-scevccp-vect-iv-2.c: Remove xfail.
* gcc.dg/vect/vect-live-1.c: New test.
* gcc.dg/vect/vect-live-2.c: New test.
* gcc.dg/vect/vect-live-3.c: New test.
* gcc.dg/vect/vect-live-4.c: New test.
* gcc.dg/vect/vect-live-5.c: New test.
* gcc.dg/vect/vect-live-slp-1.c: New test.
* gcc.dg/vect/vect-live-slp-2.c: New test.
* gcc.dg/vect/vect-live-slp-3.c: New test.
* gcc.dg/vect/vect-live-slp-4.c: New test.



Alan.



live2.patch
Description: Binary data


[PATCH][1/3] Add loop_vinfo to vect_get_vec_def_for_operand

2016-05-27 Thread Alan Hayward
This patch simply adds loop_vinfo as an extra argument to
vect_get_vec_def_for_operand and only generates a stmt_vinfo if required.
This is a required cleanup for patch [2/3].
Tested on x86 and aarch64.

gcc/
* tree-vectorizer.h (vect_get_vec_def_for_operand): Pass loop_vinfo in.
* tree-vect-stmts.c (vect_get_vec_def_for_operand): Pass loop_vinfo in.
(vect_get_vec_defs): Pass down loop_vinfo.
(vectorizable_mask_load_store): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_get_loop_based_defs): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
* tree-vect-loop.c (get_initial_def_for_induction): Likewise.
(get_initial_def_for_reduction): Likewise.
(vectorizable_reduction):  Likewise.


Alan.



live1.patch
Description: Binary data


[PATCH, PR middle-end/71279] Avoid folding vec_cond_expr into comparison

2016-05-27 Thread Ilya Enkovich
Hi,

This patch disables the transformation of VEC_COND_EXPR into a comparison
which became invalid after the introduction of boolean vectors.
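
For example, the old fold would turn (illustrative shapes only)

  _5 = VEC_COND_EXPR <_mask, { 0, 0, 0, 0 }, { -1, -1, -1, -1 }>;

into a negation of _mask converted to the data vector type; with boolean
vectors _mask may be a mask type whose layout differs from the data vector,
so that rewrite is no longer valid.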

Bootstrapped and regtested for x86_64-pc-linux-gnu.  OK for trunk
and gcc-6?

Thanks,
Ilya
--
gcc/

2016-05-27  Ilya Enkovich  

PR middle-end/71279
* fold-const.c (fold_ternary_loc): Don't fold VEC_COND_EXPR
into comparison.

gcc/testsuite/

2016-05-27  Ilya Enkovich  

PR middle-end/71279
* gcc.dg/pr71279.c: New test.


diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 556fc73..5058746 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -11515,9 +11515,9 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
   /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
 over COND_EXPR in cases such as floating point comparisons.  */
   if (integer_zerop (op1)
- && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
-   : (integer_onep (op2)
-  && !VECTOR_TYPE_P (type)))
+ && code == COND_EXPR
+ && integer_onep (op2)
+ && !VECTOR_TYPE_P (type)
  && truth_value_p (TREE_CODE (arg0)))
return pedantic_non_lvalue_loc (loc,
fold_convert_loc (loc, type,
diff --git a/gcc/testsuite/gcc.dg/pr71279.c b/gcc/testsuite/gcc.dg/pr71279.c
new file mode 100644
index 000..4ecc84b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71279.c
@@ -0,0 +1,14 @@
+/* PR middle-end/71279 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-march=knl" { target { i?86-*-* x86_64-*-* } } } */
+
+extern int a, b;
+long c[1][1][1];
+long d[1][1];
+
+void fn1 ()
+{
+  for (int e = 0; e < b; e = e + 1)
+*(e + **c) = (a && *d[1]) - 1;
+}


Re: [PATCH][AArch64] Remove aarch64_cannot_change_mode_class

2016-05-27 Thread James Greenhalgh
On Thu, May 19, 2016 at 12:23:32PM +0100, Wilco Dijkstra wrote:
> Remove aarch64_cannot_change_mode_class as the underlying issue
> (PR67609) has been resolved.  This avoids a few unnecessary lane
> widening operations like:
> 
> faddp   d18, v18.2d
> mov d18, v18.d[0] 
> 
> Passes regress, OK for commit?

Which targets did you check? I'd hope aarch64_be-none-elf in addition to
aarch64-none-linux-gnu.

OK if they're both clean.

Please refer to PR67609 in your ChangeLog so this gets tracked alongside
the other fixes in that bug.

Thanks,
James

> 
> ChangeLog:
> 2016-05-19  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.h
>   (CANNOT_CHANGE_MODE_CLASS): Remove.
>   * gcc/config/aarch64/aarch64.c
>   (aarch64_cannot_change_mode_class): Remove function.
> * gcc/config/aarch64/aarch64-protos.h
>   (aarch64_cannot_change_mode_class): Remove.
> 



Re: [PATCH][AArch64] Delete obsolete CC_ZESWP and CC_SESWP CC modes

2016-05-27 Thread James Greenhalgh
On Wed, Apr 27, 2016 at 03:12:10PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The CC_ZESWP and CC_SESWP are not used anywhere and seem to be a remnant of 
> some
> old code that was removed. The various compare+extend patterns in aarch64.md 
> don't
> use these modes. So it should be safe to remove them to avoid future 
> confusion.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

OK.

Thanks for the cleanup.
James

> 2016-04-27  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64-modes.def (CC_ZESWP, CC_SESWP): Delete.
> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Remove condition
> that returns CC_SESWPmode and CC_ZESWPmode.
> (aarch64_get_condition_code_1): Remove handling of CC_ZESWPmode
> and CC_SESWPmode.
> (aarch64_rtx_costs): Likewise.



Re: [SPARC] Support for --with-{cpu,tune}-{32,64} in sparc*-* targets

2016-05-27 Thread Eric Botcazou
> Tested in sparc64-linux-gnu, sparcv9-linux-gnu and sparc-sun-solaris2.11.
> 
> 2016-05-25  Jose E. Marchesi  
> 
>   * config.gcc (sparc*-*-*): Support cpu_32, cpu_64, tune_32 and
>   tune_64.
>   * doc/install.texi (--with-cpu-32, --with-cpu-64): Document
>   support on SPARC.
>   * config/sparc/linux64.h (OPTION_DEFAULT_SPECS): Add entries for
>   cpu_32, cpu_64, tune_32 and tune_64.
>   * config/sparc/sol2.h (OPTION_DEFAULT_SPECS): Likewise.

OK for mainline, thanks.

-- 
Eric Botcazou


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-27 Thread Jan Hubicka
Hi,
this is the version of the patch which bootstraps & regtests all languages at x86_64. OK?

Honza

* trans-types.c (gfc_array_range_type): Remove.
(gfc_init_types): Do not build gfc_array_range_type.
(gfc_get_array_type_bounds): Do not put unrealistic array bounds.
* trans-types.h (gfc_array_range_type): Remove.
* tree-pretty-print.c (dump_array_domain): Dump empty domain as
[0:].
Index: fortran/trans-types.c
===
--- fortran/trans-types.c   (revision 236762)
+++ fortran/trans-types.c   (working copy)
@@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
 CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
 
 tree gfc_array_index_type;
-tree gfc_array_range_type;
 tree gfc_character1_type_node;
 tree pvoid_type_node;
 tree prvoid_type_node;
@@ -945,12 +944,6 @@ gfc_init_types (void)
 = build_pointer_type (build_function_type_list (void_type_node, 
NULL_TREE));
 
   gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
-  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
- since this function is called before gfc_init_constants.  */
-  gfc_array_range_type
- = build_range_type (gfc_array_index_type,
- build_int_cst (gfc_array_index_type, 0),
- NULL_TREE);
 
   /* The maximum array element size that can be handled is determined
  by the number of bits available to store this field in the array
@@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
 
   /* We define data as an array with the correct size if possible.
  Much better than doing pointer arithmetic.  */
-  if (stride)
+  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)
 rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
  int_const_binop (MINUS_EXPR, stride,
   build_int_cst (TREE_TYPE 
(stride), 1)));
   else
-rtype = gfc_array_range_type;
+rtype = NULL;
   arraytype = build_array_type (etype, rtype);
   arraytype = build_pointer_type (arraytype);
   if (restricted)
Index: fortran/trans-types.h
===
--- fortran/trans-types.h   (revision 236762)
+++ fortran/trans-types.h   (working copy)
@@ -24,7 +24,6 @@ along with GCC; see the file COPYING3.
 #define GFC_BACKEND_H
 
 extern GTY(()) tree gfc_array_index_type;
-extern GTY(()) tree gfc_array_range_type;
 extern GTY(()) tree gfc_character1_type_node;
 extern GTY(()) tree ppvoid_type_node;
 extern GTY(()) tree pvoid_type_node;
Index: tree-pretty-print.c
===
--- tree-pretty-print.c (revision 236762)
+++ tree-pretty-print.c (working copy)
@@ -362,7 +362,7 @@ dump_array_domain (pretty_printer *pp, t
}
 }
   else
-pp_string (pp, "");
+pp_string (pp, "0:");
   pp_right_bracket (pp);
 }