Re: [PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 09:11:18PM +, Joseph Myers wrote:
> On Fri, 3 Jun 2016, Jakub Jelinek wrote:
> 
> > On Fri, Jun 03, 2016 at 04:44:15PM +0200, Thomas Schwinge wrote:
> > > Hi!
> > > 
> > > Ping.
> > 
> > I think it would be better to just add this support to newlib.
> 
> That suggestion doesn't really make sense to me.  Why should newlib be 
> expected to follow the same choices as glibc regarding what variants of 
> libm functions to export, beyond the standard names for those functions, 
> or how to name any variants it does export?

I'm not saying newlib in general, let newlib do whatever they want, but
I'm talking about offloading port(s) of newlib, which IMHO should provide
translation layer from the host headers to the offloading target functions.
The thing is, I think it is much better to have this layer in a source form
where you can easily modify it than inside of the compiler where you have to
hardwire everything in there.  It could sit in some offloading directory of
newlib, which the offloading port(s) could share.
The __*_finite functions aren't the only ones; what if glibc adds another
4-5 finite math functions in the next half a year?  What about e.g.
-D_FORTIFY_SOURCE=2 string functions, etc.?
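
To make the idea of a source-form translation layer a bit more concrete,
here is a minimal sketch for one fortified string function.  It is only an
illustration: shim_memcpy_chk is a hypothetical name, and the assumption is
that a real layer would export whatever symbol the host headers actually
reference (presumably __memcpy_chk for -D_FORTIFY_SOURCE=2) and forward it
to the plain function the offloading target provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shim, not part of newlib or glibc.  It mirrors the
   four-argument shape of a fortified memcpy: DESTLEN is the destination
   size the compiler was able to determine.  */
void *
shim_memcpy_chk (void *dest, const void *src, size_t len, size_t destlen)
{
  assert (dest != NULL && src != NULL);

  /* A fortified copy must never write past the known destination size.  */
  if (len > destlen)
    abort ();

  /* Otherwise forward to the plain function of the target libc.  */
  return memcpy (dest, src, len);
}
```

Keeping such wrappers in an ordinary source file means that a new host-side
variant only needs another few lines there, not a compiler change.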

Jakub


[PING] C/C++ OpenACC routine directive, undeclared name error: try to help the user, once

2016-06-06 Thread Thomas Schwinge
Hi!

Ping.

On Tue, 31 May 2016 17:49:49 +0200, I wrote:
> OK for trunk, as follows?
> 
> commit 3289032bf7fd7e4a0cce37e7acd71e3330729d83
> Author: Thomas Schwinge 
> Date:   Tue May 31 17:46:26 2016 +0200
> 
> C/C++ OpenACC routine directive, undeclared name error: try to help the 
> user, once
> 
>   gcc/c/
>   * c-parser.c (c_parser_oacc_routine): If running into an
>   undeclared name error, try to help the user, once.
>   gcc/cp/
>   * parser.c (cp_parser_oacc_routine): If running into an undeclared
>   name error, try to help the user, once.
>   gcc/testsuite/
>   * c-c++-common/goacc/routine-5.c: Update.
> ---
>  gcc/c/c-parser.c | 16 ++--
>  gcc/cp/parser.c  | 16 ++--
>  gcc/testsuite/c-c++-common/goacc/routine-5.c | 15 ++-
>  3 files changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> index 993c0a0..d3cab69 100644
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -14003,8 +14003,20 @@ c_parser_oacc_routine (c_parser *parser, enum 
> pragma_context context)
>   {
> decl = lookup_name (token->value);
> if (!decl)
> - error_at (token->location, "%qE has not been declared",
> -   token->value);
> + {
> +   error_at (token->location, "%qE has not been declared",
> + token->value);
> +   static bool informed_once = false;
> +   if (!informed_once)
> + {
> +   inform (token->location,
> +   "omit the %<(%E)%>, if you want to mark the"
> +   " immediately following function, or place this"
> +   " pragma after a declaration of the function to be"
> +   " marked", token->value);
> +   informed_once = true;
> + }
> + }
> c_parser_consume_token (parser);
>   }
>else
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index 8841666..0c67608 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -36528,8 +36528,20 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token 
> *pragma_tok,
>/*optional_p=*/false);
>decl = cp_parser_lookup_name_simple (parser, id, token->location);
>if (id != error_mark_node && decl == error_mark_node)
> - cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
> -  token->location);
> + {
> +   cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
> +token->location);
> +   static bool informed_once = false;
> +   if (!informed_once)
> + {
> +   inform (token->location,
> +   "omit the %<(%E)%>, if you want to mark the"
> +   " immediately following function, or place this"
> +   " pragma after a declaration of the function to be"
> +   " marked", id);
> +   informed_once = true;
> + }
> + }
>  
>if (decl == error_mark_node
> || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
> diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c 
> gcc/testsuite/c-c++-common/goacc/routine-5.c
> index 1efd154..def78cd 100644
> --- gcc/testsuite/c-c++-common/goacc/routine-5.c
> +++ gcc/testsuite/c-c++-common/goacc/routine-5.c
> @@ -71,7 +71,20 @@ void Foo ()
>  
>  #pragma acc routine (Foo) gang // { dg-error "must be applied before 
> definition" }
>  
> -#pragma acc routine (Baz) // { dg-error "not been declared" }
> +#pragma acc routine (Baz) worker
> +/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 74 }
> +   Try to help the user:
> +   { dg-message "note: omit the .\\(Baz\\)., if" "" { target *-*-* } 74 } */
> +
> +#pragma acc routine (Baz) vector
> +/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 79 }
> +   Don't try to help the user again:
> +   { dg-bogus "note: omit the .\\(Baz\\)., if" "" { target *-*-* } 79 } */
> +
> +#pragma acc routine (Qux) seq
> +/* { dg-error ".Qux. has not been declared" "" { target *-*-* } 84 }
> +   Don't try to help the user again:
> +   { dg-bogus "note: omit the .\\(Qux\\)., if" "" { target *-*-* } 84 } */
>  
>  
>  int vb1; /* { dg-error "directive for use" } */
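
For illustration, the two placements that the new note points the user to
look roughly as follows.  This is a sketch only: the function names are made
up, and when compiling without -fopenacc the pragmas are simply ignored.

```c
/* Unnamed form: the directive marks the immediately following function.  */
#pragma acc routine worker
int
baz (int x)
{
  return x + 1;
}

/* Named form: the directive is placed after a declaration of the function
   to be marked (and before its definition).  */
int qux (int x);
#pragma acc routine (qux) seq
int
qux (int x)
{
  return x * 2;
}
```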


Grüße
 Thomas


Re: [PATCH] Selftest framework (v7)

2016-06-06 Thread Trevor Saunders
On Mon, Jun 06, 2016 at 11:57:49PM +0200, Jakub Jelinek wrote:
> On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote:
> > > > As far as I can 
> > > > tell this just involves moving the start of namespace selftest
> > > > upwards a 
> > > > bit in the files where we have tests.
> > > 
> > > Yes, and it does seem cleaner to have all of the selftest code start
> > > like this:
> > > 
> > >   #if CHECKING_P
> > 
> > What are we gaining by ifdefing this? I would think on reasonable
> > systems the compiler would optimize out the call to the selftests in
> > release builds and then the linker would gc all the unused functions.
> > Do we really care about code size in places that doesn't happen enough
> > to go through this?
> 
> Not everyone is building the compiler with LTO, and if you don't, then
> how would you optimize that away?

-ffunction-sections -Wl,--gc-sections should be enough I think.  I guess
we don't use those at the moment though.

> And yes, not having the self-tests, especially if they are going to grow
> further, in release compilers is desirable, especially if it would be
> intermixed with hot code.

That's fair, though turning on --gc-sections where we can should further
help with that, and that should be more effective with
-ffunction-sections -fdata-sections, so it seems to me like the right
thing to do is add configure tests to enable those?  And then it's more
of a non-issue?
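
For reference, the #if CHECKING_P layout under discussion would look roughly
like this.  It is a sketch with made-up names (clamp_to_byte, run_selftests),
and CHECKING_P is given a default here only so that the fragment stands
alone; in GCC it comes from the configured headers.

```c
/* Assumption for this sketch only: default CHECKING_P when the build
   does not define it.  */
#ifndef CHECKING_P
#define CHECKING_P 1
#endif

/* Ordinary code, present in every build.  */
int
clamp_to_byte (int v)
{
  if (v < 0)
    return 0;
  return v > 255 ? 255 : v;
}

#if CHECKING_P
/* Selftest code, compiled only into checking builds.  Returns the number
   of failed checks.  */
static int
clamp_to_byte_selftest (void)
{
  int failures = 0;
  failures += clamp_to_byte (-5) != 0;
  failures += clamp_to_byte (300) != 255;
  failures += clamp_to_byte (42) != 42;
  return failures;
}
#endif

/* Entry point: runs the tests in checking builds; in release builds there
   is nothing to run, so it reports -1.  */
int
run_selftests (void)
{
#if CHECKING_P
  return clamp_to_byte_selftest ();
#else
  return -1;
#endif
}
```

With the section-based approach the selftest function would instead stay
unconditional and rely on -ffunction-sections plus --gc-sections to drop it
when nothing references it.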

Trev

> 
>   Jakub


[committed] Generate indirect long calls to non-local functions on 64-bit hppa

2016-06-06 Thread John David Anglin
The attached patch generates indirect long calls to non-local functions on
64-bit hppa.  This improves opportunities for optimization and scheduling.

Tested on hppa64-hp-hpux11.11 with no observed regressions.
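
As a rough model of the condition in the call expander after this change,
the decision can be sketched as below.  The struct and function names are
hypothetical stand-ins for the RTL queries named in the comments; this is
not the pa.md code itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of what the expander inspects.  */
struct call_target
{
  bool is_symbol_ref;   /* GET_CODE (op) == SYMBOL_REF */
  bool has_decl;        /* SYMBOL_REF_DECL (op) is non-null */
  bool binds_local;     /* targetm.binds_local_p (decl) */
};

/* Return true when the call should be made indirectly through a register.
   The test no longer depends on !TARGET_64BIT.  */
bool
use_indirect_long_call (bool target_long_calls, struct call_target t)
{
  if (!target_long_calls || !t.is_symbol_ref)
    return false;

  /* Only callees known to bind locally keep the direct form.  */
  return !(t.has_decl && t.binds_local);
}
```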

Dave
--
John David Anglin   dave.ang...@bell.net


2016-06-06  John David Anglin  

* config/pa/pa.md (call): Generate indirect long calls to non-local
functions on TARGET_64BIT.
(call_value): Likewise.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 237110)
+++ config/pa/pa.md (working copy)
@@ -7014,7 +7014,7 @@
   op = XEXP (operands[0], 0);
 
   /* Generate indirect long calls to non-local functions. */
-  if (!TARGET_64BIT && TARGET_LONG_CALLS && GET_CODE (op) == SYMBOL_REF)
+  if (TARGET_LONG_CALLS && GET_CODE (op) == SYMBOL_REF)
{
  tree call_decl = SYMBOL_REF_DECL (op);
  if (!(call_decl && targetm.binds_local_p (call_decl)))
@@ -7517,7 +7517,7 @@
call_powf = true;
 
  /* Generate indirect long calls to non-local functions. */
- else if (!TARGET_64BIT && TARGET_LONG_CALLS)
+ else if (TARGET_LONG_CALLS)
{
  tree call_decl = SYMBOL_REF_DECL (op);
  if (!(call_decl && targetm.binds_local_p (call_decl)))


[committed] Remove R1 clobbers from 64-bit indirect call value patterns and splitters

2016-06-06 Thread John David Anglin
The instruction sequence generated for 64-bit indirect calls on hppa does not
clobber register %r1, so the clobbers for this register can be removed.

Tested on hppa64-hp-hpux11.11 with no observed regressions.

Dave
--
John David Anglin   dave.ang...@bell.net


2016-06-06  John David Anglin  

* config/pa/pa.md (call_val_reg_64bit): Remove "reg:DI 1" clobber from
pattern and subsequent splitters.
(call_val_reg_64bit_post_reload): Likewise.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 237110)
+++ config/pa/pa.md (working copy)
@@ -8133,7 +8133,6 @@
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "register_operand" "r"))
  (match_operand 2 "" "i")))
-   (clobber (reg:DI 1))
(clobber (reg:DI 2))
(clobber (match_operand 3))
(use (reg:DI 27))
@@ -8155,7 +8154,6 @@
   [(parallel [(set (match_operand 0 "" "")
   (call (mem:SI (match_operand:DI 1 "register_operand" ""))
 (match_operand 2 "" "")))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (clobber (match_operand 3))
  (use (reg:DI 27))
@@ -8167,7 +8165,6 @@
(parallel [(set (match_dup 0)
   (call (mem:SI (match_dup 1))
 (match_dup 2)))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (use (reg:DI 27))
  (use (reg:DI 29))
@@ -8178,7 +8175,6 @@
   [(parallel [(set (match_operand 0 "" "")
   (call (mem:SI (match_operand:DI 1 "register_operand" ""))
 (match_operand 2 "" "")))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (clobber (match_operand 3))
  (use (reg:DI 27))
@@ -8189,7 +8185,6 @@
(parallel [(set (match_dup 0)
   (call (mem:SI (match_dup 1))
 (match_dup 2)))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (use (reg:DI 27))
  (use (reg:DI 29))
@@ -8201,7 +8196,6 @@
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "register_operand" "r"))
  (match_operand 2 "" "i")))
-   (clobber (reg:DI 1))
(clobber (reg:DI 2))
(use (reg:DI 27))
(use (reg:DI 29))


[PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions

2016-06-06 Thread Kelvin Nilsen

This patch adds built-in function support for the ISA 3.0 vabsdub,
vabsduh, and vabsduw instructions.
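
For readers unfamiliar with these instructions, the element-wise semantics
of the byte variant can be sketched as a scalar C model.  This is only an
illustration of "absolute difference unsigned"; abs_diff_u8 and VEC_BYTES
are made-up names, not the new built-ins.

```c
#include <stdint.h>

#define VEC_BYTES 16   /* one 128-bit vector viewed as 16 unsigned bytes */

/* Scalar model: OUT[i] = |A[i] - B[i]| computed on unsigned values, so the
   smaller operand is always subtracted from the larger one and the result
   never wraps.  */
void
abs_diff_u8 (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (int i = 0; i < VEC_BYTES; i++)
    out[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}
```

The half-word and word variants would do the same on 8 x 16-bit and
4 x 32-bit elements respectively.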

I have bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for the trunk?

I have also tested against the gcc-6 branch without regressions.  Is 
this ok for backporting to gcc6 after a few days of burn-in time on the 
trunk?

gcc/testsuite/ChangeLog:

2016-06-06  Kelvin Nilsen  

* gcc.target/powerpc/vadsdu-0.c: New test.
* gcc.target/powerpc/vadsdu-1.c: New test.
* gcc.target/powerpc/vadsdu-2.c: New test.
* gcc.target/powerpc/vadsdu-3.c: New test.
* gcc.target/powerpc/vadsdu-4.c: New test.
* gcc.target/powerpc/vadsdu-5.c: New test.
* gcc.target/powerpc/vadsdub-1.c: New test.
* gcc.target/powerpc/vadsdub-2.c: New test.
* gcc.target/powerpc/vadsduh-1.c: New test.
* gcc.target/powerpc/vadsduh-2.c: New test.
* gcc.target/powerpc/vadsduw-1.c: New test.
* gcc.target/powerpc/vadsduw-2.c: New test.


gcc/ChangeLog:

2016-06-06  Kelvin Nilsen  

* config/rs6000/altivec.h (vec_adu): New macro for vector absolute
difference unsigned.
(vec_adub): New macro for vector absolute difference unsigned
byte.
(vec_aduh): New macro for vector absolute difference unsigned
half-word. 
(vec_aduw): New macro for vector absolute difference unsigned word.
* config/rs6000/altivec.md (UNSPEC_VADU): New value.
(vadu<mode>3): New insn.
(*p9_vadu<mode>3): New insn.
* config/rs6000/rs6000-builtin.def (vadub): New built-in
definition.
(vaduh): New built-in definition.
(vaduw): New built-in definition.
(vadu): New overloaded built-in definition.
(vadub): New overloaded built-in definition.
(vaduh): New overloaded built-in definition.
(vaduw): New overloaded built-in definition.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
overloaded vector absolute difference unsigned functions.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
the ISA 3.0 vector absolute difference unsigned built-in functions.

Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 237045)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -401,6 +401,11 @@
 #define vec_vprtybq __builtin_vec_vprtybq
 #endif
 
+#define vec_adu __builtin_vec_vadu
+#define vec_adub __builtin_vec_vadub
+#define vec_aduh __builtin_vec_vaduh
+#define vec_aduw __builtin_vec_vaduw
+
 #define vec_slv __builtin_vec_vslv
 #define vec_srv __builtin_vec_vsrv
 #endif
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 237045)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -114,6 +114,7 @@
UNSPEC_STVLXL
UNSPEC_STVRX
UNSPEC_STVRXL
+   UNSPEC_VADU
UNSPEC_VSLV
UNSPEC_VSRV
UNSPEC_VMULWHUB
@@ -3464,6 +3465,25 @@
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 
+;; Vector absolute difference unsigned
+(define_expand "vadu<mode>3"
+  [(set (match_operand:VI 0 "register_operand" "")
+(unspec:VI [(match_operand:VI 1 "register_operand" "")
+   (match_operand:VI 2 "register_operand" "")]
+ UNSPEC_VADU))]
+  "TARGET_P9_VECTOR")
+
+;; Vector absolute difference unsigned
+(define_insn "*p9_vadu<mode>3"
+  [(set (match_operand:VI 0 "register_operand" "=v")
+(unspec:VI [(match_operand:VI 1 "register_operand" "v")
+   (match_operand:VI 2 "register_operand" "v")]
+ UNSPEC_VADU))]
+  "TARGET_P9_VECTOR"
+  "vabsdu %0, %1, %2"
+  [(set_attr "type" "add")
+   (set_attr "length" "4")])
+
 ;; Vector count trailing zeros
 (define_insn "*p9v_ctz2"
   [(set (match_operand:VI2 0 "register_operand" "=v")
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 237045)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1757,6 +1757,17 @@ BU_P9V_AV_2 (VSRV,   "vsrv", CONST, vsrv)
 BU_P9V_OVERLOAD_2 (VSLV,   "vslv")
 BU_P9V_OVERLOAD_2 (VSRV,   "vsrv")
 
+/* 2 argument vector functions added in ISA 3.0 (power9). */
+BU_P9V_AV_2 (VADUB,"vadub",CONST,  vaduv16qi3)
+BU_P9V_AV_2 (VADUH,"vaduh",CONST,  vaduv8hi3)
+BU_P9V_AV_2 (VADUW,"vaduw",CONST,  vaduv4si3)
+
+/* ISA 3.0 vector overloaded 2 argument functions. */
+BU_P9V_OVERLOAD_2 (VADU,   "vadu")
+BU_P9V_OVERLOAD_2 (VADUB,  "vadub")
+BU_P9V_OVERLOAD_2 (VADUH,  "vaduh")
+BU_P9V_OVERLOAD_2 (VADUW,  "vaduw")
+
 
 /* 2 argument extended divide functions added in ISA 2.06.  */
 BU_P7_MISC_2 (DIVWE,   "divw

[PATCH, i386]: Insert CLD insns using mode-switching pass

2016-06-06 Thread Uros Bizjak
Hello!

The attached patch inserts the CLD instruction at the optimal location using
the mode-switching pass, so there is no unnecessary access to the flags reg
in paths that don't use calls or string instructions.

The patch handles insertions for interrupt handler functions and also
insertions for legacy TARGET_CLD targets.
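
The per-instruction decision made by ix86_dirflag_mode_needed can be
summarized with the following standalone model.  It is a sketch that mirrors
the logic of the patch: the enum and function names are simplified stand-ins,
and the recog_memoized check is folded into the "other" case.

```c
#include <stdbool.h>

/* Mirrors X86_DIRFLAG_RESET / X86_DIRFLAG_ANY from the patch.  */
enum dirflag_state { DIRFLAG_RESET, DIRFLAG_ANY };

/* Hypothetical classification of an instruction.  */
enum insn_kind { INSN_CALL, INSN_STRING, INSN_OTHER };

/* Return the direction flag state an instruction requires.  NORMAL_FUNC is
   false for interrupt and exception handlers.  */
enum dirflag_state
dirflag_mode_needed (enum insn_kind kind, bool normal_func, bool target_cld)
{
  switch (kind)
    {
    case INSN_CALL:
      /* Only a call made from an interrupt handler on a !TARGET_CLD
         target needs the flag cleared first.  */
      if (normal_func || target_cld)
        return DIRFLAG_ANY;
      return DIRFLAG_RESET;

    case INSN_STRING:
      /* String insns in a normal function need CLD only for TARGET_CLD;
         in an interrupt handler they always do.  */
      if (normal_func)
        return target_cld ? DIRFLAG_RESET : DIRFLAG_ANY;
      return DIRFLAG_RESET;

    default:
      return DIRFLAG_ANY;
    }
}
```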

2016-06-06  Uros Bizjak  

* config/i386/i386.h (enum ix86_entity): Add X86_DIRFLAG.
(enum x86_dirflag_state): New enum.
(NUM_MODES_FOR_MODE_SWITCHING): Add X86_DIRFLAG_ANY.
(machine_function): Remove needs_cld.
(ix86_current_function_needs_cld): Remove.
* config/i386/i386.c (ix86_set_func_type): Set
ix86_optimize_mode_switching[X86_DIRFLAG] to 1.
(ix86_expand_prologue): Do not emit CLD here.
(ix86_dirflag_mode_needed): New function.
(ix86_dirflag_mode_entry): Ditto.
(ix86_mode_needed): Handle X86_DIRFLAG entity.
(ix86_mode_after): Ditto.
(ix86_mode_entry): Ditto.
(ix86_mode_exit): Ditto.
(ix86_emit_mode_set): Ditto.
* config/i386/i386.md (strmov_singleop): Set
ix86_optimize_mode_switching[X86_DIRFLAG] to 1 for TARGET_CLD.
Do not set ix86_current_function_needs_cld.
(rep_mov): Ditto.
(strset_singleop): Ditto.
(rep_stos): Ditto.
(cmpstrnqi_nz_1): Ditto.
(cmpstrnqi_1): Ditto.
(strlenqi_1): Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 237133)
+++ config/i386/i386.c  (working copy)
@@ -6823,6 +6823,8 @@ ix86_set_func_type (tree fndecl)
  cfun->machine->func_type
= nargs == 2 ? TYPE_EXCEPTION : TYPE_INTERRUPT;
 
+ ix86_optimize_mode_switching[X86_DIRFLAG] = 1;
+
  /* Only dwarf2out.c can handle -WORD(AP) as a pointer argument.  */
  if (write_symbols != NO_DEBUG && write_symbols != DWARF2_DEBUG)
sorry ("Only DWARF debug format is supported for interrupt "
@@ -13817,16 +13819,6 @@ ix86_expand_prologue (void)
   if (frame_pointer_needed && frame.red_zone_size)
 emit_insn (gen_memory_blockage ());
 
-  /* Emit cld instruction if stringops are used in the function.  Since
- we can't assume the direction flag in interrupt handler, we must
- emit cld instruction if stringops are used in interrupt handler or
- interrupt handler isn't a leaf function.  */
-  if ((TARGET_CLD && ix86_current_function_needs_cld)
-  || (!TARGET_CLD
- && cfun->machine->func_type != TYPE_NORMAL
- && (ix86_current_function_needs_cld || !crtl->is_leaf)))
-emit_insn (gen_cld ());
-
   /* SEH requires that the prologue end within 256 bytes of the start of
  the function.  Prevent instruction schedules that would extend that.
  Further, prevent alloca modifications to the stack pointer from being
@@ -18600,6 +18592,35 @@ output_387_binary_op (rtx insn, rtx *operands)
   return buf;
 }
 
+/* Return needed mode for entity in optimize_mode_switching pass.  */
+
+static int
+ix86_dirflag_mode_needed (rtx_insn *insn)
+{
+  if (CALL_P (insn))
+{
+  if (cfun->machine->func_type == TYPE_NORMAL)
+   return X86_DIRFLAG_ANY;
+  else
+   /* No need to emit CLD in interrupt handler for TARGET_CLD.  */
+   return TARGET_CLD ? X86_DIRFLAG_ANY : X86_DIRFLAG_RESET;
+}
+
+  if (recog_memoized (insn) < 0)
+return X86_DIRFLAG_ANY;
+
+  if (get_attr_type (insn) == TYPE_STR)
+{
+  /* Emit cld instruction if stringops are used in the function.  */
+  if (cfun->machine->func_type == TYPE_NORMAL)
+   return TARGET_CLD ? X86_DIRFLAG_RESET : X86_DIRFLAG_ANY;
+  else
+   return X86_DIRFLAG_RESET;
+}
+
+  return X86_DIRFLAG_ANY;
+}
+
 /* Check if a 256bit AVX register is referenced inside of EXP.   */
 
 static bool
@@ -18712,6 +18733,8 @@ ix86_mode_needed (int entity, rtx_insn *insn)
 {
   switch (entity)
 {
+case X86_DIRFLAG:
+  return ix86_dirflag_mode_needed (insn);
 case AVX_U128:
   return ix86_avx_u128_mode_needed (insn);
 case I387_TRUNC:
@@ -18771,6 +18794,8 @@ ix86_mode_after (int entity, int mode, rtx_insn *i
 {
   switch (entity)
 {
+case X86_DIRFLAG:
+  return mode;
 case AVX_U128:
   return ix86_avx_u128_mode_after (mode, insn);
 case I387_TRUNC:
@@ -18784,6 +18809,18 @@ ix86_mode_after (int entity, int mode, rtx_insn *i
 }
 
 static int
+ix86_dirflag_mode_entry (void)
+{
+  /* For TARGET_CLD or in the interrupt handler we can't assume
+ direction flag state at function entry.  */
+  if (TARGET_CLD
+  || cfun->machine->func_type != TYPE_NORMAL)
+return X86_DIRFLAG_ANY;
+
+  return X86_DIRFLAG_RESET;
+}
+
+static int
 ix86_avx_u128_mode_entry (void)
 {
   tree arg;
@@ -18810,6 +18847,8 @@ ix86_mode_entry (int entity)
 {
   switch (entity)
 {
+case X86_DIRFLAG:
+  return ix86_dirflag_mode_entry ();
 case AVX_U128:
   return ix86

Re: [Diagnostic Patch] Clean-up diagnostic facilities in diagnostic.c

2016-06-06 Thread Paolo Carlini

Hi David,

On 06/06/2016 22:26, David Malcolm wrote:

> Thanks, looks like a nice simplification.
>
> I see the new prototypes are wrapped in #ifdef ATTRIBUTE_GCC_DIAG.
> Isn't that redundant?  Looking at diagnostic-core.h, isn't it always
> defined? (perhaps to the empty string)
>
> Also the new functions are missing comments.
>
> Other than that, LGTM.

Thanks. Thus tomorrow morning, after an additional round of testing I'll
commit the below.
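
The shape of the clean-up, reduced to a standalone sketch: each variadic
entry point only does va_start/va_end and hands a pointer to its va_list to
one shared implementation.  The names below (report_impl, report_note,
report_error) are illustrative and are not the functions in diagnostic.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Shared implementation: formats "KIND: message" into BUF using the
   caller's argument list, passed by pointer.  Returns the length the full
   text needs, or a negative value on a formatting error.  */
static int
report_impl (char *buf, size_t size, const char *kind,
             const char *fmt, va_list *ap)
{
  assert (buf != NULL && size > 0);

  int used = snprintf (buf, size, "%s: ", kind);
  if (used < 0 || (size_t) used >= size)
    return used;

  int rest = vsnprintf (buf + used, size - used, fmt, *ap);
  return rest < 0 ? rest : used + rest;
}

/* Thin variadic wrappers; all the real work is in report_impl.  */
int
report_note (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int ret = report_impl (buf, size, "note", fmt, &ap);
  va_end (ap);
  return ret;
}

int
report_error (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int ret = report_impl (buf, size, "error", fmt, &ap);
  va_end (ap);
  return ret;
}
```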


Thanks again,
Paolo.

/
Index: diagnostic.c
===
--- diagnostic.c(revision 237155)
+++ diagnostic.c(working copy)
@@ -46,8 +46,12 @@ along with GCC; see the file COPYING3.  If not see
 #define permissive_error_option(DC) ((DC)->opt_permissive)
 
 /* Prototypes.  */
+static bool diagnostic_impl (rich_location *, int, const char *,
+va_list *, diagnostic_t) ATTRIBUTE_GCC_DIAG(3,0);
+static bool diagnostic_n_impl (location_t, int, int, const char *,
+  const char *, va_list *,
+  diagnostic_t) ATTRIBUTE_GCC_DIAG(5,0);
 static void error_recursion (diagnostic_context *) ATTRIBUTE_NORETURN;
-
 static void real_abort (void) ATTRIBUTE_NORETURN;
 
 /* Name of program invoked, sans directories.  */
@@ -913,29 +917,57 @@ diagnostic_append_note (diagnostic_context *contex
   va_end (ap);
 }
 
-bool
-emit_diagnostic (diagnostic_t kind, location_t location, int opt,
-const char *gmsgid, ...)
+/* Implement emit_diagnostic, inform, inform_at_rich_loc, warning, warning_at,
+   warning_at_rich_loc, pedwarn, permerror, permerror_at_rich_loc, error,
+   error_at, error_at_rich_loc, sorry, fatal_error, internal_error, and
+   internal_error_no_backtrace, as documented and defined below.  */
+static bool
+diagnostic_impl (rich_location *richloc, int opt,
+const char *gmsgid,
+va_list *ap, diagnostic_t kind)
 {
   diagnostic_info diagnostic;
-  va_list ap;
-  bool ret;
-  rich_location richloc (line_table, location);
-
-  va_start (ap, gmsgid);
   if (kind == DK_PERMERROR)
 {
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+  diagnostic_set_info (&diagnostic, gmsgid, ap, richloc,
   permissive_error_kind (global_dc));
   diagnostic.option_index = permissive_error_option (global_dc);
 }
-  else {
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, kind);
+  else
+{
+  diagnostic_set_info (&diagnostic, gmsgid, ap, richloc, kind);
   if (kind == DK_WARNING || kind == DK_PEDWARN)
diagnostic.option_index = opt;
-  }
+}
+  return report_diagnostic (&diagnostic);
+}
 
-  ret = report_diagnostic (&diagnostic);
+/* Implement inform_n, warning_n, and error_n, as documented and
+   defined below.  */
+static bool
+diagnostic_n_impl (location_t location, int opt, int n,
+  const char *singular_gmsgid,
+  const char *plural_gmsgid,
+  va_list *ap, diagnostic_t kind)
+{
+  diagnostic_info diagnostic;
+  rich_location richloc (line_table, location);
+  diagnostic_set_info_translated (&diagnostic,
+  ngettext (singular_gmsgid, plural_gmsgid, n),
+  ap, &richloc, kind);
+  if (kind == DK_WARNING)
+diagnostic.option_index = opt;
+  return report_diagnostic (&diagnostic);
+}
+
+bool
+emit_diagnostic (diagnostic_t kind, location_t location, int opt,
+const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  rich_location richloc (line_table, location);
+  bool ret = diagnostic_impl (&richloc, opt, gmsgid, &ap, kind);
   va_end (ap);
   return ret;
 }
@@ -945,13 +977,10 @@ diagnostic_append_note (diagnostic_context *contex
 void
 inform (location_t location, const char *gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va_list ap;
+  va_start (ap, gmsgid);
   rich_location richloc (line_table, location);
-
-  va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_NOTE);
-  report_diagnostic (&diagnostic);
+  diagnostic_impl (&richloc, -1, gmsgid, &ap, DK_NOTE);
   va_end (ap);
 }
 
@@ -959,12 +988,9 @@ inform (location_t location, const char *gmsgid, .
 void
 inform_at_rich_loc (rich_location *richloc, const char *gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va_list ap;
-
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc, DK_NOTE);
-  report_diagnostic (&diagnostic);
+  diagnostic_impl (richloc, -1, gmsgid, &ap, DK_NOTE);
   va_end (ap);
 }
 
@@ -974,15 +1000,10 @@ void
 inform_n (location_t location, int n, const char *singular_gmsgid,
   const char *plural_gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va_list ap;
-  rich_location richloc (line_table, location);
-
   va_start (ap, plural_gmsgid);
-  diagnostic_set_info_translated (&diagnostic,
-  

Re: [PATCH] Selftest framework (v7)

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote:
> > > As far as I can 
> > > tell this just involves moving the start of namespace selftest
> > > upwards a 
> > > bit in the files where we have tests.
> > 
> > Yes, and it does seem cleaner to have all of the selftest code start
> > like this:
> > 
> >   #if CHECKING_P
> 
> What are we gaining by ifdefing this? I would think on reasonable
> systems the compiler would optimize out the call to the selftests in
> release builds and then the linker would gc all the unused functions.
> Do we really care about code size in places that doesn't happen enough
> to go through this?

Not everyone is building the compiler with LTO, and if you don't, then
how would you optimize that away?
And yes, not having the self-tests, especially if they are going to grow
further, in release compilers is desirable, especially if it would be
intermixed with hot code.

Jakub


Re: [PATCH] Selftest framework (v7)

2016-06-06 Thread Trevor Saunders
> > As far as I can 
> > tell this just involves moving the start of namespace selftest
> > upwards a 
> > bit in the files where we have tests.
> 
> Yes, and it does seem cleaner to have all of the selftest code start
> like this:
> 
>   #if CHECKING_P

What are we gaining by ifdefing this? I would think on reasonable
systems the compiler would optimize out the call to the selftests in
release builds and then the linker would gc all the unused functions.
Do we really care about code size in places that doesn't happen enough
to go through this?

Thanks!

Trev


Re: C PATCH to improve location for abstract declarators (PR c/71362)

2016-06-06 Thread Joseph Myers
On Fri, 3 Jun 2016, Marek Polacek wrote:

> This fixes an imprecise location info with abstract declarators.  The problem
> was that when we build_id_declarator, the default location was input_location
> and we never attempted to use a more precise location.  This patch does it.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] [PR other/70945] Handle function_glibc_finite_math in offloading

2016-06-06 Thread Joseph Myers
On Fri, 3 Jun 2016, Jakub Jelinek wrote:

> On Fri, Jun 03, 2016 at 04:44:15PM +0200, Thomas Schwinge wrote:
> > Hi!
> > 
> > Ping.
> 
> I think it would be better to just add this support to newlib.

That suggestion doesn't really make sense to me.  Why should newlib be 
expected to follow the same choices as glibc regarding what variants of 
libm functions to export, beyond the standard names for those functions, 
or how to name any variants it does export?

-- 
Joseph S. Myers
jos...@codesourcery.com


[Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-06-06 Thread Alessandro Fanfarillo
Dear all,

please find attached the first patch (of n) for the FAILED IMAGES
capability defined in the coarray TS 18508.
The patch adds support for three new intrinsic functions defined in
the TS for simulating a failure (fail image), checking an image status
(image_status) and getting the list of failed images (failed_images).
The patch has been built and regtested on x86_64-pc-linux-gnu.

Ok for trunk?

Alessandro
commit b3bca5b09f4cbcf18f2409dae2485a16a7c06498
Author: Alessandro Fanfarillo 
Date:   Mon Jun 6 14:27:37 2016 -0600

First patch Failed Images CAF TS-18508

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index d26e45e..71931cb 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1210,6 +1210,62 @@ gfc_check_event_query (gfc_expr *event, gfc_expr *count, gfc_expr *stat)
   return true;
 }
 
+bool
+gfc_check_image_status (gfc_expr *image, gfc_expr *team)
+{
+  if (!type_check (image, 1, BT_INTEGER))
+return false;
+
+  if(team)
+{
+  gfc_error ("TEAM argument of the IMAGE_STATUS intrinsic function at %L "
+"not yet supported",
+&team->where);
+  return false;
+}
+
+  int i = gfc_validate_kind (BT_INTEGER, image->ts.kind, false);
+  int j = gfc_validate_kind (BT_INTEGER, gfc_default_integer_kind, false);
+
+  if (gfc_integer_kinds[i].range < gfc_integer_kinds[j].range)
+{
+  gfc_error ("IMAGE argument of the IMAGE_STATUS intrinsic function at %L "
+"shall have at least the range of the default integer",
+&image->where);
+  return false;
+}
+
+  return true;
+}
+
+bool
+gfc_check_failed_images (gfc_expr *team, gfc_expr *kind)
+{
+  if (team)
+{
+  gfc_error ("TEAM argument of the FAILED_IMAGES intrinsic function at %L "
+"not yet supported",
+&team->where);
+  return false;
+}
+
+  if (kind)
+{
+  int i = gfc_validate_kind (BT_INTEGER, kind->ts.kind, false);
+  int j = gfc_validate_kind (BT_INTEGER, gfc_default_integer_kind, false);
+
+  if (gfc_integer_kinds[i].range < gfc_integer_kinds[j].range)
+   {
+ gfc_error ("KIND argument of the FAILED_IMAGES intrinsic function at %L "
+"shall have at least the range of the default integer",
+&kind->where);
+ return false;
+   }
+}
+
+  return true;
+}
+
 
 bool
 gfc_check_atomic_fetch_op (gfc_expr *atom, gfc_expr *value, gfc_expr *old,
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index f507434..41ed664 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1628,6 +1628,10 @@ show_code_node (int level, gfc_code *c)
 
   break;
 
+case EXEC_FAIL_IMAGE:
+  fputs ("FAIL IMAGE ", dumpfile);
+  break;
+
 case EXEC_SYNC_ALL:
   fputs ("SYNC ALL ", dumpfile);
   if (c->expr2 != NULL)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 0bb71cb..6d87632 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -253,7 +253,7 @@ enum gfc_statement
   ST_OMP_END_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD,
   ST_PROCEDURE, ST_GENERIC, ST_CRITICAL, ST_END_CRITICAL,
   ST_GET_FCN_CHARACTERISTICS, ST_LOCK, ST_UNLOCK, ST_EVENT_POST,
-  ST_EVENT_WAIT,ST_NONE
+  ST_EVENT_WAIT,ST_FAIL_IMAGE,ST_NONE
 };
 
 /* Types of interfaces that we can have.  Assignment interfaces are
@@ -411,6 +411,7 @@ enum gfc_isym_id
   GFC_ISYM_EXP,
   GFC_ISYM_EXPONENT,
   GFC_ISYM_EXTENDS_TYPE_OF,
+  GFC_ISYM_FAILED_IMAGES,
   GFC_ISYM_FDATE,
   GFC_ISYM_FE_RUNTIME_ERROR,
   GFC_ISYM_FGET,
@@ -454,6 +455,7 @@ enum gfc_isym_id
   GFC_ISYM_IEOR,
   GFC_ISYM_IERRNO,
   GFC_ISYM_IMAGE_INDEX,
+  GFC_ISYM_IMAGE_STATUS,
   GFC_ISYM_INDEX,
   GFC_ISYM_INT,
   GFC_ISYM_INT2,
@@ -2382,7 +2384,7 @@ enum gfc_exec_op
   EXEC_OPEN, EXEC_CLOSE, EXEC_WAIT,
   EXEC_READ, EXEC_WRITE, EXEC_IOLENGTH, EXEC_TRANSFER, EXEC_DT_END,
   EXEC_BACKSPACE, EXEC_ENDFILE, EXEC_INQUIRE, EXEC_REWIND, EXEC_FLUSH,
-  EXEC_LOCK, EXEC_UNLOCK, EXEC_EVENT_POST, EXEC_EVENT_WAIT,
+  EXEC_LOCK, EXEC_UNLOCK, EXEC_EVENT_POST, EXEC_EVENT_WAIT, EXEC_FAIL_IMAGE,
   EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_ROUTINE,
   EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS, EXEC_OACC_DATA, EXEC_OACC_HOST_DATA,
   EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 1d7503d..8dfb568 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -1823,6 +1823,10 @@ add_functions (void)
 a, BT_UNKNOWN, 0, REQUIRED,
 mo, BT_UNKNOWN, 0, REQUIRED);
 
+  add_sym_2 ("failed_images", GFC_ISYM_FAILED_IMAGES, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_INTEGER,
+dd, GFC_STD_F2008_TS, gfc_check_failed_images, NULL,
+gfc_resolve_failed_images, "team", BT_INTEGER, di, OPTIONAL, "kind", BT_INTEGER, di, OPTIONAL);
+
   add_sym_0 ("fdate",  GFC_ISYM_FDATE, CLASS_IMPURE, ACTUAL_NO, B

[PATCH] Add ggc-tests.c

2016-06-06 Thread David Malcolm
Jeff approved an earlier version of this (as
unittests/test-ggc.c):
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03306.html
> Not terribly happy with that counter used to create a big list
> to detect recursion. But I'm not offhand sure how to avoid without
> exposing more of the ggc system than is wise.
>
> OK if/when prereqs are approved. Minor twiddling if we end up
> moving it elsewhere or standardizing/reducing header files is
> pre-approved.

This version moves it to gcc/ggc-tests.c and ports it to the new
-fself-test approach.

For now, I also reduced the count within
  TEST_F (ggc_test, chain_next)
from 2 million to 10, to avoid slowing down the test (though the
former takes only about 0.5s on my box).

I've also fixed things so that it works with both checked and unchecked
builds.  Note that the GTY roots within ggc-tests.c are wrapped with
  #if CHECKING_P
which implies that PCH files would be incompatible between release vs
checked builds (I'm not sure whether or not that's already the case).

I've also added various new tests since Jeff's review, for:
  * GTY((length)),
  * unions, and
  * GTY((user))

Successfully bootstrapped&regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add ggc-tests.o.
(GTFILES): Add ggc-tests.c.
* ggc-tests.c: New file.
---
 gcc/Makefile.in  |   2 +
 gcc/ggc-tests.c  | 517 +++
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 4 files changed, 521 insertions(+)
 create mode 100644 gcc/ggc-tests.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index fdcc42a..776f6d7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1270,6 +1270,7 @@ OBJS = \
gcse.o \
gcse-common.o \
ggc-common.o \
+   ggc-tests.o \
gimple.o \
gimple-builder.o \
gimple-expr.o \
@@ -2398,6 +2399,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/emit-rtl.c $(srcdir)/except.h $(srcdir)/explow.c $(srcdir)/expr.c \
   $(srcdir)/expr.h \
   $(srcdir)/function.c $(srcdir)/except.c \
+  $(srcdir)/ggc-tests.c \
   $(srcdir)/gcse.c $(srcdir)/godump.c \
   $(srcdir)/lists.c $(srcdir)/optabs-libfuncs.c \
   $(srcdir)/profile.c $(srcdir)/mcf.c \
diff --git a/gcc/ggc-tests.c b/gcc/ggc-tests.c
new file mode 100644
index 0000000..fab1570
--- /dev/null
+++ b/gcc/ggc-tests.c
@@ -0,0 +1,517 @@
+/* Unit tests for GCC's garbage collector (and gengtype etc).
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree-core.h"
+#include "tree.h"
+#include "ggc-internal.h" /* (for ggc_force_collect).  */
+#include "selftest.h"
+
+#if CHECKING_P
+
+/* The various GTY markers must be outside of a namespace to be seen by
+   gengtype, so we don't put this file within the selftest namespace.  */
+
+/* A helper function for writing ggc tests.  */
+
+static void
+forcibly_ggc_collect ()
+{
+  ggc_force_collect = true;
+  ggc_collect ();
+  ggc_force_collect = false;
+}
+
+
+
+/* Verify that a simple struct works, and that it can
+   own references to non-roots, and have them be marked.  */
+
+struct GTY(()) test_struct
+{
+  struct test_struct *other;
+};
+
+static GTY(()) test_struct *root_test_struct;
+
+static void
+test_basic_struct ()
+{
+  root_test_struct = ggc_cleared_alloc <test_struct> ();
+  root_test_struct->other = ggc_cleared_alloc <test_struct> ();
+
+  forcibly_ggc_collect ();
+
+  ASSERT_TRUE (ggc_marked_p (root_test_struct));
+  ASSERT_TRUE (ggc_marked_p (root_test_struct->other));
+}
+
+
+
+/* Selftest for GTY((length)).  */
+
+/* A test struct using GTY((length)).  */
+
+struct GTY(()) test_of_length
+{
+  int num_elem;
+  struct test_of_length * GTY ((length ("%h.num_elem"))) elem[1];
+};
+
+static GTY(()) test_of_length *root_test_of_length;
+
+static void
+test_length ()
+{
+  const int count = 5;
+  size_t sz = sizeof (test_of_length) + (count - 1) * sizeof (test_of_length *);
+  root_test_of_length = (test_of_length *)ggc_internal_cleared_alloc (sz);
+  root_test_of_length->num_elem = count;
+  for (int i = 0; i < count; i++)
+    root_test_of_length->elem[i] = ggc_cleared_alloc <test_of_length> ();
+
+  forcibly_ggc_collect ();
+
+  ASSERT_TRUE (ggc_marked_p (root_test_of_length));
+  for (int i = 0; i < c

[PATCH] Add selftest for pretty-print.c

2016-06-06 Thread David Malcolm
This adds another set of test cases to -fself-test, this time
for the basic functionality within pretty-print.c.

Successfully bootstrapped&regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* pretty-print.c: Include "selftest.h".
(pp_format): Fix comment.
(selftest::test_basic_printing): New function.
(selftest::assert_pp_format_va): New function.
(selftest::assert_pp_format): New function.
(selftest::assert_pp_format_colored): New function.
(selftest::test_pp_format): New function.
(selftest::pretty_print_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::pretty_print_c_tests.
* selftest.h (selftest::pretty_print_c_tests): New declaration.
---
 gcc/pretty-print.c   | 153 ++-
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index cc2b8cc..d1829f3 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "pretty-print.h"
 #include "diagnostic-color.h"
+#include "selftest.h"
 
 #if HAVE_ICONV
 #include <iconv.h>
@@ -304,7 +305,7 @@ pp_indent (pretty_printer *pp)
 
 /* Formatting phases 1 and 2: render TEXT->format_spec plus
TEXT->args_ptr into a series of chunks in pp_buffer (PP)->args[].
-   Phase 3 is in pp_format_text.  */
+   Phase 3 is in pp_output_formatted_text.  */
 
 void
 pp_format (pretty_printer *pp, text_info *text)
@@ -1203,3 +1204,153 @@ identifier_to_locale (const char *ident)
 return ret;
   }
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Smoketest for pretty_printer.  */
+
+static void
+test_basic_printing ()
+{
+  pretty_printer pp;
+  pp_string (&pp, "hello");
+  pp_space (&pp);
+  pp_string (&pp, "world");
+
+  ASSERT_STREQ ("hello world", pp_formatted_text (&pp));
+}
+
+/* Helper function for testing pp_format.
+   Verify that pp_format (FMT, ...) followed by pp_output_formatted_text
+   prints EXPECTED, assuming that pp_show_color is SHOW_COLOR.  */
+
+static void
+assert_pp_format_va (const char *expected, bool show_color, const char *fmt,
+va_list *ap)
+{
+  pretty_printer pp;
+  text_info ti;
+  rich_location rich_loc (line_table, UNKNOWN_LOCATION);
+
+  ti.format_spec = fmt;
+  ti.args_ptr = ap;
+  ti.err_no = 0;
+  ti.x_data = NULL;
+  ti.m_richloc = &rich_loc;
+
+  pp_show_color (&pp) = show_color;
+  pp_format (&pp, &ti);
+  pp_output_formatted_text (&pp);
+  ASSERT_STREQ (expected, pp_formatted_text (&pp));
+}
+
+/* Verify that pp_format (FMT, ...) followed by pp_output_formatted_text
+   prints EXPECTED, with show_color disabled.  */
+
+static void
+assert_pp_format (const char *expected, const char *fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  assert_pp_format_va (expected, false, fmt, &ap);
+  va_end (ap);
+}
+
+/* As above, but with colorization enabled.  */
+
+static void
+assert_pp_format_colored (const char *expected, const char *fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  assert_pp_format_va (expected, true, fmt, &ap);
+  va_end (ap);
+}
+
+/* Verify that pp_format works, for various format codes.  */
+
+static void
+test_pp_format ()
+{
+  /* Avoid introducing locale-specific differences in the results
+ by hardcoding open_quote and close_quote.  */
+  const char *old_open_quote = open_quote;
+  const char *old_close_quote = close_quote;
+  open_quote = "`";
+  close_quote = "'";
+
+  /* Verify that plain text is passed through unchanged.  */
+  assert_pp_format ("unformatted", "unformatted");
+
+  /* Verify various individual format codes, in the order listed in the
+ comment for pp_format above.  */
+  assert_pp_format ("-27", "%d", -27);
+  assert_pp_format ("-5", "%i", -5);
+  assert_pp_format ("10", "%u", 10);
+  assert_pp_format ("17", "%o", 15);
+  assert_pp_format ("cafebabe", "%x", 0xcafebabe);
+  assert_pp_format ("-27", "%ld", (long)-27);
+  assert_pp_format ("-5", "%li", (long)-5);
+  assert_pp_format ("10", "%lu", (long)10);
+  assert_pp_format ("17", "%lo", (long)15);
+  assert_pp_format ("cafebabe", "%lx", (long)0xcafebabe);
+  assert_pp_format ("-27", "%lld", (long long)-27);
+  assert_pp_format ("-5", "%lli", (long long)-5);
+  assert_pp_format ("10", "%llu", (long long)10);
+  assert_pp_format ("17", "%llo", (long long)15);
+  assert_pp_format ("cafebabe", "%llx", (long long)0xcafebabe);
+  assert_pp_format ("-27", "%wd", (HOST_WIDE_INT)-27);
+  assert_pp_format ("-5", "%wi", (HOST_WIDE_INT)-5);
+  assert_pp_format ("10", "%wu", (unsigned HOST_WIDE_INT)10);
+  assert_pp_format ("17", "%wo", (HOST_WIDE_INT)15);
+  assert_pp_format ("0xcafebabe", "%wx", (HOST_WIDE_INT)0xcafebabe);
+  assert_pp_format ("A", "%c", 'A');
+  assert_pp_format ("hello world", "%s", "hello world");
+  assert_pp_format ("0xcafe

[PATCH] spellcheck.c: add test_find_closest_string

2016-06-06 Thread David Malcolm
This adds another test case to -fself-test.

Successfully bootstrapped&regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* spellcheck.c (selftest::test_find_closest_string): New function.
(spellcheck_c_tests): Call the above.
---
 gcc/spellcheck.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index ceb6016..11018f0 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -198,6 +198,27 @@ levenshtein_distance_unit_test (const char *a, const char *b,
   levenshtein_distance_unit_test_oneway (b, a, expected);
 }
 
+/* Verify that find_closest_string is sane.  */
+
+static void
+test_find_closest_string ()
+{
+  auto_vec<const char *> candidates;
+
+  /* Verify that it can handle an empty vec.  */
+  ASSERT_EQ (NULL, find_closest_string ("", &candidates));
+
+  /* Verify that it works sanely for non-empty vecs.  */
+  candidates.safe_push ("apple");
+  candidates.safe_push ("banana");
+  candidates.safe_push ("cherry");
+
+  ASSERT_STREQ ("apple", find_closest_string ("app", &candidates));
+  ASSERT_STREQ ("banana", find_closest_string ("banyan", &candidates));
+  ASSERT_STREQ ("cherry", find_closest_string ("berry", &candidates));
+  ASSERT_EQ (NULL, find_closest_string ("not like the others", &candidates));
+}
+
 /* Verify levenshtein_distance for a variety of pairs of pre-canned
inputs, comparing against known-good values.  */
 
@@ -218,6 +239,8 @@ spellcheck_c_tests ()
 ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
  "All your base are belong to us",
  44);
+
+  test_find_closest_string ();
 }
 
 } // namespace selftest
-- 
1.8.5.3
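
For readers unfamiliar with what the new test exercises, here is a minimal, self-contained sketch of an edit-distance based "closest string" lookup. It is not the code in spellcheck.c: the names edit_distance and closest_string are hypothetical, and the cutoff (reject a suggestion when more than half of the longer string would have to change) is an assumption made for this sketch. Under that assumption it gives the same answers as the assertions in the patch above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Dynamic-programming Levenshtein distance: insertions, deletions and
// substitutions each cost 1.  Hypothetical helper, not GCC's own code.
static std::size_t
edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev (b.size () + 1), cur (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); j++)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size (); i++)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size (); j++)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
          cur[j] = std::min ({ prev[j] + 1, cur[j - 1] + 1, subst });
        }
      std::swap (prev, cur);
    }
  return prev[b.size ()];
}

// Return the candidate with the smallest edit distance to TARGET, or
// nullptr if there are no candidates or the best one is too far away.
// The "more than half" cutoff is an assumption of this sketch.
static const char *
closest_string (const char *target,
                const std::vector<const char *> &candidates)
{
  const char *best = nullptr;
  std::size_t best_distance = 0;
  for (const char *candidate : candidates)
    {
      std::size_t d = edit_distance (target, candidate);
      if (!best || d < best_distance)
        {
          best = candidate;
          best_distance = d;
        }
    }
  if (!best)
    return nullptr;
  std::size_t cutoff
    = std::max (std::string (target).size (), std::string (best).size ()) / 2;
  return best_distance > cutoff ? nullptr : best;
}
```

With candidates "apple", "banana" and "cherry", looking up "berry" yields "cherry" (two edits), while a long unrelated string yields nullptr because every candidate exceeds the cutoff.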



Re: [Diagnostic Patch] Clean-up diagnostic facilities in diagnostic.c

2016-06-06 Thread David Malcolm
On Mon, 2016-06-06 at 10:55 +0200, Paolo Carlini wrote:
> Hi,
> 
> yesterday I had the idea of this small clean-up: move the work done
> by 
> emit_diagnostic to a new non-variadic diagnostic_impl and use it to 
> implement the former and all the various inform, warning, permerror,
> etc 
> (lately we have the *_at_rich_loc variants too). Something similar
> can 
> be done for inform_n, warning_n and error_n. There is the minor 
> subtlety of the declarations decorated with ATTRIBUTE_GCC_DIAG when 
> available to suppress the build-time warning: I think I did it the
> right 
> way, I took inspiration from the declarations of diagnostic_set_info,
> etc, in diagnostic.h.
> 
> Tested x86_64-linux.

Thanks, looks like a nice simplification.

I see the new prototypes are wrapped in #ifdef ATTRIBUTE_GCC_DIAG.
Isn't that redundant?  Looking at diagnostic-core.h, isn't it always
defined? (perhaps to the empty string)

Also the new functions are missing comments.

Other than that, LGTM.



Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-06 Thread Jakub Jelinek
Hi!

On Mon, Jun 06, 2016 at 02:36:17PM +0200, Jakub Jelinek wrote:
> 2016-06-06  Martin Sebor  
>   Jakub Jelinek  
> 
>   PR c++/70507
>   PR c/68120
>   * builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P,
>   BUILT_IN_MUL_OVERFLOW_P): New builtins.
>   * builtins.c: Include gimple-fold.h.
>   (fold_builtin_arith_overflow): Handle
>   BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
>   (fold_builtin_3): Likewise.
>   * doc/extend.texi (Integer Overflow Builtins): Document
>   __builtin_{add,sub,mul}_overflow_p.
> gcc/c/
>   * c-typeck.c (convert_arguments): Don't promote last argument
>   of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
> gcc/cp/
>   * constexpr.c: Include gimple-fold.h.
>   (cxx_eval_internal_function): New function.
>   (cxx_eval_call_expression): Call it.
>   (potential_constant_expression_1): Handle integer arithmetic
>   overflow built-ins.
>   * tree.c (builtin_valid_in_constant_expr_p): Likewise.
> gcc/c-family/
>   * c-common.c (check_builtin_function_arguments): Handle
>   BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
> gcc/testsuite/
>   * c-c++-common/builtin-arith-overflow-1.c: Add test cases.
>   * c-c++-common/builtin-arith-overflow-2.c: New test.
>   * g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
>   * g++.dg/cpp1y/constexpr-arith-overflow.C: New test.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?

Jakub


Re: [C++ PATCH] Avoid exponential compile time in cp_fold_function (PR c++/70847, PR c++/71330, PR c++/71393)

2016-06-06 Thread Jason Merrill
OK.

Jason


Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 09:27:56PM +0200, Marc Glisse wrote:
> The last one would miss floating point registers (no 2 platforms use the
> same letter for those, hence my quest for something more generic).
> 
> The goal of the experiment is described in PR59159 (for which "+X" is
> unlikely to be the right answer, in particular because it is meaningless for
> constants). I don't know in what context people use the "X" constraint, or
> even better "=X"...

X constraint has been added mainly for uses in match_scratch like:
(clobber (match_scratch:SI 2 "=X,X,X,&r"))
or when the predicate takes care of everything and it is not needed to
specify anything further:
  [(set (match_operand:SWI12 0 "push_operand" "=X")
(match_operand:SWI12 1 "nonmemory_no_elim_operand" "rn"))]
Using it in inline asm generally has resulted in lots of issues, including
ICEs etc., so nothing I'd recommend to use.

Jakub


Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Marc Glisse

On Mon, 6 Jun 2016, Jakub Jelinek wrote:


On Mon, Jun 06, 2016 at 12:04:04PM -0600, Jeff Law wrote:

On 06/06/2016 12:01 PM, Jakub Jelinek wrote:

On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote:

As for recog.c, I can not approve this as I am not a maintainer of it.
I only can say that the code looks questionable to me.

I think the question on the recog part is a matter of how we choose to
interpret what the "X" constraint means.

Does it literally mean accept anything, or accept some subset expressions?

I tend to think the former, which means that things like
reg_overlap_mentioned_p or its callers have to be bullet-proofed.


I think it is a bad idea to accept really anything, even for debug insns,
which initially accepted arbitrarily large RTL expressions (and still accept
stuff like subregs otherwise considered invalid etc.) we found it is highly
undesirable, as it is not very good idea for the compile time complexity
etc., so now we are trying to limit the complexity of the expressions there
by splitting up more complex expressions into smaller ones using temporaries.
So, even accept anything should always be accept anything reasonable,
because most of the RTL passes don't really expect arbitrarily deep
expressions, or expressions where the same reg can appear thousands of times
etc.

The problem is how do you define this subset of expressions you're going to
accept and those which you are going to reject.

I first pondered accepting RTL leaf nodes (reg, subreg, constants) and
rejecting everything else.  But I couldn't convince myself that some port
might reasonably expect (plus (x) (y)) to match the "X" constraint.


It is always going to be arbitrary.
Perhaps RTL leaf nodes (if including MEM, then perhaps with valid address
only), and unary/binary/ternary RTL expressions with RTL leaf node operands?
Or union of what is accepted by any other constraint?
Or "g" plus any constants?


The last one would miss floating point registers (no 2 platforms use the 
same letter for those, hence my quest for something more generic).


The goal of the experiment is described in PR59159 (for which "+X" is 
unlikely to be the right answer, in particular because it is meaningless 
for constants). I don't know in what context people use the "X" 
constraint, or even better "=X"...


--
Marc Glisse


Re: [PATCH, i386] Add native support for VIA C7, Eden and Nano CPUs

2016-06-06 Thread J. Mayer
On Mon, 2016-06-06 at 17:27 +0000, Joseph Myers wrote:
> This patch is missing the invoke.texi changes to document all the new
> CPU 
> names.

Hi,
correct, please consider adding the following patch to fix this.
Regards.

---

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ce162a0..ac7f8a8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23261,6 +23261,54 @@ VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support.
 (No scheduling is
 implemented for this chip.)
 
+@item c7
+VIA C7 (Esther) CPU with MMX, SSE, SSE2 and SSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item samuel-2
+VIA Eden Samuel 2 CPU with MMX and 3DNow!@: instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nehemiah
+VIA Eden Nehemiah CPU with MMX and SSE instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item esther
+VIA Eden Esther CPU with MMX, SSE, SSE2 and SSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item eden-x2
+VIA Eden X2 CPU with x86-64, MMX, SSE, SSE2 and SSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item eden-x4
+VIA Eden X4 CPU with x86-64, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano
+Generic VIA Nano CPU with x86-64, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano-1000
+VIA Nano 1xxx CPU with x86-64, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano-2000
+VIA Nano 2xxx CPU with x86-64, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano-3000
+VIA Nano 3xxx CPU with x86-64, MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano-x2
+VIA Nano Dual Core CPU with x86-64, MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 instruction set support.
+(No scheduling is implemented for this chip.)
+
+@item nano-x4
+VIA Nano Quad Core CPU with x86-64, MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 instruction set support.
+(No scheduling is implemented for this chip.)
+
 @item geode
 AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.
 @end table



[PATCH] Support i386 TLS code sequences without PLT

2016-06-06 Thread H.J. Lu
On Mon, Jun 6, 2016 at 11:04 AM, H.J. Lu  wrote:
> On Mon, Jun 6, 2016 at 8:01 AM, Carlos O'Donell  wrote:
>> On 06/03/2016 05:21 PM, H.J. Lu wrote:
>>> We can generate x86-64 TLS code sequences for general and local dynamic
>>> models without PLT, which uses indirect call via GOT:
>>>
>>> call *__tls_get_addr@GOTPCREL(%rip)
>>>
>>> instead of direct call:
>>>
>>> call __tls_get_addr[@PLT]
>>
>> What are the actual pros and cons of this change?
>
> Pros:  improved security and performance since GOT can be
> RELRO and one direct branch is removed.
> Cons: Code size is bigger when there are more than 16 calls
> to __tls_get_addr since direct branch is 4 byte and indirect
> branch is 5 byte. Also there is no lazy binding.
>
>> Does this improve security? Performance?
>>
>> The __tls_get_addr symbol, on x86_64, lives in ld.so, which generally
>> means that all shared objects (GD usage) indirect through their PLT/GOT
>> to make the call. In this model, and because of lazy linking, the
>> PLT-related GOT entries are left read-write to be updated after resolution
>> (ignore the BIND_NOW + RELRO case since in that case we do all of this
>> up front).
>>
>> After your change, without a PLT entry, these symbols can no longer be
>> interposed? The static linker would generate a binding (a got reloc for
>
> It can still be interposed.  Just lazy binding is disabled.
>
>> the symbol which is resolved by the dynamic loader) that cannot be changed,
>> becomes RO after RELRO?
>>
>> Is the security benefit worth the loss of interposition for this symbol?
>
> There is no loss of interposition.
>
>> Is there any performance gains?
>
> One direct branch to PLT entry is removed.
>
> This is what I am checking in.

Here is a similar patch for i386.  Similar pros and cons.  For
i386, since EBX is no longer required to call ___tls_get_addr,
other registers can be used. It will further improve performance
on i386.  I will check it in this week.

-- 
H.J.
From bf5feaddba6b8372f4c2a2b311b4df248d6e59ce Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 5 Jun 2016 07:18:06 -0700
Subject: [PATCH] Support i386 TLS code sequences without PLT

We can generate i386 TLS code sequences for general and local dynamic
models without PLT, which uses indirect call via GOT:

call *___tls_get_addr@GOT(%reg)

where EBX register isn't required as GOT base, instead of direct call:

call ___tls_get_addr[@PLT]

which requires EBX register as GOT base.

Since direct call is 4-byte long and indirect call is 5-byte long, the
extra one byte must be handled properly.

For general dynamic model, 7-byte lea instruction before call instruction
is replaced by 6-byte one to make room for indirect call.  For local
dynamic model, we simply use 5-byte indirect call.

TLS linker optimization is updated to recognize new instruction patterns.
For local dynamic model to local exec model transition, we generate
a 6-byte lea instruction as nop, instead of a 1-byte nop plus a 4-byte
lea instruction.  Since linker may convert

call ___tls_get_addr[@PLT]

to

addr32 call ___tls_get_addr

when producing static executable, both patterns are recognized.

bfd/

	* elf32-i386.c (elf_i386_link_hash_entry): Add tls_get_addr.
	(elf_i386_link_hash_newfunc): Initialize tls_get_addr to 2.
	(elf_i386_check_tls_transition): Check indirect call and direct
	call with the addr32 prefix for general and local dynamic models.
	Set the tls_get_addr field.
	(elf_i386_convert_load_reloc): Always use addr32 prefix for
	indirect ___tls_get_addr call via GOT.
	(elf_i386_relocate_section): Handle GD->LE, GD->IE and LD->LE
	transitions with indirect call and direct call with the addr32
	prefix.

ld/

	* testsuite/ld-i386/i386.exp: Run libtlspic2.so, tlsbin2,
	tlsgd3, tlsld2, tlsgd4, tlspie3a, tlspie3b and tlspie3c.
	* testsuite/ld-i386/pass.out: New file.
	* testsuite/ld-i386/tls-def1.c: Likewise.
	* testsuite/ld-i386/tls-gd1.S: Likewise.
	* testsuite/ld-i386/tls-ld1.S: Likewise.
	* testsuite/ld-i386/tls-main1.c: Likewise.
	* testsuite/ld-i386/tls.exp: Likewise.
	* testsuite/ld-i386/tlsbin2-nacl.rd: Likewise.
	* testsuite/ld-i386/tlsbin2.dd: Likewise.
	* testsuite/ld-i386/tlsbin2.rd: Likewise.
	* testsuite/ld-i386/tlsbin2.sd: Likewise.
	* testsuite/ld-i386/tlsbin2.td: Likewise.
	* testsuite/ld-i386/tlsbinpic2.s: Likewise.
	* testsuite/ld-i386/tlsgd3.dd: Likewise.
	* testsuite/ld-i386/tlsgd3.s: Likewise.
	* testsuite/ld-i386/tlsgd4.d: Likewise.
	* testsuite/ld-i386/tlsgd4.s: Likewise.
	* testsuite/ld-i386/tlsld2.s: Likewise.
	* testsuite/ld-i386/tlspic2-nacl.rd: Likewise.
	* testsuite/ld-i386/tlspic2.dd: Likewise.
	* testsuite/ld-i386/tlspic2.rd: Likewise.
	* testsuite/ld-i386/tlspic2.sd: Likewise.
	* testsuite/ld-i386/tlspic2.td: Likewise.
	* testsuite/ld-i386/tlspic3.s: Likewise.
	* testsuite/ld-i386/tlspie3.s: Likewise.
	* testsuite/ld-i386/tlspie3a.d: Likewise.
	* testsuite/ld-i386/tlspie3b.d: Likewise.
	* testsuite/ld-i386/tlspie3c.d: Likewise.
---
 bfd/elf32-i386.c | 249 +

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 12:04:04PM -0600, Jeff Law wrote:
> On 06/06/2016 12:01 PM, Jakub Jelinek wrote:
> >On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote:
> >>>As for recog.c, I can not approve this as I am not a maintainer of it.
> >>>I only can say that the code looks questionable to me.
> >>I think the question on the recog part is a matter of how we choose to
> >>interpret what the "X" constraint means.
> >>
> >>Does it literally mean accept anything, or accept some subset expressions?
> >>
> >>I tend to think the former, which means that things like
> >>reg_overlap_mentioned_p or its callers have to be bullet-proofed.
> >
> >I think it is a bad idea to accept really anything, even for debug insns,
> >which initially accepted arbitrarily large RTL expressions (and still accept
> >stuff like subregs otherwise considered invalid etc.) we found it is highly
> >undesirable, as it is not very good idea for the compile time complexity
> >etc., so now we are trying to limit the complexity of the expressions there
> >by splitting up more complex expressions into smaller ones using temporaries.
> >So, even accept anything should always be accept anything reasonable,
> >because most of the RTL passes don't really expect arbitrarily deep
> >expressions, or expressions where the same reg can appear thousands of times
> >etc.
> The problem is how do you define this subset of expressions you're going to
> accept and those which you are going to reject.
> 
> I first pondered accepting RTL leaf nodes (reg, subreg, constants) and
> rejecting everything else.  But I couldn't convince myself that some port
> might reasonably expect (plus (x) (y)) to match the "X" constraint.

It is always going to be arbitrary.
Perhaps RTL leaf nodes (if including MEM, then perhaps with valid address
only), and unary/binary/ternary RTL expressions with RTL leaf node operands?
Or union of what is accepted by any other constraint?
Or "g" plus any constants?

Jakub


Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Jeff Law

On 06/06/2016 12:01 PM, Jakub Jelinek wrote:

On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote:

As for recog.c, I can not approve this as I am not a maintainer of it.
I only can say that the code looks questionable to me.

I think the question on the recog part is a matter of how we choose to
interpret what the "X" constraint means.

Does it literally mean accept anything, or accept some subset expressions?

I tend to think the former, which means that things like
reg_overlap_mentioned_p or its callers have to be bullet-proofed.


I think it is a bad idea to accept really anything, even for debug insns,
which initially accepted arbitrarily large RTL expressions (and still accept
stuff like subregs otherwise considered invalid etc.) we found it is highly
undesirable, as it is not very good idea for the compile time complexity
etc., so now we are trying to limit the complexity of the expressions there
by splitting up more complex expressions into smaller ones using temporaries.
So, even accept anything should always be accept anything reasonable,
because most of the RTL passes don't really expect arbitrarily deep
expressions, or expressions where the same reg can appear thousands of times
etc.
The problem is how do you define this subset of expressions you're going 
to accept and those which you are going to reject.


I first pondered accepting RTL leaf nodes (reg, subreg, constants) and 
rejecting everything else.  But I couldn't convince myself that some 
port might reasonably expect (plus (x) (y)) to match the "X" constraint.


jeff


Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote:
> >As for recog.c, I can not approve this as I am not a maintainer of it.
> >I only can say that the code looks questionable to me.
> I think the question on the recog part is a matter of how we choose to
> interpret what the "X" constraint means.
> 
> Does it literally mean accept anything, or accept some subset expressions?
> 
> I tend to think the former, which means that things like
> reg_overlap_mentioned_p or its callers have to be bullet-proofed.

I think it is a bad idea to accept really anything, even for debug insns,
which initially accepted arbitrarily large RTL expressions (and still accept
stuff like subregs otherwise considered invalid etc.) we found it is highly
undesirable, as it is not very good idea for the compile time complexity
etc., so now we are trying to limit the complexity of the expressions there
by splitting up more complex expressions into smaller ones using temporaries.
So, even accept anything should always be accept anything reasonable,
because most of the RTL passes don't really expect arbitrarily deep
expressions, or expressions where the same reg can appear thousands of times
etc.

Jakub


Re: [PATCH][vectorizer] Remove blank debug lines after dump_gimple_stmt

2016-06-06 Thread Jeff Law

On 06/06/2016 06:46 AM, Alan Hayward wrote:

Lots of code calls dump_gimple_stmt then prints a newline, however
dump_gimple_stmt will print a newline itself. This makes the vectorizer debug
file messy. I think the confusion is because dump_generic_expr does NOT print a
newline. This patch removes all prints of a newline directly after a
dump_gimple_stmt.

Tested by examining a selection of vect dump files.

gcc/
* tree-vect-data-refs.c (vect_analyze_data_refs): Remove debug newline.
* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): likewise.
(vect_can_advance_ivs_p): likewise.
(vect_update_ivs_after_vectorizer): likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): likewise.
(vect_analyze_scalar_cycles_1): likewise.
(vect_analyze_loop_operations): likewise.
(report_vect_op): likewise.
(vect_is_slp_reduction): likewise.
(vect_is_simple_reduction): likewise.
(get_initial_def_for_induction): likewise.
(vect_transform_loop): likewise.
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): likewise.
(vect_recog_sad_pattern): likewise.
(vect_recog_widen_sum_pattern): likewise.
(vect_recog_widening_pattern): likewise.
(vect_recog_divmod_pattern): likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): likewise.
(vect_analyze_slp_instance): likewise.
(vect_transform_slp_perm_load): likewise.
(vect_schedule_slp_instance): likewise.
Wouldn't you also need to verify that the testsuite passes since it 
could potentially have tests which assume the extra newlines in the 
test's regexps?


So I'm fine with this patch once the testsuite has been verified to run 
without regressions on a target which exercises the vectorizer.


jeff


Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Jeff Law

On 06/06/2016 11:04 AM, Vladimir Makarov wrote:

On 06/06/2016 09:32 AM, Bernd Edlinger wrote:

Ping...

see https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02010.html



Thank you for working on the PR and sorry for the delay with LRA part of
review.

Change in lra-constraints.c is ok for me with the following change.
Instead of just

-  curr_alt[nop] = NO_REGS;
+  curr_alt[nop] = ALL_REGS;
   CLEAR_HARD_REG_SET (curr_alt_set[nop]);

I'd like to see

-  curr_alt[nop] = NO_REGS;
+  curr_alt[nop] = ALL_REGS;
-  CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+  COPY_HARD_REG_SET (curr_alt_set[nop], reg_class_contents[ALL_REGS]);

Also I don't see /* { dg-do compile } */ in the tests (I don't know what
dejagnu does when there are no dejagnu actions in the test).
But with the addition '/* { dg-do compile } */' the test pr59155-2.c is
ok for me too.

As for recog.c, I can not approve this as I am not a maintainer of it.
I only can say that the code looks questionable to me.
I think the question on the recog part is a matter of how we choose to 
interpret what the "X" constraint means.


Does it literally mean accept anything, or accept some subset of expressions?

I tend to think the former, which means that things like 
reg_overlap_mentioned_p or its callers have to be bullet-proofed.


jeff



Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 10:05:57AM +0200, Richard Biener wrote:
> So this ends up generating { a ? -1 : 0, b ? -1 : 0, ... }.  That

Yes, that is already what we do now for loop vectorization.

> might be less optimal than doing { a, b, ... } ? { -1, -1 ... } : { 0, 0, 
> .. }

Well, it would need to be
{ a, b, ... } != { 0, 0, ... } ? { -1, -1, ... } : { 0, 0, ... }
then, doesn't VEC_COND_EXPR assume the condition is in canonical
VECTOR_BOOLEAN_TYPE_P form?

Anyway, if something like the above would be faster, perhaps generic vector
lowering or some similar pass could detect that case post-vectorization and
optimize?
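
The two forms discussed above can be compared in a small stand-alone sketch. This is plain C++ over arrays, not GIMPLE; the lane count and the names BoolLanes, MaskLanes, mask_per_element and mask_compare_select are assumptions made only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // assumed vector width, illustration only
using BoolLanes = std::array<bool, kLanes>;
using MaskLanes = std::array<std::int32_t, kLanes>;

// Form 1: build the mask lane by lane, { a ? -1 : 0, b ? -1 : 0, ... }.
inline MaskLanes mask_per_element(const BoolLanes &in) {
  MaskLanes out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = in[i] ? -1 : 0;
  return out;
}

// Form 2: { a, b, ... } != { 0, ... } ? { -1, ... } : { 0, ... },
// i.e. an explicit comparison followed by a whole-vector select.
inline MaskLanes mask_compare_select(const BoolLanes &in) {
  MaskLanes zeros{};   // { 0, 0, ... }
  MaskLanes ones{};
  ones.fill(-1);       // { -1, -1, ... }
  MaskLanes widened{};
  for (std::size_t i = 0; i < kLanes; ++i)
    widened[i] = in[i] ? 1 : 0;  // scalar booleans widened to integer lanes
  MaskLanes out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = (widened[i] != zeros[i]) ? ones[i] : zeros[i];
  return out;
}
```

Both functions compute the same mask; which form is cheaper on a given target is exactly the open question in the mail and is not something this sketch can answer.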

Jakub


Re: [PATCH, i386] Add native support for VIA C7, Eden and Nano CPUs

2016-06-06 Thread Joseph Myers
This patch is missing the invoke.texi changes to document all the new CPU 
names.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Committed] Selftest framework (v8)

2016-06-06 Thread David Malcolm
On Mon, 2016-06-06 at 16:40 +0200, Bernd Schmidt wrote:
> On 06/06/2016 04:17 PM, David Malcolm wrote:
> > I'm testing a revised patch now, incorporating the above, and
> > renaming
> > s-selftests (plural) to s-selftest (singular) etc within
> > gcc/Makefile.in as requested by Bernhard elsewhere in this thread. 
> >  I
> > assume that change is OK?
> 
> Sure.

Thanks.  I've committed v8 of the patch to trunk as r237144 (having
verified bootstrap&regtest on x86_64-pc-linux-gnu).

For reference, here's what I committed.

gcc/ChangeLog:
* Makefile.in (OBJS): Add function-tests.o,
hash-map-tests.o, hash-set-tests.o, rtl-tests.o,
selftest-run-tests.o.
(OBJS-libcommon): Add selftest.o.
(OBJS-libcommon-target): Add selftest.o.
(all.internal): Add "selftest".
(all.cross): Likewise.
(selftest): New phony target.
(s-selftest): New target.
(selftest-gdb): New phony target.
(COLLECT2_OBJS): Add selftest.o.
* bitmap.c: Include "selftest.h".
(selftest::test_gc_alloc): New function.
(selftest::test_set_range): New function.
(selftest::test_clear_bit_in_middle): New function.
(selftest::test_copying): New function.
(selftest::test_bitmap_single_bit_set_p): New function.
(selftest::bitmap_c_tests): New function.
* common.opt (fself-test): New.
* diagnostic-show-locus.c: Include "selftest.h".
(make_range): New function.
(test_range_contains_point_for_single_point): New function.
(test_range_contains_point_for_single_line): New function.
(test_range_contains_point_for_multiple_lines): New function.
(assert_eq): New function.
(test_get_line_width_without_trailing_whitespace): New function.
(selftest::diagnostic_show_locus_c_tests): New function.
* et-forest.c: Include "selftest.h".
(selftest::test_single_node): New function.
(selftest::test_simple_tree): New function.
(selftest::test_disconnected_nodes): New function.
(selftest::et_forest_c_tests): New function.
* fold-const.c: Include "selftest.h".
(selftest::assert_binop_folds_to_const): New function.
(selftest::assert_binop_folds_to_nonlvalue): New function.
(selftest::test_arithmetic_folding): New function.
(selftest::fold_const_c_tests): New function.
* function-tests.c: New file.
* gimple.c: Include "selftest.h".
Include "gimple-pretty-print.h".
(selftest::verify_gimple_pp): New function.
(selftest::test_assign_single): New function.
(selftest::test_assign_binop): New function.
(selftest::test_nop_stmt): New function.
(selftest::test_return_stmt): New function.
(selftest::test_return_without_value): New function.
(selftest::gimple_c_tests): New function.
* hash-map-tests.c: New file.
* hash-set-tests.c: New file.
* input.c: Include "selftest.h".
(selftest::assert_loceq): New function.
(selftest::test_accessing_ordinary_linemaps): New function.
(selftest::test_unknown_location): New function.
(selftest::test_builtins): New function.
(selftest::test_reading_source_line): New function.
(selftest::input_c_tests): New function.
* rtl-tests.c: New file.
* selftest-run-tests.c: New file.
* selftest.c: New file.
* selftest.h: New file.
* spellcheck.c: Include "selftest.h".
(selftest::levenshtein_distance_unit_test_oneway): New function,
adapted from testsuite/gcc.dg/plugin/levenshtein_plugin.c.
(selftest::levenshtein_distance_unit_test): Likewise.
(selftest::spellcheck_c_tests): Likewise.
* toplev.c: Include selftest.h.
(toplev::run_self_tests): New.
(toplev::main): Handle -fself-test.
* toplev.h (toplev::run_self_tests): New.
* tree.c: Include "selftest.h".
(selftest::test_integer_constants): New function.
(selftest::test_identifiers): New function.
(selftest::test_labels): New function.
(selftest::tree_c_tests): New function.
* tree-cfg.c: Include "selftest.h".
(selftest::push_fndecl): New function.
(selftest::test_linear_chain): New function.
(selftest::test_diamond): New function.
(selftest::test_fully_connected): New function.
(selftest::tree_cfg_c_tests): New function.
* vec.c: Include "selftest.h".
(selftest::safe_push_range): New function.
(selftest::test_quick_push): New function.
(selftest::test_safe_push): New function.
(selftest::test_truncate): New function.
(selftest::test_safe_grow_cleared): New function.
(selftest::test_pop): New function.
(selftest::test_safe_insert): New function.
(selftest::test_ordered_remove): New function.
(selftest::test_unordered_rem
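
The ChangeLog above follows a visible pattern: small test functions per source file, collected by one "<file>_c_tests" function. A minimal stand-alone sketch of that layout follows. It does not use GCC's selftest.h; the namespace selftest_sketch, the macro SKETCH_ASSERT_EQ and the function names are hypothetical.

```cpp
#include <cstdio>
#include <vector>

namespace selftest_sketch {

int num_passed = 0;
int num_failed = 0;

// Hypothetical assertion macro: counts the result and reports failures.
#define SKETCH_ASSERT_EQ(expected, actual)                          \
  do {                                                              \
    if ((expected) == (actual))                                     \
      ++selftest_sketch::num_passed;                                \
    else {                                                          \
      ++selftest_sketch::num_failed;                                \
      std::fprintf(stderr, "%s:%d: FAIL: %s == %s\n", __FILE__,     \
                   __LINE__, #expected, #actual);                   \
    }                                                               \
  } while (0)

// One small test per behaviour, as in the per-file test functions listed.
inline void test_push() {
  std::vector<int> v;
  v.push_back(5);
  v.push_back(7);
  SKETCH_ASSERT_EQ(2u, v.size());
  SKETCH_ASSERT_EQ(7, v.back());
}

inline void test_pop() {
  std::vector<int> v = {1, 2, 3};
  v.pop_back();
  SKETCH_ASSERT_EQ(2u, v.size());
  SKETCH_ASSERT_EQ(2, v.back());
}

// Per-file aggregator, analogous in spirit to selftest::vec_c_tests.
inline void vec_tests() {
  test_push();
  test_pop();
}

// Driver: runs every aggregator and returns the number of failures.
inline int run_tests() {
  num_passed = 0;
  num_failed = 0;
  vec_tests();
  return num_failed;
}

}  // namespace selftest_sketch
```

In the committed patch the driver is reached through -fself-test; here run_tests() simply stands in for that entry point.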

Re: [PATCH][RTL ifcvt] Print name of noce transform that succeeded in dump file

2016-06-06 Thread Bernd Schmidt

On 06/06/2016 06:28 PM, Kyrill Tkachov wrote:

This patch adds the name of the transform that succeeded in
if-conversion and prints it to the dump file so that we can
pinpoint the exact noce_try* function that triggered.


Ok.


Bernd



Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Vladimir Makarov

On 06/06/2016 09:32 AM, Bernd Edlinger wrote:

Ping...

see https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02010.html


Thank you for working on the PR and sorry for the delay with LRA part of 
review.


Change in lra-constraints.c is ok for me with the following change. 
Instead of just


- curr_alt[nop] = NO_REGS;
+ curr_alt[nop] = ALL_REGS;
  CLEAR_HARD_REG_SET (curr_alt_set[nop]);

I'd like to see

- curr_alt[nop] = NO_REGS;
+ curr_alt[nop] = ALL_REGS;
- CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+ COPY_HARD_REG_SET (curr_alt_set[nop], 
reg_class_contents[ALL_REGS]);
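
One way to read the requested change is that the register class recorded for the alternative and the hard register set recorded next to it should agree. The following is a minimal stand-alone sketch of that idea, not LRA code: RegClass, HardRegSet, class_contents and the 16-register width are assumptions made for illustration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical stand-ins for GCC's reg_class and HARD_REG_SET.
constexpr std::size_t kNumHardRegs = 16;  // assumed width, illustration only
using HardRegSet = std::bitset<kNumHardRegs>;

enum class RegClass { NoRegs, AllRegs };

// Plays the role of reg_class_contents[]: the registers in a class.
inline HardRegSet class_contents(RegClass c) {
  HardRegSet set;
  if (c == RegClass::AllRegs)
    set.set();  // every register belongs to AllRegs
  return set;
}

struct Alternative {
  RegClass cls;
  HardRegSet regs;
};

// First variant from the mail: the class becomes AllRegs, the set is cleared.
inline Alternative class_only() {
  return Alternative{RegClass::AllRegs, HardRegSet()};
}

// Requested variant: the set is copied from the class contents.
inline Alternative class_and_set() {
  return Alternative{RegClass::AllRegs, class_contents(RegClass::AllRegs)};
}

// True when the recorded set matches the recorded class.
inline bool consistent(const Alternative &a) {
  return a.regs == class_contents(a.cls);
}
```

In this model only the second variant leaves the class and the set in agreement.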

Also I don't see /* { dg-do compile } */ in the tests (I don't know what 
dejagnu does when there are no dejagnu actions in the test).
But with the addition '/* { dg-do compile } */' the test pr59155-2.c is ok for 
me too.

As for recog.c, I can not approve this as I am not a maintainer of it.
I only can say that the code looks questionable to me.



Re: Unreviewed patches

2016-06-06 Thread Gerald Pfeifer
On Mon, 6 Jun 2016, Rainer Orth wrote:
> The following patches have remained unreviewed for a week:
> 
>   [gotools, libcc1] Update copyright dates
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html
> 
> Richard already approved the update-copyright.py changes, but the actual
> effects on gotools and libcc1 require either maintainer or release
> manager approval, I believe.

I think applying those updates is a direct consequence of updating
the update-copyright.py script and you can just go ahead.

Gerald


[PATCH v2] Update documentation for ARM architecture

2016-06-06 Thread Stefan Brüns
  * use lexicographical ordering, as "gcc -march=foo" does
  * correct usage of @samp vs @option, add @samp where appropriate
  * add armv6k, armv6z, armv6zk -march option values
  * remove -march=ep9312, it is only valid for -mcpu
  * add armv6s-m and document it, as it is not an official ARM name.

Support for the OS extension/SVC is mandatory, non-supporting
implementations are deprecated (ARMv6-M Architecture Reference Manual, B.2)
---
 v2: Add Changelog entry
 Add @samp/@option changes

 gcc/ChangeLog   |  7 +++
 gcc/doc/invoke.texi | 21 ++---
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3e68798..8c1e54b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-06-06  Stefan Bruens  
+
+   * doc/invoke.texi (ARM Options): Use lexicographical ordering.
+   Correct usage of @samp vs @option, add @samp where appropriate.
+   Add -march={armv6k,armv6z,arm6zk}, remove -march=ep9312.
+   Add armv6s-m and document it, as it is no official ARM name.
+
 2016-06-06  Kyrylo Tkachov  
 
PR middle-end/37780
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4a28935..3572fd2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14065,16 +14065,23 @@ name to determine what kind of instructions it can emit when generating
 assembly code.  This option can be used in conjunction with or instead
 of the @option{-mcpu=} option.  Permissible names are: @samp{armv2},
 @samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
-@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
-@samp{armv6}, @samp{armv6j},
-@samp{armv6t2}, @samp{armv6z}, @samp{armv6kz}, @samp{armv6-m},
-@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7e-m},
+@samp{armv5}, @samp{armv5e}, @samp{armv5t}, @samp{armv5te},
+@samp{armv6}, @samp{armv6-m}, @samp{armv6j}, @samp{armv6k},
+@samp{armv6kz}, @samp{armv6s-m},
+@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk},
+@samp{armv7}, @samp{armv7-a}, @samp{armv7-m}, @samp{armv7-r}, @samp{armv7e-m},
 @samp{armv7ve}, @samp{armv8-a}, @samp{armv8-a+crc}, @samp{armv8.1-a},
-@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}.
+@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}.
 
-Architecture revisions older than @option{armv4t} are deprecated.
+Architecture revisions older than @samp{armv4t} are deprecated.
 
-@option{-march=armv7ve} is the armv7-a architecture with virtualization
+@option{-march=armv6s-m} is the @samp{armv6-m} architecture with support for
+the (now mandatory) SVC instruction.
+
+@option{-march=armv6zk} is an alias for @samp{armv6kz}, existing for backwards
+compatibility.
+
+@option{-march=armv7ve} is the @samp{armv7-a} architecture with virtualization
 extensions.
 
 @option{-march=armv8-a+crc} enables code generation for the ARMv8-A
-- 
2.8.2



[PATCH][RTL ifcvt] Print name of noce transform that succeeded in dump file

2016-06-06 Thread Kyrill Tkachov

Hi all,

When debugging the noce if-conversion passes one of the most frustrating and
time-consuming things I have to do is find which of the dozen or so transforms
triggered.  You'd think going through the cascade of if-gotos in
noce_process_if_block in gdb would work, but this tends to be optimised in
weird ways and stepping through them doesn't work accurately most of the time.

This patch adds the name of the transform that succeeded in if-conversion and
prints it to the dump file so that we can pinpoint the exact noce_try* function
that triggered.
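
As a rough illustration of the idea, here is a much-reduced stand-alone sketch. IfInfoSketch, try_move, try_minmax and process_block are hypothetical names, and the success conditions are arbitrary; only the pattern (each transform records its own name on success, the driver prints it) follows the description above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical, much-reduced stand-in for struct noce_if_info.
struct IfInfoSketch {
  int a;
  int b;
  const char *transform_name;  // set by whichever transform succeeds
};

// Each "transform" records its own name just before reporting success.
inline bool try_move(IfInfoSketch *info) {
  if (info->a != info->b)
    return false;
  info->transform_name = "try_move";
  return true;
}

inline bool try_minmax(IfInfoSketch *info) {
  if (info->a >= info->b)  // arbitrary condition for the sketch
    return false;
  info->transform_name = "try_minmax";
  return true;
}

// Mirrors the cascade in the driver: the first success wins and its name
// is written to the dump file when one is open. Returns the name, or an
// empty string when no transform applied.
inline std::string process_block(int a, int b, std::FILE *dump_file) {
  IfInfoSketch info = {a, b, nullptr};
  if (try_move(&info) || try_minmax(&info)) {
    if (dump_file)
      std::fprintf(dump_file, "if-conversion succeeded through %s\n",
                   info.transform_name);
    return info.transform_name;
  }
  return std::string();
}
```

Recording the name in the transform itself, rather than in the driver, means the driver needs no knowledge of which branch of the cascade was taken.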

Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-06-06  Kyrylo Tkachov  

* ifcvt.c (struct noce_if_info): Add transform_name field.
(noce_try_move): Set if_info->transform_name to the function name.
(noce_try_ifelse_collapse): Likewise.
(noce_try_store_flag): Likewise.
(noce_try_inverse_constants): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Print if_info->transform_name to
dump_file if transformation succeeded.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 05fac71409d401a08d01b7dc7cf164613f8477c4..4a277db7dcc4cd467299419b21bae0f2a2b42926 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -813,6 +813,10 @@ struct noce_if_info
 
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
+
+  /* The name of the noce transform that succeeded in if-converting
+ this structure.  Used for debugging.  */
+  const char *transform_name;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1116,6 +1120,7 @@ noce_try_move (struct noce_if_info *if_info)
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
 	}
+  if_info->transform_name = "noce_try_move";
   return TRUE;
 }
   return FALSE;
@@ -1148,6 +1153,8 @@ noce_try_ifelse_collapse (struct noce_if_info * if_info)
 
   emit_insn_before_setloc (seq, if_info->jump,
 			  INSN_LOCATION (if_info->insn_a));
+
+  if_info->transform_name = "noce_try_ifelse_collapse";
   return TRUE;
 }
 
@@ -1195,6 +1202,7 @@ noce_try_store_flag (struct noce_if_info *if_info)
 
   emit_insn_before_setloc (seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_store_flag";
   return TRUE;
 }
   else
@@ -1273,6 +1281,7 @@ noce_try_inverse_constants (struct noce_if_info *if_info)
 
   emit_insn_before_setloc (seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_inverse_constants";
   return true;
 }
 
@@ -1493,6 +1502,8 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 
   emit_insn_before_setloc (seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_store_flag_constants";
+
   return TRUE;
 }
 
@@ -1545,6 +1556,8 @@ noce_try_addcc (struct noce_if_info *if_info)
 
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
+	  if_info->transform_name = "noce_try_addcc";
+
 	  return TRUE;
 	}
 	  end_sequence ();
@@ -1585,6 +1598,7 @@ noce_try_addcc (struct noce_if_info *if_info)
 
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
+	  if_info->transform_name = "noce_try_addcc";
 	  return TRUE;
 	}
 	  end_sequence ();
@@ -1649,6 +1663,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
+	  if_info->transform_name = "noce_try_store_flag_mask";
+
 	  return TRUE;
 	}
 
@@ -1799,6 +1815,8 @@ noce_try_cmove (struct noce_if_info *if_info)
 
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
+	  if_info->transform_name = "noce_try_cmove";
+
 	  return TRUE;
 	}
   /* If both a and b are constants try a last-ditch transformation:
@@ -1852,6 +1870,7 @@ noce_try_cmove (struct noce_if_info *if_info)
 
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
+	  if_info->transform_name = "noce_try_cmove";
 	  return TRUE;
 	}
 	  else
@@ -2305,6 +2324,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_cmove_arith";
   return TRUE;
 
  end_seq_and_fail:
@@ -2561,6 +2581,7 @@ noce_try_minmax (struct noce_if_info *if_info)
   emit_insn_before_set

Re: [PATCH][AArch64] Add missing fcsel in Cortex-A57 scheduler

2016-06-06 Thread Kyrill Tkachov


On 02/06/16 17:09, Wilco Dijkstra wrote:

The Cortex-A57 scheduler is missing fcsel, so add it.

OK for commit?

ChangeLog:
2016-06-02  Wilco Dijkstra  

* config/arm/cortex-a57.md (cortex_a57_fp_cpys): Add fcsel.


Ok from an arm perspective too.

Thanks,
Kyrill



---
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index 37912db464315a0d70835b81991e8e07a4d9db89..9b5970a0b647abc364b733cb4e2e22ae03056235 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -726,7 +726,7 @@
  
  (define_insn_reservation "cortex_a57_fp_cpys" 4

(and (eq_attr "tune" "cortexa57")
-   (eq_attr "type" "fmov"))
+   (eq_attr "type" "fmov,fcsel"))
"(ca57_cx1|ca57_cx2)")
  
  (define_insn_reservation "cortex_a57_fp_divs" 12






[PATCH][ARM] Add initial support for Cortex-A73

2016-06-06 Thread Kyrill Tkachov

Hi all,

This patch adds initial support for the Cortex-A73 processor through the
cortex-a73, cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to -mcpu 
and -mtune.

The Cortex-A73 is an ARMv8-A processor.

Bootstrapped and tested on arm-none-linux-gnueabihf with an appropriately
patched binutils that understands the relevant -mcpu argument.

Ok for trunk?

Thanks,
Kyrill

2016-06-06  Kyrylo Tkachov  

* config/arm/arm.c (arm_cortex_a73_tune): New struct.
* config/arm/arm-cores.def (cortex-a73): New entry.
(cortex-a73.cortex-a35): Likewise.
(cortex-a73.cortex-a53): Likewise.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Handle mcpu=cortex-a73,
mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53.
* config/arm/t-aprofile: Handle mcpu=cortex-a73,
mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53.
* doc/invoke.texi (ARM Options): Document cortex-a73,
cortex-a73.cortex-a35 and cortex-a73.cortex-a53.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 22ecd52f67238724d55271a3732f8d706f6b683d..b4f327022b966db718bdce74ce3d28e3dae568a2 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,6 +129,7 @@ ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a72",	cortexa72, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ARM_CORE("cortex-a73",	cortexa73, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
 ARM_CORE("exynos-m1",	exynosm1,  exynosm1,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
 ARM_CORE("qdf24xx",	qdf24xx,   cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("xgene1",  xgene1,xgene1,  8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
@@ -136,3 +137,6 @@ ARM_CORE("xgene1",  xgene1,xgene1,  8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCH
 /* V8 big.LITTLE implementations */
 ARM_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ARM_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
+ARM_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
+
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 7264bff9a349a3b2dd84d5a7cee8860cd24cbfcc..c665f65a3f0398148fb073861de6edc86ea19de5 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -229,6 +229,9 @@ EnumValue
 Enum(processor_type) String(cortex-a72) Value(cortexa72)
 
 EnumValue
+Enum(processor_type) String(cortex-a73) Value(cortexa73)
+
+EnumValue
 Enum(processor_type) String(exynos-m1) Value(exynosm1)
 
 EnumValue
@@ -243,6 +246,12 @@ Enum(processor_type) String(cortex-a57.cortex-a53) Value(cortexa57cortexa53)
 EnumValue
 Enum(processor_type) String(cortex-a72.cortex-a53) Value(cortexa72cortexa53)
 
+EnumValue
+Enum(processor_type) String(cortex-a73.cortex-a35) Value(cortexa73cortexa35)
+
+EnumValue
+Enum(processor_type) String(cortex-a73.cortex-a53) Value(cortexa73cortexa53)
+
 Enum
 Name(arm_arch) Type(int)
 Known ARM architectures (for use with the -march= option):
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index a337304340a5322840f2ba863d703188f051f4fa..50710232d45db3a87ff83cd9728495c01697e904 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -16,13 +16,15 @@ (define_attr "tune"
 	arm1156t2s,arm1156t2fs,cortexm1,
 	cortexm0,cortexm0plus,cortexm1smallmultiply,
 	cortexm0smallmultiply,cortexm0plussmallmultiply,genericv7a,
-	cortexa5,cortexa7,cortexr8,cortexa8,
+	cortexa5,cortexa7,cortexa8,
 	cortexa9,cortexa12,cortexa15,
 	cortexa17,cortexr4,cortexr4f,
-	cortexr5,cortexr7,cortexm7,
-	cortexm4,cortexm3,marvell_pj4,
-	cortexa15cortexa7,cortexa17cortexa7,cortexa32,
-	cortexa35,cortexa53,cortexa57,
-	cortexa72,exynosm1,qdf24xx,
-	xgene1,cortexa57cortexa53,cortexa72cortexa53"
+	cortexr5,cortexr7,cortexr8,
+	cortexm7,cortexm4,cortexm3,
+	marvell_pj4,cortexa15cortexa7,cortexa17cortexa7,
+	cortexa32,cortexa35,cortexa53,
+	cortexa57,cortexa72,cortexa73,
+	exynosm1,qdf24xx,xgene1,
+	cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
+	cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))

[PATCH][AArch64] Add initial support for Cortex-A73

2016-06-06 Thread Kyrill Tkachov

Hi all,

This patch adds initial support for the Cortex-A73 processor through the
cortex-a73, cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to -mcpu 
and -mtune.

The Cortex-A73 is an ARMv8-A processor and the initial tuning is based on
the Cortex-A57 tuning (though not an exact copy).

Bootstrapped and tested on aarch64-none-linux-gnu with an appropriately
patched binutils that understands the relevant -mcpu argument.

Ok for trunk?

Thanks,
Kyrill

2016-06-06  Kyrylo Tkachov  

* config/aarch64/aarch64.c (cortexa73_tunings): New struct.
* config/aarch64/aarch64-cores.def (cortex-a73): New entry.
(cortex-a73.cortex-a35): Likewise.
(cortex-a73.cortex-a53): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document cortex-a73,
cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to
-mcpu and -mtune.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 251a3ebb9be82def8f257cbdcab440d7a51d478b..3bbf42504c528fc364af19f422ff79dc0f8b7cd8 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -44,6 +44,7 @@ AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
+AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
 AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
@@ -53,4 +54,5 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xge
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
 AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
-
+AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd04")
+AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd03")
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index cbc6f4879edb2f3842a50dfafe206313d49e9cf8..392dfbd0d922007b2d245d168ab5cf95db2670b5 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+	"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c98d6343cfa6fec4ecf686720046a21f46920d58..1784b9215b3ada0dd944838b7ca9f9dbfaf750ec 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -519,6 +519,31 @@ static const struct tune_params cortexa72_tunings =
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
+static const struct tune_params cortexa73_tunings =
+{
+  &cortexa57_extra_costs,
+  &cortexa57_addrcost_table,
+  &cortexa57_regmove_cost,
+  &cortexa57_vector_cost,
+  &generic_branch_cost,
+  4, /* memmov_cost.  */
+  2, /* issue_rate.  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  16,	/* function_align.  */
+  8,	/* jump_align.  */
+  4,	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
+};
+
 static const struct tune_params exynosm1_tunings =
 {
   &exynosm1_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75b50597e981fdb4fcf1d309cc9b026aac7c3f38..51879

Re: [v3 PATCH] Support allocators in tuples of zero size.

2016-06-06 Thread Jonathan Wakely

On 05/06/16 21:15 +0300, Ville Voutilainen wrote:

{
public:
  void swap(tuple&) noexcept { /* no-op */ }
+  // We need the default since we're going to define no-op
+  // allocator constructors.
+  tuple() = default;
+  // No-op allocator constructors.
+  template<typename _Alloc>
+   tuple(allocator_arg_t __tag, const _Alloc& __a) { }
+  template<typename _Alloc>
+   tuple(allocator_arg_t __tag, const _Alloc& __a, const tuple& __in) { }


Please remove the names of the unused parameters, so we don't get
warnings with -Wsystem-headers -Wunused-parameter.

OK with that change, thanks.
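
For illustration, a small sketch of what the patch enables and of the requested style. EmptyLike and empty_tuple_takes_allocator are hypothetical names; the std::tuple<> lines assume a standard library that provides the allocator-extended constructors for the empty tuple.

```cpp
#include <memory>
#include <tuple>

// No-op allocator constructors with unnamed parameters, so that
// -Wunused-parameter has nothing to flag.
struct EmptyLike {
  EmptyLike() = default;
  template <typename Alloc>
  EmptyLike(std::allocator_arg_t, const Alloc &) {}
  template <typename Alloc>
  EmptyLike(std::allocator_arg_t, const Alloc &, const EmptyLike &) {}
};

// Constructs empty tuples through the allocator-extended constructors.
inline bool empty_tuple_takes_allocator() {
  std::allocator<int> alloc;
  std::tuple<> t(std::allocator_arg, alloc);
  std::tuple<> u(std::allocator_arg, alloc, t);
  EmptyLike e(std::allocator_arg, alloc);
  EmptyLike f(std::allocator_arg, alloc, e);
  (void)u;
  (void)f;
  return std::tuple_size<std::tuple<>>::value == 0;
}
```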



[PATCH][AArch64] Model CSEL instruction in Cortex-A57 scheduling model

2016-06-06 Thread Kyrill Tkachov

Hi all,

This small patch adds handling of the CSEL-type instructions to the Cortex-A57
scheduling model.
They are treated the same as simple ALU instructions.

With this patch I didn't see any overall differences in SPEC2006.

Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-linux-gnu.

Ok for trunk?

The patch is very simple and the csel value isn't used in any arm instructions
so I think just an aarch64 approval for this should be enough.

Thanks,
Kyrill

2016-06-06  Kyrylo Tkachov  

* config/arm/cortex-a57.md (cortex_a57_alu):
Handle csel type.
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index 37912db464315a0d70835b81991e8e07a4d9db89..c8cf80f4ba7ed99b46c920c2d0ad3299050ec473 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -297,7 +297,7 @@ (define_insn_reservation "cortex_a57_alu" 2
(eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,\
 			alu_sreg,alus_sreg,logic_reg,logics_reg,\
 			adc_imm,adcs_imm,adc_reg,adcs_reg,\
-			adr,bfm,clz,rbit,rev,alu_dsp_reg,\
+			adr,bfm,clz,csel,rbit,rev,alu_dsp_reg,\
 			rotate_imm,shift_imm,shift_reg,\
 			mov_imm,mov_reg,\
 			mvn_imm,mvn_reg,\


[PATCH GCC] Remove duplicated alias check in vectorizer

2016-06-06 Thread Bin Cheng
Hi,
GCC now generates duplicated alias checks in the vectorizer when versioning
loops.  In the current implementation, DR_OFFSET and DR_INIT are added together
too early when creating structure dr_with_seg_len.  This has two disadvantages:
A) structure dr_with_seg_len_pair_t is only canonicalized against
DR_BASE_ADDRESS in function vect_prune_runtime_alias_test_list, while it should
be canonicalized against DR_OFFSET too; B) when function
vect_prune_runtime_alias_test_list tries to merge alias checks with consecutive
memory references, it can only handle DRs with constant DR_OFFSET + DR_INIT, as
in the code below:
  /* We consider the case that DR_B1 and DR_B2 are same memrefs,
 and DR_A1 and DR_A2 are two consecutive memrefs.  */
  //... ...
  if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
DR_BASE_ADDRESS (dr_a2->dr),
0)
  || !tree_fits_shwi_p (dr_a1->offset)
  || !tree_fits_shwi_p (dr_a2->offset))
continue;

Both disadvantages result in duplicated/unnecessary alias checks, as well as a
bloated condition basic block for loop versioning.
This patch fixes the issue.  Bootstrapped and tested on x86_64 and AArch64.
Is it OK?
Test gfortran.dg/vect/vect-8.f90 fails now.  It scans for "vectorized 20
loops" but with this patch more than 20 loops are vectorized.  Without this
patch the additional loop wasn't vectorized because the number of alias checks
exceeded the parameter bound "vect-max-version-for-alias-checks".

There are other issues in vectorizer alias checking; I will tackle them in
follow-up patches.
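
To illustrate point A, here is a stand-alone sketch of pruning duplicate check pairs once the ordering key also looks at offset and init, kept as separate fields. It is a simplified model, not the vectorizer code: RefSketch, CheckPair and prune_checks are hypothetical, and string/long fields stand in for trees.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for dr_with_seg_len.
struct RefSketch {
  std::string base;  // stands in for DR_BASE_ADDRESS
  long offset;       // stands in for DR_OFFSET
  long init;         // stands in for DR_INIT
  long seg_len;
};

// Offset and init are compared separately instead of as a precomputed sum.
inline bool operator==(const RefSketch &x, const RefSketch &y) {
  return std::tie(x.base, x.offset, x.init, x.seg_len) ==
         std::tie(y.base, y.offset, y.init, y.seg_len);
}

inline bool operator<(const RefSketch &x, const RefSketch &y) {
  return std::tie(x.base, x.offset, x.init, x.seg_len) <
         std::tie(y.base, y.offset, y.init, y.seg_len);
}

using CheckPair = std::pair<RefSketch, RefSketch>;

// Canonicalize each pair, sort, then drop exact duplicates.
inline std::vector<CheckPair> prune_checks(std::vector<CheckPair> checks) {
  for (CheckPair &p : checks)
    if (p.second < p.first)
      std::swap(p.first, p.second);  // canonical order within the pair
  std::sort(checks.begin(), checks.end());
  checks.erase(std::unique(checks.begin(), checks.end()), checks.end());
  return checks;
}
```

With a key that only looked at the base address, the pairs (a, b) and (b, a) over the same base would not be put into a common order, so they would survive as two separate checks.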

Thanks,
bin

2016-06-03  Bin Cheng  

* tree-vectorizer.h (struct dr_with_seg_len): Remove class
member OFFSET.
* tree-vect-data-refs.c (operator ==): Handle DR_OFFSET directly,
rather than OFFSET.
(comp_dr_with_seg_len_pair, comp_dr_with_seg_len_pair): Ditto.
(vect_create_cond_for_alias_checks): Ditto.
(vect_prune_runtime_alias_test_list): Also canonicalize pairs
against DR_OFFSET.  Handle DR_OFFSET directly when prune alias
checks.

gcc/testsuite/ChangeLog
2016-06-03  Bin Cheng  

* gcc.dg/vect/vect-alias-check-1.c: New test.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 36d302a..ba4d637 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2845,7 +2845,8 @@ operator == (const dr_with_seg_len& d1,
 {
   return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
  DR_BASE_ADDRESS (d2.dr), 0)
-  && compare_tree (d1.offset, d2.offset) == 0
+  && compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
+  && compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
   && compare_tree (d1.seg_len, d2.seg_len) == 0;
 }
 
@@ -2855,15 +2856,12 @@ operator == (const dr_with_seg_len& d1,
so that we can combine aliasing checks in one scan.  */
 
 static int
-comp_dr_with_seg_len_pair (const void *p1_, const void *p2_)
+comp_dr_with_seg_len_pair (const void *pa_, const void *pb_)
 {
-  const dr_with_seg_len_pair_t* p1 = (const dr_with_seg_len_pair_t *) p1_;
-  const dr_with_seg_len_pair_t* p2 = (const dr_with_seg_len_pair_t *) p2_;
-
-  const dr_with_seg_len &p11 = p1->first,
-   &p12 = p1->second,
-   &p21 = p2->first,
-   &p22 = p2->second;
+  const dr_with_seg_len_pair_t* pa = (const dr_with_seg_len_pair_t *) pa_;
+  const dr_with_seg_len_pair_t* pb = (const dr_with_seg_len_pair_t *) pb_;
+  const dr_with_seg_len &a1 = pa->first, &a2 = pa->second;
+  const dr_with_seg_len &b1 = pb->first, &b2 = pb->second;
 
   /* For DR pairs (a, b) and (c, d), we only consider to merge the alias checks
  if a and c have the same basic address snd step, and b and d have the same
@@ -2871,19 +2869,23 @@ comp_dr_with_seg_len_pair (const void *p1_, const void *p2_)
  and step, we don't care the order of those two pairs after sorting.  */
   int comp_res;
 
-  if ((comp_res = compare_tree (DR_BASE_ADDRESS (p11.dr),
-   DR_BASE_ADDRESS (p21.dr))) != 0)
+  if ((comp_res = compare_tree (DR_BASE_ADDRESS (a1.dr),
+   DR_BASE_ADDRESS (b1.dr))) != 0)
+return comp_res;
+  if ((comp_res = compare_tree (DR_BASE_ADDRESS (a2.dr),
+   DR_BASE_ADDRESS (b2.dr))) != 0)
 return comp_res;
-  if ((comp_res = compare_tree (DR_BASE_ADDRESS (p12.dr),
-   DR_BASE_ADDRESS (p22.dr))) != 0)
+  if ((comp_res = compare_tree (DR_STEP (a1.dr), DR_STEP (b1.dr))) != 0)
 return comp_res;
-  if ((comp_res = compare_tree (DR_STEP (p11.dr), DR_STEP (p21.dr))) != 0)
+  if ((comp_res = compare_tree (DR_STEP (a2.dr), DR_STEP (b2.dr))) != 0)
 return comp_res;
-  if ((comp_res = compare_tree (DR_STEP (p12.dr), DR_STEP (p22.dr))) != 0)
+  if ((comp_res = compare_tree (DR_OFFSET (a1.dr), DR_OFFSET (b1.dr)

[PATCH] libstdc++/71320 Add or remove file permissions correctly

2016-06-06 Thread Jonathan Wakely

This adds the missing functionality to filesystem::permissions().

PR libstdc++/71320
* src/filesystem/ops.cc (permissions(const path&, perms, error_code&)):
Add or remove permissions according to perms argument.
* testsuite/experimental/filesystem/operations/permissions.cc: New
test.

Tested x86_64-linux, committed to trunk, backports to follow.


commit bab71865ec13fff8dc212e6e538abe3cde7de6e8
Author: redi 
Date:   Mon Jun 6 15:50:01 2016 +

libstdc++/71320 Add or remove file permissions correctly

PR libstdc++/71320
* src/filesystem/ops.cc (permissions(const path&, perms, error_code&)):
Add or remove permissions according to perms argument.
* testsuite/experimental/filesystem/operations/permissions.cc: New
test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237136 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 5b82088..67ed8e6 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -1084,6 +1084,28 @@ fs::permissions(const path& p, perms prms)
 
 void fs::permissions(const path& p, perms prms, error_code& ec) noexcept
 {
+  const bool add = is_set(prms, perms::add_perms);
+  const bool remove = is_set(prms, perms::remove_perms);
+  if (add && remove)
+{
+  ec = std::make_error_code(std::errc::invalid_argument);
+  return;
+}
+
+  prms &= perms::mask;
+
+  if (add || remove)
+{
+  auto st = status(p, ec);
+  if (ec)
+   return;
+  auto curr = st.permissions();
+  if (add)
+   prms |= curr;
+  else
+   prms = curr & ~prms;
+}
+
 #if _GLIBCXX_USE_FCHMODAT
   if (::fchmodat(AT_FDCWD, p.c_str(), static_cast<mode_t>(prms), 0))
 #else
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/permissions.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/permissions.cc
new file mode 100644
index 000..e414860
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/permissions.cc
@@ -0,0 +1,51 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11 -lstdc++fs" }
+// { dg-require-filesystem-ts "" }
+
+// 15.26 Permissions [fs.op.permissions]
+
+#include <experimental/filesystem>
+#include <fstream>
+#include <testsuite_fs.h>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+  using perms = std::experimental::filesystem::perms;
+
+  auto p = __gnu_test::nonexistent_path();
+  std::ofstream{p.native()};
+  VERIFY( exists(p) );
+  permissions(p, perms::owner_all);
+  VERIFY( status(p).permissions() == perms::owner_all );
+  permissions(p, perms::group_read | perms::add_perms);
+  VERIFY( status(p).permissions() == (perms::owner_all | perms::group_read) );
+  permissions(p, perms::group_read | perms::remove_perms);
+  VERIFY( status(p).permissions() == perms::owner_all );
+
+  remove(p);
+}
+
+int
+main()
+{
+  test01();
+}
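
For readers who want the add/remove rule from the ops.cc hunk in isolation,
here is a minimal sketch in plain C.  It is not the libstdc++ code: the
DEMO_* constants and resolve_perms are hypothetical names chosen for
illustration, and only the bit arithmetic is modelled (no status() or
fchmodat() call).

```c
#include <errno.h>

/* Hypothetical flag values for illustration only; the real enumerators
   are members of std::experimental::filesystem::perms.  */
enum {
    DEMO_MASK         = 07777,    /* all permission bits */
    DEMO_ADD_PERMS    = 0x10000,  /* request: add bits to the current mode */
    DEMO_REMOVE_PERMS = 0x20000   /* request: clear bits from the current mode */
};

/* Compute the mode to install from the current mode and the request.
   Returns 0 on success, or EINVAL when both add and remove are requested.  */
static int resolve_perms(unsigned current, unsigned request, unsigned *out)
{
    int add = (request & DEMO_ADD_PERMS) != 0;
    int del = (request & DEMO_REMOVE_PERMS) != 0;

    if (add && del)
        return EINVAL;

    request &= DEMO_MASK;              /* strip the action flags */
    if (add)
        request |= current;            /* keep existing bits, set new ones */
    else if (del)
        request = current & ~request;  /* keep existing bits, clear some */
    /* Neither flag set: the request replaces the current mode outright.  */

    *out = request;
    return 0;
}
```

Feeding it the same sequence as test01 (0700, then add 0040, then remove
0040) gives 0740 and then 0700 again.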


Re: C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-06-06 Thread Joseph Myers
On Mon, 6 Jun 2016, Marek Polacek wrote:

> > I don't see how this test is supposed to verify properties of the 
> > composite type.  I'd expect you to need to verify that something does not 
> > get optimized away, that would get optimized away in the absence of 
> > may_alias.
> 
> Well, were it not for the may_alias attribute, we'd warn about type punning
> (hence the -O2), so I thought that this test would be enough.

In that case, the patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-06-06 Thread Nathan Sidwell


I applied this patch.  Aaron's patch, AFAICT, would repeatedly fopen the error 
file.

nathan
2016-06-05  Aaron Conole  
	Nathan Sidwell  

	PR libgcc/71400
	* libgcov-driver-system.c (__gcov_error_file): Disable if IN_GCOV_TOOL.
	(get_gcov_error_file): Check __gcov_error_file before trying to
	initialize it.
	(gcov_error): Always use get_gcov_error_file.

Index: libgcov-driver-system.c
===
--- libgcov-driver-system.c	(revision 237131)
+++ libgcov-driver-system.c	(working copy)
@@ -23,31 +23,32 @@ a copy of the GCC Runtime Library Except
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
+#if !IN_GCOV_TOOL
 /* Configured via the GCOV_ERROR_FILE environment variable;
it will either be stderr, or a file of the user's choosing.
Non-static to prevent multiple gcov-aware shared objects from
instantiating their own copies. */
 FILE *__gcov_error_file = NULL;
+#endif
 
 /* A utility function to populate the __gcov_error_file pointer.
This should NOT be called outside of the gcov system driver code. */
 
 static FILE *
-get_gcov_error_file(void)
+get_gcov_error_file (void)
 {
-#if !IN_GCOV_TOOL
+#if IN_GCOV_TOOL
   return stderr;
 #else
-  char *gcov_error_filename = getenv ("GCOV_ERROR_FILE");
-
-  if (gcov_error_filename)
+  if (!__gcov_error_file)
 {
-  FILE *openfile = fopen (gcov_error_filename, "a");
-  if (openfile)
-__gcov_error_file = openfile;
+  const char *gcov_error_filename = getenv ("GCOV_ERROR_FILE");
+
+  if (gcov_error_filename)
+	__gcov_error_file = fopen (gcov_error_filename, "a");
+  if (!__gcov_error_file)
+	__gcov_error_file = stderr;
 }
-  if (!__gcov_error_file)
-__gcov_error_file = stderr;
   return __gcov_error_file;
 #endif
 }
@@ -60,11 +61,8 @@ gcov_error (const char *fmt, ...)
   int ret;
   va_list argp;
 
-  if (!__gcov_error_file)
-__gcov_error_file = get_gcov_error_file ();
-
   va_start (argp, fmt);
-  ret = vfprintf (__gcov_error_file, fmt, argp);
+  ret = vfprintf (get_gcov_error_file (), fmt, argp);
   va_end (argp);
   return ret;
 }
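
The point about not reopening the file on every call can be shown with a
small standalone sketch.  This is not the libgcov code: demo_error_file,
get_demo_error_file and the DEMO_ERROR_FILE variable are hypothetical names
used only for illustration.  The stream is resolved once, cached, and every
later call returns the cached pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Cached stream; NULL means "not resolved yet".  */
static FILE *demo_error_file = NULL;

/* Resolve the error stream on first use and reuse it afterwards.
   DEMO_ERROR_FILE is a hypothetical environment variable for this sketch.  */
static FILE *get_demo_error_file(void)
{
    if (!demo_error_file) {
        const char *name = getenv("DEMO_ERROR_FILE");

        if (name)
            demo_error_file = fopen(name, "a");
        /* Fall back to stderr when the variable is unset or fopen failed.  */
        if (!demo_error_file)
            demo_error_file = stderr;
    }
    return demo_error_file;
}
```

Because the getenv/fopen pair sits inside the NULL check, a successful open
happens at most once per process, which is the behaviour the applied patch
is after.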


Re: C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-06-06 Thread Marek Polacek
On Mon, Jun 06, 2016 at 03:06:18PM +, Joseph Myers wrote:
> On Tue, 31 May 2016, Marek Polacek wrote:
> 
> > > diff --git gcc/testsuite/c-c++-common/attr-may-alias-1.c 
> > > gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > > index e69de29..978b9a5 100644
> > > --- gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > > +++ gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -Wall" } */
> > > +
> > > +typedef int T __attribute__((may_alias));
> > > +
> > > +extern T *p;
> > > +extern int *p;
> > > +
> > > +extern int *p2;
> > > +extern T *p2;
> > > +
> > > +void fn1 (T);
> > > +void fn1 (int);
> > > +
> > > +void fn2 (int);
> > > +void fn2 (T);
> > > +
> > > +/* Ensure that the composite types have may_alias.  */
> > > +void
> > > +f (long *i)
> > > +{
> > > +  *i = *(__typeof (*p) *) &p;
> > > +  asm ("" : : "r" (*p));
> > > +  *i = *(__typeof (*p2) *) &p2;
> > > +  asm ("" : : "r" (*p2));
> 
> I don't see how this test is supposed to verify properties of the 
> composite type.  I'd expect you to need to verify that something does not 
> get optimized away, that would get optimized away in the absence of 
> may_alias.

Well, were it not for the may_alias attribute, we'd warn about type punning
(hence the -O2), so I thought that this test would be enough.

Marek


Re: [PATCH 4/4] C: add fixit hint to misspelled field names

2016-06-06 Thread Joseph Myers
On Tue, 31 May 2016, David Malcolm wrote:

> Ping:
>   https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01834.html

OK.  What about field names in designated initializers (both C99-style and 
old-style)?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Support x86-64 TLS code sequences without PLT

2016-06-06 Thread Carlos O'Donell
On 06/03/2016 05:21 PM, H.J. Lu wrote:
> We can generate x86-64 TLS code sequences for general and local dynamic
> models without PLT, which uses indirect call via GOT:
> 
> call *__tls_get_addr@GOTPCREL(%rip)
> 
> instead of direct call:
> 
> call __tls_get_addr[@PLT]

What are the actual pros and cons of this change?

Does this improve security? Performance?

The __tls_get_addr symbol, on x86_64, lives in ld.so, which generally
means that all shared objects (GD usage) indirect through their PLT/GOT
to make the call. In this model, and because of lazy linking, the
PLT-related GOT entries are left read-write to be updated after resolution
(ignore the BIND_NOW + RELRO case since in that case we do all of this
up front).

After your change, without a PLT entry, can these symbols no longer be
interposed? Would the static linker generate a binding (a GOT reloc for
the symbol, which is resolved by the dynamic loader) that cannot be changed
and becomes RO after RELRO?

Is the security benefit worth the loss of interposition for this symbol?

Are there any performance gains?

-- 
Cheers,
Carlos.


Re: C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-06-06 Thread Joseph Myers
On Tue, 31 May 2016, Marek Polacek wrote:

> > diff --git gcc/testsuite/c-c++-common/attr-may-alias-1.c 
> > gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > index e69de29..978b9a5 100644
> > --- gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > +++ gcc/testsuite/c-c++-common/attr-may-alias-1.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -Wall" } */
> > +
> > +typedef int T __attribute__((may_alias));
> > +
> > +extern T *p;
> > +extern int *p;
> > +
> > +extern int *p2;
> > +extern T *p2;
> > +
> > +void fn1 (T);
> > +void fn1 (int);
> > +
> > +void fn2 (int);
> > +void fn2 (T);
> > +
> > +/* Ensure that the composite types have may_alias.  */
> > +void
> > +f (long *i)
> > +{
> > +  *i = *(__typeof (*p) *) &p;
> > +  asm ("" : : "r" (*p));
> > +  *i = *(__typeof (*p2) *) &p2;
> > +  asm ("" : : "r" (*p2));

I don't see how this test is supposed to verify properties of the 
composite type.  I'd expect you to need to verify that something does not 
get optimized away, that would get optimized away in the absence of 
may_alias.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][1/3][ARM] Keep ctz expressions together until after reload

2016-06-06 Thread Ramana Radhakrishnan
On Thu, May 26, 2016 at 10:52 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> On arm we don't have a dedicated instruction that corresponds to a CTZ rtx
> but we synthesise it
> with an RBIT instruction followed by a CLZ. This is currently done at expand
> time.
> However, I'd like to push that step until after reload and keep the CTZ rtx
> as a single whole in
> the early RTL optimisers.  This better expresses the semantics of the
> operation as a whole, since
> the RBIT operation is represented as an UNSPEC anyway and so will not see
> the benefits of combine,
> and a CTZ-specific optimisation that is implemented in patch 3/3 of this
> series won't be triggered
> if the expression is broken up into an UNSPEC and a CLZ.
>
> Therefore this patch changes the expander to expand to a CTZ rtx and split
> it after reload into
> an RBIT + CLZ to allow sched2 to schedule them apart if it deems necessary.
> This patch enables the optimisation in patch 3/3 where the appropriate test
> is added.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-05-26  Kyrylo Tkachov  
>
> PR middle-end/37780
> * config/arm/arm.md (ctzsi2): Convert to define_insn_and_split.

OK.

Ramana


Re: [PATCH] Selftest framework (v7)

2016-06-06 Thread Bernd Schmidt

On 06/06/2016 04:17 PM, David Malcolm wrote:

> I'm testing a revised patch now, incorporating the above, and renaming
> s-selftests (plural) to s-selftest (singular) etc within
> gcc/Makefile.in as requested by Bernhard elsewhere in this thread.  I
> assume that change is OK?


Sure.


Bernd



Re: [PATCH][1/3][ARM] Keep ctz expressions together until after reload

2016-06-06 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02078.html

Patches 1 and 3 have been approved.
Thanks,
Kyrill

On 26/05/16 10:52, Kyrill Tkachov wrote:

Hi all,

On arm we don't have a dedicated instruction that corresponds to a CTZ rtx but 
we synthesise it
with an RBIT instruction followed by a CLZ. This is currently done at expand 
time.
However, I'd like to push that step until after reload and keep the CTZ rtx as 
a single whole in
the early RTL optimisers.  This better expresses the semantics of the operation 
as a whole, since
the RBIT operation is represented as an UNSPEC anyway and so will not see the 
benefits of combine,
and a CTZ-specific optimisation that is implemented in patch 3/3 of this series 
won't be triggered
if the expression is broken up into an UNSPEC and a CLZ.

Therefore this patch changes the expander to expand to a CTZ rtx and split it 
after reload into
an RBIT + CLZ to allow sched2 to schedule them apart if it deems necessary.
This patch enables the optimisation in patch 3/3 where the appropriate test is 
added.
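
As an illustration of the RBIT + CLZ synthesis described above, here is a
portable C model.  It is only a sketch: rbit32_model, clz32_model and
ctz32_model are hypothetical helper names, not anything from arm.md, and
the CLZ model is assumed to return 32 for a zero input.

```c
#include <stdint.h>

/* Model of RBIT: reverse the order of the 32 bits.  */
static uint32_t rbit32_model(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Model of CLZ, assumed to return 32 when the input is zero.  */
static int clz32_model(uint32_t x)
{
    int n = 32;

    while (x != 0) {
        x >>= 1;
        n--;
    }
    return n;
}

/* CTZ synthesised as CLZ of the bit-reversed input.  */
static int ctz32_model(uint32_t x)
{
    return clz32_model(rbit32_model(x));
}
```

Reversing the bits turns trailing zeros into leading zeros, so counting
leading zeros of the reversed value gives the trailing-zero count.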

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* config/arm/arm.md (ctzsi2): Convert to define_insn_and_split.




Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-06 Thread Bernd Schmidt

On 05/26/2016 11:53 AM, Kyrill Tkachov wrote:


2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* ifcvt.c (noce_try_ifelse_collapse): New function.
Declare prototype.
(noce_process_if_block): Call noce_try_ifelse_collapse.
* simplify-rtx.c (simplify_cond_clz_ctz): New function.
(simplify_ternary_operation): Use the above to simplify
conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* gcc.c-torture/execute/pr37780.c: New test.
* gcc.target/aarch64/pr37780_1.c: Likewise.
* gcc.target/arm/pr37780_1.c: Likewise.


Nice. Ok.


Bernd


Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-06 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02080.html

Thanks,
Kyrill

On 26/05/16 10:53, Kyrill Tkachov wrote:

Hi all,

In this PR we want to optimise:
int foo (int i)
{
  return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
The obvious place to implement this would be in combine through simplify-rtx 
where we'd
recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
 (if_then_else:SI (ne (reg:SI r2)
  (const_int 0 [0]))
   (clz:SI (reg:SI r2))
   (const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it 
into
just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass to have created a conditional move 
of the
two branches that provide the IF_THEN_ELSE to propagate the const_int and clz 
operation into.

2) Combine will refuse to propagate r2 from the above example into both the 
condition and the
CLZ at the same time, so the most we see is:
(set (reg:SI r1)
 (if_then_else:SI (ne (reg:CC cc)
(const_int 0))
   (clz:SI (reg:SI r2))
   (const_int 32)))

which is not enough information to perform the simplification.

This patch implements the optimisation in ce1 using the noce ifcvt framework.
During ifcvt noce_process_if_block can see that we're trying to optimise 
something
of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all the 
information
needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that tries 
to put the
condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to 
simplify that
using the simplify-rtx machinery. That way, we can implement the simplification 
logic in
simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
  return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
clz w0, w0
ret

instead of:
foo:
mov w1, 32
clz w2, w0
cmp w0, 0
csel w0, w2, w1, ne
ret

and for arm similarly we generate:
foo:
clz r0, r0
bx  lr

instead of:
foo:
cmp r0, #0
clzne   r0, r0
moveq   r0, #32
bx  lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
xorl %eax, %eax
lzcntl  %edi, %eax
ret

instead of:
foo:
xorl %eax, %eax
movl $32, %edx
lzcntl  %edi, %eax
testl   %edi, %edi
cmove   %edx, %eax
ret
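
To make the equivalence behind these before/after listings concrete, here
is a small C model.  It is a sketch under one stated assumption: the
hypothetical helper clz32_at_zero_model stands in for a CLZ instruction
whose value at zero is 32, as on the targets above.  With that assumption
the guarded and unguarded forms agree for every input, which is what lets
the conditional be dropped.

```c
#include <stdint.h>

/* Model of a CLZ whose result at zero is defined to be 32.  */
static int clz32_at_zero_model(uint32_t x)
{
    int n = 32;

    while (x != 0) {
        x >>= 1;
        n--;
    }
    return n;
}

/* The source form: an explicit guard for zero.  */
static int foo_guarded(uint32_t i)
{
    return (i == 0) ? 32 : clz32_at_zero_model(i);
}

/* The simplified form: the guard is redundant under the assumption.  */
static int foo_collapsed(uint32_t i)
{
    return clz32_at_zero_model(i);
}
```

If the constant in the guard were anything other than the value the
instruction produces at zero, the two functions would differ at i == 0 and
the simplification would not be valid.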


I tried getting this to work on other targets as well, but encountered 
difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [  ])
(const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
 (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [  ]) 0)
(clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
 (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
(nil)))

So the setup code in noce_process_if_block sees that the set destination is not 
the same
((reg:DI 156 [  ]) and (subreg/s/u:SI (reg:DI 156 [  ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in 
early RTL
on powerpc, I don't know what to do there. Perhaps that part of ifcvt can be 
taught to handle
destinations that are subregs of one another, but that would be a separate 
patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, 
x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* ifcvt.c (noce_try_ifelse_collapse): New function.
Declare prototype.
(noce_process_if_block): Call noce_try_ifelse_collapse.
* simplify-rtx.c (simplify_cond_clz_ctz): New function.
(simplify_ternary_operation): Use the above to simplify
conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  

PR middle-end/37780
* gcc.c-torture/execute/pr37780.c: New test.
* gcc.target/aarch64/pr37780_1.c: Likewise.
* gcc.target/arm/pr37780_1.c: Likewise.




Re: [PATCH] Selftest framework (v7)

2016-06-06 Thread David Malcolm
On Sun, 2016-06-05 at 13:37 +0200, Bernd Schmidt wrote:
> On 06/03/2016 09:12 PM, David Malcolm wrote:
> > It's not clear to me if these approvals still hold.
> 
> I was willing to go with it; I had a look through some of these
> patches 
> and didn't spot anything untoward. To make it clear, this patch is
> OK, 
> with one tweak if possible: extend the namespace selftest to cover
> the 
> various helper functions (some of these have names like from_int
> which 
> ideally we wouldn't leak into the rest of the compiler). 

I believe that apart from the from_int specializations, everything else
is marked with "static" (we can't mark template specializations with
"static").


> As far as I can 
> tell this just involves moving the start of namespace selftest
> upwards a 
> bit in the files where we have tests.

Yes, and it does seem cleaner to have all of the selftest code start
like this:

  #if CHECKING_P

  namespace selftest {

I'll make that change, apart from in diagnostic-show-locus.c, where the test 
functions are already within an anonymous namespace (the one containing the 
implementation).

> A few other minor things...
> 
> > +  tree bind_expr =
> > +build3 (BIND_EXPR, void_type_node, NULL, stmt_list, block);
> 
> Operators go at the start of the line.

Fixed.

> > +  tree fn_type = build_function_type_array (integer_type_node, /*
> > return_type */
> 
> The line is too long, and we don't do /* arg name */ anyway.

Fixed.


> > +static void
> > +assert_loceq (const char *exp_filename,
> > + int exp_linenum,
> > + int exp_colnum,
> > + location_t loc)
> 
> > +static layout_range
> > +make_range (int start_line, int start_col,
> > +   int end_line, int end_col)
> 
> These lines are too short :) Could save some vertical space here.

Fixed.


> For the future - I found the single merged patch easier to deal with 
> than the 16- or 21-patch series. Split ups are often good when
> modifying 
> the same code in multiple logically independent steps (keeping in
> mind 
> that bugfixes to newly added code shouldn't be split out either).
> This 
> is a different situation where the patches weren't truly independent,
> and the merged patch is essentially just a concatenation, so
> splitting 
> it up does not really make the review any easier (potentially harder
> if 
> you have to switch between mails rather than just hitting PgUp/Dn.

OK.  Sorry about that.

I'm testing a revised patch now, incorporating the above, and renaming
s-selftests (plural) to s-selftest (singular) etc within
gcc/Makefile.in as requested by Bernhard elsewhere in this thread.  I
assume that change is OK?

Thanks

Dave


[v2][AArch64, 5/6] Reimplement fabd intrinsics & merge rtl patterns

2016-06-06 Thread Jiong Wang
These intrinsics were implemented before the "fabd<mode>_3" pattern was
introduced.  Meanwhile, the patterns "fabd<mode>_3" and "*fabd_scalar<mode>3"
can be merged into a single "fabd<mode>3" using VALLF.

This patch migrates the implementation to builtins backed by this pattern.

gcc/
2016-06-01  Jiong Wang 

* config/aarch64/aarch64-builtins.def (fabd): New builtins for modes
VALLF.
* config/aarch64/aarch64-simd.md (fabd<mode>_3): Extend modes from VDQF
to VALLF.  Rename to "fabd<mode>3".
(*fabd_scalar<mode>3): Delete.
* config/aarch64/arm_neon.h (vabds_f32): Remove inline assembly.
Use builtin.
(vabdd_f64): Likewise.
(vabd_f32): Likewise.
(vabd_f64): Likewise.
(vabdq_f32): Likewise.
(vabdq_f64): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 1955d171d727e8995795d343ea766f130be0985e..deab3450ab74fcd6dfcf8267fa9cedfc1423ca4e 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -465,3 +465,6 @@
 
   /* Implemented by aarch64_rsqrts.  */
   BUILTIN_VALLF (BINOP, rsqrts, 0)
+
+  /* Implemented by fabd3.  */
+  BUILTIN_VALLF (BINOP, fabd, 3)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 78a87b1fb52b5b5e21ef5cd7dbe090c863369775..ad8b9c1d0c155d022be2e7e7c426120b551f3f2b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -474,23 +474,14 @@
   [(set_attr "type" "neon_arith_acc")]
 )
 
-(define_insn "fabd_3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(abs:VDQF (minus:VDQF
-		   (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w"]
-  "TARGET_SIMD"
-  "fabd\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_abd_")]
-)
-
-(define_insn "*fabd_scalar3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-(abs:GPF (minus:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w"]
+(define_insn "fabd3"
+  [(set (match_operand:VALLF 0 "register_operand" "=w")
+	(abs:VALLF
+	  (minus:VALLF
+	(match_operand:VALLF 1 "register_operand" "w")
+	(match_operand:VALLF 2 "register_operand" "w"]
   "TARGET_SIMD"
-  "fabd\t%0, %1, %2"
+  "fabd\t%0, %1, %2"
   [(set_attr "type" "neon_fp_abd_")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 2177703180ca50acedd64d613e4e665264371fb2..9e966e47789646ed968a081c1fc4cb76b45537af 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5440,17 +5440,6 @@ vabaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vabd_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fabd %0.2s, %1.2s, %2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vabd_s8 (int8x8_t a, int8x8_t b)
 {
@@ -5517,17 +5506,6 @@ vabd_u32 (uint32x2_t a, uint32x2_t b)
   return result;
 }
 
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vabdd_f64 (float64_t a, float64_t b)
-{
-  float64_t result;
-  __asm__ ("fabd %d0, %d1, %d2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabdl_high_s8 (int8x16_t a, int8x16_t b)
 {
@@ -5660,28 +5638,6 @@ vabdl_u32 (uint32x2_t a, uint32x2_t b)
   return result;
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vabdq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fabd %0.4s, %1.4s, %2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vabdq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fabd %0.2d, %1.2d, %2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vabdq_s8 (int8x16_t a, int8x16_t b)
 {
@@ -5748,17 +5704,6 @@ vabdq_u32 (uint32x4_t a, uint32x4_t b)
   return result;
 }
 
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vabds_f32 (float32_t a, float32_t b)
-{
-  float32_t result;
-  __asm__ ("fabd %s0, %s1, %s2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16_t __attribute__ ((__always_inline__))
 vaddlv_s8 (int8x8_t a)
 {
@@ -10235,6 +10180,45 @@ vtbx2_p8 (poly8x8_t r, poly8x8x2_t 

[v2][AArch64, 6/6] Reimplement vpadd intrinsics & extend rtl patterns to all modes

2016-06-06 Thread Jiong Wang

These intrinsics were implemented by inline assembly using the "faddp"
instruction.  There was a pattern "aarch64_addpv4sf" which supports V4SF mode
only; we can extend this pattern to support VDQF modes and then reimplement
these intrinsics through builtins.
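
As a reading aid for the reduc_plus_scal_v4sf change in the diff below,
here is a scalar C model of the two-operand pairwise add.  It is only a
sketch: faddp_v4sf and reduc_plus_v4sf are hypothetical names, lanes are
plain array elements, and endian lane numbering is ignored.

```c
/* Model of the vector FADDP on four float lanes: adjacent pairs of the
   concatenation a:b are summed.  A temporary is used so that the output
   may alias either input.  */
static void faddp_v4sf(float out[4], const float a[4], const float b[4])
{
    float r[4];

    r[0] = a[0] + a[1];
    r[1] = a[2] + a[3];
    r[2] = b[0] + b[1];
    r[3] = b[2] + b[3];
    for (int i = 0; i < 4; i++)
        out[i] = r[i];
}

/* Sum of all four lanes using two pairwise adds with the same vector
   passed as both operands, then reading lane 0.  */
static float reduc_plus_v4sf(const float v[4])
{
    float scratch[4];

    faddp_v4sf(scratch, v, v);
    faddp_v4sf(scratch, scratch, scratch);
    return scratch[0];
}
```

Passing the same vector twice reproduces what the old single-operand
pattern emitted ("faddp %0.4s, %1.4s, %1.4s"), so the reduction result is
unchanged by the move to the two-operand pattern.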

gcc/
2016-06-06  Jiong Wang

* config/aarch64/aarch64-builtins.def (faddp): New builtins for modes 
in VDQF.
* config/aarch64/aarch64-simd.md (aarch64_faddp): New.
(aarch64_addpv4sf): Delete.
(reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
"gen_aarch64_addpv4sf".
* config/aarch64/arm_neon.h (vpadd_f32): Remove inline assembly.  Use
builtin.
(vpadds_f32): Likewise.
(vpaddq_f32): Likewise.
(vpaddq_f64): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index deab3450ab74fcd6dfcf8267fa9cedfc1423ca4e..1348e7c198763b24d092f774a0ff25e4d0fd1787 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -468,3 +468,6 @@
 
   /* Implemented by fabd3.  */
   BUILTIN_VALLF (BINOP, fabd, 3)
+
+  /* Implemented by aarch64_faddp.  */
+  BUILTIN_VDQF (BINOP, faddp, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ad8b9c1d0c155d022be2e7e7c426120b551f3f2b..f8d3e766a53736a4b87ba016caccd085eb793bda 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1992,6 +1992,16 @@
   }
 )
 
+(define_insn "aarch64_faddp"
+ [(set (match_operand:VDQF 0 "register_operand" "=w")
+   (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+		 (match_operand:VDQF 2 "register_operand" "w")]
+		 UNSPEC_FADDV))]
+ "TARGET_SIMD"
+ "faddp\t%0., %1., %2."
+  [(set_attr "type" "neon_fp_reduc_add_")]
+)
+
 (define_insn "aarch64_reduc_plus_internal"
  [(set (match_operand:VDQV 0 "register_operand" "=w")
(unspec:VDQV [(match_operand:VDQV 1 "register_operand" "w")]
@@ -2019,15 +2029,6 @@
   [(set_attr "type" "neon_fp_reduc_add_")]
 )
 
-(define_insn "aarch64_addpv4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=w")
-   (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "w")]
-		UNSPEC_FADDV))]
- "TARGET_SIMD"
- "faddp\\t%0.4s, %1.4s, %1.4s"
-  [(set_attr "type" "neon_fp_reduc_add_s_q")]
-)
-
 (define_expand "reduc_plus_scal_v4sf"
  [(set (match_operand:SF 0 "register_operand")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand")]
@@ -2036,8 +2037,8 @@
 {
   rtx elt = GEN_INT (ENDIAN_LANE_N (V4SFmode, 0));
   rtx scratch = gen_reg_rtx (V4SFmode);
-  emit_insn (gen_aarch64_addpv4sf (scratch, operands[1]));
-  emit_insn (gen_aarch64_addpv4sf (scratch, scratch));
+  emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
+  emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
   emit_insn (gen_aarch64_get_lanev4sf (operands[0], scratch, elt));
   DONE;
 })
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9e966e47789646ed968a081c1fc4cb76b45537af..13a4ab80cf7b0470d8ec8b07e0ed1988f8f4e66d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -8225,17 +8225,6 @@ vpadalq_u32 (uint64x2_t a, uint32x4_t b)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpadd_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("faddp %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpaddl_s8 (int8x8_t a)
 {
@@ -8368,28 +8357,6 @@ vpaddlq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpaddq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("faddp %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpaddq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("faddp %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vpaddq_s8 (int8x16_t a, int8x16_t b)
 {
@@ -8478,17 +8445,6 @@ vpaddq_u64 (uint64x2_t a, uint64x2_t b)
   return result;
 }
 
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vpadds_f32 (float32x2_t a)
-{
-  float32_t result;
-  __asm__ ("faddp %s0,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vqdmulh_n_s16 (int16x4_t a, int16_t b)
 {
@@ -18625,6 +18581,24

[v2][AArch64, 4/6] Reimplement frsqrts intrinsics

2016-06-06 Thread Jiong Wang

As with [3/6], these intrinsics were implemented before the instruction
pattern "aarch64_rsqrts" was added, so they were implemented through
inline assembly.

This patch migrates the implementation to builtins.
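
For readers unfamiliar with what the step instruction is for, here is a
minimal scalar sketch in portable C.  It assumes the architectural
definition of FRSQRTS, (3 - a * b) / 2, and shows how that step refines an
estimate of 1/sqrt(x) by Newton-Raphson.  The names model_rsqrts and
refine_rsqrt are hypothetical and are not part of arm_neon.h or of this
patch; real code would call vrsqrte*/vrsqrts* instead.

```c
/* Scalar model of the value FRSQRTS produces: (3 - a * b) / 2.
   Hypothetical helper for illustration, not a GCC builtin.  */
static float
model_rsqrts (float a, float b)
{
  return (3.0f - a * b) / 2.0f;
}

/* Refine EST, an estimate of 1/sqrt(X), with STEPS Newton-Raphson
   iterations: est' = est * (3 - x * est * est) / 2.  */
static float
refine_rsqrt (float x, float est, int steps)
{
  for (int i = 0; i < steps; i++)
    est = est * model_rsqrts (x * est, est);
  return est;
}
```

Starting from a rough estimate such as 0.4 for x = 4, a few steps converge
towards 0.5.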

gcc/
2016-06-06  Jiong Wang

* config/aarch64/aarch64-simd-builtins.def (rsqrts): New builtins for modes
VALLF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts_3): Rename to
"aarch64_rsqrts".
* config/aarch64/aarch64.c (get_rsqrts_type): Update gen* name.
* config/aarch64/arm_neon.h (vrsqrtss_f32): Remove inline assembly.  Use
builtin.
(vrsqrtsd_f64): Likewise.
(vrsqrts_f32): Likewise.
(vrsqrts_f64): Likewise.
(vrsqrtsq_f32): Likewise.
(vrsqrtsq_f64): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 32bcd06ec6e483c53b01caf1e30305e0b2b3fb21..1955d171d727e8995795d343ea766f130be0985e 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -462,3 +462,6 @@
 
   /* Implemented by aarch64_rsqrte.  */
   BUILTIN_VALLF (UNOP, rsqrte, 0)
+
+  /* Implemented by aarch64_rsqrts.  */
+  BUILTIN_VALLF (BINOP, rsqrts, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 568dd20ad3436e4aa4c3e7cf6b6f766b7fc127db..78a87b1fb52b5b5e21ef5cd7dbe090c863369775 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -390,7 +390,7 @@
   "frsqrte\\t%0, %1"
   [(set_attr "type" "neon_fp_rsqrte_")])
 
-(define_insn "aarch64_rsqrts_3"
+(define_insn "aarch64_rsqrts"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
 	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
 	   (match_operand:VALLF 2 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acfb39dc025d74fe531d439bb87c52d18955ee7c..b60e5c52df6310a87635c523d723eee9768d7aef 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7368,11 +7368,11 @@ get_rsqrts_type (machine_mode mode)
 {
   switch (mode)
   {
-case DFmode:   return gen_aarch64_rsqrts_df3;
-case SFmode:   return gen_aarch64_rsqrts_sf3;
-case V2DFmode: return gen_aarch64_rsqrts_v2df3;
-case V2SFmode: return gen_aarch64_rsqrts_v2sf3;
-case V4SFmode: return gen_aarch64_rsqrts_v4sf3;
+case DFmode:   return gen_aarch64_rsqrtsdf;
+case SFmode:   return gen_aarch64_rsqrtssf;
+case V2DFmode: return gen_aarch64_rsqrtsv2df;
+case V2SFmode: return gen_aarch64_rsqrtsv2sf;
+case V4SFmode: return gen_aarch64_rsqrtsv4sf;
 default: gcc_unreachable ();
   }
 }
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 49d572ff8e5007ad07672568ed4dccbea4e0e139..2177703180ca50acedd64d613e4e665264371fb2 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -9185,61 +9185,6 @@ vrsqrteq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vrsqrts_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("frsqrts %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vrsqrtsd_f64 (float64_t a, float64_t b)
-{
-  float64_t result;
-  __asm__ ("frsqrts %d0,%d1,%d2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vrsqrtsq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("frsqrts %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vrsqrtsq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("frsqrts %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vrsqrtss_f32 (float32_t a, float32_t b)
-{
-  float32_t result;
-  __asm__ ("frsqrts %s0,%s1,%s2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 #define vshrn_high_n_s16(a, b, c)   \
   __extension__ \
 ({  \
@@ -21476,6 +21421,45 @@ vrsqrteq_f64 (float64x2_t __a)
   return __builtin_aarch64_rsqrtev2df (__a);
 }
 
+/* vrsqrts.  */
+
+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+vrsqrtss_f32 (float32_t __a, float32_t _

[v2][AArch64, 3/6] Reimplement frsqrte intrinsics

2016-06-06 Thread Jiong Wang

These intrinsics were implemented before the instruction pattern
"aarch64_rsqrte" was added, so they were implemented through
inline assembly.

This patch migrates the implementation to builtins.

gcc/
2016-06-06  Jiong Wang

* config/aarch64/aarch64-simd-builtins.def (rsqrte): New builtins for modes
VALLF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte_2): Rename to
"aarch64_rsqrte".
* config/aarch64/aarch64.c (get_rsqrte_type): Update gen* name.
* config/aarch64/arm_neon.h (vrsqrtes_f32): Remove inline assembly.  Use
builtin.
(vrsqrted_f64): Likewise.
(vrsqrte_f32): Likewise.
(vrsqrte_f64): Likewise.
(vrsqrteq_f32): Likewise.
(vrsqrteq_f64): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 25a5270766401bd2f31ccacdafee83c183bdf775..f60f84c42fefd32bace6f4aa690f97ca54f3e4b6 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -451,3 +451,6 @@
   BUILTIN_VALLI (BINOP_SUS, ucvtf, 3)
   BUILTIN_VALLF (BINOP, fcvtzs, 3)
   BUILTIN_VALLF (BINOP_USS, fcvtzu, 3)
+
+  /* Implemented by aarch64_rsqrte.  */
+  BUILTIN_VALLF (UNOP, rsqrte, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ca90b666a7e3888057b7d9e8562a2544a006cf0f..941214680262ef1015cbb23f518b4999f962bf9b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -382,7 +382,7 @@
   [(set_attr "type" "neon_mul__scalar")]
 )
 
-(define_insn "aarch64_rsqrte_2"
+(define_insn "aarch64_rsqrte"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
 	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
 		 UNSPEC_RSQRTE))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ad07fe196a814ace78d43f66e70280d20a4476b5..acfb39dc025d74fe531d439bb87c52d18955ee7c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7349,11 +7349,11 @@ get_rsqrte_type (machine_mode mode)
 {
   switch (mode)
   {
-case DFmode:   return gen_aarch64_rsqrte_df2;
-case SFmode:   return gen_aarch64_rsqrte_sf2;
-case V2DFmode: return gen_aarch64_rsqrte_v2df2;
-case V2SFmode: return gen_aarch64_rsqrte_v2sf2;
-case V4SFmode: return gen_aarch64_rsqrte_v4sf2;
+case DFmode:   return gen_aarch64_rsqrtedf;
+case SFmode:   return gen_aarch64_rsqrtesf;
+case V2DFmode: return gen_aarch64_rsqrtev2df;
+case V2SFmode: return gen_aarch64_rsqrtev2sf;
+case V4SFmode: return gen_aarch64_rsqrtev4sf;
 default: gcc_unreachable ();
   }
 }
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 04bce9ab80c151877619ee75e7cb50f5951099f7..e4f7a66abcc59f306de289d22e9d09cfe32c0c87 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -9163,28 +9163,6 @@ vqrdmulhq_n_s32 (int32x4_t a, int32_t b)
result;  \
  })
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vrsqrte_f32 (float32x2_t a)
-{
-  float32x2_t result;
-  __asm__ ("frsqrte %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-vrsqrte_f64 (float64x1_t a)
-{
-  float64x1_t result;
-  __asm__ ("frsqrte %d0,%d1"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrsqrte_u32 (uint32x2_t a)
 {
@@ -9196,39 +9174,6 @@ vrsqrte_u32 (uint32x2_t a)
   return result;
 }
 
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vrsqrted_f64 (float64_t a)
-{
-  float64_t result;
-  __asm__ ("frsqrte %d0,%d1"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vrsqrteq_f32 (float32x4_t a)
-{
-  float32x4_t result;
-  __asm__ ("frsqrte %0.4s,%1.4s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vrsqrteq_f64 (float64x2_t a)
-{
-  float64x2_t result;
-  __asm__ ("frsqrte %0.2d,%1.2d"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrsqrteq_u32 (uint32x4_t a)
 {
@@ -9240,17 +9185,6 @@ vrsqrteq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vrsqrtes_f32 (float32_t a)
-{
-  float32_t result;
-  __asm__ ("frsqrte %s0,%s1"
-   : "=w"(result)
- 

[v2][AArch64, 2/6] Reimplement vector fixed-point intrinsics

2016-06-06 Thread Jiong Wang

Based on top of [1/6], this patch reimplements the vector intrinsics for
conversion between floating-point and fixed-point.

gcc/
2016-06-06  Jiong Wang

* config/aarch64/aarch64-simd-builtins.def (scvtf): Register vector modes.
(ucvtf): Likewise.
(fcvtzs): Likewise.
(fcvtzu): Likewise.
* config/aarch64/aarch64-simd.md
(3): New.
(3): Likewise.
* config/aarch64/arm_neon.h (vcvt_n_f32_s32): Remove inline assembly.
Use builtin.
(vcvt_n_f32_u32): Likewise.
(vcvt_n_s32_f32): Likewise.
(vcvt_n_u32_f32): Likewise.
(vcvtq_n_f32_s32): Likewise.
(vcvtq_n_f32_u32): Likewise.
(vcvtq_n_f64_s64): Likewise.
(vcvtq_n_f64_u64): Likewise.
(vcvtq_n_s32_f32): Likewise.
(vcvtq_n_s64_f64): Likewise.
(vcvtq_n_u32_f32): Likewise.
(vcvtq_n_u64_f64): Likewise.
* config/aarch64/iterators.md (VDQ_SDI): New mode iterator.
(VSDQ_SDI): Likewise.
(fcvt_target): Support V4DI, V4SI and V2SI.
(FCVT_TARGET): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 0b2f0631c740558c62cffe5715eaffa5ad0557a9..a7ea3c4b8ea7d695b12e6b0291e6ff815826a641 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -447,7 +447,7 @@
   BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
 
   /* Implemented by <*><*>3.  */
-  BUILTIN_GPI (BINOP, scvtf, 3)
-  BUILTIN_GPI (BINOP_SUS, ucvtf, 3)
-  BUILTIN_GPF (BINOP, fcvtzs, 3)
-  BUILTIN_GPF (BINOP_USS, fcvtzu, 3)
+  BUILTIN_VSDQ_SDI (BINOP, scvtf, 3)
+  BUILTIN_VSDQ_SDI (BINOP_SUS, ucvtf, 3)
+  BUILTIN_VALLF (BINOP, fcvtzs, 3)
+  BUILTIN_VALLF (BINOP_USS, fcvtzu, 3)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..d2a6cc27de9c571e84cf59713e5fcb9c450f83a3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1778,6 +1778,28 @@
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+;; Convert between fixed-point and floating-point (vector modes)
+
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VDQF 1 "register_operand" "w")
+(match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_F2FIXED))]
+  "TARGET_SIMD"
+  "\t%0, %1, #%2"
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VDQ_SDI 1 "register_operand" "w")
+   (match_operand:SI 2 "immediate_operand" "i")]
+	 FCVT_FIXED2F))]
+  "TARGET_SIMD"
+  "\t%0, %1, #%2"
+  [(set_attr "type" "neon_int_to_fp_")]
+)
+
 ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
 ;; is inconsistent with vector ordering elsewhere in the compiler, in that
 ;; the meaning of HI and LO changes depending on the target endianness.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8a0fba6513e572ede9f2e4aaf8d29baf6baf683d..04bce9ab80c151877619ee75e7cb50f5951099f7 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -6025,150 +6025,6 @@ vaddlvq_u32 (uint32x4_t a)
result;  \
  })
 
-#define vcvt_n_f32_s32(a, b)\
-  __extension__ \
-({  \
-   int32x2_t a_ = (a);  \
-   float32x2_t result;  \
-   __asm__ ("scvtf %0.2s, %1.2s, #%2"   \
-: "=w"(result)  \
-: "w"(a_), "i"(b)   \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcvt_n_f32_u32(a, b)\
-  __extension__ \
-({  \
-   uint32x2_t a_ = (a); \
-   float32x2_t result;  \
-   __asm__ ("ucvtf %0.2s, %1.2s, #%2"   \
-: "=w"(result)  \
-: "w"(a_), "i"(b)   \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcvt_n_s32_f32(a, b)\
-  __extension__  

[v2][AArch64, 1/6] Reimplement scalar fixed-point intrinsics

2016-06-06 Thread Jiong Wang

On 27/05/16 17:52, Jiong Wang wrote:



On 27/05/16 14:03, James Greenhalgh wrote:

On Tue, May 24, 2016 at 09:23:36AM +0100, Jiong Wang wrote:

 * config/aarch64/aarch64-simd-builtins.def: Rename to
 aarch64-builtins.def.

Why? We already have some number of intrinsics in here that are not
strictly SIMD, but I don't see the value in the rename?


Mostly because this builtin infrastructure is handy, and I want to
implement some vfp builtins in this .def file instead of implementing those
raw structures inside aarch64-builtins.c.

And there may be more and more such builtins in the future, so I renamed
this file.


Is this OK?

+(define_int_iterator FCVT_FIXED2F_SCALAR [UNSPEC_SCVTF_SCALAR UNSPEC_UCVTF_SCALAR])

Again, do we need the "SCALAR" versions at all?


That's because for scalar fixed-point conversion, we have two types of
instructions to support this.

  * scalar instruction from vfp
  * scalar variant instruction from simd

One is guarded by TARGET_FLOAT, the other is guarded by TARGET_SIMD, and
their instruction formats are different, so I want to keep them in
aarch64.md and aarch64-simd.md separately.

The other reason is that these two use different patterns:

  * vfp scalar supports conversion between different sizes, for example,
    SF->DI, DF->SI, so it uses two mode iterators, GPI and GPF, and
    utilizes the product of the two to cover all supported
    conversions: sfsi, sfdi, dfsi, dfdi, sisf, sidf, disf, didf.

  * simd scalar only supports conversion between same sizes, so a single
    mode iterator is used to cover sfsi, sisf, dfdi, didf.

For the intrinsics implementation, I used builtins backed by vfp scalar
instead of simd scalar, which requires the input to sit inside a vector
register.
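
As a reminder of what these conversions compute, here is a minimal scalar
sketch in portable C.  It assumes the usual semantics of the fixed-point
forms of SCVTF and FCVTZS with an immediate number of fractional bits
(1 to 32 for the 32-bit forms): divide by 2^fbits on the way to float, and
multiply by 2^fbits, truncate towards zero and saturate on the way back.
The names model_scvtf_n and model_fcvtzs_n are hypothetical; they are not
the builtins added by this patch.

```c
#include <stdint.h>

/* Model of SCVTF with FBITS fractional bits: VALUE / 2^FBITS.
   FBITS is assumed to be in the range 1..32.  */
static float
model_scvtf_n (int32_t value, int fbits)
{
  return (float) ((double) value / (double) (1ull << fbits));
}

/* Model of FCVTZS with FBITS fractional bits: VALUE * 2^FBITS,
   truncated towards zero and saturated to the int32_t range.
   NaN is assumed to convert to zero.  */
static int32_t
model_fcvtzs_n (float value, int fbits)
{
  double scaled = (double) value * (double) (1ull << fbits);

  if (scaled != scaled)            /* NaN.  */
    return 0;
  if (scaled >= 2147483647.0)      /* Saturate on positive overflow.  */
    return INT32_MAX;
  if (scaled <= -2147483648.0)     /* Saturate on negative overflow.  */
    return INT32_MIN;
  return (int32_t) scaled;         /* C conversion truncates towards zero.  */
}
```

With 8 fractional bits, the integer 256 maps to 1.0; with 4 fractional
bits, 1.5 maps to 24.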


I remember the simd scalar pattern was here because it's needed anyway
by patch [2/6], which naturally extends its modes to vector modes. I was
thinking it's better to keep the simd scalar variant with this scalar
intrinsics enablement patch.

Is this OK?

Thanks.


I updated this patch set with the following modifications:

  * drop the renaming of aarch64-builtins.def
  * implemented vrsqrts_f64, vrsqrte_f64, vabd_f64, vpadds_f32 while I am here.


OK for trunk?

gcc/
2016-06-06  Jiong Wang

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New
(TYPES_BINOP_SUS): Likewise.
(aarch64_simd_builtin_data): Update include file name.
(aarch64_builtins): Likewise.
* config/aarch64/aarch64-simd-builtins.def (scvtf): New entries
for conversion between scalar floating-point and fixed-point.
(ucvtf): Likewise.
(fcvtzs): Likewise.
(fcvtzu): Likewise.
* config/aarch64/aarch64.md
(3: New
pattern for conversion between scalar floating-point and fixed-point.
(: Likewise.
(UNSPEC_FCVTZS): New UNSPEC enumeration.
(UNSPEC_FCVTZU): Likewise.
(UNSPEC_SCVTF): Likewise.
(UNSPEC_UCVTF): Likewise.
* config/aarch64/arm_neon.h (vcvtd_n_f64_s64): Remove inline assembly.
Use builtin.
(vcvtd_n_f64_u64): Likewise.
(vcvtd_n_s64_f64): Likewise.
(vcvtd_n_u64_f64): Likewise.
(vcvts_n_f32_s32): Likewise.
(vcvts_n_f32_u32): Likewise.
(vcvts_n_s32_f32): Likewise.
(vcvts_n_u32_f32): Likewise.
* config/aarch64/iterators.md (fcvt_target): Support integer to float
mapping.
(FCVT_TARGET): Likewise.
(FCVT_FIXED2F): New iterator.
(FCVT_F2FIXED): Likewise.
(fcvt_fixed_insn): New define_int_attr.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5573903fe0a1f3d1ffc58c36992bd46cd0cb4dad..262ea1c519f4f01a1a0726296994e40a48f26680 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,14 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_binop_sus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_BINOP_SUS (aarch64_types_binop_sus_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index dd045792b21f84b9587be08a07db0e0081e0c484..0b2f0631c740558c62cffe5715eaffa5ad0557a9 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -445,3 +44

[PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-06 Thread Bernd Edlinger
Ping...

see https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02010.html

Thanks
Bernd.


On 05/25/16 14:58, Bernd Edlinger wrote:
> Hi!
>
> This restricts the X constraint in asm statements, which
> can be easily folded by combine into something completely
> invalid.
>
> It is necessary to allow scratch here, because on i386
> the md_asm_adjust hook inserts them.
>
> The second test case fails because lra does not
> allow all registers for anything_ok operands (aka X)
> and later it fails to match the two X constraints
> in case '0': if (curr_alt[m] == NO_REGS) break.
>
> There is also an identical bug in the reload pass,
> but I do not know how to fix that, as it is almost
> used nowhere today.
>
>
> Boot-strapped and regression-tested on x86_64-pc-linux-gnu.
> OK for trunk?
>
>
> Thanks
> Bernd.
>


Re: loop-ch tweek

2016-06-06 Thread Jan Hubicka
Hi,
does this look better?

Honza

* gimple.c: Include builtins.h.
(gimple_inexpensive_call_p): New function.
* gimple.h (gimple_inexpensive_call_p): Declare.
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Use it.
* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Likewise.
Index: gimple.c
===
--- gimple.c(revision 237101)
+++ gimple.c(working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
 #include "gimple-walk.h"
 #include "gimplify.h"
 #include "target.h"
+#include "builtins.h"
 
 
 /* All the tuples have their operand vector (if present) at the very bottom
@@ -3018,3 +3019,16 @@ maybe_remove_unused_call_args (struct fu
   update_stmt_fn (fn, stmt);
 }
 }
+
+/* Return false if STMT will likely expand to real function call.  */
+
+bool
+gimple_inexpensive_call_p (gimple *stmt)
+{
+  if (gimple_call_internal_p (stmt))
+return true;
+  tree decl = gimple_call_fndecl (stmt);
+  if (decl && is_inexpensive_builtin (decl))
+return true;
+  return false;
+}
Index: gimple.h
===
--- gimple.h(revision 237101)
+++ gimple.h(working copy)
@@ -1525,6 +1525,7 @@ extern void preprocess_case_label_vec_fo
 extern void gimple_seq_set_location (gimple_seq, location_t);
 extern void gimple_seq_discard (gimple_seq);
 extern void maybe_remove_unused_call_args (struct function *, gimple *);
+extern bool gimple_inexpensive_call_p (gimple *);
 
 /* Formal (expression) temporary table handling: multiple occurrences of
the same scalar expression are evaluated into the same temporary.  */
Index: tree-ssa-loop-ch.c
===
--- tree-ssa-loop-ch.c  (revision 237101)
+++ tree-ssa-loop-ch.c  (working copy)
@@ -118,7 +118,8 @@ should_duplicate_loop_header_p (basic_bl
   if (is_gimple_debug (last))
continue;
 
-  if (is_gimple_call (last))
+  if (gimple_code (last) == GIMPLE_CALL
+ && !gimple_inexpensive_call_p (last))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 237101)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -339,15 +339,11 @@ tree_estimate_loop_size (struct loop *lo
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
  gimple *stmt = gsi_stmt (gsi);
- if (gimple_code (stmt) == GIMPLE_CALL)
+ if (gimple_code (stmt) == GIMPLE_CALL
+ && !gimple_inexpensive_call_p (stmt))
{
  int flags = gimple_call_flags (stmt);
- tree decl = gimple_call_fndecl (stmt);
-
- if (decl && DECL_IS_BUILTIN (decl)
- && is_inexpensive_builtin (decl))
-   ;
- else if (flags & (ECF_PURE | ECF_CONST))
+ if (flags & (ECF_PURE | ECF_CONST))
size->num_pure_calls_on_hot_path++;
  else
size->num_non_pure_calls_on_hot_path++;


[PATCH][vectorizer] Remove blank debug lines after dump_gimple_stmt

2016-06-06 Thread Alan Hayward
Lots of code calls dump_gimple_stmt and then prints a newline; however,
dump_gimple_stmt prints a newline itself.  This makes the vectorizer debug
file messy.  I think the confusion arises because dump_generic_expr does NOT
print a newline.  This patch removes all prints of a newline directly after a
dump_gimple_stmt.

Tested by examining a selection of vect dump files.

gcc/
* tree-vect-data-refs.c (vect_analyze_data_refs): Remove debug newline.
* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Likewise.
(vect_can_advance_ivs_p): Likewise.
(vect_update_ivs_after_vectorizer): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vect_analyze_scalar_cycles_1): Likewise.
(vect_analyze_loop_operations): Likewise.
(report_vect_op): Likewise.
(vect_is_slp_reduction): Likewise.
(vect_is_simple_reduction): Likewise.
(get_initial_def_for_induction): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Likewise.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.
(vect_recog_widening_pattern): Likewise.
(vect_recog_divmod_pattern): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.


Alan




removenewlines.patch
Description: Binary data


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-06 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 02:09:44PM -0600, Martin Sebor wrote:
> I see.  I've made the change in the latest update to the patch
> but I wasn't able to create a test case to verify it.  Maybe
> that's because this is constexpr the COMPLEX_EXPR doesn't make
> it far enough to trigger a problem.  If there is a way to test
> it I'd appreciate a suggestion for how (otherwise, if not caught
> in a code review like in this case, it would be a ticking time
> bomb).

As you haven't been willing to try the __builtin_*_overflow_p
way, I've done that myself; it wasn't really that hard.

I've also found another bug: when computing the value that is stored
in the constexpr function, you weren't converting the INTEGER_CST
args to type, so you could get an incorrect type for the result or
other issues.

> It also occurred to me that a more robust solution might be to
> change build_complex to either enforce as a precondition that
> the members have a type that matches the complex type.  I've
> taken the liberty of making this change as part of this patch.

IMHO it doesn't belong in this patch; feel free to submit the tree.c
change separately.

> (It seems that an even better solution would be to have
> build_complex convert the arguments to the expected type
> so that callers don't have to worry about it.)

I disagree with that; build_complex is a low-level function, it shouldn't
perform such conversions.

2016-06-06  Martin Sebor  
Jakub Jelinek  

PR c++/70507
PR c/68120
* builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P,
BUILT_IN_MUL_OVERFLOW_P): New builtins.
* builtins.c: Include gimple-fold.h.
(fold_builtin_arith_overflow): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
(fold_builtin_3): Likewise.
* doc/extend.texi (Integer Overflow Builtins): Document
__builtin_{add,sub,mul}_overflow_p.
gcc/c/
* c-typeck.c (convert_arguments): Don't promote last argument
of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/cp/
* constexpr.c: Include gimple-fold.h.
(cxx_eval_internal_function): New function.
(cxx_eval_call_expression): Call it.
(potential_constant_expression_1): Handle integer arithmetic
overflow built-ins.
* tree.c (builtin_valid_in_constant_expr_p): Likewise.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/testsuite/
* c-c++-common/builtin-arith-overflow-1.c: Add test cases.
* c-c++-common/builtin-arith-overflow-2.c: New test.
* g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
* g++.dg/cpp1y/constexpr-arith-overflow.C: New test.

--- gcc/builtins.c.jj   2016-06-03 21:25:16.595678286 +0200
+++ gcc/builtins.c  2016-06-06 11:48:06.800100603 +0200
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
 #include "rtl-chkp.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "gimple-fold.h"
 
 
 struct target_builtins default_target_builtins;
@@ -7943,18 +7944,28 @@ fold_builtin_unordered_cmp (location_t l
 /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal
arithmetics if it can never overflow, or into internal functions that
return both result of arithmetics and overflowed boolean flag in
-   a complex integer result, or some other check for overflow.  */
+   a complex integer result, or some other check for overflow.
+   Similarly fold __builtin_{add,sub,mul}_overflow_p to just the overflow
+   checking part of that.  */
 
 static tree
 fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
 tree arg0, tree arg1, tree arg2)
 {
   enum internal_fn ifn = IFN_LAST;
-  tree type = TREE_TYPE (TREE_TYPE (arg2));
-  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
+  /* The code of the expression corresponding to the type-generic
+ built-in, or ERROR_MARK for the type-specific ones.  */
+  enum tree_code opcode = ERROR_MARK;
+  bool ovf_only = false;
+
   switch (fcode)
 {
+case BUILT_IN_ADD_OVERFLOW_P:
+  ovf_only = true;
+  /* FALLTHRU */
 case BUILT_IN_ADD_OVERFLOW:
+  opcode = PLUS_EXPR;
+  /* FALLTHRU */
 case BUILT_IN_SADD_OVERFLOW:
 case BUILT_IN_SADDL_OVERFLOW:
 case BUILT_IN_SADDLL_OVERFLOW:
@@ -7963,7 +7974,12 @@ fold_builtin_arith_overflow (location_t
 case BUILT_IN_UADDLL_OVERFLOW:
   ifn = IFN_ADD_OVERFLOW;
   break;
+case BUILT_IN_SUB_OVERFLOW_P:
+  ovf_only = true;
+  /* FALLTHRU */
 case BUILT_IN_SUB_OVERFLOW:
+  opcode = MINUS_EXPR;
+  /* FALLTHRU */
 case BUILT_IN_SSUB_OVERFLOW:
 case BUILT_IN_SSUBL_OVERFLOW:
 case BUILT_IN_SSUBLL_OVERFLOW:
@@ -7972,7 +7988,12 @@ fold_builtin_arith_overflow (location_t
 case BUILT_IN_USUBLL_OVERFLOW:
   ifn = IFN_SUB_OVERFLOW;
   break;
+case BUILT_IN_MUL_OVERFLOW_P:
+  ovf_only = true;
+  /* F

Re: loop-ch tweek

2016-06-06 Thread Jan Hubicka
> On Mon, 6 Jun 2016, Jan Hubicka wrote:
> 
> > > On Sun, 5 Jun 2016, Jan Hubicka wrote:
> > > 
> > > > Hi,
> > > > > both loop-ch and loop-ivcanon want to throttle down the heuristics on paths
> > > > > containing a call. Testing for presence of GIMPLE_CALL is wrong for internal
> > > > > calls and cheap builtins that are expanded inline.
> > > > 
> > > > Bootstrapped/regtested x86_64-linux, OK?
> > > 
> > > First of all the name is bad - I'd say gimple_inexpensive_call_p ()
> > > is better.  More comments below.
> > 
> > > OK, the motivation for the name is that I am really testing if the GIMPLE_CALL will end up
> > > as a call instruction in the final assembly. No matter whether expensive or not.
> 
> Well, but that's not what your predicate tests ;)  For example
> CLZ is considered is_inexpensive_builtin even though it may end up
> as a call.  In fact even non-calls can end up as a libcall on
> some targets.

Yep, it is for heuristics estimating the runtime cost of the given path
(number of branches and number of real calls). It doesn't need to be precise,
but it would be better if it was.

The general intuition that a call within a loop probably makes the loop
uninteresting for expensive loop transforms seems kind of sound (i.e. I do not
know of counterexamples), even though it is not real rocket science.

If I make gimple_inexpensive_call_p then the test would be
  if (gimple_code (stmt) == GIMPLE_CALL && !gimple_inexpensive_call_p (stmt))
... account that call is evil ...

Does that look OK?
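
The shape of that test can be sketched outside of GCC with a toy model.
Everything below is hypothetical: struct toy_stmt and the toy_* functions
merely stand in for gimple statements, gimple_call_internal_p and
is_inexpensive_builtin, and are not GCC's real data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a statement on the path being costed.  */
enum toy_kind { TOY_ASSIGN, TOY_CALL };

struct toy_stmt
{
  enum toy_kind kind;
  bool internal_fn;          /* Models gimple_call_internal_p.  */
  bool inexpensive_builtin;  /* Models is_inexpensive_builtin on the callee.  */
};

/* Toy version of the predicate: internal functions and cheap builtins
   are not expected to become real calls.  */
static bool
toy_inexpensive_call_p (const struct toy_stmt *stmt)
{
  return stmt->internal_fn || stmt->inexpensive_builtin;
}

/* Count the statements the heuristic would account as real calls.  */
static size_t
toy_count_real_calls (const struct toy_stmt *stmts, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == TOY_CALL && !toy_inexpensive_call_p (&stmts[i]))
      count++;
  return count;
}
```

Only calls that are neither internal functions nor inexpensive builtins
are counted against the path.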

BTW the hard-wired bound of 20 insns for header copying seems high. I will turn
it into a --param and we could probably check what value is really needed.  I
don't think it was tested since it was moved away from jump.c, which worked in
quite a different context.

Honza
> 
> > > gimple_code (stmt) == GIMPLE_CALL is redundant then.  I'd prefer to
> > > make gimple_inexpensive_call_p take a gcall * argument and do the
> > > test at the callers though.
> > 
> > > OK, I will update the patch.  I had mostly copied those tests from the original
> > > code, which I think had them as short circuits. This most probably does not
> > > matter in practice and LTO may eventually do that for us. So it seemed
> > > a bit like premature optimization.  I will update the patch.
> 
> Thanks,
> Richard.
> 
> > Honza
> > > 
> > > > {
> > > >   int flags = gimple_call_flags (stmt);
> > > > - tree decl = gimple_call_fndecl (stmt);
> > > > -
> > > > - if (decl && DECL_IS_BUILTIN (decl)
> > > > - && is_inexpensive_builtin (decl))
> > > > -   ;
> > > > - else if (flags & (ECF_PURE | ECF_CONST))
> > > > + if (flags & (ECF_PURE | ECF_CONST))
> > > > size->num_pure_calls_on_hot_path++;
> > > >   else
> > > > size->num_non_pure_calls_on_hot_path++;
> > > > 
> > > > 
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)


Re: [middle-end][PATCH] Update alignment_for_piecewise_move

2016-06-06 Thread Bernd Schmidt

On 06/02/2016 03:37 PM, H.J. Lu wrote:


Are you planning to submit your patch before July?  If not, I will resubmit
mine and work out all the issues.  It may take a long time to review and I
have patches to enable SSE, AVX, AVX512 f memset and memcpy, which
depend on it.  I'd like to see them before stage 2 for GCC 7.


My initial patch is now committed, but I haven't got to followups like
vector modes quite yet.



Bernd



Re: [PATCH v1] Support for SPARC M7 and VIS 4.0

2016-06-06 Thread Jose E. Marchesi

> This patch adds support for -mcpu=niagara7, corresponding to the SPARC
> M7 CPU as documented in the Oracle SPARC Architecture 2015 and the M7
> Processor Supplement.  The patch also includes intrinsics support for
> all the VIS 4.0 instructions.
> 
> This patch has been tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu
> and sparc-sun-solaris2.11 targets.
> 
> gcc/ChangeLog:
> 
>   * config/sparc/sparc.md (cpu): Add niagara7 cpu type.
>   Include the M7 SPARC DFA scheduler.
>   New attribute v3pipe.
>   Annotate insns with v3pipe where appropriate.
>   Define cpu_feature vis4.
>   Add lzd instruction type and set it on clzdi_sp64 and clzsi_sp64.
>   Add (V8QI "8") to vbits.
>   Add insns {add,sub}v8qi3
>   Add insns ss{add,sub}v8qi3
>   Add insns us{add,sub}{v8qi,v4hi}3
>   Add insns {min,max}{v8qi,v4hi,v2si}3
>   Add insns {minu,maxu}{v8qi,v4hi,v2si}3
>   Add insns fpcmp{le,gt,ule,ug,ule,ugt}{8,16,32}_vis.
>   * config/sparc/niagara4.md: Add a comment explaining the
>   discrepancy between the documented latency numbers and the
>   implemented ones.
>   * config/sparc/niagara7.md: New file.
>   * configure.ac (HAVE_AS_SPARC5_VIS4): Define if the assembler
>   supports SPARC5 and VIS 4.0 instructions.
>   * configure: Regenerate.
>   * config.in: Likewise.
>   * config.gcc: niagara7 is a supported cpu in sparc*-*-* targets.
>   * config/sparc/sol2.h (ASM_CPU32_DEFAULT_SPEC): Set for
>   TARGET_CPU_niagara7.
>   (ASM_CPU64_DEFAULT_SPEC): Likewise.
>   (CPP_CPU_SPEC): Handle niagara7.
>   (ASM_CPU_SPEC): Likewise.
>   * config/sparc/sparc-opts.h (processor_type): Add
>   PROCESSOR_NIAGARA7.
>   (mvis4): New option.
>   * config/sparc/sparc.h (TARGET_CPU_niagara7): Define.
>   (AS_NIAGARA7_FLAG): Define.
>   (ASM_CPU64_DEFAULT_SPEC): Set for niagara7.
>   (CPP_CPU64_DEFAULT_SPEC): Likewise.
>   (CPP_CPU_SPEC): Handle niagara7.
>   (ASM_CPU_SPEC): Likewise.
>   * config/sparc/sparc.c (niagara7_costs): Define.
>   (sparc_option_override): Handle niagara7 and adjust cache-related
>   parameters with better values for niagara cpus.  Also support VIS4.
>   (sparc32_initialize_trampoline): Likewise.
>   (sparc_use_sched_lookahead): Likewise.
>   (sparc_issue_rate): Likewise.
>   (sparc_register_move_cost): Likewise.
>   (dump_target_flag_bits): Support VIS4.
>   (sparc_vis_init_builtins): Likewise.
>   (sparc_builtins): Likewise.
>   * config/sparc/sparc-c.c (sparc_target_macros): Define __VIS__ for
>   VIS 4.0.
>   * config/sparc/driver-sparc.c (cpu_names): Add SPARC-M7 and
>   UltraSparc M7.
>   * config/sparc/sparc.opt (sparc_processor_type): New value
>   niagara7.
>   * config/sparc/visintrin.h (__attribute__): Prototypes for the
>   VIS4 builtins.
>   * doc/invoke.texi (SPARC Options): Document -mcpu=niagara7 and
>   -mvis4.
>   * doc/extend.texi (SPARC VIS Built-in Functions): Document the
>   VIS4 builtins.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/sparc/vis4misc.c: New file.
>   * gcc.target/sparc/fpcmp.c: Likewise.
>   * gcc.target/sparc/fpcmpu.c: Likewise.

OK for mainline, thanks.  As mentioned yesterday, I think that we should also
put it on the 6 branch, but I can do the backport myself.

Committed to trunk.  Thanks.


Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-06-06 Thread Bin.Cheng
On Fri, May 27, 2016 at 12:56 PM, Richard Biener
 wrote:
> On Fri, May 27, 2016 at 1:11 PM, Bin.Cheng  wrote:
>> On Fri, May 27, 2016 at 11:45 AM, Richard Biener
>>  wrote:
>>> On Wed, May 25, 2016 at 1:22 PM, Bin Cheng  wrote:
 Hi,
 As analyzed in PR68030 and PR69710, vectorizer generates duplicated 
 computations in loop's pre-header basic block when creating base address 
 for vector reference to the same memory object.  Because the duplicated 
 code is out of loop, IVOPT fails to track base object for these vector 
 references, resulting in missed strength reduction.
 It's agreed that vectorizer should be improved to generate optimal 
 (IVOPT-friendly) code, the difficult part is we want a generic 
 infrastructure.  After investigation, I tried to introduce a 
 generic/simple local CSE interface by reusing existing 
 algorithm/data-structure from tree-ssa-dom (tree-ssa-scopedtables).  The 
 interface runs local CSE for each basic block in a bitmap, customers of 
 this interface only need to record basic blocks in the bitmap when 
 necessary.  Note we don't need scopedtables' unwinding facility since the 
 interface runs only for single basic block, this should be good in terms 
 of compilation time.
 Besides CSE issue, this patch also re-associates address expressions in 
 vect_create_addr_base_for_vector_ref, specifically, it splits constant 
 offset and adds it back near the expression root in IR.  This is necessary 
 because GCC only handles re-association for commutative operators in CSE.

 I checked its impact on various test cases.
 With this patch, PR68030's generated assembly is reduced from ~750 lines 
 to ~580 lines on x86_64, with both pre-header and loop body simplified.  
 But,
 1) It doesn't fix all the problem on x86_64.  Root cause is computation 
 for base address of the first reference is somehow moved outside of loop's 
 pre-header, local CSE can't help in this case.  Though 
 split_constant_offset can back track ssa def chain, it may introduce 
 redundancy when there are no CSE opportunities in pre-header.
 2) It causes regression for PR68030 on AArch64.  I think the regression is 
 caused by IVOPT issues which are exposed by this patch.  The check on offset 
 validity in get_address_cost is wrong/inaccurate now.  It considers an 
 offset as valid if it's within the maximum offset range that backend 
 supports.  This is not true, for example, AArch64 requires aligned offset 
 additionally.  For example, LDR [base + 2060] is invalid for V4SFmode, 
 although "2060" is within the maximum offset range.  Another issue is also 
 in get_address_cost.  Inaccurate cost is computed for "base + offset + 
 INDEX" address expression.  When register pressure is low, "base+offset" 
 can be hoisted out and we can use [base + INDEX] addressing mode, which 
 is current behavior.
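
To make the offset-validity point above concrete, here is a minimal sketch in C. It assumes a simplified model in which an immediate offset is usable only if it lies within the maximum range and is also a multiple of the access size. The function names and parameters are hypothetical; they are not taken from get_address_cost or from the AArch64 backend, whose real addressing rules have more forms than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Range-only check: the kind of test the message describes as
   insufficient.  Hypothetical helper, not a GCC function.  */
static bool
offset_in_range_p (long offset, long max_offset)
{
  return offset >= 0 && offset <= max_offset;
}

/* Range plus alignment check: in this simplified model the offset must
   also be a multiple of the access size in bytes.  */
static bool
offset_valid_p (long offset, long max_offset, long access_size)
{
  assert (access_size > 0);
  return offset_in_range_p (offset, max_offset)
	 && offset % access_size == 0;
}
```

With a 16-byte access (the size of V4SFmode) and any maximum offset above 2060, offset_in_range_p accepts 2060 while offset_valid_p rejects it, because 2060 is not a multiple of 16; 2064 passes both checks.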

 Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.
>>>
>>> It looks quite straight-forward with the caveat that it has one
>>> obvious piece that is not in the order
>>> of the complexity of a basic-block.  threadedge_initialize_values
>>> creates the SSA value array
>> I noticed this too, and think it's better to get rid of these init/fini
>> functions by some kind of redesign.  I found it's quite weird to call
>> threadedge_X in tree-vrp.c.  I will keep investigating this.
>>> which is zero-initialized (upon use).  That's probably a non-issue for
>>> the use you propose for
>>> the vectorizer (call cse_bbs once per function).  As Ideally I would
>>> like this facility to replace
>>> the loop unrollers own propagate_constants_for_unrolling it might
>>> become an issue though.
>>> In that regard the unroller facility is also more powerful because it
>>> handles regions (including
>>> PHIs).
>> With the current data-structure, I think it's not very hard to extend
>> the interface to regions.  I will keep investigating this too.  BTW,
>> if it's okay, I tend to do that in following patches.
>
> I'm fine with doing enhancements to this in followup patches (adding Jeff
> to CC for his opinions).
Hi,
I further investigated the impact of this patch, and believe my
previous conclusion still holds.  On targets that don't support the [base
+ index + offset] addressing mode, it may cause IVO to generate worse
code than now.  I found and investigated a new case showing exactly
the same problem.  Take AArch64 as an example: before this patch, IVO
computes costs and chooses candidates as below:

Loop preheader:
vect_p_1 = ...
vect_p_2 = ...
Loop header:
MEM[vect_p_1 + VAR]
MEM[vect_p_2 + VAR]

After this patch, cse opportunities are realized between vect_p_1 and
vect_p_2, and IVO computes cost for the second MEM_REF differently as
below:
Loop preheader:
vect_p_1 = 
vect_p_2 = vect_p_1 + offset
Loop header:
MEM[vect_p_1 + VAR]
  

Re: Remove word_mode hack for split bitfields

2016-06-06 Thread Bernd Schmidt

On 05/26/2016 04:36 PM, Richard Sandiford wrote:

This patch is effectively reverting a change from 1994.  The reason
I think it's a hack is that store_bit_field_1 is creating a subreg
reference to one word of a field even though it has already proven that
the field spills into the following word.  We then rely on the special
SUBREG handling in store_split_bit_field to ignore the extent of op0 and
look inside the SUBREG_REG regardless.  I don't see any reason why we can't
pass the original op0 to store_split_bit_field instead.
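
As an illustration of what it means for a field to spill into the following word, here is a minimal sketch in C. It is not GCC's store_split_bit_field: the helper name, the 32-bit word size, and the bit numbering (bit 0 is the least significant bit of words[0]) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Store the low WIDTH bits of VALUE into WORDS starting at bit BITPOS.
   The loop writes one chunk per word, so a field that crosses a word
   boundary is stored in two pieces.  Hypothetical helper.  */
static void
store_field (uint32_t *words, unsigned bitpos, unsigned width,
	     uint32_t value)
{
  assert (width >= 1 && width <= WORD_BITS);
  while (width > 0)
    {
      unsigned word = bitpos / WORD_BITS;
      unsigned shift = bitpos % WORD_BITS;
      unsigned room = WORD_BITS - shift;
      unsigned chunk = width < room ? width : room;
      uint32_t mask = chunk == WORD_BITS
		      ? UINT32_MAX : ((UINT32_C (1) << chunk) - 1);

      /* Clear the destination bits, then insert this chunk.  */
      words[word] = (words[word] & ~(mask << shift))
		    | ((value & mask) << shift);

      /* Drop the bits just stored and move on to the next word.  */
      value = chunk == WORD_BITS ? 0 : value >> chunk;
      bitpos += chunk;
      width -= chunk;
    }
}
```

Storing an 8-bit value at bit position 28 puts the low four bits in the top of words[0] and the high four bits in the bottom of words[1]. The caller passes the whole object rather than a reference to a single word, which is the shape of interface the message argues for.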

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


I think it's OK. Ideally we'd know why the 1994 change was made, but 
that's beyond my archaeological ability. The code looked very different 
at the time and probably changed over the years to make this 
simplification possible.



Bernd



Re: loop-ch tweek

2016-06-06 Thread Richard Biener
On Mon, 6 Jun 2016, Jan Hubicka wrote:

> > On Sun, 5 Jun 2016, Jan Hubicka wrote:
> > 
> > > Hi,
> > > both loop-ch and loop-ivcanon want to throttle down the heuristics on paths
> > > containing a call. Testing for the presence of GIMPLE_CALL is wrong for internal
> > > calls and cheap builtins that are expanded inline.
> > > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > First of all the name is bad - I'd say gimple_inexpensive_call_p ()
> > is better.  More comments below.
> 
> OK, the motivation for the name is that I am really testing whether the
> GIMPLE_CALL will end up as a call instruction in the final assembly, no
> matter whether expensive or not.

Well, but that's not what your predicate tests ;)  For example
CLZ is considered is_inexpensive_builtin even though it may end up
as a call.  In fact even non-calls can end up as a libcall on
some targets.

> > gimple_code (stmt) == GIMPLE_CALL is redundant then.  I'd prefer to
> > make gimple_inexpensive_call_p take a gcall * argument and do the
> > test at the callers though.
> 
> OK, I will update the patch.  I had mostly copied those tests from the original
> code, which I think had them as short circuits. This most probably does not
> matter in practice and LTO may eventually be able to do that for us. So it seemed
> a bit like premature optimization.  I will update the patch.

Thanks,
Richard.

> Honza
> > 
> > >   {
> > > int flags = gimple_call_flags (stmt);
> > > -   tree decl = gimple_call_fndecl (stmt);
> > > -
> > > -   if (decl && DECL_IS_BUILTIN (decl)
> > > -   && is_inexpensive_builtin (decl))
> > > - ;
> > > -   else if (flags & (ECF_PURE | ECF_CONST))
> > > +   if (flags & (ECF_PURE | ECF_CONST))
> > >   size->num_pure_calls_on_hot_path++;
> > > else
> > >   size->num_non_pure_calls_on_hot_path++;
> > > 
> > > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


backward threading heuristics tweek

2016-06-06 Thread Jan Hubicka
Hi,
while looking into profile mismatches introduced by the backward threading pass
I noticed that the heuristics seem quite simplistic.  First, it should be
profile sensitive and disallow duplication when optimizing cold paths. Second,
it should use estimate_num_insns because the gimple statement count is not a
very realistic estimate of the final code size effect, and third, there seems to be
no reason to disable the pass for functions optimized for size.

If we block duplication of more than 1 insn for size optimized paths, the pass
is able to do the majority of threading decisions that are free and improve
codegen.
The code size benefit was between 0.5% and 2.7% on testcases I tried (tramp3d,
GCC modules, xalancbmk and some other stuff around my hd).

Bootstrapped/regtested x86_64-linux, seems sane?

The pass should also avoid calling cleanup_cfg when no threading was done,
and I do not see why it is guarded by expensive_optimizations. What are the
main compile time complexity limitations?
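
The size check described above can be sketched as a small standalone C function. This is only a model of the decision, assuming plain values in place of GCC's internals: the function name is hypothetical, edge_optimized_for_speed stands in for the result of optimize_edge_for_speed_p, and max_path_insns stands in for the PARAM_MAX_FSM_THREAD_PATH_INSNS value.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if a path that needs N_INSNS duplicated instructions
   passes the size check.  On edges optimized for speed the limit is
   the parameter value; on cold or size-optimized edges at most one
   duplicated instruction is allowed.  Hypothetical helper.  */
static bool
path_size_ok_p (bool edge_optimized_for_speed, int n_insns,
		int max_path_insns)
{
  assert (n_insns >= 0);
  if (edge_optimized_for_speed)
    return n_insns < max_path_insns;
  return n_insns <= 1;
}
```

For example, with a limit of 100 a hot path needing 99 instructions is accepted and one needing 100 is rejected, while a cold path is accepted only when it needs at most one duplicated instruction.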

Honza

* tree-ssa-threadbackward.c: Include tree-inline.h
(profitable_jump_thread_path): Use estimate_num_insns to estimate
size of copied block; for cold paths reduce duplication.
(find_jump_threads_backwards): Remove redundant tests.
(pass_thread_jumps::gate): Enable for -Os.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
Index: tree-ssa-threadbackward.c
===
--- tree-ssa-threadbackward.c   (revision 237101)
+++ tree-ssa-threadbackward.c   (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
+#include "tree-inline.h"
 
 static int max_threaded_paths;
 
@@ -210,7 +211,7 @@ profitable_jump_thread_path (vec= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
+  if (optimize_edge_for_speed_p (taken_edge))
+{
+  if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "FSM jump-thread path not considered: "
+"the number of instructions on the path "
+"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+ path->pop ();
+ return NULL;
+   }
+}
+  else if (n_insns > 1)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "FSM jump-thread path not considered: "
-"the number of instructions on the path "
-"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
+"duplication of %i insns is needed and optimizing for size.\n",
+n_insns);
   path->pop ();
   return NULL;
 }
@@ -600,10 +615,6 @@ fsm_find_control_statement_thread_paths
 void  
 find_jump_threads_backwards (basic_block bb)
 { 
-  if (!flag_expensive_optimizations
-  || optimize_function_for_size_p (cfun))
-return;
-
   gimple *stmt = get_gimple_control_stmt (bb);
   if (!stmt)
 return;
@@ -668,8 +679,7 @@ public:
 bool
 pass_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
 {
-  return (flag_expensive_optimizations
- && ! optimize_function_for_size_p (cfun));
+  return flag_expensive_optimizations;
 }
 
 
Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c(revision 237101)
+++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats" } */
+/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 6" "thread2" } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */


Re: tuple code simplification patch

2016-06-06 Thread Jonathan Wakely

On 02/06/16 23:00 +0200, François Dumont wrote:

Hi

   I was trying to play with tuple implementation and was annoyed by 
repetition of _Head type when instantiating _Head_base so I thought 
about this patch.


   How do you like it ?

   I still need to run tests, will do before commit, ok ?


Looks good, thanks. OK for trunk when testing passes.


Re: [patch] doc/sourcebuild.texi (Directives): Remove extra closing braces.

2016-06-06 Thread Jonathan Wakely

On 29/05/16 23:05 +0200, Gerald Pfeifer wrote:

On Sat, 16 Jan 2016, Jonathan Wakely wrote:

This removes stray closing braces in the docs for dg-error, dg-warning
etc.

OK for trunk?


Yes.

Sorry for the delay.  I expected someone else to pick this up for
review/approval, but now noticed that this patch apparently has
not been committed yet:

 commit 1cb064263cfcfa14da81585886750f01a5611c7e
 Author: Jonathan Wakely 
 Date:   Sat Jan 16 00:11:27 2016 +

   * doc/sourcebuild.texi (Directives): Remove extra closing braces.



I'd forgotten about it too - committed to trunk now, thanks.



Re: loop-ch tweek

2016-06-06 Thread Jan Hubicka
> On Sun, 5 Jun 2016, Jan Hubicka wrote:
> 
> > Hi,
> > both loop-ch and loop-ivcanon want to throttle down the heuristics on paths
> > containing a call. Testing for the presence of GIMPLE_CALL is wrong for internal
> > calls and cheap builtins that are expanded inline.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> First of all the name is bad - I'd say gimple_inexpensive_call_p ()
> is better.  More comments below.

OK, the motivation for the name is that I am really testing whether the
GIMPLE_CALL will end up as a call instruction in the final assembly, no
matter whether expensive or not.
> 
> gimple_code (stmt) == GIMPLE_CALL is redundant then.  I'd prefer to
> make gimple_inexpensive_call_p take a gcall * argument and do the
> test at the callers though.

OK, I will update the patch.  I had mostly copied those tests from the original
code, which I think had them as short circuits. This most probably does not
matter in practice and LTO may eventually be able to do that for us. So it seemed
a bit like premature optimization.  I will update the patch.

Honza
> 
> > {
> >   int flags = gimple_call_flags (stmt);
> > - tree decl = gimple_call_fndecl (stmt);
> > -
> > - if (decl && DECL_IS_BUILTIN (decl)
> > - && is_inexpensive_builtin (decl))
> > -   ;
> > - else if (flags & (ECF_PURE | ECF_CONST))
> > + if (flags & (ECF_PURE | ECF_CONST))
> > size->num_pure_calls_on_hot_path++;
> >   else
> > size->num_non_pure_calls_on_hot_path++;
> > 
> > 


[Ada] Make Gigi_Equivalent_Type idempotent

2016-06-06 Thread Eric Botcazou
This fixes a small regression in ASIS mode introduced by the new elaboration 
model for subprograms.  Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* gcc-interface/decl.c (Gigi_Equivalent_Type): Make sure equivalent
types are present before returning them.  Remove final assertion.
(gnat_to_gnu_entity) : Adjust to
above change.
: Likewise.

-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 322166)
+++ gcc-interface/decl.c	(revision 322167)
@@ -4005,9 +4005,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 
 case E_Access_Protected_Subprogram_Type:
 case E_Anonymous_Access_Protected_Subprogram_Type:
-  /* The run-time representation is the equivalent type.  */
-  if (type_annotate_only && No (gnat_equiv_type))
+  /* If we are just annotating types and have no equivalent record type,
+	 just return ptr_void_type.  */
+  if (type_annotate_only && gnat_equiv_type == gnat_entity)
 	gnu_type = ptr_type_node;
+
+  /* The run-time representation is the equivalent type.  */
   else
 	{
 	  gnu_type = gnat_to_gnu_type (gnat_equiv_type);
@@ -4373,7 +4376,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	 just return void_type, except for root types that have discriminants
 	 because the discriminants will very likely be used in the declarative
 	 part of the associated body so they need to be translated.  */
-  if (type_annotate_only && No (gnat_equiv_type))
+  if (type_annotate_only && gnat_equiv_type == gnat_entity)
 	{
 	  if (Has_Discriminants (gnat_entity)
 	  && Root_Type (gnat_entity) == gnat_entity)
@@ -5139,26 +5142,26 @@ Gigi_Equivalent_Type (Entity_Id gnat_ent
 
 case E_Access_Protected_Subprogram_Type:
 case E_Anonymous_Access_Protected_Subprogram_Type:
-  gnat_equiv = Equivalent_Type (gnat_entity);
+  if (Present (Equivalent_Type (gnat_entity)))
+	gnat_equiv = Equivalent_Type (gnat_entity);
   break;
 
 case E_Class_Wide_Type:
   gnat_equiv = Root_Type (gnat_entity);
   break;
 
-case E_Task_Type:
-case E_Task_Subtype:
 case E_Protected_Type:
 case E_Protected_Subtype:
-  gnat_equiv = Corresponding_Record_Type (gnat_entity);
+case E_Task_Type:
+case E_Task_Subtype:
+  if (Present (Corresponding_Record_Type (gnat_entity)))
+	gnat_equiv = Corresponding_Record_Type (gnat_entity);
   break;
 
 default:
   break;
 }
 
-  gcc_assert (Present (gnat_equiv) || type_annotate_only);
-
   return gnat_equiv;
 }
 


Re: [PING^2] Re: Updated autofdo bootstrap and testing patches

2016-06-06 Thread Bernd Schmidt

On 05/30/2016 12:17 AM, Andi Kleen wrote:

Andi Kleen  writes:

Ping^2!


I think patches #1 and #5 had unaddressed comments from Jan. I think you 
should ping build system maintainers directly for #4.



Bernd



[Ada] Add support for noinline and noclone attributes

2016-06-06 Thread Eric Botcazou
They are automatically set by the middle-end when the interrupt attribute is.

Tested on x86_64-suse-linux, applied on the mainline and 6 branch.


2016-06-06  Eric Botcazou  

* gcc-interface/utils.c (gnat_internal_attribute_table): Add support
for noinline and noclone attributes.
(handle_noinline_attribute): New handler.
(handle_noclone_attribute): Likewise.

-- 
Eric Botcazou

Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 237119)
+++ gcc-interface/utils.c	(working copy)
@@ -90,6 +90,8 @@ static tree handle_novops_attribute (tre
 static tree handle_nonnull_attribute (tree *, tree, tree, int, bool *);
 static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
+static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
+static tree handle_noclone_attribute (tree *, tree, tree, int, bool *);
 static tree handle_leaf_attribute (tree *, tree, tree, int, bool *);
 static tree handle_always_inline_attribute (tree *, tree, tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
@@ -121,6 +123,10 @@ const struct attribute_spec gnat_interna
 false },
   { "noreturn", 0, 0,  true,  false, false, handle_noreturn_attribute,
 false },
+  { "noinline", 0, 0,  true,  false, false, handle_noinline_attribute,
+false },
+  { "noclone",  0, 0,  true,  false, false, handle_noclone_attribute,
+false },
   { "leaf", 0, 0,  true,  false, false, handle_leaf_attribute,
 false },
   { "always_inline",0, 0,  true,  false, false, handle_always_inline_attribute,
@@ -5960,6 +5966,51 @@ handle_noreturn_attribute (tree *node, t
   *no_add_attrs = true;
 }
 
+  return NULL_TREE;
+}
+
+/* Handle a "noinline" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_noinline_attribute (tree *node, tree name,
+			   tree ARG_UNUSED (args),
+			   int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+{
+  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
+	{
+	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
+		   "with attribute %qs", name, "always_inline");
+	  *no_add_attrs = true;
+	}
+  else
+	DECL_UNINLINABLE (*node) = 1;
+}
+  else
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Handle a "noclone" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_noclone_attribute (tree *node, tree name,
+			  tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+
   return NULL_TREE;
 }
 


[Ada] Fix double elaboration of qualified expression in allocator

2016-06-06 Thread Eric Botcazou
This is a regression present on the mainline and 6 branch: under some peculiar 
circumstances, the qualified expression of an allocator can be elaborated 
twice by gigi.

Tested on x86_64-suse-linux, applied on the mainline and 6 branch.


2016-06-06  Eric Botcazou  

* gcc-interface/utils2.c (build_call_alloc_dealloc): Do not
substitute placeholder expressions here but...
* gcc-interface/trans.c (gnat_to_gnu) : ...here.
Make an exception to the protection of a CALL_EXPR result with an
unconstrained type only in the same cases as Call_to_gnu.

-- 
Eric Botcazou

Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 237123)
+++ gcc-interface/trans.c	(working copy)
@@ -7640,10 +7640,11 @@ gnat_to_gnu (Node_Id gnat_node)
 	  else
 	gnu_actual_obj_type = gnu_obj_type;
 
+	  tree gnu_size = TYPE_SIZE_UNIT (gnu_actual_obj_type);
+	  gnu_size = SUBSTITUTE_PLACEHOLDER_IN_EXPR (gnu_size, gnu_ptr);
+
 	  gnu_result
-	  = build_call_alloc_dealloc (gnu_ptr,
-	  TYPE_SIZE_UNIT (gnu_actual_obj_type),
-	  gnu_obj_type,
+	  = build_call_alloc_dealloc (gnu_ptr, gnu_size, gnu_obj_type,
 	  Procedure_To_Call (gnat_node),
 	  Storage_Pool (gnat_node),
 	  gnat_node);
@@ -7729,16 +7730,22 @@ gnat_to_gnu (Node_Id gnat_node)
 N_Raise_Constraint_Error));
 }
 
-  /* If the result has side-effects and is of an unconstrained type, make a
- SAVE_EXPR so that we can be sure it will only be referenced once.  But
- this is useless for a call to a function that returns an unconstrained
- type with default discriminant, as we cannot compute the size of the
- actual returned object.  We must do this before any conversions.  */
+  /* If the result has side-effects and is of an unconstrained type, protect
+ the expression in case it will be referenced multiple times, i.e. for
+ its value and to compute the size of an object.  But do it neither for
+ an object nor a renaming declaration, nor a return statement of a call
+ to a function that returns an unconstrained record type with default
+ discriminant, because there is no size to be computed in these cases
+ and this will create a useless temporary.  We must do this before any
+ conversions.  */
   if (TREE_SIDE_EFFECTS (gnu_result)
-  && !(TREE_CODE (gnu_result) == CALL_EXPR
-	   && type_is_padding_self_referential (TREE_TYPE (gnu_result)))
   && (TREE_CODE (gnu_result_type) == UNCONSTRAINED_ARRAY_TYPE
-	  || CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_result_type))))
+	  || CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_result_type)))
+  && !(TREE_CODE (gnu_result) == CALL_EXPR
+	   && type_is_padding_self_referential (TREE_TYPE (gnu_result))
+	   && (Nkind (Parent (gnat_node)) == N_Object_Declaration
+	   || Nkind (Parent (gnat_node)) == N_Object_Renaming_Declaration
+	   || Nkind (Parent (gnat_node)) == N_Simple_Return_Statement)))
 gnu_result = gnat_protect_expr (gnu_result);
 
   /* Now convert the result to the result type, unless we are in one of the
Index: gcc-interface/utils2.c
===
--- gcc-interface/utils2.c	(revision 237088)
+++ gcc-interface/utils2.c	(working copy)
@@ -2268,8 +2268,6 @@ build_call_alloc_dealloc (tree gnu_obj,
   Entity_Id gnat_proc, Entity_Id gnat_pool,
   Node_Id gnat_node)
 {
-  gnu_size = SUBSTITUTE_PLACEHOLDER_IN_EXPR (gnu_size, gnu_obj);
-
   /* Explicit proc to call ?  This one is assumed to deal with the type
  alignment constraints.  */
   if (Present (gnat_proc))


[Ada] Fix single-stepping regression with tagged types

2016-06-06 Thread Eric Botcazou
This ensures that we don't import an artificial location from the spec when an 
object of a tagged type is elaborated.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* gcc-interface/trans.c (gnat_to_gnu): Rework special code dealing
with boolean rvalues and set the location directly.  Do not set the
location in the other cases for a simple name.
(gnat_to_gnu_external): Clear the location on the expression.

-- 
Eric Botcazou

Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 237122)
+++ gcc-interface/trans.c	(working copy)
@@ -7686,10 +7686,11 @@ gnat_to_gnu (Node_Id gnat_node)
 current_function_decl = NULL_TREE;
 
   /* When not optimizing, turn boolean rvalues B into B != false tests
- so that the code just below can put the location information of the
- reference to B on the inequality operator for better debug info.  */
+ so that we can put the location information of the reference to B on
+ the inequality operator for better debug info.  */
   if (!optimize
   && TREE_CODE (gnu_result) != INTEGER_CST
+  && TREE_CODE (gnu_result) != TYPE_DECL
   && (kind == N_Identifier
 	  || kind == N_Expanded_Name
 	  || kind == N_Explicit_Dereference
@@ -7698,15 +7699,19 @@ gnat_to_gnu (Node_Id gnat_node)
 	  || kind == N_Selected_Component)
   && TREE_CODE (get_base_type (gnu_result_type)) == BOOLEAN_TYPE
   && !lvalue_required_p (gnat_node, gnu_result_type, false, false, false))
-gnu_result = build_binary_op (NE_EXPR, gnu_result_type,
-  convert (gnu_result_type, gnu_result),
-  convert (gnu_result_type,
-	   boolean_false_node));
-
-  /* Set the location information on the result.  Note that we may have
- no result if we tried to build a CALL_EXPR node to a procedure with
- no side-effects and optimization is enabled.  */
-  if (gnu_result && EXPR_P (gnu_result))
+{
+  gnu_result
+	= build_binary_op (NE_EXPR, gnu_result_type,
+			   convert (gnu_result_type, gnu_result),
+			   convert (gnu_result_type, boolean_false_node));
+  if (TREE_CODE (gnu_result) != INTEGER_CST)
+	set_gnu_expr_location_from_node (gnu_result, gnat_node);
+}
+
+  /* Set the location information on the result if it's not a simple name.
+ Note that we may have no result if we tried to build a CALL_EXPR node
+ to a procedure with no side-effects and optimization is enabled.  */
+  else if (kind != N_Identifier && gnu_result && EXPR_P (gnu_result))
 set_gnu_expr_location_from_node (gnu_result, gnat_node);
 
   /* If we're supposed to return something of void_type, it means we have
@@ -7858,6 +7863,10 @@ gnat_to_gnu_external (Node_Id gnat_node)
   if (went_into_elab_proc)
 current_function_decl = NULL_TREE;
 
+  /* Do not import locations from external units.  */
+  if (gnu_result && EXPR_P (gnu_result))
+SET_EXPR_LOCATION (gnu_result, UNKNOWN_LOCATION);
+
   return gnu_result;
 }
 


[Ada] Small adjustments in handling of subprograms

2016-06-06 Thread Eric Botcazou
No functional change.  Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Remove
useless 'else' statements and tidy up.
: Fully deal with the declaration here.
: Use properly-typed constant.
Assert that we don't apply the special type treatment to dummy types.
Separate this treatment from the final back-annotation and simplify
the condition for the RM size.
(gnat_to_gnu_param): Add GNU_PARAM_TYPE parameter and adjust.
(gnat_to_gnu_subprog_type): Adjust call to gnat_to_gnu_param.
* gcc-interface/trans.c (gnat_to_gnu) : Add
comment.
(process_freeze_entity): Remove obsolete code.
(process_type): Minor tweaks.

-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237119)
+++ gcc-interface/decl.c	(working copy)
@@ -496,8 +496,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   be a FIELD_DECL.  Likewise for discriminants.  If the entity is a
 	   non-girder discriminant (in the case of derived untagged record
 	   types), return the stored discriminant it renames.  */
-	else if (Present (Original_Record_Component (gnat_entity))
-		 && Original_Record_Component (gnat_entity) != gnat_entity)
+	if (Present (Original_Record_Component (gnat_entity))
+	&& Original_Record_Component (gnat_entity) != gnat_entity)
 	  {
 	gnu_decl
 	  = gnat_to_gnu_entity (Original_Record_Component (gnat_entity),
@@ -509,7 +509,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* Otherwise, if we are not defining this and we have no GCC type
 	   for the containing record, make one for it.  Then we should
 	   have made our own equivalent.  */
-	else if (!definition && !present_gnu_tree (gnat_record))
+	if (!definition && !present_gnu_tree (gnat_record))
 	  {
 	/* ??? If this is in a record whose scope is a protected
 	   type and we have an Original_Record_Component, use it.
@@ -523,21 +523,21 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		  = gnat_to_gnu_entity (Original_Record_Component
 	(gnat_entity),
 	gnu_expr, false);
-		saved = true;
-		break;
+	  }
+	else
+	  {
+		gnat_to_gnu_entity (Scope (gnat_entity), NULL_TREE, false);
+		gnu_decl = get_gnu_tree (gnat_entity);
 	  }
 
-	gnat_to_gnu_entity (Scope (gnat_entity), NULL_TREE, false);
-	gnu_decl = get_gnu_tree (gnat_entity);
 	saved = true;
 	break;
 	  }
 
-	else
-	  /* Here we have no GCC type and this is a reference rather than a
-	 definition.  This should never happen.  Most likely the cause is
-	 reference before declaration in the GNAT tree for gnat_entity.  */
-	  gcc_unreachable ();
+	/* Here we have no GCC type and this is a reference rather than a
+	   definition.  This should never happen.  Most likely the cause is
+	   reference before declaration in the GNAT tree for gnat_entity.  */
+	gcc_unreachable ();
   }
 
 case E_Constant:
@@ -1064,6 +1064,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		gcc_assert (ralign >= align);
 		  }
 
+		/* The expression might not be a DECL so save it manually.  */
 		save_gnu_tree (gnat_entity, gnu_decl, true);
 		saved = true;
 		annotate_object (gnat_entity, gnu_type, NULL_TREE, false);
@@ -2828,8 +2829,9 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
   NULL_TREE, false);
 	  this_made_decl = true;
 	  gnu_type = TREE_TYPE (gnu_decl);
-
 	  save_gnu_tree (gnat_entity, NULL_TREE, false);
+	  save_gnu_tree (gnat_entity, gnu_decl, false);
+	  saved = true;
 
 	  gnu_inner = gnu_type;
 	  while (TREE_CODE (gnu_inner) == RECORD_TYPE
@@ -4356,7 +4358,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	gnu_decl = TYPE_STUB_DECL (gnu_type);
 	if (Has_Completion_In_Body (gnat_entity))
 	  DECL_TAFT_TYPE_P (gnu_decl) = 1;
-	save_gnu_tree (full_view, gnu_decl, 0);
+	save_gnu_tree (full_view, gnu_decl, false);
 	  }
   }
   break;
@@ -4455,6 +4457,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
  handling alignment and possible padding.  */
   if (is_type && (!gnu_decl || this_made_decl))
 {
+  gcc_assert (!TYPE_IS_DUMMY_P (gnu_type));
+
   /* Process the attributes, if not already done.  Note that the type is
 	 already defined so we cannot pass true for IN_PLACE here.  */
   process_attributes (&gnu_type, &attr_list, false, gnat_entity);
@@ -4703,21 +4707,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	}
 	}
 
-  if (!gnu_decl)
-	gnu_decl = create_type_decl (gnu_entity_name, gnu_type,
- artificial_p, debug_info_p,
- gnat_entity);
-  else
-	{
-	  TREE_TYPE (gnu_decl) = gnu_type;
-	  TYPE_STUB_DECL (gnu_type) = gnu_decl;
-	}
-}
-
-  if (is_type && !TYPE_IS_DUMMY_P (TREE_TYPE (gnu_decl)))
-{
-  gnu_type = TREE_TYPE (gnu_decl);
-
   /* If this is a derived type, relate its alia

Re: [PING] [PATCH] Make basic asm implicitly clobber memory

2016-06-06 Thread Bernd Schmidt

On 06/05/2016 06:02 PM, Bernd Edlinger wrote:
> I think we all agreed on the general direction of this patch.
>
> The patch is basically unchanged from previous version,
> except one line in doc/extend.texi has been updated.
>
> So I would like to ask if it is OK for trunk.

Are there any users of extract_asm_operands left outside recog.c? If 
not, it should be made static. Otherwise ok.

Bernd



[Ada] Remove deferred freezing of subprograms in Freeze_Entity

2016-06-06 Thread Eric Botcazou
This removes the specific mechanism present in Freeze_Entity to defer the
freezing of functions returning an incomplete type coming from a limited
context.  It was invented to cope with the old elaboration model for
subprograms in gigi, which didn't really implement AI05-151 and AI05-019.
The new elaboration model implements both AIs so the mechanism is obsolete.

No functional change.  Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* einfo.ads (Returns_Limited_View): Remove.
(Set_Returns_Limited_View ): Likewise.
* einfo.adb (Returns_Limited_View): Likewise.
(Set_Returns_Limited_View ): Likewise.
* freeze.adb (Late_Freeze_Subprogram): Remove.
(Freeze_Entity): Do not defer the freezing of functions returning an
incomplete type coming from a limited context.

-- 
Eric Botcazou
Index: einfo.ads
===
--- einfo.ads	(revision 237088)
+++ einfo.ads	(working copy)
@@ -3973,12 +3973,6 @@ package Einfo is
 --   by reference, either because its return type is a by-reference-type
 --   or because the function explicitly uses the secondary stack.
 
---Returns_Limited_View (Flag134)
---   Defined in function entities. Set if the return type of the function
---   at the point of definition is a limited view. Used to handle the late
---   freezing of the function when it is called in the current semantic
---   unit while it is still unfrozen.
-
 --Reverse_Bit_Order (Flag164) [base type only]
 --   Defined in all record type entities. Set if entity has a Bit_Order
 --   aspect (set by an aspect clause or attribute definition clause) that
@@ -5972,7 +5966,6 @@ package Einfo is
--Requires_Overriding (Flag213)  (non-generic case only)
--Return_Present  (Flag54)
--Returns_By_Ref  (Flag90)
-   --Returns_Limited_View(Flag134)  (non-generic case only)
--Rewritten_For_C (Flag287)  (generate C code only)
--Sec_Stack_Needed_For_Return (Flag167)
--SPARK_Pragma_Inherited  (Flag265)
@@ -7174,7 +7167,6 @@ package Einfo is
function Return_Applies_To   (Id : E) return N;
function Return_Present  (Id : E) return B;
function Returns_By_Ref  (Id : E) return B;
-   function Returns_Limited_View(Id : E) return B;
function Reverse_Bit_Order   (Id : E) return B;
function Reverse_Storage_Order   (Id : E) return B;
function Rewritten_For_C (Id : E) return B;
@@ -7848,7 +7840,6 @@ package Einfo is
procedure Set_Return_Applies_To   (Id : E; V : N);
procedure Set_Return_Present  (Id : E; V : B := True);
procedure Set_Returns_By_Ref  (Id : E; V : B := True);
-   procedure Set_Returns_Limited_View(Id : E; V : B := True);
procedure Set_Reverse_Bit_Order   (Id : E; V : B := True);
procedure Set_Reverse_Storage_Order   (Id : E; V : B := True);
procedure Set_Rewritten_For_C (Id : E; V : B := True);
@@ -8678,7 +8669,6 @@ package Einfo is
pragma Inline (Return_Applies_To);
pragma Inline (Return_Present);
pragma Inline (Returns_By_Ref);
-   pragma Inline (Returns_Limited_View);
pragma Inline (Reverse_Bit_Order);
pragma Inline (Reverse_Storage_Order);
pragma Inline (Rewritten_For_C);
@@ -9143,7 +9133,6 @@ package Einfo is
pragma Inline (Set_Return_Applies_To);
pragma Inline (Set_Return_Present);
pragma Inline (Set_Returns_By_Ref);
-   pragma Inline (Set_Returns_Limited_View);
pragma Inline (Set_Reverse_Bit_Order);
pragma Inline (Set_Reverse_Storage_Order);
pragma Inline (Set_Rewritten_For_C);
Index: einfo.adb
===
--- einfo.adb	(revision 237088)
+++ einfo.adb	(working copy)
@@ -432,7 +432,6 @@ package body Einfo is
--No_Pool_AssignedFlag131
--Is_Default_Init_Cond_Procedure  Flag132
--Has_Inherited_Default_Init_Cond Flag133
-   --Returns_Limited_ViewFlag134
--Has_Aliased_Components  Flag135
--No_Strict_Aliasing  Flag136
--Is_Machine_Code_Subprogram  Flag137
@@ -3065,12 +3064,6 @@ package body Einfo is
   return Flag90 (Id);
end Returns_By_Ref;
 
-   function Returns_Limited_View (Id : E) return B is
-   begin
-  pragma Assert (Ekind (Id) = E_Function);
-  return Flag134 (Id);
-   end Returns_Limited_View;
-
function Reverse_Bit_Order (Id : E) return B is
begin
   pragma Assert (Is_Record_Type (Id));
@@ -6142,12 +6135,6 @@ package body Einfo is
   Set_Flag90 (Id, V);
end Set_Returns_By_Ref;
 
-   procedure 

[Ada] Follow-up work for AI05-0151 (incomplete types in profiles)

2016-06-06 Thread Eric Botcazou
This is a follow up to the initial implementation done in
  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01710.html

It completes the transition to the new elaboration model, makes it a bit more 
efficient by reusing already elaborated entities as much as possible and fixes 
an issue with function symbols on Windows.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* gcc-interface/gigi.h (finish_subprog_decl): Add ASM_NAME parameter.
* gcc-interface/decl.c (gnu_ext_name_for_subprog): New function.
(gnat_to_gnu_entity) : Do not check compatibility
of profiles for builtins here...  Call gnu_ext_name_for_subprog.
Also update profiles if pointers to limited_with'ed types are
updated.
(gnat_to_gnu_param): Restore the correct source location information
for vector ABI warnings.
(associate_subprog_with_dummy_type): Add comment about AI05-019.
Set TYPE_DUMMY_IN_PROFILE_P flag unconditionally.
(update_profile): Deal with builtin declarations.
Call gnu_ext_name_for_subprog.  Adjust call to finish_subprog_decl.
(update_profiles_with): Add comment.
(gnat_to_gnu_subprog_type): Reuse the return type if it is complete.
Likewise for parameter declarations in most cases.  Do not change
the return type for the CICO mechanism if the profile is incomplete.
...but here instead.  Always reset the slot for the parameters.
* gcc-interface/utils.c (create_subprog_decl): Call
gnu_ext_name_for_subprog.  Do not set the assembler name here but...
(finish_subprog_decl): ...here instead.  Add ASM_NAME parameter.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237088)
+++ gcc-interface/decl.c	(working copy)
@@ -204,6 +204,7 @@ static tree elaborate_reference (tree, E
 static tree gnat_to_gnu_component_type (Entity_Id, bool, bool);
 static tree gnat_to_gnu_subprog_type (Entity_Id, bool, bool, tree *);
 static tree gnat_to_gnu_field (Entity_Id, tree, int, bool, bool);
+static tree gnu_ext_name_for_subprog (Entity_Id, tree);
 static tree change_qualified_type (tree, int);
 static bool same_discriminant_p (Entity_Id, Entity_Id);
 static bool array_type_has_nonaliased_component (tree, Entity_Id);
@@ -4109,7 +4110,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 case E_Function:
 case E_Procedure:
   {
-	tree gnu_ext_name = create_concat_name (gnat_entity, NULL);
+	tree gnu_ext_name
+	  = gnu_ext_name_for_subprog (gnat_entity, gnu_entity_name);
 	enum inline_status_t inline_status
 	  = Has_Pragma_No_Inline (gnat_entity)
 	? is_suppressed
@@ -4191,48 +4193,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	gnu_type
 	  = gnat_to_gnu_subprog_type (gnat_entity, definition, debug_info_p,
   &gnu_param_list);
-
-	/* If this subprogram is expectedly bound to a GCC builtin, fetch the
-	   corresponding DECL node and check the parameter association.  */
-	if (Convention (gnat_entity) == Convention_Intrinsic
-	&& Present (Interface_Name (gnat_entity)))
-	  {
-	tree gnu_builtin_decl = builtin_decl_for (gnu_ext_name);
-
-	/* If we have a builtin DECL for that function, use it.  Check if
-	   the profiles are compatible and warn if they are not.  Note that
-	   the checker is expected to post diagnostics in this case.  */
-	if (gnu_builtin_decl)
-	  {
-		intrin_binding_t inb
-		  = { gnat_entity, gnu_type, TREE_TYPE (gnu_builtin_decl) };
-
-		if (!intrin_profiles_compatible_p (&inb))
-		  post_error
-		("?profile of& doesn''t match the builtin it binds!",
-		 gnat_entity);
-
-		gnu_decl = gnu_builtin_decl;
-		gnu_type = TREE_TYPE (gnu_builtin_decl);
-		break;
-	  }
-
-	/* Inability to find the builtin DECL most often indicates a
-	   genuine mistake, but imports of unregistered intrinsics are
-	   sometimes issued on purpose to allow hooking in alternate
-	   bodies.  We post a warning conditioned on Wshadow in this case,
-	   to let developers be notified on demand without risking false
-	   positives with common default sets of options.  */
-	else if (warn_shadow)
-	  post_error ("?gcc intrinsic not found for&!", gnat_entity);
-	  }
-
-	/* If there was no specified Interface_Name and the external and
-	   internal names of the subprogram are the same, only use the
-	   internal name to allow disambiguation of nested subprograms.  */
-	if (No (Interface_Name (gnat_entity))
-	&& gnu_ext_name == gnu_entity_name)
-	  gnu_ext_name = NULL_TREE;
+	if (DECL_P (gnu_type))
+	  {
+	gnu_decl = gnu_type;
+	gnu_type = TREE_TYPE (gnu_decl);
+	break;
+	  }
 
 	/* Deal with platform-specific calling conventions.  */
 	if (Has_Stdcall_Convention (gnat_entity))
@@ -5008,6 +4974,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  {
 	update_pointer_to (TYPE_MAIN_VARIANT 

Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-06 Thread Jakub Jelinek
On Mon, Jun 06, 2016 at 10:03:12AM +0100, Alan Hayward wrote:
> 
> On 03/06/2016 18:45, "Jakub Jelinek"  wrote:
> 
> >On Thu, Jun 02, 2016 at 05:11:15PM +0100, Alan Hayward wrote:
> >>* gcc.dg/vect/vect-live-1.c: New test.
> >>* gcc.dg/vect/vect-live-2.c: New test.
> >>* gcc.dg/vect/vect-live-5.c: New test.
> >>* gcc.dg/vect/vect-live-slp-1.c: New test.
> >>* gcc.dg/vect/vect-live-slp-2.c: New test.
> >>* gcc.dg/vect/vect-live-slp-3.c: New test.
> >
> >These tests all fail for me on i686-linux.  The problem is
> >in the use of dg-options in gcc.dg/vect/, where it overrides all the
> >various
> >needed vectorization options that need to be enabled on various arches
> >(e.g. -msse2 on i686).
> >
> >Fixed thusly, tested on x86_64-linux and i686-linux, ok for trunk?
> 
> Thanks for fixing this. However, I think you had a copy/paste error.
> For vect-live-slp-1.c and vect-live-slp-3.c you used dg-options instead
> of dg-additional-options, which causes the tests to fail for me as
> UNRESOLVED.
> 
> Ok to commit?

Oops.  Sure, thanks.

> testsuite/
>   * gcc.dg/vect/vect-live-slp-1.c: Use additional-options.
>   * gcc.dg/vect/vect-live-slp-3.c: Likewise.

Jakub


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-06 Thread Alan Hayward

On 03/06/2016 18:45, "Jakub Jelinek"  wrote:

>On Thu, Jun 02, 2016 at 05:11:15PM +0100, Alan Hayward wrote:
>>  * gcc.dg/vect/vect-live-1.c: New test.
>>  * gcc.dg/vect/vect-live-2.c: New test.
>>  * gcc.dg/vect/vect-live-5.c: New test.
>>  * gcc.dg/vect/vect-live-slp-1.c: New test.
>>  * gcc.dg/vect/vect-live-slp-2.c: New test.
>>  * gcc.dg/vect/vect-live-slp-3.c: New test.
>
>These tests all fail for me on i686-linux.  The problem is
>in the use of dg-options in gcc.dg/vect/, where it overrides all the
>various
>needed vectorization options that need to be enabled on various arches
>(e.g. -msse2 on i686).
>
>Fixed thusly, tested on x86_64-linux and i686-linux, ok for trunk?

Thanks for fixing this. However, I think you had a copy/paste error.
For vect-live-slp-1.c and vect-live-slp-3.c you used dg-options instead
of dg-additional-options, which causes the tests to fail for me as
UNRESOLVED.

Ok to commit?

Alan.


testsuite/
* gcc.dg/vect/vect-live-slp-1.c: Use additional-options.
* gcc.dg/vect/vect-live-slp-3.c: Likewise.



diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
index 82cd8dcabed8f18408e0e5cf295c2f15e0621ae6..7fefbff17d8c34bdfb1e8d697d2b9a70a048ae16 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-fno-tree-scev-cprop" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */

 #include "tree-vect.h"

diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
index 9e4a59eca593289050086b4b0a1347be5d748723..aacf5cb98071f6fec1f4b522eeefeb6696787334 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* { dg-options "-fno-tree-scev-cprop" } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */

 #include "tree-vect.h"
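
For context, here is a minimal sketch of a test header written in the style this fix restores. It is a hypothetical file, not one of the vect-live tests touched above, and the function name last_stored is made up for illustration. The relevant point is that dg-additional-options appends to the flags the gcc.dg/vect harness already supplies (for example -msse2 on i686), whereas dg-options replaces them.

```c
/* Hypothetical test sketch; the two directives mirror the ones in the
   diff above, everything else is an illustrative assumption.  */
/* { dg-require-effective-target vect_int } */
/* { dg-additional-options "-fno-tree-scev-cprop" } */

#define N 32

/* Fill A with START, START + 1, ... and return the value written last.
   V is used after the loop, so it is "live after the loop".  */
int
last_stored (int *a, int start)
{
  int v = start;
  for (int i = 0; i < N; i++)
    {
      v = start + i;
      a[i] = v;
    }
  return v;
}
```

Outside the testsuite the directives are plain comments, so the file also compiles with an ordinary compiler invocation.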





Re: [PR71408] - Fix wrong code at -Os and above

2016-06-06 Thread Richard Biener
On Sun, Jun 5, 2016 at 12:54 PM, kugan
 wrote:
> Hi All,
>
> For the testcase in PR71408 zero_one_operation seems still broken. In
> handling NEGATE_EXPR, as part of undistribute_ops_list, in
> zero_one_operation, we are doing propagate_op_to_single_use (op, stmt, def);
>
> This results in:
> -  _14 = _5 * _12;
>_15 = (int) _11;
>_16 = ~_15;
>_17 = (unsigned int) _16;
> -  _18 = -_5;
> -  _19 = _17 * _18;
> -  _20 = _14 + _19;
> -  _24 = _5 & _20;
> +  _19 = _5 * _17;
> +  _35 = _19 + _12;
> +  _34 = _35 * _5;
> +  _20 = _34;
> +  _24 = _20 & _5;
>
> We should instead propagate (-1) as "op" is the one which gets factored out.
> With the attached patch we now have:
> -  _14 = _5 * _12;
>_15 = (int) _11;
>_16 = ~_15;
>_17 = (unsigned int) _16;
> -  _18 = -_5;
> -  _19 = _17 * _18;
> -  _20 = _14 + _19;
> -  _24 = _5 & _20;
> +  _32 = _17;
> +  _19 = -_32;
> +  _34 = _19 + _12;
> +  _33 = _34 * _5;
> +  _20 = _33;
> +  _24 = _20 & _5;
>
> Regression tested and bootstrapped on x86-64-linux-gnu with no new
> regression. Is this OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-06-05  Kugan Vivekanandarajah  
>
> PR middle-end/71408
> * tree-ssa-reassoc.c (zero_one_operation): Fix NEGATE_EXPR operand
> for
> propagate_op_to_single_use.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-06-05  Kugan Vivekanandarajah  
>
> PR middle-end/71408
> * gcc.dg/tree-ssa/pr71408.c: New test.
>
>
>
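
The two dumps quoted above can be replayed in plain C as a sanity check. This is only a sketch under the assumption that all values are 32-bit unsigned (so arithmetic wraps); the function names are made up, and x, a, b stand for _5, _12 and _17 in the dumps.

```c
#include <stdint.h>

/* Original expression: _20 = _5 * _12 + _17 * (-_5).  */
uint32_t
before_reassoc (uint32_t x, uint32_t a, uint32_t b)
{
  return x * a + b * (-x);
}

/* Dump with the patch: factoring x out of (-x) leaves -1 behind,
   so the other operand is negated: _33 = (-_17 + _12) * _5.  */
uint32_t
factored_fixed (uint32_t x, uint32_t a, uint32_t b)
{
  uint32_t neg_b = -b;
  return (neg_b + a) * x;
}

/* Dump without the patch: x itself is left in place of (-x),
   giving _34 = (_5 * _17 + _12) * _5.  */
uint32_t
factored_broken (uint32_t x, uint32_t a, uint32_t b)
{
  return (x * b + a) * x;
}
```

With x = 3, a = 10, b = 4 the original and the fixed form both give 18, while the broken form gives 66.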


[Diagnostic Patch] Clean-up diagnostic facilities in diagnostic.c

2016-06-06 Thread Paolo Carlini

Hi,

yesterday I had the idea of this small clean-up: move the work done by 
emit_diagnostic to a new non-variadic diagnostic_impl and use it to 
implement the former and all the various inform, warning, permerror, etc 
(lately we have the *_at_rich_loc variants too). Something similar can 
be done for inform_n, warning_n and error_n. There is the minor 
subtlety of the declarations decorated with ATTRIBUTE_GCC_DIAG when 
available to suppress the build-time warning: I think I did it the right 
way, I took inspiration from the declarations of diagnostic_set_info, 
etc, in diagnostic.h.
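
The general shape of the clean-up, sketched outside of GCC: one non-variadic worker takes a pointer to the caller's va_list, and each variadic entry point only does va_start, forwards, and va_end. All names below (report_impl, inform_sketch, warning_sketch, last_message) are hypothetical stand-ins, not the functions in diagnostic.c, and the format attribute is the plain GCC printf one rather than ATTRIBUTE_GCC_DIAG.

```c
#include <stdarg.h>
#include <stdio.h>

enum kind { KIND_NOTE, KIND_WARNING, KIND_ERROR };

/* Last formatted message, kept in a buffer so the sketch is easy to check.  */
char last_message[128];

#ifdef __GNUC__
/* A first-to-check index of 0 means "format string only", as for v* functions.  */
static int report_impl (enum kind, const char *, va_list *)
  __attribute__ ((format (printf, 2, 0)));
#endif

/* Non-variadic worker shared by all entry points.  */
static int
report_impl (enum kind kind, const char *fmt, va_list *ap)
{
  static const char *const prefix[] = { "note: ", "warning: ", "error: " };
  int n = snprintf (last_message, sizeof last_message, "%s", prefix[kind]);
  vsnprintf (last_message + n, sizeof last_message - (size_t) n, fmt, *ap);
  return kind != KIND_NOTE;
}

/* Thin variadic wrappers: start the list, forward, end the list.  */
void
inform_sketch (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  report_impl (KIND_NOTE, fmt, &ap);
  va_end (ap);
}

int
warning_sketch (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int ret = report_impl (KIND_WARNING, fmt, &ap);
  va_end (ap);
  return ret;
}
```

Passing the va_list by pointer keeps va_start and va_end in the same function, which is what the C standard requires, while the formatting logic lives in one place.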


Tested x86_64-linux.

Thanks,
Paolo.


2016-06-06  Paolo Carlini  

* diagnostic.c (diagnostic_impl, diagnostic_n_impl): New.
(inform, inform_at_rich_loc, inform_n, warning, warning_at,
warning_at_rich_loc, warning_n, pedwarn, permerror,
permerror_at_rich_loc, error, error_n, error_at, error_at_rich_loc,
sorry, fatal_error, internal_error, internal_error_no_backtrace):
Use the above.
Index: diagnostic.c
===
--- diagnostic.c(revision 237117)
+++ diagnostic.c(working copy)
@@ -46,6 +46,13 @@ along with GCC; see the file COPYING3.  If not see
 #define permissive_error_option(DC) ((DC)->opt_permissive)
 
 /* Prototypes.  */
+#ifdef ATTRIBUTE_GCC_DIAG
+static bool diagnostic_impl (rich_location *, int, const char *,
+va_list *, diagnostic_t) ATTRIBUTE_GCC_DIAG(3,0);
+static bool diagnostic_n_impl (location_t, int, int, const char *,
+  const char *, va_list *,
+  diagnostic_t) ATTRIBUTE_GCC_DIAG(5,0);
+#endif
 static void error_recursion (diagnostic_context *) ATTRIBUTE_NORETURN;
 
 static void real_abort (void) ATTRIBUTE_NORETURN;
@@ -913,29 +920,51 @@ diagnostic_append_note (diagnostic_context *contex
   va_end (ap);
 }
 
-bool
-emit_diagnostic (diagnostic_t kind, location_t location, int opt,
-const char *gmsgid, ...)
+static bool
+diagnostic_impl (rich_location *richloc, int opt,
+const char *gmsgid,
+va_list *ap, diagnostic_t kind)
 {
   diagnostic_info diagnostic;
-  va_list ap;
-  bool ret;
-  rich_location richloc (line_table, location);
-
-  va_start (ap, gmsgid);
   if (kind == DK_PERMERROR)
 {
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+  diagnostic_set_info (&diagnostic, gmsgid, ap, richloc,
   permissive_error_kind (global_dc));
   diagnostic.option_index = permissive_error_option (global_dc);
 }
-  else {
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, kind);
+  else
+{
+  diagnostic_set_info (&diagnostic, gmsgid, ap, richloc, kind);
   if (kind == DK_WARNING || kind == DK_PEDWARN)
diagnostic.option_index = opt;
-  }
+}
+  return report_diagnostic (&diagnostic);
+}
 
-  ret = report_diagnostic (&diagnostic);
+static bool
+diagnostic_n_impl (location_t location, int opt, int n,
+  const char *singular_gmsgid,
+  const char *plural_gmsgid,
+  va_list *ap, diagnostic_t kind)
+{
+  diagnostic_info diagnostic;
+  rich_location richloc (line_table, location);
+  diagnostic_set_info_translated (&diagnostic,
+  ngettext (singular_gmsgid, plural_gmsgid, n),
+  ap, &richloc, kind);
+  if (kind == DK_WARNING)
+diagnostic.option_index = opt;
+  return report_diagnostic (&diagnostic);
+}
+
+bool
+emit_diagnostic (diagnostic_t kind, location_t location, int opt,
+const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  rich_location richloc (line_table, location);
+  bool ret = diagnostic_impl (&richloc, opt, gmsgid, &ap, kind);
   va_end (ap);
   return ret;
 }
@@ -945,13 +974,10 @@ diagnostic_append_note (diagnostic_context *contex
 void
 inform (location_t location, const char *gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va_list ap;
+  va_start (ap, gmsgid);
   rich_location richloc (line_table, location);
-
-  va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_NOTE);
-  report_diagnostic (&diagnostic);
+  diagnostic_impl (&richloc, -1, gmsgid, &ap, DK_NOTE);
   va_end (ap);
 }
 
@@ -959,12 +985,9 @@ inform (location_t location, const char *gmsgid, .
 void
 inform_at_rich_loc (rich_location *richloc, const char *gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va_list ap;
-
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc, DK_NOTE);
-  report_diagnostic (&diagnostic);
+  diagnostic_impl (richloc, -1, gmsgid, &ap, DK_NOTE);
   va_end (ap);
 }
 
@@ -974,15 +997,10 @@ void
 inform_n (location_t location, int n, const char *singular_gmsgid,
   const char *plural_gmsgid, ...)
 {
-  diagnostic_info diagnostic;
   va

Re: [PR71281] ICE on gcc trunk on knl, wsm, ivb and bdw targets

2016-06-06 Thread Richard Biener
On Sat, Jun 4, 2016 at 4:25 AM, kugan  wrote:
> Hi,
>
> PR71281 happens when we use factored out negate stmt in other
> reassociations. Since we don't set the uid for this stmt, we hit the
> gcc_assert (in reassoc_stmt_dominates_stmt_p) which checks for uid being
> set. Attached patch fixes this.
>
> Regression tested on x86-64-linux-gnu with no new regression. Is this OK for
> trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-06-04  Kugan Vivekanandarajah  
>
> PR middle-end/71281
> * tree-ssa-reassoc.c (reassociate_bb): Set uid for negate stmt.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-06-04  Kugan Vivekanandarajah  
>
> PR middle-end/71281
> * g++.dg/torture/pr71281.C: New test.


[Ada] Fix minor glitches related to subprogram declarations

2016-06-06 Thread Eric Botcazou
This patch fixes a few minor glitches that yield small irregularities in the
expanded code handed down to gigi, for example the declaration of subprograms
before that of the type of their parameters, types that are neither regular
nor Itypes, or return types with circularities.  No functional change.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-06  Eric Botcazou  

* exp_ch9.adb (Expand_N_Protected_Type_Declaration): Insert the
declaration of the corresponding record type before that of the
unprotected version of the subprograms that operate on it.
(Expand_Access_Protected_Subprogram_Type): Declare the
Equivalent_Type just before the original type.
* sem_ch3.adb (Handle_Late_Controlled_Primitive): Point the current
declaration to the newly created declaration for the primitive.
(Analyze_Subtype_Declaration): Remove obsolete code forcing the
freezing of the subtype before its declaration.
(Replace_Anonymous_Access_To_Protected_Subprogram): Insert the new
declaration in the nearest enclosing scope for formal parameters too.
(Build_Derived_Access_Type): Restore the status of the created Itype
after it is erased by Copy_Node.
* sem_ch6.adb (Exchange_Limited_Views): Remove guard on entry.
(Analyze_Subprogram_Body_Helper): Call Exchange_Limited_Views only if
the specification is present.
Move around the code changing the designated view of the return type
and save the original view.  Restore it on exit.
* sem_ch13.adb (Build_Predicate_Function_Declaration): Always insert
the declaration right after that of the type.

-- 
Eric Botcazou
Index: exp_ch9.adb
===
--- exp_ch9.adb	(revision 237088)
+++ exp_ch9.adb	(working copy)
@@ -6257,7 +6257,10 @@ package body Exp_Ch9 is
   Defining_Identifier => D_T2,
   Type_Definition => Def1);
 
-  Insert_After_And_Analyze (N, Decl1);
+  --  Declare the new types before the original one since the latter will
+  --  refer to them through the Equivalent_Type slot.
+
+  Insert_Before_And_Analyze (N, Decl1);
 
   --  Associate the access to subprogram with its original access to
   --  protected subprogram type. Needed by the backend to know that this
@@ -6292,7 +6295,7 @@ package body Exp_Ch9 is
   Component_List =>
 Make_Component_List (Loc, Component_Items => Comps)));
 
-  Insert_After_And_Analyze (Decl1, Decl2);
+  Insert_Before_And_Analyze (N, Decl2);
   Set_Equivalent_Type (T, E_T);
end Expand_Access_Protected_Subprogram_Type;
 
@@ -9316,6 +9319,9 @@ package body Exp_Ch9 is
 
   pragma Assert (Present (Pdef));
 
+  Insert_After (Current_Node, Rec_Decl);
+  Current_Node := Rec_Decl;
+
   --  Add private field components
 
   if Present (Private_Declarations (Pdef)) then
@@ -9576,9 +9582,6 @@ package body Exp_Ch9 is
  Append_To (Cdecls, Object_Comp);
   end if;
 
-  Insert_After (Current_Node, Rec_Decl);
-  Current_Node := Rec_Decl;
-
   --  Analyze the record declaration immediately after construction,
   --  because the initialization procedure is needed for single object
   --  declarations before the next entity is analyzed (the freeze call
Index: sem_ch13.adb
===
--- sem_ch13.adb	(revision 237088)
+++ sem_ch13.adb	(working copy)
@@ -9386,11 +9386,7 @@ package body Sem_Ch13 is
   Set_Is_Predicate_Function (SId);
   Set_Predicate_Function (Typ, SId);
 
-  if Comes_From_Source (Typ) then
- Insert_After (Parent (Typ), FDecl);
-  else
- Insert_After (Parent (Base_Type (Typ)), FDecl);
-  end if;
+  Insert_After (Parent (Typ), FDecl);
 
   Analyze (FDecl);
 
Index: sem_ch3.adb
===
--- sem_ch3.adb	(revision 237088)
+++ sem_ch3.adb	(working copy)
@@ -2168,7 +2168,7 @@ package body Sem_Ch3 is
   --  Determine whether Body_Decl denotes the body of a late controlled
   --  primitive (either Initialize, Adjust or Finalize). If this is the
   --  case, add a proper spec if the body lacks one. The spec is inserted
-  --  before Body_Decl and immedately analyzed.
+  --  before Body_Decl and immediately analyzed.
 
   procedure Remove_Visible_Refinements (Spec_Id : Entity_Id);
   --  Spec_Id is the entity of a package that may define abstract states.
@@ -2269,8 +2269,12 @@ package body Sem_Ch3 is
 
  Set_Null_Present (Spec, False);
 
- Insert_Before_And_Analyze (Body_Decl,
-   Make_Subprogram_Declaration (Loc, Specification => Spec));
+ --  Ensure that the freeze node is inserted after the declaration of
+ --  the primitive since its expansion will freeze the primitive.
+
+ 

Unreviewed patches

2016-06-06 Thread Rainer Orth
The following patches have remained unreviewed for a week:

[gotools, libcc1] Update copyright dates
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html

Richard already approved the update-copyright.py changes, but the actual
effects on gotools and libcc1 require either maintainer or release
manager approval, I believe.

[build] Handle gas/gld --compress-debug-sections=type
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02325.html

This one needs a build maintainer.

Thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PING] [PATCH] Make basic asm implicitly clobber memory

2016-06-06 Thread Richard Biener
On Sun, 5 Jun 2016, Bernd Edlinger wrote:

> Ping...
> 
> I think we all agreed on the general direction of this patch.
> 
> The patch is basically unchanged from previous version,
> except one line in doc/extend.texi has been updated.
> 
> So I would like to ask if it is OK for trunk.

The gimple_asm_clobbers_memory_p change is ok.  I don't feel competent
to review the rest but agreed to the outcome.

Thanks,
Richard.


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-06 Thread Richard Biener
On Fri, 3 Jun 2016, Jakub Jelinek wrote:

> On Tue, Jan 12, 2016 at 05:21:37PM +0300, Ilya Enkovich wrote:
> > > --- gcc/tree-vect-slp.c.jj  2016-01-08 21:45:57.0 +0100
> > > +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100
> > > @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_
> > >   gimple *init_stmt;
> > >   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
> > > {
> > > - gcc_assert (fold_convertible_p (TREE_TYPE 
> > > (vector_type),
> > > - op));
> > > + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
> > >   init_stmt = gimple_build_assign (new_temp, 
> > > NOP_EXPR, op);
> > 
> > In vect_init_vector we had to introduce COND_EXPR to choose between 0 and 
> > -1 for
> > boolean vectors.  Shouldn't we do similar in SLP?
> 
> Apparently the answer to this is YES.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2?
> 
> 2016-06-03  Jakub Jelinek  
> 
>   PR tree-optimization/71259
>   * tree-vect-slp.c (vect_get_constant_vectors): For
>   VECTOR_BOOLEAN_TYPE_P, return all ones constant instead of
>   one for constant op, and use COND_EXPR for non-constant.
> 
>   * gcc.dg/vect/pr71259.c: New test.
> 
> --- gcc/tree-vect-slp.c.jj2016-05-24 10:56:02.0 +0200
> +++ gcc/tree-vect-slp.c   2016-06-03 17:01:12.740955935 +0200
> @@ -3056,7 +3056,7 @@ vect_get_constant_vectors (tree op, slp_
> if (integer_zerop (op))
>   op = build_int_cst (TREE_TYPE (vector_type), 0);
> else if (integer_onep (op))
> - op = build_int_cst (TREE_TYPE (vector_type), 1);
> + op = build_all_ones_cst (TREE_TYPE (vector_type));
> else
>   gcc_unreachable ();
>   }
> @@ -3071,8 +3071,14 @@ vect_get_constant_vectors (tree op, slp_
> gimple *init_stmt;
> if (VECTOR_BOOLEAN_TYPE_P (vector_type))
>   {
> +   tree true_val
> + = build_all_ones_cst (TREE_TYPE (vector_type));
> +   tree false_val
> + = build_zero_cst (TREE_TYPE (vector_type));
> gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
> -   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, op);
> +   init_stmt = gimple_build_assign (new_temp, COND_EXPR,
> +op, true_val,
> +false_val);

So this ends up generating { a ? -1 : 0, b ? -1 : 0, ... }.  That
might be less optimal than doing { a, b, ... } ? { -1, -1 ... } : { 0, 0, 
.. }
though I'm not sure we can easily construct a "proper" vector boolean
from _Bool values either.

Thus the patch is ok.
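
A scalar sketch of why the "true" element has to be all ones rather than 1. It assumes the boolean vector is later consumed as a bitwise mask, which is an assumption of this example and not something the patch itself states; lane_mask and select_lane are made-up helper names modelling one 32-bit lane.

```c
#include <stdint.h>

/* Per-lane equivalent of "a ? -1 : 0": all bits set for true.  */
uint32_t
lane_mask (int cond)
{
  return cond ? UINT32_MAX : 0u;
}

/* Bitwise select as a masked blend would do it for one lane.  */
uint32_t
select_lane (uint32_t mask, uint32_t if_true, uint32_t if_false)
{
  return (mask & if_true) | (~mask & if_false);
}
```

With an all-ones mask, select_lane (lane_mask (1), 42, 7) yields 42. With a mask of just 1 the same call yields 6, a mix of bits from both inputs.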

Thanks,
Richard.

>   }
> else
>   {
> --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj2016-06-03 17:05:37.693475438 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr71259.c   2016-06-03 17:05:32.418544731 
> +0200
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/71259 */
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +long a, b[1][44][2];
> +long long c[44][17][2];
> +
> +int
> +main ()
> +{
> +  int i, j, k;
> +  check_vect ();
> +  asm volatile ("" : : : "memory");
> +  for (i = 0; i < 44; i++)
> +for (j = 0; j < 17; j++)
> +  for (k = 0; k < 2; k++)
> + c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
> 5105075050047261684;
> +  asm volatile ("" : : : "memory");
> +  for (i = 0; i < 44; i++) 
> +for (j = 0; j < 17; j++)
> +  for (k = 0; k < 2; k++)
> + if (c[i][j][k] != -5105075050047261684)
> +   __builtin_abort ();
> +  return 0;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: loop-ch tweek

2016-06-06 Thread Richard Biener
On Sun, 5 Jun 2016, Jan Hubicka wrote:

> Hi,
> both loop-ch and loop-ivcanon want to throttle down the heuristics on paths
> containing a call. Testing for the presence of GIMPLE_CALL is wrong for internal
> calls and cheap builtins that are expanded inline.
> 
> Bootstrapped/regtested x86_64-linux, OK?

First of all the name is bad - I'd say gimple_inexpensive_call_p ()
is better.  More comments below.

> Honza
> 
>   * gimple.c: Include builtins.h.
>   (gimple_real_call_p): New function.
>   * gimple.h (gimple_real_call_p): Declare.
>   * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Use it.
>   * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Likewise.
> Index: gimple.c
> ===
> --- gimple.c  (revision 237101)
> +++ gimple.c  (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
>  #include "gimple-walk.h"
>  #include "gimplify.h"
>  #include "target.h"
> +#include "builtins.h"
>  
>  
>  /* All the tuples have their operand vector (if present) at the very bottom
> @@ -3018,3 +3019,20 @@ maybe_remove_unused_call_args (struct fu
>        update_stmt_fn (fn, stmt);
>      }
>  }
> +
> +/* Return true if STMT will likely expand to a real call statement.  */
> +
> +bool
> +gimple_real_call_p (gimple *stmt)
> +{
> +  if (gimple_code (stmt) != GIMPLE_CALL)
> +    return false;
> +  if (gimple_call_internal_p (stmt))
> +    return false;
> +  tree decl = gimple_call_fndecl (stmt);
> +  if (decl && DECL_IS_BUILTIN (decl)
> +      && (is_simple_builtin (decl)
> +          || is_inexpensive_builtin (decl)))

is_inexpensive_builtin includes is_simple_builtin handling and also
tests decl && DECL_IS_BUILTIN properly already.

> +    return false;
> +  return true;
> +}
> Index: gimple.h
> ===
> --- gimple.h  (revision 237101)
> +++ gimple.h  (working copy)
> @@ -1525,6 +1525,7 @@ extern void preprocess_case_label_vec_fo
>  extern void gimple_seq_set_location (gimple_seq, location_t);
>  extern void gimple_seq_discard (gimple_seq);
>  extern void maybe_remove_unused_call_args (struct function *, gimple *);
> +extern bool gimple_real_call_p (gimple *);
>  
>  /* Formal (expression) temporary table handling: multiple occurrences of
>     the same scalar expression are evaluated into the same temporary.  */
> Index: tree-ssa-loop-ch.c
> ===
> --- tree-ssa-loop-ch.c(revision 237101)
> +++ tree-ssa-loop-ch.c(working copy)
> @@ -118,7 +118,7 @@ should_duplicate_loop_header_p (basic_bl
>        if (is_gimple_debug (last))
>          continue;
>  
> -      if (is_gimple_call (last))
> +      if (gimple_real_call_p (last))
>          {
>            if (dump_file && (dump_flags & TDF_DETAILS))
>              fprintf (dump_file,
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c   (revision 237101)
> +++ tree-ssa-loop-ivcanon.c   (working copy)
> @@ -339,15 +339,11 @@ tree_estimate_loop_size (struct loop *lo
>        for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>          {
>            gimple *stmt = gsi_stmt (gsi);
> -          if (gimple_code (stmt) == GIMPLE_CALL)
> +          if (gimple_code (stmt) == GIMPLE_CALL
> +              && gimple_real_call_p (stmt))

gimple_code (stmt) == GIMPLE_CALL is redundant then.  I'd prefer to
make gimple_inexpensive_call_p take a gcall * argument and do the
test at the callers though.
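
To illustrate the shape I have in mind, here is a minimal standalone
sketch.  It does not use GCC's real types: every mock_* name below is a
hypothetical stand-in (mock_stmt for gcall, mock_fndecl for the fndecl
tree), and I am assuming the renamed predicate flips sense, so it returns
true for cheap calls rather than for real ones.  The builtin helper does
the NULL and builtin-ness checks itself, so callers need not repeat them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC internals; not the real data structures.  */
enum mock_code { MOCK_ASSIGN, MOCK_CALL };

struct mock_fndecl
{
  bool builtin_p;      /* Models DECL_IS_BUILTIN.  */
  bool inexpensive_p;  /* Models a builtin that is expanded inline.  */
};

struct mock_stmt
{
  enum mock_code code;
  bool internal_p;                   /* Models gimple_call_internal_p.  */
  const struct mock_fndecl *fndecl;  /* Models gimple_call_fndecl; may be NULL.  */
};

/* Models is_inexpensive_builtin: it already rejects a NULL or
   non-builtin DECL, so the caller does not test those again.  */
bool
mock_is_inexpensive_builtin (const struct mock_fndecl *decl)
{
  return decl != NULL && decl->builtin_p && decl->inexpensive_p;
}

/* Return true if CALL is unlikely to expand to a real call.
   The caller must already have established that CALL is a call.  */
bool
mock_inexpensive_call_p (const struct mock_stmt *call)
{
  assert (call != NULL && call->code == MOCK_CALL);
  if (call->internal_p)
    return true;
  return mock_is_inexpensive_builtin (call->fndecl);
}

/* Caller-side pattern: check the statement kind first, then ask
   whether the call is a real (expensive) one.  */
bool
mock_blocks_header_copy_p (const struct mock_stmt *stmt)
{
  return stmt->code == MOCK_CALL && !mock_inexpensive_call_p (stmt);
}
```

With this split the statement-kind test appears exactly once, at the
caller, and the predicate itself only ever sees calls.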

>              {
>                int flags = gimple_call_flags (stmt);
> -              tree decl = gimple_call_fndecl (stmt);
> -
> -              if (decl && DECL_IS_BUILTIN (decl)
> -                  && is_inexpensive_builtin (decl))
> -                ;
> -              else if (flags & (ECF_PURE | ECF_CONST))
> +              if (flags & (ECF_PURE | ECF_CONST))
>                  size->num_pure_calls_on_hot_path++;
>                else
>                  size->num_non_pure_calls_on_hot_path++;
> 
>