date:20150803

Re: [PATCH 2/6] Introduce new edge_summary class and replace ipa_edge_args_sum.

2015-08-03 Thread Martin Liška

On 07/16/2015 04:05 PM, Martin Liška wrote:
 On 07/10/2015 03:31 PM, Martin Jambor wrote:
 Hi,

 thanks for working on this and sorry for a tad late review:

 On Thu, Jul 09, 2015 at 11:13:52AM +0200, Martin Liska wrote:
 gcc/ChangeLog:

 2015-07-03  Martin Liska  mli...@suse.cz

 * cgraph.c (symbol_table::create_edge): Introduce summary_uid
 for cgraph_edge.
 * cgraph.h (struct GTY): Likewise.

 struct GTY does not look right :-)

 * ipa-inline-analysis.c (estimate_function_body_sizes): Use
 new data structure.
 * ipa-profile.c (ipa_profile): Likewise.
 * ipa-prop.c (ipa_print_node_jump_functions):

   Likewise.

 (ipa_propagate_indirect_call_infos): Likewise.
 (ipa_free_edge_args_substructures): Likewise.
 (ipa_free_all_edge_args): Likewise.
 (ipa_edge_args_t::remove): Likewise.
 (ipa_edge_removal_hook): Likewise.
 (ipa_edge_args_t::duplicate): Likewise.
 (ipa_register_cgraph_hooks): Likewise.
 (ipa_unregister_cgraph_hooks): Likewise.
 * ipa-prop.h (ipa_check_create_edge_args): Likewise.
 (ipa_edge_args_info_available_for_edge_p): Likewise.

 Definition of ipa_edge_args_t is missing here.

 * symbol-summary.h (gt_ggc_mx): Indent properly.
 (gt_pch_nx): Likewise.
 (edge_summary): New class.
 ---
  gcc/cgraph.c  |   2 +
  gcc/cgraph.h  |   5 +-
  gcc/ipa-inline-analysis.c |   2 +-
  gcc/ipa-profile.c |   2 +-
  gcc/ipa-prop.c|  71 +++-
  gcc/ipa-prop.h|  44 ++
  gcc/symbol-summary.h  | 208 
 +-
  7 files changed, 252 insertions(+), 82 deletions(-)


 ...

 diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
 index e6725aa..f0af9b2 100644
 --- a/gcc/ipa-prop.h
 +++ b/gcc/ipa-prop.h
 @@ -493,13 +493,36 @@ public:
  extern ipa_node_params_t *ipa_node_params_sum;
  /* Vector of IPA-CP transformation data for each clone.  */
  extern GTY(()) vecipcp_transformation_summary, va_gc 
 *ipcp_transformations;
 -/* Vector where the parameter infos are actually stored. */
 -extern GTY(()) vecipa_edge_args, va_gc *ipa_edge_args_vector;
 +
 +/* Function summary for ipa_node_params.  */
 +class GTY((user)) ipa_edge_args_t: public edge_summary ipa_edge_args *
 +{
 +public:
 +  ipa_edge_args_t (symbol_table *symtab):
 +edge_summary ipa_edge_args* (symtab, true) { }
 +
 +  static ipa_edge_args_t *create_ggc (symbol_table *symtab)
 +  {

 Please move the body of this function to where the bodies of the rest
 of the member functions are.

 +ipa_edge_args_t *summary = new (ggc_cleared_alloc ipa_edge_args_t ())
 +  ipa_edge_args_t (symtab);
 +return summary;
 +  }
 +
 +  /* Hook that is called by summary when a node is duplicated.  */
 +  virtual void duplicate (cgraph_edge *edge,
 + cgraph_edge *edge2,
 + ipa_edge_args *data,
 + ipa_edge_args *data2);
 +
 +  virtual void remove (cgraph_edge *edge, ipa_edge_args *data);
 +};
 +
 +extern GTY(()) edge_summary ipa_edge_args * *ipa_edge_args_sum;
  
  /* Return the associated parameter/argument info corresponding to the given
 node/edge.  */
  #define IPA_NODE_REF(NODE) (ipa_node_params_sum-get (NODE))
 -#define IPA_EDGE_REF(EDGE) ((*ipa_edge_args_vector)[(EDGE)-uid])
 +#define IPA_EDGE_REF(EDGE) (ipa_edge_args_sum-get (EDGE))
  /* This macro checks validity of index returned by
 ipa_get_param_decl_index function.  */
  #define IS_VALID_JUMP_FUNC_INDEX(I) ((I) != -1)
 @@ -532,19 +555,8 @@ ipa_check_create_node_params (void)
  static inline void
  ipa_check_create_edge_args (void)
  {
 -  if (vec_safe_length (ipa_edge_args_vector)
 -  = (unsigned) symtab-edges_max_uid)
 -vec_safe_grow_cleared (ipa_edge_args_vector, symtab-edges_max_uid + 
 1);
 -}
 -
 -/* Returns true if the array of edge infos is large enough to accommodate 
 an
 -   info for EDGE.  The main purpose of this function is that debug dumping
 -   function can check info availability without causing reallocations.  */
 -
 -static inline bool
 -ipa_edge_args_info_available_for_edge_p (struct cgraph_edge *edge)
 -{
 -  return ((unsigned) edge-uid  vec_safe_length (ipa_edge_args_vector));
 +  if (ipa_edge_args_sum == NULL)
 +ipa_edge_args_sum = ipa_edge_args_t::create_ggc (symtab);
  }
  
  static inline ipcp_transformation_summary *
 diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
 index eefbfd9..5799443 100644
 --- a/gcc/symbol-summary.h
 +++ b/gcc/symbol-summary.h
 @@ -108,7 +108,7 @@ public:
/* Allocates new data that are stored within map.  */
T* allocate_new ()
{
 -return m_ggc ? new (ggc_alloc T ()) T() : new T () ;
 +return m_ggc ? new (ggc_alloc T ()) T () : new T () ;
}
  
/* Release an item that is stored within map.  */
 @@ -234,7 +234,7 @@ private:
  
  template typename T
  void
 -gt_ggc_mx(function_summaryT ** const summary)

Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-03 Thread Mikael Morin


Le 03/08/2015 14:36, Paul Richard Thomas a écrit :

Dear Mikael,

Thanks for your green light!

I have been mulling over the trans-decl part of the patch and having
been wondering if it is necessary.
You mean marking entities as public?  Or setting the hidden visibility 
attribute?  Or both?

I think both are necessary.


Without optimization, private
entities can be linked to. Given the discussion concerning the
combination of submodules and private entities, I wonder if this is
not sufficient? Within submodule scope, an advisory could be given for
undefined references to suggest recompiling the module without
optimization or making the entities public.


About recompiling without optimization:
If the module contains no code, I guess that would be OK.
But otherwise, it would be pretty bad.
And one would have to do the same for submodules of a submodule: the 
parent submodule would be compiled without optimization. :-(


About making the entities public:
I think the goal of submodules is providing a way to specify a 
(hopefully) stable interface free of any internal implementation details 
that users would start playing with if the opportunity was given to 
them.  Making all entities public would go against that.



I've been reading about the hidden visibility attribute since you 
submitted the 3/3 patch(es).  I think it's the right thing. :-)


Mikael

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Kyrill Tkachov



On 03/08/15 16:45, Uros Bizjak wrote:

On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
don't think this is a good idea. In the expander, there is already
quite some target-dependent code that goes great length to utilize sbb
insn as much as possible, before cmove is used.

IMO, as far as x86 is concerned, the best solution would be to revert
the change. ix86_expand_int_movcc already does some tricks from your
patch in a target-efficient way. Generic change that was introduced by
your patch now interferes with this expansion.


Well, technically the transformation was already there, it was just never
reached for an x86 compilation because noce_try_cmove was tried in front of
it
and used a target-specific expansion.
In any case, how's this proposal?
The transformation noce_try_store_flag_constants
   /* if (test) x = a; else x = b;
  =   x = (-(test != 0)  (b - a)) + a;  */

Is a catch-all-immediates transformation in noce_try_store_flag_constants.
What if we moved it to noce_try_cmove and performed it only if the
target-specific
conditional move expansion there failed?

That way we can try the x86_64-specific sequence first and still give the
opportunity
to noce_try_store_flag_constants to perform the transformations that can
benefit targets
that don't have highly specific conditional move expanders.

Yes, let's try this approach. As was found out, some targets (e.g.
x86) hide lots of different target-dependent expansion strategies into
movcc expander. Perhaps this fact should be documented in the comment
in the generic code?


Ok, I'll work on that approach and add a comment.

Thanks,
Kyrill




Uros.

Re: [PATCH] Build *-match.o as early as possible

2015-08-03 Thread Richard Biener

On August 3, 2015 5:03:05 PM GMT+02:00, Segher Boessenkool 
seg...@kernel.crashing.org wrote:
The two files *-match.o files always finish building last, so if we
start building them as soon as possible (instead of pretty late) the
total build time will be less on a parallel build.

Bootstrapped and tested on powerpc64-linux.  Is this okay for trunk?

OK.

Richard.


Segher


2014-080-3  Segher Boessenkool  seg...@kernel.crashing.org

   * Makefile.in (OBJS): Put gimple-match.o and generic-match.o first.

---
 gcc/Makefile.in | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index be259e8..683c42a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1162,10 +1162,12 @@ C_COMMON_OBJS = c-family/c-common.o
c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o
 
 # Language-independent object files.
-# We put the insn-*.o files first so that a parallel make will build
-# them sooner, because they are large and otherwise tend to be the
-# last objects to finish building.
+# We put the *-match.o and insn-*.o files first so that a parallel
make
+# will build them sooner, because they are large and otherwise tend to
be
+# the last objects to finish building.
 OBJS = \
+  gimple-match.o \
+  generic-match.o \
   insn-attrtab.o \
   insn-automata.o \
   insn-dfatab.o \
@@ -1260,8 +1262,6 @@ OBJS = \
   gimple-fold.o \
   gimple-laddress.o \
   gimple-low.o \
-  gimple-match.o \
-  generic-match.o \
   gimple-pretty-print.o \
   gimple-ssa-isolate-paths.o \
   gimple-ssa-strength-reduction.o \

Re: [BUILDROBOT] Go runtime: calling ‘__builtin_frame_address’ with a nonzero argument is unsafe

2015-08-03 Thread Jeff Law


On 08/03/2015 08:48 AM, Martin Sebor wrote:

On 08/03/2015 05:55 AM, Jan-Benedict Glaw wrote:

On Sun, 2015-08-02 17:15:27 -0600, Martin Sebor mse...@gmail.com wrote:

OK for the trunk.  Sorry for the delay.


Thank you. Committed in revision 226480.


...und breaks native builds. When doing builds using config-list.mk, I
first build a GCC for the build machine, then re-build a
cross-configured GCC with that.


I don't know if pragma GCC diagnostic is valid in Go (still waiting
for my build to finish to confirm) but disabling the warning in
cases where the calls are known to be safe should fix the compilation
error.
My understanding is the runtime is shared across gcc-go and golang.  So 
the push/pop diagnostics may not be appropriate.


Ian Taylor should have the final say about the best way forward.

Jeff

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Kyrill Tkachov



On 03/08/15 15:15, Uros Bizjak wrote:

On Mon, Aug 3, 2015 at 3:43 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

On 03/08/15 14:37, Uros Bizjak wrote:

On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com
wrote:

On 03/08/15 13:33, Uros Bizjak wrote:

Hello!


2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
   when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
   explicit.  Prefer to add the flag whenever possible.
   (noce_process_if_block): Try noce_try_store_flag_constants before
   noce_try_cmove.

2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * gcc.target/aarch64/csel_bfx_1.c: New test.
   * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]


Sorry for that :(
I could have sworn they passed for me during my run...


While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

  cmpl%edi, %esi
  sbbl%eax, %eax
  andl$-10, %eax
  addl$5, %eax
  ret

to:

  xorl%eax, %eax
  cmpl%esi, %edi
  setbe   %al
  negl%eax
  andl$10, %eax
  subl$5, %eax
  ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.

IMO, we can check if the target is able to generate insn that
implements conditional move between 0 and -1 for certain comparison
mode, like:

#(insn:TI 27 26 28 2 (parallel [
#(set (reg:SI 0 ax [orig:87 D.1845 ] [87])
#(if_then_else:SI (ltu:SI (reg:CC 17 flags)
#(const_int 0 [0]))
#(const_int -1 [0x])
#(const_int 0 [0])))
#(clobber (reg:CC 17 flags))
#]) cmov2.c:9 947 {*x86_movsicc_0_m1}
# (expr_list:REG_DEAD (reg:CC 17 flags)
#(expr_list:REG_UNUSED (reg:CC 17 flags)
#(nil
 sbbl%eax, %eax  # 27*x86_movsicc_0_m1   [length =
2]

this is the key insn, and as shown above, further asm sequence is
similar between patched and unpatched compiler.


Hmm yes.
We have a HAVE_conditional_move check, but that's not fine grained enough.
How about trying an emit_conditional_move and checking that the result is a
single insn?

Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
don't think this is a good idea. In the expander, there is already
quite some target-dependent code that goes great length to utilize sbb
insn as much as possible, before cmove is used.

IMO, as far as x86 is concerned, the best solution would be to revert
the change. ix86_expand_int_movcc already does some tricks from your
patch in a target-efficient way. Generic change that was introduced by
your patch now interferes with this expansion.


Well, technically the transformation was already there, it was just never
reached for an x86 compilation because noce_try_cmove was tried in front of it
and used a target-specific expansion.
In any case, how's this proposal?
The transformation noce_try_store_flag_constants
  /* if (test) x = a; else x = b;
 =   x = (-(test != 0)  (b - a)) + a;  */

Is a catch-all-immediates transformation in noce_try_store_flag_constants.
What if we moved it to noce_try_cmove and performed it only if the 
target-specific
conditional move expansion there failed?

That way we can try the x86_64-specific sequence first and still give the 
opportunity
to noce_try_store_flag_constants to perform the transformations that can 
benefit targets
that don't have highly specific conditional move expanders.

Kyrill



Uros.

Re: C++ delayed folding branch review

2015-08-03 Thread Jason Merrill


On 08/03/2015 05:42 AM, Kai Tietz wrote:

2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com:

On 07/31/2015 05:54 PM, Kai Tietz wrote:


The STRIP_NOPS-requirement in 'reduced_constant_expression_p' I could
remove, but for one case in constexpr.  Without folding we don't do
type-sinking/raising.



Right.


So binary/unary operations might be containing cast, which were in the
past unexpected.


Why aren't the casts folded away?


On such cast constructs, as for this vector-sample, we can't fold away


Which testcase is this?


the cast chain.  The difference here to none-delayed-folding branch is
that the cast isn't moved out of the plus-expr.  What we see now is
(plus ((vec) (const vector ...) {  }), ...).  Before we had (vec)
(plus (const vector ...) { ... }).


How could a PLUS_EXPR be considered a reduced constant, regardless of 
where the cast is?



On verify_constant we check by reduced_constant_expression_p, if value is
a constant.  We don't handle here, that NOP_EXPRs are something we want to
look through here, as it doesn't change anything if this is a constant, or
not.


NOPs around constants should have been folded away by the time we get there.


Not in this cases, as the we actually have here a switch from const to
none-const.  So there is an attribute-change, which we can't ignore in
general.


I wasn't suggesting we ignore it, we should be able to change the type 
of the vector_cst.



But I agree that for constexpr's we could special case cast
from const to none-const (as required in expressions like const vec v
= v + 1).


Right.  But really this should happen in convert.c, it shouldn't be 
specific to C++.


Jason

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Uros Bizjak

On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
 don't think this is a good idea. In the expander, there is already
 quite some target-dependent code that goes great length to utilize sbb
 insn as much as possible, before cmove is used.

 IMO, as far as x86 is concerned, the best solution would be to revert
 the change. ix86_expand_int_movcc already does some tricks from your
 patch in a target-efficient way. Generic change that was introduced by
 your patch now interferes with this expansion.


 Well, technically the transformation was already there, it was just never
 reached for an x86 compilation because noce_try_cmove was tried in front of
 it
 and used a target-specific expansion.
 In any case, how's this proposal?
 The transformation noce_try_store_flag_constants
   /* if (test) x = a; else x = b;
  =   x = (-(test != 0)  (b - a)) + a;  */

 Is a catch-all-immediates transformation in noce_try_store_flag_constants.
 What if we moved it to noce_try_cmove and performed it only if the
 target-specific
 conditional move expansion there failed?

 That way we can try the x86_64-specific sequence first and still give the
 opportunity
 to noce_try_store_flag_constants to perform the transformations that can
 benefit targets
 that don't have highly specific conditional move expanders.

Yes, let's try this approach. As was found out, some targets (e.g.
x86) hide lots of different target-dependent expansion strategies into
movcc expander. Perhaps this fact should be documented in the comment
in the generic code?

Uros.

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Ilya Enkovich

2015-08-03 17:04 GMT+03:00 Uros Bizjak ubiz...@gmail.com:
 On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 On 03/08/15 13:33, Uros Bizjak wrote:

 Hello!

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
  when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
  explicit.  Prefer to add the flag whenever possible.
  (noce_process_if_block): Try noce_try_store_flag_constants before
  noce_try_cmove.

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * gcc.target/aarch64/csel_bfx_1.c: New test.
  * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

 This patch regressed following tests on x86_64:

 FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
 FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

 The difference for cmov3.c on x86_64 is:

cmpl%esi, %edi
movl$-5, %edx
movl$5, %eax
cmovg   %edx, %eax
ret

 vs. new code:

xorl%eax, %eax
cmpl%esi, %edi
setle   %al
negl%eax
andl$10, %eax
subl$5, %eax
ret

 I'm not sure old code is really better than new. HJ, do you have any
 better insight?

 Uros.

The original code looks better, tree height is just 2 and therefore it
can be executed in 2 cycles. New code has more dependencies and tree
height becomes 5. It is always hard to say for all x86 targets but as
a generic code the original version is better.

Thanks,
Ilya

[patch] libstdc++/67078 change _N bad name in new container access functions

2015-08-03 Thread Jonathan Wakely


We can't use _N as an identifier, as per
https://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html

Tested ppc64le-linux, committed to trunk.

commit ae65b96cfb69c64d636f86d9f9154da2364e0440
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Aug 3 16:29:44 2015 +0100

	PR libstdc++/67078
	* include/bits/range_access.h (size, empty, data): Fix _N bad name.

diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h
index 2a10598..586d162 100644
--- a/libstdc++-v3/include/bits/range_access.h
+++ b/libstdc++-v3/include/bits/range_access.h
@@ -237,10 +237,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @brief  Return the size of an array.
*  @param  __array  Array.
*/
-  template typename _Tp, size_t _N
+  template typename _Tp, size_t _Nm
 constexpr size_t
-size(const _Tp (/*__array*/)[_N]) noexcept
-{ return _N; }
+size(const _Tp (/*__array*/)[_Nm]) noexcept
+{ return _Nm; }
 
   /**
*  @brief  Return whether a container is empty.
@@ -255,9 +255,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @brief  Return whether an array is empty (always false).
*  @param  __array  Container.
*/
-  template typename _Tp, size_t _N
+  template typename _Tp, size_t _Nm
 constexpr bool
-empty(const _Tp (/*__array*/)[_N]) noexcept
+empty(const _Tp (/*__array*/)[_Nm]) noexcept
 { return false; }
 
   /**
@@ -291,9 +291,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @brief  Return the data pointer of an array.
*  @param  __array  Array.
*/
-  template typename _Tp, size_t _N
+  template typename _Tp, size_t _Nm
 constexpr _Tp*
-data(_Tp (__array)[_N]) noexcept
+data(_Tp (__array)[_Nm]) noexcept
 { return __array; }
 
   /**

Re: [BUILDROBOT] Go runtime: calling ‘__builtin_frame_address’ with a nonzero argument is unsafe

2015-08-03 Thread Martin Sebor


On 08/03/2015 05:55 AM, Jan-Benedict Glaw wrote:

On Sun, 2015-08-02 17:15:27 -0600, Martin Sebor mse...@gmail.com wrote:

OK for the trunk.  Sorry for the delay.


Thank you. Committed in revision 226480.


...und breaks native builds. When doing builds using config-list.mk, I
first build a GCC for the build machine, then re-build a
cross-configured GCC with that.


I don't know if pragma GCC diagnostic is valid in Go (still waiting
for my build to finish to confirm) but disabling the warning in
cases where the calls are known to be safe should fix the compilation
error.

Index: runtime/mprof.goc
===
--- runtime/mprof.goc(revision 226505)
+++ runtime/mprof.goc(working copy)
@@ -404,10 +404,15 @@
 func Stack(b Slice, all bool) (n int) {
 byte *pc, *sp;
 bool enablegc;
-
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored -Wframe-address
+
 sp = runtime_getcallersp(b);
 pc = (byte*)(uintptr)runtime_getcallerpc(b);

+#pragma GCC diagnostic pop
+
 if(all) {
 runtime_semacquire(runtime_worldsema, false);
 runtime_m()-gcing = 1;

Martin



   While building GCC targeting the build=host machine, I get ie:

http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=459572

/bin/bash ./libtool --tag=CC   --mode=compile 
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/xgcc
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/bin/
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/lib/
 -isystem 
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/include
 -isystem 
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/sys-include
-DHAVE_CONFIG_H -I. -I/home/jbglaw/repos-configlist_mk/gcc/libgo  -I 
/home/jbglaw/repos-configlist_mk/gcc/libgo/runtime 
-I/home/jbglaw/repos-configlist_mk/gcc/libgo/../libffi/include 
-I../libffi/include -pthread  -fexceptions -fnon-call-exceptions 
-fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual 
-Werror -minline-all-stringops -D_GNU_SOURCE -D_LAR

GEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I 
/home/jbglaw/repos-configlist_mk/gcc/libgo/../libgcc -I 
/home/jbglaw/repos-configlist_mk/gcc/libgo/../libbacktrace -I ../../gcc/include 
-g -O2 -MT mprof.lo -MD -MP -MF .deps/mprof.Tpo -c -o mprof.lo mprof.c

libtool: compile:  
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/xgcc
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/bin/
 
-B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/lib/
 -isystem 
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/include
 -isystem 
/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/sys-include
 -DHAVE_CONFIG_H -I. -I/home/jbglaw/repos-configlist_mk/gcc/libgo -I 
/home/jbglaw/repos-configlist_mk/gcc/libgo/runtime 
-I/home/jbglaw/repos-configlist_mk/gcc/libgo/../libffi/include 
-I../libffi/include -pthread -fexceptions -fnon-call-exceptions 
-fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual 
-Werror -minline-all-stringops -D_GNU_SOURCE -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BIT

S=64 -I /home/jbglaw/repos-configlist_mk/gcc/libgo/../libgcc -I 
/home/jbglaw/repos-configlist_mk/gcc/libgo/../libbacktrace -I ../../gcc/include 
-g -O2 -MT mprof.lo -MD -MP -MF .deps/mprof.Tpo -c mprof.c  -fPIC -DPIC -o 
.libs/mprof.o

/home/jbglaw/repos-configlist_mk/gcc/libgo/runtime/mprof.goc: In function 
‘runtime_Stack’:
/home/jbglaw/repos-configlist_mk/gcc/libgo/runtime/mprof.goc:408:5: error: 
calling ‘__builtin_frame_address’ with a nonzero argument is unsafe 
[-Werror=frame-address]
   sp = runtime_getcallersp(b);
  ^
cc1: all warnings being treated as errors
Makefile:2613: recipe for target 'mprof.lo' failed
make[4]: *** [mprof.lo] Error 1



Seems Go's runtime uses that feature.

MfG, JBG

[gomp4] OpenACC first private

2015-08-03 Thread Nathan Sidwell

I've committed this patch to gomp4.  The existing implementation of firstprivate 
presumes the existence of memory at the CTA level.  This patch does away with 
that, treating firstprivate as thread-private variables initialized from the 
host.


During development there was some fallout from declare handling, as that wasn't 
 creating the expected omp_region context object.  The previous handling of 
firstprivate just happened to work.  Jim has been working on resolving that problem.


nathan
2015-08-03  Nathan Sidwell  nat...@codesourcery.com

	* gimplify.c (GOVD_GANGLOCAL): Delete.
	(oacc_default_clause): Only derereference reference types. Mark
	firstprivate as GOVD_FIRSTPRIVATE.
	(gimplify_adjust_omp_clauses_1): Remove GANGLOCALL handling.
	(gimplify_omp_for): Remove bogus OpenACC outer context lookup.
	* omp-low.c (build_outer_var_ref): Simplify openacc outer ref
	lookup.
	(scan_sharing_clauses): Handle openacc firstprivate.
	(lower_omp_target): Handle openacc firstprivate.

	c/
	* c-parser.c (c_parser_oacc_data_clause): Remove firstprivate
	handling.
	(c_parser_oac_all_clauses): Firstpribsste is a firstprivate
	clause.
	* c-typeck.c (c_finish_omp_clauses): Remove GANGLOCAL handling.

	fortran/
	* trans-openmp.c (gfc_trans_omp_clauses_1): Remove GANGLOCAL
	handling.
	* gfortran.h (OMP_MAP_GANGLOCAL): Delete.
	(OMP_MAP_FORCE_TO_GANGLOCAL):  Likewise.
	* openmp.c (gfc_match_omp_clauses): Remove openacc specific
	firstprivate handling.

	testsuite/
	* gfortran.dg/goacc/parallel-tree.f95: Remove ganglocal
	expectation.
	* gfortran.dg/goacc/list.f95: Stop expected firstprivate to be a
	data clause.
	* c-c++-common/goacc/firstprivate.c: Likewise.

	cp/
	* semantics.c (finish_omp_clauses): Remove OpenACC-specific
	firstprivate handling.
	* parser.c (cp_parser_oacc_data_clause): Remove firstprivate here.
	(cp_parser_oacc_all_clauses): First private is a firstprivate clause.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 226462)
+++ gcc/gimplify.c	(working copy)
@@ -94,9 +94,6 @@ enum gimplify_omp_var_data
 
   GOVD_FORCE_MAP = 1  16,
 
-  /* Gang-local OpenACC variable.  */
-  GOVD_GANGLOCAL = 1  17,
-
   /* OpenACC deviceptr clause.  */
   GOVD_USE_DEVPTR = 1  18,
 
@@ -5937,14 +5934,13 @@ oacc_default_clause (struct gimplify_omp
 	if (is_global_var (decl)  device_resident_p (decl))
 	  flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
 	else if (ctx-acc_region_kind == ARK_KERNELS)
-	  /* Scalars under kernels are default 'copy'.  */
+	  /* Everything under kernels are default 'copy'.  */
 	  flags |= GOVD_FORCE_MAP | GOVD_MAP;
 	else if (ctx-acc_region_kind == ARK_PARALLEL)
 	  {
 	tree type = TREE_TYPE (decl);
 
-	/*  Should this  be REFERENCE_TYPE_P? */
-	if (POINTER_TYPE_P (type))
+	if (TREE_CODE (type) == REFERENCE_TYPE)
 	  type = TREE_TYPE (type);
 	
 	if (AGGREGATE_TYPE_P (type))
@@ -5952,12 +5948,12 @@ oacc_default_clause (struct gimplify_omp
 	  flags |= GOVD_MAP;
 	else
 	  /* Scalars default to 'firstprivate'.  */
-	  flags |= GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY | GOVD_MAP;
+	  flags |= GOVD_FIRSTPRIVATE;
 	  }
 	else
 	  gcc_unreachable ();
   }
-  break;
+break;
 }
   
   return flags;
@@ -6812,10 +6808,7 @@ gimplify_adjust_omp_clauses_1 (splay_tre
   else if (code == OMP_CLAUSE_MAP)
 {
   OMP_CLAUSE_SET_MAP_KIND (clause,
-			   flags  GOVD_MAP_TO_ONLY
-			   ? (flags  GOVD_GANGLOCAL
-  ? GOMP_MAP_FORCE_TO_GANGLOCAL
-  : GOMP_MAP_TO)
+			   flags  GOVD_MAP_TO_ONLY ? GOMP_MAP_TO
 			   : (flags  GOVD_FORCE_MAP
   ? GOMP_MAP_FORCE_TOFROM
   : GOMP_MAP_TOFROM));
@@ -7542,11 +7535,7 @@ gimplify_omp_for (tree *expr_p, gimple_s
   else if (omp_is_private (gimplify_omp_ctxp, decl, 0))
 	omp_notice_variable (gimplify_omp_ctxp, decl, true);
   else
-	{
-	  if (ork == ORK_OACC  gimplify_omp_ctxp-outer_context)
-	omp_notice_variable (gimplify_omp_ctxp-outer_context, decl, true);
-	  omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
-	}
+	omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN);
 
   /* If DECL is not a gimple register, create a temporary variable to act
 	 as an iteration counter.  This is valid, since DECL cannot be
Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 226462)
+++ gcc/c/c-parser.c	(working copy)
@@ -10719,9 +10719,6 @@ c_parser_oacc_data_clause (c_parser *par
 case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT:
   kind = GOMP_MAP_DEVICE_RESIDENT;
   break;
-case PRAGMA_OACC_CLAUSE_FIRSTPRIVATE:
-  kind = GOMP_MAP_FORCE_TO_GANGLOCAL;
-  break;
 case PRAGMA_OACC_CLAUSE_HOST:
   kind = GOMP_MAP_FORCE_FROM;
   break;
@@ -12316,7 +12313,7 @@ c_parser_oacc_all_clauses (c_parser *par
 	  c_name = deviceptr;
 	  break;
 	case PRAGMA_OACC_CLAUSE_FIRSTPRIVATE:
-	  clauses =

[gomp4] Fix ptx warning

2015-08-03 Thread Nathan Sidwell

I've committed this to gomp4.  I  was reusing a size_t variable in a loop where 
unsigned would do, and fprintf got upset.


nathan
2015-08-03  Nathan Sidwell  nat...@codesourcery.com

	* config/nvptx/mkoffload.c (process): Avoid printf formatting
	warning.

Index: config/nvptx/mkoffload.c
===
--- config/nvptx/mkoffload.c	(revision 226504)
+++ config/nvptx/mkoffload.c	(working copy)
@@ -225,16 +225,16 @@ access_check (const char *name, int mode
 static void
 process (FILE *in, FILE *out)
 {
-  size_t len;
+  size_t len = 0;
   const char *input = read_file (in, len);
   const char *comma;
   id_map const *id;
   unsigned obj_count = 0;
-  size_t i;
+  unsigned ix;
 
   /* Dump out char arrays for each PTX object file.  These are
  terminated by a NUL.  */
-  for (i = 0; i != len;)
+  for (size_t i = 0; i != len;)
 {
   char c;
   
@@ -280,8 +280,8 @@ process (FILE *in, FILE *out)
 	 const char *code;\n
 	 __SIZE_TYPE__ size;\n
 	   } ptx_objs[] = {);
-  for (comma = , i = 0; i != obj_count; comma = ,, i++)
-fprintf (out, %s\n\t{ptx_code_%u, sizeof (ptx_code_%u)}, comma, i, i);
+  for (comma = , ix = 0; ix != obj_count; comma = ,, ix++)
+fprintf (out, %s\n\t{ptx_code_%u, sizeof (ptx_code_%u)}, comma, ix, ix);
   fprintf (out, \n};\n\n);
 
   /* Dump out variable idents.  */

[PING][PATCH][AArch64] Change aarch64 vector cost to match vectorizer

2015-08-03 Thread Pawel Kupidura


 Hi,

Ping for the patch submitted here 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02251.html


Thanks,
Pawel Kupidura

Re: Fix two more memory leaks in threader

2015-08-03 Thread Jeff Law


On 07/20/2015 08:19 AM, James Greenhalgh wrote:


I think we either want to defer the unordered_remove until we're done
processing all the vector elements, or make sure to look at element 'i'
again after we've moved something new in to it.
Correct.  Two loops had this mistake -- while others got it right. 
Sigh.  The easiest fix is to change how we increment i in those loops 
so that it's only incremented if we do not delete the path.  That 
happens to also match how other cases are handled.


Thanks for catching these mistakes.



A testcase would need to expose at least two threads which we later want
to cancel, one of which ends up at the end of the vector of threading
opportunities. We should then see only the first of those threads
actually get cancelled, and the second skipped over. Reproducing these
conditions is quite tough, which has stopped me finding a useful example
for the list, I'll be sure to follow-up this thread if I do get to one.

Thankfully BZ had two closely related testcases.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.


(Unfortunately it didn't resolve the ppc64 issue I was looking at ;(


Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fb9d115..9abde9f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-08-03  Jeff Law  l...@redhat.com
+
+   PR middle-end/66314
+   PR gcov-profile/66899
+   * tree-ssa-threadupdate.c (mark_threaded_blocks): Correctly
+   iterate over the jump threading paths when an element in the
+   jump threading paths array is eliminated.
+
 2015-08-03  Segher Boessenkool  seg...@kernel.crashing.org
 
* Makefile.in (OBJS): Put gimple-match.o and generic-match.o first.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index a403767..0a841b5 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2015-08-03  Jeff Law  l...@redhat.com
+
+   PR middle-end/66314
+   PR gcov-profile/66899
+   * gcc.dg/pr66899.c: New test.
+   * gcc.dg/pr66314.c: New test.
+
 2015-08-03  Marek Polacek  pola...@redhat.com
 
PR c/67088
diff --git a/gcc/testsuite/gcc.dg/pr66314.c b/gcc/testsuite/gcc.dg/pr66314.c
new file mode 100644
index 000..f74ab5b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr66314.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options -std=gnu89 -Os -fprofile-arcs -fsanitize=kernel-address } */
+
+char *a;
+int d;
+
+static int
+fn1 (int b, int c)
+{
+  while (a)
+if (*a)
+  return -126;
+  if (b)
+return -12;
+  if (c == -12)
+return c;
+}
+
+void
+fn2 (int b, int c)
+{
+  for (;;)
+{
+  d = fn1 (b, c);
+  switch (d)
+{
+case -126:
+  continue;
+default:
+  return;
+}
+}
+}
diff --git a/gcc/testsuite/gcc.dg/pr66899.c b/gcc/testsuite/gcc.dg/pr66899.c
new file mode 100644
index 000..1fff181
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr66899.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options -Os -fprofile-arcs } */
+
+struct
+{
+  int authority;
+} * a, *b, c, d;
+int e, f;
+static int
+fn1 ()
+{
+  if (a)
+goto verified;
+  if (b)
+goto matched;
+  return -126;
+matched:
+  e = 0;
+verified:
+  if (b)
+for (; c != b; c = d)
+  ;
+  return 0;
+}
+
+int
+fn2 ()
+{
+  for (;;)
+{
+  f = fn1 ();
+  switch (f)
+{
+case -126:
+  continue;
+default:
+  return 0;
+}
+}
+}
+
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 31ddf25..5a5f8df 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2130,7 +2130,7 @@ mark_threaded_blocks (bitmap threaded_blocks)
  cases where the second path starts at a downstream edge on the same
  path).  First record all joiner paths, deleting any in the unexpected
  case where there is already a path for that incoming edge.  */
-  for (i = 0; i  paths.length (); i++)
+  for (i = 0; i  paths.length ();)
 {
   vecjump_thread_edge * *path = paths[i];
 
@@ -2140,6 +2140,7 @@ mark_threaded_blocks (bitmap threaded_blocks)
  if ((*path)[0]-e-aux == NULL)
{
  (*path)[0]-e-aux = path;
+ i++;
}
  else
{
@@ -2149,10 +2150,15 @@ mark_threaded_blocks (bitmap threaded_blocks)
  delete_jump_thread_path (path);
}
}
+  else
+   {
+ i++;
+   }
 }
+
   /* Second, look for paths that have any other jump thread attached to
  them, and either finish converting them or cancel them.  */
-  for (i = 0; i  paths.length (); i++)
+  for (i = 0; i  paths.length ();)
 {
   vecjump_thread_edge * *path = paths[i];
   edge e = (*path)[0]-e;
@@ -2167,7 +2173,10 @@ mark_threaded_blocks (bitmap threaded_blocks)
  /* If we iterated through the entire path without exiting the loop,
 then we are good to go,

Re: [PATCH][AArch64] Change aarch64 vector cost to match vectorizer

2015-08-03 Thread James Greenhalgh

On Mon, Jul 27, 2015 at 02:22:41PM +0100, Pawel Kupidura wrote:
 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 10df325..ffafc3f 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,7 @@
 +2015-07-27  Pawel Kupidura pawel.kupid...@arm.com

Two spaces between your name and your email address, like so:

2015-07-27  Pawel Kupidura  pawel.kupid...@arm.com

 +
 +* config/aarch64/aarch64.c: Changed inner loop statement cost
 +to be consistent with vectorizer code.
 +

s/Changed/Change

   2015-07-26  Uros Bizjak ubiz...@gmail.com
 
   * config/alpha/alpha.c: Use SUBREG_P predicate.
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 020f63c..3b6f8c5 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -7079,15 +7079,9 @@ aarch64_add_stmt_cost (void *data, int count, 
 enum vect_cost_for_stmt kind,
 
 /* Statements in an inner loop relative to the loop being
vectorized are weighted more heavily.  The value here is
 - a function (linear for now) of the loop nest level.  */
 + arbitrary and could potentially be improved with analysis.  */

Your mail client has mangled the tabs in this diff, so the patch will
not apply in this form. Could you try posting again having resolved the
issues with your mail client?

 if (where == vect_body  stmt_info  stmt_in_inner_loop_p 
 (stmt_info))
 -{
 -  loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
 -  struct loop *loop =  LOOP_VINFO_LOOP (loop_info);
 -  unsigned nest_level = loop_depth (loop);
 -
 -  count *= nest_level;
 -}
 +count *= 50; /* FIXME */

Likewise here.

Thanks,
James

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Jeff Law


On 08/03/2015 09:50 AM, Ilya Enkovich wrote:



The original code looks better, tree height is just 2 and therefore it
can be executed in 2 cycles. New code has more dependencies and tree
height becomes 5. It is always hard to say for all x86 targets but as
a generic code the original version is better.

Agreed.  Reducing tree height is definitely a good thing as a general rule.

jeff

[gomp4] Remove superfluous code

2015-08-03 Thread Nathan Sidwell

I've committed this to gomp4 branch.  It tidies up the error message processing 
and removes some code that shows a remarkable confusion about how shared 
libraries work.


nathan
--
Nathan Sidwell - Director, Sourcery Services - Mentor Embedded
2015-08-03  Nathan Sidwell  nat...@codesourcery.com

	libgomp/
	* plugin/plugin-nvptx.c: Don't include dlfcn.h.
	(cuda_errlist): Constify.
	(errmsg):  Move into ...
	(cuda_error): ... here.  Make smaller.
	(_XSTR, _STR): Delete.
	(cuda_synames): Delete.
	(verify_device_library): Delete.
	(nvptx_init): Don't call it.

Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 226533)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -43,16 +43,15 @@
 #include stdint.h
 #include string.h
 #include stdio.h
-#include dlfcn.h
 #include unistd.h
 #include assert.h
 
 #define	ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0]))
 
-static struct
+static const struct
 {
   CUresult r;
-  char *m;
+  const char *m;
 } cuda_errlist[]=
 {
   { CUDA_ERROR_INVALID_VALUE, invalid value },
@@ -109,9 +108,7 @@ static struct
   { CUDA_ERROR_UNKNOWN, unknown }
 };
 
-static char errmsg[128];
-
-static char *
+static const char *
 cuda_error (CUresult r)
 {
   int i;
@@ -119,12 +116,14 @@ cuda_error (CUresult r)
   for (i = 0; i  ARRAYSIZE (cuda_errlist); i++)
 {
   if (cuda_errlist[i].r == r)
-	return cuda_errlist[i].m[0];
+	return cuda_errlist[i].m;
 }
 
-  sprintf (errmsg[0], unknown result code: %5d, r);
+  static char errmsg[30];
+
+  snprintf (errmsg, sizeof (errmsg), unknown error code: %d, r);
 
-  return errmsg[0];
+  return errmsg;
 }
 
 static unsigned int instantiated_devices = 0;
@@ -383,74 +382,6 @@ static struct ptx_event *ptx_events;
 
 static struct ptx_device **ptx_devices;
 
-#define _XSTR(s) _STR(s)
-#define _STR(s) #s
-
-static struct _synames
-{
-  char *n;
-} cuda_symnames[] =
-{
-  { _XSTR (cuCtxCreate) },
-  { _XSTR (cuCtxDestroy) },
-  { _XSTR (cuCtxGetCurrent) },
-  { _XSTR (cuCtxPushCurrent) },
-  { _XSTR (cuCtxSynchronize) },
-  { _XSTR (cuDeviceGet) },
-  { _XSTR (cuDeviceGetAttribute) },
-  { _XSTR (cuDeviceGetCount) },
-  { _XSTR (cuEventCreate) },
-  { _XSTR (cuEventDestroy) },
-  { _XSTR (cuEventQuery) },
-  { _XSTR (cuEventRecord) },
-  { _XSTR (cuInit) },
-  { _XSTR (cuLaunchKernel) },
-  { _XSTR (cuLinkAddData) },
-  { _XSTR (cuLinkComplete) },
-  { _XSTR (cuLinkCreate) },
-  { _XSTR (cuMemAlloc) },
-  { _XSTR (cuMemAllocHost) },
-  { _XSTR (cuMemcpy) },
-  { _XSTR (cuMemcpyDtoH) },
-  { _XSTR (cuMemcpyDtoHAsync) },
-  { _XSTR (cuMemcpyHtoD) },
-  { _XSTR (cuMemcpyHtoDAsync) },
-  { _XSTR (cuMemFree) },
-  { _XSTR (cuMemFreeHost) },
-  { _XSTR (cuMemGetAddressRange) },
-  { _XSTR (cuMemHostGetDevicePointer) },
-  { _XSTR (cuMemHostRegister) },
-  { _XSTR (cuMemHostUnregister) },
-  { _XSTR (cuModuleGetFunction) },
-  { _XSTR (cuModuleLoadData) },
-  { _XSTR (cuStreamDestroy) },
-  { _XSTR (cuStreamQuery) },
-  { _XSTR (cuStreamSynchronize) },
-  { _XSTR (cuStreamWaitEvent) }
-};
-
-static int
-verify_device_library (void)
-{
-  int i;
-  void *dh, *ds;
-
-  dh = dlopen (libcuda.so, RTLD_LAZY);
-  if (!dh)
-return -1;
-
-  for (i = 0; i  ARRAYSIZE (cuda_symnames); i++)
-{
-  ds = dlsym (dh, cuda_symnames[i].n);
-  if (!ds)
-return -1;
-}
-
-  dlclose (dh);
-
-  return 0;
-}
-
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -631,16 +562,11 @@ static bool
 nvptx_init (void)
 {
   CUresult r;
-  int rc;
   int ndevs;
 
   if (instantiated_devices != 0)
 return true;
 
-  rc = verify_device_library ();
-  if (rc  0)
-return false;
-
   r = cuInit (0);
   if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal (cuInit error: %s, cuda_error (r));

Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-08-03 Thread Martin Sebor


I suspect the back end or even the middle end route isn't going to
work even if there was enough context to diagnose the problem
expressions because some of them will have been optimized away by then
(e.g., 'if ( __builtin_foo != 0)' is optimized into if (1) by gimple).


I was thinking that if they're optimized away, they aren't problematic
anymore; that was part of the attraction for me of handling this lower
down.


Yes, they're not problematic in that they don't cause linker unsats.
I have a few concerns with the approach of accepting or rejecting
constructs based on optimization (e.g., that it might lead to
similar problems as with -Wmaybe-uninitialized; or that it could
be perceived as inconsistent). But it may not be as bad as it seems
-- let me look into it if only as a learning exercise and see how
it goes.


I've prototyped this approach in a couple places in the middle
end. Both implementations are very simple and work great when
the code isn't optimized away. The problem is that the
optimizations done at various points between the front end and
the final gimplification make the diagnostics inconsistent.

For instance, with F being the type of the builtin, this is
diagnosed in the first prototype:

  F* foo () { if (0) return __builtin_trap; return 0; }

but this isn't:

  F* bar () { return 0 ? __builtin_trap : 0; }

because the ternary ?: expression is folded into a constant by
the front end even before it reaches the gimplifier, while the
if-else statement isn't folded until the control flow graph is
built. (As an aside: I'm wondering why that is. Why have the
front end do any optimization at all if the middle can and
does them too, and better?)

Diagnosing neither of these cases might perhaps seem like the
way to go since they are both safe (don't result in a linker
error). I implemented this approach in the second prototype in
a later pass but that exposed even more inconsistencies as even
more code is optimized (for instance, exception handling).

I think this sort of inconsistency in diagnosing semantically
equivalent language constructs would be viewed as arbitrary
and be unhelpful to users. It's too bad because otherwise this
solution would have been quite elegant and simple (touching
just one function).

Unless you have some other ideas I'm going to abandon this
approach see if the original patch can be simplified by
consoldating some of the handling into mark_rvalue_use.

Martin

PS Another problem, one that in my view sinks the middle end
approach, is the fact that references in unused static inline
functions aren't diagnosed either, even if calling the inline
function would result in taking the address of the builtin
function. E.g., this isn't diagnosed:

  static inline F* baz (void) { return __builtin_trap; }

unless baz() is called. This would make the feature largely
ineffective in C++ template libraries.

[PATCH], PR target/67071, Improve easy_altivec_constants on PowerPC

2015-08-03 Thread Michael Meissner

In preparing the next IEEE 128-bit floating point patch, I needed a quick way
to load -0.0q into a vector registers (i.e. just the MSB set). I originally had
a special purpose insn to load this value, but I decided to widen it to allow
the easy_altivec_constant support to generate constants where you use the
VSLDOI instruction to create the bottom part of all 0's or all 1's.

When I started doing the coding, I noticed that the current support to load
vectors with the MSB set in each element no longer worked, because the test
assumed the constant was stored as an unsigned value, and we now store a sign
extended number. I raised PR 67071 about this, and this patch fixes that
problem and adds more patterns of vector constants that can be loaded without
using memory.

I have built this on both big endian power7 and little endian power8 machines
with no regressions. Can I install the patch? I would also like to backport
this patch to the active branches.

[gcc]
2015-08-03  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/67071
* config/rs6000/predicates.md (easy_vector_constant_vsldoi): New
predicate to allow construction of vector constants using the
VSLDOI vector shift instruction.

* config/rs6000/rs6000-protos.h (vspltis_shifted): Add
declaration.

* config/rs6000/rs6000.c (vspltis_shifted): New function to return
the number of bytes to be shifted left and filled in with either
all zero or all one bits.
(gen_easy_altivec_constant): Call vsplitis_shifted if no other
methods exist.
(output_vec_const_move): On power8, generate XXLORC to generate
a vector constant with all 1's. Do a split if we need to use a
VSLDOI instruction.

* config/rs6000/rs6000.h (EASY_VECTOR_MSB): Use mode mask to
properly test for the MSB.

* config/rs6000/altivec.md (VSLDOI splitter): Add splitter for
vector constants that can be created with VSLDOI.

[gcc/testsuite]
2015-08-03  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/67071
* gcc.target/powerpc/pr67071-1.c: New file to test PR 67071 new
vector constants.
* gcc.target/powerpc/pr67071-2.c: Likewise.
* gcc.target/powerpc/pr67071-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 226414)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -562,6 +562,14 @@ (define_predicate easy_vector_constant_
   return EASY_VECTOR_MSB (val, GET_MODE_INNER (mode));
 })
 
+;; Return true if this is an easy altivec constant that we form
+;; by using VSLDOI.
+(define_predicate easy_vector_constant_vsldoi
+  (and (match_code const_vector)
+   (and (match_test TARGET_ALTIVEC)
+   (and (match_test easy_altivec_constant (op, mode))
+(match_test vspltis_shifted (op) != 0)
+
 ;; Return 1 if operand is constant zero (scalars and vectors).
 (define_predicate zero_constant
   (and (match_code const_int,const_double,const_wide_int,const_vector)
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 226413)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -31,6 +31,7 @@ extern void init_cumulative_args (CUMULA
 #endif /* TREE_CODE */
 
 extern bool easy_altivec_constant (rtx, machine_mode);
+extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
 extern int num_insns_constant (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 226414)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5448,6 +5448,96 @@ vspltis_constant (rtx op, unsigned step,
   return true;
 }
 
+/* Like vsplitis_constant, but allow the value to be shifted left with a VSLDOI
+   instruction, filling in the bottom elements with 0 or -1.
+
+   Return 0 if the constant cannot be generated with VSLDOI.  Return positive
+   for the number of zeroes to shift in, or negative for the number of 0xff
+   bytes to shift in.
+
+   OP is a CONST_VECTOR.  */
+
+int
+vspltis_shifted (rtx op)
+{
+  machine_mode mode = GET_MODE (op);
+  machine_mode inner = GET_MODE_INNER (mode);
+
+  unsigned i, j;
+  unsigned nunits;
+  unsigned mask;
+
+  HOST_WIDE_INT val;
+
+  if (mode != V16QImode  mode != V8HImode  mode != V4SImode)
+return false;
+
+  /* We need to create pseudo registers to do the shift, so don't recognize
+ shift vector constants after reload.  */
+  if (!can_create_pseudo_p ())
+return false;
+
+  nunits

Re: [fortran,patch] Extend IEEE support to all real kinds

2015-08-03 Thread Uros Bizjak

On Mon, Aug 3, 2015 at 11:14 PM, FX fxcoud...@gmail.com wrote:
 The attached patch extends the IEEE modules to all floating-point kinds. Last 
 time, when I added IEEE support, I restricted it to the float and double 
 types (real kinds 4 and 8), to be extra safe. After discussion with Uros 
 Bizjak and some reading, I’ve come to the conclusion that on most hardware 
 where we support IEEE at all, we do support enough IEEE features on extended 
 and quad prec (long double and __float128, on x86_64 hardware) to satisfy the 
 Fortran standard.

 So, this enables full IEEE support for all real kinds. Nothing changes to the 
 underlying architecture, it’s almost exclusively mechanical changes (adding 
 the necessary variants to the interfaces, etc.).

 Bootstrapped and regtested on x86_64-apple-darwin14 (with associated 
 libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html).
 OK to commit to trunk?

 Index: libgfortran/config/fpu-387.h
 ===
 --- libgfortran/config/fpu-387.h (revision 226429)
 +++ libgfortran/config/fpu-387.h (working copy)
 @@ -461,12 +461,12 @@ set_fpu_state (void *state)


  int
 -support_fpu_underflow_control (int kind)
 +support_fpu_underflow_control (int kind __attribute__((unused)))
  {
if (!has_sse())
  return 0;

 -  return (kind == 4 || kind == 8) ? 1 : 0;
 +  return 1;
  }

x86 doesn't support underflow control for long double (x87) and
__float128 types, so the above change is wrong.

 Index: libgfortran/config/fpu-glibc.h
 ===
 --- libgfortran/config/fpu-glibc.h (revision 226429)
 +++ libgfortran/config/fpu-glibc.h (working copy)
 @@ -439,7 +439,7 @@ int
  support_fpu_underflow_control (int kind __attribute__((unused)))
  {
  #if defined(__alpha__)  defined(FE_MAP_UMZ)
 -  return (kind == 4 || kind == 8) ? 1 : 0;
 +  return 1;
  #else
return 0;
  #endif

I'm not sure for alpha, if it supports underflow control for extended
types. Let's keep this as is for now, it is trivial to change later.

Uros.

Re: [fortran,patch] Extend IEEE support to all real kinds

2015-08-03 Thread Uros Bizjak

On Mon, Aug 3, 2015 at 11:14 PM, FX fxcoud...@gmail.com wrote:
 The attached patch extends the IEEE modules to all floating-point kinds. Last 
 time, when I added IEEE support, I restricted it to the float and double 
 types (real kinds 4 and 8), to be extra safe. After discussion with Uros 
 Bizjak and some reading, I’ve come to the conclusion that on most hardware 
 where we support IEEE at all, we do support enough IEEE features on extended 
 and quad prec (long double and __float128, on x86_64 hardware) to satisfy the 
 Fortran standard.

 So, this enables full IEEE support for all real kinds. Nothing changes to the 
 underlying architecture, it’s almost exclusively mechanical changes (adding 
 the necessary variants to the interfaces, etc.).

 Bootstrapped and regtested on x86_64-apple-darwin14 (with associated 
 libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html).
 OK to commit to trunk?

Can you please also introduce tests for ieee exceptions for long
double and __float128 types (as is done for real and double in
gfortran.dg/ieee/ieee_1.F90) as well as test for supported rounding
modes as done for real and double in gfortran.dg/ieee/rounding_1.f90 ?

On x86_64, these new tests will exercise exceptions and rounding modes
for __float128 soft-fp library, in addition to x87 long double tests.

Uros.

Re: [fortran,patch] Extend IEEE support to all real kinds

2015-08-03 Thread Steve Kargl

On Mon, Aug 03, 2015 at 11:14:41PM +0200, FX wrote:
 
 So, this enables full IEEE support for all real kinds. Nothing
 changes to the underlying architecture, it???s almost exclusively
 mechanical changes (adding the necessary variants to the interfaces, etc.). 
 
 Bootstrapped and regtested on x86_64-apple-darwin14 (with associated
 libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html).
 OK to commit to trunk?
 

This is OK to commit.  It's unclear whether I've
sufficient authority to OK the libquadmath path.

 
 PS: Once this is in, I intend to focus on the next item: allowing
 all standard-mandated IEEE functions in constant expressions.  Then,
 I believe our IEEE support will be entirely bug-free (in the sense
 that there will be no known bugs in it!).
 

See PR libquadmath/65757.  I'm not sure if it effects 
the IEEE support other than libquadmath has issues.

-- 
Steve

Re: [PR64164] drop copyrename, integrate into expand

2015-08-03 Thread Alexandre Oliva

On Jul 30, 2015, H.J. Lu hjl.to...@gmail.com wrote:

 aoliva/pr64164  is fine on x32.

Thanks.  I have made a large number of changes since you tested it,
fixing all the reported issues and then some.  Now, x86_64-linux-gnu
(-m64 and -m32), i686-pc-linux-gnu, powerpc64-linux-gnu and
powerpc64el-linux-gnu pass regstrap (r226317), and the many tens of
targets I cross-tested still get the same 'make all' errors that the
pristine tree did.

The bulk of the incremental changes had to do with handling splitting
and unsplitting of complex args, and BLKmode types passed by reference.

For the former, I had naively assumed complex args would always be
represented as CONCATs.  I now use read_complex_part to split the
expand-assigned parm rtl into components, and I use it again at unsplit
time to make sure the expand-assigned parm rtl matches that of the split
components.

The latter, in turn, almost required me to give up the entire notion of
coalescing parms.  The problem is that, for arguments passed as a
BLKmode pointer, copying the argument to an expand-assigned stack slot
is not only wasteful, it doesn't really work: we'd expand the copy in
assign_parms* and insert it before the stack allocation performed by
expand_user_vars, so we'd initialize the pseudo holding the address of
the stack slot only after its first use.

The solution I came up with was to detect BLKmode parms and NOT allow
them to coalesce with other variables, so that we can easily detect
partitions that need special handling.  The special handling amounts to
not allocating a stack slot for the partition holding the param default
def, and leaving it for assign_parms to do so.  We do, however, allocate
a MEM, in theory assigned to all partition members (though they're all
the same parm ATM, but not necessarily all SSA_NAMEs of the same parm,
since optimization causes different versions to conflict).  We leave the
address of that MEM unset, so that assign_parm knows it is to fill it in
with a pseudo holding a copy of the incoming parm address, or with the
address of the local stack slot created to hold a copy of the parameter.

It took me several rounds of trial and error to get these to pass all
complex and vector tests on x86 and ppc.  The last remaining failure was
a regression in gcc.target/powerpc/pr16458-4, caused by our inability to
hold SSA_NAMEs as REG_EXPRs in pseudos.  emit_case_decision_tree
attempted to preserve the decl as the REG_EXPR of a pseudo holding a
copy of the switch expr, and its type appears to be used to decide
whether to emit signed or unsigned compares, even though we explicitly
pass mode and unsignedp down to the cmp_and_jump expanders.  I figured
there was no good reason to prevent SSA_NAMEs in REG_EXPRs, just like
MEM_EXPRs, so I went ahead and adjusted the DECL_MODE that prevented it,
and now we expand the case decision tree as intended.

Here's a consolidated patch, followed by the consolidated incremental
patch.  I don't intend to install it this week, even if approved,
because I'm going to be away Aug 5-10, and I would like to be around
should any further problems arise.  So, ok to install when I return?

[PR64164] Drop copyrename, use coalescible partition as base when optimizing.

From: Alexandre Oliva aol...@redhat.com

for  gcc/ChangeLog

PR rtl-optimization/64164
PR bootstrap/66978
PR middle-end/66983
PR rtl-optimization/67000
PR middle-end/67034
PR middle-end/67035
* Makefile.in (OBJS): Drop tree-ssa-copyrename.o.
* tree-ssa-copyrename.c: Removed.
* opts.c (default_options_table): Drop -ftree-copyrename.  Add
-ftree-coalesce-vars.
* passes.def: Drop all occurrences of pass_rename_ssa_copies.
* common.opt (ftree-copyrename): Ignore.
(ftree-coalesce-inlined-vars): Likewise.
* doc/invoke.texi: Remove the ignored options above.
* gimple-expr.h (gimple_can_coalesce_p): Move declaration
* tree-ssa-coalesce.h: ... here.
* tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other
headers required by it.
* gimple-expr.c (gimple_can_coalesce_p): Allow coalescing
across variables when flag_tree_coalesce_vars.  Check register
use and promoted modes to allow coalescing.  Do not coalesce
maybe-byref parms with SSA_NAMEs of other variables, or
anonymous SSA_NAMEs.  Moved to tree-ssa-coalesce.c.
* tree-ssa-live.c (struct tree_int_map_hasher): Move along
with its member functions to tree-ssa-coalesce.c.
(var_map_base_init): Likewise.  Renamed to
compute_samebase_partition_bases.
(partition_view_normal): Drop want_bases parameter.
(partition_view_bitmap): Likewise.
* tree-ssa-live.h: Adjust declarations.
* tree-ssa-coalesce.c: Include explow.h and cfgexpand.h.
(build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's
default defs at the entry point.

Re: [gomp4] Remove superfluous code

2015-08-03 Thread Nathan Sidwell


On 08/03/15 19:52, Nathan Sidwell wrote:

I've committed this to gomp4 branch.  It tidies up the error message processing
and removes some code that shows a remarkable confusion about how shared
libraries work.


I've also committed it to trunk as obvious.

nathan

Re: [gofrontend-dev] Re: Go patch committed: Fix error reporting for invalid builtin calls

2015-08-03 Thread Michael Hudson-Doyle

Now I get

../../../gcc/libgo/runtime/mprof.goc: In function ‘runtime_Stack’:
../../../gcc/libgo/runtime/mprof.goc:437:19: error: ‘enablegc’ may be
used uninitialized in this function [-Werror=maybe-uninitialized]
   mstats.enablegc = enablegc;
   ^
../../../gcc/libgo/runtime/mprof.goc:406:7: note: ‘enablegc’ was declared here
  bool enablegc;
   ^

Am I doing something wrong?

Cheers,
mwh

On 4 August 2015 at 05:55, Ian Lance Taylor i...@golang.org wrote:
 On Mon, Aug 3, 2015 at 2:10 AM, Andreas Schwab sch...@suse.de wrote:

 ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack':
 ../../../libgo/runtime/mprof.goc:408:5: error: calling 
 '__builtin_frame_address' with a nonzero argument is unsafe 
 [-Werror=frame-address]
   sp = runtime_getcallersp(b);

 Fixed by this patch by Chris Manghane.  The call was not actually
 necessary.  Bootstrapped and ran Go testsuite on
 x86_64-unknown-linux-gnu.  Committed to mainline.  This fixes PR
 67101.

 Ian

 --
 You received this message because you are subscribed to the Google Groups 
 gofrontend-dev group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to gofrontend-dev+unsubscr...@googlegroups.com.
 For more options, visit https://groups.google.com/d/optout.

Re: [PATCH, rs6000] Fix source operand contraints for tabort. pattern

2015-08-03 Thread Peter Bergner

On Mon, 2015-08-03 at 16:27 -0400, David Edelsohn wrote:
 On Mon, Aug 3, 2015 at 3:39 PM, Peter Bergner berg...@vnet.ibm.com wrote:
  This has passed bootstrapping and regtesting on trunk.  Ok for mainline?
 
  I'd like to also backport this to the release branches.  Is this ok for
  them once bootstrapping and regtesting are complete on them?
[snip]
 Okay.

Thanks, I have now committed this to trunk and the 4.9 and 5 branches.

Peter

Re: [gofrontend-dev] Re: Go patch committed: Fix error reporting for invalid builtin calls

2015-08-03 Thread Ian Lance Taylor

On Mon, Aug 3, 2015 at 7:24 PM, Michael Hudson-Doyle
michael.hud...@canonical.com wrote:

 Now I get

 ../../../gcc/libgo/runtime/mprof.goc: In function ‘runtime_Stack’:
 ../../../gcc/libgo/runtime/mprof.goc:437:19: error: ‘enablegc’ may be
 used uninitialized in this function [-Werror=maybe-uninitialized]
mstats.enablegc = enablegc;
^
 ../../../gcc/libgo/runtime/mprof.goc:406:7: note: ‘enablegc’ was declared here
   bool enablegc;

I don't know why I am not seeing this, but I've committed this patch
that should fix it.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 226533)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a850225433a66a58613c22185c3b09626f5545eb
+bdd98c601f2c8dbd0bf821548ba09c038f7645c4
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/mprof.goc
===
--- libgo/runtime/mprof.goc (revision 226525)
+++ libgo/runtime/mprof.goc (working copy)
@@ -403,7 +403,7 @@ func ThreadCreateProfile(p Slice) (n int
 
 func Stack(b Slice, all bool) (n int) {
byte *pc;
-   bool enablegc;
+   bool enablegc = false;

pc = (byte*)(uintptr)runtime_getcallerpc(b);

Minor typo fixes

2015-08-03 Thread Jeff Law



I was starting to look at Abe's changes to the gimple if-converter and 
realized a handful of the changes were just fixing comments.  No reason 
those shouldn't go in immediately.


So I pulled them out and applied those changes to the trunk.

Abe -- if you find more of those kind of changes, don't hesitate to 
break them out into their own patch and they can go forward very quickly.


Attached are the fixes that were actually applied.  They were 
bootstrapped for completeness.


Thanks,
Jeff
* tree-if-conv.c: Fix various typos in comments.
* tree-vect-stmts.c: Likewise.
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index a284c2a..b5de1b2 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2283,8 +2283,8 @@ combine_blocks (struct loop *loop, bool 
any_mask_load_store)
   ifc_bbs = NULL;
 }
 
-/* Version LOOP before if-converting it, the original loop
-   will be then if-converted, the new copy of the loop will not,
+/* Version LOOP before if-converting it; the original loop
+   will be if-converted, the new copy of the loop will not,
and the LOOP_VECTORIZED internal call will be guarding which
loop to execute.  The vectorizer pass will fold this
internal call into either true or false.  */
@@ -2456,7 +2456,7 @@ ifcvt_walk_pattern_tree (tree var, vecgimple 
*defuse_list,
   return;
 }
 
-/* Returns true if STMT can be a root of bool pattern apllied
+/* Returns true if STMT can be a root of bool pattern applied
by vectorizer.  */
 
 static bool
@@ -2486,7 +2486,7 @@ stmt_is_root_of_bool_pattern (gimple stmt)
   return false;
 }
 
-/*  Traverse all statements in BB which correspondent to loop header to
+/*  Traverse all statements in BB which correspond to loop header to
 find out all statements which can start bool pattern applied by
 vectorizer and convert multiple uses in it to conform pattern
 restrictions.  Such case can occur if the same predicate is used both
@@ -2567,7 +2567,7 @@ ifcvt_local_dce (basic_block bb)
   gimple_set_plf (phi, GF_PLF_2, true);
   worklist.safe_push (phi);
 }
-  /* Consider load/store statemnts, CALL and COND as live.  */
+  /* Consider load/store statements, CALL and COND as live.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
 {
   stmt = gsi_stmt (gsi);
@@ -2655,7 +2655,7 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  /* Set up aggressive if-conversion for loops marked with simd pragma.  */
   aggressive_if_conv = loop-force_vectorize;
   /* Check either outer loop was marked with simd pragma.  */
   if (!aggressive_if_conv)
@@ -2687,7 +2687,7 @@ tree_if_conversion (struct loop *loop)
   combine_blocks (loop, any_mask_load_store);
 
   /* Delete dead predicate computations and repair tree correspondent
- to bool pattern to delete multiple uses of preidcates.  */
+ to bool pattern to delete multiple uses of predicates.  */
   if (aggressive_if_conv)
 {
   ifcvt_local_dce (loop-header);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 1256b09..33a7e0c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7125,7 +7125,7 @@ vect_is_simple_cond (tree cond, gimple stmt, 
loop_vec_info loop_vinfo,
 
When STMT is vectorized as nested cycle, REDUC_DEF is the vector variable
to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in
-   else caluse if it is 2).
+   else clause if it is 2).
 
Return FALSE if not a vectorizable STMT, TRUE otherwise.  */

Re: revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores

2015-08-03 Thread Jeff Law


On 07/17/2015 01:57 PM, Abe wrote:

Dear all,

Relative to the previous submission of this same patch, the below
corrects some minor spacing and/or indentation issues,
misc. other formatting fixes, and makes the disabled vectorization tests
be disabled via xfail rather than by adding spaces to
deliberately cause the relevant scanned-for text to not be found by
DejaGNU so as to prevent the DejaGNU line from being interpreted.

The below is also based on a Git checkout that was rebased to the latest
upstream check-in from today,
so it should merge cleanly with trunk as of today.

Regards,

Abe








2015-06-12  Sebastian Pop  s@samsung.com
 Abe Skolnik  a.skol...@samsung.com

 PR tree-optimization/46029
 * tree-data-ref.c (struct data_ref_loc_d): Moved...
 (get_references_in_stmt): Exported.
 * tree-data-ref.h (struct data_ref_loc_d): ... here.
 (get_references_in_stmt): Declared.

 * tree-if-conv.c (struct ifc_dr): Removed.
 (IFC_DR): Removed.
 (DR_WRITTEN_AT_LEAST_ONCE): Removed.
 (DR_RW_UNCONDITIONALLY): Removed.
 (memrefs_read_or_written_unconditionally): Removed.
 (write_memrefs_written_at_least_once): Removed.
 (ifcvt_could_trap_p): Does not take refs parameter anymore.
 (ifcvt_memrefs_wont_trap): Removed.
 (has_non_addressable_refs): New.
 (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
 Removed use of refs.
 (if_convertible_stmt_p): Removed use of refs.
 (if_convertible_gimple_assign_stmt_p): Same.
 (if_convertible_loop_p_1): Removed use of refs.  Remove initialization
 of dr-aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
 (insert_address_of): New.
 (create_scratchpad): New.
 (create_indirect_cond_expr): New.
 (predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
 parameter for scratch_pad.
 (combine_blocks): Same.
 (tree_if_conversion): Same.

 testsuite/
 * g++.dg/tree-ssa/ifc-pr46029.C: New.
 * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
 * gcc.dg/tree-ssa/ifc-8.c: New.
 * gcc.dg/tree-ssa/ifc-9.c: New.
 * gcc.dg/tree-ssa/ifc-10.c: New.
 * gcc.dg/tree-ssa/ifc-11.c: New.
 * gcc.dg/tree-ssa/ifc-12.c: New.
 * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.




diff --git a/gcc/common.opt b/gcc/common.opt
index 6b2ccbc..49f6b9f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1413,7 +1413,7 @@ Common Report Var(flag_tree_loop_if_convert)
Init(-1) Optimization
  Convert conditional jumps in innermost loops to branchless equivalents

  ftree-loop-if-convert-stores
-Common Report Var(flag_tree_loop_if_convert_stores) Optimization
+Common Report Var(flag_tree_loop_if_convert_stores) Init(-1) Optimization
  Also if-convert conditional jumps containing memory writes

  ; -finhibit-size-directive inhibits output of .size for ELF.
I don't see this change mentioned anywhere in the ChangeLog.  That seems 
to be a relatively common situation.  I called out some of those issues, 
but didn't try to catch them all.  Please make sure all changes are 
reflected in the ChangeLog.






diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
index f392fbe..775fcd5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c

This change isn't mentioned in the ChangeLog.


 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c

b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index 875d2d3..fc69ca2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options -c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats {
target *-*-* } } */
+/* { dg-options -c -O2 -ftree-vectorize -ftree-loop-if-convert-stores
-fdump-tree-ifcvt-stats { target *-*-* } } */
ISTM this really should be two tests, one with this code as-is, another 
that exactly matches the ffmpeg kernel.






diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
index 11e9533..cbd3378 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
I don't see this mentioned in the ChangeLog.  It also doesn't look like 
you actually disabled the test.  Obviously this will need to be 
addressed before your patch could go in.



diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
index 180b490..aedc66a 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c

Not mentioned in the ChangeLog.   xfail needs to be fixed.

Similarly for the others were you added xfails.



diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e6ff4ef..2a1e27d 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -4363,22 +4363,11 @@

Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-08-03 Thread Joseph Myers

On Mon, 3 Aug 2015, Michael Meissner wrote:

 The intention of theese changes (currently unwritten) is to change the 
 existing
 problematical names that use TF in their name to be something else, and 
 provide
 via a weak reference an alias for the old name. So if for example, we change
 the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses 
 say
 __gcc_ltoq to call convert a 64-bit integer to IBM extended double instead of
 __floatditf. Older code that refers to __floatditf would still work fine.

But as those names are in the implementation namespace and aren't 
user-visible, I don't see the point in such a change (other than as part 
of a complete new incompatible ABI which gets its own copies of libgcc, 
libstdc++, libc, libm and other affected libraries).  Of course it *can* 
be done via symbol versioning to keep working with existing binaries / 
shared libraries using the existing symbols from libgcc, but the libgcc 
build system doesn't make that sort of target-specific symbol versioning 
particularly convenient.  And as usual with symbol versioning, you'd break 
compatibility with existing static libraries / .o files.

-- 
Joseph S. Myers
jos...@codesourcery.com

libgo patch committed: Fix type of RawSockaddr Data field for PPC

2015-08-03 Thread Ian Lance Taylor

The gc compiler determines the type of the Data field in the
syscall.RawSockaddr type by looking at the system header files.  The
gccgo library just has a fixed definition for it.  Unfortunately the
fixed definition does not match the header files on PPC64 systems.
This is https://golang.org/issue/11469 .  This patch from Lynn Boger
fixes this.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI

2015-08-03 Thread Richard Biener

On Sun, Aug 2, 2015 at 9:40 PM, Patrick Palka patr...@parcs.ath.cx wrote:
 On Sun, Aug 2, 2015 at 2:44 PM, Bernhard Reutner-Fischer
 rep.dot@gmail.com wrote:
 On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka patr...@parcs.ath.cx 
 wrote:

 No comment on the patch itself, but please
 s/visted/visited/
 s/VISIED_PHIS/VISITED_PHIS/

 while at it.
 TIA,


Ok.

Thanks,
Richard.

 Sure, no problem.

Re: [PATCH][AArch64][10/14] Implement target pragmas

2015-08-03 Thread Kyrill Tkachov



On 24/07/15 09:36, Kyrill Tkachov wrote:

On 21/07/15 17:52, James Greenhalgh wrote:

On Thu, Jul 16, 2015 at 04:21:05PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch implements target pragmas for aarch64.
The pragmas accepted are the same as for target attributes (as required).
In addition pragmas will need to redefine the target-specific preprocessor
macros if appropriate.

A new file: aarch64-c.c is added and the code from TARGET_CPU_CPP_BUILTINS is 
moved there
and split up into the unconditional parts that are always defined and the 
conditional stuff
that depends on certain architectural features.  The pragma processing code 
calls that
to redefine preprocessor macros on the fly.
The implementation is similar to the rs6000 one.

With target pragmas implemented, we can use them in the arm_neon.h and 
arm_acle.h headers to
specify the architectural features required for those intrinsics, rather than 
#ifdef'ing them
out when FP/SIMD is not available from the command line.

We need to do this in order to handle cases where the user compiles a file with 
-mgeneral-regs-only
but has a function tagged with +simd and tries to use the arm_neon.h intrinsics.
Tests and documentation comes as a separate patch later on in the series

Bootstrapped and tested on aarch64.

Ok for trunk?

A couple of ChangeLog nits and some comments below.


2015-07-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * config.gcc (aarch64*-*-*): Specify c_target_objs and cxx_target_objs.
   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS):

This should say


   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): New.

Presumably (or maybe Define.).



   (TARGET_CPU_CPP_BUILTINS): Redefine to call aarch64_cpu_cpp_builtins.
   * config/aarch64/aarch64.c (aarch64_override_options_internal): Remove
   static keyword.
   (aarch64_reset_previous_fndecl): New function.
   * config/aarch64/aarch64-c.c: New file.
   * config/aarch64/arm_acle.h: Add pragma +crc+nofp at the top.
   Push and pop options at beginning and end.  Remove ifdef
   __ARM_FEATURE_CRC32.
   * config/aarch64/arm_neon.h: Remove #ifdef check on __ARM_NEON.
   Add pragma arch=armv8-a+simd and +crypto where appropriate.
   * config/aarch64/t-aarch64 (aarch64-c.o): New rule.

I don't see a ChangeLog entry for these hunks:


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 3a5482d..4704736 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -360,6 +360,10 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum 
machine_mode, RTX_CODE);
   #endif /* RTX_CODE */

   void aarch64_init_builtins (void);
+
+bool aarch64_process_target_attr (tree, const char*);
+void aarch64_override_options_internal (struct gcc_options *);
+
   rtx aarch64_expand_builtin (tree exp,
  rtx target,
  rtx subtarget ATTRIBUTE_UNUSED,
@@ -376,6 +380,9 @@ extern void aarch64_split_combinev16qi (rtx operands[3]);
   extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
   extern bool aarch64_madd_needs_nop (rtx_insn *);
   extern void aarch64_final_prescan_insn (rtx_insn *);
+extern void aarch64_reset_previous_fndecl (void);
+extern void aarch64_cpu_cpp_builtins (cpp_reader *);
+extern void aarch64_register_pragmas (void);
   extern bool
   aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
   bool aarch64_handle_option (struct gcc_options *, struct gcc_options *,




+static bool
+aarch64_pragma_target_parse (tree args, tree pop_target)
+{
+
+  bool ret;
+
+  /* If args is not NULL then process it and setup the target-specific
+ information that it specifies.  */
+  if (args)
+{
+  ret = aarch64_process_target_attr (args, pragma);
+  if (ret)
+aarch64_override_options_internal (global_options);

RET must equal true.


+  else
+return false;

Early return of false closes the other control path here.


+}
+
+  /* args is NULL, restore to the state described in pop_target.  */
+  else
+{
+  pop_target = pop_target ? pop_target : target_option_default_node;
+  cl_target_option_restore (global_options,
+TREE_TARGET_OPTION (pop_target));
+  ret = true;
+}

Therefore RET must equal true here.


+
+  target_option_current_node
+= build_target_option_node (global_options);
+
+  aarch64_reset_previous_fndecl ();
+  /* For the definitions, ensure all newly defined macros are considered
+ as used for -Wunused-macros.  There is no point warning about the
+ compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts-warn_unused_macros;
+  cpp_opts-warn_unused_macros = 0;
+
+  aarch64_update_cpp_builtins (parse_in);
+
+  cpp_opts-warn_unused_macros = saved_warn_unused_macros;
+
+  return ret;

So we don't need RET

Re: [PATCH][AArch64][3/14] Refactor option override code

2015-08-03 Thread James Greenhalgh

On Fri, Jul 24, 2015 at 09:21:46AM +0100, Kyrill Tkachov wrote:
 
 Thanks for the review, here's an updated version.
 In the end, I chose to retain the use alloca (other patches in the series
 are reworked to use it too).
 
 How's this?

A nit or two from code you were moving or that got caught up in the diff,
but otherwise OK to commit with the issues highlighted below fixed up.

  
 -  aarch64_build_bitmask_table ();
 +/* Validate a command-line -mcpu option.  Parse the cpu and extensions (if 
 any)
 +   specified in STR and throw errors if appropriate.  Put the results if
 +   they are valid in RES and ISA_FLAGS.  */
 +static void
 +aarch64_validate_mcpu (const char *str, const struct processor **res,
 +unsigned long *isa_flags)

Newline between comment and function.

 +/* Validate a command-line -march option.  Parse the arch and extensions
 +   (if any) specified in STR and throw errors if appropriate.  Put the
 +   results, if they are valid, in RES and ISA_FLAGS.  */
 +static void
 +aarch64_validate_march (const char *str, const struct processor **res,
 +unsigned long *isa_flags)

And here.

 +/* Validate a command-line -mtune option.  Parse the cpu
 +   specified in STR and throw errors if appropriate.  Put the
 +   result, if it is valid, in RES.  */
 +static void
 +aarch64_validate_mtune (const char *str, const struct processor **res)

And here.

Thanks,
James

[PATCH] Improve genmatch code-gen

2015-08-03 Thread Richard Biener


Since we allow arbitrary IDs in captures the capture array is dense
and thus we don't need to clear it before initializing.  We can
also emit more compact source by simply using an initializer.

-   tree captures[3] ATTRIBUTE_UNUSED = {};
-   captures[0] = o20;
-   captures[1] = op1;
-   captures[2] = op2;
+   tree captures[3] ATTRIBUTE_UNUSED = { o20, op1, op2 };

this has a surprinsingly big effect on code-size at -O0 as well
(on the positive side).

Boostrap  regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-08-03  Richard Biener  rguent...@suse.de

* genmatch.c (dt_simplify::gen): Create captures array
with an initializer.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226490)
+++ gcc/genmatch.c  (working copy)
@@ -3023,16 +3195,15 @@ dt_simplify::gen (FILE *f, int indent, b
   output_line_directive (f,
 s-result ? s-result-location : s-match-location);
   if (s-capture_max = 0)
-fprintf_indent (f, indent, tree captures[%u] ATTRIBUTE_UNUSED = {};\n,
-   s-capture_max + 1);
-
-  for (int i = 0; i = s-capture_max; ++i)
-if (indexes[i])
-  {
-   char opname[20];
-   fprintf_indent (f, indent, captures[%u] = %s;\n,
-   i, indexes[i]-get_name (opname));
-  }
+{
+  char opname[20];
+  fprintf_indent (f, indent, tree captures[%u] ATTRIBUTE_UNUSED = { %s,
+ s-capture_max + 1, indexes[0]-get_name (opname));
+
+  for (int i = 1; i = s-capture_max; ++i)
+   fprintf (f, , %s, indexes[i]-get_name (opname));
+  fprintf (f,  };\n);
+}
 
   /* If we have a split-out function for the actual transform, call it.  */
   if (info  info-fname)

Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2015-08-03 Thread Maxim Blumental

Could you probably review the patch, please?

2015-07-28 18:42 GMT+03:00 Maxim Blumental bvm...@gmail.com:
  Applied the idea with python script alternative. Review, please.

 2015-07-24 17:18 GMT+03:00 David Malcolm dmalc...@redhat.com:
 On Fri, 2015-07-24 at 10:01 +0200, Jakub Jelinek wrote:
 #!/usr/bin/python
 import sys
 with open(sys.argv[1],rb) as f:
 nextblock = f.read(12)
 while 1:
 block = nextblock
 nextblock = f.read(12)
 if block == :
 break
 str = 
 for ch in block:
 if str == :
 str =   
 else:
 str += , 
 if ord(ch)  10:
 str += 0x0 + chr(ord('0')+ord(ch))
 elif ord(ch)  16:
 str += 0x0 + chr(ord('a')+ord(ch)-10)
 else:
 str += hex(ord(ch))
 if nextblock != :
 str += ,
 print str

 python ./xxd.py $  $@
 does the same thing as
 cat $ | xxd -include  $@
 (CCing David as python expert, my python knowledge is limited and
 15 years old, not sure how portable this is (python 2 vs. python 3,
 and
 even python 2 minimal versions)).

 It doesn't work with Python 3 for various reasons (print syntax, and
 str vs bytes issues).

 I'm attaching a version which works with both Python 2 and Python 3
 (2.7.5 and 3.3.2 were the versions I tried).

 It ought to work with much older python 2 versions (as your script
 appears to), but I don't have them handy.

 Presumably it would need a license header and some descriptive comments.

 (snip)

 Dave



 --


 -
 Sincerely yours,
 Maxim Blumental



-- 


-
Sincerely yours,
Maxim Blumental

Re: [gomp4, committed] Handle double reduction in oacc kernels pass group

2015-08-03 Thread Thomas Schwinge

Hi!

On Tue, 28 Jul 2015 10:20:39 +0200, Tom de Vries tom_devr...@mentor.com wrote:
 this patch adds a test-case with a double reduction in an oacc kernels 
 region.
 
 In order to get it in the proper shape for parloops to deal with, I 
 needed to repeat the pass_lim/pass_copy_prop sequence.
 
 Bootstrapped and reg-tested on x86_64.
 
 Committed to gomp-4_0-branch.

Thanks!

 --- a/gcc/passes.def
 +++ b/gcc/passes.def
 @@ -96,6 +96,8 @@ along with GCC; see the file COPYING3.  If not see
 NEXT_PASS (pass_tree_loop_init);
 NEXT_PASS (pass_lim);
 NEXT_PASS (pass_copy_prop);
 +   NEXT_PASS (pass_lim);
 +   NEXT_PASS (pass_copy_prop);
 NEXT_PASS (pass_scev_cprop);
 NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 NEXT_PASS (pass_expand_omp_ssa);

Very much in line with the older r83,
http://news.gmane.org/find-root.php?message_id=%3C871tjd1ch2.fsf%40kepler.schwinge.homeip.net%3E,
now committed to gomp-4_0-branch in r226494:

commit 3f2d750170034024232cf619f1ba6b703e37bc89
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Aug 3 10:31:23 2015 +

Update testsuite for new pass_lim

... that got added in r226305.

gcc/testsuite/
* c-c++-common/restrict-2.c: Update for new pass_lim.
* c-c++-common/restrict-4.c: Same.
* g++.dg/tree-ssa/pr33615.C: Same.
* g++.dg/tree-ssa/restrict1.C: Same.
* gcc.dg/tm/pub-safety-1.c: Same.
* gcc.dg/tm/reg-promotion.c: Same.
* gcc.dg/tree-ssa/20050314-1.c: Same.
* gcc.dg/tree-ssa/loop-32.c: Same.
* gcc.dg/tree-ssa/loop-33.c: Same.
* gcc.dg/tree-ssa/loop-34.c: Same.
* gcc.dg/tree-ssa/loop-35.c: Same.
* gcc.dg/tree-ssa/loop-7.c: Same.
* gcc.dg/tree-ssa/pr23109.c: Same.
* gcc.dg/tree-ssa/restrict-3.c: Same.
* gcc.dg/tree-ssa/restrict-5.c: Same.
* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
* gcc.dg/tree-ssa/structopt-1.c: Same.
* gfortran.dg/pr32921.f: Same.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226494 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp|   30 +++
 gcc/testsuite/c-c++-common/restrict-2.c |4 ++--
 gcc/testsuite/c-c++-common/restrict-4.c |4 ++--
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C |4 ++--
 gcc/testsuite/g++.dg/tree-ssa/restrict1.C   |4 ++--
 gcc/testsuite/gcc.dg/tm/pub-safety-1.c  |4 ++--
 gcc/testsuite/gcc.dg/tm/reg-promotion.c |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-32.c |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-33.c |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-34.c |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-35.c |6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr23109.c |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c  |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c   |6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c   |4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c |4 ++--
 gcc/testsuite/gfortran.dg/pr32921.f |4 ++--
 28 files changed, 86 insertions(+), 56 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 04286dd..33519ef 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,33 @@
+2015-08-03  Thomas Schwinge  tho...@codesourcery.com
+
+   * c-c++-common/restrict-2.c: Update for new pass_lim.
+   * c-c++-common/restrict-4.c: Same.
+   * g++.dg/tree-ssa/pr33615.C: Same.
+   * g++.dg/tree-ssa/restrict1.C: Same.
+   * gcc.dg/tm/pub-safety-1.c: Same.
+   * gcc.dg/tm/reg-promotion.c: Same.
+   * gcc.dg/tree-ssa/20050314-1.c: Same.
+   * gcc.dg/tree-ssa/loop-32.c: Same.
+   * gcc.dg/tree-ssa/loop-33.c: Same.
+   * gcc.dg/tree-ssa/loop-34.c: Same.

[PATCH] Fix PR66917

2015-08-03 Thread Richard Biener


The following patch fixes PR66917 - the vectorizer assumes element
alignment even for unaligned accesses, which in case of packed
ones is not a valid assumption.  This breaks testcases on arm
at least which can do unaligned (but not completely unaligned)
loads and stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk 
sofar.

Richard.

2015-08-03  Richard Biener  rguent...@suse.de

PR tree-optimization/66917
* tree-vectorizer.h (struct dataref_aux): Add base_element_aligned
field.
(DR_VECT_AUX): New macro.
(set_dr_misalignment): Adjust.
(dr_misalignment): Likewise.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Compute whether the base is at least element aligned.
* tree-vect-stmts.c (ensure_base_align): Adjust.
(vectorizable_store): If the base is not element aligned
preserve alignment of the original access if misalignment is unknown.
(vectorizable_load): Likewise.


Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 226396)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -707,11 +707,16 @@ typedef struct _stmt_vec_info {
 #define STMT_SLP_TYPE(S)   (S)-slp_type
 
 struct dataref_aux {
-  tree base_decl;
-  bool base_misaligned;
   int misalignment;
+  /* If true the alignment of base_decl needs to be increased.  */
+  bool base_misaligned;
+  /* If true we know the base is at least vector element alignment aligned.  */
+  bool base_element_aligned;
+  tree base_decl;
 };
 
+#define DR_VECT_AUX(dr) ((dataref_aux *)(dr)-aux)
+
 #define VECT_MAX_COST 1000
 
 /* The maximum number of intermediate steps required in multi-step type
@@ -910,14 +915,13 @@ destroy_cost_data (void *data)
   targetm.vectorize.destroy_cost_data (data);
 }
 
-
 /*-*/
 /* Info on data references alignment.  */
 /*-*/
 inline void
 set_dr_misalignment (struct data_reference *dr, int val)
 {
-  dataref_aux *data_aux = (dataref_aux *) dr-aux;
+  dataref_aux *data_aux = DR_VECT_AUX (dr);
 
   if (!data_aux)
 {
@@ -931,8 +935,7 @@ set_dr_misalignment (struct data_referen
 inline int
 dr_misalignment (struct data_reference *dr)
 {
-  gcc_assert (dr-aux);
-  return ((dataref_aux *) dr-aux)-misalignment;
+  return DR_VECT_AUX (dr)-misalignment;
 }
 
 /* Reflects actual alignment of first access in the vectorized loop,
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 226396)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -622,7 +622,6 @@ vect_compute_data_ref_alignment (struct
   tree ref = DR_REF (dr);
   tree vectype;
   tree base, base_addr;
-  bool base_aligned;
   tree misalign = NULL_TREE;
   tree aligned_to;
   unsigned HOST_WIDE_INT alignment;
@@ -698,6 +697,19 @@ vect_compute_data_ref_alignment (struct
}
 }
 
+  /* To look at alignment of the base we have to preserve an inner MEM_REF
+ as that carries alignment information of the actual access.  */
+  base = ref;
+  while (handled_component_p (base))
+base = TREE_OPERAND (base, 0);
+  if (TREE_CODE (base) == MEM_REF)
+base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
+  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
+  unsigned int base_alignment = get_object_alignment (base);
+
+  if (base_alignment = TYPE_ALIGN (TREE_TYPE (vectype)))
+DR_VECT_AUX (dr)-base_element_aligned = true;
+
   alignment = TYPE_ALIGN_UNIT (vectype);
 
   if ((compare_tree_int (aligned_to, alignment)  0)
@@ -713,21 +725,7 @@ vect_compute_data_ref_alignment (struct
   return true;
 }
 
-  /* To look at alignment of the base we have to preserve an inner MEM_REF
- as that carries alignment information of the actual access.  */
-  base = ref;
-  while (handled_component_p (base))
-base = TREE_OPERAND (base, 0);
-  if (TREE_CODE (base) == MEM_REF)
-base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
-  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
-
-  if (get_object_alignment (base) = TYPE_ALIGN (vectype))
-base_aligned = true;
-  else
-base_aligned = false;
-
-  if (!base_aligned)
+  if (base_alignment  TYPE_ALIGN (vectype))
 {
   /* Strip an inner MEM_REF to a bare decl if possible.  */
   if (TREE_CODE (base) == MEM_REF
@@ -757,8 +755,9 @@ vect_compute_data_ref_alignment (struct
   dump_printf (MSG_NOTE, \n);
 }
 
-  ((dataref_aux *)dr-aux)-base_decl = base;
-  ((dataref_aux *)dr-aux)-base_misaligned = true;
+  DR_VECT_AUX (dr)-base_decl = base;
+  DR_VECT_AUX (dr)-base_misaligned = true;
+  DR_VECT_AUX (dr)-base_element_aligned = true;
 }
 
   /* If this is a

Re: [PATCH] Unswitching outer loops.

2015-08-03 Thread Richard Biener

On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Hi Richard,

 I learned your updated patch for 23825 and it is more general in
 comparison with my.
 I'd like to propose you a compromise - let's consider my patch only
 for force-vectorize outer loop only to allow outer-loop
 vecctorization.

I don't see why we should special-case that if the approach in 23825
is sensible.

 Note that your approach will not hoist invariant
 guards if loops contains something else except for inner-loop, i.e. it
 won't be empty for taken branch.

Yes, it does not perform unswitching but guard hoisting.  Note that this
is originally Zdenek Dvoraks patch.

 I also would like to answer on your last question - CFG cleanup is
 invoked to perform deletion of single-argument phi nodes from tail
 block through substitution - such phi's prevent outer-loop
 vectorization. But it is clear that such transformation can be done
 other pass.

Hmm, I wonder why the copy_prop pass after unswitching does not
get rid of them?

 What is your opinion?

My opinion is that if we want to enhance unswitching to catch this
(or similar) cases then we should make it a lot more general than
your pattern-matching approach.  I see nothing that should prevent
us from considering unswitching non-innermost loops in general.
It should be only a cost consideration to not do non-innermost loop
unswitching (in addition to maybe a --param specifying the maximum
depth of a loop nest to unswitch).

So my first thought when seeing your patch still holds - the patch
looks very much too specific.

Richard.

 Yuri.

 2015-07-28 13:50 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Hi Richard,

 I checked that both test-cases from 23855 are sucessfully unswitched
 by proposed patch. I understand that it does not catch deeper loop
 nest as
for (i=0; i10; i++)
  for (j=0;jn;j++)
 for (k=0;k20;k++)
   ...
 but duplication of middle-loop does not look reasonable.

 Here is dump for your second test-case:

 void foo(int *ie, int *je, double *x)
 {
   int i, j;
   for (j=0; j*je; ++j)
 for (i=0; i*ie; ++i)
   x[i+j] = 0.0;
 }
 grep -i unswitch t6.c.119t.unswitch
 ;; Unswitching outer loop

 I was saying that why go with a limited approach when a patch (in
 unknown state...)
 is available that does it more generally?  Also unswitching is quite
 expensive compared
 to moving the invariant condition.

 In your patch:

 +  if (!nloop-force_vectorize)
 +nloop-force_vectorize = true;
 +  if (loop-safelen != 0)
 +nloop-safelen = loop-safelen;

 I see no guard on force_vectorize so = true looks bogus here.  Please just 
 use
 copy_loop_info.

 +  if (integer_nonzerop (cond_new))
 +gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
 +  else if (integer_zerop (cond_new))
 +gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);

 gimple_cond_make_true/false (cond_stmt);

 btw, seems odd that we have to recompute which loop is the true / false 
 variant
 when we just fed a guard condition to loop_version.  Can't we statically
 determine whether loop or nloop has the in-loop condition true or false?

 +  /* Clean-up cfg to remove useless one-argument phi in exit block of
 + outer-loop.  */
 +  cleanup_tree_cfg ();

 I know unswitching is already O(number-of-unswitched-loops * 
 size-of-function)
 because it updates SSA form after each individual unswitching (and it does 
 that
 because it invokes itself recursively on unswitched loops).  But do you 
 really
 need to invoke CFG cleanup here?

 Richard.

 Yuri.

 2015-07-14 14:06 GMT+03:00 Richard Biener richard.guent...@gmail.com:
 On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev ysrum...@gmail.com 
 wrote:
 Hi All,

 Here is presented simple transformation which tries to hoist out of
 outer-loop a check on zero trip count for inner-loop. This is very
 restricted transformation since it accepts outer-loops with very
 simple cfg, as for example:
 acc = 0;
for (i = 1; i = m; i++) {
   for (j = 0; j  n; j++)
  if (l[j] == i) { v[j] = acc; acc++; };
   acc = 1;
}

 Note that degenerative outer loop (without inner loop) will be
 completely deleted as dead code.
 The main goal of this transformation was to convert outer-loop to form
 accepted by outer-loop vectorization (such test-case is also included
 to patch).

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 I think this is

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855

 as well.  It has a patch adding a invariant loop guard hoisting
 phase to loop-header copying.  Yeah, it needs updating to
 trunk again I suppose.  It's always non-stage1 when I come
 back to that patch.

 Your patch seems to be very specific and only handles outer
 loops of innermost loops.

 Richard.

 ChangeLog:
 2015-07-10  Yuri Rumyantsev  ysrum...@gmail.com

 *

Re: [ARM] Optimize compare against smin/umin

2015-08-03 Thread Christophe Lyon

Hi Michael,

It looks like this patch introduces regressions on armeb in:
gcc.dg/vect/vect-reduc-7.c
gcc.dg/vect/vect-reduc-8.c
gcc.dg/vect/vect-reduc-9.c

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/226476/report-build-info.html
for a bit more details.

Can you have a look?

Thanks,

Christophe.


On 27 July 2015 at 10:17, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi Michael,

 On 26/07/15 23:54, Michael Collison wrote:

 Here is an updated patch that addresses the issues you mentioned:

 2015-07-24  Michael Collison  michael.colli...@linaro.org
   Ramana Radhakrishnan ramana.radhakrish...@arm.com

 * gcc/config/arm/arm.md (*arm_smin_cmp): New pattern.
 (*arm_umin_cmp): Likewise.
 * gcc.target/arm/mincmp.c: Test min compare idiom.


 Just New test. would be a better ChangeLog entry for this.



 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 0be70a8..361c292 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -3455,6 +3455,44 @@
   (set_attr type multiple,multiple)]
)

 +;; t = (s/u)min (x, y)
 +;; cc = cmp (t, z)
 +;; is the same as
 +;; cmp x, z
 +;; cmpge(u) y, z
 +
 +(define_insn_and_split *arm_smin_cmp
 +  [(set (reg:CC CC_REGNUM)
 +(compare:CC
 + (smin:SI (match_operand:SI 0 s_register_operand r)
 +  (match_operand:SI 1 s_register_operand r))
 + (match_operand:SI 2 s_register_operand r)))]
 +  TARGET_32BIT
 +  #
 +   reload_completed
 +  [(set (reg:CC CC_REGNUM)
 +(compare:CC (match_dup 0) (match_dup 2)))
 +   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
 +  (set (reg:CC CC_REGNUM)
 +   (compare:CC (match_dup 1) (match_dup 2]
 +)
 +
 +(define_insn_and_split *arm_umin_cmp
 +  [(set (reg:CC CC_REGNUM)
 +(compare:CC
 + (umin:SI (match_operand:SI 0 s_register_operand r)
 +  (match_operand:SI 1 s_register_operand r))
 + (match_operand:SI 2 s_register_operand r)))]
 +  TARGET_32BIT
 +  #
 +   reload_completed
 +  [(set (reg:CC CC_REGNUM)
 +(compare:CC (match_dup 0) (match_dup 2)))
 +   (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0))
 +  (set (reg:CC CC_REGNUM)
 +   (compare:CC (match_dup 1) (match_dup 2]
 +)
 +
(define_expand umaxsi3
  [(parallel [
(set (match_operand:SI 0 s_register_operand )
 diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c
 b/gcc/testsuite/gcc.target/arm/mincmp.c
 new file mode 100644
 index 000..2a55c6d
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/mincmp.c
 @@ -0,0 +1,18 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 } */
 +/* { dg-require-effective-target arm32 } */
 +
 +#define min(x, y) ((x) = (y)) ? (x) : (y)
 +
 +unsigned int foo (unsigned int i, unsigned int x ,unsigned int y)


 Comma after unsigned int x in the wrong place.

 +{
 +  return i  (min (x, y));
 +}
 +
 +int bar (int i, int x, int y)
 +{
 +  return i  (min (x, y));
 +}


 Can you please use GNU style for the testcase.
 New line after return type, function name an arguments on their own line.

 Ok with these changes.

 Thanks,
 Kyrill

[PATCH, PING*2] Track indirect calls for call site information in debug info.

2015-08-03 Thread Pierre-Marie de Rodat


On 07/20/2015 02:45 PM, Pierre-Marie de Rodat wrote:

On PowerPC targets with -mlongcall, most subprogram calls are turned
into indirect calls: the call target is read from a register even though
it is compile-time known. This makes it difficult for machine code
static analysis engines to recover the callee information. The attached
patch is an attempt to help such engines, generating
DW_AT_abstract_origin attributes for all DW_TAG_GNU_call_site we are
interested in.


Ping for the patch submitted in
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01641.html. Thanks!

--
Pierre-Marie de Rodat

Re: C++ delayed folding branch review

2015-08-03 Thread Kai Tietz

2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com:
 On 07/31/2015 05:54 PM, Kai Tietz wrote:

 The STRIP_NOPS-requirement in 'reduced_constant_expression_p' I could
 remove, but for one case in constexpr.  Without folding we don't do
 type-sinking/raising.


 Right.

 So binary/unary operations might be containing cast, which were in the
 past unexpected.


 Why aren't the casts folded away?

On such cast constructs, as for this vector-sample, we can't fold away
the cast chain.  The difference here to none-delayed-folding branch is
that the cast isn't moved out of the plus-expr.  What we see now is
(plus ((vec) (const vector ...) {  }), ...).  Before we had (vec)
(plus (const vector ...) { ... }).

 On verify_constant we check by reduced_constant_expression_p, if value is
 a constant.  We don't handle here, that NOP_EXPRs are something we want to
 look through here, as it doesn't change anything if this is a constant, or
 not.


 NOPs around constants should have been folded away by the time we get there.

Not in this cases, as the we actually have here a switch from const to
none-const.  So there is an attribute-change, which we can't ignore in
general.  But I agree that for constexpr's we could special case cast
from const to none-const (as required in expressions like const vec v
= v + 1).

 Jason


Kai

Re: [PATCH][AArch64][8/14] Implement TARGET_OPTION_VALID_ATTRIBUTE_P

2015-08-03 Thread James Greenhalgh

On Fri, Jul 24, 2015 at 11:43:32AM +0100, Kyrill Tkachov wrote:
 
 On 21/07/15 16:37, James Greenhalgh wrote:
  On Thu, Jul 16, 2015 at 04:20:59PM +0100, Kyrill Tkachov wrote:
  Hi all,
 
  This patch implements target attribute support via the 
  TARGET_OPTION_VALID_ATTRIBUTE_P hook.
  The aarch64_handle_option function in 
  common/config/aarch64/aarch64-common.c is exported to the
  backend and beefed up a bit.
 
  The target attributes supported by this patch reflect the command-line 
  options that we specified as Save
  earlier in the series.  Explicitly, the target attributes supported are:
 - general-regs-only
 - fix-cortex-a53-835769 and no-fix-cortex-a53-835769
 - cmodel=
 - strict-align
 - omit-leaf-frame-pointer and no-omit-leaf-frame-pointer
 - tls-dialect
 - arch=
 - cpu=
 - tune=
 
  These correspond to equivalent command-line options when prefixed with a 
  '-m'.
  Additionally, this patch supports specifying architectural features, as in 
  the -march and -mcpu options
  by themselves. So, for example we can write:
  __attribute__((target(+simd+crypto)))
  to enable 'simd' and 'crypto' on a per-function basis.
 
  The documentation and tests for this come as a separate patch later after 
  the target pragma support and
  the inlining rules are implemented.
 
  Bootstrapped and tested on aarch64.
 
  In addition to the comments below, you may want to run
  contrib/check_GNU_style.sh on this patch, it shows a number of other
  issues that would be nice to fix.
 
 Thanks, here's the updated patch.
 I used alloca for memory allocation to keep consistent with the rest of 
 aarch64.c,
 fixed the error message issues, spacing and control flow issues you pointed 
 out.
 
 How's this?

 +static bool
 +aarch64_process_one_target_attr (char *arg_str, const char* pragma_or_attr)
 +{
 +  bool ret;
 +  bool invert = false;
 +

snip

 +  ret = false;

I don't think this variable is exactly what you want. Surely as soon
as we see a failure case, we want to return false. Otherwise we could
see junk,junk,ok_stuff and return true.

So I would drop ret and rewrite this to bail out as soon as we see an
error. The downside is we won't catch multiple errors that way.

The other thing to do would be to invert the sense of your flag to something
like failed and only set it where we fail. It depends what behaviour
you're looking for.

 +  for (p_attr = aarch64_attributes; !ret  p_attr-name; p_attr++)
 +{
 +  /* If the names don't match up, or the user has given an argument
 +  to an attribute that doesn't accept one, or didn't give an argument
 +  to an attribute that expects one, fail to match.  */
 +  if (strcmp (str_to_check, p_attr-name) != 0)
 + continue;
 +
 +  bool attr_need_arg_p = p_attr-attr_type == aarch64_attr_custom
 +   || p_attr-attr_type == aarch64_attr_enum;
 +
 +  if (attr_need_arg_p ^ (arg != NULL))
 + {
 +   error (target %s %qs does not accept an argument,
 +   pragma_or_attr, str_to_check);
 +   return false;
 + }
 +
 +  /* If the name matches but the attribute does not allow no- versions
 +  then we can't match.  */
 +  if (invert  !p_attr-allow_neg)
 + {
 +   error (target %s %qs does not allow a negated form,
 +   pragma_or_attr, str_to_check);
 +   return false;
 + }
 +
 +  switch (p_attr-attr_type)
 + {
 + /* Has a custom handler registered.
 +For example, cpu=, arch=, tune=.  */
 +   case aarch64_attr_custom:
 + gcc_assert (p_attr-handler);
 + ret = p_attr-handler (arg, pragma_or_attr);
 + break;
 +
 +   /* Either set or unset a boolean option.  */
 +   case aarch64_attr_bool:
 + {
 +   struct cl_decoded_option decoded;
 +
 +   generate_option (p_attr-opt_num, NULL, !invert,
 +CL_TARGET, decoded);
 +   aarch64_handle_option (global_options, global_options_set,
 +   decoded, input_location);
 +   ret = true;
 +   break;
 + }
 +   /* Set or unset a bit in the target_flags.  aarch64_handle_option
 +  should know what mask to apply given the option number.  */
 +   case aarch64_attr_mask:
 + {
 +   struct cl_decoded_option decoded;
 +   /* We only need to specify the option number.
 +  aarch64_handle_option will know which mask to apply.  */
 +   decoded.opt_index = p_attr-opt_num;
 +   decoded.value = !invert;
 +   aarch64_handle_option (global_options, global_options_set,
 +   decoded, input_location);
 +   ret = true;
 +   break;
 + }
 +   /* Use the option setting machinery to set an option to an enum.  */
 +   case aarch64_attr_enum:
 + {
 +   gcc_assert (arg);
 +   bool valid;
 +   int

Re: [PATCH][ARM] PR target/66731 Fix vnmul insn with -frounding-math

2015-08-03 Thread Kyrill Tkachov



On 28/07/15 11:21, Szabolcs Nagy wrote:

On 24/07/15 12:27, Kyrill Tkachov wrote:

On 24/07/15 12:10, Szabolcs Nagy wrote:

(-a)*b should not be compiled to vnmul a,b with -frounding-math.
Added a new -(a*b) pattern for vnmul and the old one is only
used if !flag_rounding_math.  Updated the costs too.

This is the ARM version of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00300.html

Tested with arm-none-linux-gnueabihf cross compiler.
is this OK?

gcc/Changelog:

2015-07-20  Szabolcs Nagy  szabolcs.n...@arm.com

PR target/66731
* config/arm/arm.md (muldf3negdf_vfp): Handle -frounding-math.
(mulsf3negsf_vfp): Likewise.

This entry is misleading. You disable two existing patterns
for flag_rounding_math and you add two new patterns. The
entry should reflect that.

..

Can you give the new pattern a different name to reflect that
the neg is on the outside? Something like *negmulsf3_vfp.

..

+/* { dg-options -O2 -mfpu=vfp -mfloat-abi=hard } */

Can you please add an explicit -fno-rounding-math here? That way we get a hint 
as to
why these tests exist. Alternatively, you can rename the tests to be 
pr66731_1.c,
pr66731_2.c etc. That way in the future we'll know what issue they're testing 
for.

..

+float
+foo_s (float a, float b)
+{
+  /* { dg-final { scan-assembler vneg\.f32 } } */
+  /* { dg-final { scan-assembler vmul\.f32 } } */
+  return -a * b;
+}

I'd prefer if you just do a scan-assembler not vnmul, which is what this
patch really fixes. Whether the midend decides to use a pair of vneg+vmul
is tangential to this patch, it's the vnmul that we're trying to avoid.

[v2]:
- used different names for the new patterns
- fixed change log accordingly
- used explicit -fno-rounding-math in tests
- used scan-assembler-not vnmul

(I havent changed the names of the tests to be
consistent with the aarch64 patches but if ppl
prefer prN.c name I can do that.)

gcc/Changelog:

2015-07-28  Szabolcs Nagy  szabolcs.n...@arm.com

PR target/66731
* config/arm/arm.md (negmuldf3_vfp): Add new pattern.
(negmulsf3_vfp): Likewise.
(muldf3negdf_vfp): Disable for -frounding-math.
(mulsf3negsf_vfp): Likewise.
* config/arm/arm.c (arm_new_rtx_costs): Fix NEG cost for VNMUL,
fix MULT cost with -frounding-math.

gcc/testsuite/Changelog:

2015-07-28  Szabolcs Nagy  szabolcs.n...@arm.com

PR target/66731
* gcc.target/arm/vnmul-1.c: New.
* gcc.target/arm/vnmul-2.c: New.
* gcc.target/arm/vnmul-3.c: New.
* gcc.target/arm/vnmul-4.c: New.


This is ok, thanks.

Kyrill

Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-03 Thread Mikael Morin


Le 29/07/2015 17:08, Paul Richard Thomas a écrit :

Dear All,

On 24 July 2015 at 10:08, Damian Rouson dam...@sourceryinstitute.org wrote:

I love this idea and had similar thoughts as well.

:D

Sent from my iPhone


On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas 
paul.richard.tho...@gmail.com wrote:

Dear Mikael,

It had crossed my mind also that a .mod and a .smod file could be
written. Normally, the .smod files are produced by the submodules
themselves, so that their descendants can pick up the symbols that
they generate. There is no reason at all why this could not be
implemented; early on in the development I did just this, although I
think that it would now be easier to modify this patch.

One huge advantage of proceeding in this way is that any resulting
library can be distributed with the .mod file alone so that the
private entities are never exposed. The penalty is that a second file
is output.

With best regards

Paul



Please find attached the implementation of this suggestion.

Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the
original preferred?

There hasn't been a lot of voices about this among the other active and 
less active team members.

I prefer this private members to separate smod variant.
It's OK for trunk as far as I'm concerned.
Thanks.

Mikael

PS: Regarding redundant initializations: rather have too many than too 
few. ;-)

Re: [PATCH][AArch64][9/14] Implement TARGET_CAN_INLINE_P

2015-08-03 Thread James Greenhalgh

On Thu, Jul 23, 2015 at 11:17:20AM +0100, Kyrill Tkachov wrote:
 Thanks, I've implemented the suggestions.
 Re-bootstrapped and tested on aarch64.
 How's this?

This is good. OK for trunk.

Thanks,
James

Re: [gomp4] openacc default handling

2015-08-03 Thread Thomas Schwinge

Hi!

On Wed, 29 Jul 2015 19:13:01 -0400, Nathan Sidwell nat...@acm.org wrote:
   gcc/
   * gimplify.c (oacc_default_clause): Outer scope searching moved to
   omp_notice_variable.
   (omp_notice_variable): For OpenACC search enclosing scopes before
   applying default.

Committed to gomp-4_0-branch in r226495:

commit 9446d5a23e74653407f079188e8be4bc9343e15e
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Aug 3 11:06:38 2015 +

Resolve bootstrap failure in oacc_default_clause

[...]/source-gcc/gcc/gimplify.c: In function 'unsigned int 
oacc_default_clause(gimplify_omp_ctx*, tree, bool, unsigned int)':
[...]/source-gcc/gcc/gimplify.c:5913:13: error: unused parameter 
'in_code' [-Werror=unused-parameter]
bool in_code, unsigned flags)
 ^

Fixup for r226373.

gcc/
* gimplify.c (oacc_default_clause): Remove in_code formal
parameter.  Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226495 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |5 +
 gcc/gimplify.c |5 ++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 3b907ea..a30f6a3 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-08-03  Thomas Schwinge  tho...@codesourcery.com
+
+   * gimplify.c (oacc_default_clause): Remove in_code formal
+   parameter.  Adjust all users.
+
 2015-08-02  Nathan Sidwell  nat...@codesourcery.com
 
* omp-low.c (oacc_xform_on_device): New function.
diff --git gcc/gimplify.c gcc/gimplify.c
index 0f8011d..f74f842 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -5909,8 +5909,7 @@ device_resident_p (tree decl)
but not declared in an enclosing clause.  */
 
 static unsigned
-oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl,
-bool in_code, unsigned flags)
+oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
 {
   switch (ctx-default_kind)
 {
@@ -6036,7 +6035,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
}
 
  if (is_oacc)
-   flags = oacc_default_clause (ctx, decl, in_code, flags);
+   flags = oacc_default_clause (ctx, decl, flags);
  else
flags |= GOVD_MAP;
 


Grüße,
 Thomas


signature.asc
Description: PGP signature

[PATCH, PING*2] PR debug/53927: fix value for DW_AT_static_link

2015-08-03 Thread Pierre-Marie de Rodat


On 07/20/2015 09:39 AM, Pierre-Marie de Rodat wrote:

This patch fixes the static link description in DWARF to comply with the
specification. In order to do so, it appends a field to all FRAME
objects to hold the frame base address (DW_AT_frame_base) so that the
nested subrograms can directly reference this field.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927 for the context
(in particular why we need this additional field in FRAME objects).


Ping for the patch submitted in
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01629.html. Thanks!

--
Pierre-Marie de Rodat

Re: [PATCH] Use single shared memory block pool for all pool allocators

2015-08-03 Thread Mikhail Maltsev

On Jul 26, 2015, at 11:50 AM, Andi Kleen a...@firstfloor.org wrote:
 I've been compiling gcc with tcmalloc to do a similar speedup. It would be
 interesting to compare that to your patch.
I repeated the test with TCMalloc and jemalloc. TCMalloc shows nice results,
though it required some tweaks: this allocator has a threshold block size equal
to 32 KB, larger blocks are allocated from global heap, rather than thread cache
(and this operation is expensive), so the original patch shows worse performance
when used with TCMalloc. In order to fix this, I reduced the block size to 8 KB.
Here there are 5 columns for each value: pristine version, pristine version +
TCMalloc (and the difference in parenthesis), and patched version with TCMalloc
(difference is relative to pristine version). Likewise, for memory usage.

400.perlbench26.86  26.17 (  -2.57%)  26.17 (  -2.57%) user
  0.56   0.64 ( +14.29%)   0.61 (  +8.93%) sys
 27.45  26.84 (  -2.22%)  26.81 (  -2.33%) real
401.bzip2 2.532.5 (  -1.19%)   2.48 (  -1.98%) user
  0.07   0.09 ( +28.57%)0.1 ( +42.86%) sys
  2.612.6 (  -0.38%)   2.59 (  -0.77%) real
403.gcc  73.59  72.62 (  -1.32%)  71.72 (  -2.54%) user
  1.59   1.88 ( +18.24%)   1.88 ( +18.24%) sys
 75.27  74.58 (  -0.92%)  73.67 (  -2.13%) real
429.mcf0.4   0.41 (  +2.50%)0.4 (  +0.00%) user
  0.03   0.05 ( +66.67%)   0.05 ( +66.67%) sys
  0.44   0.47 (  +6.82%)   0.47 (  +6.82%) real
433.milc  3.22   3.24 (  +0.62%)   3.25 (  +0.93%) user
  0.22   0.32 ( +45.45%)0.3 ( +36.36%) sys
  3.48   3.59 (  +3.16%)   3.59 (  +3.16%) real
444.namd  7.54   7.41 (  -1.72%)   7.37 (  -2.25%) user
   0.1   0.15 ( +50.00%)   0.15 ( +50.00%) sys
  7.66   7.58 (  -1.04%)   7.54 (  -1.57%) real
445.gobmk20.24  19.59 (  -3.21%)   19.6 (  -3.16%) user
  0.52   0.67 ( +28.85%)   0.59 ( +13.46%) sys
  20.8  20.29 (  -2.45%)  20.23 (  -2.74%) real
450.soplex   19.08  18.47 (  -3.20%)  18.51 (  -2.99%) user
  0.87   1.11 ( +27.59%)   1.06 ( +21.84%) sys
 19.99  19.62 (  -1.85%)   19.6 (  -1.95%) real
453.povray   42.27  41.42 (  -2.01%)  41.32 (  -2.25%) user
  2.71   3.11 ( +14.76%)   3.09 ( +14.02%) sys
 45.04  44.58 (  -1.02%)  44.47 (  -1.27%) real
456.hmmer 7.27   7.22 (  -0.69%)   7.15 (  -1.65%) user
  0.31   0.36 ( +16.13%)   0.39 ( +25.81%) sys
  7.61   7.61 (  +0.00%)   7.57 (  -0.53%) real
458.sjeng 3.22   3.14 (  -2.48%)   3.15 (  -2.17%) user
  0.09   0.16 ( +77.78%)   0.14 ( +55.56%) sys
  3.32   3.32 (  +0.00%)3.3 (  -0.60%) real
462.libquantum0.86   0.87 (  +1.16%)   0.85 (  -1.16%) user
  0.05   0.08 ( +60.00%)   0.08 ( +60.00%) sys
  0.92   0.96 (  +4.35%)   0.94 (  +2.17%) real
464.h264ref  27.62  27.27 (  -1.27%)  27.16 (  -1.67%) user
  0.63   0.73 ( +15.87%)   0.75 ( +19.05%) sys
 28.28  28.03 (  -0.88%)  27.95 (  -1.17%) real
470.lbm   0.27   0.27 (  +0.00%)   0.27 (  +0.00%) user
  0.01   0.01 (  +0.00%)   0.01 (  +0.00%) sys
  0.29   0.29 (  +0.00%)   0.29 (  +0.00%) real
471.omnetpp  28.29  27.63 (  -2.33%)  27.54 (  -2.65%) user
   1.5   1.57 (  +4.67%)   1.62 (  +8.00%) sys
 29.84  29.25 (  -1.98%)  29.21 (  -2.11%) real
473.astar 1.14   1.12 (  -1.75%)   1.11 (  -2.63%) user
  0.05   0.07 ( +40.00%)   0.09 ( +80.00%) sys
  1.21   1.21 (  +0.00%)1.2 (  -0.83%) real
482.sphinx3   4.65   4.57 (  -1.72%)   4.59 (  -1.29%) user
   0.20.3 ( +50.00%)   0.26 ( +30.00%) sys
  4.88   4.89 (  +0.20%)   4.88 (  +0.00%) real
483.xalancbmk284.5  276.4 (  -2.85%) 276.48 (  -2.82%) user
 20.29  23.03 ( +13.50%)  22.82 ( +12.47%) sys
305.19 299.79 (  -1.77%) 299.67 (  -1.81%) real

400.perlbench 102308kB123004kB  (  +20696kB)116104kB  (  +13796kB)
401.bzip2  74628kB 86936kB  (  +12308kB) 84316kB  (   +9688kB)
403.gcc   190284kB218180kB  (  +27896kB)212480kB  (  +22196kB)
429.mcf19804kB 24464kB  (   +4660kB) 24320kB  (   +4516kB)
433.milc   36940kB 45308kB  (   +8368kB) 44652kB  (   +7712kB)
444.namd  183548kB193856kB  (  +10308kB)192632kB  (   +9084kB)
445.gobmk  73724kB 78792kB  (

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-08-03 Thread Richard Biener

On Fri, Jul 31, 2015 at 8:27 PM, Jeff Law l...@redhat.com wrote:
 On 07/31/2015 12:18 PM, Michael Collison wrote:

 Hi Jeff,

 Yes I will create a test case. I'm not quite sure what to check for even
 in the machine dependent test case. It's quite possible for the
 instructions that are generated to change over time.

 I think we're going to want to look at the gimple IR and search for the
 MIN/MAX expressions rather than the instructions.  Given we don't know where
 the transformation is going to land (yet), you can probably start with
 -fdump-tree-optimized and scanning the .optimized dump.

Yeah.  FWIW I'm ok with the specialized pattern in match.pd.  Indeed phiopt
isn't a good fit here (though on some targets you may end up seeing a PHI node,
dependent on LOGICAL_OP_NON_SHORT_CIRCUIT).  For longer chains
I'd say reassoc is the appropriate place to handle this as it has
already code to
combine s and ||s of conditions.  Maybe you can give it a short try to only
handle it there.

I'm also somewhat worried about code-generation on random targets.  So can
you have a quick look (just for your testcase) what the code-gen difference is
on primary platforms (just build some stage1 cc1s)?

Thanks,
Richard.

 We can still do that and have the test be target specific.

 jeff

Re: Go patch committed: Fix error reporting for invalid builtin calls

2015-08-03 Thread Andreas Schwab

Ian Lance Taylor i...@golang.org writes:

 This patch from Chris Manghane fixes the Go frontend error reporting
 for invalid builtin calls, by not losing track of whether the call is
 erroneous.  This fixes https://golang.org/issue/11561.  Bootstrapped
 and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
 mainline.

../../../libgo/runtime/mprof.goc: In function 'runtime_Stack':
../../../libgo/runtime/mprof.goc:408:5: error: calling 
'__builtin_frame_address' with a nonzero argument is unsafe 
[-Werror=frame-address]
  sp = runtime_getcallersp(b);
 ^

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.

Re: [PATCH] Add __builtin_stack_top to x86 backend

2015-08-03 Thread Uros Bizjak

On Thu, Jul 30, 2015 at 8:41 PM, H.J. Lu hongjiu...@intel.com wrote:
 On Tue, Jul 21, 2015 at 02:45:39PM -0700, H.J. Lu wrote:
 When __builtin_frame_address is used to retrieve the address of the
 function stack frame, the frame pointer is always kept, which wastes one
 register and 2 instructions.  For x86-32, one less register means
 significant negative impact on performance.  This patch adds a new
 builtin function, __builtin_ia32_stack_top, to x86 backend.  It
 returns the stack address when the function is called.

 Any comments, feedbacks?


 Although this function is generic, but implementation is target
 specific.  I submitted a generic patch:

 https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01859.html

 So far there are no interests from other backends.  Here is a patch
 to implement __builtin_stack_top in x86 backend.  We can update x86
 backedn after it is added to middle-end.  OK for trunk?

I think that the discussion about generic implementation should come
to some conclusion first. From the discussion, here was no resolution
on which way to go.

Uros.

[PATCH, PING] DWARF: materialize subprogram renamings in Ada as imported declarations

2015-08-03 Thread Pierre-Marie de Rodat


On 07/25/2015 09:44 PM, Pierre-Marie de Rodat wrote:

This change makes GCC materialize subprogram renamings in Ada as
imported declarations (DW_TAG_imported_declarations). For instance,

 procedure Foo renames Bar;

will output:

 DW_TAG_imported_declaration:
 DW_AT_name: foo
 DW_AT_import: reference to Bar


Ping for the patch submitted in 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02166.html.


--
Pierre-Marie de Rodat

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread H.J. Lu

On Mon, Aug 3, 2015 at 7:04 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 On 03/08/15 13:33, Uros Bizjak wrote:

 Hello!

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
  when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
  explicit.  Prefer to add the flag whenever possible.
  (noce_process_if_block): Try noce_try_store_flag_constants before
  noce_try_cmove.

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * gcc.target/aarch64/csel_bfx_1.c: New test.
  * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

 This patch regressed following tests on x86_64:

 FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
 FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

 The difference for cmov3.c on x86_64 is:

cmpl%esi, %edi
movl$-5, %edx
movl$5, %eax
cmovg   %edx, %eax
ret

 vs. new code:

xorl%eax, %eax
cmpl%esi, %edi
setle   %al
negl%eax
andl$10, %eax
subl$5, %eax
ret

 I'm not sure old code is really better than new. HJ, do you have any
 better insight?

We are looking into it.

-- 
H.J.

Re: [PATCH 2/5] Add preferred_for_{size,speed} attributes

2015-08-03 Thread H.J. Lu

On Fri, Oct 17, 2014 at 7:48 AM, Richard Sandiford
richard.sandif...@arm.com wrote:
 This is the main patch, to add new preferred_for_size and
 preferred_for_speed attributes that can be used to selectively disable
 alternatives when optimising for size or speed.  As explained in the
 docs, the new attributes are just optimisation hints and it is possible
 that size-only alternatives will sometimes end up in a block that's
 optimised for speed, or vice versa.

 The patch deals with code that directly accesses the enabled_attributes
 mask and that ought to take size/speed choices into account.  The next
 patch deals with indirect uses.  Note that I'm not making reload support
 these attributes for hopefully obvious reasons :-)

 Richard


 gcc/
 * doc/md.texi: Document preferred_for_size and preferred_for_speed
 attributes.
 * genattr.c (main): Handle preferred_for_size and
 preferred_for_speed in the same way as enabled.
 * recog.h (bool_attr): New enum.
 (target_recog): Replace x_enabled_alternatives with x_bool_attr_masks.
 (get_preferred_alternatives, check_bool_attrs): Declare.
 * recog.c (have_bool_attr, get_bool_attr, get_bool_attr_mask_uncached)
 (get_bool_attr_mask, get_preferred_alternatives, check_bool_attrs):
 New functions.
 (get_enabled_alternatives): Use get_bool_attr_mask.
 * ira-costs.c (record_reg_classes): Use get_preferred_alternatives
 instead of recog_data.enabled_alternatives.
 * ira.c (ira_setup_alts): Likewise.
 * postreload.c (reload_cse_simplify_operands): Likewise.
 * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise.
 * ira-lives.c (preferred_alternatives): New variable.
 (process_bb_node_lives): Set it.
 (check_and_make_def_conflict, make_early_clobber_and_input_conflicts)
 (single_reg_class, ira_implicitly_set_insn_hard_regs): Use it instead
 of recog_data.enabled_alternatives.
 * lra-int.h (lra_insn_recog_data): Replace enabled_alternatives
 to preferred_alternatives.
 * lra-constraints.c (process_alt_operands): Update accordingly.
 * lra.c (lra_set_insn_recog_data): Likewise.
 (lra_update_insn_recog_data): Assert check_bool_attrs.


This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67029

-- 
H.J.

[PATCH] Build *-match.o as early as possible

2015-08-03 Thread Segher Boessenkool

The two files *-match.o files always finish building last, so if we
start building them as soon as possible (instead of pretty late) the
total build time will be less on a parallel build.

Bootstrapped and tested on powerpc64-linux.  Is this okay for trunk?


Segher


2014-080-3  Segher Boessenkool  seg...@kernel.crashing.org

* Makefile.in (OBJS): Put gimple-match.o and generic-match.o first.

---
 gcc/Makefile.in | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index be259e8..683c42a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1162,10 +1162,12 @@ C_COMMON_OBJS = c-family/c-common.o 
c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o
 
 # Language-independent object files.
-# We put the insn-*.o files first so that a parallel make will build
-# them sooner, because they are large and otherwise tend to be the
-# last objects to finish building.
+# We put the *-match.o and insn-*.o files first so that a parallel make
+# will build them sooner, because they are large and otherwise tend to be
+# the last objects to finish building.
 OBJS = \
+   gimple-match.o \
+   generic-match.o \
insn-attrtab.o \
insn-automata.o \
insn-dfatab.o \
@@ -1260,8 +1262,6 @@ OBJS = \
gimple-fold.o \
gimple-laddress.o \
gimple-low.o \
-   gimple-match.o \
-   generic-match.o \
gimple-pretty-print.o \
gimple-ssa-isolate-paths.o \
gimple-ssa-strength-reduction.o \
-- 
1.8.1.4

[hsa] HSA: introduce more pure kernel information emission.

2015-08-03 Thread Martin Liška

Hello.

Following patch has been installed to HSA branch.

Thanks,
Martin
From 64f7f75a8b16dee2071fef7f546fbf6b06f82c4f Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Fri, 31 Jul 2015 17:53:57 +0200
Subject: [PATCH 1/3] HSA: introduce more pure kernel information emission.

libgomp/ChangeLog:

2015-07-31  Martin Liska  mli...@suse.cz

	* plugin/plugin-hsa.c (struct hsa_kernel_description): New structure.
	(struct brig_image_desc): Use it.
	(struct kernel_info): Add omp_data_size member.
	(struct module_info): Remove unused field.
	(GOMP_OFFLOAD_load_image): Load the newly added format.
	(create_kernel_dispatch): Allocate exactly ammount of memory needed
	for OMP data passing.
	(init_single_kernel): Print the OMP memory.
	hsa.c (init_hsa_image): Use new structure.

gcc/ChangeLog:

2015-07-31  Martin Liska  mli...@suse.cz

	* hsa-brig.c (hsa_output_kernel_mapping): Serialize new data structure.
	* hsa-gen.c (hsa_function_representation::hsa_function_representation):
	Add new field.
	(gen_hsa_insns_for_kernel_call): Calculate maximum amount of memory
	needed for OMP data.
	(generate_hsa): Likewise.
	* hsa.c (hsa_add_kern_decl_mapping): New function.
	(hsa_get_decl_kernel_mapping_omp_size): Likewise.
	* hsa.h: Add declaration of these functions.
---
 gcc/hsa-brig.c  | 231 ++--
 gcc/hsa-gen.c   |  16 ++-
 gcc/hsa.c   |  13 ++-
 gcc/hsa.h   |   8 +-
 libgomp/hsa.c   |  35 +++
 libgomp/plugin/plugin-hsa.c | 107 +---
 6 files changed, 209 insertions(+), 201 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 27c41a5..1cc0b22 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -1753,7 +1753,7 @@ hsa_output_kernel_mapping (tree brig_decl)
   unsigned map_count = hsa_get_number_decl_kernel_mappings ();
 
   tree int_num_of_kernels;
-  int_num_of_kernels = build_int_cst (integer_type_node, (int) map_count);
+  int_num_of_kernels = build_int_cst (uint32_type_node, map_count);
   tree kernel_num_index_type = build_index_type (int_num_of_kernels);
   tree host_functions_array_type = build_array_type (ptr_type_node,
 		 kernel_num_index_type);
@@ -1768,7 +1768,7 @@ hsa_output_kernel_mapping (tree brig_decl)
   tree host_functions_ctor = build_constructor (host_functions_array_type,
 		host_functions_vec);
   char tmp_name[64];
-  ASM_GENERATE_INTERNAL_LABEL (tmp_name, hsa_host_functions, 1);
+  ASM_GENERATE_INTERNAL_LABEL (tmp_name, __hsa_host_functions, 1);
   tree hsa_host_func_table = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 	 get_identifier (tmp_name),
 	 host_functions_array_type);
@@ -1782,115 +1782,161 @@ hsa_output_kernel_mapping (tree brig_decl)
   DECL_INITIAL (hsa_host_func_table) = host_functions_ctor;
   varpool_node::finalize_decl (hsa_host_func_table);
 
-  int len = 0;
-  for (unsigned i = 0; i  map_count; ++i)
-{
-  char *name = hsa_get_decl_kernel_mapping_name (i);
-  /* We add 1 for the terminating zero and 1 for an ampersand prefix.  */
-  len = len + strlen (name) + 2;
-}
-  len++;
+  /* Following code emits list of kernel_info structures.  */
 
-  /* Kernel mapping is a list of string names terminated by '\0'.  */
-  char *buf = XNEWVEC (char, len);
-  char *p = buf;
-  for (unsigned i = 0; i  map_count; ++i)
-{
-  char *name = hsa_get_decl_kernel_mapping_name (i);
-  int ll = strlen (name);
-  gcc_assert (ll  0);
-  *p = '';
-  p++;
-  memcpy (p, name, ll);
-  p += ll;
-  *p = '\0';
-  p++;
-}
-  *p = '\0';
-  tree kern_names = build_string (len, buf);
-  TREE_TYPE (kern_names) = build_array_type (char_type_node,
-	 build_index_type (size_int (len)));
-  free (buf);
+  tree kernel_info_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier (name), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier (omp_data_size),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f2) = id_f1;
+  tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier (kernel_dependencies_count),
+			   unsigned_type_node);
+  DECL_CHAIN (id_f3) = id_f2;
+  tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier (kernel_dependencies),
+			   build_pointer_type (build_pointer_type
+	   (char_type_node)));
+  DECL_CHAIN (id_f4) = id_f3;
+  finish_builtin_struct (kernel_info_type, __hsa_kernel_info, id_f4,
+			 NULL_TREE);
 
-  /* Kernel dependencies is a list of lists, where for a given kernel A all
- direct kernel dispatches to A1, A2, .., An are organized are joined by '.'
- character and each entry in the list is separated by '\0'.  */
+  int_num_of_kernels = build_int_cstu (uint32_type_node, map_count);
+  tree kernel_info_vector_type = build_array_type
+(kernel_info_type, build_index_type

[hsa] HSA: fix kernel initialization for dependent kernels.

2015-08-03 Thread Martin Liška

Hello.

Following patch has been installed to HSA branch.

Thanks,
Martin
From 125ee1ed4123527d4ce841631d4930ac96bec281 Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Mon, 3 Aug 2015 13:00:42 +0200
Subject: [PATCH 2/3] HSA: fix kernel initialization for dependent kernels

libgomp/ChangeLog:

2015-08-03  Martin Liska  mli...@suse.cz

	* plugin/plugin-hsa.c (init_single_kernel): Initialize recursively
	all dependent kernels.
	(init_kernel): Moved from this location.
---
 libgomp/plugin/plugin-hsa.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index 144548a..c05de1e 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -682,6 +682,15 @@ init_single_kernel (struct kernel_info *kernel)
   fprintf (stderr,   omp_data_size: %u\n,
 	   kernel-omp_data_size);
 }
+
+  /* FIXME: do not consider all kernels to live in a same module.  */
+  struct module_info *module = kernel-agent-first_module;
+  for (unsigned i = 0; i  kernel-dependencies_count; i++)
+{
+  struct kernel_info *dependency = get_kernel_in_module
+	(module, kernel-dependencies[i]);
+  init_single_kernel (dependency);
+}
 }
 
 /* Indent stream F by INDENT spaces.  */
@@ -766,16 +775,6 @@ init_kernel (struct kernel_info *kernel)
 
   init_single_kernel (kernel);
 
-  struct agent_info *agent = kernel-agent;
-  struct module_info *module = agent-first_module;
-
-  for (unsigned i = 0; i  kernel-dependencies_count; i++)
-{
-  struct kernel_info *dependency = get_kernel_in_module
-	(module, kernel-dependencies[i]);
-  init_single_kernel (dependency);
-}
-
   if (debug)
 fprintf (stderr, \n);
 
-- 
2.4.6

[hsa] HSA: calculate properly maximum OMP data size which is, necessary.

2015-08-03 Thread Martin Liška

Hello.

Following patch has been installed to HSA branch.

Thanks,
Martin
From 163c6a4bb917ae76dad265dca5762a3bd87c947a Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Mon, 3 Aug 2015 14:35:26 +0200
Subject: [PATCH 3/3] HSA: calculate properly maximum OMP data size which is
 necessary.

libgomp/ChangeLog:

2015-08-03  Martin Liska  mli...@suse.cz

	* plugin/plugin-hsa.c (create_kernel_dispatch): Calculate maximum
	memory necessary for OMP data.
	(init_single_kernel): Likewise.
	(create_kernel_dispatch_recursive): Pass new argument with maximum
	OMP data size.
	(init_kernel): Likewise.
	(GOMP_OFFLOAD_run): Likewise.
---
 libgomp/plugin/plugin-hsa.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index c05de1e..4270245 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -573,17 +573,15 @@ create_and_finalize_hsa_program (struct agent_info *agent)
 /* Create kernel dispatch data structure for given KERNEL.  */
 
 static struct hsa_kernel_dispatch *
-create_kernel_dispatch (struct kernel_info *kernel)
+create_kernel_dispatch (struct kernel_info *kernel, unsigned omp_data_size)
 {
   struct agent_info *agent = kernel-agent;
   struct hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared
 (sizeof (struct hsa_kernel_dispatch));
 
   shadow-queue = agent-command_q;
-
-  /* Compute right size needed for memory allocation.  */
-  shadow-omp_data_memory = GOMP_PLUGIN_malloc (kernel-omp_data_size);
-
+  shadow-omp_data_memory = omp_data_size  0
+? GOMP_PLUGIN_malloc (omp_data_size) : NULL;
   unsigned dispatch_count = kernel-dependencies_count;
   shadow-kernel_dispatch_count = dispatch_count;
 
@@ -634,10 +632,11 @@ release_kernel_dispatch (struct hsa_kernel_dispatch *shadow)
   free (shadow);
 }
 
-/* Initialize a KERNEL without its dependencies.  */
+/* Initialize a KERNEL without its dependencies.  MAX_OMP_DATA_SIZE is used
+   to calculate maximum necessary memory for OMP data allocation.  */
 
 static void
-init_single_kernel (struct kernel_info *kernel)
+init_single_kernel (struct kernel_info *kernel, unsigned *max_omp_data_size)
 {
   hsa_status_t status;
   struct agent_info *agent = kernel-agent;
@@ -683,13 +682,16 @@ init_single_kernel (struct kernel_info *kernel)
 	   kernel-omp_data_size);
 }
 
+  if (kernel-omp_data_size  *max_omp_data_size)
+*max_omp_data_size = kernel-omp_data_size;
+
   /* FIXME: do not consider all kernels to live in a same module.  */
   struct module_info *module = kernel-agent-first_module;
   for (unsigned i = 0; i  kernel-dependencies_count; i++)
 {
   struct kernel_info *dependency = get_kernel_in_module
 	(module, kernel-dependencies[i]);
-  init_single_kernel (dependency);
+  init_single_kernel (dependency, max_omp_data_size);
 }
 }
 
@@ -738,11 +740,14 @@ print_kernel_dispatch (struct hsa_kernel_dispatch *dispatch, unsigned indent)
dependencies.  */
 
 static struct hsa_kernel_dispatch *
-create_kernel_dispatch_recursive (struct kernel_info *kernel)
+create_kernel_dispatch_recursive (struct kernel_info *kernel,
+  unsigned omp_data_size)
 {
   // TODO: find correct module
   struct module_info *module = kernel-agent-first_module;
-  struct hsa_kernel_dispatch *shadow = create_kernel_dispatch (kernel);
+
+  struct hsa_kernel_dispatch *shadow = create_kernel_dispatch (kernel,
+			   omp_data_size);
   shadow-debug = 0;
 
   for (unsigned i = 0; i  kernel-dependencies_count; i++)
@@ -750,7 +755,7 @@ create_kernel_dispatch_recursive (struct kernel_info *kernel)
   struct kernel_info *dependency = get_kernel_in_module
 	(module, kernel-dependencies[i]);
   shadow-children_dispatches[i] = create_kernel_dispatch_recursive
-	(dependency);
+	(dependency, omp_data_size);
 }
 
   return shadow;
@@ -760,7 +765,7 @@ create_kernel_dispatch_recursive (struct kernel_info *kernel)
The function assumes the program has been created, finalized and frozen by
create_and_finalize_hsa_program.  */
 
-void
+unsigned
 init_kernel (struct kernel_info *kernel)
 {
   if (pthread_mutex_lock (kernel-init_mutex))
@@ -770,10 +775,11 @@ init_kernel (struct kernel_info *kernel)
   if (pthread_mutex_unlock (kernel-init_mutex))
 	GOMP_PLUGIN_fatal (Could not unlock an HSA kernel initialization 
 			   mutex);
-  return;
+  return 0;
 }
 
-  init_single_kernel (kernel);
+  unsigned max_omp_data_size = 0;
+  init_single_kernel (kernel, max_omp_data_size);
 
   if (debug)
 fprintf (stderr, \n);
@@ -782,6 +788,8 @@ init_kernel (struct kernel_info *kernel)
   if (pthread_mutex_unlock (kernel-init_mutex))
 GOMP_PLUGIN_fatal (Could not unlock an HSA kernel initialization 
 		   mutex);
+
+  return max_omp_data_size;
 }
 
 /* Part of the libgomp plugin interface.  Run a kernel on a device N and pass
@@ -797,9 +805,9 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr,

Re: [PATCH] Simplify vector compare-not-select sequence

2015-08-03 Thread Bill Schmidt

On Sun, 2015-08-02 at 12:18 +0200, Andreas Schwab wrote:
 Bill Schmidt wschm...@linux.vnet.ibm.com writes:
 
  * gcc.target/powerpc/vec-cmp-sel.c: New test.
 
 FAIL: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors)
 Excess errors:
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:14:1: 
 error: use of 'long long' in AltiVec types is invalid without -mvsx
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:15:3: 
 error: use of 'long long' in AltiVec types is invalid without -mvsx
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:16:3: 
 error: use of 'long long' in AltiVec types is invalid without -mvsx
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:3: 
 error: use of 'long long' in AltiVec types is invalid without -mvsx
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:29: 
 error: incompatible types when initializing type '__vector unsigned long 
 long' using type '__vector __bool long long'
 /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:18:3: 
 error: use of 'long long' in AltiVec types is invalid without -mvsx
 
 Andreas.
 

Hi Andreas,

Can you please verify that this patch works for you?

Thanks,
Bill

Index: gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c  (revision 226505)
+++ gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target powerpc64*-*-* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options -maltivec -O2 } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options -maltivec -O2 -mvsx -mpower8-vector } */
 /* { dg-final { scan-assembler vcmpgtsd } } */
 /* { dg-final { scan-assembler-not xxlnor } } */

Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-08-03 Thread Martin Liška

PING.

Thanks,
Martin

On 07/20/2015 09:51 AM, Martin Liška wrote:
 On 07/16/2015 01:03 PM, Martin Liška wrote:
 On 07/09/2015 06:24 PM, Jeff Law wrote:
 On 07/09/2015 07:56 AM, mliska wrote:
 gcc/ChangeLog:

 2015-07-09  Martin Liska  mli...@suse.cz

 * dbgcnt.def: Add new debug counter.
 * ipa-icf-gimple.c (func_checker::compare_ssa_name): Add flag
 for strict mode.
 (func_checker::compare_memory_operand): Likewise.
 (func_checker::compare_cst_or_decl): Handle if we are in
 tail_merge_mode.
 (func_checker::compare_operand): Pass strict flag properly.
 (func_checker::stmt_local_def): New function.
 (func_checker::compare_phi_node): Move from sem_function class.
 (func_checker::compare_bb_tail_merge): New function.
 (func_checker::compare_bb): Improve STMT iteration.
 (func_checker::compare_gimple_call): Pass strict flag.
 (func_checker::compare_gimple_assign): Likewise.
 (func_checker::compare_gimple_label): Remove unused flag.
 (ssa_names_set): New class.
 (ssa_names_set::build): New function.
 * ipa-icf-gimple.h (func_checker::gsi_next_nonlocal): New
 function.
 (ssa_names_set::contains): New function.
 (ssa_names_set::add): Likewise.
 * ipa-icf.c (sem_function::equals_private): Use transformed
 function.
 (sem_function::compare_phi_node): Move to func_checker class.
 * ipa-icf.h: Add new declarations.
 * tree-ssa-tail-merge.c (check_edges_correspondence): New
 function.
 (find_duplicate): Add usage of IPA ICF gimple infrastructure.
 (find_clusters_1): Pass new sem_function argument.
 (find_clusters): Likewise.
 (tail_merge_optimize): Call IPA ICF comparison machinery.
 So a general question.  We're passing in STRICT to several routines, which 
 is fine.  But then we're also checking M_TAIL_MERGE_MODE.  What's the 
 difference between the two?  Can they be unified?

 Hello.

 I would say that STRICT is a bit generic mechanism that was introduced some 
 time before. It's e.g. used for checking of THIS arguments for methods and 
 make checking
 more sensitive in situations that are somehow special.

 The newly added state is orthogonal to the previous one.




 -/* Verifies that trees T1 and T2 are equivalent from perspective of ICF.  
 */
 +/* Verifies that trees T1 and T2 are equivalent from perspective of ICF.
 +   If STRICT flag is true, versions must match strictly.  */

   bool
 -func_checker::compare_ssa_name (tree t1, tree t2)
 +func_checker::compare_ssa_name (tree t1, tree t2, bool strict)
 This (and other) functions would seem to be used more than just ICF at this 
 point.  A pass over the comments to update them as appropriate would be 
 appreciated.

 @@ -626,6 +648,136 @@ func_checker::parse_labels (sem_bb *bb)
   }
   }

 +/* Return true if gimple STMT is just a local difinition in a
 +   basic block.  Used SSA names are contained in SSA_NAMES_SET.  */
 s/difinition/definition/

 Thanks.


 I didn't find this comment particularly useful in understanding what this 
 function does.  AFAICT the function looks as the uses of the LHS of STMT 
 and verifies they're all in the same block as STMT, right?

 It also verifies that the none of the operands within STMT are part of 
 SSA_NAMES_SET.

 What role do those properties play in the meaning of local definition?

 I tried to explain it more deeply what's the purpose of this function.





 @@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, 
 const gasm *g2)
 return true;
   }

 +void
 +ssa_names_set::build (basic_block bb)
 Needs a function comment.  What are the important names we're collecting 
 here?

 Is a single forward and backward pass really sufficient to find all the 
 important names?

 In the backward pass, do you have to consider things like ASMs?  I guess 
 it's difficult to understand what you need to look at because it's not 
 entirely clear the set of SSA_NAMEs you're building.



 @@ -149,12 +153,20 @@ public:
mapping between basic blocks and labels.  */
 void parse_labels (sem_bb *bb);

 +  /* For given basic blocks BB1 and BB2 (from functions FUNC1 and FUNC),
 + true value is returned if phi nodes are semantically
 + equivalent in these blocks.  */
 +  bool compare_phi_node (sem_bb *sem_bb1, sem_bb *sem_bb2);
 Presumably in the case of tail merging, FUNC1 and FUNC will be the same :-)

 Yes, the function is not called from tail-merge pass.



 /* Verifies that trees T1 and T2 are equivalent from perspective of 
 ICF.  */
 -  bool compare_ssa_name (tree t1, tree t2);
 +  bool compare_ssa_name (tree t1, tree t2, bool strict = true);

 /* Verification function for edges E1 and E2.  */
 bool compare_edge (edge e1, edge e2);
 @@ -204,7 +216,7 @@ public:
 bool compare_tree_ssa_label (tree t1, tree t2);

 /* Function compare for equality given memory operands T1 and T2.  */
 -  bool compare_memory_operand (tree t1, tree t2);
 +  bool

Re: [PATCH][AArch64][8/14] Implement TARGET_OPTION_VALID_ATTRIBUTE_P

2015-08-03 Thread Kyrill Tkachov



On 03/08/15 11:52, James Greenhalgh wrote:

On Fri, Jul 24, 2015 at 11:43:32AM +0100, Kyrill Tkachov wrote:

On 21/07/15 16:37, James Greenhalgh wrote:

On Thu, Jul 16, 2015 at 04:20:59PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch implements target attribute support via the 
TARGET_OPTION_VALID_ATTRIBUTE_P hook.
The aarch64_handle_option function in common/config/aarch64/aarch64-common.c is 
exported to the
backend and beefed up a bit.

The target attributes supported by this patch reflect the command-line options 
that we specified as Save
earlier in the series.  Explicitly, the target attributes supported are:
- general-regs-only
- fix-cortex-a53-835769 and no-fix-cortex-a53-835769
- cmodel=
- strict-align
- omit-leaf-frame-pointer and no-omit-leaf-frame-pointer
- tls-dialect
- arch=
- cpu=
- tune=

These correspond to equivalent command-line options when prefixed with a '-m'.
Additionally, this patch supports specifying architectural features, as in the 
-march and -mcpu options
by themselves. So, for example we can write:
__attribute__((target(+simd+crypto)))
to enable 'simd' and 'crypto' on a per-function basis.

The documentation and tests for this come as a separate patch later after the 
target pragma support and
the inlining rules are implemented.

Bootstrapped and tested on aarch64.


In addition to the comments below, you may want to run
contrib/check_GNU_style.sh on this patch, it shows a number of other
issues that would be nice to fix.

Thanks, here's the updated patch.
I used alloca for memory allocation to keep consistent with the rest of 
aarch64.c,
fixed the error message issues, spacing and control flow issues you pointed out.

How's this?
+static bool
+aarch64_process_one_target_attr (char *arg_str, const char* pragma_or_attr)
+{
+  bool ret;
+  bool invert = false;
+

snip


+  ret = false;

I don't think this variable is exactly what you want. Surely as soon
as we see a failure case, we want to return false. Otherwise we could
see junk,junk,ok_stuff and return true.

So I would drop ret and rewrite this to bail out as soon as we see an
error. The downside is we won't catch multiple errors that way.

The other thing to do would be to invert the sense of your flag to something
like failed and only set it where we fail. It depends what behaviour
you're looking for.


+  for (p_attr = aarch64_attributes; !ret  p_attr-name; p_attr++)
+{
+  /* If the names don't match up, or the user has given an argument
+to an attribute that doesn't accept one, or didn't give an argument
+to an attribute that expects one, fail to match.  */
+  if (strcmp (str_to_check, p_attr-name) != 0)
+   continue;
+
+  bool attr_need_arg_p = p_attr-attr_type == aarch64_attr_custom
+ || p_attr-attr_type == aarch64_attr_enum;
+
+  if (attr_need_arg_p ^ (arg != NULL))
+   {
+ error (target %s %qs does not accept an argument,
+ pragma_or_attr, str_to_check);
+ return false;
+   }
+
+  /* If the name matches but the attribute does not allow no- versions
+then we can't match.  */
+  if (invert  !p_attr-allow_neg)
+   {
+ error (target %s %qs does not allow a negated form,
+ pragma_or_attr, str_to_check);
+ return false;
+   }
+
+  switch (p_attr-attr_type)
+   {
+   /* Has a custom handler registered.
+  For example, cpu=, arch=, tune=.  */
+ case aarch64_attr_custom:
+   gcc_assert (p_attr-handler);
+   ret = p_attr-handler (arg, pragma_or_attr);
+   break;
+
+ /* Either set or unset a boolean option.  */
+ case aarch64_attr_bool:
+   {
+ struct cl_decoded_option decoded;
+
+ generate_option (p_attr-opt_num, NULL, !invert,
+  CL_TARGET, decoded);
+ aarch64_handle_option (global_options, global_options_set,
+ decoded, input_location);
+ ret = true;
+ break;
+   }
+ /* Set or unset a bit in the target_flags.  aarch64_handle_option
+should know what mask to apply given the option number.  */
+ case aarch64_attr_mask:
+   {
+ struct cl_decoded_option decoded;
+ /* We only need to specify the option number.
+aarch64_handle_option will know which mask to apply.  */
+ decoded.opt_index = p_attr-opt_num;
+ decoded.value = !invert;
+ aarch64_handle_option (global_options, global_options_set,
+ decoded, input_location);
+ ret = true;
+ break;
+   }
+ /* Use the option setting machinery to set an option to an enum.  */
+ case aarch64_attr_enum:
+   {
+ gcc_assert (arg);
+ bool valid;
+

Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-08-03 Thread Steve Kargl

On Thu, Jul 30, 2015 at 04:08:47PM +0200, Mikael Morin wrote:
 Le 29/07/2015 18:45, Steve Kargl a écrit :
 
  This builds and passes regression testing on x86_64-*-freebsd.
  OP found the problem by running the sanatizers.  I don't
  know how to build gcc with this as a build option.  I'll
  commit whichever diff you recommend.
 
 I prefer this second variant.
 OK for trunk without further comment from Richi.
 Thanks.
 

Thanks Mikael.  Committed revision 226517.

-- 
Steve

[committed] Fix error: unable to find a register to spill in class 'R1_REGS' reloading indirect call operand

2015-08-03 Thread John David Anglin

The 64-bit register indirect call does not use register %r1.  So, %r1 shouldn't 
be clobbered in
the call pattern.  Normally, this doesn't matter as register %r1 is call 
clobbered.  However,
reload can fail when the register used for the indirect call needs to be 
reloaded from a static
location.  In this case, register %r1 is needed for the reload and the clobber 
blocks the reload.

Tested on hppa64-hp-hpux11.11 and hppa2.0w-hp-hpux11.11.  Committed to trunk, 
gcc-5
and gcc-4.9.

Dave
--
John David Anglin   dave.ang...@bell.net


2015-08-03  John David Anglin  dang...@gcc.gnu.org

PR target/67060
* config/pa/pa.md (call_reg_64bit): Remove reg:DI 1 clobber.
Adjust splits to match new pattern.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 226363)
+++ config/pa/pa.md (working copy)
@@ -7440,7 +7440,6 @@
 (define_insn call_reg_64bit
   [(call (mem:SI (match_operand:DI 0 register_operand r))
 (match_operand 1  i))
-   (clobber (reg:DI 1))
(clobber (reg:DI 2))
(clobber (match_operand 2))
(use (reg:DI 27))
@@ -7461,7 +7460,6 @@
 (define_split
   [(parallel [(call (mem:SI (match_operand 0 register_operand ))
(match_operand 1  ))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (clobber (match_operand 2))
  (use (reg:DI 27))
@@ -7472,7 +7470,6 @@
   [(set (match_dup 2) (reg:DI 27))
(parallel [(call (mem:SI (match_dup 0))
(match_dup 1))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (use (reg:DI 27))
  (use (reg:DI 29))
@@ -7482,7 +7479,6 @@
 (define_split
   [(parallel [(call (mem:SI (match_operand 0 register_operand ))
(match_operand 1  ))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (clobber (match_operand 2))
  (use (reg:DI 27))
@@ -7492,7 +7488,6 @@
   [(set (match_dup 2) (reg:DI 27))
(parallel [(call (mem:SI (match_dup 0))
(match_dup 1))
- (clobber (reg:DI 1))
  (clobber (reg:DI 2))
  (use (reg:DI 27))
  (use (reg:DI 29))
@@ -7503,7 +7498,6 @@
 (define_insn *call_reg_64bit_post_reload
   [(call (mem:SI (match_operand:DI 0 register_operand r))
 (match_operand 1  i))
-   (clobber (reg:DI 1))
(clobber (reg:DI 2))
(use (reg:DI 27))
(use (reg:DI 29))

libgo patch committed: Kill sleep processes in testsuite

2015-08-03 Thread Ian Lance Taylor

This patch from Andrew Wilkins kills off extra sleep processes when
running the libgo testsuite.  Otherwise, the sleep process would
continue after the test completes.  This doesn't affect the testsuite
but the extra sleep processes are not desirable.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 226525)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d5aad2f400a0f21724e33e4ae48e1583ed8b1a87
+33d59eff1bd5de29f1fbde3b7625db28595835fd
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/testsuite/gotest
===
--- libgo/testsuite/gotest  (revision 226510)
+++ libgo/testsuite/gotest  (working copy)
@@ -518,7 +518,7 @@ xno)
wait $pid
status=$?
if ! test -f gotest-timeout; then
-   kill $alarmpid
+   ps -o pid,ppid | grep  $alarmpid | cut -f1 -d  | xargs 
kill -9
fi
else
if test $trace = true; then

Re: [RFC, patch] New attribute to create target clones

2015-08-03 Thread Jeff Law


On 07/30/2015 04:19 PM, Evgeny Stupachenko wrote:

Hi All,

The patch enables new attribute 'ctarget',
The attribute force compiler to create clones of a function with the attribute.

For example:
__attribute__((ctarget(avx,arch=slm,arch=core-avx2,default)))
So presumably we're allowing both changing the ISA and the tuning 
options?   In fact, it looks like we're able to change any -m option, right?


What about something like -mregparm?  I think the docs need to disallow 
clones with different ABI requirements.



Currently default ctarget means that foo() will be optimized with
current compiler options and target.
The other option is to switch target to target specific minimum (like
target x86-64). Is it better?
I could make an argument for either.  Do we have anything to guide us 
from other compilers such as ICC that may have a similar capability?





What do you think about attribute name? 'ctarget' is short but not
informative. Other variants are 'target_clones', 'targets'...

target_clones seems good.

For multiple_target.c: Can you please trim down the #include set.  I 
can't believe you need all that stuff.If you really need backend 
stuff (tm.h), then use backend.h instead.


It generally looks reasonable, but Jan knows the IPA code much better 
than I do and I'd like him to chime in.  You might also ask Ilya to 
review the cloning and rules around clones since he head to deal with a 
lot of that stuff in his MPX work.


jeff

Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-08-03 Thread Michael Meissner

On Fri, Jul 31, 2015 at 12:43:20AM +, Joseph Myers wrote:
 On Wed, 29 Jul 2015, Michael Meissner wrote:
 
#6Add support for using different names for the 64/128-bit integer
  conversion to IBM extended double, to allow a future version to
  switch the default for what long double is. It is not expected that GCC
  6.x will make this switch, but we would like to eventually use the
  standard TF names for the library when the default change is made. If
  this isn't clear, the following names use 'tf' in them, when they use
  IBM extended double:
 
 That would be for a completely separate ABI, incompatible with all 
 existing objects both static and shared, since the existing ABI defines 
 these names to have their existing meanings?

The OpenPower 1.1 ABI for little endian PowerPC 64-bit does not mention the
names at all.  This ABI leaves it open whether long double is IBM extended
double or IEEE 128-bit floating point. One of the goals of these patches is
someday in the future change the default.

A lot of work, particularly in the library space needs to be done to change the
default. Until we can make the switch, users that want IEEE 128-bit support
will need to use __float128.

The intention of theese changes (currently unwritten) is to change the existing
problematical names that use TF in their name to be something else, and provide
via a weak reference an alias for the old name. So if for example, we change
the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses say
__gcc_ltoq to call convert a 64-bit integer to IBM extended double instead of
__floatditf. Older code that refers to __floatditf would still work fine.

Then sometime later (such as GCC 8.0) we could make __floatditf be a weak
reference to __floatdikt.

If you have any ideas of how to do this seemlessly, please let me know.  Steve
Monroe and David Edelsohn requested that I explore that some date in the
future, we will be able to use the standard names.

I tend to be skeptical that we can do it without running into some
incompatibility, and I feel that we just have to live with the existing TF
names, and not use TF for IEEE 128-bit.

Currently, I'm using KF for the IEEE 128-bit functions, even if long double is
mapped to IEEE 128-bit instead of long double.

Another wrinkle is the 32-bit RTEMS port actually uses IEEE 128-bit floating
point with the standard names, because they never used the IBM extended
double.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: PR c/c++/diagnostics/66098 Take -Werror into account when deciding what was the command-line status

2015-08-03 Thread Manuel López-Ibáñez

PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02581.html

Thanks,

Manuel.

On 30 July 2015 at 17:35, Manuel López-Ibáñez lopeziba...@gmail.com wrote:
 When I fixed PR59304, I forgot that a command-line warning can be also
 an error if -Werror was enabled. This introduced a regression since
 anything enabled in the command-line together with -Werror would get
 initially classified as a warning when reaching the first #pragma GCC
 diagnostic, and this will be the setting after a #pragma pop.

 Options that appear as arguments of -W[no-]error= are not affected by
 this since those are initially classified as errors/warnings even
 before reaching the first #pragma, thus the pop sets them correctly
 (before and after this patch). Nonetheless, the tests also check that
 they work correctly.

 Bootregtested on x86_64-linux-gnu.

 OK?


 gcc/ChangeLog:

 2015-07-29  Manuel López-Ibáñez  m...@gcc.gnu.org

 PR c/66098
 PR c/66711
 * diagnostic.c (diagnostic_classify_diagnostic): Take -Werror into
 account when deciding what was the command-line status.

 gcc/testsuite/ChangeLog:

 2015-07-29  Manuel López-Ibáñez  m...@gcc.gnu.org

 PR c/66098
 PR c/66711
 * gcc.dg/pragma-diag-3.c: New test.
 * gcc.dg/pragma-diag-4.c: New test.

Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT

2015-08-03 Thread Jeff Law


On 07/28/2015 06:00 AM, Kugan wrote:



On 27/07/15 05:38, Andreas Schwab wrote:

Kugan kugan.vivekanandara...@linaro.org writes:


* cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor
clean up.


This breaks

gcc.target/m68k/tls-ie-xgot.c scan-assembler jsr __m68k_read_tp
gcc.target/m68k/tls-ie.c scan-assembler jsr __m68k_read_tp
gcc.target/m68k/tls-le-xtls.c scan-assembler jsr __m68k_read_tp
gcc.target/m68k/tls-le.c scan-assembler jsr __m68k_read_tp

Andreas.



Sorry for the breakage. My patch to add ZERO_EXTRACT unfortunately
restricts the behaviour in one other case. That is, even when REG_EQUAL
note and src are same, we were setting src_eqv to src when it is
STRICT_LOW_PART. Not sure why but restored the old behaviour.

I could reproduce this issue by inspecting the generated asm and made
sure that it is fixed. However I could not run regression for m68k
(Sorry I don’t have access to the set-up).
I bootstrapped and regression tested on x86_64-linux-gnu and
arm-none-linux-gnu with no new regressions.

Thanks,
Kugan


gcc/ChangeLog:

2015-07-27  Kugan Vivekanandarajah  kug...@linaro.org

* cse.c (cse_insn): Restoring old behaviour for src_eqv
 when dest and value in the REG_EQUAL are same and dest
 is STRICT_LOW_PART.

OK.

I verified there were no regressions on m68k.exp with a cross compiler 
and that the tests referenced by Andreas indeed passes with the patch 
installed.


Thanks,
jeff

Re: PR middle-end/16351 NULL dereference warnings

2015-08-03 Thread Manuel López-Ibáñez

PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01860.html

Actually, the xfailed test was because the function folded to nothing
and the offending code was removed without warning. Fixed in the
attached version. Same changelog.



On 22 July 2015 at 17:52, Manuel López-Ibáñez lopeziba...@gmail.com wrote:
 I took the patch in
 https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01715.html and removed
 the Wnull-attribute part, since most of it can be done from the FE as
 shown in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01857.html  and
 also to make the patch smaller and easier to review.

 I also fixed the comments by Florian here:
 https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00149.html and added
 more tests from the PR and its duplicates (one xfailed, I'll open a
 new PR about it).

 Futher cleanups may be possible (infer_nonnull_range_by_attribute
 checks flag_delete_null_pointer_checks, which seems weird to me but it
 matches the existing behavior of infer_nonnull_range).

 I added this to Wall to get as much testing as possible, we can always
 move it to Wextra or disable it by default just before the release if
 it turns out to be too noisy.

 Boostrapped and regression tested on x86_64-linux-gnu.

 OK?

 gcc/ChangeLog:

 2015-07-22  Manuel López-Ibáñez  m...@gcc.gnu.org
 Jeff Law  l...@redhat.com

 PR c/16351
 * doc/invoke.texi (Wnull-dereference): New.
 * tree-vrp.c (infer_value_range): Update call to infer_nonnull_range.
 * gimple-ssa-isolate-paths.c (find_implicit_erroneous_behaviour):
 Warn for potential NULL dereferences.
 (find_explicit_erroneous_behaviour): Warn for NULL dereferences.
 * ubsan.c (instrument_nonnull_arg): Call
 infer_nonnull_range_by_attribute.
 (instrument_nonnull_return): Likewise.
 * common.opt (Wnull-dereference); New.
 * gimple.c (infer_nonnull_range): Remove bool arguments.
 (infer_nonnull_range_by_dereference): New.
 (infer_nonnull_range_by_attribute): New.
 * gimple.h: Update declarations.

 gcc/testsuite/ChangeLog:

 2015-07-22  Manuel López-Ibáñez  m...@gcc.gnu.org
 Jeff Law  l...@redhat.com

 PR c/16351
 * gcc.dg/tree-ssa/isolate-2.c: Close comment.
 * gcc.dg/tree-ssa/isolate-4.c: Likewise.
 * gcc.dg/tree-ssa/wnull-dereference.c: New test.
 * gcc.dg/tree-ssa/isolate-1.c: Test warnings with -Wnull-dereference.
 * gcc.dg/tree-ssa/isolate-3.c: Likewise.
 * gcc.dg/tree-ssa/isolate-5.c: Likewise.
 * c-c++-common/wnonnull-1.c: New test.
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 225868)
+++ gcc/tree-vrp.c  (working copy)
@@ -4936,11 +4936,11 @@ infer_value_range (gimple stmt, tree op,
  break;
   if (e == NULL)
return false;
 }
 
-  if (infer_nonnull_range (stmt, op, true, true))
+  if (infer_nonnull_range (stmt, op))
 {
   *val_p = build_int_cst (TREE_TYPE (op), 0);
   *comp_code_p = NE_EXPR;
   return true;
 }
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 225868)
+++ gcc/doc/invoke.texi (working copy)
@@ -258,10 +258,11 @@ Objective-C and Objective-C++ Dialects}.
 -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol
 -Wignored-qualifiers  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int @gol
 -Winit-self  -Winline  -Wno-int-conversion @gol
 -Wno-int-to-pointer-cast -Wno-invalid-offsetof @gol
+-Wnull-dereference @gol
 -Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-include-dirs @gol
@@ -4130,10 +4133,20 @@ All the above @option{-Wunused} options 
 
 In order to get a warning about an unused function parameter, you must
 either specify @option{-Wextra -Wunused} (note that @option{-Wall} implies
 @option{-Wunused}), or separately specify @option{-Wunused-parameter}.
 
+@item -Wnull-dereference
+@opindex Wnull-dereference
+@opindex Wno-null-dereference
+Warn if the compiler detects paths which trigger erroneous or
+undefined behaviour due to dereferencing a NULL pointer.  This option
+is only active when @option{-fdelete-null-pointer-checks} is active,
+which is enabled by optimizations in most targets.  The precision of
+the warnings depends on the optimization options used.  This option is
+enabled by @option{-Wall}.
+
 @item -Wuninitialized
 @opindex Wuninitialized
 @opindex Wno-uninitialized
 Warn if an automatic variable is used without first being initialized
 or if a variable may be clobbered by a @code{setjmp} call. In C++,
Index: gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
===
---

Go patch committed: Don't make erroneous type descriptors

2015-08-03 Thread Ian Lance Taylor

This patch by Chris Manghane changes the Go frontend to not make type
descriptor for named types if we have seen errors during the
compilation.  This avoids a compiler crash, and fixes
https://golang.org/issue/11560 .  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

[committed, PATCH]: Add a testcase for PR tree-optimization/67077

2015-08-03 Thread H.J. Lu

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67077

is fixed by

https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=226089

I checked in this patch to add a testcase for PR tree-optimization/67077.

Index: ChangeLog
===
--- ChangeLog   (revision 226516)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-08-03  H.J. Lu  hongjiu...@intel.com
+
+   PR tree-optimization/67077
+   * gcc.dg/pr67077.c: New test.
+
 2015-08-03  Jeff Law  l...@redhat.com
 
PR middle-end/66314
Index: gcc.dg/pr67077.c
===
--- gcc.dg/pr67077.c(revision 0)
+++ gcc.dg/pr67077.c(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options -Wall -O2 } */
+
+unsigned char buffer[8];
+unsigned long
+foo (void)
+{
+  unsigned long i;
+  i = buffer[0];
+  if (i = 8)
+return i - 7;
+  i++;
+  while (i  8)
+{
+  if (buffer[i-1] != 0)
+return 0;
+  i--;
+}
+  return 1;
+}

Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-08-03 Thread Jeff Law


On 08/02/2015 05:03 AM, Kumar, Venkataramanan wrote:

Hi Jakub,

Thank you for reviewing the patch.

I have incorporated your comments in the attached patch.

Note Jakub is on PTO for the next 3 weeks.






vectorize_mults_via_shift.diff.txt


diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c 
b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c

Jakub would probably like more testcases :-)

The most obvious thing to test would be other shift factors.

A negative test to verify we don't try to turn a multiply by 
non-constant or multiply by a constant that is not a power of 2 into shifts.


[ Would it make sense, for example, to turn a multiply by 3 into a 
shift-add sequence?  As Jakub said, choose_mult_variant can be your 
friend. ]





@@ -2147,6 +2152,140 @@ vect_recog_vector_vector_shift_pattern (vecgimple 
*stmts,
return pattern_stmt;
  }

+/* Detect multiplication by constant which are postive or negatives of power 2,

s/postive/positive/


Jeff

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Kyrill Tkachov



On 03/08/15 16:53, Kyrill Tkachov wrote:

On 03/08/15 16:45, Uros Bizjak wrote:

On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
don't think this is a good idea. In the expander, there is already
quite some target-dependent code that goes great length to utilize sbb
insn as much as possible, before cmove is used.

IMO, as far as x86 is concerned, the best solution would be to revert
the change. ix86_expand_int_movcc already does some tricks from your
patch in a target-efficient way. Generic change that was introduced by
your patch now interferes with this expansion.

Well, technically the transformation was already there, it was just never
reached for an x86 compilation because noce_try_cmove was tried in front of
it
and used a target-specific expansion.
In any case, how's this proposal?
The transformation noce_try_store_flag_constants
/* if (test) x = a; else x = b;
   =   x = (-(test != 0)  (b - a)) + a;  */

Is a catch-all-immediates transformation in noce_try_store_flag_constants.
What if we moved it to noce_try_cmove and performed it only if the
target-specific
conditional move expansion there failed?

That way we can try the x86_64-specific sequence first and still give the
opportunity
to noce_try_store_flag_constants to perform the transformations that can
benefit targets
that don't have highly specific conditional move expanders.

Yes, let's try this approach. As was found out, some targets (e.g.
x86) hide lots of different target-dependent expansion strategies into
movcc expander. Perhaps this fact should be documented in the comment
in the generic code?

Ok, I'll work on that approach and add a comment.


I'm testing a patch that fix the testcases on x86_64 and does not
harm codegen on aarch64. Feel free to file a PR and assign it to me.

Thanks,
Kyrill



Thanks,
Kyrill



Uros.

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Uros Bizjak

On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I
 don't think this is a good idea. In the expander, there is already
 quite some target-dependent code that goes great length to utilize sbb
 insn as much as possible, before cmove is used.

 IMO, as far as x86 is concerned, the best solution would be to revert
 the change. ix86_expand_int_movcc already does some tricks from your
 patch in a target-efficient way. Generic change that was introduced by
 your patch now interferes with this expansion.

 Well, technically the transformation was already there, it was just
 never
 reached for an x86 compilation because noce_try_cmove was tried in front
 of
 it
 and used a target-specific expansion.
 In any case, how's this proposal?
 The transformation noce_try_store_flag_constants
 /* if (test) x = a; else x = b;
=   x = (-(test != 0)  (b - a)) + a;  */

 Is a catch-all-immediates transformation in
 noce_try_store_flag_constants.
 What if we moved it to noce_try_cmove and performed it only if the
 target-specific
 conditional move expansion there failed?

 That way we can try the x86_64-specific sequence first and still give
 the
 opportunity
 to noce_try_store_flag_constants to perform the transformations that can
 benefit targets
 that don't have highly specific conditional move expanders.

 Yes, let's try this approach. As was found out, some targets (e.g.
 x86) hide lots of different target-dependent expansion strategies into
 movcc expander. Perhaps this fact should be documented in the comment
 in the generic code?

 Ok, I'll work on that approach and add a comment.


 I'm testing a patch that fix the testcases on x86_64 and does not
 harm codegen on aarch64. Feel free to file a PR and assign it to me.

PR67103 [1]

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103

Thanks,
Uros.

Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-08-03 Thread Jeff Law


On 07/16/2015 05:03 AM, Martin Liška wrote:

So a general question.  We're passing in STRICT to several routines, which is 
fine.  But then we're also checking M_TAIL_MERGE_MODE.  What's the difference 
between the two?  Can they be unified?


Hello.

I would say that STRICT is a bit generic mechanism that was introduced some 
time before. It's e.g. used for checking of THIS arguments for methods and make 
checking
more sensitive in situations that are somehow special.

The newly added state is orthogonal to the previous one.
Fair enough.  There's some cases where we've documented STRICT, and 
others where we haven't.


  If STRICT flag is true, version must match strictly
Appears as documentation for STRICT.  It seems like it'd be better to 
describe what strictly means here.


Elsewhere we have comments like:

  Be strict in case of tail-merge optimization

Which tends to confuse things a bit.  Perhaps something more like:

  In the case of tail merge optimization, XYZ must match

It seems like a nit, but to someone else reading the code I don't think 
the distinctions are all that clear right now, so if we can improve 
things, it'd be good.







I didn't find this comment particularly useful in understanding what this 
function does.  AFAICT the function looks as the uses of the LHS of STMT and 
verifies they're all in the same block as STMT, right?

It also verifies that the none of the operands within STMT are part of 
SSA_NAMES_SET.

What role do those properties play in the meaning of local definition?


I tried to explain it more deeply what's the purpose of this function.
Thanks.  It's much clearer now.   We've actually got similar code in a 
couple places (ifcvt).  I wonder if we could unify those implementations 
as a follow-up cleanup.











@@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, const 
gasm *g2)
 return true;
   }

+void
+ssa_names_set::build (basic_block bb)

Needs a function comment.  What are the important names we're collecting here?

Is a single forward and backward pass really sufficient to find all the 
important names?

In the backward pass, do you have to consider things like ASMs?  I guess it's 
difficult to understand what you need to look at because it's not entirely 
clear the set of SSA_NAMEs you're building.
These questions and lack of function comment don't seem to have been 
addressed yet.




  +
  +using namespace ipa_icf;
  +using namespace ipa_icf_gimple;

Is this wise?  Does it significantly help with conciseness within the tail 
merging pass where it wants things ipa-icf and ipa-icf-gimple?

I'm not objecting at this point, I want to hear your thoughts.


I must agree with you, as I've started using both namespaces in tree-tail merge 
pass,
it makes not much sense anymore. I suggest to come up with a namespace that will
encapsulate 'identical code folding'-related stuff. What about:

namespace icf
Sure if it helps and is clean.  GCC does not have a policy against 
using namespace, but other codebases do (google for example) as it 
does introduce some long term maintenance issues.


So when I see a using namespace of that nature, I'm naturally going to 
question if it really helps in a significant way.  If it does, then I 
won't object.  If it's not helping in a significant way, then I'm likely 
to object.


I think the updated version is fine WRT namespaces.




?






   /* Describes a group of bbs with the same successors.  The successor bbs are
  cached in succs, and the successor edge flags are cached in succ_flags.
@@ -1220,17 +1231,48 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator 
*gsi, tree *vuse,
   }
   }

+static bool
+check_edges_correspondence (basic_block bb1, basic_block bb2)

Needs a function comment.

Still needs function comment.


I think we're pretty close here.  Most of the issues are comments -- I 
still haven't looked *real* close at ssa_names_set::build.  With a 
function comment I ought to be able to look at it more closely in the 
next (and hopefully final) iteration.



Jeff

Re: [PR66726] Factor conversion out of COND_EXPR

2015-08-03 Thread Jeff Law


On 07/26/2015 07:05 PM, Kugan wrote:


I thought that when !gimple_assign_cast_p (stmt), RHS will always
boolean. I have now added this check in the attached patch.

Thanks.



I also noticed that in maybe_optimize_range_tests, GIMPLE_COND can
have non compatible types when new_op is updated
(boolean types coming from tcc_compare results) and hence need to be
converted. Changed that as well.
Did you find this by examination or with some testcode?  If the latter, 
including a testcase for this issue would be appreciated.






Bootstrapped and regression tested on x86-64-none-linux-gnu with no new
regressions. Is this OK for trunk?

OK with an updated changelog.

Jeff

Re: [PATCH] Simplify vector compare-not-select sequence

2015-08-03 Thread Andreas Schwab

Bill Schmidt wschm...@linux.vnet.ibm.com writes:

 Index: gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c
 ===
 --- gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c(revision 226505)
 +++ gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c(working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do compile { target powerpc64*-*-* } } */

If you want -m64 you need dg-require-effective-target lp64, but I see no
need for that.

  /* { dg-require-effective-target powerpc_p8vector_ok } */
 -/* { dg-options -maltivec -O2 } */
 +/* { dg-require-effective-target powerpc_vsx_ok } */
 +/* { dg-options -maltivec -O2 -mvsx -mpower8-vector } */
  /* { dg-final { scan-assembler vcmpgtsd } } */
  /* { dg-final { scan-assembler-not xxlnor } } */

Looks good.

PASS: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors)
PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler vcmpgtsd
PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler-not xxlnor
PASS: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors)
PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler vcmpgtsd
PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler-not xxlnor

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.

Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-08-03 Thread Jeff Law


On 07/20/2015 01:52 AM, Martin Liška wrote:






Due to changes dump file, it will be necessary to amend scanning of dump files.
Those changes are fine and can be installed once the prerequisites have 
been approved/installed.


Thanks,
Jeff

Re: [PATCH 1/2] gcc parts of timer refactoring

2015-08-03 Thread Jeff Law


On 07/31/2015 03:45 PM, David Malcolm wrote:

In r223092 (aka dd4d567f4b6b498242097c41d63666bdae320ac1) I moved the
state of timevar.c from being global data into a class timer with
a global instance g_timer.

This followup patch generalizes the timing within toplev so that an
external timer instance can (optionally) be passed in; this is used
in patch 2 to add an API to libgccjit for client code: gcc_jit_timer.

The gcc_jit_timer API allows client code to create arbitrary new timing
items (beyond the hardcoded list in timevar.def), giving them names,
so that client code can time whatever things are relevant to it; this
patch adds the support necessary to timevar.[ch] for this.

The patch also generalizes timevar.h's auto_timevar to avoid it
depending on the g_timer global (which is only set during the
lifetime of toplev.  We can use this to add TV_JIT_ACQUIRING_MUTEX,
and thus measure wallclock time spent waiting for the JIT mutex.

Generalize timevar.c to add support for timing jit client code, by:

  * adding support for timing client items with arbitrary
client-supplied names (e.g. compiling code, running code etc).

  * supporting having a timer instance created externally and passed
in to toplev, allowing clients to control where toplev's timing
information is accumulated

  * add a couple of jit-specific timevars

This goes with the next patch; I've split them up for ease of review.

OK for trunk?

gcc/ChangeLog:
* main.c (main): Pass in NULL for toplev's external_timer.
* timevar.c: Include coretypes.h, hash-map.h, vec.h.
(class timer::named_items): New.
(timer::named_items::named_items): New.
(timer::named_items::~named_items): New.
(timer::named_items::push): New.
(timer::named_items::pop): New.
(timer::named_items::print): New.
(timer::timer): Initialize field m_jit_client_items.
(timer::~timer): New.
(timer::push): Move bulk of implementation to...
(timer::push_internal): ...here.  New function.
(timer::pop): Move bulk of implementation to...
(timer::pop_internal): ...here.  New function.
(timer::push_client_item): New.
(timer::pop_client_item): New.
(timer::print_row): New function, taken from timer::print.
(timer::print): Print GCC items header if we also have client
items.  Move row-printing to timer::print_row.  Print any client
items.
(timer::get_topmost_item_name): New method.
* timevar.def (TV_JIT_ACQUIRING_MUTEX): New.
(TV_JIT_CLIENT_CODE): New.
* timevar.h (timer::push_client_item): New declaration.
(timer::pop_client_item): New declaration.
(timer::get_topmost_item_name): New method.
(timer::push_internal): New declaration.
(timer::pop_internal): New declaration.
(timer::print_row): New declaration.
(timer::named_items): New declaration.
(timer::m_jit_client_items): New field.
(timer): Add friend class named_items.
(auto_timevar::auto_timevar): Add timer param.
(auto_timevar::~auto_timevar): Use field m_timer.
(auto_timevar::m_timer): New field.
* toplev.c (initialize_rtl): Add g_timer as param when
constructing auto_timevar instance.
(toplev::toplev): Add external_timer param, and use it to
initialize the g_timer global if non-NULL.
(toplev::~toplev): If this created g_timer, delete it.
* toplev.h (toplev::toplev): Replace use_TV_TOTAL bool param
with external_timer timer *.
---
  gcc/main.c  |   2 +-
  gcc/timevar.c   | 247 
  gcc/timevar.def |   2 +
  gcc/timevar.h   |  35 +++-
  gcc/toplev.c|  18 +++--
  gcc/toplev.h|   4 +-
  6 files changed, 261 insertions(+), 47 deletions(-)

diff --git a/gcc/main.c b/gcc/main.c
index 79baf0d..bbd8b67 100644
--- a/gcc/main.c
+++ b/gcc/main.c
@@ -33,7 +33,7 @@ int main (int argc, char **argv);
  int
  main (int argc, char **argv)
  {
-  toplev toplev (true, /* use_TV_TOTAL */
+  toplev toplev (NULL, /* external_timer */
 true /* init_signals */);

return toplev.main (argc, argv);
diff --git a/gcc/timevar.c b/gcc/timevar.c
index 76ad22a..eebb1dc 100644
--- a/gcc/timevar.c
+++ b/gcc/timevar.c
@@ -20,7 +20,10 @@ along with GCC; see the file COPYING3.  If not see

  #include config.h
  #include system.h
+#include coretypes.h
  #include timevar.h
+#include hash-map.h
+#include vec.h
You shouldn't need to include hash-map.h and vec.h, as those get pulled 
in via core-types.h including hash-table.h now.



+  for (iter = m_stack; iter; iter=next)

Nit, whitespace around the '=' in iter=next assignment.


+{
+  next = iter-next;
+  free (iter);
+}
+  for (iter = m_unused_stack_instances; iter; iter=next)

Here too.

OK with those nits fixed.

jeff

Re: Go patch committed: Fix error reporting for invalid builtin calls

2015-08-03 Thread Ian Lance Taylor

On Mon, Aug 3, 2015 at 2:10 AM, Andreas Schwab sch...@suse.de wrote:

 ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack':
 ../../../libgo/runtime/mprof.goc:408:5: error: calling 
 '__builtin_frame_address' with a nonzero argument is unsafe 
 [-Werror=frame-address]
   sp = runtime_getcallersp(b);

Fixed by this patch by Chris Manghane.  The call was not actually
necessary.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.  This fixes PR
67101.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 226510)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2bf7c643a1d2f8503070c8e6cb87852026e32400
+d5aad2f400a0f21724e33e4ae48e1583ed8b1a87
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/mprof.goc
===
--- libgo/runtime/mprof.goc (revision 226510)
+++ libgo/runtime/mprof.goc (working copy)
@@ -402,10 +402,9 @@ func ThreadCreateProfile(p Slice) (n int
 }
 
 func Stack(b Slice, all bool) (n int) {
-   byte *pc, *sp;
+   byte *pc;
bool enablegc;

-   sp = runtime_getcallersp(b);
pc = (byte*)(uintptr)runtime_getcallerpc(b);
 
if(all) {
@@ -423,7 +422,6 @@ func Stack(b Slice, all bool) (n int) {
g-writebuf = (byte*)b.__values;
g-writenbuf = b.__count;
USED(pc);
-   USED(sp);
runtime_goroutineheader(g);
runtime_traceback();
runtime_printcreatedby(g);

Re: Go patch committed: Fix error reporting for invalid builtin calls

2015-08-03 Thread Richard Biener

On Mon, Aug 3, 2015 at 11:10 AM, Andreas Schwab sch...@suse.de wrote:
 Ian Lance Taylor i...@golang.org writes:

 This patch from Chris Manghane fixes the Go frontend error reporting
 for invalid builtin calls, by not losing track of whether the call is
 erroneous.  This fixes https://golang.org/issue/11561.  Bootstrapped
 and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
 mainline.

 ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack':
 ../../../libgo/runtime/mprof.goc:408:5: error: calling 
 '__builtin_frame_address' with a nonzero argument is unsafe 
 [-Werror=frame-address]
   sp = runtime_getcallersp(b);
  ^

Introduced by Martin Sebors patch

2015-08-02  Martin Sebor  mse...@redhat.com

* c-family/c.opt (-Wframe-address): New warning option.
* doc/invoke.texi (Wframe-address): Document it.
* doc/extend.texi (__builtin_frame_address, __builtin_return_address):
Clarify possible effects of calling the functions with non-zero
arguments and mention -Wframe-address.
* builtins.c (expand_builtin_frame_address): Handle -Wframe-address.


Richard.

 Andreas.

 --
 Andreas Schwab, SUSE Labs, sch...@suse.de
 GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
 And now for something completely different.

Re: [gomp4] expand acc_on_device earlier

2015-08-03 Thread Thomas Schwinge

Hi!

On Mon, 3 Aug 2015 08:00:36 -0400, Nathan Sidwell nat...@codesourcery.com 
wrote:
 On 08/03/15 07:37, Thomas Schwinge wrote:
  On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote:
  I've committed this to gomp4 branch.  It expands the acc_on_device builtin
  earlier in the new oacc_xform pass.  This will allow more optimization 
  earlier on.
 
  Thanks!
 
  The existing expansion point now only needs to deal with the host-side 
  case.
 
  Actually, no; committed to gomp-4_0-branch in r226498:
 
 Please clarify.  This suggests a logic fault elsewhere.

Hmm, I had the following in the commit log:

When building an x86_64-intelmicemul-linux-gnu offloading compiler with
r226484, the __builtin_acc_on_device usage in 
libgomp/oacc-init.c:acc_on_device
makes it blow up:

[...]/source-gcc/libgomp/oacc-init.c: In function 'acc_on_device':
[...]/source-gcc/libgomp/oacc-init.c:643:10: internal compiler error: 
in expand_builtin_acc_on_device, at builtins.c:5891
   return __builtin_acc_on_device (dev);
  ^
0x6d9632 expand_builtin_acc_on_device
[...]/source-gcc/gcc/builtins.c:5891

In case that has been too terse (sorry), here's a bit more detail.

GCC can be configured to use nvptx-none as well as
x86_64-intelmicemul-linux-gnu offloading compilers.  In offloading
compilers' configurations, ACCEL_COMPILER is defined.  The nvptx-none
offloading compilers' libgcc/libgomp build will use the specific
libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device implementation,
but the x86_64-intelmicemul-linux-gnu one will use the default
libgomp/oacc-init.c:acc_on_device implementation (which is implemented in
terms of __builtin_acc_on_device).  Thus, in the latter case, we'd run
into this gcc_unreachable during the x86_64-intelmicemul-linux-gnu
offloading compiler's libgomp build:

/* Expand OpenACC acc_on_device.  This is expanded in the openacc
   transform pass, but if the user has this outside of an offloaded
   region, we'll find it here.  In that case we must be host or none.  */

static rtx
expand_builtin_acc_on_device (tree exp, rtx target)
{
#ifdef ACCEL_COMPILER
  gcc_unreachable ();
#else
  gcc_assert (!get_oacc_fn_attrib (current_function_decl));

In r226498, I then restored the earlier logic of
gcc/builtins.c:expand_builtin_acc_on_device, and could then also simplify
libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device to the very same
generic/simple implementation in terms of __builtin_acc_on_device.

Does that clarify?


Grüße,
 Thomas


pgp1H8R9bv5JO.pgp
Description: PGP signature

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Kyrill Tkachov



On 03/08/15 13:33, Uros Bizjak wrote:

Hello!


2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
 when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
 explicit.  Prefer to add the flag whenever possible.
 (noce_process_if_block): Try noce_try_store_flag_constants before
 noce_try_cmove.

2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/csel_bfx_1.c: New test.
 * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]


Sorry for that :(
I could have sworn they passed for me during my run...


While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

cmpl%edi, %esi
sbbl%eax, %eax
andl$-10, %eax
addl$5, %eax
ret

to:

xorl%eax, %eax
cmpl%esi, %edi
setbe   %al
negl%eax
andl$10, %eax
subl$5, %eax
ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.


The transformation at fault in noce_try_store_flag_constants is:
  /* if (test) x = a; else x = b;
 =   x = (-(test != 0)  (b - a)) + a;  */

I suppose we want to not perform this transformation on x86.
The question is, on what characteristic of the target do we want to gate this
on?




[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Uros.

Re: [PATCH][AArch64][10/14] Implement target pragmas

2015-08-03 Thread James Greenhalgh

On Mon, Aug 03, 2015 at 10:36:17AM +0100, Kyrill Tkachov wrote:
 And here is a rebased version to resolve a conflict after Alan's patches went 
 in.
 

OK with the nits below fixed.

 2015-08-03  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
   * config.gcc (aarch64*-*-*): Specify c_target_objs and cxx_target_objs.
   * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
   (TARGET_CPU_CPP_BUILTINS): Redefine to call aarch64_cpu_cpp_builtins.
   * config/aarch64/aarch64.c (aarch64_override_options_internal): Remove
   static keyword.
   (aarch64_reset_previous_fndecl): New function.
   (aarch64_handle_attr_isa_flags): Handle +nothing in the beginning of
   the string.
   * config/aarch64/aarch64-c.c: New file.
   * config/aarch64/arm_acle.h: Add pragma +crc+nofp at the top.
   Push and pop options at beginning and end.  Remove ifdef
   __ARM_FEATURE_CRC32.
   * config/aarch64/arm_neon.h: Remove #ifdef check on __ARM_NEON.
   Add pragma arch=armv8-a+simd and +crypto where appropriate.
   * config/aarch64/t-aarch64 (aarch64-c.o): New rule.
   * config/aarch64/aarch64-protos.h (aarch64_cpu_cpp_builtins):
   Define prototype.
   (aarch64_register_pragmas): Likewise.
   (aarch64_reset_previous_fndecl): Likewise.
   (aarch64_process_target_attr): Likewise.
   (aarch64_override_options_internal): Likewise.
 
 2015-08-03  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * gcc.target/aarch64/arm_neon-nosimd-error.c: Delete.
 
 

 +/* Define the macros that we always expect to have on AArch64.  */
 +
 +static void
 +aarch64_define_unconditional_macros (cpp_reader *pfile)
 +{
 +  builtin_define (__aarch64__);
 +  builtin_define (__ARM_64BIT_STATE);
 +
 +  builtin_define (__ARM_ARCH_ISA_A64);
 +  builtin_define_with_int_value (__ARM_ALIGN_MAX_PWR, 28);
 +  builtin_define_with_int_value (__ARM_ALIGN_MAX_STACK_PWR, 16);
 +
 +  /* __ARM_ARCH_8A is not mandated by ACLE but we define it unconditionally
 + as interoperability with the same arm macro.  */
 +  builtin_define (__ARM_ARCH_8A);
 +
 +  builtin_define_with_int_value (__ARM_ARCH_PROFILE, 'A');
 +  builtin_define (__ARM_FEATURE_CLZ);
 +  builtin_define (__ARM_FEATURE_IDIV);
 +  builtin_define (__ARM_FEATURE_UNALIGNED);
 +  builtin_define (__ARM_PCS_AAPCS64);
 +  builtin_define_with_int_value (__ARM_SIZEOF_WCHAR_T, WCHAR_TYPE_SIZE / 
 8);
 +
 +}

Extra newline.

 diff --git a/gcc/config/aarch64/aarch64-protos.h 
 b/gcc/config/aarch64/aarch64-protos.h
 index 3a5482d..6844c90 100644
 --- a/gcc/config/aarch64/aarch64-protos.h
 +++ b/gcc/config/aarch64/aarch64-protos.h
 @@ -249,6 +249,7 @@ enum aarch64_symbol_type
  aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context);
  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
  bool aarch64_constant_address_p (rtx);
 +extern void aarch64_cpu_cpp_builtins (cpp_reader *);

No need for this extern - and keep this in alphabetical order (first by
return type, then by name).

 +#pragma GCC push_options
 +#pragma GCC target (+crypto)

Keep things simple to manage and understand by inspection, and make these
+nothing+crypto.

 @@ -21067,7 +21065,8 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int 
 __c)
return __builtin_aarch64_ursra_ndi_uuus (__a, __b, __c);
  }
  
 -#ifdef __ARM_FEATURE_CRYPTO
 +#pragma GCC push_options
 +#pragma GCC target (+crypto)
  
  /* vsha1  */
  

Likewise here.

Thanks,
James

Re: offload data version number

2015-08-03 Thread Thomas Schwinge

Hi!

On Sat, 1 Aug 2015 20:20:49 -0400, Nathan Sidwell nat...@acm.org wrote:
 this is the gomp4 committed version.

 --- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp  (revision 226462)
 +++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp  (working copy)
 @@ -327,12 +327,26 @@ offload_image (const void *target_image)
free (image);
  }
  
 +/* Return the libgomp version number we're compatible with.  There is
 +   no requirement for cross-version compatibility.  */
 +
 +extern C unsigned
 +GOMP_OFFLOAD_version (void)
 +{
 +  return GOMP_VERSION;
 +}
 +
  extern C int
 -GOMP_OFFLOAD_load_image (int device, const void *target_image,
 -  addr_pair **result)
 +GOMP_OFFLOAD_load_image (int device, const unsigned version,
 +  void *target_image, addr_pair **result)
  {
TRACE ((device = %d, target_image = %p), device, target_image);
  
 +  if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
 +GOMP_PLUGIN_fatal (Offload data incompatible with intelmic plugin
 + (expected %u, received %u),
 +GOMP_VERSION_INTEL_MIC, GOMP_VERSION_DEV (version));
 +
/* If target_image is already present in address_table, then there is no 
 need
   to offload it.  */
if (address_table-count (target_image) == 0)
 @@ -353,8 +367,12 @@ GOMP_OFFLOAD_load_image (int device, con
  }
  
  extern C void
 -GOMP_OFFLOAD_unload_image (int device, const void *target_image)
 +GOMP_OFFLOAD_unload_image (int device, unsigned version,
 +const void *target_image)
  {
 +  if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
 +return;
 +
TRACE ((device = %d, target_image = %p), device, target_image);
  
/* TODO: Currently liboffloadmic doesn't support __offload_unregister_image

Committed to gomp-4_0-branch in r226497:

commit 4e0158f41a00d6c4d09ca8a48eb63832abdd2f84
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Aug 3 11:14:24 2015 +

Fix intelmic libgomp plugin build

... which got broken in r226469:

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In 
function 'unsigned int GOMP_OFFLOAD_version()':

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:336:10: 
error: 'GOMP_VERSION' was not declared in this scope
   return GOMP_VERSION;
  ^
[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In 
function 'int GOMP_OFFLOAD_load_image(int, unsigned int, void*, addr_pair**)':

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:345:32: 
error: 'GOMP_VERSION_DEV' was not declared in this scope
   if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
^

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:345:36: 
error: 'GOMP_VERSION_INTEL_MIC' was not declared in this scope
   if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
^
[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In 
function 'void GOMP_OFFLOAD_unload_image(int, unsigned int, const void*)':

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:373:32: 
error: 'GOMP_VERSION_DEV' was not declared in this scope
   if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
^

[...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:373:36: 
error: 'GOMP_VERSION_INTEL_MIC' was not declared in this scope
   if (GOMP_VERSION_DEV (version)  GOMP_VERSION_INTEL_MIC)
^
make[6]: *** [libgomp_plugin_intelmic_la-libgomp-plugin-intelmic.lo] 
Error 1

liboffloadmic/
* plugin/Makefile.am (include_src_dir): Set.
[PLUGIN_HOST] (libgomp_plugin_intelmic_la_CPPFLAGS): Use it.
* plugin/Makefile.in: Regenerate.
* plugin/libgomp-plugin-intelmic.cpp: Include gomp-constants.h.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226497 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 liboffloadmic/ChangeLog.gomp |7 +++
 liboffloadmic/plugin/Makefile.am |3 ++-
 liboffloadmic/plugin/Makefile.in |3 ++-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |3 ++-
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git liboffloadmic/ChangeLog.gomp liboffloadmic/ChangeLog.gomp
index 93d1e02..adb9e05 100644
--- liboffloadmic/ChangeLog.gomp
+++ liboffloadmic/ChangeLog.gomp
@@ -1,3 +1,10 @@
+2015-08-03  Thomas Schwinge  tho...@codesourcery.com
+
+   * plugin/Makefile.am (include_src_dir): Set.
+   [PLUGIN_HOST] (libgomp_plugin_intelmic_la_CPPFLAGS): Use it.
+   * plugin/Makefile.in: Regenerate.
+   * plugin/libgomp-plugin-intelmic.cpp: Include gomp-constants.h.
+
 2015-08-01

Re: [PATCH][AArch64][11/14] Re-layout SIMD builtin types on builtin expansion

2015-08-03 Thread James Greenhalgh

On Fri, Jul 24, 2015 at 09:38:34AM +0100, Kyrill Tkachov wrote:
 Thanks, here's an updated version.
 
 2015-07-24  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * config/aarch64/aarch64.c (aarch64_option_valid_attribute_p):
  Initialize simd builtins if TARGET_SIMD.
  * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
  Make sure that the builtins are initialized only once no matter how
  many times the function is called.
  (aarch64_init_builtins): Unconditionally initialize crc builtins.
  (aarch64_relayout_simd_param): New function.
  (aarch64_simd_expand_args): Use above during argument expansion.
  * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Initialize
  simd builtins if TARGET_SIMD.
  * config/aarch64/aarch64-protos.h (aarch64_init_simd_builtins): New
  prototype.
  (aarch64_relayout_simd_types): Likewise.
 
 2015-07-24  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * gcc.target/aarch64/target_attr_crypto_ice_1.c: New test.
 

OK with a minor fix.

 diff --git a/gcc/config/aarch64/aarch64-protos.h 
 b/gcc/config/aarch64/aarch64-protos.h
 index 6844c90..99fd80e 100644
 --- a/gcc/config/aarch64/aarch64-protos.h
 +++ b/gcc/config/aarch64/aarch64-protos.h
 @@ -255,6 +255,7 @@ bool aarch64_float_const_zero_rtx_p (rtx);
  bool aarch64_function_arg_regno_p (unsigned);
  bool aarch64_gen_movmemqi (rtx *);
  bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
 +void aarch64_init_simd_builtins (void);
  bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
  bool aarch64_is_long_call_p (rtx);
  bool aarch64_label_mentioned_p (rtx);

These should be first ordered by return type, then alphabetical order.

 @@ -325,6 +326,7 @@ void aarch64_print_operand (FILE *, rtx, char);
  void aarch64_print_operand_address (FILE *, rtx);
  void aarch64_emit_call_insn (rtx);
  void aarch64_register_pragmas (void);
 +void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
  
  /* Initialize builtins for SIMD intrinsics.  */

Thanks,
James

Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-03 Thread James Greenhalgh

On Fri, Jul 24, 2015 at 09:40:28AM +0100, Kyrill Tkachov wrote:
 
 On 21/07/15 18:14, James Greenhalgh wrote:
  On Thu, Jul 16, 2015 at 04:21:15PM +0100, Kyrill Tkachov wrote:
  Hi all,
 
  These are the tests for target attributes and pragmas.
  I've tried to test for the inlining rules, some of the possible errors and
  the preprocessor macros changed from target pragmas.
 
  Ok for trunk?
  Mechanical changes in the pragma tests for the sake of grammar!
 
  s/defined but shouldn't/is defined but should not be/
  s/not defined but should/is not defined but should be/
 
  Note that some of the errors have different text, so you'll have to run
  through by hand and check these are consistent.
 
  It would be good to hand some of these target attribute tests off
  to the assembler to make sure we are also putting out appropriate
  directives in our output. Perhaps assemble is the more appropriate
  dg-do directive?
 
  Some more nits below (mostly missing comments on testcases).
 
 Thanks, here's an updated version.
 
 I've also added a test for the +nothing architectural feature
 attribute introduced in patch 10/14 and renamed the tests to use
 underscores in their names.
 
 How's this?

Run

  s/is is/is/

To fix typos like this:

 +#pragma GCC push_options
 +#pragma GCC target (arch=armv8-a+fp+nosimd)
 +#ifndef __ARM_FP
 +#error __ARM_FP is is not defined but should be!
 +#endif
 diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c 
 b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
 new file mode 100644
 index 000..72d0838
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
 @@ -0,0 +1,12 @@
 +/* { dg-do assemble } */
 +/* { dg-options -O2 -mcpu=thunderx -save-temps } */
 +
 +__attribute__ ((target (cpu=cortex-a72.cortex-a53)))
 +int
 +foo (int a)
 +{
 +  return a + 1;
 +}
 +
 +/* { dg-final { scan-assembler //.tune cortex-a72.cortex-a53 } } */
 +/* { dg-final { scan-assembler-not thunderx } } */

Comment on what we are testing.

 diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_2.c 
 b/gcc/testsuite/gcc.target/aarch64/target_attr_2.c
 new file mode 100644
 index 000..0cd6866
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_2.c
 @@ -0,0 +1,39 @@
 +/* { dg-do assemble } */
 +/* { dg-options -O2 -mcpu=cortex-a57 -ftree-vectorize -fdump-tree-vect-all 
 } */
 +
 +/* The various was to turn off simd availability should

s/was/ways/

 +   turn off vectorization.  */
 +
 +__attribute__ ((target (+nosimd)))
 +int
 +baz (int *a)
 +{
 +  for (int i = 0; i  1024; i++)
 +a[i] += 5;
 +}
 +
 +__attribute__ ((target (arch=armv8-a+nosimd)))
 +int
 +baz2 (int *a)
 +{
 +  for (int i = 0; i  1024; i++)
 +a[i] += 5;
 +}
 +
 +__attribute__ ((target (cpu=cortex-a53+nosimd)))
 +int
 +baz3 (int *a)
 +{
 +  for (int i = 0; i  1024; i++)
 +a[i] += 5;
 +}
 +
 +__attribute__ ((target (general-regs-only)))
 +int
 +baz4 (int *a)
 +{
 +  for (int i = 0; i  1024; i++)
 +a[i] += 5;
 +}
 +
 +/* { dg-final { scan-tree-dump-not vectorized 1 loops vect } } */

 diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_8.c 
 b/gcc/testsuite/gcc.target/aarch64/target_attr_8.c
 new file mode 100644
 index 000..3f32569
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_8.c
 @@ -0,0 +1,21 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -save-temps } */
 +
 +/* bar has a subset arch of bam.  Inlining should be allowed.  */

s/arch/set of architectural flags/

Or words to that effect.

 +
 +__attribute__ ((target (arch=armv8-a+nocrc)))
 +int
 +bar (int a)
 +{
 +  return a - 6;
 +}
 +
 +__attribute__ ((target (arch=armv8-a+crc)))
 +int
 +bam (int a)
 +{
 +  return a - bar (a);
 +}
 +
 +
 +/* { dg-final { scan-assembler-not bl.*bar } } */
 diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_9.c 
 b/gcc/testsuite/gcc.target/aarch64/target_attr_9.c
 new file mode 100644
 index 000..22bd99b
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_9.c
 @@ -0,0 +1,21 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -save-temps } */
 +
 +/* bar is not a subset arch of bam.  Inlining should be rejected.  */

Likewise.

 +
 +__attribute__ ((target (arch=armv8-a+crc)))
 +int
 +bar (int a)
 +{
 +  return a - 6;
 +}
 +
 +__attribute__ ((target (arch=armv8-a+nocrc)))
 +int
 +bam (int a)
 +{
 +  return a - bar (a);
 +}
 +
 +
 +/* { dg-final { scan-assembler bl.*bar } } */

OK with those changes.

Thanks,
James

Re: [PATCH] Work around host compiler placement new aliasing bug

2015-08-03 Thread Ulrich Weigand

Richard Biener wrote:
 On Wed, Jul 29, 2015 at 3:57 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
  Hello,
 
  this patch is a workaround for the problem discussed here:
  https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html
 
  The problem is that the new pool allocator code relies on C++ aliasing
  rules related to placement new (basically, that placement new changes
  the dynamic type of the referenced memory).  GCC compilers prior to
  version 4.3 did not implement this rule correctly (PR 29286).
 
  When building current GCC with a host compiler that is affected by this
  bug, and we build with optimization enabled (this typically only happens
  when building a cross-compiler), the resulting compiler binary may be
  miscompiled.
 
  The patch below attempts to detect this situation by checking whether
  the host compiler is a version of GCC prior to 4.3 (but stil accepts
  the -fno-strict-aliasing flag).  If so, -fno-strict-aliasing is added
  to the flags when building the compiler binary.
 
  Tested on i686-linux, and when building an SPU cross-compiler using
  a gcc 4.1 powerpc64-linux host compiler.
 
  OK for mainline?
 
 Ok if nobody objects.

I've checked this in now.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [gomp4] expand acc_on_device earlier

2015-08-03 Thread Thomas Schwinge

Hi!

On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote:
 I've committed this to gomp4 branch.  It expands the acc_on_device builtin 
 earlier in the new oacc_xform pass.  This will allow more optimization 
 earlier on.

As far as I remember, the Fortran front end doesn't know about/doesn't
know how to use these GCC builtins, so it's not currently able to do such
compile-time folding of acc_on_device.

   libgomp/
   * openacc.h (acc_on_device): Take int and explain why.

 --- libgomp/openacc.h (revision 226462)
 +++ libgomp/openacc.h (working copy)
 @@ -78,7 +78,11 @@ void acc_wait_all (void) __GOACC_NOTHROW
  void acc_wait_all_async (int) __GOACC_NOTHROW;
  void acc_init (acc_device_t) __GOACC_NOTHROW;
  void acc_shutdown (acc_device_t) __GOACC_NOTHROW;
 -int acc_on_device (acc_device_t) __GOACC_NOTHROW;
 +/* Library function declaration.  Although it should take an
 +   acc_device_t argument, that causes problems with matching the
 +   builtin, which takes an int (to avoid declaring the enumeration
 +   inside the compiler).  */
 +int acc_on_device (int) __GOACC_NOTHROW;
  void *acc_malloc (size_t) __GOACC_NOTHROW;
  void acc_free (void *) __GOACC_NOTHROW;
  /* Some of these would be more correct with const qualifiers, but

Hmm, too bad this doesn't resolve the following C++ XFAIL (the test case
is fine for C), gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c:

/* Have to enable optimizations, as otherwise builtins won't be expanded.  
*/
/* { dg-additional-options -O -fdump-rtl-expand } */

/* Duplicate parts of libgomp/openacc.h, because we can't include it here.  
*/

#if __cplusplus
extern C {
#endif

typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
extern int acc_on_device (int);

#if __cplusplus
}
#endif

int
f (void)
{
  const acc_device_t dev = acc_device_X;
  return acc_on_device (dev);
}

/* With -fopenacc, we're expecting the builtin to be expanded, so no calls.
   TODO: in C++, the use of enum acc_device_t for acc_on_device's parameter
   perturbs expansion as a builtin, which expects an int parameter.  It's 
fine
   when changing acc_device_t to plain int, but that's not necessarily what 
a
   user will be doing.

   { dg-final { scan-rtl-dump-times \\\(call \[^\\n\]* acc_on_device 0 
expand { xfail c++ } } } */

Do you happen to have any insight into that?


Meanwhile, committed to gomp-4_0-branch in r226501:

commit 38d7f116bf18f4a4012fda532aa70d47f3f20652
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Aug 3 11:59:15 2015 +

Update acc_on_device test cases

... to match r226484 changes.

gcc/testsuite/
* c-c++-common/goacc/acc_on_device-2-off.c (acc_on_device): Change
formal parameter to int.
* c-c++-common/goacc/acc_on_device-2.c (acc_on_device): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226501 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp   |  4 
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c |  4 +++-
 gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c | 10 ++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 33519ef..8dd9a4c 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2015-08-03  Thomas Schwinge  tho...@codesourcery.com
 
+   * c-c++-common/goacc/acc_on_device-2-off.c (acc_on_device): Change
+   formal parameter to int.
+   * c-c++-common/goacc/acc_on_device-2.c (acc_on_device): Likewise.
+
* c-c++-common/restrict-2.c: Update for new pass_lim.
* c-c++-common/restrict-4.c: Same.
* g++.dg/tree-ssa/pr33615.C: Same.
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c 
gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
index 59c72f7..71abe11 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c
@@ -1,12 +1,14 @@
 /* Have to enable optimizations, as otherwise builtins won't be expanded.  */
 /* { dg-additional-options -O -fdump-rtl-expand -fno-openacc } */
 
+/* Duplicate parts of libgomp/openacc.h, because we can't include it here.  */
+
 #if __cplusplus
 extern C {
 #endif
 
 typedef enum acc_device_t { acc_device_X = 123 } acc_device_t;
-extern int acc_on_device (acc_device_t);
+extern int acc_on_device (int);
 
 #if __cplusplus
 }
diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c 
gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
index ef622a8..243e562 100644
--- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
+++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c
@@ -1,12 +1,14 @@
 /* Have to enable optimizations, as otherwise builtins won't be expanded.  */
 /* {

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-08-03 Thread Uros Bizjak

Hello!

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
 when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
 explicit.  Prefer to add the flag whenever possible.
 (noce_process_if_block): Try noce_try_store_flag_constants before
 noce_try_cmove.

 2015-07-30  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/csel_bfx_1.c: New test.
 * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]

While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

   cmpl%edi, %esi
   sbbl%eax, %eax
   andl$-10, %eax
   addl$5, %eax
   ret

to:

   xorl%eax, %eax
   cmpl%esi, %edi
   setbe   %al
   negl%eax
   andl$10, %eax
   subl$5, %eax
   ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Uros.

Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-03 Thread Paul Richard Thomas

Dear Mikael,

Thanks for your green light!

I have been mulling over the trans-decl part of the patch and having
been wondering if it is necessary. Without optimization, private
entities can be linked to. Given the discussion concerning the
combination of submodules and private entities, I wonder if this is
not sufficient? Within submodule scope, an advisory could be given for
undefined references to suggest recompiling the module without
optimization or making the entities public.

Cheers

Paul

On 3 August 2015 at 12:44, Mikael Morin mikael.mo...@sfr.fr wrote:
 Le 29/07/2015 17:08, Paul Richard Thomas a écrit :

 Dear All,

 On 24 July 2015 at 10:08, Damian Rouson dam...@sourceryinstitute.org
 wrote:

 I love this idea and had similar thoughts as well.

 :D

 Sent from my iPhone

 On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas
 paul.richard.tho...@gmail.com wrote:

 Dear Mikael,

 It had crossed my mind also that a .mod and a .smod file could be
 written. Normally, the .smod files are produced by the submodules
 themselves, so that their descendants can pick up the symbols that
 they generate. There is no reason at all why this could not be
 implemented; early on in the development I did just this, although I
 think that it would now be easier to modify this patch.

 One huge advantage of proceeding in this way is that any resulting
 library can be distributed with the .mod file alone so that the
 private entities are never exposed. The penalty is that a second file
 is output.

 With best regards

 Paul


 Please find attached the implementation of this suggestion.

 Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the
 original preferred?

 There hasn't been a lot of voices about this among the other active and less
 active team members.
 I prefer this private members to separate smod variant.
 It's OK for trunk as far as I'm concerned.
 Thanks.

 Mikael

 PS: Regarding redundant initializations: rather have too many than too few.
 ;-)



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [gomp4] expand acc_on_device earlier

2015-08-03 Thread Nathan Sidwell


On 08/03/15 07:37, Thomas Schwinge wrote:

Hi!

On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote:

I've committed this to gomp4 branch.  It expands the acc_on_device builtin
earlier in the new oacc_xform pass.  This will allow more optimization earlier 
on.


Thanks!


The existing expansion point now only needs to deal with the host-side case.


Actually, no; committed to gomp-4_0-branch in r226498:


Please clarify.  This suggests a logic fault elsewhere.

nathan
--
Nathan Sidwell

RE: [PATCH, MIPS, Ping] Inline memcpy for MipsR6

2015-08-03 Thread Simon Dardis

Catherine,

Inline-memcpy-2.c updated to not run with -Os.

Patch rebased off current gcc sources.

Thanks,
Simon


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1733457..627e078 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -7520,12 +7520,22 @@ mips_block_move_straight (rtx dest, rtx src, 
HOST_WIDE_INT length)
  half-word alignment, it is usually better to move in half words.
  For instance, lh/lh/sh/sh is usually better than lwl/lwr/swl/swr
  and lw/lw/sw/sw is usually better than ldl/ldr/sdl/sdr.
- Otherwise move word-sized chunks.  */
-  if (MEM_ALIGN (src) == BITS_PER_WORD / 2
-   MEM_ALIGN (dest) == BITS_PER_WORD / 2)
-bits = BITS_PER_WORD / 2;
+ Otherwise move word-sized chunks.
+
+ For ISA_HAS_LWL_LWR we rely on the lwl/lwr  swl/swr load. Otherwise
+ picking the minimum of alignment or BITS_PER_WORD gets us the
+ desired size for bits.  */
+
+  if (!ISA_HAS_LWL_LWR)
+bits = MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)));
   else
-bits = BITS_PER_WORD;
+{
+  if (MEM_ALIGN (src) == BITS_PER_WORD / 2
+  MEM_ALIGN (dest) == BITS_PER_WORD / 2)
+   bits = BITS_PER_WORD / 2;
+  else
+   bits = BITS_PER_WORD;
+}
 
   mode = mode_for_size (bits, MODE_INT, 0);
   delta = bits / BITS_PER_UNIT;
@@ -7644,8 +7654,9 @@ mips_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT 
length,
 bool
 mips_expand_block_move (rtx dest, rtx src, rtx length)
 {
-  /* Disable entirely for R6 initially.  */
-  if (!ISA_HAS_LWL_LWR)
+  if (!ISA_HAS_LWL_LWR
+   (MEM_ALIGN (src)  MIPS_MIN_MOVE_MEM_ALIGN
+ || MEM_ALIGN (dest)  MIPS_MIN_MOVE_MEM_ALIGN))
 return false;
 
   if (CONST_INT_P (length))
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index ec69ed5..4b1787d 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2969,6 +2969,9 @@ while (0)
 #undef PTRDIFF_TYPE
 #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? long int : int)
 
+/* The minimum alignment of any expanded block move.  */
+#define MIPS_MIN_MOVE_MEM_ALIGN 16
+
 /* The maximum number of bytes that can be copied by one iteration of
a movmemsi loop; see mips_block_move_loop.  */
 #define MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER \
diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c 
b/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c
new file mode 100644
index 000..5a254b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c
@@ -0,0 +1,16 @@
+/* { dg-options -fno-common isa_rev=6 } */
+/* { dg-skip-if code quality test { *-*-* } { -O0 -Os } {  } } */
+/* { dg-final { scan-assembler-not \tmemcpy } } */
+
+/* Test that memcpy is inline for target hardware
+   without swl, swr.  */
+
+#include string.h
+
+char c[40] __attribute__ ((aligned(8)));
+
+void
+f1 ()
+{
+  memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32);
+}
diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c 
b/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c
new file mode 100644
index 000..e144e61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c
@@ -0,0 +1,17 @@
+/* { dg-options -fno-common isa_rev=6 } */
+/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} {  } } */
+/* { dg-final { scan-assembler-not \tmemcpy } } */
+/* { dg-final { scan-assembler-times \tsh\t 16 } } */
+
+/* Test that inline memcpy is expanded for target hardware without
+   swl, swr when alignment is halfword and sufficent shs are produced.  */
+
+#include string.h
+
+char c[40] __attribute__ ((aligned(2)));
+
+void
+f1 ()
+{
+  memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32);
+}
diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c 
b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c
new file mode 100644
index 000..96a0387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c
@@ -0,0 +1,18 @@
+/* { dg-options -fno-common isa_rev=5 } */
+/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} {  } } */
+/* { dg-final { scan-assembler-not \tmemcpy } } */
+/* { dg-final { scan-assembler-times swl 8 } } */
+/* { dg-final { scan-assembler-times swr 8 } } */
+
+/* Test that inline memcpy for hardware with swl, swr handles subword
+   alignment and produces enough swl/swrs for mips32.  */
+
+#include string.h
+
+char c[40] __attribute__ ((aligned(2)));
+
+void
+f1 ()
+{
+  memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32);
+}
diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c 
b/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c
new file mode 100644
index 000..0e7a22e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c
@@ -0,0 +1,18 @@
+/* { dg-options -fno-common isa_rev=5 -mabi=64 } */
+/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} {  } } */
+/* { dg-final { scan-assembler-not \tmemcpy } } */
+/* { dg-final { scan-assembler-times sdl 4 } } */
+/* { dg-final { scan-assembler-times sdr 4 } } */
+
+/* Test that inline memcpy for hardware with sdl,

Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG

2015-08-03 Thread James Greenhalgh

On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote:
 Hi all,
 
 This patch implements an aarch64-specific expansion of the signed modulo by a 
 power of 2.
 The proposed sequence makes use of the conditional negate instruction CSNEG.
 For a power of N, x % N can be calculated with:
 negs   x1, x0
 andx0, x0, #(N - 1)
 andx1, x1, #(N - 1)
 csneg  x0, x0, x1, mi
 
 So, for N == 256 this would be:
 negs   x1, x0
 andx0, x0, #255
 andx1, x1, #255
 csneg  x0, x0, x1, mi
 
 For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c 
 is:
 asr x1, x0, 63
 lsr x1, x1, 56
 add x0, x0, x1
 and x0, x0, 255
 sub x0, x0, x1
 
 Note that the CSNEG sequence is one instruction shorter and that the two and 
 operations
 are independent, compared to the existing sequence where all instructions are 
 dependent
 on the preceeding instructions.
 
 For the special case of N == 2 we can do even better:
 cmp x0, xzr
 and x0, x0, 1
 csneg   x0, x0, x0, ge
 
 I first tried implementing this in the generic code in expmed.c but that 
 didn't work
 out for a few reasons:
 
 * This relies on having a conditional-negate instruction. We could gate it on
 HAVE_conditional_move and the combiner is capable of merging the final negate 
 into
 the conditional move if a conditional negate is available (like on aarch64) 
 but on
 targets without a conditional negate this would end up emitting a separate 
 negate.
 
 * The first negs has to be a negs for the sequence to be a win i.e. having a 
 separate
 negate and compare makes the sequence slower than the existing one (at least 
 in my
 microbenchmarking) and I couldn't get subsequent passes to combine the negate 
 and combine
 into the negs (presumably due to the use of the negated result in one of the 
 ands).
 Doing it in the aarch64 backend where I could just call the exact gen_* 
 functions that
 I need worked much more cleanly.
 
 The costing logic is updated to reflect this sequence during the 
 intialisation of
 expmed.c where it calculates the smod_pow2_cheap metric.
 
 The tests will come in patch 3 of the series which are partly shared with the 
 equivalent
 arm implementation.
 
 Bootstrapped and tested on aarch64.
 Ok for trunk?
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 9d88a60..7bb4a55 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -6639,8 +6639,26 @@ cost_plus:
 if (VECTOR_MODE_P (mode))
   *cost += extra_cost-vect.alu;
 else if (GET_MODE_CLASS (mode) == MODE_INT)
 - *cost += (extra_cost-mult[mode == DImode].add
 -   + extra_cost-mult[mode == DImode].idiv);
 + {
 +   /* We can expand signed mod by power of 2 using a
 +  NEGS, two parallel ANDs and a CSNEG.  Assume here
 +  that CSNEG is COSTS_N_INSNS (1).  This case should
 +  only ever be reached through the set_smod_pow2_cheap check
 +  in expmed.c.  */
 +   if (code == MOD
 +CONST_INT_P (XEXP (x, 1))
 +exact_log2 (INTVAL (XEXP (x, 1)))  0
 +(mode == SImode || mode == DImode))
 + {
 +   *cost += COSTS_N_INSNS (3)
 ++ 2 * extra_cost-alu.logical
 ++ extra_cost-alu.arith;
 +   return true;
 + }
 +
 +   *cost += (extra_cost-mult[mode == DImode].add
 + + extra_cost-mult[mode == DImode].idiv);
 + }
 else if (mode == DFmode)
   *cost += (extra_cost-fp[1].mult
 + extra_cost-fp[1].div);

This looks like it calculates the wrong cost for !speed. I think we will
still expand through modmode3 when compiling for size, so we probably
still want to cost the multiple instructions.

Have I misunderstood?

Thanks,
James

Re: [gomp4] expand acc_on_device earlier

2015-08-03 Thread Thomas Schwinge

Hi!

On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote:
 I've committed this to gomp4 branch.  It expands the acc_on_device builtin 
 earlier in the new oacc_xform pass.  This will allow more optimization 
 earlier on.

Thanks!

 The existing expansion point now only needs to deal with the host-side case.

Actually, no; committed to gomp-4_0-branch in r226498:

commit 5d1009bd4b959da699cf439200ffa571bf547026
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Mon Aug 3 11:36:01 2015 +

Fix offload compiler build, and de-duplicate acc_on_device logic

When building an x86_64-intelmicemul-linux-gnu offloading compiler with
r226484, the __builtin_acc_on_device usage in 
libgomp/oacc-init.c:acc_on_device
makes it blow up:

[...]/source-gcc/libgomp/oacc-init.c: In function 'acc_on_device':
[...]/source-gcc/libgomp/oacc-init.c:643:10: internal compiler error: 
in expand_builtin_acc_on_device, at builtins.c:5891
   return __builtin_acc_on_device (dev);
  ^
0x6d9632 expand_builtin_acc_on_device
[...]/source-gcc/gcc/builtins.c:5891
0x6d9632 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, 
int)
[...]/source-gcc/gcc/builtins.c:7127
0x8021ad expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
[...]/source-gcc/gcc/expr.c:10361
0x80e8b5 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, 
tree_node*)
[...]/source-gcc/gcc/expr.c:5400
0x810eb4 expand_assignment(tree_node*, tree_node*, bool)
[...]/source-gcc/gcc/expr.c:5169
0x6f927b expand_call_stmt
[...]/source-gcc/gcc/cfgexpand.c:2349
0x6f927b expand_gimple_stmt_1
[...]/source-gcc/gcc/cfgexpand.c:3238
0x6fa77c expand_gimple_stmt
[...]/source-gcc/gcc/cfgexpand.c:3399
0x6fa842 expand_gimple_tailcall
[...]/source-gcc/gcc/cfgexpand.c:3446
0x6fc615 expand_gimple_basic_block
[...]/source-gcc/gcc/cfgexpand.c:5388
0x701456 execute
[...]/source-gcc/gcc/cfgexpand.c:6023

(This is not a problem for a nvptx-none offloading compiler, because of its
open-coded libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device
implementation.)

gcc/
* builtins.c (expand_builtin_on_device): Don't expect to be
expanded on host compiler.
libgcc/
* config/nvptx/gomp-acc_on_device.c: Don't include
gomp-constants.h.
(acc_on_device): Don't code directly here.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226498 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   |3 +++
 gcc/builtins.c   |   30 ++
 gcc/omp-low.c|7 ++-
 libgcc/ChangeLog.gomp|   10 --
 libgcc/config/nvptx/gomp-acc_on_device.c |   16 +++-
 libgomp/oacc-init.c  |2 +-
 6 files changed, 39 insertions(+), 29 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index a30f6a3..d5cc1ed 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2015-08-03  Thomas Schwinge  tho...@codesourcery.com
 
+   * builtins.c (expand_builtin_on_device): Don't expect to be
+   expanded on host compiler.
+
* gimplify.c (oacc_default_clause): Remove in_code formal
parameter.  Adjust all users.
 
diff --git gcc/builtins.c gcc/builtins.c
index 9bc08f6..ce369a1 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5880,39 +5880,45 @@ expand_stack_save (void)
 }
 
 
-/* Expand OpenACC acc_on_device.  This is expanded in the openacc
-   transform pass, but if the user has this outside of an offloaded
-   region, we'll find it here.  In that case we must be host or none.  */
+/* Expand OpenACC acc_on_device.  This is usually expanded in the
+   oacc_transform pass, earlier on, but if used outside of an offloaded region,
+   we'll find it here.  */
 
 static rtx
 expand_builtin_acc_on_device (tree exp, rtx target)
 {
-#ifdef ACCEL_COMPILER
-  gcc_unreachable ();
-#else
+#ifndef ACCEL_COMPILER
   gcc_assert (!get_oacc_fn_attrib (current_function_decl));
+#endif
   
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
   tree arg = CALL_EXPR_ARG (exp, 0);
-  rtx val = expand_normal (arg);
+
+  /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
+  machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
+  rtx v = expand_normal (arg), v1, v2;
+#ifdef ACCEL_COMPILER
+  v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
+  v2 = GEN_INT (ACCEL_COMPILER_acc_device);
+#else
+  v1 = GEN_INT (GOMP_DEVICE_NONE);
+  v2 = GEN_INT (GOMP_DEVICE_HOST);
+#endif
   machine_mode target_mode = TYPE_MODE (integer_type_node);
   if

[gomp4] PTX launch dimensions

2015-08-03 Thread Nathan Sidwell

I've committed this to gomp4.  The ptx backend can now examine the openacc 
attribute to determine launch dimensions and figure out whether vector or worker 
single neutering is needed.


nathan
2015-08-03  Nathan Sidwell  nat...@codesourcery.com

	* config/nvptx/nvptx.c (nvptx_reorg): Check get_oacc_fn_attrib for
	launch dimensions and only do parallel processing when present.
	Check dimensions to determine neutering requirements.
	(nvptx_record_offload_symbol): Launch dimension attribute must be
	present on offloaded functions.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 226485)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -2980,13 +2980,42 @@ nvptx_reorg (void)
 if (REG_N_SETS (i) == 0  REG_N_REFS (i) == 0)
   regno_reg_rtx[i] = const0_rtx;
 
-  parallel *pars = nvptx_discover_pars (bb_insn_map);
-
-  nvptx_process_pars (pars);
-  nvptx_neuter_pars (pars, (GOMP_DIM_MASK (GOMP_DIM_VECTOR)
-			| GOMP_DIM_MASK (GOMP_DIM_WORKER)), 0);
-
-  delete pars;
+  /* Determine launch dimensions of the function.  If it is not an
+ offloaded function  (i.e. this is a regular compiler), the
+ function has no neutering.  */
+  tree attr = get_oacc_fn_attrib (current_function_decl);
+  if (attr)
+{
+  unsigned mask = 0;
+  tree dims = TREE_VALUE (attr);
+  unsigned ix;
+
+  for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+	{
+	  unsigned HOST_WIDE_INT dim = 0;
+
+	  if (dims)
+	{
+	  tree cst = TREE_VALUE (dims);
+
+	  dim = TREE_INT_CST_LOW (cst);
+	  dims = TREE_CHAIN (dims);
+	}
+	  if (dim != 1)
+	mask |= GOMP_DIM_MASK (ix);
+	}
+  /* If there is worker neutering, there must be vector
+	 neutering.  Otherwise the hardware will fail.  This really
+	 should be dealt with earlier because it indicates faulty
+	 logic in determining launch dimensions.  */
+  if (mask  GOMP_DIM_MASK (GOMP_DIM_WORKER))
+	mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+
+  parallel *pars = nvptx_discover_pars (bb_insn_map);
+  nvptx_process_pars (pars);
+  nvptx_neuter_pars (pars, mask, 0);
+  delete pars;
+}
 
   nvptx_reorg_subreg ();
   
@@ -3073,32 +3102,25 @@ nvptx_record_offload_symbol (tree decl)
 case FUNCTION_DECL:
   {
 	tree attr = get_oacc_fn_attrib (decl);
-	tree dims = NULL_TREE;
+	tree dims = TREE_VALUE (attr);
 	unsigned ix;
 	
-	if (attr)
-	  dims = TREE_VALUE (attr);
 	fprintf (asm_out_file, //:FUNC_MAP \%s\,
 		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
 
-	for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+	for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
 	  {
-	unsigned HOST_WIDE_INT dim = 0;
-	if (dims)
-	  {
-		tree cst = TREE_VALUE (dims);
-
-		/* When device_type support is added an ealier pass
-		   should have massaged the attribute to be
-		   ptx-specific.  */
-		gcc_assert (TREE_CODE (cst) == INTEGER_CST);
-
-		dim = TREE_INT_CST_LOW (cst);
-		dims = TREE_CHAIN (dims);
-	  }
+	tree cst = TREE_VALUE (dims);
+
+	/* When device_type support is added an earlier pass
+	   should have massaged the attribute to be
+	   ptx-specific.  */
+	gcc_assert (TREE_CODE (cst) == INTEGER_CST);
+
+	unsigned HOST_WIDE_INT dim = TREE_INT_CST_LOW (cst);
 	fprintf (asm_out_file, ,  HOST_WIDE_INT_PRINT_HEX, dim);
 	  }
-	
+
 	fprintf (asm_out_file, \n);
   }
   break;

[PATCH] Merge even more tails (from for expansion) in genmatch.c

2015-08-03 Thread Richard Biener


This changes for lowering for most cases that can end up being tail-merged
to not substitute the operator into the result expression but refer to it
using a local variable of the name of the for iterator.

Boostrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

This drops the number of tail implementations from 744 to 224 (in my
dev tree).

Richard.

2015-08-03  Richard Biener  rguent...@suse.de

* genmatch.c (simplify::for_subst_vec): New member.
(binary_ok): New helper for for lowering.
(lower_for): Delay substituting operators into result expressions
if we can merge the results eventually again.
(capture_info::walk_result): Adjust for user_id appearing as
result expression operator.
(expr::gen_transform): Likewise.
(dt_simplify::gen_1): Likewise.
(dt_simplify::gen): Pass not substituted operators to tail
functions or initialize local variable with it.
(decision_tree::gen): Adjust function signature.
* match.pd: Fix tests against global code and add default
cases to switch stmts.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226490)
+++ gcc/genmatch.c  (working copy)
@@ -683,7 +683,7 @@ struct simplify
   simplify (simplify_kind kind_, operand *match_, operand *result_,
vecvecuser_id *  for_vec_, cid_map_t *capture_ids_)
   : kind (kind_), match (match_), result (result_),
-  for_vec (for_vec_),
+  for_vec (for_vec_), for_subst_vec (vNULL),
   capture_ids (capture_ids_), capture_max (capture_ids_-elements () - 1) 
{}
 
   simplify_kind kind;
@@ -697,6 +697,7 @@ struct simplify
   /* Collected 'for' expression operators that have to be replaced
  in the lowering phase.  */
   vecvecuser_id *  for_vec;
+  vecstd::pairuser_id *, id_base *  for_subst_vec;
   /* A map of capture identifiers to indexes.  */
   cid_map_t *capture_ids;
   int capture_max;
@@ -1135,6 +1136,38 @@ replace_id (operand *o, user_id *id, id_
   return o;
 }
 
+/* Return true if the binary operator OP is ok for delayed substitution
+   during for lowering.  */
+
+static bool
+binary_ok (operator_id *op)
+{
+  switch (op-code)
+{
+case PLUS_EXPR:
+case MINUS_EXPR:
+case MULT_EXPR:
+case TRUNC_DIV_EXPR:
+case CEIL_DIV_EXPR:
+case FLOOR_DIV_EXPR:
+case ROUND_DIV_EXPR:
+case TRUNC_MOD_EXPR:
+case CEIL_MOD_EXPR:
+case FLOOR_MOD_EXPR:
+case ROUND_MOD_EXPR:
+case RDIV_EXPR:
+case EXACT_DIV_EXPR:
+case MIN_EXPR:
+case MAX_EXPR:
+case BIT_IOR_EXPR:
+case BIT_XOR_EXPR:
+case BIT_AND_EXPR:
+  return true;
+default:
+  return false;
+}
+}
+
 /* Lower recorded fors for SIN and output to SIMPLIFIERS.  */
 
 static void
@@ -1154,9 +1187,46 @@ lower_for (simplify *sin, vecsimplify *
   vecuser_id * ids = for_vec[fi];
   unsigned n_ids = ids.length ();
   unsigned max_n_opers = 0;
+  bool can_delay_subst = (sin-kind == simplify::SIMPLIFY);
   for (unsigned i = 0; i  n_ids; ++i)
-   if (ids[i]-substitutes.length ()  max_n_opers)
- max_n_opers = ids[i]-substitutes.length ();
+   {
+ if (ids[i]-substitutes.length ()  max_n_opers)
+   max_n_opers = ids[i]-substitutes.length ();
+ /* Require that all substitutes are of the same kind so that
+if we delay substitution to the result op code generation
+can look at the first substitute for deciding things like
+types of operands.  */
+ enum id_base::id_kind kind = ids[i]-substitutes[0]-kind;
+ for (unsigned j = 0; j  ids[i]-substitutes.length (); ++j)
+   if (ids[i]-substitutes[j]-kind != kind)
+ can_delay_subst = false;
+   else if (operator_id *op
+  = dyn_cast operator_id * (ids[i]-substitutes[j]))
+ {
+   operator_id *op0
+ = as_a operator_id * (ids[i]-substitutes[0]);
+   if (strcmp (op-tcc, tcc_comparison) == 0
+strcmp (op0-tcc, tcc_comparison) == 0)
+ ;
+   /* Unfortunately we can't just allow all tcc_binary.  */
+   else if (strcmp (op-tcc, tcc_binary) == 0
+ strcmp (op0-tcc, tcc_binary) == 0
+ binary_ok (op)
+ binary_ok (op0))
+ ;
+   else if ((strcmp (op-id + 1, SHIFT_EXPR) == 0
+ || strcmp (op-id + 1, ROTATE_EXPR) == 0)
+ (strcmp (op0-id + 1, SHIFT_EXPR) == 0
+|| strcmp (op0-id + 1, ROTATE_EXPR) == 0))
+ ;
+   else
+ can_delay_subst = false;
+ }
+   else if (is_a fn_id * (ids[i]-substitutes[j]))
+ ;
+   else
+ can_delay_subst = false;
+   }

Re: [C++/66443] virtual base of abstract class

2015-08-03 Thread Nathan Sidwell


On 08/02/15 23:44, Jason Merrill wrote:


It seems to me that DR 1658 ignores vbases of abstract classes for determining
whether a destructor is deleted, but says nothing about exception 
specifications.

DR 1351 specifically ignores vbases of abstract classes for determining the
exception specification of a constructor, but only for constructors.

So I think that for destructors we want to walk the base, but pass in a fake
delete_p.


Ok. that all makes sense.


Why the check for inherited_parms?  I would think that inheriting constructors
would be handled like other ctors.


I think I continue to be confused ...

nathan

1 2 >

1 - 100 of 117 matches

Mail list logo