[PATCH] Pattern recognizer rotate improvement
On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: We have the same issue in some other places where we insert invariant code into the loop body - one reason there is another LIM pass after vectorization. Well, in this case it causes the shift amount to be loaded into a vector instead of scalar, therefore even when LIM moves it before the loop, it will only work with vector/vector shifts and be more expensive that way (need to broadcast the value in a vector). The following patch improves it slightly at least for loops, by just emitting the shift amount stmts to loop preheader, rotate-4.c used to be only vectorizable with -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't) vectorizes it. Unfortunately this trick doesn't work for SLP vectorization, emitting the stmts at the start of the current bb doesn't help, because every stmt emits its own and thus it is vectorized with vector/vector shifts only anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-05-17 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_recog_rotate_pattern): For vect_external_def oprnd1 with loop_vinfo, try to emit optional cast, negation and and stmts on the loop preheader edge instead of into the pattern def seq. * gcc.target/i386/rotate-4.c: Compile only with -mavx instead of -mavx2, require only avx instead of avx2. * gcc.target/i386/rotate-4a.c: Include avx-check.h instead of avx2-check.h and turn into an avx runtime test instead of avx2 runtime test. 
--- gcc/tree-vect-patterns.c.jj	2013-05-16 13:56:08.0 +0200
+++ gcc/tree-vect-patterns.c	2013-05-16 15:27:00.565143478 +0200
@@ -1494,6 +1494,7 @@ vect_recog_rotate_pattern (vec<gimple> *
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
   enum vect_def_type dt;
   optab optab1, optab2;
+  edge ext_def = NULL;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -1574,6 +1575,21 @@ vect_recog_rotate_pattern (vec<gimple> *
   if (*type_in == NULL_TREE)
     return NULL;
 
+  if (dt == vect_external_def
+      && TREE_CODE (oprnd1) == SSA_NAME
+      && loop_vinfo)
+    {
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      ext_def = loop_preheader_edge (loop);
+      if (!SSA_NAME_IS_DEFAULT_DEF (oprnd1))
+	{
+	  basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (oprnd1));
+	  if (bb == NULL
+	      || !dominated_by_p (CDI_DOMINATORS, ext_def->dest, bb))
+	    ext_def = NULL;
+	}
+    }
+
   def = NULL_TREE;
   if (TREE_CODE (oprnd1) == INTEGER_CST
       || TYPE_MODE (TREE_TYPE (oprnd1)) == TYPE_MODE (type))
@@ -1593,7 +1609,14 @@ vect_recog_rotate_pattern (vec<gimple> *
       def = vect_recog_temp_ssa_var (type, NULL);
       def_stmt = gimple_build_assign_with_ops (NOP_EXPR, def, oprnd1,
					       NULL_TREE);
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+	  gcc_assert (!new_bb);
+	}
+      else
+	append_pattern_def_seq (stmt_vinfo, def_stmt);
     }
   stype = TREE_TYPE (def);
@@ -1618,11 +1641,19 @@ vect_recog_rotate_pattern (vec<gimple> *
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       def_stmt = gimple_build_assign_with_ops (NEGATE_EXPR, def2, def,
					       NULL_TREE);
-      def_stmt_vinfo
-	= new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+	  gcc_assert (!new_bb);
+	}
+      else
+	{
+	  def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
+	  set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
+	  STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
+	  append_pattern_def_seq (stmt_vinfo, def_stmt);
+	}
 
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       tree mask
@@ -1630,11 +1661,19 @@ vect_recog_rotate_pattern (vec<gimple> *
       def_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, def2,
					       gimple_assign_lhs (def_stmt),
					       mask);
-      def_stmt_vinfo
-	= new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate
[Patch, Fortran] PR48858/55465 - permit multiple bind(C) declarations (but not definitions) for the same proc
Followup to (and depending on) the C binding patches for * COMMON: http://gcc.gnu.org/ml/fortran/2013-05/msg00048.html * Procedures: http://gcc.gnu.org/ml/fortran/2013-05/msg00051.html which honour Fortran 2008, where the Fortran name is no longer a global identifier if a binding name has been specified. The main reason for this patch is a build failure of Open MPI (requires !gcc$ attributes no_arg_check, i.e. it only affects GCC 4.9). Open MPI uses something like:

  interface
    subroutine pmpi_something() bind(C,name="MPI_something")
  ...

and in a different module:

  interface
    subroutine mpi_something() bind(C,name="MPI_something")
  ...

Currently, gfortran rejects this because it only permits one definition/declaration per translation unit. However, there is no reason why multiple INTERFACE blocks shouldn't be permitted. Remarks: a) Better argument checks if definition and declaration are in the same file. (see INTENT patch in a test case) b) Currently, no check is done regarding the characteristics of procedure declarations. Of course, the declaration has to be compatible with the C procedure. However, there seems to be the wish* to permit compatible input - even if the Fortran characteristic is different. For instance int * takes both a scalar integer (int i; f(&i)) and arrays (int i[5]; f(i)). Or also popular according to the PRs: taking a C_LOC or an integer(c_intptr_t). (* Seemingly, also J3 and/or WG5 discussed this (plenum? subgroups?) and they decided to permit it. However, finding some official document is difficult.) I was wondering for a while what should be permitted and what shouldn't, but I have now decided to put that completely into the hands of the user. Built and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-17 Tobias Burnus bur...@net-b.de PR fortran/48858 PR fortran/55465 * decl.c (add_global_entry): Add sym_name. * parse.c (add_global_procedure): Ditto. * resolve.c (resolve_bind_c_derived_types): Handle multiple decl for a procedure.
(resolve_global_procedure): Handle gsym->ns pointing to a module. * trans-decl.c (gfc_get_extern_function_decl): Ditto. 2013-05-17 Tobias Burnus bur...@net-b.de PR fortran/48858 PR fortran/55465 * gfortran.dg/binding_label_tests_10_main.f03: Update dg-error. * gfortran.dg/binding_label_tests_11_main.f03: Ditto. * gfortran.dg/binding_label_tests_13_main.f03: Ditto. * gfortran.dg/binding_label_tests_3.f03: Ditto. * gfortran.dg/binding_label_tests_4.f03: Ditto. * gfortran.dg/binding_label_tests_5.f03: Ditto. * gfortran.dg/binding_label_tests_6.f03: Ditto. * gfortran.dg/binding_label_tests_7.f03: Ditto. * gfortran.dg/binding_label_tests_8.f03: Ditto. * gfortran.dg/c_loc_tests_12.f03: Fix test case. * gfortran.dg/binding_label_tests_24.f90: New. * gfortran.dg/binding_label_tests_25.f90: New.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index cb449a2..6ab9cc7 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -5375,6 +5375,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub)
   else
     {
       s->type = type;
+      s->sym_name = name;
       s->where = gfc_current_locus;
       s->defined = 1;
       s->ns = gfc_current_ns;
@@ -5396,6 +5397,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub)
   else
     {
       s->type = type;
+      s->sym_name = name;
       s->binding_label = binding_label;
       s->where = gfc_current_locus;
       s->defined = 1;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index ba1730a..a223a2c 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -4359,10 +4359,15 @@ add_global_procedure (bool sub)
   if (s->defined
       || (s->type != GSYM_UNKNOWN
	   && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION)))
-    gfc_global_used(s, NULL);
+    {
+      gfc_global_used (s, NULL);
+      /* Silence follow-up errors.  */
+      gfc_new_block->binding_label = NULL;
+    }
   else
     {
       s->type = sub ? GSYM_SUBROUTINE : GSYM_FUNCTION;
+      s->sym_name = gfc_new_block->name;
       s->where = gfc_current_locus;
       s->defined = 1;
       s->ns = gfc_current_ns;
@@ -4379,10 +4384,15 @@ add_global_procedure (bool sub)
   if (s->defined
       || (s->type != GSYM_UNKNOWN
	   && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION)))
-    gfc_global_used(s, NULL);
+    {
+      gfc_global_used (s, NULL);
+      /* Silence follow-up errors.  */
+      gfc_new_block->binding_label = NULL;
+    }
   else
     {
       s->type = sub ? GSYM_SUBROUTINE : GSYM_FUNCTION;
+      s->sym_name = gfc_new_block->name;
       s->binding_label = gfc_new_block->binding_label;
       s->where = gfc_current_locus;
       s->defined = 1;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index f3607b4..74e0aa4 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -2389,6 +2389,11 @@ resolve_global_procedure (gfc_symbol *sym, locus *where,
     }
 
   def_sym = gsym->ns->proc_name;
+
+  /* This can happen if a binding name has been specified.  */
+  if (gsym->binding_label
[PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)
On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote: Jakub Jelinek writes: On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: This patch adds folding of VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR with constant arguments, which helps to optimize the testcase from the PR back into a constant store after the vectorized loop is unrolled. As this fixes a regression on the 4.8 branch, I've backported it (and a minimal prerequisite for that) to the 4.8 branch too. Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail on powerpc64-linux: +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test which is a regression from 4.8-20130502. Reverting r198580 fixes it. The same FAILs also occur on trunk. Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used not just on little endian targets, but on big endian as well (VEC_LSHIFT_EXPR is never emitted), but the important spot is when extracting the scalar result from the vector:

  if (BYTES_BIG_ENDIAN)
    bitpos = size_binop (MULT_EXPR,
			 bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
			 TYPE_SIZE (scalar_type));
  else
    bitpos = bitsize_zero_node;

Fixed thusly, ok for trunk/4.8? 2013-05-17 Jakub Jelinek ja...@redhat.com PR tree-optimization/57051 * fold-const.c (const_binop) case VEC_LSHIFT_EXPR, case VEC_RSHIFT_EXPR: Fix BYTES_BIG_ENDIAN handling.

--- gcc/fold-const.c.jj	2013-05-16 12:36:28.0 +0200
+++ gcc/fold-const.c	2013-05-17 08:38:12.575117676 +0200
@@ -1393,7 +1393,7 @@ const_binop (enum tree_code code, tree a
	  if (shiftc >= outerc || (shiftc % innerc) != 0)
	    return NULL_TREE;
	  int offset = shiftc / innerc;
-	  if (code == VEC_LSHIFT_EXPR)
+	  if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN))
	    offset = -offset;
	  tree zero = build_zero_cst (TREE_TYPE (type));
	  for (i = 0; i < count; i++)

Jakub
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html Reject -fsanitize=address -fsanitize=thread linking that won't ever work at runtime. Jakub
Re: section anchors and weak hidden symbols
On 05/16/13 15:32, Rainer Orth wrote: The new gcc.dg/visibility-21.c testcase fails on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu: FAIL: gcc.dg/visibility-21.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/visibility-21.c:1:0: warning: this target does not support '-fsection-anchors' [-fsection-anchors] Fixed as follows, tested with the appropriate runtest invocation on both targets where the test becomes UNSUPPORTED, installed on mainline. thanks!
Re: [patch,fortran] PR50405 - Statement function with itself as argument SEGV's
Am 17.05.2013 05:22, schrieb Bud Davis: Not too much to add beyond the title and the patch. The test file fails before (eventually, when you run out of stack) and passes after the patch is applied. No new testsuite failures. --bud

Index: gcc/gcc/fortran/resolve.c
===================================================================
--- gcc/gcc/fortran/resolve.c	(revision 198955)
+++ gcc/gcc/fortran/resolve.c	(working copy)
@@ -306,6 +306,14 @@
       && !resolve_procedure_interface (sym))
     return;
 
+  if (strcmp (proc->name,sym->name) == 0)

Missing blank after the comma.

+    {
+      gfc_error ("Self referential argument "
+		 "'%s' at %L is not allowed", sym->name,
+		 &proc->declared_at);
+      return;

Indentation is wrong. (As a friend of hyphens, I would add one between self and referential, but it is also fine without.)

!{ dg-do compile }
! submitted by zec...@gmail.com
!{ dg-prune-output Obsolescent feature: Statement function at }

Please add '! PR fortran/50405' as comment. Instead of dg-prune-output, you could also use: '! { dg-options "" }'. That will override the default setting, i.e. it removes the -pedantic.

f(f) = 0 ! { dg-error "Self referential argument" }
end

2013-05-17 Bud Davis jmda...@link.com PR fortran/50405 * resolve.c (resolve_formal_arglist): Detect error when an argument has the same name as the function.

OK and thanks for the patch! Tobias

PS: Nice that you are back to (casual) gfortran development.
[ping] Re: [patch] install host specific {bits,ext}/opt_random.h headers in host specific c++ incdir
ping? Am 12.05.2013 11:58, schrieb Matthias Klose: {bits,ext}/opt_random.h became host specific in 4.8, but are still installed into the host independent c++ include dir. install them into the host specific include dir instead. Ok for 4.8 and trunk? Matthias
Re: [PATCH] Pattern recognizer rotate improvement
On Fri, 17 May 2013, Jakub Jelinek wrote: On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: We have the same issue in some other places where we insert invariant code into the loop body - one reason there is another LIM pass after vectorization. Well, in this case it causes the shift amount to be loaded into a vector instead of scalar, therefore even when LIM moves it before the loop, it will only work with vector/vector shifts and be more expensive that way (need to broadcast the value in a vector). The following patch improves it slightly at least for loops, by just emitting the shift amount stmts to loop preheader, rotate-4.c used to be only vectorizable with -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't) vectorizes it. Unfortunately this trick doesn't work for SLP vectorization, emitting the stmts at the start of the current bb doesn't help, because every stmt emits its own and thus it is vectorized with vector/vector shifts only anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2013-05-17 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_recog_rotate_pattern): For vect_external_def oprnd1 with loop_vinfo, try to emit optional cast, negation and and stmts on the loop preheader edge instead of into the pattern def seq. * gcc.target/i386/rotate-4.c: Compile only with -mavx instead of -mavx2, require only avx instead of avx2. * gcc.target/i386/rotate-4a.c: Include avx-check.h instead of avx2-check.h and turn into an avx runtime test instead of avx2 runtime test. 
Re: [PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)
On Fri, 17 May 2013, Jakub Jelinek wrote: On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote: Jakub Jelinek writes: On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: This patch adds folding of constant arguments v and v, which helps to optimize the testcase from the PR back into constant store after vectorized loop is unrolled. As this fixes a regression on the 4.8 branch, I've backported it (and minimal prerequisite for that) to 4.8 branch too. Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail on powerpc64-linux: +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test which is a regression from 4.8-20130502. Reverting r198580 fixes it. The same FAILs also occur on trunk. Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used not just on little endian targets, but on big endian as well (VEC_LSHIFT_EXPR is never emitted), but the important spot is when extracting the scalar result from the vector: if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else bitpos = bitsize_zero_node; Fixed thusly, ok for trunk/4.8? Ok with a comment in front of (code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN) Thanks, Richard. 2013-05-17 Jakub Jelinek ja...@redhat.com PR tree-optimization/57051 * fold-const.c (const_binop) case VEC_LSHIFT_EXPR, case VEC_RSHIFT_EXPR: Fix BYTES_BIG_ENDIAN handling. 
-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH] Fix incorrect discriminator assignment.
On Wed, May 15, 2013 at 6:50 PM, Cary Coutant ccout...@google.com wrote: gcc/ChangeLog: * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno. (locus_descrim_hasher::equal): Likewise. (next_discriminator_for_locus): Likewise. (assign_discriminator): Add return value. (make_edges): Assign more discriminator if necessary. (make_cond_expr_edges): Likewise. (make_goto_expr_edges): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/debug/dwarf2/discriminator.c: New Test. Looks OK to me, but I can't approve patches for tree-cfg.c. Two comments: (1) In the C++ conversion, it looks like someone misspelled discrim in locus_descrim_hasher. (2) Is this worth fixing in trunk when we'll eventually switch to a per-instruction discriminator? This patch fixes a common case where one line is distributed to 3 BBs, but only 1 discriminator is assigned. As far as I can see the patch makes discriminators coarser (by hashing and comparing on LOCATION_LINE and not on the location). It also has the changes like - assign_discriminator (entry_locus, then_bb); + if (assign_discriminator (entry_locus, then_bb)) +assign_discriminator (entry_locus, bb); which is what the comment refers to? I think those changes are short-sighted because what happens if the 2nd assign_discriminator call has exactly the same issue? Especially with the make_goto_expr_edges change there may be multiple predecessors where I cannot see how the change is correct. Yes, the change changes something and thus may fix the testcase but it doesn't change things in a predictable way as far as I can see. So - the change to compare/hash LOCATION_LINE looks good to me, but the assign_discriminator change needs more explaining. Richard. -cary
Re: debuggability of recog_data
On Thu, May 16, 2013 at 11:04 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, May 16, 2013 at 2:02 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Steven Bosscher stevenb@gmail.com writes: On Wed, May 15, 2013 at 12:14 AM, Mike Stump wrote: I don't want to bike shed. So, I'm happy if the next poor soul that touches it just does so. If people like recog_data_info, I'd be happy to change it to that. Let's give the peanut gallery a day to vote on it. :-) Usually we append _d or _def to structure definitions, so recog_data_def? Gah, I wrote the patch from memory and forgot about the bit after the comma. I'm not trying to be contrary really. :-) Bootstrapped and regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * recog.h (Recog_data): Rename to... (recog_data_t): ...this. ^^^ It should be recog_data_d. Ok with that change. Thanks, Richard. -- H.J.
[PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On Thu, May 16, 2013 at 06:22:10PM +0200, Jakub Jelinek wrote: As discussed in the PR, there seem to be only 3 define_split patterns that use dead_or_set_p, one in i386.md and two in s390.md, but unfortunately insn splitting is done in many passes (combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach) and only in combine is the note problem computed. Computing the note problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32 and one on s390x -m64 might be too expensive, and while neither of these targets does dbr scheduling, e.g. during final without cfg one can't df_analyze. So, the following patch fixes it by doing the transformation instead in the peephole2 pass, which computes the notes problem and has REG_DEAD notes up to date (and peep2_reg_dead_p is used there heavily and works). An alternative, so far untested, patch is to let the "register is not dead" splitter do its job always during split2 and just fix it up during peephole2, if the register was dead. For the non-cltd case the peephole2 is always desirable, we get rid of a register move and free one hard register for potential other uses after peephole2 (cprop_hardreg? anything else that could benefit from that?). For the cltd case, it is questionable, while we gain a free hard register at that spot, it isn't guaranteed any pass will benefit from that, and cltd is 2 bytes smaller than sarl $31, %eax. Though, not sure about the performance. So, the patch below doesn't use the second peephole2 for -Os. 2013-05-17 Jakub Jelinek ja...@redhat.com PR rtl-optimization/57281 PR rtl-optimization/57300 * config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove. (extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo what the other splitter did if the registers are dead. * gcc.dg/pr57300.c: New test. * gcc.c-torture/execute/pr57281.c: New test.
--- gcc/config/i386/i386.md.jj	2013-05-16 18:22:59.0 +0200
+++ gcc/config/i386/i386.md	2013-05-17 10:11:20.365455394 +0200
@@ -3332,22 +3332,8 @@ (define_insn "extendsidi2_1"
   "!TARGET_64BIT"
   "#")
 
-;; Extend to memory case when source register does die.
-(define_split
-  [(set (match_operand:DI 0 "memory_operand")
-	(sign_extend:DI (match_operand:SI 1 "register_operand")))
-   (clobber (reg:CC FLAGS_REG))
-   (clobber (match_operand:SI 2 "register_operand"))]
-  "(reload_completed
-    && dead_or_set_p (insn, operands[1])
-    && !reg_mentioned_p (operands[1], operands[0]))"
-  [(set (match_dup 3) (match_dup 1))
-   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
-	      (clobber (reg:CC FLAGS_REG))])
-   (set (match_dup 4) (match_dup 1))]
-  "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
-
-;; Extend to memory case when source register does not die.
+;; Split the memory case.  If the source register doesn't die, it will stay
+;; this way, if it does die, following peephole2s take care of it.
 (define_split
   [(set (match_operand:DI 0 "memory_operand")
	(sign_extend:DI (match_operand:SI 1 "register_operand")))
@@ -3376,6 +3362,48 @@ (define_split
   DONE;
 })
 
+;; Peepholes for the case where the source register does die, after
+;; being split with the above splitter.
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (match_operand:SI 2 "register_operand") (match_dup 1))
+   (parallel [(set (match_dup 2)
+		   (ashiftrt:SI (match_dup 2)
+				(match_operand:QI 3 "const_int_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (4, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 4) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (parallel [(set (match_operand:SI 2 "register_operand")
+		   (ashiftrt:SI (match_dup 1)
+				(match_operand:QI 3 "const_int_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   /* cltd is shorter than sarl $31, %eax */
+   && !optimize_function_for_size_p (cfun)
+   && true_regnum (operands[1]) == AX_REG
+   && true_regnum (operands[2]) == DX_REG
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? Richard. Thanks, Bill 2013-05-16 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define. Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 198998) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2297,6 +2297,13 @@ extern char rs6000_reg_names[][8]; /* register nam /* How to align the given loop. */ #define LOOP_ALIGN(LABEL) rs6000_loop_align(LABEL) +/* Alignment guaranteed by __builtin_malloc. */ +/* FIXME: 128-bit alignment is guaranteed by glibc for TARGET_64BIT. + However, specifying the stronger guarantee currently leads to + a regression in SPEC CPU2006 437.leslie3d. The stronger + guarantee should be implemented here once that's fixed. */ +#define MALLOC_ABI_ALIGNMENT (64) + /* Pick up the return address upon entry to a procedure. 
Used for dwarf2 unwind information. This also enables the table driven mechanism. */
Re: [fixincludes] solaris_pow_int_overload should use __cplusplus
Bruce Korb bk...@gnu.org writes: On 05/16/13 06:41, Rainer Orth wrote: Work is going on to incorporate all applicable fixincludes fixes into the Solaris headers proper. One fix is currently problematic since it uses a G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++ cmath already switched to testing __cplusplus. The following patch updates the fix to match cmath. Tested by mainline bootstraps on i386-pc-solaris2.11, sparc-sun-solaris2.11 and a 4.8 bootstrap on i386-pc-solaris2.10. Ok for mainline and the 4.8 branch if they pass? Looks good to me. Thanks. Testing completed successfully now on both the 4.8 branch and mainline, thus I've installed the patch on mainline. Jakub, may I install on the 4.8 branch now or better wait until 4.8.1 is released? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Jakub
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 4:47 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Jakub, As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance regression for 16 byte alignment on PPC64. 8 bytes is a work-around for now. - David
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 04:55:20AM -0400, David Edelsohn wrote: As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance regression for 16 byte alignment on PPC64. 8 bytes is a work-around for now. I was talking about the config/linux.h definition, PPC64 can of course override it to work around something temporarily, but because there is some problem on PPC64 we shouldn't penalize code say on x86_64, s390x or aarch64. Jakub
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 10:33:27PM +0200, Jakub Jelinek wrote: On Thu, May 16, 2013 at 06:44:03PM +0200, Marek Polacek wrote: --- gcc/tree-ssa-strlen.c.mp 2013-05-15 14:11:20.079707492 +0200 +++ gcc/tree-ssa-strlen.c 2013-05-16 17:57:33.963150006 +0200 @@ -1693,8 +1693,10 @@ handle_char_store (gimple_stmt_iterator } else { - si->writable = true; - si->dont_invalidate = true; + /* The string might be e.g. in the .rodata section. */ + si->writable = false; No, this really should be si->writable = true; (and comment not needed). Ok with that change. Done. The thing is, while the string might not be known to be writable before, i.e. we can't optimize away this store, because supposedly it should trap, if we notice e.g. another write to the same location (writing zero there again), we can optimize that other write already, because we know that this store stored there something. +#include "strlenopt.h" + +__attribute__((noinline, noclone)) size_t +fn1 (char *p, const char *r) +{ + size_t len1 = __builtin_strlen (r); + char *q = __builtin_strchr (p, '\0'); + *q = '\0'; + return len1 - __builtin_strlen (r); // This strlen should be optimized into len1. With the strlenopt.h include you can avoid using __builtin_ prefixes, all the builtins needed are prototyped in that header. Ugh, sure, that's only a remnant from my testing. Thanks for the reviews. Will commit this one finally. 2013-05-17 Marek Polacek pola...@redhat.com * tree-ssa-strlen.c (handle_char_store): Don't invalidate cached length when doing a non-zero store or storing '\0' over '\0'. * gcc.dg/strlenopt-25.c: New test. * gcc.dg/strlenopt-26.c: Likewise.
--- gcc/tree-ssa-strlen.c.mp	2013-05-15 14:11:20.079707492 +0200
+++ gcc/tree-ssa-strlen.c	2013-05-17 10:59:27.461231167 +0200
@@ -1694,7 +1694,8 @@ handle_char_store (gimple_stmt_iterator
       else
 	{
 	  si->writable = true;
-	  si->dont_invalidate = true;
+	  gsi_next (gsi);
+	  return false;
 	}
     }
   else
@@ -1717,6 +1718,33 @@ handle_char_store (gimple_stmt_iterator
 	  si->endptr = ssaname;
 	  si->dont_invalidate = true;
 	}
+      /* If si->length is non-zero constant, we aren't overwriting '\0',
+	 and if we aren't storing '\0', we know that the length of the
+	 string and any other zero terminated string in memory remains
+	 the same.  In that case we move to the next gimple statement and
+	 return to signal the caller that it shouldn't invalidate anything.
+
+	 This is beneficial for cases like:
+
+	 char p[20];
+	 void foo (char *q)
+	 {
+	   strcpy (p, "foobar");
+	   size_t len = strlen (p);	// This can be optimized into 6
+	   size_t len2 = strlen (q);	// This has to be computed
+	   p[0] = 'X';
+	   size_t len3 = strlen (p);	// This can be optimized into 6
+	   size_t len4 = strlen (q);	// This can be optimized into len2
+	   bar (len, len2, len3, len4);
+	 } */
+      else if (si != NULL && si->length != NULL_TREE
+	       && TREE_CODE (si->length) == INTEGER_CST
+	       && integer_nonzerop (gimple_assign_rhs1 (stmt)))
+	{
+	  gsi_next (gsi);
+	  return false;
+	}
     }
   else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt)))
     {
--- gcc/testsuite/gcc.dg/strlenopt-25.c.mp	2013-05-15 17:15:18.702118637 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-25.c	2013-05-15 18:26:27.881030317 +0200
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+int
+main ()
+{
+  char p[] = "foobar";
+  int len, len2;
+  len = strlen (p);
+  p[0] = 'O';
+  len2 = strlen (p);
+  return len - len2;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */
--- gcc/testsuite/gcc.dg/strlenopt-26.c.mp	2013-05-16 17:33:00.302060413 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-26.c	2013-05-17 10:58:17.605015608 +0200
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+__attribute__((noinline, noclone)) size_t
+fn1 (char *p, const char *r)
+{
+  size_t len1 = strlen (r);
+  char *q = strchr (p, '\0');
+  *q = '\0';
+  return len1 - strlen (r); // This strlen should be optimized into len1.
+}
+
+int
+main (void)
+{
+  char p[] = "foobar";
+  const char *volatile q = "xyzzy";
+  fn1 (p, q);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */
Marek
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it is supposed to be the alignment that is required for conforming to the C ABI on the target. 
For different allocators I'd rather have a new function attribute that tells us the alignment of the allocator's allocation function (of course replacing the allocator at runtime via ELF interposition makes that possibly fragile as well). We shouldn't assume just because glibc may some day guarantee 256-bit alignment that other allocators on linux do so, too. So - I'd rather play safe (and the default to align to BITS_PER_WORD probably catches the C ABI guarantee on most targets I suppose). Richard. Jakub
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 11:17:26AM +0200, Richard Biener wrote: Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it is supposed to be the alignment that is required for conforming to the C ABI on the target. For different allocators I'd rather have a new function attribute that tells us the alignment of the allocators allocation function (of course replacing the allocator at runtime via ELF interposition makes that possibly fragile as well). We shouldn't assume just because glibc Most of the allocators I've mentioned, if not all, are providing just malloc/calloc etc. entrypoints and are being in search scope before -lc, so it isn't anything you can control with attributes. I've looked at libasan so far (seems to guarantee either 16, or 64 or 128 bytes depending on flags), valgrind (seems to guarantee 8 bytes on 32-bit arches except ppc32, and 16 bytes elsewhere), ElectricFence (no guarantees at all, the default is 4 bytes, but can be overridden with EF_ALIGNMENT env option to smaller/larger value, I guess let's ignore efence). Jakub
Re: More vector folding
From: Marc Glisse marc.gli...@inria.fr Date: Tue, 14 May 2013 13:47:23 +0200 Here is what I tested during the night, I'll just rename the function. I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR. Passes bootstrap+testsuite on x86_64-linux-gnu. 2013-05-14 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_negate_expr): Handle vectors. (fold_truth_not_expr): Make it static. (fold_invert_truth): New static function. (invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly. (fold_unary_loc) <BIT_NOT_EXPR>: Handle comparisons. <TRUTH_NOT_EXPR>: Do not cast to boolean. (fold_comparison): Handle vector constants. (fold_binary_loc) <TRUTH_XOR_EXPR>: Remove redundant code. (fold_ternary_loc) <VEC_COND_EXPR>: Adapt more COND_EXPR optimizations. * tree.h (fold_truth_not_expr): Remove declaration. gcc/testsuite/ * g++.dg/ext/vector22.C: New testcase. * gcc.dg/binop-xor3.c: Remove xfail. Looks like the removed xfail caused regressions for about half of all targets; there's now PR57313 opened for this. Maybe an rtl test like BRANCH_COST or rtx_cost now matters in the decision? brgds, H-P
Re: rtl expansion without zero/sign extension based on VRP
On 13/05/13 17:47, Richard Biener wrote: On Mon, May 13, 2013 at 5:45 AM, Kugan kugan.vivekanandara...@linaro.org wrote: Hi, This patch removes some of the redundant sign/zero extensions using value range information generated by VRP. When a GIMPLE_ASSIGN stmt with LHS type smaller than word is expanded to rtl, if we can prove that the RHS expression value can always fit in the LHS type and there is no sign conversion, truncation and extension to fit the type is redundant; subreg and zero/sign extensions are therefore redundant. For example, when an expression is evaluated and its value is assigned to a variable of type short, the generated rtl would look something like the following. (set (reg:SI 110) (zero_extend:SI (subreg:HI (reg:SI 117) 0))) However, if during value range propagation we can say for certain that the value of the expression which is present in register 117 is within the limits of short and there is no sign conversion, we don't need to perform the subreg and zero_extend; instead we can generate the following rtl. (set (reg:SI 110) (reg:SI 117)) Attached patch adds value range information to gimple stmts during the value range propagation pass and expands rtl without subreg and zero/sign extension if the value range is within the limits of its type. This change improves the geomean of the spec2k int benchmark with ref by about ~3.5% on an arm chromebook. Tested on X86_64 and ARM. I would like review comments on this. A few comments on the way you preserve this information. Thanks Richard for reviewing it. --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -191,6 +191,11 @@ struct GTY((chain_next ("%h.next"))) gimple_statement_base { in there. */ unsigned int subcode : 16; + /* If an assignment gimple statement has an RHS expression that can fit + the LHS type, zero/sign extension to truncate is redundant. + Set this if we detect extension as redundant during VRP.
*/ + unsigned sign_zero_ext_redundant : 1; + this enlarges all gimple statements by 8 bytes, so it's out of the question. My original plan to preserve value-range information was to re-use SSA_NAME_PTR_INFO which is where we already store value-range information for pointers: struct GTY(()) ptr_info_def { /* The points-to solution. */ struct pt_solution pt; /* Alignment and misalignment of the pointer in bytes. Together align and misalign specify low known bits of the pointer. ptr & (align - 1) == misalign. */ /* When known, this is the power-of-two byte alignment of the object this pointer points into. This is usually DECL_ALIGN_UNIT for decls and MALLOC_ABI_ALIGNMENT for allocated storage. When the alignment is not known, it is zero. Do not access directly but use functions get_ptr_info_alignment, set_ptr_info_alignment, mark_ptr_info_alignment_unknown and similar. */ unsigned int align; /* When alignment is known, the byte offset this pointer differs from the above alignment. Access only through the same helper functions as align above. */ unsigned int misalign; }; what you'd do is to make the ptr_info_def reference from tree_ssa_name a reference to a union of the above (for pointer SSA names) and an alternate info like struct range_info_def { double_int min; double_int max; }; I have now changed the way I am preserving this information as you have suggested. or a more compressed format (a reference to a mode or type it fits for example). The very specific case in question also points at the fact that we have two conversion tree codes, NOP_EXPR and CONVERT_EXPR, and we've tried to unify them (but didn't finish up that task)... We can also remove sign/zero extension in cases other than NOP_EXPR and CONVERT_EXPR, if the value range of the RHS expressions fits LHS type without sign change.
For example, please look at the following gimple stmt and the resulting rtl: ;; c_3 = c_2(D) & 15; ;; Without the patch (insn 6 5 7 (set (reg:SI 116) (and:SI (reg/v:SI 115 [ c ]) (const_int 15 [0xf]))) c5.c:4 -1 (nil)) (insn 7 6 0 (set (reg/v:SI 111 [ c ]) (zero_extend:SI (subreg:QI (reg:SI 116) 0))) c5.c:4 -1 (nil)) ;; with the patch (insn 6 5 7 (set (reg:SI 116) (and:SI (reg/v:SI 115 [ c ]) (const_int 15 [0xf]))) c5.c:4 -1 (nil)) (insn 7 6 0 (set (reg/v:SI 111 [ c ]) (reg:SI 116)) c5.c:4 -1 (nil)) +static void +process_stmt_for_ext_elimination () +{ we already do all the analysis when looking for opportunities to eliminate the intermediate cast in (T2)(T1)x in simplify_conversion_using_ranges, that's the place that should handle this case. I have changed this. I have attached patch with the above changes. Thanks, Kugan Richard. Thanks, Kugan 2013-05-09 Kugan Vivekanandarajah kug...@linaro.org * gcc/gimple.h (gimple_is_exp_fit_lhs, gimple_set_exp_fit_lhs):
Re: [C++ Patch] PR 18126
OK. Jason
Re: [PATCH] Error out on -fsanitize=thread -fsanitize=address
Jakub Jelinek ja...@redhat.com writes: 2013-05-06 Jakub Jelinek ja...@redhat.com * gcc.c (SANITIZER_SPEC): Reject -fsanitize=address -fsanitize=thread linking. This looks OK to me, thanks. -- Dodji
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, 2013-05-17 at 01:11 -0400, David Edelsohn wrote: On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? Okay. If you think that this is appropriate for 4.8 branch, that is okay as well. Yes, the degradation appears to have started last summer, so updating 4.8 would be good. Is there an FSF GCC Bugzilla open for the 437.leslie3d problem? The FIXME doesn't need to contain the complete explanation, but it would be nice to reference a longer explanation and not pretend to be Fermat's theorem that is too large to write in the margin. There is now. I opened 57309 after submitting the patch, so I can reference that. Thanks, Bill Thanks, David
Re: More vector folding
On Fri, 17 May 2013, Hans-Peter Nilsson wrote: From: Marc Glisse marc.gli...@inria.fr Date: Tue, 14 May 2013 13:47:23 +0200 Here is what I tested during the night, I'll just rename the function. I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR. Passes bootstrap+testsuite on x86_64-linux-gnu. 2013-05-14 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_negate_expr): Handle vectors. (fold_truth_not_expr): Make it static. (fold_invert_truth): New static function. (invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly. (fold_unary_loc) BIT_NOT_EXPR: Handle comparisons. TRUTH_NOT_EXPR: Do not cast to boolean. (fold_comparison): Handle vector constants. (fold_binary_loc) TRUTH_XOR_EXPR: Remove redundant code. (fold_ternary_loc) VEC_COND_EXPR: Adapt more COND_EXPR optimizations. * tree.h (fold_truth_not_expr): Remove declaration. gcc/testsuite/ * g++.dg/ext/vector22.C: New testcase. * gcc.dg/binop-xor3.c: Remove xfail. Looks like the removed xfail caused regressions for about half of all targets; there's now PR57313 opened for this. Maybe an rtl test like BRANCH_COSTS or rtx_cost now matters in the decision? Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? -- Marc Glisse
Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On Fri, May 17, 2013 at 10:25:01AM +0200, Jakub Jelinek wrote: Alternative, so far untested, patch is to let the "register is not dead" splitter do its job always during split2 and just fix it up during peephole2, if the register was dead. Now fully bootstrapped/regtested on x86_64-linux and i686-linux. 2013-05-17 Jakub Jelinek ja...@redhat.com PR rtl-optimization/57281 PR rtl-optimization/57300 * config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove. (extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo what the other splitter did if the registers are dead. * gcc.dg/pr57300.c: New test. * gcc.c-torture/execute/pr57281.c: New test. Jakub
[fortran, doc] Improve random_seed example
Hi, the example we provide for the usage of the random_seed intrinsic could be better. At least one user has already been tripped up by the fact that on some targets the first call to system_clock returns 0, resulting in a poor seed. Below is an improved(?!) example program. Ok for trunk/4.8/4.7? (Changelog + patch is of course trivial once there is agreement on the code itself)

! Seed the random generator with some random input
module rseed_rand
contains
  subroutine rseed()
    implicit none
    integer, allocatable :: seed(:)
    integer :: i, n, un, istat, dt(8), pid, t(2), s
    integer(8) :: count, tms

    call random_seed(size = n)
    allocate(seed(n))
    ! First try if the OS provides a random number generator
    open(newunit=un, file="/dev/urandom", access="stream", &
         form="unformatted", iostat=istat)
    if (istat == 0) then
       read(un) seed
       close(un)
    else
       ! Fallback to XOR:ing the current time and pid. The PID is
       ! useful in case one launches multiple instances of the same
       ! program in parallel.
       call system_clock(count)
       if (count /= 0) then
          t = transfer(count, t)
       else
          call date_and_time(values=dt)
          tms = (dt(1) - 1970) * 365_8 * 24 * 60 * 60 * 1000 &
               + dt(2) * 31_8 * 24 * 60 * 60 * 1000 &
               + dt(3) * 24 * 60 * 60 * 1000 &
               + dt(5) * 60 * 60 * 1000 &
               + dt(6) * 60 * 1000 &
               + dt(7) * 1000 &
               + dt(8)
          t = transfer(tms, t)
       end if
       s = ieor(t(1), t(2))
       pid = getpid()
       s = ieor(s, pid)
       if (n >= 3) then
          seed(1) = t(1)
          seed(2) = t(2)
          seed(3) = pid
          if (n > 3) then
             seed(4:) = s + 37 * (/ (i, i = 0, n - 4) /)
          end if
       else
          seed = s + 37 * (/ (i, i = 0, n - 1) /)
       end if
    end if
    call random_seed(put=seed)
  end subroutine rseed
end module rseed_rand

-- Janne Blomqvist
[PATCH] Fix typo in stmt_kills_ref_p_1
stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares the pointers instead (which are previously compared as equal). This breaks my fix for PR57303. Fixed thus, applied as obvious. Richard.

2013-05-17 Richard Biener rguent...@suse.de

* tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare MEM_REF offsets.

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	(revision 199004)
+++ gcc/tree-ssa-alias.c	(working copy)
@@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
       if (TREE_CODE (base) == MEM_REF
	  && TREE_CODE (ref->base) == MEM_REF
	  && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0))
	{
-	  if (!tree_int_cst_equal (TREE_OPERAND (base, 0),
-				   TREE_OPERAND (ref->base, 0)))
+	  if (!tree_int_cst_equal (TREE_OPERAND (base, 1),
+				   TREE_OPERAND (ref->base, 1)))
	    {
	      double_int off1 = mem_ref_offset (base);
	      off1 = off1.lshift (BITS_PER_UNIT == 8
Re: [PATCH] Fix typo in stmt_kills_ref_p_1
On Fri, May 17, 2013 at 02:46:56PM +0200, Richard Biener wrote: stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares the pointers instead (which are previously compared as equal). This breaks my fix for PR57303. Fixed thus, applied as obvious. Richard. 2013-05-17 Richard Biener rguent...@suse.de * tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare MEM_REF offsets. Ugh, thanks. As the 0th operand was equal, tree_int_cst_equal used to return always 1 and thus the if condition was never true. --- gcc/tree-ssa-alias.c (revision 199004) +++ gcc/tree-ssa-alias.c (working copy) @@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref if (TREE_CODE (base) == MEM_REF && TREE_CODE (ref->base) == MEM_REF && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0)) { - if (!tree_int_cst_equal (TREE_OPERAND (base, 0), - TREE_OPERAND (ref->base, 0))) + if (!tree_int_cst_equal (TREE_OPERAND (base, 1), + TREE_OPERAND (ref->base, 1))) { double_int off1 = mem_ref_offset (base); off1 = off1.lshift (BITS_PER_UNIT == 8 Jakub
Re: More vector folding
2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. Best regards, jasonwucj
Re: More vector folding
On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote: 2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. You can just limit the testcase to a few selected targets, if it includes a few widely used architectures, it will be enough for the testing and will be better than removing the testcase altogether. Jakub
Re: [PATCH, ARM][1 of 2] Add epilogue dwarf info for shrink-wrap
On 05/16/13 07:27, Zhenqiang Chen wrote: On 15 May 2013 06:31, Ramana Radhakrishnan ramana@googlemail.com wrote: Sorry this had dropped off my list of patches to review somehow but anyway here's a first cut review. On Thu, Mar 21, 2013 at 6:58 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: Hi, When shrink-wrap is enabled, the returns from the simple-return path and the normal return path can be merged. The code is like: tst ... / \ | push ... | ... | pop ... \ / bx lr If the dwarf info after pop ... is incorrect, the dwarf checks will fail at dwarf2cfi.c: function maybe_record_trace_start. /* We ought to have the same state incoming to a given trace no matter how we arrive at the trace. Anything else means we've got some kind of optimization error. */ gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row)); The patch is to add epilogue dwarf info to make sure: Before bx lr, * All registers saved in stack had been restored. * .cfi_def_cfa_offset 0 * .cfi_def_cfa_register is sp even if frame_pointer_needed Bootstrapped and no make check regression. in what configuration - thumb2 / arm / cortex-a15 / cortex-a9 ? what languages and on what platform ? Bootstrapped both arm and thumb2 on Pandaboard ES and Chromebook. Pandaboard ES (cortex-a9): --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb Chromebook (cortex-a15): --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16 --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb Regression tests on Pandaboard ES for both arm and thumb2 with the same configure as bootstrap. Is it OK?
@@ -25447,9 +25519,15 @@ arm_unwind_emit (FILE * asm_out_file, rtx insn) handled_one = true; break; +/* The INSN is generated in epilogue. It is set as RTX_FRAME_RELATED_P + to get correct dwarf info for shrink-wrap. But We should not emit + unwind info for it. */ s/dwarf/debug frame. Replace "But We should not emit unwind info for it" with "We should not emit unwind info for it because these are used either for pretend arguments" or put the other case here. I think you are correct that there is no need for unwind information as we only have unwind information valid at call points and in this case I wouldn't expect unwind information to need to be exact for pretend arguments. Thanks for the comments. The patch is updated and rebased to trunk. Ok if no regressions. regards Ramana -Zhenqiang
Re: More vector folding
On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote: 2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. You can just limit the testcase to a few selected targets, if it includes a few widely used architectures, it will be enough for the testing and will be better than removing the testcase altogether. Like this? (according to the PR, ia64, sh4 and spu would work too) I tested with: make check-gcc RUNTESTFLAGS="dg.exp=binop-xor3.c" and it still passes on x86_64 and appears as unsupported on powerpc. 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). -- Marc Glisse

Index: gcc.dg/binop-xor3.c
===================================================================
--- gcc.dg/binop-xor3.c (revision 199006)
+++ gcc.dg/binop-xor3.c (working copy)
@@ -1,11 +1,11 @@
-/* { dg-do compile } */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */

 int
 foo (int a, int b)
 {
   return ((a && !b) || (!a && b));
 }

 /* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Re: [PATCH, ARM][2 of 2] Enable shrink-wrap for ARM
On 05/16/13 07:46, Zhenqiang Chen wrote: On 15 May 2013 06:36, Ramana Radhakrishnan ramana@googlemail.com wrote: What happens to the *arm_simple_return pattern that already exists in the backend ? Can you rebase to trunk ? The *arm_simple_return is the same as simple_return used by shrink-wrap. *arm_return and *arm_simple_return are not merged since *arm_return is for "Often the return insn will be the same as loading from memory", while *arm_simple_return is for branch. For thumb2, there is only a simple_return pattern -- *thumb2_return. And this is the reason why the normal return generated by epilogue and the simple return generated by shrink-wrap can be optimized as one, which leads to the dwarf info issue. The rebased patch is attached. Ok. Ramana Thanks! -Zhenqiang On Wed, Apr 3, 2013 at 7:50 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: On 2 April 2013 17:55, Ramana Radhakrishnan ramana@googlemail.com wrote: On Thu, Mar 21, 2013 at 7:03 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: Hi, The patch is to enable shrink-wrap for TARGET_ARM and TARGET_THUMB2. Bootstrapped and no make check regression. All previous Linaro shrink-wrap bugs (http://goo.gl/6fGg5) are verified. Is it OK? The tests should be part of the patch attached and not just added as separate files in your patch submission. Thank you for t -Zhenqiang Thanks! -Zhenqiang ChangeLog: 2013-03-21 Bernd Schmidt ber...@codesourcery.com Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/arm-protos.h: Add and update function protos. * config/arm/arm.c (use_simple_return_p): New added. (thumb2_expand_return): Check simple_return flag. * config/arm/arm.md: Add simple_return and conditional simple_return. * config/arm/iterators.md: Add iterator for return and simple_return. testsuite/ChangeLog: 2013-03-21 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.dg/shrink-wrap-alloca.c: New added. * gcc.dg/shrink-wrap-pretend.c: New added. * gcc.dg/shrink-wrap-sibcall.c: New added.
Re: More vector folding
On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok with that change, if some other target maintainer sees this test passing on his platform, it can be added to the target selector later. --- gcc.dg/binop-xor3.c (revision 199006) +++ gcc.dg/binop-xor3.c (working copy) @@ -1,11 +1,11 @@ -/* { dg-do compile } */ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ int foo (int a, int b) { return ((a && !b) || (!a && b)); } /* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: More vector folding
On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok, thanks. I was actually thinking that it might be better to put the target selector on dg-final, as in the attached. Which do you prefer? -- Marc Glisse Index: gcc.dg/binop-xor3.c === --- gcc.dg/binop-xor3.c (revision 199006) +++ gcc.dg/binop-xor3.c (working copy) @@ -1,11 +1,11 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ int foo (int a, int b) { return ((a && !b) || (!a && b)); } -/* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" { target i?86-*-* x86_64-*-* } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */
Re: More vector folding
On Fri, May 17, 2013 at 04:37:45PM +0200, Marc Glisse wrote: On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok, thanks. I was actually thinking that it might be better to put the target selector on dg-final, as in the attached. Which do you prefer? Doesn't really matter, both are fine. Jakub
Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On 05/17/2013 01:25 AM, Jakub Jelinek wrote: +(define_peephole2 + [(set (match_operand:SI 0 "memory_operand") + (match_operand:SI 1 "register_operand")) + (set (match_operand:SI 2 "register_operand") (match_dup 1)) + (parallel [(set (match_dup 2) +(ashiftrt:SI (match_dup 2) + (match_operand:QI 3 "const_int_operand"))) +(clobber (reg:CC FLAGS_REG))]) + (set (match_operand:SI 4 "memory_operand") (match_dup 2))] + "INTVAL (operands[3]) == 31" No sense in using match_operand in the pattern and INTVAL == 31 in the condition when you can just use (const_int 31) in the pattern. Modulo those two cases, the patch is ok. r~
Re: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags
On Fri, May 17, 2013 at 7:38 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: Thank you Uros for the patch. Could you backport this to the 4.8.0? -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, May 15, 2013 11:16 PM To: gcc-patches@gcc.gnu.org Cc: Gopalasubramanian, Ganesh Subject: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags Hello! Attached patch adds missing PTA_PRFCHW and PTA_FXSR flags to x86 processor alias table. PRFCHW CPUID flag is shared with 3dnow prefetch flag, so some additional logic is needed to avoid generating SSE prefetches for non-SSE 3dNow! targets, while still generating full set of 3dnow prefetches on 3dNow! targets. 2013-05-15 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags. Add PTA_POPCNT to corei7 entry and remove PTA_SSE from athlon-4 entry. Do not enable SSE prefetch on non-SSE 3dNow! targets. Enable TARGET_PRFCHW for TARGET_3DNOW targets. * config/i386/i386.md (prefetch): Enable for TARGET_PRFCHW instead of TARGET_3DNOW. (*prefetch_3dnow): Enable for TARGET_PRFCHW only. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. The patch will be backported to 4.8 branch in a couple of days. Done. I have also reverted athlon-4 change everywhere. It does have SSE after all. Uros.
Fix pr49146 - crash in unwinder
The rationale is captured in the comment in the first hunk. Committed. r~ PR target/49146 * unwind-dw2.c (UNWIND_COLUMN_IN_RANGE): New macro. (execute_cfa_program): Use it when storing to fs->regs. diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 80de5ab..041f9d5 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -59,6 +59,35 @@ #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) (REGNO) #endif +/* ??? For the public function interfaces, we tend to gcc_assert that the + column numbers are in range. For the dwarf2 unwind info this does happen, + although so far in a case that doesn't actually matter. + + See PR49146, in which a call from x86_64 ms abi to x86_64 unix abi stores + the call-saved xmm registers and annotates them. We haven't bothered + providing support for the xmm registers for the x86_64 port primarily + because the 64-bit windows targets don't use dwarf2 unwind, using sjlj or + SEH instead. Adding the support for unix targets would generally be a + waste. However, some runtime libraries supplied with ICC do contain such + an unorthodox transition, as well as the unwind info to match. This loss + of register restoration doesn't matter in practice, because the exception + is caught in the native unix abi, where all of the xmm registers are + call clobbered. + + Ideally, we'd record some bit to notice when we're failing to restore some + register recorded in the unwind info, but to do that we need annotation on + the unix->ms abi edge, so that we know when the register data may be + discarded. And since this edge is also within the ICC library, we're + unlikely to be able to get the new annotation. + + Barring a magic solution to restore the ms abi defined 128-bit xmm registers + (as distinctly opposed to the full runtime width) without causing extra + overhead for normal unix abis, the best solution seems to be to simply + ignore unwind data for unknown columns. 
*/ + +#define UNWIND_COLUMN_IN_RANGE(x) \ +__builtin_expect((x) < DWARF_FRAME_REGISTERS, 1) + #ifdef REG_VALUE_IN_UNWIND_CONTEXT typedef _Unwind_Word _Unwind_Context_Reg_Val; @@ -939,14 +968,19 @@ execute_cfa_program (const unsigned char *insn_ptr, reg = insn & 0x3f; insn_ptr = read_uleb128 (insn_ptr, &utmp); offset = (_Unwind_Sword) utmp * fs->data_align; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how - = REG_SAVED_OFFSET; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + { + fs->regs.reg[reg].how = REG_SAVED_OFFSET; + fs->regs.reg[reg].loc.offset = offset; + } } else if ((insn & 0xc0) == DW_CFA_restore) { reg = insn & 0x3f; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; } else switch (insn) { @@ -977,26 +1011,35 @@ execute_cfa_program (const unsigned char *insn_ptr, insn_ptr = read_uleb128 (insn_ptr, &reg); insn_ptr = read_uleb128 (insn_ptr, &utmp); offset = (_Unwind_Sword) utmp * fs->data_align; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how - = REG_SAVED_OFFSET; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + { + fs->regs.reg[reg].how = REG_SAVED_OFFSET; + fs->regs.reg[reg].loc.offset = offset; + } break; case DW_CFA_restore_extended: insn_ptr = read_uleb128 (insn_ptr, &reg); /* FIXME, this is wrong; the CIE might have said that the register was saved somewhere. 
*/ - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; break; case DW_CFA_same_value: insn_ptr = read_uleb128 (insn_ptr, &reg); - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; break; case DW_CFA_undefined: insn_ptr = read_uleb128 (insn_ptr, &reg); - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNDEFINED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNDEFINED; break; case DW_CFA_nop: @@ -1007,9 +1050,12 @@ execute_cfa_program (const unsigned char *insn_ptr,
Re: Patch ping
On 05/17/2013 12:49 AM, Jakub Jelinek wrote: Hi! http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html Reject -fsanitize=address -fsanitize=thread linking that won't ever work at runtime. I thought Dodji already OK'd this. If there's any concern about the scope going outside Dodji's area, then I'll approve it :-) jeff
Re: [PATCH] Fix incorrect discriminator assignment.
On Fri, May 17, 2013 at 1:22 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, May 15, 2013 at 6:50 PM, Cary Coutant ccout...@google.com wrote: gcc/ChangeLog: * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno. (locus_descrim_hasher::equal): Likewise. (next_discriminator_for_locus): Likewise. (assign_discriminator): Add return value. (make_edges): Assign more discriminator if necessary. (make_cond_expr_edges): Likewise. (make_goto_expr_edges): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/debug/dwarf2/discriminator.c: New Test. Looks OK to me, but I can't approve patches for tree-cfg.c. Two comments: (1) In the C++ conversion, it looks like someone misspelled discrim in locus_descrim_hasher. (2) Is this worth fixing in trunk when we'll eventually switch to a per-instruction discriminator? This patch fixes a common case where one line is distributed to 3 BBs, but only 1 discriminator is assigned. As far as I can see the patch makes discriminators coarser (by hashing and comparing on LOCATION_LINE and not on the location). It also has the changes like - assign_discriminator (entry_locus, then_bb); + if (assign_discriminator (entry_locus, then_bb)) +assign_discriminator (entry_locus, bb); which is what the comment refers to? I think those changes are short-sighted because what happens if the 2nd assign_discriminator call has exactly the same issue? Especially with the make_goto_expr_edges change there may be multiple predecessors where I cannot see how the change is correct. Yes, the change changes something and thus may fix the testcase but it doesn't change things in a predictable way as far as I can see. So - the change to compare/hash LOCATION_LINE looks good to me, but the assign_discriminator change needs more explaining. 
The new discriminator assignment algorithm is: 1 FOR_EACH_BB: 2 FOR_EACH_SUCC_EDGE: 3 if (same_line(bb, succ_bb)) 4 if (succ_bb has no discriminator) 5 succ_bb->discriminator = new_discriminator 6 else if (bb has no discriminator) 7 bb->discriminator = new_discriminator Because there are so many places that handle SUCC_EDGE, the logic of lines #3 and #4 is embedded within the assign_discriminator function, and the logic of line #6 is encoded in the return value of assign_discriminator. Thanks, Dehao Richard. -cary
[PATCH] [tree-optimization/57124] Updated fix for 254.gap problems
As I believe I pointed out in a follow-up message, 254.gap is depending on signed overflow semantics. This patch avoids eliminating a cast feeding a conditional when the SSA_NAME's range has overflowed unless -fstrict-overflow is in effect. Thus 254.gap should be building with -fno-strict-overflow. This patch also introduces a warning when the optimization is applied and the ranges have overflowed. Bootstrapped and regression tested on x86-unknown-linux-gnu. OK for the trunk? commit 62bbaa8de0e8d929eb3c63331b47950e9b09d801 Author: Jeff Law l...@redhat.com Date: Wed May 1 12:33:20 2013 -0600 PR tree-optimization/57124 * tree-vrp.c (simplify_cond_using_ranges): Only simplify a conversion feeding a condition if the range has an overflow if -fstrict-overflow. Add warnings for when we do make the transformation. PR tree-optimization/57124 * gcc.c-torture/execute/pr57124.c: New test. * gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8e92c44..9320f21 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2013-05-17 Jeff Law l...@redhat.com + + PR tree-optimization/57124 + * tree-vrp.c (simplify_cond_using_ranges): Only simplify a + conversion feeding a condition if the range has an overflow + if -fstrict-overflow. Add warnings for when we do make the + transformation. + 2013-05-16 Rainer Orth r...@cebitec.uni-bielefeld.de * reorg.c (link_cc0_insns): Wrap in #ifdef HAVE_cc0. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 879b9bc..482151c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2013-05-17 Jeff Law l...@redhat.com + + PR tree-optimization/57124 + * gcc.c-torture/execute/pr57124.c: New test. + * gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow. + 2013-05-16 Greta Yorsh greta.yo...@arm.com * gcc.target/arm/unaligned-memcpy-2.c: Adjust expected output. 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.c b/gcc/testsuite/gcc.c-torture/execute/pr57124.c new file mode 100644 index 000..835d249 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.c @@ -0,0 +1,27 @@ +__attribute__ ((noinline)) +foo(short unsigned int *p1, short unsigned int *p2) +{ + short unsigned int x1, x4; + int x2, x3, x5, x6; + unsigned int x7; + + x1 = *p1; + x2 = (int) x1; + x3 = x2 * 65536; + x4 = *p2; + x5 = (int) x4; + x6 = x3 + x4; + x7 = (unsigned int) x6; + if (x7 <= 268435455U) +abort (); + exit (0); +} + +main() +{ + short unsigned int x, y; + x = -5; + y = -10; + foo (&x, &y); +} + diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.x b/gcc/testsuite/gcc.c-torture/execute/pr57124.x new file mode 100644 index 000..d8cacbe --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.x @@ -0,0 +1,2 @@ +set additional_flags "-fno-strict-overflow" +return 0 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index b5de683..b3eccf0 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -8669,8 +8669,32 @@ simplify_cond_using_ranges (gimple stmt) range_fits_type_p (vr, TYPE_PRECISION (TREE_TYPE (op0)), TYPE_UNSIGNED (TREE_TYPE (op0))) - && int_fits_type_p (op1, TREE_TYPE (innerop))) + && int_fits_type_p (op1, TREE_TYPE (innerop)) + /* The range must not have overflowed, or if it did overflow +we must not be wrapping/trapping overflow and optimizing +with strict overflow semantics. */ + && ((!is_negative_overflow_infinity (vr->min) + && !is_positive_overflow_infinity (vr->max)) + || (!flag_wrapv && !flag_trapv && flag_strict_overflow))) { + /* If the range overflowed and the user has asked for warnings +when strict overflow semantics were used to optimize code, +issue an appropriate warning. 
*/ + if ((is_negative_overflow_infinity (vr->min) + || is_positive_overflow_infinity (vr->max)) + && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_CONDITIONAL)) + { + location_t location; + + if (!gimple_has_location (stmt)) + location = input_location; + else + location = gimple_location (stmt); + warning_at (location, OPT_Wstrict_overflow, + "assuming signed overflow does not occur when " + "simplifying conditional."); + } + tree newconst = fold_convert (TREE_TYPE (innerop), op1); gimple_cond_set_lhs (stmt, innerop); gimple_cond_set_rhs (stmt, newconst);
Add myself to MAINTAINERS as Write After Approval
Adding myself to the MAINTAINERS in the Write After Approval category Index: ChangeLog === --- ChangeLog (revision 199021) +++ ChangeLog (revision 199022) @@ -1,3 +1,7 @@ +2013-05-17 David Malcolm dmalc...@redhat.com + + * MAINTAINERS (Write After Approval): Add myself. + 2013-04-30 Brooks Moses bmo...@google.com * MAINTAINERS: Update my email; move myself from Fortran Index: MAINTAINERS === --- MAINTAINERS (revision 199021) +++ MAINTAINERS (revision 199022) @@ -452,6 +452,7 @@ Christophe Lyon christophe.l...@st.com Luis Machado luis...@br.ibm.com Ziga Mahkovec ziga.mahko...@klika.si +David Malcolm dmalc...@redhat.com Simon Martin simar...@users.sourceforge.net Ranjit Mathew rmat...@hotmail.com Michael Matz m...@suse.de
[wwwdocs] gcc-4.8/changes.html: mention IRA and transactional memory
Errr, how did we miss this? Ok, I'm partly to blame for the lack of transactional memory in changes.html, but something as big as getting rid of reload?! Would it be preferable to mention IRA in the target specific x86 section instead? I figured it was an important enough change to warrant front-page coverage. Is this OK? Index: htdocs/gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.116 diff -u -r1.116 changes.html --- htdocs/gcc-4.8/changes.html 24 Apr 2013 15:14:26 - 1.116 +++ htdocs/gcc-4.8/changes.html 17 May 2013 17:49:19 - @@ -143,6 +143,11 @@ <code>-fsanitize=thread</code>. Instructions will be instrumented to detect data races. The ThreadSanitizer is available on x86-64 GNU/Linux.</li> +<li>A new local register allocator has been implemented, which +replaces the 26 year old reload pass and improves generated code +quality on ia32 and x86-64 targets.</li> +<li>Support for transactional memory has been implemented on +selected architectures.</li> </ul>
[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
The MOVK instruction is currently not used when operand 2 is more than 16 bits, which leads to sub-optimal code. This patch improves those situations by removing the check and instead masking down to 16 bits within the new X format specifier I added recently. OK for trunk? Cheers, Ian 2013-05-17 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_print_operand): Change the X format specifier to only display bottom 16 bits. * config/aarch64/aarch64.md (insv_imm<mode>): Allow any-sized immediate to match for operand 2, since it will be masked. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b57416c..1bdfd85 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case 'X': - /* Print integer constant in hex. */ + /* Print bottom 16 bits of integer constant in hex. */ if (GET_CODE (x) != CONST_INT) { output_operand_lossage ("invalid operand for '%%%c'", code); return; } - asm_fprintf (f, "0x%wx", UINTVAL (x)); + asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff); break; case 'w': diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b27bcda..403d717 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -858,9 +858,8 @@ (const_int 16) (match_operand:GPI 1 "const_int_operand" "n")) (match_operand:GPI 2 "const_int_operand" "n"))] - "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode) - && INTVAL (operands[1]) % 16 == 0 - && UINTVAL (operands[2]) <= 0xffff" + "UINTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode) + && UINTVAL (operands[1]) % 16 == 0" "movk\\t%w0, %X2, lsl %1" [(set_attr "v8type" "movk") (set_attr "mode" "<MODE>")]
Re: [patch] Fix sched-deps DEP_POSTPONED, ds_t documentation
Ping On Sun, May 12, 2013 at 5:45 PM, Steven Bosscher wrote: Hello, While working on a sched-deps based delay slot scheduler, I've come to the conclusion that the dependencies themselves must indicate whether the dependent ref is delayed. So I started hacking sched-deps and ran into trouble... It turns out there is a problem introduced along with DEP_POSTPONED last year, but the real problem is the complicated ds_t representation and the unclear documentation. The first *6* bits on a ds_t were reserved for the dependence type, and two more bits were reserved for HARD_DEP and DEP_CANCELLED: -/* First '6' stands for 4 dep type bits and the HARD_DEP and DEP_CANCELLED - bits. - Second '4' stands for BEGIN_{DATA, CONTROL}, BE_IN_{DATA, CONTROL} - dep weakness. */ -#define BITS_PER_DEP_WEAK ((BITS_PER_DEP_STATUS - 6) / 4) But DEP_POSTPONED adds another bit: /* Instruction has non-speculative dependence. This bit represents the property of an instruction - not the one of a dependence. Therefore, it can appear only in the TODO_SPEC field of an instruction. */ #define HARD_DEP (DEP_CONTROL << 1) /* Set in the TODO_SPEC field of an instruction for which new_ready has decided not to schedule it speculatively. */ #define DEP_POSTPONED (HARD_DEP << 1) /* Set if a dependency is cancelled via speculation. */ #define DEP_CANCELLED (DEP_POSTPONED << 1) I wanted to add another flag, DEP_DELAYED, and optimistically just added another bit, the compiler started warning, etc. So far we seem to've gotten away with this because the sign bit on a ds_t was unused: /* We exclude sign bit. */ #define BITS_PER_DEP_STATUS (HOST_BITS_PER_INT - 1) The attached patch extends the ds_t documentation to clarify in a comment how all the bits are used. I've made ds_t and dw_t unsigned int, because ds_t is just a collection of bits, and dw_t is unsigned. The target hooks I had to update are all only used by ia64. 
I opportunistically reserved the one left-over bit for my own purposes ;-) Bootstrapped and tested on ia64-unknown-linux-gnu and on powerpc64-unknown-linux-gnu unix{,-m32}. OK for trunk? Ciao! Steven
Re: [PATCH] Use indentation in gtype.state to show nested structure
On Wed, 2013-05-15 at 11:51 -0600, Jeff Law wrote: On 05/14/2013 01:07 PM, David Malcolm wrote: On Tue, 2013-05-14 at 11:44 -0600, Jeff Law wrote: On 05/06/2013 03:05 PM, David Malcolm wrote: [...snip review and comments about auto-checking of formatting...] Anyway, the patch is good to go. Please install. Thanks for reviewing. Probably a dumb question, but by Please install, do you mean, please commit to SVN? Yes. Thanks. Committed to svn trunk as r199032 (using the version with the fixed ChangeLog).
Re: web ICEs on subreg
On May 16, 2013, at 5:26 PM, David Edelsohn dje@gmail.com wrote: This patch is creating new segfaults for 32 bit POWER AIX. Thanks for the heads up. Fixed in r199030. 2013-05-17 Mike Stump mikest...@comcast.net PR rtl-optimization/57304 * web.c (union_match_dups): Ensure that DF_REF_LOC exists before accessing DF_REF_REAL_LOC. Index: web.c === --- web.c (revision 199016) +++ web.c (working copy) @@ -133,9 +133,10 @@ union_match_dups (rtx insn, struct web_e entry = type == OP_IN ? use_entry : def_entry; for (; *ref; ref++) { - if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) + rtx *l = DF_REF_LOC (*ref); + if (l == recog_data.operand_loc[op]) break; - if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) + if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) break; } @@ -143,9 +144,10 @@ union_match_dups (rtx insn, struct web_e { for (ref = use_link, entry = use_entry; *ref; ref++) { - if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) + rtx *l = DF_REF_LOC (*ref); + if (l == recog_data.operand_loc[op]) break; - if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) + if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) break; } } --
[patch] Cleanup the CFG after pro_and_epilogue pass
Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. Index: function.c === --- function.c (revision 199028) +++ function.c (working copy) @@ -6976,6 +6976,9 @@ rest_of_handle_thread_prologue_and_epilo scheduling to operate in the epilogue. */ thread_prologue_and_epilogue_insns (); + /* The prologue and epilogue may have made some blocks unreachable. */ + cleanup_cfg (0); + /* The stack usage info is finalized during prologue expansion. */ if (flag_stack_usage_info) output_stack_usage ();
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. ?!? Under what conditions does threading the prologue/epilogue make code unreachable? Also note that "commit after X days if no-one objects" is not the project's policy. Please don't take such liberties. Jeff
Re: [Patch ARM] Fix PR target/19599
On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote: /* Cannot tail-call to long calls, since these are out of range of a branch instruction. */ - if (arm_is_long_call_p (decl)) + if (decl && arm_is_long_call_p (decl)) return false; You can tail call long calls via indirection now. I.e. load the address first. (define_insn "*sibcall_insn" - [(call (mem:SI (match_operand:SI 0 "" "X")) + [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs,Ss")) FWIW, i or s would work just as well, given you've already constrained by call_insn_operand; Ss isn't really needed. + if (arm_arch5 || arm_arch4t) + return "bx\\t%0\\t%@ indirect register sibling call"; Missing %?. + if (arm_arch5 || arm_arch4t) + return "bx\\t%1"; Likewise. diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c b/gcc/testsuite/gcc.target/arm/pr40887.c index 0b5e873..5cabe3a 100644 --- a/gcc/testsuite/gcc.target/arm/pr40887.c +++ b/gcc/testsuite/gcc.target/arm/pr40887.c @@ -2,9 +2,9 @@ /* { dg-options "-O2 -march=armv5te" } */ /* { dg-final { scan-assembler "blx" } } */ -int (*indirect_func)(); +int (*indirect_func)(int x); int indirect_call() { -return indirect_func(); + return indirect_func(20) + indirect_func (40); } You've made this test case 100% useless. Of course there will always be a blx insn, since you've two calls there, and one of them will never be tail called. It should have been either removed or become your new test case. r~
[PATCH, i386]: Also pass mmx, 3dnow and sseX options with -march=native
Hello! For some reason we didn't pass mmx, 3dnow and sseX options to cc1 with -march=native. Passing these options will help older processors that don't get detected through architecture recognition functionality to get all their ISAs enabled. 2013-05-17 Uros Bizjak ubiz...@gmail.com * config/i386/driver-i386.c (host_detect_local_cpu): Pass mmx, 3dnow, sse, sse2, sse3, ssse3 and sse4a flags to options. Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline. Uros. Index: config/i386/driver-i386.c === --- config/i386/driver-i386.c (revision 198989) +++ config/i386/driver-i386.c (working copy) @@ -786,10 +786,17 @@ const char *host_detect_local_cpu (int argc, const if (arch) { + const char *mmx = has_mmx ? "-mmmx" : "-mno-mmx"; + const char *mmx3dnow = has_3dnow ? "-m3dnow" : "-mno-3dnow"; + const char *sse = has_sse ? "-msse" : "-mno-sse"; + const char *sse2 = has_sse2 ? "-msse2" : "-mno-sse2"; + const char *sse3 = has_sse3 ? "-msse3" : "-mno-sse3"; + const char *ssse3 = has_ssse3 ? "-mssse3" : "-mno-ssse3"; + const char *sse4a = has_sse4a ? "-msse4a" : "-mno-sse4a"; const char *cx16 = has_cmpxchg16b ? "-mcx16" : "-mno-cx16"; const char *sahf = has_lahf_lm ? "-msahf" : "-mno-sahf"; const char *movbe = has_movbe ? "-mmovbe" : "-mno-movbe"; - const char *ase = has_aes ? "-maes" : "-mno-aes"; + const char *aes = has_aes ? "-maes" : "-mno-aes"; const char *pclmul = has_pclmul ? "-mpclmul" : "-mno-pclmul"; const char *popcnt = has_popcnt ? "-mpopcnt" : "-mno-popcnt"; const char *abm = has_abm ? "-mabm" : "-mno-abm"; @@ -817,7 +824,8 @@ const char *host_detect_local_cpu (int argc, const const char *xsave = has_xsave ? "-mxsave" : "-mno-xsave"; const char *xsaveopt = has_xsaveopt ? "-mxsaveopt" : "-mno-xsaveopt"; - options = concat (options, cx16, sahf, movbe, aes, pclmul, + options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3, + sse4a, cx16, sahf, movbe, aes, pclmul, popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2, tbm, avx, avx2, sse4_2, sse4_1, lzcnt, rtm, hle, rdrnd, f16c, fsgsbase, rdseed, prfchw, adx,
Re: [PATCH:RL78] Add new insn for mulqi3 and mulhi3
On 05/14/2013 09:43 PM, Kaushik Phatak wrote: Hi, The below patch adds expanders and insns for QI and HI mode for the RL78 target. The QI mode uses a generic 'mulu' instruction supported by all variants, while the HI mode creates insn for G13 and G14 target variants using hardware multiply instructions. Tested on hardware and simulator with no additional regressions. Kindly review the same. Thanks, Kaushik 2013-05-15 Kaushik Phatak kaushik.pha...@kpitcummins.com * config/rl78/rl78.md (mulqi3,mulhi3): New define_expands. (mulqi3_rl78,mulhi3_rl78,mulhi3_g13): New define_insns. Index: gcc/config/rl78/rl78.md === --- gcc/config/rl78/rl78.md (revision 198915) +++ gcc/config/rl78/rl78.md (working copy) @@ -235,6 +235,24 @@ [(set_attr "valloc" "macax")] ) +(define_expand "mulqi3" + [(set (match_operand:QI 0 "register_operand" "=v") + (mult:QI (match_operand:QI 1 "general_operand" +"vim") + (match_operand:QI 2 "nonmemory_operand" "vi"))) + ] + "" ; mulu supported by all targets + +) No constraints on define_expand, only predicates. +(define_insn "mulqi3_rl78" +(define_insn "mulhi3_rl78" +(define_insn "mulhi3_g13" These names are not used. They should be prefixed with * to indicate the name is just for documentation. r~
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 9:52 PM, Jeff Law l...@redhat.com wrote: On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. ?!? Under what conditions does threading the prologue/epilogue make code unreachable? Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph. Just before rest_of_handle_thread_prologue_and_epilogue we have: Breakpoint 2, rest_of_handle_thread_prologue_and_epilogue () at ../../trunk/gcc/function.c:6969 6969{ (gdb) p brief_dump_cfg(stderr,-1) ;; basic block 2, loop depth 0, count 0, freq 1, maybe hot ;; prev block 0, next block 3, flags: (REACHABLE, RTL, MODIFIED) ;; pred: ENTRY [100.0%] (FALLTHRU) ;; succ: 3 [9.2%] (FALLTHRU) ;; 4 [90.8%] ;; basic block 3, loop depth 0, count 0, freq 922, maybe hot ;; prev block 2, next block 4, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 2 [9.2%] (FALLTHRU) ;; succ: 4 [100.0%] (FALLTHRU) ;; basic block 4, loop depth 0, count 0, freq 1, maybe hot ;; prev block 3, next block 1, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 3 [100.0%] (FALLTHRU) ;; 2 [90.8%] ;; succ: EXIT [100.0%] (FALLTHRU) $1 = void (gdb) p debug_bb_n(2) (note 6 1 4 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 4 6 31 2 NOTE_INSN_FUNCTION_BEG) (insn 31 4 17 2 (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 4 si [orig:62 x ] [62])) testsuite/gcc.target/i386/pad-10.c:18 85 {*movsi_internal} (nil)) (insn 17 31 8 2 (parallel [ (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (minus:SI (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 5 di [orig:61 z ] [61]))) (clobber (reg:CC 17 flags)) ]) testsuite/gcc.target/i386/pad-10.c:18 295 {*subsi_1} (nil)) (insn 8 17 9 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:SI 4 si [orig:62 x ] [62]) 
(const_int 1 [0x1]))) testsuite/gcc.target/i386/pad-10.c:12 7 {*cmpsi_1} (expr_list:REG_DEAD (reg/v:SI 4 si [orig:62 x ] [62]) (nil))) (jump_insn 9 8 10 2 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref:DI 18) (pc))) testsuite/gcc.target/i386/pad-10.c:12 602 {*jcc_1} (expr_list:REG_BR_PROB (const_int 9078 [0x2376]) (nil)) - 18) $2 = (basic_block_def *) 0x76224410 (gdb) p debug_bb_n(3) (note 10 9 33 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 33 10 11 3 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int 12 [0xc])) [2 %sfp+-4 S4 A32]) (reg/v:SI 5 di [orig:61 z ] [61])) 85 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61]) (nil))) (insn 11 33 12 3 (set (reg:QI 0 ax) (const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 87 {*movqi_internal} (nil)) (call_insn 12 11 34 3 (call (mem:QI (symbol_ref:DI (bar) [flags 0x41] function_decl 0x76225800 bar) [0 bar S1 A8]) (const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 646 {*call} (expr_list:REG_DEAD (reg:QI 0 ax) (nil)) (expr_list:REG_DEP_TRUE (use (reg:QI 0 ax)) (nil))) (insn 34 12 5 3 (set (reg/v:SI 5 di [orig:61 z ] [61]) (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int 12 [0xc])) [2 %sfp+-4 S4 A32])) testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 65) (nil))) (insn 5 34 18 3 (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 5 di [orig:61 z ] [61])) testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61]) (nil))) $3 = (basic_block_def *) 0x76224208 (gdb) p debug_bb_n(4) (code_label 18 5 19 4 3 [1 uses]) (note 19 18 27 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 27 19 0 4 (use (reg/i:SI 0 ax)) testsuite/gcc.target/i386/pad-10.c:19 -1 (nil)) $4 = (basic_block_def *) 0x762242d8 Continuing from here leads to an ice in the graph dumper: (gdb) cont Continuing. 
Breakpoint 1, internal_error (gmsgid=gmsgid@entry=0x119725e in %s, at %s:%d) at ../../trunk/gcc/diagnostic.c:1091 1091bool (gdb) bt 10 #0 internal_error (gmsgid=gmsgid@entry=0x119725e in %s, at %s:%d) at ../../trunk/gcc/diagnostic.c:1091 #1 0x00dfda24 in fancy_abort (file=optimized out, line=869, function=0xee3e60 pre_and_rev_post_order_compute(int*, int*, bool)::__FUNCTION__ pre_and_rev_post_order_compute) at ../../trunk/gcc/diagnostic.c:1151 #2 0x0061dbab in pre_and_rev_post_order_compute (pre_order=0x0, rev_post_order=0x1627850, include_entry_exit=optimized out) at ../../trunk/gcc/cfganal.c:869 #3 0x00d28d22
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 02:53 PM, Steven Bosscher wrote: On Fri, May 17, 2013 at 9:52 PM, Jeff Law l...@redhat.com wrote: On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns.

?!? Under what conditions does threading the prologue/epilogue make code unreachable?

Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph. Just before rest_of_handle_thread_prologue_and_epilogue we have:

Thanks. What's happened is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. What does the epilogue itself look like before threading it into the insn stream?

This reaction seems to me like an unnecessary and disappointing slap on the wrist. Project policy does allow committing obvious patches without review. Perhaps I should just use that policy and not give people the chance to comment? I would think you would also know by now that I know all CFG related code as well as anyone, and, frankly, probably better than even you.

You do not make project policy. It's that simple. Yes, we have an obvious patch policy, but this certainly doesn't fall under that policy. I would frown upon you using the obvious patch path to bypass existing policy for patch review. If you want commit without review privileges, ask for someone to sponsor your request and the steering committee will vote on it. jeff
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote: What's happened, is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. It is not the epilogue itself but the way shrink-wrapping emits it. The block that is unreachable has its last predecessor edge removed in function.c:6607: 6607 redirect_edge_and_branch_force (e, *pdest_bb); I haven't looked at how the shrink-wrapping code works exactly. It's Bernd's code, so perhaps he can have a look. This is now PR57320. Ciao! Steven
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 9:49 PM, Steven Bosscher wrote: Will commit sometime next week if no-one objects. Obviously there are objections. Patch withdrawn. Ciao! Steven
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 03:53 PM, Steven Bosscher wrote: On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote: What's happened, is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. It is not the epilogue itself but the way shrink-wrapping emits it. The block that is unreachable has its last predecessor edge removed in function.c:6607: 6607 redirect_edge_and_branch_force (e, *pdest_bb); I haven't looked at how the shrink-wrapping code works exactly. It's Bernd's code, so perhaps he can have a look. This is now PR57320. OK. Let's go with your patch then. Approved with a comment that shrink-wrapping can result in unreachable edges in the epilogue. jeff
Fix infinite loop in lto-partition.c
Hi, privatize_symbol_name skips privatizing of symbols where we know for some reason that it is not needed. The loop in rename_statics however expects privatize_symbol_name to always rename the var, and it can get into an infinite loop. Bootstrapped/regtested x86_64-linux, committed. Honza

* lto-partition.c (privatize_symbol_name): Return true when privatizing happened. (rename_statics): Do not go into infinite loop when privatizing is not needed.

Index: lto-partition.c
===
--- lto-partition.c (revision 198936)
+++ lto-partition.c (working copy)
@@ -766,7 +766,7 @@ lto_balanced_map (void)
    with symbols defined out of the LTO world.  */

-static void
+static bool
 privatize_symbol_name (symtab_node node)
 {
   tree decl = node->symbol.decl;
@@ -781,7 +781,7 @@ privatize_symbol_name (symtab_node node)
	fprintf (cgraph_dump_file,
		 "Not privatizing symbol name: %s. It privatized already.\n",
		 name);
-      return;
+      return false;
     }
   /* Avoid mangling of already mangled clones.
      ??? should have a flag whether a symbol has a 'private' name already,
@@ -793,7 +793,7 @@ privatize_symbol_name (symtab_node node)
	fprintf (cgraph_dump_file,
		 "Not privatizing symbol name: %s. Has unique name.\n",
		 name);
-      return;
+      return false;
     }
   change_decl_assembler_name (decl, clone_function_name (decl, "lto_priv"));
   if (node->symbol.lto_file_data)
@@ -804,6 +804,7 @@ privatize_symbol_name (symtab_node node)
     fprintf (cgraph_dump_file,
	     "Privatizing symbol name: %s -> %s\n",
	     name, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  return true;
 }

 /* Promote variable VNODE to be static.  */
@@ -906,11 +907,12 @@ rename_statics (lto_symtab_encoder_t enc
	  && (!encoder
	      || lto_symtab_encoder_lookup (encoder, s) != LCC_NOT_FOUND))
	{
-	  privatize_symbol_name (s);
-	  /* Re-start from beginning since we do not know how many
-	     symbols changed a name.  */
-	  s = symtab_node_for_asm (name);
+	  if (privatize_symbol_name (s))
+	    /* Re-start from beginning since we do not know how many
+	       symbols changed a name.  */
+	    s = symtab_node_for_asm (name);
+	  else
+	    s = s->symbol.next_sharing_asm_name;
	}
-      else s = s->symbol.next_sharing_asm_name;
+      else
+	s = s->symbol.next_sharing_asm_name;
     }

 /* Find out all static decls that need to be promoted to global because
Re: [Patch ARM] Fix PR target/19599
On Fri, May 17, 2013 at 9:12 PM, Richard Henderson r...@redhat.com wrote: On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote: /* Cannot tail-call to long calls, since these are out of range of a branch instruction. */ - if (arm_is_long_call_p (decl)) + if (decl arm_is_long_call_p (decl)) return false; You can tail call long calls via indirection now. I.e. load the address first. (define_insn *sibcall_insn - [(call (mem:SI (match_operand:SI 0 X)) + [(call (mem:SI (match_operand:SI 0 call_insn_operand Cs,Ss)) FWIW, i or s would work just as well, given you've already constrained by call_insn_operand; Ss isn't really needed. True - don't know what I was thinking there. + if (arm_arch5 || arm_arch4t) + return \ bx\\t%0\\t%@ indirect register sibling call\; Missing %?. It's not marked predicable - if anything these %? markers ought to be removed as well as I don't expect these to be conditionalized ever. + if (arm_arch5 || arm_arch4t) + return \bx\\t%1\; Likewise. diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c b/gcc/testsuite/gcc.target/arm/pr40887.c index 0b5e873..5cabe3a 100644 --- a/gcc/testsuite/gcc.target/arm/pr40887.c +++ b/gcc/testsuite/gcc.target/arm/pr40887.c @@ -2,9 +2,9 @@ /* { dg-options -O2 -march=armv5te } */ /* { dg-final { scan-assembler blx } } */ -int (*indirect_func)(); +int (*indirect_func)(int x); int indirect_call() { -return indirect_func(); + return indirect_func(20) + indirect_func (40); } You've made this test case 100% useless. Of course there will always be a blx insn, since you've two calls there, and one of them will never be tail called. It should have been either removed or become your new test case. If you haven't noticed it has effectively moved to pr19599.c as my testcase . This is not 100% useless - this test was to make sure we actually put out a blx for an indirect call as per PR40887 . So it *actually* continues to test what we want as per PR40887 . Thanks, Ramana r~
Re: [GOOGLE] Back port discriminator patches to gcc-4_8
The warning was attributed to the wrong lineno. Looks like discriminators do not work well when they coexist with macros. I think the problem is with same_line_p. It's using expand_location to test whether two locations refer to the same line, but expand_location always unwinds the macro stack so that it's looking at the line number of the macro expansion point. That means that every token in the macro expansion will appear to be at the same line, so the loop over the basic_block will replace all of the locations with the new_locus that we just allocated for the new discriminator. Thus, it's clobbering the locus for the instructions at line 27 in the macro definition with the new locus we create to put a discriminator at line 26, and the warning that should appear for line 27 now says line 26.

Other comments on the patch...

In expand_location_1:

+  if (min_discriminator_location != UNKNOWN_LOCATION
+      && loc >= min_discriminator_location
+      && loc < min_discriminator_location + next_discriminator_location)

The last comparison should just be loc < next_discriminator_location. Likewise in has_discriminator.

In assign_discriminator:

+  location_t new_locus = location_with_discriminator (locus, discriminator);
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      if (same_line_p (locus, gimple_location (stmt)))
+	gimple_set_location (stmt, block ?
+			     COMBINE_LOCATION_DATA (line_table, new_locus, block) :
+			     LOCATION_LOCUS (new_locus));
+    }

I'm not convinced the COMBINE_LOCATION_DATA is needed here, since location_with_discriminator has already done that. And when block == NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? It seems this could simply be:

  gimple_set_location (stmt, new_locus);

Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since input.o now depends on it. Other binaries, like gcov, won't build otherwise.
I'm still not happy with a straightforward port of the old patch, though, since discriminator assignment might allocate locations that overlap those used for macro definitions -- there's no checking to see if we've run out of locations in that case. I guess this could be OK for the google branch for now, but not for trunk. I'm looking at alternatives using the adhoc mapping -- do you think it would work if we extend the adhoc mapping to track both a data (i.e., the block pointer) and a discriminator? -cary
Re: [GOOGLE] Back port discriminator patches to gcc-4_8
On Fri, May 17, 2013 at 4:35 PM, Cary Coutant ccout...@google.com wrote: The warning was attributed to the wrong lineno. Looks like discriminator does not work well when it coexists with macros. I think the problem is with same_line_p. It's using expand_location to test whether two locations refer to the same line, but expand_location always unwinds the macro stack so that it's looking at the line number of the macro expansion point. That means that every token in the macro expansion will appear to be at the same line, so the loop over the basic_block will replace all of the locations with the new_locus that we just allocated for the new discriminator. Thus, it's clobbering the locus for the instructions at line 27 in the macro definition with the new locus we create to put a discriminator at line 26, and the warning that should appear for line 27 now says line 26. Sounds like instead of checking the macro-expansion location, same_line_p should check the expanded macro location instead, so the problem will be resolved? Other comments on the patch... In expand_location_1: + if (min_discriminator_location != UNKNOWN_LOCATION + loc = min_discriminator_location + loc min_discriminator_location + next_discriminator_location) The last comparison should just be loc next_discriminator_location. Likewise in has_discriminator. Acked, will update the patch. In assign_discriminator: + location_t new_locus = location_with_discriminator (locus, discriminator); + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + if (same_line_p (locus, gimple_location (stmt))) +gimple_set_location (stmt, block ? + COMBINE_LOCATION_DATA (line_table, new_locus, block) : + LOCATION_LOCUS (new_locus)); +} I'm not convinced the COMBINE_LOCATION_DATA is needed here, since location_with_discriminator has already done that. And when block == NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? 
It seems this could simply be: gimple_set_location (stmt, new_locus); But what if block != NULL? Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since input.o now depends on it. Other binaries, like gcov, won't build otherwise. Acked, will update the patch I'm still not happy with a straightforward port of the old patch, though, since discriminator assignment might allocate locations that overlap those used for macro definitions -- there's no checking to see if we've run out of locations in that case. I guess this could be OK for the google branch for now, but not for trunk. I'm looking at alternatives using the adhoc mapping -- do you think it would work if we extend the adhoc mapping to track both a data (i.e., the block pointer) and a discriminator? Yeah, I think that works. The only concern is that it would increase the memory because each locus_adhoc map entry need to have another bit to store the extra info? Thanks, Dehao -cary
Re: Minimize downward code motion during reassociation
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? + if ((dep_bb != insert_bb +!dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb)) + || (dep_bb == insert_bb + gimple_uid (insert_stmt) gimple_uid (dep_stmt))) +{ + insert_stmt = dep_stmt; +} superfluous {} Removed. + /* If there are any debug uses of LHS, reset them. */ + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) +{ + if (is_gimple_debug (use_stmt)) +{ + gimple_debug_bind_reset_value (use_stmt); that's only needed for debug stmts that are not dominated by insert_point, no? You are right. I have added the check. + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (stmt_can_throw_internal (insert_point)) +{ use stmt_ends_bb_p (insert_point) instead + edge succ_edge = find_fallthru_edge (insert_bb-succs); + insert_bb = succ_edge-dest; + /* Insert STMT at the beginning of the successor basic block. */ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); are you sure this is a proper insert location? How would you even arrive in this situation given find_insert_point dominance checks? Let's say the last statement in a BB which can throw feeds into a statement that is to be reassociated. Then the insert point would be this statement. Instead, you could always insert at the beginning of its (lone) successor. This is independent of the the find_insert_point checks. Otherwise the patch looks ok - can you add one or two testcases please? 
I have added a test case and submitted at r199048. Thanks, Easwaran Thanks, Richard. Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard.
[C++ Patch] PR 10207
Hi, in this even older ;) and related issue, we reject empty initializer lists in compound-literals. The fix seems simple: just use cp_parser_braced_list instead of cp_parser_initializer_list, which allows for the special case of empty list. Tested x86_64-linux. There is a nit which I don't want to hide: cp_parser_initializer_list + build_constructor does a little more than cp_parser_braced_list: in build_constructor there is a loop setting TREE_SIDE_EFFECTS and TREE_CONSTANT to the right value for the CONSTRUCTOR overall, which doesn't exist in cp_parser_braced_list. In case it matters - I don't think it does, and we have testcases with side effects in the testsuite - we can't simply add it to cp_parser_braced_list, because its many existing uses are perfectly fine without. A little more code would be needed, not a big issue. Thanks, Paolo. /// /cp 2013-05-18 Paolo Carlini paolo.carl...@oracle.com PR c++/10207 * parser.c (cp_parser_postfix_expression): Use cp_parser_braced_list instead of cp_parser_initializer_list for compound-literals. /testsuite 2013-05-18 Paolo Carlini paolo.carl...@oracle.com PR c++/10207 * g++.dg/ext/complit13.C: New. Index: cp/parser.c === --- cp/parser.c (revision 199043) +++ cp/parser.c (working copy) @@ -5719,7 +5719,7 @@ cp_parser_postfix_expression (cp_parser *parser, b if (cp_parser_allow_gnu_extensions_p (parser) cp_lexer_next_token_is (parser-lexer, CPP_OPEN_PAREN)) { - vecconstructor_elt, va_gc *initializer_list = NULL; + tree initializer = NULL_TREE; bool saved_in_type_id_in_expr_p; cp_parser_parse_tentatively (parser); @@ -5732,21 +5732,19 @@ cp_parser_postfix_expression (cp_parser *parser, b parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p; /* Look for the `)'. */ cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); - /* Look for the `{'. */ - cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE); /* If things aren't going well, there's no need to keep going. 
*/ if (!cp_parser_error_occurred (parser)) { - bool non_constant_p; - /* Parse the initializer-list. */ - initializer_list - = cp_parser_initializer_list (parser, non_constant_p); - /* Allow a trailing `,'. */ - if (cp_lexer_next_token_is (parser-lexer, CPP_COMMA)) - cp_lexer_consume_token (parser-lexer); - /* Look for the final `}'. */ - cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE); + if (cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE)) + { + bool non_constant_p; + /* Parse the brace-enclosed initializer list. */ + initializer = cp_parser_braced_list (parser, +non_constant_p); + } + else + cp_parser_simulate_error (parser); } /* If that worked, we're definitely looking at a compound-literal expression. */ @@ -5754,7 +5752,8 @@ cp_parser_postfix_expression (cp_parser *parser, b { /* Warn the user that a compound literal is not allowed in standard C++. */ - pedwarn (input_location, OPT_Wpedantic, ISO C++ forbids compound-literals); + pedwarn (input_location, OPT_Wpedantic, +ISO C++ forbids compound-literals); /* For simplicity, we disallow compound literals in constant-expressions. We could allow compound literals of integer type, whose @@ -5772,10 +5771,8 @@ cp_parser_postfix_expression (cp_parser *parser, b } /* Form the representation of the compound-literal. */ postfix_expression - = (finish_compound_literal -(type, build_constructor (init_list_type_node, - initializer_list), - tf_warning_or_error)); + = (finish_compound_literal (type, initializer, + tf_warning_or_error)); break; } } Index: testsuite/g++.dg/ext/complit13.C === --- testsuite/g++.dg/ext/complit13.C(revision 0) +++ testsuite/g++.dg/ext/complit13.C(working copy) @@ -0,0 +1,11 @@ +// PR c++/10207 +// { dg-options } + +typedef struct { } EmptyStruct; +typedef struct { EmptyStruct Empty; } DemoStruct; + +void Func() +{ + DemoStruct Demo; +
Re: [Patch][gcc-4_7-branch] Backport trunk revision 187838 into gcc-4_7-branch
Ping: http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00429.html The patch fixes a dependency issue in libgcc's Makefile.in by adding 'rm libgcc_tm.stamp' to the clean rule. That was fixed on main trunk (r187838) but not on gcc-4_7-branch. Since gcc-4_7-branch is still open, is it OK to backport r187838 into gcc-4_7-branch? Best regards, jasonwucj
[Patch] Extend script ./contrib/download_prerequisites usage for isl and cloog
Hi all, With the current trunk repository it is now possible to build the compiler with in-tree isl and cloog. This patch extends ./contrib/download_prerequisites to download isl and cloog conditionally, in case people would like to build gcc with the Graphite loop optimizations. OK for the trunk?

contrib/ChangeLog 2013-05-18 Chung-Ju Wu jasonw...@gmail.com * download_prerequisites: Download isl and cloog conditionally.

Index: contrib/download_prerequisites
===
--- contrib/download_prerequisites (revision 199006)
+++ contrib/download_prerequisites (working copy)
@@ -19,6 +19,12 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see http://www.gnu.org/licenses/.

+# If you want to build GCC with the Graphite loop optimizations,
+# set GRAPHITE_LOOP_OPT=yes to download the optional prerequisites
+# ISL Library and CLooG.
+GRAPHITE_LOOP_OPT=no
+
+# Necessary to build GCC.
 MPFR=mpfr-2.4.2
 GMP=gmp-4.3.2
 MPC=mpc-0.8.1
@@ -36,3 +42,19 @@
 ln -sf $MPC mpc || exit 1

 rm $MPFR.tar.bz2 $GMP.tar.bz2 $MPC.tar.gz || exit 1
+
+# Necessary to build GCC with the Graphite loop optimizations.
+if [ "$GRAPHITE_LOOP_OPT" == "yes" ] ; then
+  ISL=isl-0.11.1
+  CLOOG=cloog-0.18.0
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
+  tar xjf $ISL.tar.bz2 || exit 1
+  ln -sf $ISL isl || exit 1
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$CLOOG.tar.gz || exit 1
+  tar xzf $CLOOG.tar.gz || exit 1
+  ln -sf $CLOOG cloog || exit 1
+
+  rm $ISL.tar.bz2 $CLOOG.tar.gz || exit 1
+fi

Best regards, jasonwucj
Re: [PATCH] Fix latent bug in RTL GCSE/PRE (PR57159)
On 05/07/2013 09:05 AM, Julian Brown wrote: On Mon, 6 May 2013 11:53:20 -0600 I don't know. My assumption was that a simple mem would be one that the PRE pass could handle -- that clause was intended to handle simple mems in REG_EQUAL notes the same as simple mems as the source of a set. Maybe that is not safe though, and it would be better to unconditionally invalidate buried mems in REG_EQUAL notes? I was slightly wary of inhibiting genuine optimisation opportunities. For a simple mem, we'll do the right thing, at least that's my reading of the code. I went ahead and put the two code fragments together and committed the change after a bootstrap and regression test on x86_64-unknown-linux-gnu. For reference, attached is the patch that ultimately went into the tree. Thanks, Jeff commit f473eb72a23bc82db0ee23e51fdd40b20417fb15 Author: law law@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Sat May 18 03:48:18 2013 + * gcse.c (compute_ld_motion_mems): If a non-simple MEM is found in a REG_EQUAL note, invalidate it. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@199049 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4536e62..d6eab5f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-05-17 Julian Brown jul...@codesourcery.com + + * gcse.c (compute_ld_motion_mems): If a non-simple MEM is + found in a REG_EQUAL note, invalidate it. + 2013-05-17 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (find_insert_point): New function. diff --git a/gcc/gcse.c b/gcc/gcse.c index 07043f7..e485985 100644 --- a/gcc/gcse.c +++ b/gcc/gcse.c @@ -3894,6 +3894,8 @@ compute_ld_motion_mems (void) { rtx src = SET_SRC (PATTERN (insn)); rtx dest = SET_DEST (PATTERN (insn)); + rtx note = find_reg_equal_equiv_note (insn); + rtx src_eq; /* Check for a simple LOAD... 
*/ if (MEM_P (src) simple_mem (src)) @@ -3910,6 +3912,15 @@ compute_ld_motion_mems (void) invalidate_any_buried_refs (src); } + if (note != 0 REG_NOTE_KIND (note) == REG_EQUAL) + src_eq = XEXP (note, 0); + else + src_eq = NULL_RTX; + + if (src_eq != NULL_RTX + !(MEM_P (src_eq) simple_mem (src_eq))) + invalidate_any_buried_refs (src_eq); + /* Check for stores. Don't worry about aliased ones, they will block any movement we might do later. We only care about this exact pattern since those are the only