Re: avx runtime check

2014-05-16 Thread Richard Biener
On May 16, 2014 4:47:11 AM CEST, Mike Stump mikest...@comcast.net wrote:
This reorders the avx checks and gates on a target triplet check before
compiling any code.

Can you explain why?

Ok?

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index 40b5414..103a28a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
# Return 1 if the target supports running AVX executables, 0 otherwise.
 
 proc check_effective_target_avx_runtime { } {
-if { [check_effective_target_avx]
-     && [check_avx_hw_available]
+if { [check_avx_hw_available]
+     && [check_effective_target_avx]
      && [check_avx_os_support_available] } {
return 1
 }




Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-16 Thread Jakub Jelinek
On Thu, May 15, 2014 at 09:29:44PM +, Joseph S. Myers wrote:
 On Thu, 15 May 2014, Jakub Jelinek wrote:

  But I think we can't use decimal_real_from_string, we'd need a variant
  of that function that would allow specification of the rounding mode

 My point is that you can use %.*RUe or %.*RDe formats (for max and min
 respectively), with an appropriate precision, and let MPFR do the rounding
 to an appropriate number of decimal digits in the right direction (to
 produce a value that's exactly representable in the relevant DFP type, as
 long as it's in range).

You are right, that seems to work.  So new incremental patch.

BTW, for IBM long double
__int128_t f3 (long double x) { return x; }
the
u= { -170141183460469231731687303715884105728.0 + -4194304.0 }
check is actually imprecise: while all correct long double values will be
properly accepted, if the high double is exactly
-170141183460469231731687303715884105728.0 and the low double is in
[-1.0, -4194304.0), then the unspecified conversion will not be
diagnosed.  I'm afraid there is nothing we can (easily) do about it,
because { -170141183460469231731687303715884105728.0 + -1.0 }, while
representable in IBM long double, is not representable in our
REAL_VALUE_TYPE, because we assume fixed precision, while IBM long double
has variable precision.  I guess the combination of IBM long double and
__int128_t (for long long it is still ok) is rare enough that we don't
need to care about it (an alternative would be e.g. to compare against the
addition of those two numbers, with some optimization barrier/volatile
that would prevent it from being optimized into a single long double
number).

--- gcc/convert.c
+++ gcc/convert.c
@@ -851,6 +851,8 @@
  expr = save_expr (expr);
  tree check = ubsan_instrument_float_cast (loc, type, expr);
  expr = build1 (FIX_TRUNC_EXPR, type, expr);
+ if (check == NULL)
+   return expr;
  return fold_build2 (COMPOUND_EXPR, TREE_TYPE (expr), check, expr);
}
   else
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -47,6 +47,8 @@
 #include "asan.h"
 #include "gimplify-me.h"
 #include "intl.h"
+#include "realmpfr.h"
+#include "dfp.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -903,17 +905,95 @@
 ubsan_instrument_float_cast (location_t loc, tree type, tree expr)
 {
   tree expr_type = TREE_TYPE (expr);
-  tree t, tt, fn;
+  tree t, tt, fn, min, max;
+  enum machine_mode mode = TYPE_MODE (expr_type);
+  int prec = TYPE_PRECISION (type);
+  bool uns_p = TYPE_UNSIGNED (type);
 
-  tree min = TYPE_MIN_VALUE (type);
-  tree max = TYPE_MAX_VALUE (type);
-  /* Add/subtract 1.0 so we can avoid truncating the value of EXPR.  */
-  min = fold_build2 (MINUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, min),
-build_one_cst (expr_type));
-  max = fold_build2 (PLUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, max),
-build_one_cst (expr_type));
+  /* Float to integer conversion first truncates toward zero, so
+ even signed char c = 127.875f; is not problematic.
+ Therefore, we should complain only if EXPR is unordered or smaller
+ or equal than TYPE_MIN_VALUE - 1.0 or greater or equal than
+ TYPE_MAX_VALUE + 1.0.  */
+  if (REAL_MODE_FORMAT (mode)->b == 2)
+{
+  /* For maximum, TYPE_MAX_VALUE might not be representable
+in EXPR_TYPE, e.g. if TYPE is 64-bit long long and
+EXPR_TYPE is IEEE single float, but TYPE_MAX_VALUE + 1.0 is
+either representable or infinity.  */
+  REAL_VALUE_TYPE maxval = dconst1;
+  SET_REAL_EXP (&maxval, REAL_EXP (&maxval) + prec - !uns_p);
+  real_convert (&maxval, mode, &maxval);
+  max = build_real (expr_type, maxval);
+
+  /* For unsigned, assume -1.0 is always representable.  */
+  if (uns_p)
+   min = build_minus_one_cst (expr_type);
+  else
+   {
+ /* TYPE_MIN_VALUE is generally representable (or -inf),
+but TYPE_MIN_VALUE - 1.0 might not be.  */
+ REAL_VALUE_TYPE minval = dconstm1, minval2;
+	  SET_REAL_EXP (&minval, REAL_EXP (&minval) + prec - 1);
+	  real_convert (&minval, mode, &minval);
+	  real_arithmetic (&minval2, MINUS_EXPR, &minval, &dconst1);
+	  real_convert (&minval2, mode, &minval2);
+	  if (real_compare (EQ_EXPR, &minval, &minval2)
+	      && !real_isinf (&minval))
+   {
+ /* If TYPE_MIN_VALUE - 1.0 is not representable and
+rounds to TYPE_MIN_VALUE, we need to subtract
+		 more.  As REAL_MODE_FORMAT (mode)->p is the number
+		 of base digits, we want to subtract a number that
+		 will be 1 << (REAL_MODE_FORMAT (mode)->p - 1)
+		 times smaller than minval.  */
+	      minval2 = dconst1;
+	      gcc_assert (prec > REAL_MODE_FORMAT (mode)->p);
+	      SET_REAL_EXP (&minval2,
+			    REAL_EXP (&minval2) + prec - 1
+			    - 

Re: [PATCH] Make SCCVN constant-fold calls

2014-05-16 Thread Richard Biener
On Thu, 15 May 2014, Richard Biener wrote:

 
 For some odd reason I didn't implement this earlier.  This is one
 major source of 2nd-stage opportunities that come up when running
 two adjacent FRE passes.
 
 Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

The following is what I have applied (also properly value-numbering
VDEFs if present).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2014-05-16  Richard Biener  rguent...@suse.de

* tree-ssa-sccvn.c (visit_use): Also constant-fold calls.

* gcc.dg/tree-ssa/ssa-fre-41.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 210418)
--- gcc/tree-ssa-sccvn.c(working copy)
*** visit_use (tree use)
*** 3566,3593 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
- 
- /* ???  We could try to simplify calls.  */
- 
	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! if (stmt_has_constants (stmt))
!   VN_INFO (lhs)-has_constants = true;
! else
{
! /* We reset expr and constantness here because we may
!have been value numbering optimistically, and
!iterating.  They may become non-constant in this case,
!even if they were optimistically constant.  */
! VN_INFO (lhs)-has_constants = false;
! VN_INFO (lhs)-expr = NULL_TREE;
}
! 
! if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
! changed = defs_to_varying (stmt);
  goto done;
}
}
  
  if (!gimple_call_internal_p (stmt)
--- 3566,3635 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! /* Try constant folding based on our current lattice.  */
! tree simplified = gimple_fold_stmt_to_constant_1 (stmt,
!   vn_valueize);
! if (simplified)
{
!		  if (dump_file && (dump_flags & TDF_DETAILS))
!		    {
!		      fprintf (dump_file, "call ");
!		      print_gimple_expr (dump_file, stmt, 0, 0);
!		      fprintf (dump_file, " simplified to ");
!		      print_generic_expr (dump_file, simplified, 0);
!		      if (TREE_CODE (lhs) == SSA_NAME)
!			fprintf (dump_file, " has constants %d\n",
!				 expr_has_constants (simplified));
!		      else
!			fprintf (dump_file, "\n");
!   }
}
! /* Setting value numbers to constants will occasionally
!screw up phi congruence because constants are not
!uniquely associated with a single ssa name that can be
!looked up.  */
!	      if (simplified
!		  && is_gimple_min_invariant (simplified))
{
! VN_INFO (lhs)-expr = simplified;
! VN_INFO (lhs)-has_constants = true;
! changed = set_ssa_val_to (lhs, simplified);
! if (gimple_vdef (stmt))
!   changed |= set_ssa_val_to (gimple_vdef (stmt),
!  gimple_vuse (stmt));
  goto done;
}
+	      else if (simplified
+		       && TREE_CODE (simplified) == SSA_NAME)
+   {
+ changed = visit_copy (lhs, simplified);
+ if (gimple_vdef (stmt))
+   changed |= set_ssa_val_to (gimple_vdef (stmt),
+  gimple_vuse (stmt));
+ goto done;
+   }
+ else
+   {
+ if (stmt_has_constants (stmt))
+   VN_INFO (lhs)-has_constants = true;
+ else
+   {
+ /* We reset expr and constantness here because we may
+have been value numbering optimistically, and
+iterating.  They may become non-constant in this case,
+even if they were optimistically constant.  */
+ VN_INFO (lhs)-has_constants = false;
+ VN_INFO (lhs)-expr = NULL_TREE;
+   }
+ 
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+   {
+ changed = defs_to_varying (stmt);
+ goto done;
+   }
+   }
}
  
  if (!gimple_call_internal_p (stmt)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-41.c

Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-16 Thread Svante Signell
On Wed, 2014-05-07 at 10:18 +0200, Svante Signell wrote:
 On Tue, 2014-05-06 at 15:26 +0200, Samuel Thibault wrote:

Attached is an updated patch8.diff. Arch specific code to
src/libgo/mksysinfo.sh has been added, now other systems are not
affected by the patch except the SYS_FCNTL part.

As for the SYS_FCNTL part: without it the build on GNU/Hurd fails.  On
the other hand, SYS_FCNTL is not defined for e.g. GNU/Linux either.  It
is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:
func dupCloseOnExec(fd int) (newfd int, err error) {
	if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC != 0 {
		r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
			syscall.F_DUPFD_CLOEXEC, 0)

It is not yet known how the build succeeds on Linux without SYS_FCNTL
being defined; perhaps the conditions above are never met.
--- a/src/libgo/mksysinfo.sh
+++ b/src/libgo/mksysinfo.sh
@@ -210,6 +210,13 @@
   egrep '#define E[A-Z0-9_]+ ' | \
   sed -e 's/^#define \(E[A-Z0-9_]*\) .*$/const \1 = Errno(_\1)/' >> ${OUT}
 
+# Special treatment of EWOULDBLOCK for GNU/Hurd
+# /usr/include/bits/errno.h: #define EWOULDBLOCK EAGAIN
+if egrep 'define EWOULDBLOCK EAGAIN' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^const EWOULDBLOCK = Errno(_EWOULDBLOCK)' ${OUT} | \
+sed -i.bak -e 's/_EWOULDBLOCK/_EAGAIN/' ${OUT}
+fi
+
 # The O_xxx flags.
 egrep '^const _(O|F|FD)_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\([^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
@@ -225,6 +232,11 @@
   echo "const F_DUPFD_CLOEXEC = 0" >> ${OUT}
 fi
 
+# Special treatment of SYS_FCNTL for GNU/Hurd
+if ! grep '^const SYS_FCNTL' ${OUT} > /dev/null 2>&1; then
+  echo "const SYS_FCNTL = 0" >> ${OUT}
+fi
+
 # These flags can be lost on i386 GNU/Linux when using
 # -D_FILE_OFFSET_BITS=64, because we see #define F_SETLK F_SETLK64
 # before we see the definition of F_SETLK64.
@@ -528,6 +540,12 @@
 
 # The stat type.
 # Prefer largefile variant if available.
+# Special treatment of st_dev for GNU/Hurd
+# /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid
+if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^type _stat ' gen-sysinfo.go > /dev/null 2>&1 | \
+  sed -i.bak -e 's/; st_fsid/; st_dev/' gen-sysinfo.go
+fi
 stat=`grep '^type _stat64 ' gen-sysinfo.go || true`
if test "$stat" != ""; then
   grep '^type _stat64 ' gen-sysinfo.go


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Richard Biener
On Thu, 8 May 2014, Richard Biener wrote:

 
 Ok, not really predicate aware, but this makes value-numbering
 pessimistically handle non-executable edges.  In the following
 patch groundwork is laid and PHI value-numbering is adjusted
 to take advantage of edges known to be not executable.
 
 SCCVN is not well-suited to be control aware, but we still
 can see if value-numbering allows us to mark edges as
 not executable by looking at control statements.  Value-numbering
 of PHI nodes is one obvious consumer of such information
 and it also gives a natural order to do that (pessimistic)
 edge executability computation - dominator order.
 
 Thus the following adds a pass over all control statements,
 trying to simplify them after value-numbering their operands
 (and all uses recursively, as SCCVN does).
 
 With followup patches I will try to use this information to
 reduce the amount of work done (also improving optimization,
 of course).  One other obvious candidate is the alias walker
 which doesn't have to consider unreachable paths when
 walking into virtual PHIs.
 
 The patch will likely get some more cleanups (due to the hack
 in set_ssa_val_to).
 
 Comments still welcome.

Quiet as usual.  Well, the following is what I have committed
after bootstrapping and regtesting on x86_64-unknown-linux-gnu.
It fixes the inliner which is confused by random pass-local
flags on the edges to the exit block, adds one more testcase
and adjusts two.

I figured that followups for more optimizations are not
necessary as virtual operand value-numbering already gets
us most of the benefit.  Followups trying to do less work
may still be possible but they are low on priority.

Richard.

2014-05-16  Richard Biener  rguent...@suse.de

* tree-ssa-sccvn.c: Include tree-cfg.h and domwalk.h.
(set_ssa_val_to): Handle unexpected sets to VN_TOP.
(visit_phi): Ignore edges marked as not executable.
(class cond_dom_walker): New.
(cond_dom_walker::before_dom_children): Value-number
control statements and mark successor edges as not
executable if possible.
(run_scc_vn): First walk all control statements in
dominator order, marking edges as not executable.
* tree-inline.c (copy_edges_for_bb): Be not confused
about random edge flags.

* gcc.dg/tree-ssa/ssa-fre-39.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-40.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-8.c: One more elimination.
* gcc.dg/tree-ssa/struct-aliasing-2.c: Scan cddce1 dump.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig   2014-05-15 12:47:20.762286122 +0200
--- gcc/tree-ssa-sccvn.c2014-05-15 13:04:57.872213342 +0200
*** along with GCC; see the file COPYING3.
*** 51,56 
--- 51,58 
  #include "params.h"
  #include "tree-ssa-propagate.h"
  #include "tree-ssa-sccvn.h"
+ #include "tree-cfg.h"
+ #include "domwalk.h"
  
  /* This algorithm is based on the SCC algorithm presented by Keith
 Cooper and L. Taylor Simpson in SCC-Based Value numbering
*** set_ssa_val_to (tree from, tree to)
*** 2661,2666 
--- 2663,2687 
tree currval = SSA_VAL (from);
HOST_WIDE_INT toff, coff;
  
+   /* The only thing we allow as value numbers are ssa_names
+  and invariants.  So assert that here.  We don't allow VN_TOP
+  as visiting a stmt should produce a value-number other than
+  that.
+  ???  Still VN_TOP can happen for unreachable code, so force
+  it to varying in that case.  Not all code is prepared to
+  get VN_TOP on valueization.  */
+   if (to == VN_TOP)
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Forcing value number to varying on "
+		 "receiving VN_TOP\n");
+   to = from;
+ }
+ 
+   gcc_assert (to != NULL_TREE
+	       && (TREE_CODE (to) == SSA_NAME
+		   || is_gimple_min_invariant (to)));
+ 
if (from != to)
  {
if (currval == from)
*** set_ssa_val_to (tree from, tree to)
*** 2680,2692 
to = from;
  }
  
-   /* The only thing we allow as value numbers are VN_TOP, ssa_names
-  and invariants.  So assert that here.  */
-   gcc_assert (to != NULL_TREE
-	       && (to == VN_TOP
-		   || TREE_CODE (to) == SSA_NAME
-		   || is_gimple_min_invariant (to)));
- 
    if (dump_file && (dump_flags & TDF_DETAILS))
      {
        fprintf (dump_file, "Setting value number of ");
--- 2701,2706 
*** visit_phi (gimple phi)
*** 3071,3077 
tree result;
tree sameval = VN_TOP;
bool allsame = true;
-   unsigned i;
  
/* TODO: We could check for this in init_sccvn, and replace this
   with a gcc_assert.  */
--- 3085,3090 
*** visit_phi (gimple phi)
*** 3080,3106 
  
/* See if all non-TOP arguments have the same value.  TOP is
   equivalent to 

Re: [AArch64 costs 0/18] Improve address- and rtx-cost models

2014-05-16 Thread Marcus Shawcroft
This series is OK.
/Marcus

On 27 March 2014 17:33, James Greenhalgh james.greenha...@arm.com wrote:
 Hi,

 This patch series improves the costing model in the AArch64 backend to
 match a number of new idioms.

 This patch is a combination of a series I had been working on, with the
 cost-model for XGene-1 proposed by Philipp Tomsich.
 ( http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01084.html )

 Where sensible I have integrated the idiom matching in Philipp's
 with my own work, though there were cases which were redundant, or
 could be folded to reduce code duplication. There were other cases where
 the code suggested XGene-1 would benefit from special-case handling.
 Without documentation for the XGene-1 I can't cater for these special
 cases, and I have not tried to do so here.

 The patch series has been bootstrapped natively on aarch64-none-elf, and
 has been through aarch64-none-elf runs with no issue.

 Is this OK for stage-1?

 Thanks,
 James


[AArch64 costs] Fixup to costing of FNMUL

2014-05-16 Thread James Greenhalgh

Hi,

While waiting for approval of the costs patches, I spotted this bug.
FNMUL should be costed like an FMUL, rather than being costed like
an FMA instruction.

Bootstrapped and tested in series with the costs patches with no issues.

I see Marcus has approved the patch series, so is this OK to go in as a fixup
to that?

Thanks,
James

---
gcc/

2014-05-15  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6eb2303..0a7f441 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4711,24 +4711,18 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 {
   if (speed)
 	{
-	  /* Floating-point FMA can also support negations of the
+	  /* Floating-point FMA/FMUL can also support negations of the
 	 operands.  */
 	  if (GET_CODE (op0) == NEG)
-	{
-	  maybe_fma = true;
-	  op0 = XEXP (op0, 0);
-	}
+	op0 = XEXP (op0, 0);
 	  if (GET_CODE (op1) == NEG)
-	{
-	  maybe_fma = true;
-	  op1 = XEXP (op1, 0);
-	}
+	op1 = XEXP (op1, 0);
 
 	  if (maybe_fma)
 	/* FMADD/FNMADD/FNMSUB/FMSUB.  */
 	    cost += extra_cost->fp[mode == DFmode].fma;
 	  else
-	/* FMUL.  */
+	/* FMUL/FNMUL.  */
 	    cost += extra_cost->fp[mode == DFmode].mult;
 	}
 

Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Nicholas Clifton

Hi Eric,

  OK - here is your version of the patch, extended with a comment which 
I think is helpful for other people reading the code, and with the 
changes to builtins.c and md.texi removed, since the size of the buffer 
is not changing.


  Is this version OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2014-05-16  Nick Clifton  ni...@redhat.com

* except.c (init_eh): Correct computation of the size of a builtin
setjmp buffer for when pointers are bigger than words.


Index: gcc/except.c
===
--- gcc/except.c(revision 210490)
+++ gcc/except.c(working copy)
@@ -286,9 +286,22 @@
   tmp = size_int (FIRST_PSEUDO_REGISTER + 2 - 1);
 #endif
 #else
-  /* builtin_setjmp takes a pointer to 5 words.  */
-  tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);
+  /* Compute a minimally sized jump buffer.  We need room to store at
+least 3 pointers - stack pointer, frame pointer and return address.
+Plus for some targets we need room for an extra pointer - in the
+case of MIPS this is the global pointer.  This makes a total of four
+pointers, but to be safe we actually allocate room for 5.
+
+If pointers are smaller than words then we allocate enough room for
+5 words, just in case the backend needs this much room.  For more
+discussion on this issue see:
+http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00313.html.  */
+  if (POINTER_SIZE > BITS_PER_WORD)
+   tmp = size_int (5 - 1);
+  else
+   tmp = size_int ((5 * BITS_PER_WORD / POINTER_SIZE) - 1);
 #endif
   tmp = build_index_type (tmp);
   tmp = build_array_type (ptr_type_node, tmp);




Re: [AArch64 costs] Fixup to costing of FNMUL

2014-05-16 Thread Marcus Shawcroft
On 16 May 2014 09:26, James Greenhalgh james.greenha...@arm.com wrote:

 2014-05-15  James Greenhalgh  james.greenha...@arm.com

 * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case.

OK /Marcus


Avoiding some garbage rtl from instantiate_virtual_regs

2014-05-16 Thread Richard Sandiford
It seems that in a typical -O0 compile the amount of rtl that
starts out as needed but becomes garbage is only slightly less than
half of the total amount created.  One of the big offenders is the
vregs pass, which creates new PLUSes when instanstiating a virtual
register + a constant and which creates new MEMs when instantiating
an address involving a virtual register.  This happens a lot in -O0
code because all variables live on the stack.

The instantiation walk is fundamentally in-place: every other part
of the pattern is modified without copying.  And rtl sharing rules
guarantee that we can do the same for PLUSes of registers and MEMs.

The patch does this by adding inplace arguments to plus_constant and
replace_equiv_address.  In a -O0 compile of an oldish fold-const.ii
(where no GC takes place) it reduces the amount of used GC memory
from 169M to 166M.  The average max RSS goes down by just over 1%.
Compile time seems to decrease slightly, but probably in the noise range.

There might be other callers that can use the new interfaces too.

Tested on x86_64-linux-gnu.  Also tested by comparing the asm output
for various parts of the testsuite before and after the patch.
The only changes were that some "sym+0"s became plain "sym"s
(i.e. (plus X (const_int 0)) became X) because of the plus_constant
change.

OK to install?

Thanks,
Richard


gcc/
* emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an
inplace argument.  Store the new address in the original MEM when true.
* emit-rtl.c (change_address_1): Likewise.
(adjust_address_1, adjust_automodify_address_1, offset_address):
Update accordingly.
* rtl.h (plus_constant): Add an inplace argument.
* explow.c (plus_constant): Likewise.  Try to reuse the original PLUS
when true.  Avoid generating (plus X (const_int 0)).
* function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
in-place.  Pass true to plus_constant.
(instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address.

Index: gcc/emit-rtl.h
===
--- gcc/emit-rtl.h  2014-05-15 11:27:06.000259353 +0100
+++ gcc/emit-rtl.h  2014-05-16 09:11:42.479556294 +0100
@@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool);
ADDR.  The caller is asserting that the actual piece of memory pointed
to is the same, just the form of the address is being changed, such as
by putting something into a register.  */
-extern rtx replace_equiv_address (rtx, rtx);
+extern rtx replace_equiv_address (rtx, rtx, bool = false);
 
 /* Likewise, but the reference is not required to be valid.  */
-extern rtx replace_equiv_address_nv (rtx, rtx);
+extern rtx replace_equiv_address_nv (rtx, rtx, bool = false);
 
 extern rtx gen_blockage (void);
 extern rtvec gen_rtvec (int, ...);
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2014-05-16 09:09:18.446271662 +0100
+++ gcc/emit-rtl.c  2014-05-16 09:40:25.285714457 +0100
@@ -145,7 +145,6 @@ #define cur_insn_uid (crtl->emit.x_cur_i
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
 #define first_label_num (crtl-emit.x_first_label_num)
 
-static rtx change_address_1 (rtx, enum machine_mode, rtx, int);
 static void set_used_decls (tree);
 static void mark_label_nuses (rtx);
 static hashval_t const_int_htab_hash (const void *);
@@ -2010,11 +2009,15 @@ clear_mem_size (rtx mem)
 /* Return a memory reference like MEMREF, but with its mode changed to MODE
and its address changed to ADDR.  (VOIDmode means don't change the mode.
NULL for ADDR means don't change the address.)  VALIDATE is nonzero if the
-   returned memory location is required to be valid.  The memory
-   attributes are not changed.  */
+   returned memory location is required to be valid.  INPLACE is true if any
+   changes can be made directly to MEMREF or false if MEMREF must be treated
+   as immutable.
+
+   The memory attributes are not changed.  */
 
 static rtx
-change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate)
+change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate,
+ bool inplace)
 {
   addr_space_t as;
   rtx new_rtx;
@@ -2042,6 +2045,12 @@ change_address_1 (rtx memref, enum machi
   if (rtx_equal_p (addr, XEXP (memref, 0)) && mode == GET_MODE (memref))
 return memref;
 
+  if (inplace)
+{
+  XEXP (memref, 0) = addr;
+  return memref;
+}
+
   new_rtx = gen_rtx_MEM (mode, addr);
   MEM_COPY_ATTRIBUTES (new_rtx, memref);
   return new_rtx;
@@ -2053,7 +2062,7 @@ change_address_1 (rtx memref, enum machi
 rtx
 change_address (rtx memref, enum machine_mode mode, rtx addr)
 {
-  rtx new_rtx = change_address_1 (memref, mode, addr, 1);
+  rtx new_rtx = change_address_1 (memref, mode, addr, 1, false);
   enum machine_mode mmode = GET_MODE (new_rtx);
   struct 

[PATCH SH] Don't switch mode around fmov (pr61195)

2014-05-16 Thread Christian Bruel
Hi,

This patch reduces the number of unnecessary PR mode switches for
single-precision moves when FPSCR.SZ is not forced.  See the test in
the patch for an illustration.

Regtested for sh-none-elf with -m4 (-m2a still running), bootstrapped
on an sh4-linux-elf board.  OK for trunk?

Many thanks,

Christian



2014-05-16  Christian Bruel  christian.br...@st.com

	PR target/61195
	* config/sh/sh.md (movsf_ie): Unset fp_mode for fmov.

2014-05-16  Christian Bruel  christian.br...@st.com

	PR target/61195
	* gcc.target/sh/pr61195.c: New test.

Index: config/sh/sh.md
===
--- config/sh/sh.md	(revision 210475)
+++ config/sh/sh.md	(working copy)
@@ -8357,9 +8357,26 @@ label:
   (const_int 2)
   (const_int 2)
   (const_int 0)])
-   (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes")
-					   (const_string "single")
-					   (const_string "single")))])
+  (set_attr_alternative "fp_mode"
+     [(if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+      (const_string "none")
+      (const_string "single")
+      (const_string "single")
+      (const_string "none")
+      (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+      (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")])])
 
 (define_split
   [(set (match_operand:SF 0 register_operand )
Index: testsuite/gcc.target/sh/pr61195.c
===
--- testsuite/gcc.target/sh/pr61195.c	(revision 0)
+++ testsuite/gcc.target/sh/pr61195.c	(working copy)
@@ -0,0 +1,19 @@
+/* Verify that we don't switch mode for single moves.  */
+/* { dg-do compile }  */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { "*-*-*" } { "-mfmovd" } { "" } } */
+/* { dg-final { scan-assembler-not "fpscr" } } */
+
+float *g;
+
+float
+foo(float f)
+{
+  return f;
+}
+
+float
+foo1(void)
+{
+  return *g;
+}


RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-05-16 Thread Ian Bolton
Ping.  This should be relatively simple to review.

Many thanks.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Ian Bolton
 Sent: 08 May 2014 18:36
 To: gcc-patches
 Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative
 immediates where sensible
 
 Hi,
 
 It currently takes 4 instructions to generate certain immediates on
 AArch64 (unless we put them in the constant pool).
 
 For example ...
 
   long long
   beefcafebabe ()
   {
 return 0xBEEFCAFEBABEll;
   }
 
 leads to ...
 
   mov x0, 0x47806
   movk x0, 0xcafe, lsl 16
   movk x0, 0xbeef, lsl 32
   orr x0, x0, -281474976710656
 
 The above case is tackled in this patch by employing MOVN
 to generate the top 32-bits in a single instruction ...
 
   mov x0, -71536975282177
   movk x0, 0xcafe, lsl 16
   movk x0, 0xbabe, lsl 0
 
 Note that where at least two half-words are 0xffff, existing
 code that does the immediate in two instructions is still used.
 
 Tested on standard gcc regressions and the attached test case.
 
 OK for commit?
 
 Cheers,
 Ian
 
 
 2014-05-08  Ian Bolton  ian.bol...@arm.com
 
 gcc/
   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
   Use MOVN when top-most half-word (and only that half-word)
   is 0x.
 gcc/testsuite/
   * gcc.target/aarch64/movn_1.c: New test.





RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

2014-05-16 Thread Ian Bolton
Ping.  This may well be classed as obvious, but that's not
obvious to me, so I request a review.  Many thanks.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Ian Bolton
 Sent: 08 May 2014 18:42
 To: gcc-patches
 Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
 
 This patch fixes a defective macro definition, based on correct
 definition in similar testcases.  The test currently passes
 through luck rather than correctness.
 
 OK for commit?
 
 Cheers,
 Ian
 
 
 2014-05-08  Ian Bolton  ian.bol...@arm.com
 
 gcc/testsuite
   * gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an
   actual instruction to move into the allocated register.





Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-05-16 Thread Georg-Johann Lay

On 05/15/2014 09:55 AM, Senthil Kumar Selvaraj wrote:

On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:

Georg-Johann Lay a...@gjlay.de writes:


Or what about simply that, which works for me:


Index: config/avr/avr.h
===
--- config/avr/avr.h(revision 210276)
+++ config/avr/avr.h(working copy)
@@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
  %{!fexceptions:-fno-exceptions}

+#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax}"
+#else
 #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
+#endif

  #define LINK_SPEC \
  %{mrelax:--relax\


Better yet something like

#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
#define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
#else
#define LINK_RELAX_SPEC ""
#endif

#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC



Does this look ok? I don't have commit access, so could someone commit
this please?


Hi, looks fine to me.  Thanks

Usually, changelogs are more descriptive w.r.t. what objects are
touched, like:

* config/avr/avr.h (LINK_RELAX_SPEC): Pass -mlink-relax to the
assembler, depending on HAVE_AS_AVR_LINK_RELAX_OPTION.
(ASM_SPEC): Use it.
* configure.ac (HAVE_AVR_AS_LINK_RELAX_OPTION) [avr]: New define if
assembler supports -mlink-relax.
* config.in: Regenerate.
* configure: Likewise.



Regards
Senthil

2014-05-15  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

* config/avr/avr.h: Pass on mlink-relax to assembler.
* configure.ac: Test for mlink-relax assembler support.
* config.in: Regenerate.
* configure: Likewise.

diff --git gcc/config.in gcc/config.in
index c0ba36e..1738301 100644
--- gcc/config.in
+++ gcc/config.in
@@ -575,6 +575,12 @@
  #endif


+/* Define if your assembler supports -mlink-relax option. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AVR_AS_LINK_RELAX_OPTION
+#endif
+
+
  /* Define to 1 if you have the `clearerr_unlocked' function. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_CLEARERR_UNLOCKED
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 9d34983..c59c54d 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -512,8 +512,14 @@ extern const char *avr_device_to_sp8 (int argc, const char 
**argv);
  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
  %{!fexceptions:-fno-exceptions}

-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
-
+#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
+#else
+#define ASM_RELAX_SPEC ""
+#endif
+
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC
+
  #define LINK_SPEC \
  %{mrelax:--relax\
   %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
diff --git gcc/configure gcc/configure
index f4db0a0..2812cdb 100755
--- gcc/configure
+++ gcc/configure
@@ -24014,6 +24014,39 @@ $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" >>confdefs.h
  fi
  ;;

+  avr-*-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mlink-relax option" >&5
+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
+if test "${gcc_cv_as_avr_relax+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_avr_relax=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_avr_relax=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
+$as_echo "$gcc_cv_as_avr_relax" >&6; }
+if test $gcc_cv_as_avr_relax = yes; then
+
+$as_echo "#define HAVE_AVR_AS_LINK_RELAX_OPTION 1" >>confdefs.h
+
+fi
+  ;;
+
   cris-*-*)
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -no-mul-bug-abort option" >&5
  $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
diff --git gcc/configure.ac gcc/configure.ac
index 8f17dfb..49a1f3d 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -3510,6 +3510,13 @@ case $target in
[Define if your assembler supports the lituse_jsrdirect relocation.])])
  ;;

+  avr-*-*)
+gcc_GAS_CHECK_FEATURE([-mlink-relax option], gcc_cv_as_avr_relax,,
+  [-mlink-relax], [.text],,
+  [AC_DEFINE(HAVE_AVR_AS_LINK_RELAX_OPTION, 1,
+   [Define if your assembler supports -mlink-relax option.])])
+  ;;
+
cris-*-*)
  gcc_GAS_CHECK_FEATURE([-no-mul-bug-abort option],
gcc_cv_as_cris_no_mul_bug,[2,15,91],





RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Thomas Preud'homme
Ping?

Best regards,

Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
 
 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.
 
 Here are the updated ChangeLogs:
 
 *** gcc/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
   PR tree-optimization/54733
   * expr.c (get_inner_reference): Add a parameter to control whether
 a
   MEM_REF should be split into base + offset.
   * tree.h (get_inner_reference): Default new parameter to false.
   * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
   (CMPNOP): Define.
   (find_bswap_or_nop_load): New.
   (find_bswap_1): Renamed to ...
   (find_bswap_or_nop_1): This. Also add support for memory source.
   (find_bswap): Renamed to ...
   (find_bswap_or_nop): This. Also add support for memory source and
   detection of bitwise operations equivalent to load in host endianness.
   (execute_optimize_bswap): Likewise. Also move its leading
 comment back
   in place and split statement transformation into ...
   (bswap_replace): This. Add assert when updating bswap_stats.
 
 *** gcc/testsuite/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
   PR tree-optimization/54733
   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
   optimization to support memory sources and bitwise operations
   equivalent to load in host endianness.
   * gcc.dg/optimize-bswaphi-1.c: Likewise.
   * gcc.dg/optimize-bswapsi-2.c: Likewise.
   * gcc.c-torture/execute/bswap-2.c: Likewise.
 
 Ok for trunk?
 
 Best regards,
 
 Thomas
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
  Sent: Monday, May 05, 2014 7:30 PM
  To: GCC Patches
  Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
  load/store
 
  I found a way to improve the function find_bswap/find_bswap_or_nop
  and reduce its size. Please hold for the review, I will post an updated
  version as soon as I finish testing.
 
  Best regards,
 
  Thomas Preud'homme
 
 
 




Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:57 AM, Steven Bosscher stevenb@gmail.com wrote:
 On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote:
 Hi,
 Targets like ARM and AARCH64 support double-word load store instructions,
 and these instructions are generally faster than the corresponding two
 load/stores.  GCC currently uses peephole2 to merge paired load/store into
 one single instruction which has a disadvantage.  It can only handle simple
 cases like the two instructions actually appear sequentially in instruction
 stream, and is too weak to handle cases in which the two load/store are
 intervened by other irrelevant instructions.

 Here comes up with a new GCC pass looking through each basic block and
 merging paired load store even they are not adjacent to each other.  The
 algorithm is pretty simple:
 1) In initialization pass iterating over instruction stream it collects
 relevant memory access information for each instruction.
 2) It iterates over each basic block, tries to find possible paired
 instruction for each memory access instruction.  During this work, it checks
 dependencies between the two possible instructions and also records the
 information indicating how to pair the two instructions.  To avoid quadratic
 behavior of the algorithm, It introduces new parameter
 max-merge-paired-loadstore-distance and set the default value to 4, which is
 large enough to catch major part of opportunities on ARM/cortex-a15.
 3) For each candidate pair, it calls back-end's hook to do target dependent
 check and merge the two instructions if possible.

 Though the parameter is set to 4, for miscellaneous benchmarks, this pass
 can merge numerous opportunities except ones already merged by peephole2
 (same level numbers of opportunities comparing to peepholed ones).  GCC
 bootstrap can also confirm this finding.

 Yet there is an open issue about when we should run this new pass.  Though
 register renaming is disabled by default now, I put this pass after it,
 because renaming can resolve some false dependencies thus benefit this pass.
 Another finding is, it can capture a lot more opportunities if it's after
 sched2, but I am not sure whether it will mess up with scheduling results in
 this way.

 So, any comments about this?

Hi Steven,
Thanks for reviewing this.  Here are some answers to the general questions.


 First off: Why does this need a target hook? We're getting more and
 more of them -- too many IMHO. There should be really good reasons for
 adding even more new ones.

Yes, I think this one does have a good reason.  The target-independent
pass just makes sure that two consecutive memory access instructions
are free of data dependencies on each other, then feeds them to the
back-end hook.  It's the back-end's responsibility to generate a correct
instruction.  It's not about modifying an existing insn and then
recognizing it; it's sometimes about creating a new instruction.  For
example, we can generate a simple move insn in ARM mode, while we have
to generate a parallel instruction in Thumb mode.  The target-independent
part has no idea how to generate the expected insn.  Moreover, the
back-end may check some special conditions too.


 Does this pass have to run after scheduling? The way you describe it,
No, I just meant there are more opportunities after regrename, and
even more opportunities after sched2.  I haven't investigated the
reason for the latter yet, but this pass doesn't depend on sched2 to
work.

 this sounds like a scheduling problem, where you don't need regrename
 to resolve false dependencies. Your sched2 pass should be able to
 prioritize mergeable loads/stores to schedule them adjacent. Of if you
 must do this before scheduling, then at least do it before sched2. Now
 you're really ruining the order of the scheduled instructions, which
 seems bad.
Yes, I agree it should NOT disturb scheduling results, that's why I
put it before sched2 and after register renaming right now.


 I don't see how regrename will help resolve [base+offset] false
 dependencies. Can you explain? I'd expect effects from
 hardreg-copyprop commoning a base register.
It's the register operand's false dependency, rather than the base
register's.  Consider the simple case below:
mov r1,  #const1
store r1, [base+offset1]
mov r1, #const2
store r1, [base+offset2]
It should be renamed into:
mov r1,  #const1
store r1, [base+offset1]
mov r2, #const2
store r2, [base+offset2]
Then it is caught by the patch.

I will leave other comments for a moment after more discussion here.

Thanks,
bin

 ChangeLog is missing the entry for arm.c.

 Your pass should make those peephole2's redundant, so you should
 remove the relevant define_peephole2's.


  +   generated by spilling during reigster allocation.  To catch all

 s/reigster/register/


 +   whose address expression is in the form of [base_offset].  It

 s/[base_offset]/[base+offset]/


 +   only guarantee that the two consecutive memory access instructions

 s/guarantee/guarantees/


 +   

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 1:13 AM, Jeff Law l...@redhat.com wrote:
 On 05/15/14 10:51, Mike Stump wrote:

 On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote:

 Here comes up with a new GCC pass looking through each basic block
 and merging paired load store even they are not adjacent to each
 other.


 So I have a target that has load and store multiple support that
 supports large a number of registers (2-n registers), and I added a
 sched0 pass that is a light copy of the regular scheduling pass that
 uses a different cost function which arranges all loads first, then
 all stores then everything else.  Within a group of loads or stores
 the secondary key is the base register, the next key is the offset.
 The net result, all loads off the same register are sorted in
 increasing order.

 Glad to see someone else stumble on (ab)using the scheduler to do this.
Emm, if it's (ab)using, should we still do it then?


 I've poked at the scheduler several times to do similar stuff, but was never
 really satisfied with the results and never tried to polish those prototypes
 into something worth submitting.

 One example I've poked at was discovery of stores which then feed into a
 load from the same location.  Which obviously we'd prefer to turn into a
 store + copy (subject to mess of constraints).  There's a handful of these
 kind of transformations that seem to naturally drop out of this kind of
 work.
As Mike stated, merging of consecutive memory accesses is all about
the base register and the offset.  I am thinking of another method:
collecting all memory accesses with the same base register, then doing
the merge work.  In this way, we should be able to merge more than 2
instructions; it would also be possible to remove redundant load
instructions in one pass.

My question is: how many of these redundant loads could there be?  Is
there any RTL pass responsible for this now?

Thanks,
bin

 Similarly a post-reload pass could be used to promote single word
 loads/stores to double-word operations.

 If anyone cared about PA 1.1 code generation, it'd be a much cleaner way to
 support the non-fused fmpyadd fmpysub insns.

 Anyway, if you want to move forward with the idea, I'd certainly support
 doing so.

 jeff



-- 
Best Regards.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:51 AM, Mike Stump mikest...@comcast.net wrote:
 On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote:
 Here comes up with a new GCC pass looking through each basic block and
 merging paired load store even they are not adjacent to each other.

 So I have a target that has load and store multiple support for a large
 number of registers (2-n registers), and I added a sched0 pass that is a
 light copy of the regular scheduling pass, using a different cost function
 which arranges all loads first, then all stores, then everything else.
 Within a group of loads or stores the secondary key is the base register,
 the next key is the offset.  The net result: all loads off the same
 register are sorted in increasing order.  We can then use some define_insns
 and some peephole2 patterns to take the seemingly unrelated instructions,
 which are now adjacent, and knock them down into single instructions,
 instead of the mass of instructions they were before.  And then a peephole
 pass that runs early allows the patterns to do the heavy lifting.  This
 scheme can handle unlimited complexity in the addressing forms just by
 ensuring the cost function for the new scheduling pass looks at all
 relevant bits (target dependent, if the simple machine-independent form
 reg + n is not enough).  The sched0 and the peephole pass run early:

 +  NEXT_PASS (pass_sched0);
 +  NEXT_PASS (pass_peephole1);
NEXT_PASS (pass_web);
NEXT_PASS (pass_rtl_cprop);
NEXT_PASS (pass_cse2);

 (before register allocation) so, it can arrange to have things in adjacent 
 registers for the load and store multiple instructions.  The register 
 allocator can then arrange all the code to use those registers directly.

 So, having done all this work, I think it would be nicer if there were a
 pass that managed it, so that one doesn't have to write any of the peepholes
 or the define_insns (you need something like 3*n patterns, and the patterns
 are O(n), so you need around n*4/2 lines of code, which is annoying for
 large n).  A pass could use the existing load and store multiple patterns
 directly, so no additional port work.  In my work, since I extend lifetimes
 of values in registers pretty much without limit, this could be a bad thing.
 The code is naturally limited to only extending the lives of things where
 load and store multiple are used; if they aren't used, the regular
 scheduling pass would undo all the sched0 motion.  I chose a light copy of
 sched as I don't care about compile time, and I wanted a very small patch
 that was easy to maintain.  If this pass went into trunk, we'd run the new
 passes _only_ if a port asked for them.  99% of the ports likely don't want
 either, though peephole before register allocation might be interesting for
 others to solve other issues.

 I wanted this to run before register allocation as my load and store
 multiple instructions only take consecutive register ranges (n-m), and I
 need the register allocator to make that true.  I do reg-to-reg motion to
 move things around as needed, but I expect the allocator to get rid of
 almost all of it.  Very complex cases might wind up with a few extra moves,
 but I have nice bubbles that I can fit these extra moves into.

 In my scheme, no new options, no documentation for new options, no new param 
 options, no silly constants, no hard to write/maintain pass, no new weird 
 targets interface, no limit on just pairs, works on stores as well, runs 
 earlier, 430 lines instead of 1086 lines, conceptually much simpler, added 
 benefit of peephole before register allocation that can be used for other 
 things by the port...

 So, my question is, does my scheme on your target find more or fewer things?  
 Would your scheme pair pairs (so that 4 registers would go into 1 
 instruction)?

Hi Mike,
Thanks for bringing up this new method.  I have to admit that I was not
very much into this method at first glance, especially as it requires
another pass in cooperation with the scheduler.

For the first question: unfortunately, I can't apply the patch to svn
trunk; could you please update it so that I can do some experiments with
it?  From my experience, lots of opportunities on ARM are generated by
RA, so putting it before RA would miss many cases.  Another possible
issue is the interaction with other optimizations.  Putting it at an
early stage of RTL, we may disturb other optimizations like
gcse/-fgcse-lm/-fgcse-sm/dse.  Of course, it has advantages too; for
example, fwprop_addr sometimes corrupts load/store pair opportunities
on ARM, which won't be a problem for your patch.

For the second question, the answer is no for current implementation.
For ARM the most important opportunity is the paired one, so I just
started with pair of two instructions, which is much simpler.  I do
have a draft idea how to merge more than two instructions, but haven't
worked on it yet.

Thanks,
bin


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote:
 Hi,

 On 15 May 2014, at 09:26, bin.cheng bin.ch...@arm.com wrote:

 Hi,
 Targets like ARM and AARCH64 support double-word load store instructions,
 and these instructions are generally faster than the corresponding two
 load/stores.  GCC currently uses peephole2 to merge paired load/store into
 one single instruction which has a disadvantage.  It can only handle simple
 cases like the two instructions actually appear sequentially in instruction
 stream, and is too weak to handle cases in which the two load/store are
 intervened by other irrelevant instructions.

 Here comes up with a new GCC pass looking through each basic block and
 merging paired load store even they are not adjacent to each other.  The
 algorithm is pretty simple:
 1) In initialization pass iterating over instruction stream it collects
 relevant memory access information for each instruction.
 2) It iterates over each basic block, tries to find possible paired
 instruction for each memory access instruction.  During this work, it checks
 dependencies between the two possible instructions and also records the
 information indicating how to pair the two instructions.  To avoid quadratic
 behavior of the algorithm, It introduces new parameter
 max-merge-paired-loadstore-distance and set the default value to 4, which is
 large enough to catch major part of opportunities on ARM/cortex-a15.
 3) For each candidate pair, it calls back-end's hook to do target dependent
 check and merge the two instructions if possible.

 Though the parameter is set to 4, for miscellaneous benchmarks, this pass
 can merge numerous opportunities except ones already merged by peephole2
 (same level numbers of opportunities comparing to peepholed ones).  GCC
 bootstrap can also confirm this finding.

 This is interesting.  E.g. on SH there are insns to load/store SFmode pairs.  
 However, these insns require a mode switch and have some constraints on 
 register usage.  So in the SH case the load/store pairing would need to be 
 done before reg alloc and before mode switching.


 Yet there is an open issue about when we should run this new pass.  Though
 register renaming is disabled by default now, I put this pass after it,
 because renaming can resolve some false dependencies thus benefit this pass.
 Another finding is, it can capture a lot more opportunities if it's after
 sched2, but I am not sure whether it will mess up with scheduling results in
 this way.

 How about the following.
 Instead of adding new hooks and inserting the pass to the general pass list, 
 make the new
 pass class take the necessary callback functions directly.  Then targets can 
 just instantiate
 the pass, passing their impl of the callbacks, and insert the pass object 
 into the pass list at
 a place that fits best for the target.
Oh, I didn't know we could do this in GCC.  But yes, a target may want to
run it at some place that fits best for the target.

Thanks,
bin



 So, any comments about this?

 Thanks,
 bin


 2014-05-15  Bin Cheng  bin.ch...@arm.com
* common.opt (flag_merge_paired_loadstore): New option.
* merge-paired-loadstore.c: New file.
* Makefile.in: Support new file.
* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
* doc/invoke.texi (-fmerge-paired-loadstore): New.
(max-merge-paired-loadstore-distance): New.
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
* doc/tm.texi: Regenerated.
* target.def (merge_paired_loadstore): New.
* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
* passes.def (pass_merge_paired_loadstore): New pass.
* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

 gcc/testsuite/ChangeLog
 2014-05-15  Bin Cheng  bin.ch...@arm.com

* gcc.target/arm/merge-paired-loadstore.c: New test.

 merge-paired-loadstore-20140515.txt



-- 
Best Regards.


Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-05-16 Thread James Greenhalgh
On Fri, Mar 28, 2014 at 03:39:53PM +, James Greenhalgh wrote:
 On Fri, Mar 28, 2014 at 03:09:22PM +, pins...@gmail.com wrote:
   On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com 
   wrote:
   On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote:
   On Mar 28, 2014, at 2:12 AM, James Greenhalgh 
   james.greenha...@arm.com wrote:
   There is no way to perform scalar addition in the vector register file,
    but with the RTX costs in place we start rewriting (x << 1) to (x + x)
   on almost all cores. The code which makes this decision has no idea 
   that we
   will end up doing this (it happens well before reload) and so we end up 
   with
   very ugly code generation in the case where addition was selected, but
   we are operating in vector registers.
   
   This patch relies on the same gimmick we are already using to allow
   shifts on 32-bit scalars in the vector register file - Use a vector 32x2
   operation instead, knowing that we can safely ignore the top bits.
   
   This restores some normality to scalar_shift_1.c, however the test
   that we generate a left shift by one is clearly bogus, so remove that.
   
   This patch is pretty ugly, but it does generate superficially better
   looking code for this testcase.
   
   Tested on aarch64-none-elf with no issues.
   
   OK for stage 1?
   
   It seems we should also discourage the neon alternatives as there might 
   be
   extra movement between the two register sets which we don't want.
   
   I see your point, but we've tried to avoid doing that elsewhere in the
   AArch64 backend. Our argument has been that strictly speaking, it isn't 
   that
   the alternative is expensive, it is the movement between the register 
   sets. We
   do model that elsewhere, and the register allocator should already be 
   trying to
   avoid unneccesary moves between register classes.
   
  
  What about on a specific core where that alternative is expensive; that is
  the vector instructions are worse than the scalar ones. How are we going to
  handle this case?
 
 Certainly not by discouraging the alternative for all cores. We would need
 a more nuanced approach which could be tuned on a per-core basis. Otherwise
 we are bluntly and inaccurately pessimizing those cases where we can cheaply
 perform the operation in the vector register file (e.g. we are cleaning up
 loose ends after a vector loop, we have spilled to the vector register
 file, etc.). The register preference mechanism feels the wrong place to
  catch this as it does not allow for that degree of per-core flexibility,
 an alternative is simply disparaged slightly (?, * in LRA) or
 disparaged severely (!).
 
 I would think that we don't want to start polluting the machine description
 trying to hack around this as was done with the ARM backend's
 neon_for_64_bits/avoid_neon_for_64_bits.
 
 How have other targets solved this issue?

Did you have any further thoughts on this? I've pushed the costs patches, so
we will start to see gcc.target/aarch64/scalar_shift_1.c failing without
this or an equivalent patch.

Othereise, *ping*

Thanks,
James

---
gcc/

2014-05-16  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  james.greenha...@arm.com

* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.

   If those mechanisms are broken, we should fix them - in that case fixing
   this by discouraging valid alternatives would seem to be gaffer-taping 
   over the
   real problem.
   
   Thanks,
   James
   
   
   Thanks,
   Andrew
   
   
   Thanks,
   James
   
   ---
   gcc/
   
   2014-03-27  James Greenhalgh  james.greenha...@arm.com
   
 * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
 vector registers.
   
   gcc/testsuite/
   2014-03-27  James Greenhalgh  james.greenha...@arm.com
   
 * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
   0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
   
  
 


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
 
 (define_insn "*addsi3_aarch64"
   [(set
-    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
     (plus:SI
-     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+     (match_operand:SI 1 "register_operand" 

Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 Ping?

Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

   PR tree-optimization/54733
   * expr.c (get_inner_reference): Add a parameter to control whether
 a
   MEM_REF should be split into base + offset.
   * tree.h (get_inner_reference): Default new parameter to false.
   * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
   (CMPNOP): Define.
   (find_bswap_or_nop_load): New.
   (find_bswap_1): Renamed to ...
   (find_bswap_or_nop_1): This. Also add support for memory source.
   (find_bswap): Renamed to ...
   (find_bswap_or_nop): This. Also add support for memory source and
   detection of bitwise operations equivalent to load in host endianness.
   (execute_optimize_bswap): Likewise. Also move its leading
 comment back
   in place and split statement transformation into ...
   (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

   PR tree-optimization/54733
   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
   optimization to support memory sources and bitwise operations
   equivalent to load in host endianness.
   * gcc.dg/optimize-bswaphi-1.c: Likewise.
   * gcc.dg/optimize-bswapsi-2.c: Likewise.
   * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

Ok, I now decided otherwise and dislike the new parameter to
get_inner_reference.  Can you please revert that part and just
deal with a MEM_REF result in your only caller?

And (of course) I found another possible issue.  The way you
compute load_type and use it here:

+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (n->alias_set, 0);
+  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
+ load_offset_ptr);

makes the load always appear aligned according to the mode of
load_type.  On strict-alignment targets this may cause faults.

So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

first in the one that corresponds to addr_tmp.  From that there
is a much better chance to derive good alignment values.

Of course on STRICT_ALIGNMENT targets a not aligned load
will be decomposed again, so eventually doing the transformation
may no longer be profitable(?).

Thanks and sorry again for the delay.

Otherwise the patch looks good to me.

Richard.

 Best regards,

 Thomas

  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
  Sent: Monday, May 05, 2014 7:30 PM
  To: GCC Patches
  Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
  load/store
 
  I found a way to improve the function find_bswap/find_bswap_or_nop
  and reduce its size. Please hold for the review, I will post an updated
  version as soon as I finish testing.
 
  Best regards,
 
  Thomas Preud'homme
 
 
 




Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:10 PM, Bin.Cheng amker.ch...@gmail.com wrote:
 On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote:
 Hi,

 On 15 May 2014, at 09:26, bin.cheng bin.ch...@arm.com wrote:

 Hi,
 Targets like ARM and AARCH64 support double-word load store instructions,
 and these instructions are generally faster than the corresponding two
 load/stores.  GCC currently uses peephole2 to merge paired load/store into
 one single instruction which has a disadvantage.  It can only handle simple
 cases like the two instructions actually appear sequentially in instruction
 stream, and is too weak to handle cases in which the two load/store are
 intervened by other irrelevant instructions.

 Here comes up with a new GCC pass looking through each basic block and
 merging paired load store even they are not adjacent to each other.  The
 algorithm is pretty simple:
 1) In initialization pass iterating over instruction stream it collects
 relevant memory access information for each instruction.
 2) It iterates over each basic block, tries to find possible paired
 instruction for each memory access instruction.  During this work, it checks
 dependencies between the two possible instructions and also records the
 information indicating how to pair the two instructions.  To avoid quadratic
 behavior of the algorithm, It introduces new parameter
 max-merge-paired-loadstore-distance and set the default value to 4, which is
 large enough to catch major part of opportunities on ARM/cortex-a15.
 3) For each candidate pair, it calls back-end's hook to do target dependent
 check and merge the two instructions if possible.

 Though the parameter is set to 4, for miscellaneous benchmarks this pass
 merges numerous opportunities beyond the ones already merged by peephole2
 (roughly the same number of opportunities as the peepholed ones).  GCC
 bootstrap can also confirm this finding.

 This is interesting.  E.g. on SH there are insns to load/store SFmode pairs. 
  However, these insns require a mode switch and have some constraints on 
 register usage.  So in the SH case the load/store pairing would need to be 
 done before reg alloc and before mode switching.


 Yet there is an open issue about when we should run this new pass.  Though
 register renaming is disabled by default now, I put this pass after it,
 because renaming can resolve some false dependencies and thus benefit this pass.
 Another finding is that it can capture a lot more opportunities if it runs after
 sched2, but I am not sure whether it would mess up the scheduling results in
 that case.

 How about the following.
 Instead of adding new hooks and inserting the pass into the general pass list, 
 make the new
 pass class take the necessary callback functions directly.  Then targets can 
 just instantiate
 the pass, passing their impl of the callbacks, and insert the pass object 
 into the pass list at
 a place that fits best for the target.
 Oh, I didn't know we could do this in GCC.  But yes, a target may want to
 run it at some place that fits best for the target.

Btw, the bswap pass enhancements that are currently in review may
also be an opportunity to catch these.  They can merge adjacent
loads whose results are composed into a wider value (but not yet
the case where the composition happens by storing into adjacent
memory).  The basic-block vectorizer should also handle this (if
the composition happens to be by storing into adjacent memory) -
of course it needs vector modes available and it has to be enabled.

Richard.

 Thanks,
 bin



 So, any comments about this?

 Thanks,
 bin


 2014-05-15  Bin Cheng  bin.ch...@arm.com
* common.opt (flag_merge_paired_loadstore): New option.
* merge-paired-loadstore.c: New file.
* Makefile.in: Support new file.
* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
* doc/invoke.texi (-fmerge-paired-loadstore): New.
(max-merge-paired-loadstore-distance): New.
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
* doc/tm.texi: Regenerated.
* target.def (merge_paired_loadstore): New.
* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
* passes.def (pass_merge_paired_loadstore): New pass.
* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

 gcc/testsuite/ChangeLog
 2014-05-15  Bin Cheng  bin.ch...@arm.com

* gcc.target/arm/merge-paired-loadstore.c: New test.

 merge-paired-loadstore-20140515.txt



 --
 Best Regards.


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li davi...@google.com wrote:
 Hi, debugging runtime bugs due to devirtualization can be hard for
 very large C++ programs with complicated class hierarchy. This patch
 adds support to report this high-level transformation via
 -fopt-info (not hidden inside the dump file) and the ability to do binary
 search with a cutoff.

 Ok for trunk after build and test?

+  else if (dump_enabled_p ())
+   {
+ location_t loc = gimple_location (ie-call_stmt);
+ dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+  Discovered direct call to non-function in %s, 

diagnostics start with lower-case.  Why not merge this with the
dump_file case?  The point of all the infrastructure was to _not_
need to distinguish the cases ...

(similar for the other cases, and IIRC you miss one case in
tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?)

Thanks,
Richard.


 thanks,

 David


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:


 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?

 Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current 
 setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * expr.c (get_inner_reference): Add a parameter to control whether
 a
  MEM_REF should be split into base + offset.
  * tree.h (get_inner_reference): Default new parameter to false.
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host endianness.
  (execute_optimize_bswap): Likewise. Also move its leading
 comment back
  in place and split statement transformation into ...
  (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?

 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:

 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);

 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.

 So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
 ...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

 or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.

 Of course on STRICT_ALIGNMENT targets a not aligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).

 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned loads/stores, which is normally better than
 decomposed loads/stores. So having a cost model would be nice.

Agreed, but I am happy with doing that as a followup.  Btw,
a very simple one would be to reject unaligned
SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
[of course that may be true on MIPS even for the cases where
a reasonably fast unaligned variant exists - nearly no target
defines that macro in a too fancy way]

Richard.

 Thanks,
 Andrew


 Thanks and sorry again for the delay.

 Otherwise the patch looks good to me.

 Richard.

 Best regards,

 Thomas

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store

 I found a way to improve the function find_bswap/find_bswap_or_nop
 and reduce its size. Please hold for the review, I will post an updated
 version as soon as I finish testing.

 Best regards,

 Thomas Preud'homme




Re: [PATCH, libstdc++/61166] overflow when parse number in std::duration operator

2014-05-16 Thread Jonathan Wakely

On 15/05/14 22:54 -0400, Ed Smith-Rowland wrote:

On 05/15/2014 03:03 PM, Jonathan Wakely wrote:

Here's a finished patch to simplify bits/parse_numbers.h

Tested x86_64-linux. Ed, any objection to this version?


This looks great, thanks!


I committed that to trunk, I'll put it on the 4.9 branch too.

Having done that, should we actually stop using it as suggested in the 
bug trail? ;-)


I was going to do that, then realised that there's a defect in the
standard where it requires overflow in duration integer literals to be
diagnosed. That's only possible with literal operator templates, so I
think we should keep your _Parse_int code, but apply the attached
change to detect overflow.

As the TODO comment says, it should be sufficient to simply
instantiate integral_constant<_Rep, _Val::value> to give a diagnostic
when _Rep{_Val::value} is narrowing, but GCC only gives a warning
for it, and that's suppressed in a system header, so I do an explicit
static_assert.  That could be replaced with ...

#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Woverflow"
#pragma GCC diagnostic error "-Wsystem-headers"
   template<typename _Dur, char... _Digits>
 constexpr _Dur __check_overflow()
 {
   using _Val = __parse_int::_Parse_int<_Digits...>;
   using _Rep = typename _Dur::rep;
   return _Dur{integral_constant<_Rep, _Val::value>::value};
 }
#pragma GCC diagnostic pop
#pragma GCC diagnostic pop

... but I have plans to do that sort of thing more widely, which I'll
deal with another time as part of https://gcc.gnu.org/PR50871 and/or
https://gcc.gnu.org/PR58876 (what do other people think about using
diagnostic pragmas to locally re-enable diagnostics in our headers?)

Tested x86_64-linux, committed to trunk.
commit 361c9b79e0b1c7f2435f5b0b369a139b216dee90
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri May 16 10:31:38 2014 +0100

	* include/bits/parse_numbers.h (__parse_int::_Number_help): Check for
	overflow.
	* include/std/chrono (chrono_literals::__select_type::_Select_type):
	Remove.
	(chrono_literals::_Checked_integral_constant): Define.
	Simplify UDL operator templates and check for overflow.
	* testsuite/20_util/duration/literals/range.cc: New.

diff --git a/libstdc++-v3/include/bits/parse_numbers.h b/libstdc++-v3/include/bits/parse_numbers.h
index 0a42381a..a29d127 100644
--- a/libstdc++-v3/include/bits/parse_numbers.h
+++ b/libstdc++-v3/include/bits/parse_numbers.h
@@ -193,6 +193,7 @@ namespace __parse_int
    _Pow / (_Base * __valid_digit::value),
    _Digs...>;
    using type = __ull_constant<_Pow * __digit::value + __next::type::value>;
+  static_assert((type::value / _Pow) == __digit::value, "overflow");
 };
 
   templateunsigned _Base, unsigned long long _Pow, char _Dig
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index b114e02..39ad5e3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -787,117 +787,79 @@ _GLIBCXX_END_NAMESPACE_VERSION
   inline namespace chrono_literals
   {
 
-namespace __select_type
-{
-
-  using namespace __parse_int;
-
-  template<unsigned long long _Val, typename _Dur>
-	struct _Select_type
-	: conditional<
-	    _Val <= static_cast<unsigned long long>
-		      (numeric_limits<typename _Dur::rep>::max()),
-	    _Dur, void>
-	{
-	  static constexpr typename _Select_type::type
-	    value{static_cast<typename _Select_type::type>(_Val)};
-	};
-
-  template<unsigned long long _Val, typename _Dur>
-	constexpr typename _Select_type<_Val, _Dur>::type
-	_Select_type<_Val, _Dur>::value;
+template<typename _Rep, unsigned long long _Val>
+  struct _Checked_integral_constant
+  : integral_constant<_Rep, static_cast<_Rep>(_Val)>
+  {
+	static_assert(_Checked_integral_constant::value >= 0
+		      && _Checked_integral_constant::value == _Val,
+		      "literal value cannot be represented by duration type");
+  };
 
-} // __select_type
+template<typename _Dur, char... _Digits>
+  constexpr _Dur __check_overflow()
+  {
+	using _Val = __parse_int::_Parse_int<_Digits...>;
+	using _Rep = typename _Dur::rep;
+	// TODO: should be simply integral_constant<_Rep, _Val::value>
+	// but GCC doesn't reject narrowing conversions to _Rep.
+	using _CheckedVal = _Checked_integral_constant<_Rep, _Val::value>;
+	return _Dur{_CheckedVal::value};
+  }
 
 constexpr chrono::duration<long double, ratio<3600,1>>
 operator""h(long double __hours)
 { return chrono::duration<long double, ratio<3600,1>>{__hours}; }
 
 template <char... _Digits>
-  constexpr typename
-  __select_type::_Select_type<__select_int::_Select_int<_Digits...>::value,
-			 chrono::hours>::type
+  constexpr chrono::hours
   operator""h()
-  {
-	return __select_type::_Select_type<
-			  __select_int::_Select_int<_Digits...>::value,
-			  chrono::hours>::value;
-  }
+  { return __check_overflow<chrono::hours, _Digits...>(); }
 
 constexpr chrono::duration<long double, ratio<60,1>>
 

Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 1:03 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:


 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?

 Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current 
 setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * expr.c (get_inner_reference): Add a parameter to control whether
 a
  MEM_REF should be split into base + offset.
  * tree.h (get_inner_reference): Default new parameter to false.
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host 
 endianness.
  (execute_optimize_bswap): Likewise. Also move its leading
 comment back
  in place and split statement transformation into ...
  (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?

 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:

 +  /* Perform the load.  */
  +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);

 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.

 So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
    if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
 ...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

 or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.

 Of course on STRICT_ALIGNMENT targets a not aligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).

  Not always decomposed. On MIPS, it should use the load/store left/right
  instructions for unaligned loads/stores, which is normally better than
  decomposed loads/stores. So having a cost model would be nice.

 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]

Oh, and what happens for

unsigned foo (unsigned char *x)
{
  return x[0] << 24 | x[2] << 8 | x[3];
}

?  We could do an unsigned int load from x and zero byte 3
with an AND.  Enhancement for a followup, similar to also
considering vector types for the load (also I'm not sure
that uint64_type_node always has non-BLKmode for all
targets).

Richard.

 Richard.

 Thanks,
 Andrew


 Thanks and sorry again for the delay.

 Otherwise the patch looks good to me.

 Richard.

 Best regards,

 Thomas

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store

 I found a way to improve the function 

[PATCH] Fix one testcase from PR61194

2014-05-16 Thread Richard Biener

This fixes the non-regression testcase from PR61194 by also
recognizing COND_EXPRs as sink for bool expressions in vectorizer
pattern detection.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-05-16  Richard Biener  rguent...@suse.de

PR tree-optimization/61194
* tree-vect-patterns.c (adjust_bool_pattern): Also handle
bool patterns ending in a COND_EXPR.

* gcc.dg/vect/pr61194.c: New testcase.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c(revision 210492)
--- gcc/tree-vect-patterns.c(working copy)
*** adjust_bool_pattern (tree var, tree out_
*** 2889,2895 
   S5  e_b = c_b | d_b;
   S6  f_T = (TYPE) e_b;
  
!where type 'TYPE' is an integral type.
  
 Input:
  
--- 2889,2900 
   S5  e_b = c_b | d_b;
   S6  f_T = (TYPE) e_b;
  
!where type 'TYPE' is an integral type.  Or a similar pattern
!ending in
! 
!  S6  f_Y = e_b ? r_Y : s_Y;
! 
!as results from if-conversion of a complex condition.
  
 Input:
  
*** vect_recog_bool_pattern (vecgimple *st
*** 2966,2971 
--- 2971,3015 
*type_out = vectype;
*type_in = vectype;
 stmts->safe_push (last_stmt);
+   if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+  vect_recog_bool_pattern: detected:\n);
+ 
+   return pattern_stmt;
+ }
+   else if (rhs_code == COND_EXPR
+	    && TREE_CODE (var) == SSA_NAME)
+ {
+   vectype = get_vectype_for_scalar_type (TREE_TYPE (lhs));
+   if (vectype == NULL_TREE)
+   return NULL;
+ 
+   /* Build a scalar type for the boolean result that when
+  vectorized matches the vector type of the result in
+size and number of elements.  */
+   unsigned prec
+   = wi::udiv_trunc (TYPE_SIZE (vectype),
+ TYPE_VECTOR_SUBPARTS (vectype)).to_uhwi ();
+   tree type
+   = build_nonstandard_integer_type (prec,
+ TYPE_UNSIGNED (TREE_TYPE (var)));
+   if (get_vectype_for_scalar_type (type) == NULL_TREE)
+   return NULL;
+ 
+   if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
+   return NULL;
+ 
+   rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+   lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+   pattern_stmt 
+ = gimple_build_assign_with_ops (COND_EXPR, lhs,
+ build2 (NE_EXPR, boolean_type_node,
+ rhs, build_int_cst (type, 0)),
+ gimple_assign_rhs2 (last_stmt),
+ gimple_assign_rhs3 (last_stmt));
+   *type_out = vectype;
+   *type_in = vectype;
+   stmts->safe_push (last_stmt);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
   vect_recog_bool_pattern: detected:\n);
Index: gcc/testsuite/gcc.dg/vect/pr61194.c
===
*** gcc/testsuite/gcc.dg/vect/pr61194.c (revision 0)
--- gcc/testsuite/gcc.dg/vect/pr61194.c (working copy)
***
*** 0 
--- 1,43 
+ /* { dg-require-effective-target vect_cond_mixed } */
+ /* { dg-additional-options -ftree-loop-if-convert-stores } */
+ 
+ #include "tree-vect.h"
+ 
+ static float x[1024];
+ static float y[1024];
+ static float z[1024];
+ static float w[1024];
+ 
+ void __attribute__((noinline,noclone)) barX()
+ {
+   int i;
+ for (i=0; i<1024; ++i)
+ z[i] = ((x[i]>0) & (w[i]<0)) ? z[i] : y[i];
+ }
+ 
+ int main()
+ {
+   int i;
+ 
+   check_vect ();
+ 
+   for (i = 0; i < 1024; ++i)
+ {
+   x[i] = -10 + i;
+   w[i] = 100 - i;
+   z[i] = 0.;
+   y[i] = 1.;
+   __asm__ volatile ("" : : : "memory");
+ }
+ 
+   barX();
+ 
+   for (i = 0; i < 1024; ++i)
+ if (z[i] != ((x[i]>0 && w[i]<0) ? 0. : 1.))
+   abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect } } */
+ /* { dg-final { cleanup-tree-dump vect } } */


PR 61084: Use widest_int in sparc_fold_builtin

2014-05-16 Thread Richard Sandiford
This is the second part of PR 61084, which it seems I'd forgotten to post.
pdist calculates a wide result from narrower inputs, so I should have
used widest_int rather than wide_int.

Tested by Rainer on SPARC Solaris.  OK to install?

Thanks,
Richard


gcc/
PR target/61084
* config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather
than wide_int.

Index: gcc/config/sparc/sparc.c
===
--- gcc/config/sparc/sparc.c2014-05-15 13:49:25.425654068 +0100
+++ gcc/config/sparc/sparc.c2014-05-16 12:56:48.578349758 +0100
@@ -10915,8 +10915,8 @@ sparc_fold_builtin (tree fndecl, int n_a
   TREE_CODE (arg2) == INTEGER_CST)
{
  bool overflow = false;
- wide_int result = arg2;
- wide_int tmp;
+ widest_int result = wi::to_widest (arg2);
+ widest_int tmp;
  unsigned i;
 
  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
@@ -10926,8 +10926,8 @@ sparc_fold_builtin (tree fndecl, int n_a
 
  bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;
 
- tmp = wi::neg (e1, &neg1_ovf);
- tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
+ tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
+ tmp = wi::add (wi::to_widest (e0), tmp, SIGNED, &add1_ovf);
  if (wi::neg_p (tmp))
tmp = wi::neg (tmp, &neg2_ovf);
  else



Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-05-16 Thread Richard Earnshaw
On 08/05/14 18:36, Ian Bolton wrote:
 Hi,
 
 It currently takes 4 instructions to generate certain immediates on
 AArch64 (unless we put them in the constant pool).
 
 For example ...
 
   long long
   beefcafebabe ()
   {
 return 0xBEEFCAFEBABEll;
   }
 
 leads to ...
 
   mov x0, 0x47806
   mov x0, 0xcafe, lsl 16
   mov x0, 0xbeef, lsl 32
   orr x0, x0, -281474976710656
 
 The above case is tackled in this patch by employing MOVN
 to generate the top 32-bits in a single instruction ...
 
   mov x0, -71536975282177
   movk x0, 0xcafe, lsl 16
   movk x0, 0xbabe, lsl 0
 
 (Note that where at least two half-words are 0xffff, existing
 code that does the immediate in two instructions is still used.)
 
 Tested on standard gcc regressions and the attached test case.
 
 OK for commit?

What about:

long long a()
{
  return 0x123456789abcll;
}

long long b()
{
  return 0x123456789abcll;
}

long long c()
{
  return 0x123456789abcll;
}

?

Surely these can also benefit from this sort of optimization, but it
looks as though you only handle the top 16 bits being set.

R.

 
 Cheers,
 Ian
 
 
 2014-05-08  Ian Bolton  ian.bol...@arm.com
 
 gcc/
   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
   Use MOVN when top-most half-word (and only that half-word)
   is 0xffff.
 gcc/testsuite/
   * gcc.target/aarch64/movn_1.c: New test.
 
 
 aarch64-movn-exploitation-patch-v5.txt
 
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 43a83566..a8e504e 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   }
  }
  
 +  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
 +  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
 +    {
 +      emit_insn (gen_rtx_SET (VOIDmode, dest,
 +			      GEN_INT (~ (~val & (0xffffll << 32)))));
 +      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
 +				 GEN_INT ((val >> 16) & 0xffff)));
 +      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
 +				 GEN_INT (val & 0xffff)));
 +      return;
 +    }
 +
   simple_sequence:
first = true;
    mask = 0xffff;
 diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c 
 b/gcc/testsuite/gcc.target/aarch64/movn_1.c
 new file mode 100644
 index 000..cc11ade
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
 @@ -0,0 +1,27 @@
 +/* { dg-do run } */
 +/* { dg-options -O2 -fno-inline --save-temps } */
 +
 +extern void abort (void);
 +
 +long long
 +foo ()
 +{
 +  /* { dg-final { scan-assembler mov\tx\[0-9\]+, -71536975282177 } } */
  +  return 0xffffbeefcafebabell;
 +}
 +
 +long long
 +merge4 (int a, int b, int c, int d)
 +{
  +  return ((long long) a << 48 | (long long) b << 32
  +	  | (long long) c << 16 | (long long) d);
 +}
 +
 +int main ()
 +{
  +  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
 +abort ();
 +  return 0;
 +}
 +
 +/* { dg-final { cleanup-saved-temps } } */
 




Re: [PATCH (for next stage 1)] Add return type to gimple function dumps

2014-05-16 Thread Richard Biener
On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm dmalc...@redhat.com wrote:
 On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote:
 On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm dmalc...@redhat.com wrote:
  On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote:
  On 03/10/14 13:22, David Malcolm wrote:
   Gimple function dumps contain the types of parameters, but not of the
   return type.
  
   The attached patch fixes this omission; here's an example of the
   before/after diff:
   $ diff -up /tmp/pr23401.c.004t.gimple.old /tmp/pr23401.c.004t.gimple.new
   --- /tmp/pr23401.c.004t.gimple.old  2014-03-10 13:40:08.972063541 
   -0400
   +++ /tmp/pr23401.c.004t.gimple.new  2014-03-10 13:39:49.346515464 
   -0400
   @@ -1,3 +1,4 @@
   +int
  (int i)
 {
   int D.1731;
  
  
   Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20).
  
   A couple of test cases needed tweaking, since they were counting the
   number of occurrences of int in the gimple dump, which thus changed
   for functions returning int (like the one above).
  
   OK for next stage 1?
  Conceptually OK.  As Richi notes, the work here is in fixing up the
  testsuite.  I didn't see a reply to Richi's question, particularly WRT
  the Fortran testsuite.
 
  I'm attaching a revised version of the patch which adds the use of
  TDF_SLIM (though it didn't appear to be necessary in the test I did of a
  function returning a struct).
 
  Successfully bootstrapped  regrtested on x86_64 Linux (Fedora 20),
  using:
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
 
  I didn't see any new failures from this in the testsuite, in particular
  gfortran.sum.  Here's a comparison of the before/after test results,
  generated using my jamais-vu tool [1], with comments added by me
  inline:
 
  Comparing 16 common .sum files
  --
 
   gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320
   gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 
  UNSUPPORTED: 3004
   gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 
  265 XPASS: 33 UNSUPPORTED: 1823
   gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 52 
  UNSUPPORTED: 65
   gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 
  UNSUPPORTED: 3
   gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 
  UNSUPPORTED: 1
   gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 
  UNSUPPORTED: 86
   gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 
  UNSUPPORTED: 74
   x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 
  PASS: 12 UNSUPPORTED: 1
   x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 
  PASS: 54
   x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 PASS: 
  1801 UNSUPPORTED: 55
   x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122
   x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 
  PASS: 2420
   x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 26 
  XFAIL: 3 UNSUPPORTED: 1
   x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 
  PASS: 2582 XFAIL: 4
   x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 
  10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224
 
  (...i.e. the totals were unchanged between unpatched/patched for all of
  the .sum files; and yes, Fortran was tested.  Should there be a
  gcj.sum?)
 
  Tests that went away in gcc/testsuite/gcc/gcc.sum: 2
  
 
   PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 5
   PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 3
 
  Tests appeared in gcc/testsuite/gcc/gcc.sum: 2
  --
 
   PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 6
   PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 4
 
 
  (...my comparison tool isn't smart enough yet to tie these went
  away/appeared results together; they reflect the fixups from the
  patch).
 
  Tests that went away in gcc/testsuite/go/go.sum: 2
  --
 
   PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
  build) compilation,  -O2 -g
   PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
  build) execution,  -O2 -g
 
  Tests appeared in gcc/testsuite/go/go.sum: 2
  
 
   PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
  build) compilation,  -O2 -g
   PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
  build) execution,  -O2 -g
 
  (...I hand edited the above, this main.go test embeds numerous paths,
  which change between the two builds; so nothing really changed here).
 
 
  Are the above results sane?

 Yes.

  I'm not sure why I didn't see the failures 

Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Evgeny Stupachenko
The test uses SSSE3 because of the following restriction in i386.c:

static bool
expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
{
  rtx rperm[2][16], vperm, l, h, op, m128;
  unsigned int i, nelt, eltsz;

  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
return false;

Is the following fix OK?

2014-05-16  Evgeny Stupachenko  evstu...@gmail.com

   * gcc.dg/vect/pr52252-ld.c: Fix target for the test.


diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
index 6e3cb52..301433b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
+++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options -O2 -g -ftree-vectorize -mssse3
-fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details } */
+/* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */

 #define byte unsigned char

On Tue, May 13, 2014 at 12:21 PM, Richard Biener rguent...@suse.de wrote:
 On Mon, 12 May 2014, Evgeny Stupachenko wrote:

 The test is on general changes. However I was able to test it on x86 only.
 I see 2 possible solutions:
 1. Set the test for x86 only.
 2. Modify it so that it will pass on sparc-sun-solaris2.

 If 2. is not acceptable I'll create patch for 1.
 Currently I don't see why in0_9 = *in_27 is not supported. Does the
 test fail because of unsupported permutation?

 The test uses

 /* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details"
 { target { i?86-*-* x86_64-*-* } } } */

 that's bogus.  You shouldn't add any dg-options.  Instead use proper
 dg-effective-target checks for the mssse3 feature you are using.
 Note that the dg-final checking is applied regardless of the options
 above are applied or not.

 Why does the test only succeed with -mssse3 btw?

 The proper way to restrict the test to a single target is to use

 /* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */

 Sorry for not catching this in the review.

 Richard.


 On Mon, May 12, 2014 at 7:14 PM, Rainer Orth
 r...@cebitec.uni-bielefeld.de wrote:
  Evgeny Stupachenko evstu...@gmail.com writes:
 
  Patch with fixes attached.
  Currently if-structure is as following:
  +  if (count == 3)
  ...
  +  else
  +   {
  + /* If length is not equal to 3 then only power of 2 is 
  supported.  */
  + gcc_assert (exact_log2 (count) != -1);
 
  For stores group I've created another mail thread.
  [...]
  2014-05-06  Evgeny Stupachenko  evstu...@gmail.com
 
 PR tree-optimization/52252
 * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
 
  This test FAILs on sparc-sun-solaris2.11, both 32 and 64-bit:
 
  FAIL: gcc.dg/vect/pr52252-ld.c scan-tree-dump-times vect "vectorized 1 
  loops" 1
  FAIL: gcc.dg/vect/pr52252-ld.c -flto -ffat-lto-objects  
  scan-tree-dump-times vect "vectorized 1 loops" 1
 
  The dumps have
 
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:10:3: 
  note: not vectorized: relevant stmt not supported: in0_9 = *in_27;
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:7:1: 
  note: vectorized 0 loops in function.
 
  Rainer
 
  --
  -
  Rainer Orth, Center for Biotechnology, Bielefeld University



 --
 Richard Biener rguent...@suse.de
 SUSE / SUSE Labs
 SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Rainer Orth
Hi Evgeny,

 Is the following fix ok?

 2014-05-16  Evgeny Stupachenko  evstu...@gmail.com

* gcc.dg/vect/pr52252-ld.c: Fix target for the test.


 diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
 b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
 index 6e3cb52..301433b 100644
 --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
 +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
 @@ -1,5 +1,6 @@
  /* { dg-do compile } */
 -/* { dg-options -O2 -g -ftree-vectorize -mssse3
 -fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */
 +/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details } 
 */
 +/* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */

If the test is really x86 specific, move it to gcc.target/i386 and
remove the dg-skip-if.  Otherwise, add an explanation for skipping the
test on other targets to the first arg of dg-skip-if.  This is supposed
to be a comment stating why the test is skipped, not why literally.

Approval or rejection of the testcase is up to the target maintainers.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-16 Thread Ian Lance Taylor
On Fri, May 16, 2014 at 1:03 AM, Svante Signell
svante.sign...@gmail.com wrote:

 For that part of the patch without it the build on GNU/Hurd fails. On
 the other hand SYS_FCNTL is not defined for e.g. GNU/Linux either. This
 is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:
 func dupCloseOnExec(fd int) (newfd int, err error) {
 if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC != 0 {
 r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
 syscall.F_DUPFD_CLOEXEC, 0)

 It is yet unknown how the build succeeds on Linux without SYS_FCNTL
 being defined; maybe the conditions above are not met.

On GNU/Linux SYS_FCNTL is defined by the generated file sysinfo.go,
because SYS_fcntl is defined by syscall.h.

Ian


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Jakub Jelinek
On Fri, May 16, 2014 at 03:11:05PM +0200, Rainer Orth wrote:
 Hi Evgeny,
 
  Is the following fix ok?
 
  2014-05-16  Evgeny Stupachenko  evstu...@gmail.com
 
 * gcc.dg/vect/pr52252-ld.c: Fix target for the test.
 
 
  diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
  b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
  index 6e3cb52..301433b 100644
  --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
  +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
  @@ -1,5 +1,6 @@
   /* { dg-do compile } */
  -/* { dg-options "-O2 -g -ftree-vectorize -mssse3
  -fdump-tree-vect-details" { target { i?86-*-* x86_64-*-* } } } */
  +/* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details" 
  } */
  +/* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */
 
 If the test is really x86 specific, move it to gcc.target/i386 and
 remove the dg-skip-if.  Otherwise, add an explanation for skipping the
 test on other targets to the first arg of dg-skip-if.  This is supposed
 to be a comment stating why the test is skipped, not why literally.

Well, I don't see anything i?86/x86_64 specific on the test.  What
is specific is the -mssse3, which supposedly should be added through
/* { dg-additional-options "-mssse3" { target { i?86-*-* x86_64-*-* } } } */
and then perhaps the test might not necessarily be vectorized (so the
dg-final line may need a target guard as well).
But, I see no reason not to try to compile this on other targets.
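
Concretely, the suggestion above would make the header of pr52252-ld.c look
roughly like this (an untested sketch; the guarded dg-final line is
illustrative):

```
/* { dg-do compile } */
/* { dg-options "-O2 -g -ftree-vectorize -fdump-tree-vect-details" } */
/* { dg-additional-options "-mssse3" { target { i?86-*-* x86_64-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { i?86-*-* x86_64-*-* } } } } */
```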

Jakub


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Evgeny Stupachenko
I'm not sure about other architectures. I can test it on x86. Most
likely it will pass on Arm, but again I'm not sure that { arm*-*-* }
is ok.

The test is on general changes. So we can wait for others and if there
are no more objections leave sparc-sun-solaris2 as target to skip.

Or change to the following:
/* { dg-skip-if "The test should pass on x86, other architectures are
untested" { ! { x86_64-*-* i?86-*-* } } } */

So that others will add their targets if necessary.

Thanks,
Evgeny


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Rainer Orth
Evgeny Stupachenko evstu...@gmail.com writes:

 The test is on general changes. So we can wait for others and if there
 are no more objections leave sparc-sun-solaris2 as target to skip.

If so, use sparc*-*-* instead.

 Or change to the following:
 /* { dg-skip-if "The test should pass on x86, other architectures are
 untested" { ! { x86_64-*-* i?86-*-* } } } */

 So that others will add their targets if necessary.

I wouldn't do it this way, because this will never happen.  Rather,
start with all targets and skip or xfail if necessary, adding an
explanation why this is necessary.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [PING^2] Fix for PR libstdc++/60758

2014-05-16 Thread Alexey Merzlyakov

On 07.05.2014 13:28, Ramana Radhakrishnan wrote:

On 05/07/14 09:19, Yury Gribov wrote:

 Original Message 
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov alexey.merzlya...@samsung.com
To: Ramana Radhakrishnan ramra...@arm.com
CC: gcc-patches@gcc.gnu.org gcc-patches@gcc.gnu.org, Viacheslav
Garbuzov v.garbu...@samsung.com, Yury Gribov y.gri...@samsung.com

Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html


This is OK to apply if no regressions.

Thanks,
Ramana



Thanks in advance.

Best regards,
Merzlyakov Alexey





I have re-tested it again on arm-linux-gnueabi(sf) - no regressions.
The change was committed to mainline as revision 210515.

Best regards,
Merzlyakov Alexey



Re: avx runtime check

2014-05-16 Thread Mike Stump
On May 15, 2014, at 11:52 PM, Richard Biener richard.guent...@gmail.com wrote:
 On May 16, 2014 4:47:11 AM CEST, Mike Stump mikest...@comcast.net wrote:
 This reorders the avx checks and gates on a target triplet check before
 compiling any code.
 
 Can you explain why?

Sure, because check_avx_hw_available runs [istarget x86_64-*-*] || [istarget 
i?86-*-*] before doing anything, like compiling:

   typedef double __m512d __attribute__ ((__vector_size__ (64)));

__m512d _mm512_add (__m512d a)
{
  return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
}

with -mavx512f, which my target doesn’t have, but even running a compilation of 
that seems wrong.  The other possibility would be to add in a:

# If this is not the right target then we can skip the test.
if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
    expr 0
} else {

into the test for check_effective_target_avx512f.


proc check_effective_target_avx512f { } {
    return [check_no_compiler_messages avx512f object {
	typedef double __m512d __attribute__ ((__vector_size__ (64)));

	__m512d _mm512_add (__m512d a)
	{
	  return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
	}
    } "-O2 -mavx512f" ]
}

proc check_avx_hw_available { } {
    return [check_cached_effective_target avx_hw_available {
	# If this is not the right target then we can skip the test.
	if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
	    expr 0
	} else {
	    check_runtime_nocache avx_hw_available {
		#include <cpuid.h>
		int main ()
		{
		  unsigned int eax, ebx, ecx, edx;
		  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
		    return ((ecx & (bit_AVX | bit_OSXSAVE))
			    != (bit_AVX | bit_OSXSAVE));
		  return 1;
		}
	    } ""
	}
    }]
}


 diff --git a/gcc/testsuite/lib/target-supports.exp
 b/gcc/testsuite/lib/target-supports.exp
 index 40b5414..103a28a 100644
 --- a/gcc/testsuite/lib/target-supports.exp
 +++ b/gcc/testsuite/lib/target-supports.exp
 @@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
 # Return 1 if the target supports running AVX executables, 0 otherwise.
 
 proc check_effective_target_avx_runtime { } {
 -    if { [check_effective_target_avx]
 -	 && [check_avx_hw_available]
 +    if { [check_avx_hw_available]
 +	 && [check_effective_target_avx]
 	 && [check_avx_os_support_available] } {
 	return 1
}


[PATCH][ARM] Vectorize bswap[16,32,64] operations

2014-05-16 Thread Kyrill Tkachov

Hi all,

This is the aarch32 version of 
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00850.html that allows us to 
(auto-)vectorise the __builtin_bswap[16,32,64] functions using the vrev 
instructions.
For that we create some new NEON builtins and get 
arm_builtin_vectorized_function to map to them when asked to vectorise the 
corresponding builtin.


The tests for this were added with the aarch64 patch mentioned above but were 
disabled for arm. This patch enables them now that we support the operations 
(of course, they now pass on arm).


Tested arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf.

Ok for trunk?


Thanks,
Kyrill

2014-05-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/neon.md (neon_bswapmode): New pattern.
* config/arm/arm.c (neon_itype): Add NEON_BSWAP.
(arm_init_neon_builtins): Handle NEON_BSWAP.
Define required type nodes.
(arm_expand_neon_builtin): Handle NEON_BSWAP.
(arm_builtin_vectorized_function): Handle BUILTIN_BSWAP builtins.
* config/arm/arm_neon_builtins.def (bswap): Define builtins.
* config/arm/iterators.md (VDQHSD): New mode iterator.

2014-05-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* lib/target-supports.exp (check_effective_target_vect_bswap):
Specify arm*-*-* support.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1e44080..6d430a2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23064,6 +23064,7 @@ typedef enum {
   NEON_BINOP,
   NEON_TERNOP,
   NEON_UNOP,
+  NEON_BSWAP,
   NEON_GETLANE,
   NEON_SETLANE,
   NEON_CREATE,
@@ -23531,14 +23532,19 @@ arm_init_neon_builtins (void)
 
   tree V8QI_type_node;
   tree V4HI_type_node;
+  tree V4UHI_type_node;
   tree V4HF_type_node;
   tree V2SI_type_node;
+  tree V2USI_type_node;
   tree V2SF_type_node;
   tree V16QI_type_node;
   tree V8HI_type_node;
+  tree V8UHI_type_node;
   tree V4SI_type_node;
+  tree V4USI_type_node;
   tree V4SF_type_node;
   tree V2DI_type_node;
+  tree V2UDI_type_node;
 
   tree intUQI_type_node;
   tree intUHI_type_node;
@@ -23634,16 +23640,26 @@ arm_init_neon_builtins (void)
   const_intDI_pointer_node = build_pointer_type (const_intDI_node);
   const_float_pointer_node = build_pointer_type (const_float_node);
 
+  /* Unsigned integer types for various mode sizes.  */
+  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
+  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
+  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
+  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
+  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
   /* Now create vector types based on our NEON element types.  */
   /* 64-bit vectors.  */
   V8QI_type_node =
 build_vector_type_for_mode (neon_intQI_type_node, V8QImode);
   V4HI_type_node =
 build_vector_type_for_mode (neon_intHI_type_node, V4HImode);
+  V4UHI_type_node =
+build_vector_type_for_mode (intUHI_type_node, V4HImode);
   V4HF_type_node =
 build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode);
   V2SI_type_node =
 build_vector_type_for_mode (neon_intSI_type_node, V2SImode);
+  V2USI_type_node =
+build_vector_type_for_mode (intUSI_type_node, V2SImode);
   V2SF_type_node =
 build_vector_type_for_mode (neon_float_type_node, V2SFmode);
   /* 128-bit vectors.  */
@@ -23651,19 +23667,18 @@ arm_init_neon_builtins (void)
 build_vector_type_for_mode (neon_intQI_type_node, V16QImode);
   V8HI_type_node =
 build_vector_type_for_mode (neon_intHI_type_node, V8HImode);
+  V8UHI_type_node =
+build_vector_type_for_mode (intUHI_type_node, V8HImode);
   V4SI_type_node =
 build_vector_type_for_mode (neon_intSI_type_node, V4SImode);
+  V4USI_type_node =
+build_vector_type_for_mode (intUSI_type_node, V4SImode);
   V4SF_type_node =
 build_vector_type_for_mode (neon_float_type_node, V4SFmode);
   V2DI_type_node =
 build_vector_type_for_mode (neon_intDI_type_node, V2DImode);
-
-  /* Unsigned integer types for various mode sizes.  */
-  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
-  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
-  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
-  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
-  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
+  V2UDI_type_node =
+build_vector_type_for_mode (intUDI_type_node, V2DImode);
 
 
   (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
@@ -23741,8 +23756,6 @@ arm_init_neon_builtins (void)
 
    if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
   {
-tree V4USI_type_node =
-  build_vector_type_for_mode (intUSI_type_node, V4SImode);
 
 tree V16UQI_type_node =
   build_vector_type_for_mode (intUQI_type_node, V16QImode);
@@ -24106,6 +24119,31 @@ arm_init_neon_builtins (void)
 	ftype = build_function_type_list (return_type, 

Re: [Committed] [PATCH,*/2] shrink wrap a function with a single loop: split live_edge

2014-05-16 Thread Dominique Dhumieres
 One of the two commits breaks several fortran tests in 32 bit mode, ...

If it is not the default, you need to compile the failing tests with 
-mtune=core2 or -mtune=corei7.

Dominique


[patch] fix recent libstdc++ test failures

2014-05-16 Thread Jonathan Wakely

Tested x86_64-linux, -m32 and -m64, committed to trunk.

commit c5265e76b75cbfa0744c354f6d5aaf98c327ff8e
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri May 16 13:26:18 2014 +0100

	* testsuite/20_util/make_unsigned/requirements/typedefs-1.cc: Fix
	test for 32-bit target.
	* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc:
	Likewise.
	* testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc:
	Fix test.

diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
index a893ede..d9f13aa 100644
--- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
+++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
@@ -49,7 +49,7 @@ void test01()
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   typedef make_unsigned<volatile wchar_t>::type  	test23_type;
-  static_assert(is_same<test23_type, volatile unsigned wchar_t>::value, "");
+  static_assert(is_unsigned<test23_type>::value, "");
 #endif
 
   // Chapter 48, chapter 20. Smallest rank such that new unsigned type
diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
index 637b0c7..807e2bf 100644
--- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
+++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
@@ -30,6 +30,7 @@ void test01()
 {
   using std::make_unsigned;
   using std::is_same;
+  using std::is_unsigned;
 
   // Positive tests.
   typedef make_unsigned<const unsigned int>::type  	test2_type;
@@ -50,7 +51,7 @@ void test01()
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   typedef make_unsigned<volatile wchar_t>::type  	test23_type;
-  static_assert(is_same<test23_type, volatile unsigned wchar_t>::value, "");
+  static_assert(is_unsigned<test23_type>::value, "");
 #endif
 
   typedef make_unsigned<test_enum>::type  	  test24_type;
diff --git a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
index 26fc3d0..dd52a70 100644
--- a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
+++ b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
@@ -33,8 +33,7 @@ test01()
   bool test __attribute__((unused)) = true;
 
   const std::tr1::shared_ptr<A> p1(new A);
-  p1.reset(); // { dg-error "discards qualifiers" }
-  // { dg-error "no match"  { target *-*-* } 36 }
+  p1.reset(); // { dg-error "no match" }
 
   return 0;
 }


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Steven Bosscher
On Fri, May 16, 2014 at 12:51 PM, Richard Biener wrote:
 Btw, the bswap pass enhancements that are currently in review may
 also be an opportunity to catch these.  They can merge adjacent
 loads that are used composed (but not yet composed by storing
 into adjacent memory).  The basic-block vectorizer should also
 handle this (if the composition happens to be by storing into
 adjacent memory) - of course it needs vector modes available and
 it has to be enabled.

That won't work for cleaning up spill code load/reloads, which is one
of the motivations for the new pass.

Ciao!
Steven


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Ramana Radhakrishnan

 Btw, the bswap pass enhancements that are currently in review may
 also be an opportunity to catch these.  They can merge adjacent
 loads that are used composed (but not yet composed by storing
 into adjacent memory).  The basic-block vectorizer should also
 handle this (if the composition happens to be by storing into
 adjacent memory) - of course it needs vector modes available and
 it has to be enabled.

Should we really do it there?  If we start merging multiple loads and
stores into the vector register set, on some architectures or
microarchitectures the cost of moving to and from the vector register
set to the general purpose register set might be too expensive for
some operations and then we get into all sorts of issues.

I think there is merit in an RTL pass.

Ramana


 Richard.

 Thanks,
 bin



 So, any comments about this?

 Thanks,
 bin


 2014-05-15  Bin Cheng  bin.ch...@arm.com
* common.opt (flag_merge_paired_loadstore): New option.
* merge-paired-loadstore.c: New file.
* Makefile.in: Support new file.
* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
* doc/invoke.texi (-fmerge-paired-loadstore): New.
(max-merge-paired-loadstore-distance): New.
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
* doc/tm.texi: Regenerated.
* target.def (merge_paired_loadstore): New.
* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
* passes.def (pass_merge_paired_loadstore): New pass.
* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

 gcc/testsuite/ChangeLog
 2014-05-15  Bin Cheng  bin.ch...@arm.com

* gcc.target/arm/merge-paired-loadstore.c: New test.

 merge-paired-loadstore-20140515.txt



 --
 Best Regards.


[PATCH 0/5] let gdb reuse gcc'c C compiler

2014-05-16 Thread Tom Tromey
Hi!

This patch series is half of a project to let gdb reuse gcc (which
half depends on which list you are seeing this on), so that users can
compile small snippets of code and evaluate them in the current
context of the inferior.

This first series implements this idea for C.  A user can compile a
code snippet and it will be inserted into the inferior and evaluated.
Declarations needed by the snippet are supplied by gdb, and there is a
bit of magic so that the snippets can refer to local variables in the
current frame.

The new command allows for arbitrary code to be inserted -- not just
expressions.  For example:

(gdb) compile code int i; for (i = 0; i < 3; ++i) printf ("#%d\n", i)
#0
#1
#2

This series supplies a gcc plugin to do most of the work, so that any
gcc crashes -- seen during development due to translation bugs -- do
not also crash gdb.

The interface between gdb and gcc is defined by a few files added to
include/.

There is a new shared library which gdb loads in order to communicate
with the gcc plugin.  This library communicates with the gcc plugin
using a simple, ad-hoc RPC mechanism.

This shared library exports a single public function which is used to
instantiate any needed objects.  This makes it simple to version the
API and avoid undue synchronization between gcc and gdb.

We think the plugin is best suited to be put into the gcc repository
because it is coupled more tightly to gcc than to gdb.

To try it out, just build gcc and gdb with the patches applied.  Then
set your PATH and LD_LIBRARY_PATH to point to the right subdirectories
of the new gcc install directory.

In later series we plan to extend this functionality; either on the
gcc side, say by writing a similar plugin for C++; or on the gdb side,
say by making it possible to compile breakpoint conditions.  However,
we haven't yet decided exactly which future projects we will tackle or
in what order.




[PATCH 1/5] export finish_bitfield_layout from stor-layout

2014-05-16 Thread Tom Tromey
The gdb plugin handles some aspects of type layout on its own.  It
does this because it knows the layout of types, but not the path by
which the layout was determined -- so normal gcc things like
TYPE_PACKED cannot be used.

This patch exposes one bit of stor-layout so it can be used by the
plugin.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* stor-layout.c (finish_bitfield_layout): Now public.  Change
argument type to 'tree'.
(finish_record_layout): Update.
* stor-layout.h (finish_bitfield_layout): Declare.
---
 gcc/ChangeLog |  8 
 gcc/stor-layout.c | 12 ++--
 gcc/stor-layout.h |  1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 8fa4dc8..ad057d9 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1931,10 +1931,10 @@ finish_bitfield_representative (tree repr, tree field)
 }
 
 /* Compute and set FIELD_DECLs for the underlying objects we should
-   use for bitfield access for the structure laid out with RLI.  */
+   use for bitfield access for the structure T.  */
 
-static void
-finish_bitfield_layout (record_layout_info rli)
+void
+finish_bitfield_layout (tree t)
 {
   tree field, prev;
   tree repr = NULL_TREE;
@@ -1943,10 +1943,10 @@ finish_bitfield_layout (record_layout_info rli)
  we could use the underlying type as hint for the representative
  if the bitfield would fit and the representative would not exceed
  the union in size.  */
-  if (TREE_CODE (rli->t) != RECORD_TYPE)
+  if (TREE_CODE (t) != RECORD_TYPE)
 return;
 
-  for (prev = NULL_TREE, field = TYPE_FIELDS (rli->t);
+  for (prev = NULL_TREE, field = TYPE_FIELDS (t);
field; field = DECL_CHAIN (field))
 {
   if (TREE_CODE (field) != FIELD_DECL)
@@ -2033,7 +2033,7 @@ finish_record_layout (record_layout_info rli, int free_p)
   finalize_type_size (rli->t);
 
   /* Compute bitfield representatives.  */
-  finish_bitfield_layout (rli);
+  finish_bitfield_layout (rli->t);
 
   /* Propagate TYPE_PACKED to variants.  With C++ templates,
  handle_packed_attribute is too early to do this.  */
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 0ff98f8..cd58672 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -35,6 +35,7 @@ extern tree rli_size_so_far (record_layout_info);
 extern void normalize_rli (record_layout_info);
 extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
+extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
 extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
-- 
1.9.0



[PATCH 3/5] introduce the binding oracle

2014-05-16 Thread Tom Tromey
gdb wants to supply any declarations that may be referred to by the
user's code.  Hooking into symbol lookup was an efficient way to
accomplish this.

This patch introducing a binding oracle that is consulted whenever a
symbol binding is looked up for the first time.  The oracle is just a
global function pointer.  If it is NULL, no special work is done.  It
is called with the identifier to supply and with an enum argument
indicating the kind of binding being requested.  The oracle can then
call back into the C front end (via the new functions c_pushtag and
c_bind) to supply a binding; or it can silently do nothing if the
request could not be fulfilled.

The code caches whether the oracle has been called, to avoid repeated
useless queries.

There is a little hack in c_print_identifier to avoid calling the
binding oracle here.  This makes debugging gcc in the presence of the
plugin remain relatively sane -- without this, calling debug_tree or
the like can confusingly call into the plugin.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* c-tree.h (enum c_oracle_request): New.
(c_binding_oracle_function): New typedef.
(c_binding_oracle, c_pushtag, c_bind): Declare.
* c-decl.c (c_binding_oracle): New global.
(I_SYMBOL_CHECKED): New macro.
(i_symbol_binding): New function.
(I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine.
(I_TAG_CHECKED): New macro.
(i_tag_binding): New function.
(I_TAG_BINDING, I_TAG_DECL): Redefine.
(I_LABEL_CHECKED): New macro.
(i_label_binding): New function.
(I_LABEL_BINDING, I_LABEL_DECL): Redefine.
(c_print_identifier): Save and restore c_binding_oracle.
(c_pushtag, c_bind): New functions.
---
 gcc/c/ChangeLog |  19 +++
 gcc/c/c-decl.c  | 161 ++--
 gcc/c/c-tree.h  |  24 +
 3 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index d52dcc9..b391add 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -215,21 +215,6 @@ struct GTY((chain_next ("%h.prev"))) c_binding {
 #define B_IN_FILE_SCOPE(b) ((b)->depth == 1 /*file_scope->depth*/)
 #define B_IN_EXTERNAL_SCOPE(b) ((b)->depth == 0 /*external_scope->depth*/)
 
-#define I_SYMBOL_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->symbol_binding)
-#define I_SYMBOL_DECL(node) \
- (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)->decl : 0)
-
-#define I_TAG_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->tag_binding)
-#define I_TAG_DECL(node) \
- (I_TAG_BINDING(node) ? I_TAG_BINDING(node)->decl : 0)
-
-#define I_LABEL_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->label_binding)
-#define I_LABEL_DECL(node) \
- (I_LABEL_BINDING(node) ? I_LABEL_BINDING(node)->decl : 0)
-
 /* Each C symbol points to three linked lists of c_binding structures.
These describe the values of the identifier in the three different
namespaces defined by the language.  */
@@ -245,6 +230,96 @@ struct GTY(()) lang_identifier {
 extern char C_SIZEOF_STRUCT_LANG_IDENTIFIER_isnt_accurate
 [(sizeof(struct lang_identifier) == C_SIZEOF_STRUCT_LANG_IDENTIFIER) ? 1 : -1];
 
+/* The binding oracle; see c-tree.h.  */
+void (*c_binding_oracle) (enum c_oracle_request, tree identifier);
+
+/* This flag is set on an identifier if we have previously asked the
+   binding oracle for this identifier's symbol binding.  */
+#define I_SYMBOL_CHECKED(node) \
+  (TREE_LANG_FLAG_4 (IDENTIFIER_NODE_CHECK (node)))
+
+static inline struct c_binding **
+i_symbol_binding (tree node)
+{
+  struct lang_identifier *lid
+= (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node);
+
+  if (lid->symbol_binding == NULL
+      && c_binding_oracle != NULL
+      && !I_SYMBOL_CHECKED (node))
+{
+  /* Set the checked flag first, to avoid infinite recursion
+when the binding oracle calls back into gcc.  */
+  I_SYMBOL_CHECKED (node) = 1;
+  c_binding_oracle (C_ORACLE_SYMBOL, node);
+}
+
+  return lid->symbol_binding;
+}
+
+#define I_SYMBOL_BINDING(node) (*i_symbol_binding (node))
+
+#define I_SYMBOL_DECL(node) \
+ (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)->decl : 0)
+
+/* This flag is set on an identifier if we have previously asked the
+   binding oracle for this identifier's tag binding.  */
+#define I_TAG_CHECKED(node) \
+  (TREE_LANG_FLAG_5 (IDENTIFIER_NODE_CHECK (node)))
+
+static inline struct c_binding **
+i_tag_binding (tree node)
+{
+  struct lang_identifier *lid
+= (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node);
+
+  if (lid->tag_binding == NULL
+      && c_binding_oracle != NULL
+      && !I_TAG_CHECKED (node))
+{
+  /* Set the checked flag first, to avoid infinite recursion
+when the binding oracle calls back into gcc.  */
+  I_TAG_CHECKED (node) = 1;
+  c_binding_oracle (C_ORACLE_TAG, node);
+

[PATCH 4/5] add gcc/gdb interface files

2014-05-16 Thread Tom Tromey
The gcc plugin is split into two parts.  One part is an ordinary gcc
plugin.  The other part is a shared library that is loaded by gdb.

This patch adds some files that define the interface exported by this
shared library to gdb.  These files also define the internal API by
which the gdb- and gcc-sides communicate.

These files will be kept in sync between gcc and gdb like much of
include/.

The exported API has been intentionally kept very simple.  In
particular only a single function is exported from the gdb-side
library; symbol visibility is used to hide everything else.  This
exported symbol is a function which is called to return a structure
holding function pointers that gdb then uses.  This structure is
versioned so that changes can be made without necessarily requiring a
simultaneous gdb upgrade.

Note that the C compiler API is broken out separately.  This lets us
extend it to other GCC front ends as desired.  We plan to investigate
C++ in the future.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Jan Kratochvil  jan.kratoch...@redhat.com
Tom Tromey  tro...@redhat.com

* gcc-c-fe.def: New file.
* gcc-c-interface.h: New file.
* gcc-interface.h: New file.
---
 include/ChangeLog |   8 ++
 include/gcc-c-fe.def  | 195 
 include/gcc-c-interface.h | 220 ++
 include/gcc-interface.h   | 120 +
 4 files changed, 543 insertions(+)
 create mode 100644 include/gcc-c-fe.def
 create mode 100644 include/gcc-c-interface.h
 create mode 100644 include/gcc-interface.h

diff --git a/include/gcc-c-fe.def b/include/gcc-c-fe.def
new file mode 100644
index 000..dc5a41a
--- /dev/null
+++ b/include/gcc-c-fe.def
@@ -0,0 +1,195 @@
+/* Interface between GCC C FE and GDB  -*- c -*-
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+
+
+/* Create a new decl in GCC.  A decl is a declaration, basically a
+   kind of symbol.
+
+   NAME is the name of the new symbol.  SYM_KIND is the kind of
+   symbol being requested.  SYM_TYPE is the new symbol's C type;
+   except for labels, where this is not meaningful and should be
+   zero.  If SUBSTITUTION_NAME is not NULL, then a reference to this
+   decl in the source will later be substituted with a dereference
+   of a variable of the given name.  Otherwise, for symbols having
+   an address (e.g., functions), ADDRESS is the address.  FILENAME
+   and LINE_NUMBER refer to the symbol's source location.  If this
+   is not known, FILENAME can be NULL and LINE_NUMBER can be 0.
+   This function returns the new decl.  */
+
+GCC_METHOD7 (gcc_decl, build_decl,
+const char */* name */,
+enum gcc_c_symbol_kind /* sym_kind */,
+gcc_type /* sym_type */,
+const char */* substitution_name */,
+gcc_address /* address */,
+const char */* filename */,
+unsigned int /* line_number */)
+
+/* Insert a GCC decl into the symbol table.  DECL is the decl to
+   insert.  IS_GLOBAL is true if this is an outermost binding, and
+   false if it is a possibly-shadowing binding.  */
+
+GCC_METHOD2 (int /* bool */, bind, gcc_decl /* decl */,
+int /* bool */ /* is_global */)
+
+/* Insert a tagged type into the symbol table.  NAME is the tag name
+   of the type and TAGGED_TYPE is the type itself.  TAGGED_TYPE must
+   be either a struct, union, or enum type, as these are the only
+   types that have tags.  FILENAME and LINE_NUMBER refer to the type's
+   source location.  If this is not known, FILENAME can be NULL and
+   LINE_NUMBER can be 0.  */
+
+GCC_METHOD4 (int /* bool */, tagbind,
+const char */* name */,
+gcc_type /* tagged_type */,
+const char * /* filename */,
+unsigned int /* line_number */)
+
+/* Return the type of a pointer to a given base type.  */
+
+GCC_METHOD1 (gcc_type, build_pointer_type,
+gcc_type /* base_type */)
+
+/* Create a new 'struct' type.  Initially it has no fields.  */
+
+GCC_METHOD0 (gcc_type, build_record_type)
+
+/* Create a new 'union' type.  Initially it has no fields.  */
+
+GCC_METHOD0 (gcc_type, build_union_type)
+
+/* Add a field to a struct or union type.  FIELD_NAME is 

Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 4:00 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li davi...@google.com wrote:
 Hi, debugging runtime bugs due to devirtualization can be hard for
 very large C++ programs with complicated class hierarchy. This patch
 adds the support to report this high level transformation via
-fopt-info (not hidden inside dump file) and the ability to do binary
 search with cutoff.

 Ok for trunk after build and test?

 +  else if (dump_enabled_p ())
 +   {
 + location_t loc = gimple_location (ie-call_stmt);
 + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
 +  "Discovered direct call to non-function in %s, "
 

 diagnostics start with lower-case.  Why not merge this with the
 dump_file case?  The point of all the infrastructure was to _not_
 need to distinguish the cases ...


Right -- however in this case I don't want the node 'order' to leak
into the opt report. Should we drop it?


 (similar for the other cases, and IIRC you miss one case in
 tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?)

Good catch. Will add it.

thanks,

David

 Thanks,
 Richard.


 thanks,

 David


[PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Tom Tromey
In the typical case, when compiling a snippet of user code, gdb wraps
the user's text in a dummy function.

It's somewhat odd for users if an error in their code is mentioned as
coming from this dummy function.

This patch makes it possible to suppress the function-name display in
a straightforward way: it adds a new global which the plugin can set
to declare the name of the dummy function.

This patch seems like a bit of a hack, but there didn't seem to be a
notably cleaner approach.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* c-lang.c (c_diagnostic_ignored_function): New global.
(c_print_error_function): New function.
(LANG_HOOKS_PRINT_ERROR_FUNCTION): Define.
* c-lang.h (c_diagnostic_ignored_function): Declare.
---
 gcc/c/ChangeLog |  8 
 gcc/c/c-lang.c  | 22 ++
 gcc/c/c-lang.h  |  4 
 3 files changed, 34 insertions(+)

diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
index 97c0443..e563813 100644
--- a/gcc/c/c-lang.c
+++ b/gcc/c/c-lang.c
@@ -35,6 +35,26 @@ along with GCC; see the file COPYING3.  If not see
 
 enum c_language_kind c_language = clk_c;
 
+// If non-zero, this names a function which should not be reported in
+// a diagnostic.  This is used by the gdb plugin to avoid showing the
+// generated function name to the user.
+const char *c_diagnostic_ignored_function;
+
+// An implementation of the print_error_function langhook that
+// respects C_DIAGNOSTIC_IGNORED_FUNCTION.
+static void
+c_print_error_function (diagnostic_context *context, const char *file,
+   diagnostic_info *diagnostic)
+{
+  if (c_diagnostic_ignored_function != NULL
+      && current_function_decl != NULL_TREE
+      && DECL_NAME (current_function_decl) != NULL_TREE
+      && strcmp (IDENTIFIER_POINTER (DECL_NAME (current_function_decl)),
+		 c_diagnostic_ignored_function) == 0)
+return;
+  lhd_print_error_function (context, file, diagnostic);
+}
+
 /* Lang hooks common to C and ObjC are declared in c-objc-common.h;
consequently, there should be very few hooks below.  */
 
@@ -44,6 +64,8 @@ enum c_language_kind c_language = clk_c;
 #define LANG_HOOKS_INIT c_objc_common_init
 #undef LANG_HOOKS_INIT_TS
 #define LANG_HOOKS_INIT_TS c_common_init_ts
+#undef LANG_HOOKS_PRINT_ERROR_FUNCTION
+#define LANG_HOOKS_PRINT_ERROR_FUNCTION c_print_error_function
 
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index 7fcf333..022206f 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -59,4 +59,8 @@ struct GTY(()) language_function {
attribute lists.  */
 extern GTY(()) int current_omp_declare_target_attribute;
 
+/* If non-zero, the name of a function whose name should not be
+   reported in a diagnostic.  */
+extern const char *c_diagnostic_ignored_function;
+
 #endif /* ! GCC_C_LANG_H */
-- 
1.9.0



[C++ Patch] PR 51640

2014-05-16 Thread Paolo Carlini

Hi,

the main issue in the bug report is luckily already fixed in mainline 
and 4.9 (had to do with a commit of mine ;( but I think we can do a 
little better in the diagnostic: avoid the final redundant does not 
name a type error after the error message about ambiguity. Tested 
x86_64-linux.


Thanks,
Paolo.

//
/cp
2014-05-16  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51640
* parser.c (cp_parser_diagnose_invalid_type_name): Early return
when cp_parser_lookup_name sets ambiguous_decls.

/testsuite
2014-05-16  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51640
* g++.dg/parse/error54.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 210512)
+++ cp/parser.c (working copy)
@@ -2880,13 +2880,21 @@ cp_parser_diagnose_invalid_type_name (cp_parser *p
  tree scope, tree id,
  location_t location)
 {
-  tree decl, old_scope;
+  tree decl, old_scope, ambiguous_decls;
   cp_parser_commit_to_tentative_parse (parser);
   /* Try to lookup the identifier.  */
   old_scope = parser->scope;
   parser->scope = scope;
-  decl = cp_parser_lookup_name_simple (parser, id, location);
+  decl = cp_parser_lookup_name (parser, id, none_type,
+   /*is_template=*/false,
+   /*is_namespace=*/false,
+   /*check_dependency=*/true,
+   &ambiguous_decls, location);
   parser->scope = old_scope;
+  if (ambiguous_decls)
+/* If the lookup was ambiguous, an error will already have
+   been issued.  */
+return;
   /* If the lookup found a template-name, it means that the user forgot
   to specify an argument list. Emit a useful error message.  */
   if (TREE_CODE (decl) == TEMPLATE_DECL)
Index: testsuite/g++.dg/parse/error54.C
===
--- testsuite/g++.dg/parse/error54.C(revision 0)
+++ testsuite/g++.dg/parse/error54.C(working copy)
@@ -0,0 +1,19 @@
+// PR c++/51640
+
+class ex {};
+
+namespace t
+{
+  class ex2 : public ex {};
+}
+
+class ex2 : public ex {};
+
+void bar()
+{
+  using namespace t;
+
+  try {
  } catch (ex2) { // { dg-error "reference to 'ex2' is ambiguous" }
+  }
+}


Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Jakub Jelinek wrote:

 has variable.  Guess the combination of IBM long double and __int128_t
 (for long long it is still ok) is rare enough that we don't need to care
 about it (alternative would be e.g. to compare against addition of

There are existing problems with that combination anyway.  
fp-int-convert-timode.c has tests disabled for it because the compile-time 
and runtime conversions don't agree (whether for this issue or some other, 
I don't know).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Jan Hubicka
 Hi, debugging runtime bugs due to devirtualization can be hard for
 very large C++ programs with complicated class hierarchy. This patch
 adds the support to report this high level transformation via
 -fopt-info (not hidden inside dump file) and the ability to do binary
 search with cutoff.
 
 Ok for trunk after build and test?

Seems reasonable to me.
 
 thanks,
 
 David

 Index: ChangeLog
 ===
 --- ChangeLog (revision 210479)
 +++ ChangeLog (working copy)
 @@ -1,3 +1,18 @@
 +2014-05-15  Xinliang David Li  davi...@google.com
 +
 + * cgraphunit.c (walk_polymorphic_call_targets): Add
 + dbgcnt and fopt-info support.
 + 2014-05-15  Xinliang David Li  davi...@google.com
 +
 + * cgraphunit.c (walk_polymorphic_call_targets): Add
 + dbgcnt and fopt-info support.
 + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
 + * ipa-devirt.c (ipa_devirt): Ditto.
 + * ipa.c (walk_polymorphic_call_targets): Ditto.
 + * gimple-fold.c (fold_gimple_assign): Ditto.
 + (gimple_fold_call): Ditto.
 + * dbgcnt.def: New counter.
 +
  2014-05-15  Martin Jambor  mjam...@suse.cz
  
   PR ipa/61085
 Index: ipa-prop.c
 ===
 --- ipa-prop.c(revision 210479)
 +++ ipa-prop.c(working copy)
 @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
  #include "ipa-utils.h"
  #include "stringpool.h"
  #include "tree-ssanames.h"
 +#include "dbgcnt.h"
  
  /* Intermediate information about a parameter that is only useful during the
 run of ipa_analyze_node and is not kept afterwards.  */
 @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
   fprintf (dump_file, "ipa-prop: Discovered direct call to non-function"
	    " in %s/%i, making it unreachable.\n",
	    ie->caller->name (), ie->caller->order);
 +  else if (dump_enabled_p ())
 + {
 +   location_t loc = gimple_location (ie->call_stmt);
 +   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
 +"Discovered direct call to non-function in %s, "
 +"making it unreachable\n", ie->caller->name ());

Perhaps turning it to __builtin_unreachable call and similarly in the other 
cases
we introduce __builtin_unreachable? I think that could be easier for user to 
work
out.

What kind of problems in devirtualization are you seeing?


Honza


Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Tom Tromey wrote:

 In the typical case, when compiling a snippet of user code, gdb wraps
 the user's text in a dummy function.
 
 It's somewhat odd for users if an error in their code is mentioned as
 coming from this dummy function.
 
 This patch makes it possible to suppress the function-name display in
 a straightforward way: it adds a new global which the plugin can set
 to declare the name of the dummy function.
 
 This patch seems like a bit of a hack, but there didn't seem to be a
 notably cleaner approach.

I'd say this global actually belongs somewhere in the diagnostic_context 
(i.e., instead of the diagnostic_context_auxiliary_data (DC) actually 
being a tree as it is at present, it should point to a structure with 
whatever extra information clients wish to use to control aspects of 
diagnostic reporting).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi, debugging runtime bugs due to devirtualization can be hard for
 very large C++ programs with complicated class hierarchy. This patch
 adds the support to report this high level transformation via
 -fopt-info (not hidden inside dump file) and the ability to do binary
 search with cutoff.

 Ok for trunk after build and test?

 Seems reasonable to me.

 thanks,

 David

 Index: ChangeLog
 ===
 --- ChangeLog (revision 210479)
 +++ ChangeLog (working copy)
 @@ -1,3 +1,18 @@
 +2014-05-15  Xinliang David Li  davi...@google.com
 +
 + * cgraphunit.c (walk_polymorphic_call_targets): Add
 + dbgcnt and fopt-info support.
 + 2014-05-15  Xinliang David Li  davi...@google.com
 +
 + * cgraphunit.c (walk_polymorphic_call_targets): Add
 + dbgcnt and fopt-info support.
 + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
 + * ipa-devirt.c (ipa_devirt): Ditto.
 + * ipa.c (walk_polymorphic_call_targets): Ditto.
 + * gimple-fold.c (fold_gimple_assign): Ditto.
 + (gimple_fold_call): Ditto.
 + * dbgcnt.def: New counter.
 +
  2014-05-15  Martin Jambor  mjam...@suse.cz

   PR ipa/61085
 Index: ipa-prop.c
 ===
 --- ipa-prop.c(revision 210479)
 +++ ipa-prop.c(working copy)
 @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
  #include "ipa-utils.h"
  #include "stringpool.h"
  #include "tree-ssanames.h"
 +#include "dbgcnt.h"

  /* Intermediate information about a parameter that is only useful during the
 run of ipa_analyze_node and is not kept afterwards.  */
 @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
   fprintf (dump_file, "ipa-prop: Discovered direct call to non-function"
	    " in %s/%i, making it unreachable.\n",
	    ie->caller->name (), ie->caller->order);
 +  else if (dump_enabled_p ())
 + {
 +   location_t loc = gimple_location (ie->call_stmt);
 +   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
 +"Discovered direct call to non-function in %s, "
 +"making it unreachable\n", ie->caller->name ());

 Perhaps turning it to __builtin_unreachable call and similarly in the other 
 cases
 we introduce __builtin_unreachable? I think that could be easier for user to 
 work
 out.

Ok.


 What king of problems in devirtualizatoin you are seeing?

I have been chasing a runtime failure of a very large test built with
gcc-4_9. The bad code either calls a pure function or turns a virtual
call into __builtin_unreachable (incomplete target set). The indirect
info shows the otr type to be !maybe_derived_type, and the outer-type
gets cleared during inline update. I isolated a small test case -- but
the good news is that gcc-4_9 @head already fixed the problem.

I will check in the test case to trunk later.

thanks,

David



 Honza


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law

On 05/16/14 04:07, Bin.Cheng wrote:

On Fri, May 16, 2014 at 1:13 AM, Jeff Law l...@redhat.com wrote:

On 05/15/14 10:51, Mike Stump wrote:


On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote:


Here comes up with a new GCC pass looking through each basic block
and merging paired load store even they are not adjacent to each
other.



So I have a target that has load and store multiple support that
supports large a number of registers (2-n registers), and I added a
sched0 pass that is a light copy of the regular scheduling pass that
uses a different cost function which arranges all loads first, then
all stores then everything else.  Within a group of loads or stores
the secondary key is the base register, the next key is the offset.
The net result, all loads off the same register are sorted in
increasing order.


Glad to see someone else stumble on (ab)using the scheduler to do this.

Emm, if it's (ab)using, should we still do it then?
I think it'd still be fine.  There's even been a comment about doing 
this kind of thing in the scheduler that's been around since the early 
90s...


The scheduler is a bit interesting in that it has a wealth of dependency 
information and the ability to reorganize the insn stream in relatively 
arbitrary ways.  That seems to make it a natural place to think about 
transformations of this nature.  We just haven't had a good 
infrastructure for doing that.


In theory we're a lot closer now to being able to plug in different 
costing/sorting models and let the scheduler do its thing.  Those models 
might rewrite for register pressure, or encourage certain independent 
insns to issue back-to-back to encourage combining, or to build 
candidate insns for delay slot scheduling, etc.



As Mike stated, merging of consecutive memory accesses is all about
the base register and the offset. I am thinking another method
collecting all memory accesses with same base register then doing the
merge work.  In this way, we should be able to merge more than 2
instructions, also it would be possible to remove redundant load
instructions in one pass.

My question is how many these redundant loads could be?  Is there any
rtl pass responsible for this now?
I suspect it's a lot less important now than it used to be.  But there's 
probably some cases where it'd be useful.  Combining sub-word accesses 
into full-word accesses come immediately to mind.


I'm not aware of any pass which does these kind of changes in a general 
form.  Some passes (caller-save) do a fair amount of work to track when 
they can generate multi-object loads/stores (and it was a huge win back 
on the old sparc processors).



jeff


Re: [PATCH (for next stage 1)] Add return type to gimple function dumps

2014-05-16 Thread David Malcolm
On Fri, 2014-05-16 at 14:59 +0200, Richard Biener wrote:
 On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm dmalc...@redhat.com wrote:
  On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote:
  On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm dmalc...@redhat.com wrote:
   On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote:
   On 03/10/14 13:22, David Malcolm wrote:
Gimple function dumps contain the types of parameters, but not of the
return type.
   
The attached patch fixes this omission; here's an example of the
before/after diff:
$ diff -up /tmp/pr23401.c.004t.gimple.old 
/tmp/pr23401.c.004t.gimple.new
--- /tmp/pr23401.c.004t.gimple.old  2014-03-10 13:40:08.972063541 
-0400
+++ /tmp/pr23401.c.004t.gimple.new  2014-03-10 13:39:49.346515464 
-0400
@@ -1,3 +1,4 @@
+int
   (int i)
  {
int D.1731;
   
   
Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20).
   
A couple of test cases needed tweaking, since they were counting the
number of occurrences of int in the gimple dump, which thus changed
for functions returning int (like the one above).
   
OK for next stage 1?
   Conceptually OK.  As Richi notes, the work here is in fixing up the
   testsuite.  I didn't see a reply to Richi's question, particularly WRT
   the Fortran testsuite.
  
   I'm attaching a revised version of the patch which adds the use of
   TDF_SLIM (though it didn't appear to be necessary in the test I did of a
   function returning a struct).
  
   Successfully bootstrapped  regrtested on x86_64 Linux (Fedora 20),
   using:
 --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
  
   I didn't see any new failures from this in the testsuite, in particular
   gfortran.sum.  Here's a comparison of the before/after test results,
   generated using my jamais-vu tool [1], with comments added by me
   inline:
  
   Comparing 16 common .sum files
   --
  
gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320
gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 
   UNSUPPORTED: 3004
gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 
   265 XPASS: 33 UNSUPPORTED: 1823
gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 
   52 UNSUPPORTED: 65
gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 
   UNSUPPORTED: 3
gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 
   UNSUPPORTED: 1
gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 
   UNSUPPORTED: 86
gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 
   UNSUPPORTED: 74
x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 
   PASS: 12 UNSUPPORTED: 1
x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 
   PASS: 54
x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 
   PASS: 1801 UNSUPPORTED: 55
x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122
x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 
   PASS: 2420
x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 
   26 XFAIL: 3 UNSUPPORTED: 1
x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 
   PASS: 2582 XFAIL: 4
x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 
   10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224
  
   (...i.e. the totals were unchanged between unpatched/patched for all of
   the .sum files; and yes, Fortran was tested.  Should there be a
   gcj.sum?)
  
   Tests that went away in gcc/testsuite/gcc/gcc.sum: 2
   
  
PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 5
PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 3
  
   Tests appeared in gcc/testsuite/gcc/gcc.sum: 2
   --
  
PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 6
PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 4
  
  
   (...my comparison tool isn't smart enough yet to tie these went
   away/appeared results together; they reflect the fixups from the
   patch).
  
   Tests that went away in gcc/testsuite/go/go.sum: 2
   --
  
PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
   build) compilation,  -O2 -g
PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
   build) execution,  -O2 -g
  
   Tests appeared in gcc/testsuite/go/go.sum: 2
   
  
PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
   build) compilation,  -O2 -g
PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
   build) execution,  -O2 -g
  
   (...I hand edited the above, this main.go test 

Re: [patch,mips] avoid invalid register for JALR

2014-05-16 Thread Maciej W. Rozycki
On Wed, 14 May 2014, Sandra Loosemore wrote:

   When I was trying to benchmark another patch (which I'll be sending
   along shortly) with CSiBE for -mabi=64, I ran into an assembler error
   like this:
   
   /tmp/ccJv2faG.s: Assembler messages:
   /tmp/ccJv2faG.s:1605: Error: a destination register must be supplied
   `jalr $31'
  
  JALR patterns should have an explicit clobber of $31, which I thought
  was also supposed to stop $31 from being used as the call address.
 
 Hm.  Yes, that ought to work, in theory
 
  Do you have a testcase?
 
 I can reproduce the error in a current mipsisa64-elfoabi build, with my patch
 to delete ADJUST_REG_ALLOC_ORDER applied.  It triggers on this file from
 CSiBE:
 
 mipsisa64-elfoabi-gcc -c -mabi=64 -O2 -fno-common -w
 csibe/src/./ttt-0.10.1.preproc/src/connect4.i

 I wonder if there's something fishy going on here.  I checked output 
produced with -dP and the offending instruction is emitted like this:

 #(call_insn 172 124 161 (parallel [
 #(call (mem:SI (reg:DI 31 $31) [0 c4_setup S4 A32])
 #(const_int 0 [0]))
 #(clobber (reg:SI 31 $31))
 #]) c4_new.i:79 594 {call_internal}
 # (expr_list:REG_DEAD (reg:DI 31 $31)
 #(expr_list:REG_DEAD (reg:DI 7 $7)
 #(expr_list:REG_DEAD (reg:DI 6 $6)
 #(expr_list:REG_DEAD (reg:DI 5 $5)
 #(expr_list:REG_DEAD (reg:DI 4 $4)
 #(nil))
 #(expr_list:DI (use (reg:DI 4 $4))
 #(expr_list:SI (use (reg:DI 5 $5))
 #(expr_list:SI (use (reg:DI 6 $6))
 #(expr_list:SI (use (reg:DI 7 $7))
 #(nil))
jalr$31  # 172  call_internal/1 [length = 4]

so clearly the clobber is ignored, or perhaps rather considered a late 
clobber instead of an early clobber that's indeed required here.

 I have reduced your case a bit and attached the result to this e-mail.  
With this code I can reproduce the problem using the following command:

$ mips-sde-elf-gcc -S -dP -mabi=64 -O2 -fno-common -w -o c4_new.s c4_new.i

and current trunk with the patch you recently posted as archived at:

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01016.html

applied.  With the patch reverted the issue goes away ($17 is used for the 
jump), so clearly the register allocation order override made in 
mips_order_regs_for_local_alloc is currently covering an underlying bug.

  Maciej

typedef unsigned int size_t;

extern void *malloc (size_t __size);

typedef char bool;
typedef struct _Board {
  struct _Board *(*copy) (const struct _Board *);  
  void (*setup) (struct _Board *, int, int, int);
  void (*display) (struct _Board *);  
  int (*eval) (struct _Board *);
  void (*score) (struct _Board *, int *, int *);  
  bool (*full) (struct _Board *);  
  int (*winner) (struct _Board *); 
 
  bool (*valid_move) (struct _Board *, int);  
  bool (*move) (struct _Board *, int, int);   
  void (*unmove) (struct _Board *);   
  int (*getmove) (struct _Board *, int, 
int *, int *);  
  void (*help) (void);  
  char (*symbol) (struct _Board *, int, int);   
  void (*coords) (struct _Board *, int, char *);  
  int rows, cols;
  int squares;   
  int *moves;  
  int nummoves;  
  int X_player;  
  int *board;   
  int **points;  
  int numpoints;
  int depth, depth2;   
  int *center;  
} Board;

Board *c4_copy (const Board *);
void c4_setup (Board *, int, int, int);
void c4_display (Board *);
int c4_eval (Board *);
inline bool c4_full (Board *);
int c4_winner (Board *);
bool c4_valid_move (Board *, int);
bool c4_move (Board *, int, int);
void c4_unmove (Board *);
int c4_getmove (Board *, int, int *, int *);
void c4_help (void);
char c4_symbol (Board *, int, int);
void c4_coords (Board *, int, char *);

Board *c4_new (int players, int size, int depth)
{
  Board *T = (Board *) malloc (sizeof (Board));
  Board *B = (Board *)T;
  int **points;
  int *board;
  int i, j, point = 0;
  int MyPE = 0;
  int rows, cols;

  B->copy = c4_copy;
  B->setup = c4_setup;
  B->display = c4_display;
  B->eval = c4_eval;
  B->full = c4_full;
  B->winner = c4_winner;
  B->valid_move = c4_valid_move;
  B->move = c4_move;
  B->unmove = c4_unmove;
  B->getmove = c4_getmove;
  B->help = c4_help;
  B->symbol = c4_symbol;
  B->coords = c4_coords;
  B->score = ((void *)0) ;

  rows = B->rows = 6;
  cols = B->cols = 7;
  B->squares = 7;  

  if (MyPE == 0)
    c4_setup ((Board *)T, players, depth, 0);

  B->board = (int *) malloc (rows*cols * sizeof(int));

  B->moves = (int *) malloc (rows*cols * sizeof(int));
  B->nummoves = 0;

  for (i = 0; i < rows*cols; i++)
    B->board[i] = B->moves[i] = 0;

  B->numpoints = rows*(cols-3) + cols*(rows-3) + 2*(rows-3)*(cols-3);
  points = (int **) malloc (B->numpoints * 4 * sizeof(int *));
  board = B->board;
   
  for (j = 0; j < rows - 3; j++) {
    for (i = 0; i < cols; i++, point++) {
      points[point*4+0] = board + 

Re: [PATCH 0/5] let gdb reuse gcc'c C compiler

2014-05-16 Thread Mike Stump
On May 16, 2014, at 8:26 AM, Tom Tromey tro...@redhat.com wrote:
 This patch series is half of a project to let gdb reuse gcc (which
 half depends on which list you are seeing this on), so that users can
 compile small snippets of code and evaluate them in the current
 context of the inferior.

Nice…  I looked through it; nice and lightweight, and it seems like the API can
hold up through time.

Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Manuel López-Ibáñez
 This patch makes it possible to suppress the function-name display in
 a straightforward way: it adds a new global which the plugin can set
 to declare the name of the dummy function.

 This patch seems like a bit of a hack, but there didn't seem to be a
 notably cleaner approach.

Can't you override the diagnostic_starter() in your plugin? This way
you can even customize it to print "gdb" in the prefix (if you wish to
provide a custom prefix). If that is possible, that seems way cleaner.
If not, I wonder why not.

Otherwise, why not override the lang_hook itself? This way the plugin
can provide its own lang_hook so that your proposed
c_print_error_function lives in the plugin itself.

I'm probably missing something obvious here.

Cheers,

Manuel.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law

On 05/16/14 04:07, Bin.Cheng wrote:


Yes, I think this one does have a good reason.  The target independent
pass just makes sure that two consecutive memory access instructions
are free of data-dependency with each other, then feeds them to the back-end
hook.  It's the back-end's responsibility to generate the correct instruction.
But given these two memory access insns, there's only a couple ways 
they're likely to combine into a single insn.  We could just as easily 
have the target independent code construct a new insn then try to 
recognize it.  If it's not recognized, then try the other way.


Or is it the case that we're doing something beyond upsizing the mode?



  It's not about modifying an existing insn and then recognizing it; it's
about creating new instruction sometimes.  For example, we can
generate a simple move insn in Arm mode, while have to generate a
parallel instruction in Thumb mode.  Target independent part has no
idea how to generate an expected insn.  Moreover, back-end may check
some special conditions too.
But can't you go through movXX to generate either the simple insn on the 
ARM or the PARALLEL on the thumb?


Jeff


Re: [C++ Patch] PR 51640

2014-05-16 Thread Jason Merrill

OK.

Jason


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Jan Hubicka
 
 I have been chasing a runtime failure of a very large test built with
 gcc-4_9. The bad code either calls a pure function or turns a virtual
 call into __builtin_unreachable (incomplete target set). The indirect
 info shows the otr type to be !maybe_derived_type, and the outer-type
 gets cleared during inline update. I isolated a small test case -- but
 the good news is that gcc-4_9 @head already fixed the problem.
 
 I will  check in the test case to trunk later.

Good, testcase would be welcome.  I guess it was the fix for the placement_new bug.
It disables some valid devirtualizations (and I thus may revisit the fix for
4.10), so it would be good to know if your testcase differs from the original
PR one.

Honza
 
 thanks,
 
 David
 
 
 
  Honza


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Mike Stump
On May 16, 2014, at 3:07 AM, Bin.Cheng amker.ch...@gmail.com wrote:
 
 I don't see how regrename will help resolve [base+offset] false
 dependencies. Can you explain? I'd expect effects from
 hardreg-copyprop commoning a base register.
 It's the register operand's false dependency, rather than the base's
 one.  Considering below simple case:
mov r1,  #const1
store r1, [base+offset1]
mov r1, #const2
store r1, [base_offset2]
 It should be renamed into:
mov r1,  #const1
store r1, [base+offset1]
mov r2, #const2
store r2, [base_offset2]

Ah, but what did this look like right before pass_web?


Re: we are starting the wide int merge

2014-05-16 Thread Gerald Pfeifer
On Sat, 10 May 2014, Gerald Pfeifer wrote:
 Since (at least) 16:40 UTC that day my i386-unknown-freebsd10.0 builds
 fail as follows:
 
   Comparing stages 2 and 3
   warning: gcc/cc1obj-checksum.o differs
   warning: gcc/cc1-checksum.o differs
   warning: gcc/cc1plus-checksum.o differs
   Bootstrap comparison failure!
   gcc/fold-const.o differs
   gcc/simplify-rtx.o differs
   gcc/tree-ssa-ccp.o differs
 
 (FreeBSD/i386 really builds for i486, but retains the original name;
 I'm traveling with limited access, but would not be surprised for this
 to also show up for i386-*-linux-gnu or i486-*-linux-gnu.)

Is anybody able to reproduce this, for example on a GNU/Linux system?

This tester of mine hasn't been able to bootstrap for nearly a week,
and timing-wise it would be really a coincidence were this not due to
wide-int.

Gerald


Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Eric Botcazou
OK - here is your version of the patch, extended with a comment which
 I think is helpful for other people reading the code, and with the
 changes to builtins.c and md.texi removed, since the size of the buffer
 is not changing.
 
Is this version OK to apply ?

Yes, IMO that's fine, thanks.

-- 
Eric Botcazou


Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Eric Botcazou
 But, fails whenever the size of the mode of the save area is bigger than a
 certain amount…  On my port, the size taken up by the  save area is large
 enough to cause this to fail.  :-(

That's a bit unexpected; why do you need such a big save area exactly?  The only 
architecture for which this doesn't work is the IA-64, which is a very special 
beast...  In this case, the way out is to define DONT_USE_BUILTIN_SETJMP and 
JMP_BUF_SIZE to the needed size.
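For ports that hit this limit, the opt-out Eric describes looks roughly like the following fragment in the target's configuration header (the size value here is an illustrative assumption, not taken from any real port):

```c
/* Hypothetical target header fragment: skip GCC's built-in setjmp
   buffer and tell libgcc how big the port's jmp_buf really is.
   JMP_BUF_SIZE is measured in units of pointer-sized words; the value
   below is a placeholder for whatever the port's save area needs.  */
#define DONT_USE_BUILTIN_SETJMP
#define JMP_BUF_SIZE 48
```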

-- 
Eric Botcazou


Re: [PATCH] Fix ARM NAN fraction bits

2014-05-16 Thread Maciej W. Rozycki
On Thu, 27 Feb 2014, Joey Ye wrote:

 Current ARM soft-float implementation is violating the RTABI
 (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf) Section 4.1.1.1:
 
 When not otherwise specified by IEEE 754, the result on an invalid operation
 should be the quiet NaN bit pattern with only the most significant bit of
 the significand set, and all other significand bits zero. 
 
 This patch fixes it by setting _FP_NANFRAC_* to zero.
 
 Ran make check test with -mfloat-abi=soft. No regression.
 
 OK to checkin?
 
 2014-02-27  Joey Ye  joey...@arm.com
 * sysdeps/arm/soft-fp/sfp-machine.h 
   (_FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q):
   Set to zero.
 
 
 diff --git a/sysdeps/arm/soft-fp/sfp-machine.h
 b/sysdeps/arm/soft-fp/sfp-machine.h
 index 52a08b5..32697fe 100644
 --- a/sysdeps/arm/soft-fp/sfp-machine.h
 +++ b/sysdeps/arm/soft-fp/sfp-machine.h
 @@ -21,9 +21,9 @@
 #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 -#define _FP_NANFRAC_S  ((_FP_QNANBIT_S << 1) - 1)
 -#define _FP_NANFRAC_D  ((_FP_QNANBIT_D << 1) - 1), -1
 -#define _FP_NANFRAC_Q  ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
 +#define _FP_NANFRAC_S 0
 +#define _FP_NANFRAC_D    0, 0
 +#define _FP_NANFRAC_Q    0, 0, 0, 0
 #define _FP_NANSIGN_S   0
 #define _FP_NANSIGN_D  0
 #define _FP_NANSIGN_Q  0

 Regrettably, when propagated to libgcc, this change regressed
gcc.dg/torture/builtin-math-7.c on soft-fp arm-eabi targets, currently 
ARMv6-M (`-march=armv6-m -mthumb') only.  This is because these NANFRAC 
macros now have no bits set, and as a result, when used to construct a NaN 
in the semi-raw mode, they build an infinity instead.  Consequently 
operations such as (Inf - Inf) now produce Inf rather than NaN.  The 
change worked for the original test case because division is done in the 
canonical mode, where the quiet bit is set separately from the fp class.

 Here's a fix making the code match the commit description quoted above, 
that is, setting the most significant bit of the significand.  This is 
also what similar targets do in this respect.
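As a sanity check, the RTABI default quiet NaN described above -- exponent all ones, only the most significant significand bit set -- can be sketched for single precision (a minimal illustration of the bit pattern, not the sfp-machine.h code itself):

```c
#include <stdint.h>

/* Build the RTABI default single-precision quiet NaN bit pattern:
   sign 0, exponent all ones (8 bits), and only the most significant
   bit of the 23-bit significand set.  */
uint32_t rtabi_default_qnan_s (void)
{
  uint32_t exponent = 0xffu << 23;   /* exponent field: all ones */
  uint32_t quiet_bit = 1u << 22;     /* MSB of the significand */
  return exponent | quiet_bit;       /* 0x7fc00000 */
}
```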

 OK to apply?  OK for libgcc (against libgcc/config/arm/sfp-machine.h), in 
particular for GCC 4.8 and 4.9?

2014-05-16  Maciej W. Rozycki  ma...@codesourcery.com

PR libgcc/60166
* sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D)
(_FP_NANFRAC_Q): Set the quiet bit.

  Maciej

glibc-soft-fp-arm-nanfrac.diff
Index: glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h
===
--- glibc-fsf-trunk-quilt.orig/sysdeps/arm/soft-fp/sfp-machine.h
2014-05-16 03:25:52.0 +0100
+++ glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h 2014-05-16 
03:31:34.451805339 +0100
@@ -21,9 +21,9 @@
 #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
-#define _FP_NANFRAC_S  0
-#define _FP_NANFRAC_D  0, 0
-#define _FP_NANFRAC_Q  0, 0, 0, 0
+#define _FP_NANFRAC_S  _FP_QNANBIT_S
+#define _FP_NANFRAC_D  _FP_QNANBIT_D, 0
+#define _FP_NANFRAC_Q  _FP_QNANBIT_Q, 0, 0, 0
 #define _FP_NANSIGN_S  0
 #define _FP_NANSIGN_D  0
 #define _FP_NANSIGN_Q  0


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread pinskia


 On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com 
 wrote:
 
 On Fri, May 16, 2014 at 1:03 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:
 
 
 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:
 
 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?
 
 Sorry ...
 
 Best regards,
 
 Thomas Preud'homme
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
 
 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current 
 setup
 is actually more difficult to read.
 
 Here are the updated ChangeLogs:
 
 *** gcc/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
 PR tree-optimization/54733
 * expr.c (get_inner_reference): Add a parameter to control whether
 a
 MEM_REF should be split into base + offset.
 * tree.h (get_inner_reference): Default new parameter to false.
 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
 (CMPNOP): Define.
 (find_bswap_or_nop_load): New.
 (find_bswap_1): Renamed to ...
 (find_bswap_or_nop_1): This. Also add support for memory source.
 (find_bswap): Renamed to ...
 (find_bswap_or_nop): This. Also add support for memory source and
 detection of bitwise operations equivalent to load in host 
 endianness.
 (execute_optimize_bswap): Likewise. Also move its leading
 comment back
 in place and split statement transformation into ...
 (bswap_replace): This. Add assert when updating bswap_stats.
 
 *** gcc/testsuite/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
 PR tree-optimization/54733
 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
 optimization to support memory sources and bitwise operations
 equivalent to load in host endianness.
 * gcc.dg/optimize-bswaphi-1.c: Likewise.
 * gcc.dg/optimize-bswapsi-2.c: Likewise.
 * gcc.c-torture/execute/bswap-2.c: Likewise.
 
 Ok for trunk?
 
 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?
 
 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:
 
 +  /* Perform the load.  */
  +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);
 
 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.
 
 So what you have to do is either (simpler)
 
  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
al_load_type = build_aligned_type (load_type, align);
 ...
   val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
load_offset_ptr);
 
 or keep track of the first actual load and use
 
  unsigned int align = get_object_alignment (that_first_load);
 
 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.
 
 Of course on STRICT_ALIGNMENT targets a not aligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).
 
  Not always decomposed.  On MIPS, it should use the load/store left/right 
  instructions for unaligned loads/stores, which is normally better than 
  decomposed loads/stores.  So having a cost model would be nice.
 
 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
  a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]
 
 Oh, and what happens for
 
 unsigned foo (unsigned char *x)
 {
   return x[0] << 24 | x[2] << 8 | x[3];
 }
 
 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).

No, we cannot: if x[4] is on a different page which is not mapped in, we get a 
fault.  Not something we want. 
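For reference, the kind of byte-assembly idiom this pass does turn into a single load -- when all four bytes are covered, unlike the x[0]/x[2]/x[3] case above -- looks like this (illustrative sketch, not a testcase from the patch):

```c
#include <stdint.h>

/* The full-coverage idiom the bswap/nop pass recognizes: assembling a
   32-bit value from four consecutive bytes in little-endian order.  On
   a little-endian target this can become one (possibly unaligned)
   4-byte load; on a big-endian target, a load plus a byte swap.  */
uint32_t load_le32 (const unsigned char *x)
{
  return (uint32_t) x[0]
	 | (uint32_t) x[1] << 8
	 | (uint32_t) x[2] << 16
	 | (uint32_t) x[3] << 24;
}
```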

Thanks,
Andrew


 
 Richard.
 
 Richard.
 
 Thanks,
 Andrew
 
 
 Thanks and sorry again for the delay.
 
 Otherwise the patch looks good to me.
 
 Richard.
 
 Best regards,
 
 Thomas
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org 

Re: [PATCH] Fix ARM NAN fraction bits

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Maciej W. Rozycki wrote:

 2014-05-16  Maciej W. Rozycki  ma...@codesourcery.com
 
   PR libgcc/60166
   * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D)
   (_FP_NANFRAC_Q): Set the quiet bit.

OK for glibc.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR 61084: Use widest_int in sparc_fold_builtin

2014-05-16 Thread Eric Botcazou
 This is the second part of PR 61084, which it seems I'd forgotten to post.
 pdist calculates a wide result from narrower inputs, so I should have
 used widest_int rather than wide_int.

Is that documented?  Because, if even you wide-int guys got it wrong...

   PR target/61084
   * config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather
   than wide_int.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Jeff Law

On 05/16/14 02:02, Richard Biener wrote:



Quiet as usual.
Which strongly suggests folks trust you to do the right thing here :-) 
 I think the FRE/PRE reference in $SUBJECT made me ignore the patch 
entirely -- my brain hurts when I look at our tree PRE implementation.


Jeff


Eliminate write-only variables

2014-05-16 Thread Jan Hubicka
Hi,
this patch adds code to remove write-only static variables.  While analyzing
the effectiveness of LTO on firefox, I noticed that a surprisingly large part
of the binary's data segment is occupied by these.  Fixed thus.
(This is quite a trivial transformation; I just never considered it important
enough to work on.)

The patch works by marking write-only variables in ipa.c (at the same time we
discover the addressable flag) and also fixes handling of the flags for
aliases.  References to variables are then removed by fixup_cfg.
As a first cut, I only remove stores without side effects, so copies from
volatile variables are preserved.  I also kill the LHS of function calls.
I do not attempt to remove asm statements.  This means that some references
may be left in the code, and therefore the IPA code does not eliminate the
references after discovering a write-only variable; instead it relies
on dead variable elimination to do the job later.  Consequently not all
write-only variables are removed with WHOPR in the case where the references
end up in different partitions.  Something I can address incrementally.

Also I think dwarf2out should be updated to mark the value of write-only
variables as optimized out.  Jakub, can you help me with this?
(I do not think it is valid to output the optimized-out value of the
constructor.)
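To make the transformation concrete, here is the shape of code it targets (an illustrative example with made-up names, not from the patch's testsuite):

```c
/* 'written_only' is a static variable that is stored to but never
   read, so the new IPA analysis can mark it write-only and fixup_cfg
   can delete the stores (and, ultimately, the variable).  The volatile
   read feeding the second store must still be preserved.  */
static int written_only;
static volatile int sensor;

void
record (int v)
{
  written_only = v;       /* plain store: removable */
  written_only = sensor;  /* store removable, volatile read kept */
}
```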

Bootstrapped/regtested x86_64-linux, will commit it later today.

Honza

* varpool.c (dump_varpool_node): Dump write-only flag.
* lto-cgraph.c (lto_output_varpool_node, input_varpool_node): Stream
write-only flag.
* tree-cfg.c (execute_fixup_cfg): Remove statements setting write-only 
variables.


* gcc.c-torture/execute/20101011-1.c: Update testcase.
* gcc.dg/ira-shrinkwrap-prep-1.c: Update testcase.
* gcc.dg/tree-ssa/writeonly.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-6.c: Update testcase.
* gcc.dg/tree-ssa/pr21559.c: Update testcase.
* gcc.dg/debug/pr35154.c: Update testcase.
* gcc.target/i386/vectorize1.c: Update testcase.
* ipa.c (process_references): New function.
(set_readonly_bit): New function.
(set_writeonly_bit): New function.
(clear_addressable_bit): New function.
(ipa_discover_readonly_nonaddressable_var): Mark write only variables; 
fix
handling of aliases.
* cgraph.h (struct varpool_node): Add writeonly flag.

Index: varpool.c
===
--- varpool.c   (revision 210514)
+++ varpool.c   (working copy)
@@ -211,6 +211,8 @@ dump_varpool_node (FILE *f, varpool_node
     fprintf (f, " read-only");
   if (ctor_for_folding (node->decl) != error_mark_node)
     fprintf (f, " const-value-known");
+  if (node->writeonly)
+    fprintf (f, " write-only");
   fprintf (f, "\n");
 }
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 210514)
+++ lto-cgraph.c(working copy)
@@ -562,6 +562,7 @@ lto_output_varpool_node (struct lto_simp
   bp_pack_value (bp, node->forced_by_abi, 1);
   bp_pack_value (bp, node->unique_name, 1);
   bp_pack_value (bp, node->body_removed, 1);
+  bp_pack_value (bp, node->writeonly, 1);
   bp_pack_value (bp, node->definition, 1);
   alias_p = node->alias && (!boundary_p || node->weakref);
   bp_pack_value (bp, alias_p, 1);
@@ -1153,6 +1154,7 @@ input_varpool_node (struct lto_file_decl
   node->forced_by_abi = bp_unpack_value (bp, 1);
   node->unique_name = bp_unpack_value (bp, 1);
   node->body_removed = bp_unpack_value (bp, 1);
+  node->writeonly = bp_unpack_value (bp, 1);
   node->definition = bp_unpack_value (bp, 1);
   node->alias = bp_unpack_value (bp, 1);
   node->weakref = bp_unpack_value (bp, 1);
Index: tree-cfg.c
===
--- tree-cfg.c  (revision 210514)
+++ tree-cfg.c  (working copy)
@@ -8431,7 +8431,7 @@ execute_fixup_cfg (void)
   FOR_EACH_BB_FN (bb, cfun)
 {
   bb->count = apply_scale (bb->count, count_scale);
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
{
  gimple stmt = gsi_stmt (gsi);
  tree decl = is_gimple_call (stmt)
@@ -8457,9 +8457,45 @@ execute_fixup_cfg (void)
todo |= TODO_cleanup_cfg;
 }
 
+ /* Remove stores to variables we marked readonly.
+    Keep access when store has side effect, i.e. in case when source
+    is volatile.  */
+ if (gimple_store_p (stmt)
+     && !gimple_has_side_effects (stmt))
+   {
+     tree lhs = get_base_address (gimple_get_lhs (stmt));
+
+     if (TREE_CODE (lhs) == VAR_DECL
+         && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
+         && varpool_get_node (lhs)->writeonly)
+       {
+         unlink_stmt_vdef (stmt);
+         gsi_remove (&gsi, true);
+         release_defs (stmt);
+ 

Re: [PATCH] Use direct computation to calculate fallthrough edge count.

2014-05-16 Thread Jan Hubicka
 This patch uses direct computation (instead of indirect minus
 computation) to calculate the fallthrough edge's count and frequency.  This
 is less error-prone when the profile is inconsistent (e.g. the sum of all
 succ edge counts is larger than the bb count).
 
 Bootstrapped and regression test on-going.
 
 OK for trunk if test pass?

OK,
thanks!

Honza
 
 Thanks,
 Dehao
 
 gcc/ChangeLog:
 2014-05-16  Dehao Chen  de...@google.com
 
 * cfghooks.c (make_forwarder_block): Use direct computation to
 get fall-through edge's count and frequency.
 
 Index: gcc/cfghooks.c
 ===
 --- gcc/cfghooks.c (revision 210518)
 +++ gcc/cfghooks.c (working copy)
 @@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir
 
fallthru = split_block_after_labels (bb);
    dummy = fallthru->src;
 +  dummy->count = 0;
 +  dummy->frequency = 0;
 +  fallthru->count = 0;
    bb = fallthru->dest;
 
    /* Redirect back edges we want to keep.  */
 @@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir
 
    if (redirect_edge_p (e))
       {
 +      dummy->frequency += EDGE_FREQUENCY (e);
 +      dummy->count += e->count;
 +      fallthru->count += e->count;
        ei_next (&ei);
        continue;
       }
 
 -  dummy->frequency -= EDGE_FREQUENCY (e);
 -  dummy->count -= e->count;
 -  if (dummy->frequency < 0)
 -    dummy->frequency = 0;
 -  if (dummy->count < 0)
 -    dummy->count = 0;
 -  fallthru->count -= e->count;
 -  if (fallthru->count < 0)
 -    fallthru->count = 0;
 -
    e_src = e->src;
jump = redirect_edge_and_branch_force (e, bb);
if (jump != NULL)


[PATCH] Use direct computation to calculate fallthrough edge count.

2014-05-16 Thread Dehao Chen
This patch uses direct computation (instead of indirect minus
computation) to calculate the fallthrough edge's count and frequency.  This
is less error-prone when the profile is inconsistent (e.g. the sum of all
succ edge counts is larger than the bb count).
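A toy model of the two approaches shows why the direct sum is more robust when the profile is inconsistent (names and numbers below are made up for illustration; this is not the cfghooks.c code):

```c
/* With a consistent profile both methods agree, but when succ edge
   counts sum to more than the block count, the subtractive method must
   clamp at zero and silently distorts the kept edges' weight, while
   the direct sum simply reports what the kept edges carry.  */
int
direct_kept_count (int n, const int edge_count[], const int keep[])
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (keep[i])
      sum += edge_count[i];  /* only edges we keep contribute */
  return sum;
}

int
subtractive_kept_count (int bb_count, int n, const int edge_count[],
			const int keep[])
{
  for (int i = 0; i < n; i++)
    if (!keep[i])
      {
	bb_count -= edge_count[i];
	if (bb_count < 0)
	  bb_count = 0;  /* clamping hides the inconsistency */
      }
  return bb_count;
}
```

With edge counts {7, 5}, keeping only the first edge, and an inconsistent block count of 10 (7 + 5 > 10), the subtractive method yields 5 while the direct sum yields the true kept weight of 7.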

Bootstrapped and regression test on-going.

OK for trunk if test pass?

Thanks,
Dehao

gcc/ChangeLog:
2014-05-16  Dehao Chen  de...@google.com

* cfghooks.c (make_forwarder_block): Use direct computation to
get fall-through edge's count and frequency.

Index: gcc/cfghooks.c
===
--- gcc/cfghooks.c (revision 210518)
+++ gcc/cfghooks.c (working copy)
@@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir

   fallthru = split_block_after_labels (bb);
   dummy = fallthru->src;
+  dummy->count = 0;
+  dummy->frequency = 0;
+  fallthru->count = 0;
   bb = fallthru->dest;

   /* Redirect back edges we want to keep.  */
@@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir

   if (redirect_edge_p (e))
	{
+	  dummy->frequency += EDGE_FREQUENCY (e);
+	  dummy->count += e->count;
+	  fallthru->count += e->count;
	  ei_next (&ei);
	  continue;
	}

-  dummy->frequency -= EDGE_FREQUENCY (e);
-  dummy->count -= e->count;
-  if (dummy->frequency < 0)
-    dummy->frequency = 0;
-  if (dummy->count < 0)
-    dummy->count = 0;
-  fallthru->count -= e->count;
-  if (fallthru->count < 0)
-    fallthru->count = 0;
-
   e_src = e->src;
   jump = redirect_edge_and_branch_force (e, bb);
   if (jump != NULL)


Re: Eliminate write-only variables

2014-05-16 Thread Andreas Schwab
Jan Hubicka hubi...@ucw.cz writes:

 +   /* Remove stores to variables we marked readonly.

s/read/write/

 +   /* For calls we can simply remove LHS when it is known to be read 
 only.  */

s/read/write/

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: Eliminate write-only variables

2014-05-16 Thread Jan Hubicka
 Jan Hubicka hubi...@ucw.cz writes:
 
  + /* Remove stores to variables we marked readonly.
 
 s/read/write/
 
  + /* For calls we can simply remove LHS when it is known to be read 
  only.  */
 
 s/read/write/

Ah, thanks!

Honza
 
 Andreas.
 
 -- 
 Andreas Schwab, sch...@linux-m68k.org
 GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
 And now for something completely different.


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 9:51 AM, Jan Hubicka hubi...@ucw.cz wrote:

 I have been chasing a runtime failure of a very large test built with
 gcc-4_9. The bad code either calls a pure function or turns a virtual
 call into __builtin_unreachable (incomplete target set). The indirect
 info shows the otr type to be !maybe_derived_type, and the outer-type
 gets cleared during inline update. I isolated a small test case -- but
 the good news is that gcc-4_9 @head already fixed the problem.

 I will  check in the test case to trunk later.

 Good, a testcase would be welcome.  I guess it was the fix for the placement_new
 bug.
 It disables some valid devirtualizations (and I thus may revisit the fix for
 4.10), so it would be good to know if your testcase differs from the original
 PR one.

I thought so too -- but when I backed out this single change from
trunk, the test still passes, so something else is also going on.

David


 Honza

 thanks,

 David

 
 
  Honza


patch to fix PR60969

2014-05-16 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60969

The patch was bootstrapped and tested on x86/x86-64.

Committed as rev. 210519 to gcc 4.9 branch and as rev. 210520 to trunk.

2014-05-16  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/60969
* ira-costs.c (record_reg_classes): Allow only memory for pseudo.
Calculate costs for this case.

2014-05-16  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/60969
* g++.dg/pr60969.C: New.

Index: ira-costs.c
===
--- ira-costs.c (revision 210069)
+++ ira-costs.c (working copy)
@@ -762,10 +762,11 @@ record_reg_classes (int n_alts, int n_op
 into that class.  */
	  if (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER)
{
- if (classes[i] == NO_REGS)
+ if (classes[i] == NO_REGS && ! allows_mem[i])
{
  /* We must always fail if the operand is a REG, but
-we did not find a suitable class.
+we did not find a suitable class and memory is
+not allowed.
 
 Otherwise we may perform an uninitialized read
 from this_op_costs after the `continue' statement
@@ -783,50 +784,90 @@ record_reg_classes (int n_alts, int n_op
  bool out_p = recog_data.operand_type[i] != OP_IN;
  enum reg_class op_class = classes[i];
  move_table *move_in_cost, *move_out_cost;
+ short (*mem_cost)[2];
 
  ira_init_register_move_cost_if_necessary (mode);
  if (! in_p)
{
  ira_assert (out_p);
- move_out_cost = ira_may_move_out_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+ if (op_class == NO_REGS)
{
- rclass = cost_classes[k];
- pp_costs[k]
-   = move_out_cost[op_class][rclass] * frequency;
+ mem_cost = ira_memory_move_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k] = mem_cost[rclass][0] * frequency;
+   }
+   }
+ else
+   {
+ move_out_cost = ira_may_move_out_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k]
+   = move_out_cost[op_class][rclass] * frequency;
+   }
}
}
  else if (! out_p)
{
  ira_assert (in_p);
- move_in_cost = ira_may_move_in_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+ if (op_class == NO_REGS)
{
- rclass = cost_classes[k];
- pp_costs[k]
-   = move_in_cost[rclass][op_class] * frequency;
+ mem_cost = ira_memory_move_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k] = mem_cost[rclass][1] * frequency;
+   }
+   }
+ else
+   {
+ move_in_cost = ira_may_move_in_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k]
+   = move_in_cost[rclass][op_class] * frequency;
+   }
}
}
  else
{
- move_in_cost = ira_may_move_in_cost[mode];
- move_out_cost = ira_may_move_out_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-   {
- rclass = cost_classes[k];
- pp_costs[k] = ((move_in_cost[rclass][op_class]
- + move_out_cost[op_class][rclass])
-* frequency);
+ if (op_class == NO_REGS)
+   {
+ mem_cost = ira_memory_move_cost[mode];
+ 

[C++ patch] Reduce vtable alignment

2014-05-16 Thread Jan Hubicka
Hi,
compiling:

struct A
{
  virtual void foo(void) {};
  virtual void foo2(void) {};
  virtual void foo3(void) {};
  virtual void foo4(void) {};
  virtual void foo5(void) {};
} a;

gives 32-byte alignment to the virtual table on i386, because we bump up
the alignment of arrays to the size of vector operations.  This is wasteful,
since virtual tables are never really accessed this way.

I am testing the following patch, OK if it passes?
The patch also removes apparently 20 years old hack for SPARC.

Honza
* class.c (build_vtable): Force alignment of virtual tables
to be pointer size only to save space.
Index: class.c
===
--- class.c (revision 210521)
+++ class.c (working copy)
@@ -768,11 +768,8 @@ build_vtable (tree class_type, tree name
   TREE_READONLY (decl) = 1;
   DECL_VIRTUAL_P (decl) = 1;
   DECL_ALIGN (decl) = TARGET_VTABLE_ENTRY_ALIGN;
+  DECL_USER_ALIGN (decl) = true;
   DECL_VTABLE_OR_VTT_P (decl) = 1;
-  /* At one time the vtable info was grabbed 2 words at a time.  This
- fails on sparc unless you have 8-byte alignment.  (tiemann) */
-  DECL_ALIGN (decl) = MAX (TYPE_ALIGN (double_type_node),
-  DECL_ALIGN (decl));
   set_linkage_according_to_type (class_type, decl);
   /* The vtable has not been defined -- yet.  */
   DECL_EXTERNAL (decl) = 1;


Re: Avoiding some garbage rtl from instantiate_virtual_regs

2014-05-16 Thread Jeff Law

On 05/16/14 02:47, Richard Sandiford wrote:

It seems that in a typical -O0 compile the amount of rtl that
starts out as needed but becomes garbage is only slightly less than
half of the total amount created.  One of the big offenders is the
vregs pass, which creates new PLUSes when instanstiating a virtual
register + a constant and which creates new MEMs when instantiating
an address involving a virtual register.  This happens a lot in -O0
code because all variables live on the stack.

The instantiation walk is fundamentally in-place: every other part
of the pattern is modified without copying.  And rtl sharing rules
guarantee that we can do the same for PLUSes of registers and MEMs.

The patch does this by adding inplace arguments to plus_constant and
replace_equiv_address.  In a -O0 compile of an oldish fold-const.ii
(where no GC takes place) it reduces the amount of used GC memory
from 169M to 166M.  The average max RSS goes down by just over 1%.
Compile time seems to decrease slightly, but probably in the noise range.

There might be other callers than can use the new interfaces too.

Tested on x86_64-linux-gnu.  Also tested by comparing the asm output
for various parts of the testsuite before and after the patch.
The only changes were that some sym+0s became plain syms
(i.e. (plus X (const_int 0)) became X) because of the plus_constant
change.

OK to install?

Thanks,
Richard


gcc/
* emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an
inplace argument.  Store the new address in the original MEM when true.
* emit-rtl.c (change_address_1): Likewise.
(adjust_address_1, adjust_automodify_address_1, offset_address):
Update accordingly.
* rtl.h (plus_constant): Add an inplace argument.
* explow.c (plus_constant): Likewise.  Try to reuse the original PLUS
when true.  Avoid generating (plus X (const_int 0)).
* function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
in-place.  Pass true to plus_constant.
(instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address.

Index: gcc/emit-rtl.h
===
--- gcc/emit-rtl.h  2014-05-15 11:27:06.000259353 +0100
+++ gcc/emit-rtl.h  2014-05-16 09:11:42.479556294 +0100
@@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool);
 ADDR.  The caller is asserting that the actual piece of memory pointed
 to is the same, just the form of the address is being changed, such as
 by putting something into a register.  */
-extern rtx replace_equiv_address (rtx, rtx);
+extern rtx replace_equiv_address (rtx, rtx, bool = false);

  /* Likewise, but the reference is not required to be valid.  */
-extern rtx replace_equiv_address_nv (rtx, rtx);
+extern rtx replace_equiv_address_nv (rtx, rtx, bool = false);
Presumably the default value for the inplace argument is to avoid having 
to fixup all the call sites.


I guess that's OK.  Clearly it's a safe default and avoids a fair amount 
of unnecessary churn.



OK for the trunk.

Thanks,
Jeff


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Richard Biener
On May 16, 2014 7:07:11 PM CEST, Jeff Law l...@redhat.com wrote:
On 05/16/14 02:02, Richard Biener wrote:

 Quiet as usual.
Which strongly suggests folks trust you to do the right thing here :-) 
  I think the FRE/PRE reference in $SUBJECT made me ignore the patch 
entirely -- my brain hurts when I look at our tree PRE implementation.

Heh, it's much easier to understand than it once was!

:)

Richard.

Jeff




Re: [DOC Patch] Incorrect @xref in #pragma visibility

2014-05-16 Thread Jeff Law

On 05/14/14 20:25, David Wohlferd wrote:

2014-05-14  David Wohlferd d...@limegreensocks.com

  * doc/extend.texi: (Visibility Pragmas) Fix misplaced @xref

Approved and installed on your behalf.

Thanks,
jeff



Re: [PATCH 1/1][ira-costs] grammar fix of comments

2014-05-16 Thread Jeff Law

On 05/14/14 18:45, Zhouyi Zhou wrote:

Fix grammar error for comments above
process_bb_node_for_hard_reg_moves

Signed-off-by: Zhouyi Zhou yizhouz...@ict.ac.cn
---
  gcc/ira-costs.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 648806b..fbfb070 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -1855,7 +1855,7 @@ find_costs_and_classes (FILE *dump_file)

  /* Process moves involving hard regs to modify allocno hard register
 costs.  We can do this only after determining allocno class.  If a
-   hard register forms a register class, than moves with the hard
+   hard register forms a register class, then moves with the hard
 register are already taken into account in class costs for the
 allocno.  */
  static void

Thanks.  Installed.

jeff



Re: [PATCH 1/5] export finish_bitfield_layout from stor-layout

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

The gdb plugin handles some aspects of type layout on its own.  It
does this because it knows the layout of types, but not the path by
which the layout was determined -- so normal gcc things like
TYPE_PACKED cannot be used.

This patch exposes one bit of stor-layout so it can be used by the
plugin.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* stor-layout.c (finish_bitfield_layout): Now public.  Change
argument type to 'tree'.
(finish_record_layout): Update.
* stor-layout.h (finish_bitfield_layout): Declare.

OK.  However, please hold off installing until the entire set is approved.



Jeff



Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

In the typical case, when compiling a snippet of user code, gdb wraps
the user's text in a dummy function.

It's somewhat odd for users if an error in their code is mentioned as
coming from this dummy function.

This patch makes it possible to suppress the function-name display in
a straightforward way: it adds a new global which the plugin can set
to declare the name of the dummy function.

This patch seems like a bit of a hack, but there didn't seem to be a
notably cleaner approach.
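To make the idea concrete, here is a minimal sketch of how such a suppression global could drive the diagnostic code.  The names and logic are illustrative only (modeled on the description above), not the actual patch:

```c
#include <string.h>
#include <stddef.h>

/* Illustrative global, analogous to the c_diagnostic_ignored_function
   the patch adds: when non-NULL, diagnostics skip this function name.  */
static const char *diagnostic_ignored_function;

/* Return 1 if the error printer should mention FNNAME, 0 if FNNAME is
   the dummy wrapper the debugger generated around the user's code.  */
static int
should_print_function_name (const char *fnname)
{
  return diagnostic_ignored_function == NULL
	 || strcmp (fnname, diagnostic_ignored_function) != 0;
}
```

The plugin would set the global once, before compiling the wrapped snippet.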

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* c-lang.c (c_diagnostic_ignored_function): New global.
(c_print_error_function): New function.
(LANG_HOOKS_PRINT_ERROR_FUNCTION): Define.
* c-lang.h (c_diagnostic_ignored_function): Declare.

Just a few nits.

In c-lang.c, please use the old C-style comments, if for no other
reason than it's consistent with all the other nearby code.  Consider
using "non-NULL" when referring to pointers rather than "non-zero".


Otherwise OK.  Please wait to install until the entire kit is approved.

BTW, didn't see patch #5 of the series.

jeff



Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Tom Tromey
Jeff BTW, didn't see patch #5 of the series.

Maybe it was too big.
I will try to resend it compressed.

Tom


[PATCH 5/5] add libcc1

2014-05-16 Thread Tom Tromey
 Tom == Tom Tromey tro...@redhat.com writes:

Tom This patch series is half of a project to let gdb reuse gcc (which
Tom half depends on which list you are seeing this on), so that users can
Tom compile small snippets of code and evaluate them in the current
Tom context of the inferior.

Jeff noted that patch #5 didn't make it through.

I've edited this one down by removing the auto-generated stuff, and
then compressed it.

If anybody wants to apply this to try it out, it's all on github.

https://github.com/tromey/gcc
https://github.com/tromey/gdb

In each repository the branch named submit/compile holds the rebased
series that were submitted.

The development branches are both named gdbjit.  These are much
messier but not ever rebased, so perhaps safer to track.

Tom

This patch adds the plugin to the gcc tree and updates the top-level
configury.

It seems better to have this code in the gcc tree than in the gdb
tree, because it is bound more tightly to gcc.  The gcc plugin makes
direct calls into various parts of gcc to do its work; whereas on the
gdb side everything is done via a relatively simple API, without any
direct connection to gdb internals.  This breakdown made the most
sense because most calls are from gdb to gcc rather than vice versa.

The plugin itself consists of two parts.  These parts communicate via
a simple ad hoc RPC system implemented in the plugin code.
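The mail does not show the wire format, but the ad hoc RPC idea can be illustrated with a toy marshaller: each call is a tag byte naming the remote function, followed by its arguments.  This is purely a sketch of the flavor of protocol described, not the real libcc1 format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical call tag; the real plugin generates these from a .def file.  */
enum { RPC_CALL_BIND = 1 };

/* Marshal a call as: one tag byte, then one little-blob int32 argument.
   Returns the number of bytes written to BUF.  */
static size_t
marshal_call (unsigned char *buf, uint8_t tag, int32_t arg)
{
  buf[0] = tag;
  memcpy (buf + 1, &arg, sizeof arg);
  return 1 + sizeof arg;
}

/* The receiving side reads the argument back out of the message.  */
static int32_t
unmarshal_arg (const unsigned char *buf)
{
  int32_t arg;
  memcpy (&arg, buf + 1, sizeof arg);
  return arg;
}
```

In the real plugin the two halves sit in separate processes and the buffer travels over a connection, but the marshal/unmarshal pairing is the same shape.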

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* Makefile.def: Add libcc1 to host_modules.
* configure.ac (host_tools): Add libcc1.
* Makefile.in, configure: Rebuild.

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Jan Kratochvil  jan.kratoch...@redhat.com
Tom Tromey  tro...@redhat.com

* aclocal.m4: New file.
* callbacks.cc: New file.
* callbacks.hh: New file.
* cc1plugin-config.h.in: New file.
* configure: New file.
* configure.ac: New file.
* connection.cc: New file.
* connection.hh: New file.
* libcc1.cc: New file.
* libcc1plugin.sym: New file.
* libcc1.sym: New file.
* Makefile.am: New file.
* Makefile.in: New file.
* marshall.cc: New file.
* marshall.hh: New file.
* names.cc: New file.
* names.hh: New file.
* plugin.cc: New file.
* rpc.hh: New file.
* status.hh: New file.
---
 ChangeLog| 7 +
 Makefile.def | 5 +
 Makefile.in  |   995 ++-
 configure| 2 +-
 configure.ac | 4 +-
 libcc1/ChangeLog |24 +
 libcc1/Makefile.am   |42 +
 libcc1/Makefile.in   |   614 ++
 libcc1/aclocal.m4|   980 +++
 libcc1/callbacks.cc  |90 +
 libcc1/callbacks.hh  |64 +
 libcc1/cc1plugin-config.h.in |92 +
 libcc1/configure | 16956 +
 libcc1/configure.ac  |65 +
 libcc1/connection.cc |   153 +
 libcc1/connection.hh |   114 +
 libcc1/libcc1.cc |   454 ++
 libcc1/libcc1.sym| 1 +
 libcc1/libcc1plugin.sym  | 2 +
 libcc1/marshall.cc   |   166 +
 libcc1/marshall.hh   |93 +
 libcc1/names.cc  |46 +
 libcc1/names.hh  |55 +
 libcc1/plugin.cc |   895 +++
 libcc1/rpc.hh|   486 ++
 libcc1/status.hh |33 +
 26 files changed, 22430 insertions(+), 8 deletions(-)
 create mode 100644 libcc1/ChangeLog
 create mode 100644 libcc1/Makefile.am
 create mode 100644 libcc1/Makefile.in
 create mode 100644 libcc1/aclocal.m4
 create mode 100644 libcc1/callbacks.cc
 create mode 100644 libcc1/callbacks.hh
 create mode 100644 libcc1/cc1plugin-config.h.in
 create mode 100755 libcc1/configure
 create mode 100644 libcc1/configure.ac
 create mode 100644 libcc1/connection.cc
 create mode 100644 libcc1/connection.hh
 create mode 100644 libcc1/libcc1.cc
 create mode 100644 libcc1/libcc1.sym
 create mode 100644 libcc1/libcc1plugin.sym
 create mode 100644 libcc1/marshall.cc
 create mode 100644 libcc1/marshall.hh
 create mode 100644 libcc1/names.cc
 create mode 100644 libcc1/names.hh
 create mode 100644 libcc1/plugin.cc
 create mode 100644 libcc1/rpc.hh
 create mode 100644 libcc1/status.hh



0001-add-libcc1.patch.gz
Description: the patch


Re: [C++ patch] Reduce vtable alignment

2014-05-16 Thread Jan Hubicka
Hi,
this patch makes also the rtti type info for A in the testcase:

struct A
{
  virtual void foo(void) {};
  virtual void foo2(void) {};
  virtual void foo3(void) {};
  virtual void foo4(void) {};
  virtual void foo5(void) {};
} a;

aligned only to the ABI requirement (8) instead of being bumped up to 16 bytes
by the following code in i386.c:
  /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
     to 16byte boundary.  */
  if (TARGET_64BIT)
    {
      if ((opt ? AGGREGATE_TYPE_P (type) : TREE_CODE (type) == ARRAY_TYPE)
	  && TYPE_SIZE (type)
	  && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
	  && wi::geu_p (TYPE_SIZE (type), 128)
	  && align < 128)
	return 128;
    }

Here the variable is first run through align_variable and that decides to add
optional alignment. We really want only ABI required alignment here.
Does the following patch look reasonable?
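The bump quoted from i386.c reduces to a small piece of arithmetic: aggregates of at least 128 bits get their alignment raised to 128 bits (16 bytes).  A simplified model (the real code also checks the kind of type and the opt flag):

```c
/* Sketch of the x86-64 optional alignment bump described above:
   sizes and alignments are in bits, matching the quoted i386.c code.  */
static unsigned
x86_64_optional_alignment (unsigned size_bits, unsigned align_bits)
{
  if (size_bits >= 128 && align_bits < 128)
    return 128;		/* bump to 16-byte boundary */
  return align_bits;	/* otherwise leave the ABI alignment alone */
}
```

A 5-slot vtable or tinfo of 320 bits with the ABI's 64-bit alignment is exactly the case the patch wants to exempt from this bump.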

* rtti.c: Include tm_p.h
(emit_tinfo_decl): Align type infos only as required by the target ABI.

Index: rtti.c
===
--- rtti.c  (revision 210521)
+++ rtti.c  (working copy)
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "stringpool.h"
 #include "stor-layout.h"
 #include "cp-tree.h"
@@ -1596,6 +1597,12 @@ emit_tinfo_decl (tree decl)
   DECL_INITIAL (decl) = init;
   mark_used (decl);
   cp_finish_decl (decl, init, false, NULL_TREE, 0);
+  /* Avoid targets optionally bumping up the alignment to improve
+     vector instruction accesses; tinfo is never accessed this way.  */
+#ifdef DATA_ABI_ALIGNMENT
+  DECL_ALIGN (decl) = DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl)));
+  DECL_USER_ALIGN (decl) = true;
+#endif
   return true;
 }
   else


Re: [PATCH 3/5] introduce the binding oracle

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

gdb wants to supply any declarations that may be referred to by the
user's code.  Hooking into symbol lookup was an efficient way to
accomplish this.

This patch introduces a binding oracle that is consulted whenever a
symbol binding is looked up for the first time.  The oracle is just a
global function pointer.  If it is NULL, no special work is done.  It
is called with the identifier to supply and with an enum argument
indicating the kind of binding being requested.  The oracle can then
call back into the C front end (via the new functions c_pushtag and
c_bind) to supply a binding; or it can silently do nothing if the
request could not be fulfilled.

The code caches whether the oracle has been called to avoid repeated
useless queries.

There is a little hack in c_print_identifier to avoid calling the
binding oracle here.  This makes debugging gcc in the presence of the
plugin remain relatively sane -- without this, calling debug_tree or
the like can confusingly call into the plugin.
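The oracle mechanism described above can be modeled in a few lines: a global function pointer consulted on the first lookup of an identifier, plus a per-identifier "checked" flag so a miss is not re-queried.  Names here are hypothetical; the patch's real bookkeeping lives in the I_SYMBOL_* macros:

```c
#include <stddef.h>

/* Illustrative model of the binding oracle, not the patch itself.  */
enum oracle_request { ORACLE_SYMBOL, ORACLE_TAG, ORACLE_LABEL };

struct ident
{
  const char *name;
  int checked;		/* oracle already consulted for this identifier */
  void *binding;	/* whatever the oracle supplied, or NULL */
};

typedef void (*oracle_fn) (enum oracle_request, struct ident *);
static oracle_fn binding_oracle;	/* NULL means no special work */

/* A toy oracle: supplies a dummy binding for every symbol request.  */
static void
toy_oracle (enum oracle_request req, struct ident *id)
{
  if (req == ORACLE_SYMBOL)
    id->binding = (void *) id->name;
}

/* First-time lookups consult the oracle; later ones use the cache.  */
static void *
lookup_binding (struct ident *id)
{
  if (id->binding == NULL && !id->checked && binding_oracle != NULL)
    {
      id->checked = 1;
      binding_oracle (ORACLE_SYMBOL, id);	/* may fill id->binding */
    }
  return id->binding;
}
```

The "checked" flag is what keeps debugging sane: once the oracle has declined to supply a binding, the front end never asks again for that identifier.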

2014-05-16  Phil Muldoon  pmuld...@redhat.com
Tom Tromey  tro...@redhat.com

* c-tree.h (enum c_oracle_request): New.
(c_binding_oracle_function): New typedef.
(c_binding_oracle, c_pushtag, c_bind): Declare.
* c-decl.c (c_binding_oracle): New global.
(I_SYMBOL_CHECKED): New macro.
(i_symbol_binding): New function.
(I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine.
(I_TAG_CHECKED): New macro.
(i_tag_binding): New function.
(I_TAG_BINDING, I_TAG_DECL): Redefine.
(I_LABEL_CHECKED): New macro.
(i_label_binding): New function.
(I_LABEL_BINDING, I_LABEL_DECL): Redefine.
(c_print_identifier): Save and restore c_binding_oracle.
(c_pushtag, c_bind): New functions.
---
  void
  c_print_identifier (FILE *file, tree node, int indent)
  {
+  void (*save) (enum c_oracle_request, tree identifier);
+
+  // This makes debugging much more sane.
+  save = c_binding_oracle;
+  c_binding_oracle = NULL;
+
Just a nit.  A C-style comment would be appreciated.  It might also help
to clarify what "much more sane" really means here.


Otherwise, it looks OK to me.

jeff



Re: [patch] fix ppc spe bootstrap error in dwf_regno

2014-05-16 Thread Jeff Law

On 04/30/14 09:15, Cesar Philippidis wrote:

I've been working on a patch to switch a few ppc targets to use softfp
in libgcc instead of fpbit and I noticed that ppc-none-eabispe fails to
bootstrap in both trunk and 4.9. The regression was introduced in
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03588.html. Essentially,
the assert for a hard register in dwf_regno () is not valid on ppc spe
targets. In rs6000_dwarf_register_span (), there is a note stating:

   /* The duality of the SPE register size wreaks all kinds of havoc.
  This is a way of distinguishing r0 in 32-bits from r0 in
  64-bits.  */

and the function adds 1200 to regno, which makes that register appear to
be a pseudo. This causes problems in dwf_regno (), which asserts that
reg is a hard register. Since the dwarf2 pass is executed after register
allocation it should, in theory, be ok for the rs6000 backend to be
using a pseudo register for this application.

Is this patch ok for trunk and 4.9? If so, please commit since I don't
have an svn account.
It seems to me the real problem here is the SPE port and this silly game 
it plays with adding a large value to regno.


While it is currently safe to remove the assert, the assert is there to 
help catch cases where a pseudo has leaked through to a point where it 
shouldn't.


I'd rather look for another approach.

Jeff


Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support

2014-05-16 Thread Jeff Law

On 04/16/14 05:35, Ilya Enkovich wrote:

Hi,

This patch introduces Intel MPX bound registers and instructions.  It was 
approved earlier for 4.9 and had no significant changes since then.  I'll 
assume patch is OK if no objections arise.

Patch was bootstrapped and tested for linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-04-16  Ilya Enkovich  ilya.enkov...@intel.com

* mode-classes.def (MODE_BOUND): New.
* tree.def (BOUND_TYPE): New.
* genmodes.c (complete_mode): Support MODE_BOUND.
(BOUND_MODE): New.
(make_bound_mode): New.
* machmode.h (BOUND_MODE_P): New.
* stor-layout.c (int_mode_for_mode): Support MODE_BOUND.
(layout_type): Support BOUND_TYPE.
* tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE.
* tree.c (build_int_cst_wide): Support BOUND_TYPE.
(type_contains_placeholder_1): Likewise.
* tree.h (BOUND_TYPE_P): New.
* varasm.c (output_constant): Support BOUND_TYPE.
* config/i386/constraints.md (B): New.
(Ti): New.
(Tb): New.
* config/i386/i386-modes.def (BND32): New.
(BND64): New.
* config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New.
* config/i386/i386.c (isa_opts): Add mmpx.
(regclass_map): Add bound registers.
(dbx_register_map): Likewise.
(dbx64_register_map): Likewise.
(svr4_dbx_register_map): Likewise.
(PTA_MPX): New.
(ix86_option_override_internal): Support MPX ISA.
(ix86_code_end): Add MPX bnd prefix.
(output_set_got): Likewise.
(ix86_output_call_insn): Likewise.
(get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix support.
(ix86_print_operand_punct_valid_p): Likewise.
(ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and
UNSPEC_BNDMK_ADDR.
(ix86_class_likely_spilled_p): Add bound regs support.
(ix86_hard_regno_mode_ok): Likewise.
(x86_order_regs_for_local_alloc): Likewise.
(ix86_bnd_prefixed_insn_p): New.
* config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value.
(FIXED_REGISTERS): Add bound registers.
(CALL_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(HARD_REGNO_NREGS): Likewise.
(TARGET_MPX): New.
(VALID_BND_REG_MODE): New.
(FIRST_BND_REG): New.
(LAST_BND_REG): New.
(reg_class): Add BND_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(BND_REGNO_P): New.
(ANY_BND_REG_P): New.
(BNDmode): New.
(HI_REGISTER_NAMES): Add bound registers.
* config/i386/i386.md (UNSPEC_BNDMK): New.
(UNSPEC_BNDMK_ADDR): New.
(UNSPEC_BNDSTX): New.
(UNSPEC_BNDLDX): New.
(UNSPEC_BNDLDX_ADDR): New.
(UNSPEC_BNDCL): New.
(UNSPEC_BNDCU): New.
(UNSPEC_BNDCN): New.
(UNSPEC_MPX_FENCE): New.
(BND0_REG): New.
(BND1_REG): New.
(type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst.
(length_immediate): Likewise.
(prefix_0f): Likewise.
(memory): Likewise.
(prefix_rep): Check for bnd prefix.
(BND): New.
(bnd_ptr): New.
(BNDCHECK): New.
(bndcheck): New.
(*jcc_1): Add MPX bnd prefix and fix length.
(*jcc_2): Likewise.
(jump): Likewise.
(simple_return_internal): Likewise.
(simple_return_pop_internal): Likewise.
(*indirect_jump): Add MPX bnd prefix.
(*tablejump_1): Likewise.
(simple_return_internal_long): Likewise.
(simple_return_indirect_internal): Likewise.
(mode_mk): New.
(*mode_mk): New.
(movmode): New.
(*movmode_internal_mpx): New.
(mode_bndcheck): New.
(*mode_bndcheck): New.
(mode_ldx): New.
(*mode_ldx): New.
(mode_stx): New.
(*mode_stx): New.
* config/i386/predicates.md (lea_address_operand): Rename to...
(address_no_seg_operand): ... this.
(address_mpx_no_base_operand): New.
(address_mpx_no_index_operand): New.
(bnd_mem_operator): New.
* config/i386/i386.opt (mmpx): New.

Are parts of this patch missing?

The ChangeLog references several changes in the machine independent 
parts of GCC, but I don't see them.  I can recall what they look like, 
but for sanity's sake, I think this patch needs to be reposted.  It's 
likely OK, but let's do the right thing.


jeff



Re: [PATCH] Fix more typos in error messages

2014-05-16 Thread Jeff Law

On 02/25/14 09:21, Benno Schulenberg wrote:

2014-02-07  Benno Schulenberg  bensb...@justemail.net


* config/arc/arc.c (arc_init): Fix typo in error message.
* config/i386/i386.c (ix86_expand_builtin): Likewise.
(split_stack_prologue_scratch_regno): Likewise.
* fortran/check.c (gfc_check_fn_rc2008): Remove duplicate
word from error message.

Thanks.  Installed.  Sorry about the delay.

jeff


[patch, libstdc++] fix TCL error in abi.exp

2014-05-16 Thread Sandra Loosemore
As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for 
installed compiler testing.  There is some logic in abi.exp to check for 
the presence of the required bits and skip these tests if they're not 
there, but before it gets that far it's hitting a TCL error:


ERROR: can't read baseline_subdir_switch: no such variable
while executing
eval exec $cxx $baseline_subdir_switch
invoked from within
set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
...

This patch moves the offending reference to $baseline_subdir_switch 
after the check for missing bits instead of before.  OK to commit?


-Sandra


2014-05-16  Iain Sandoe  i...@codesourcery.com
Sandra Loosemore  san...@codesourcery.com

libstdc++-v3/
* testsuite/libstdc++-abi/abi.exp: Defer setting of baseline_subdir
until after checking that the test is eligible to be run.
Index: libstdc++-v3/testsuite/libstdc++-abi/abi.exp
===
--- libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(revision 210372)
+++ libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(working copy)
@@ -24,8 +24,6 @@ if { [string match *-*-darwin* $target
 set lib $blddir/src/.libs/libstdc++.so
 }
 
-set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
-
 # Build the support objects.
 v3-build_support
 
@@ -35,6 +33,8 @@ if { (${v3-symver} == 0) || ![info exist
 return
 }
 
+set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
+
 set baseline_file \
 [file join $baseline_dir $baseline_subdir baseline_symbols.txt]
 # If there is no ABI-specific reference file use that of the default ABI.


Re: Unreviewed Patch

2014-05-16 Thread Jeff Law

On 02/22/14 16:07, rbmj wrote:

Hi all,

Just a ping, I haven't gotten anything back on this patch:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00621.html
The patch needs to be tested by bootstrapping on another platform and 
performing a regression test.  Most folks use an x86_64 linux system
for that step.


While the odds that this patch breaks things are small, the regression
test and bootstrap have proven quite valuable through the years in
catching issues.


jeff


Re: [patch, libstdc++] fix TCL error in abi.exp

2014-05-16 Thread Jeff Law

On 05/16/14 13:53, Sandra Loosemore wrote:

As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for
installed compiler testing.  There is some logic in abi.exp to check for
the presence of the required bits and skip these tests if they're not
there, but before it gets that far it's hitting a TCL error:

ERROR: can't read baseline_subdir_switch: no such variable
 while executing
eval exec $cxx $baseline_subdir_switch
 invoked from within
set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
...

This patch moves the offending reference to $baseline_subdir_switch
after the check for missing bits instead of before.  OK to commit?

-Sandra


2014-05-16  Iain Sandoe  i...@codesourcery.com
 Sandra Loosemore  san...@codesourcery.com

 libstdc++-v3/
 * testsuite/libstdc++-abi/abi.exp: Defer setting of
baseline_subdir
 until after checking that the test is eligible to be run.

OK.
Jeff



Re: [PATCH][C-family] Fix PR61184

2014-05-16 Thread Jeff Law

On 05/14/14 03:06, Richard Biener wrote:


The following fixes pre/post-inc/dec gimplification of promoted
integer types.  There is the issue with the way TYPE_OVERFLOW_UNDEFINED
is related to TYPE_OVERFLOW_WRAPS and the (non-)semantics of
-fno-strict-overflow.

In this case, with -On -fno-strict-overflow for a variable of
type short we have !TYPE_OVERFLOW_WRAPS _and_ !TYPE_OVERFLOW_UNDEFINED
(so we're in an undefined area).  Which means that
!TYPE_OVERFLOW_UNDEFINED doesn't imply that overflow wraps.

Thus the gimplification has to play on the safe side and
always use an unsigned type unless the user specifies -fwrapv
(the flag with a proper semantic meaning).

That is, it seems to be the case that what predicate to use
(TYPE_OVERFLOW_WRAPS or TYPE_OVERFLOW_UNDEFINED, independent
of whether you invert it) depends on the use-case in a very
awkward (and error-prone) way.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu, ok
if that succeeds (I expect to have to adjust some testcases)?

Thanks,
Richard.

2014-05-14  Richard Biener  rguent...@suse.de

c-family/
* c-gimplify.c (c_gimplify_expr): Gimplify self-modify expressions
using unsigned arithmetic if overflow does not wrap instead of
if overflow is undefined.

* c-c++-common/torture/pr61184.c: New testcase.

Seems reasonable to me.

Jeff



Re: [PATCH 5/5] add libcc1

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Tom Tromey wrote:

 This patch adds the plugin to the gcc tree and updates the top-level
 configury.

I don't see anything obvious that would disable the plugin if plugins are 
unsupported (e.g. on Windows host) or disabled (--disable-plugin).  
Probably the relevant support from gcc/configure.ac needs to go somewhere 
it can be used at toplevel.

+  self->args.push_back ("gcc");

seems wrong - at least you should use the appropriate compiler name after 
transformation for cross compilers / --program-transform-name.  Though 
really the *versioned* driver $(target_noncanonical)-gcc-$(version) is the 
right one to use, in that the plugin should presumably be closely 
associated with a particular compiler installation when e.g. a 
distribution has multiple GCC versions packaged that can be installed 
simultaneously.  Having multiple copies of the plugin installed in the 
same prefix for different targets should be supported.  I'm not clear on 
which library does what - does libcc1 depend in any way on the target or 
GCC version?

How are the compiler multilib options (e.g. -m32 / -m64) specified?  Is 
that something GDB passes through, based on examining properties of the 
binary being debugged?  (Unfortunately there may not be a good way in 
general for GDB to tell what the required options for a particular binary 
are.  It can at least reasonably default to using a copy of the plugin for 
the same target triplet as GDB was configured for.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Oleg Endo
On Fri, 2014-05-16 at 18:10 +0800, Bin.Cheng wrote:
 On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote:
 
  How about the following.
  Instead of adding new hooks and inserting the pass to the general pass 
  list, make the new
  pass class take the necessary callback functions directly.  Then targets 
  can just instantiate
  the pass, passing their impl of the callbacks, and insert the pass object 
  into the pass list at
  a place that fits best for the target.

 Oh, I didn't know we could do this in GCC.  But yes, a target may want to
 run it at some place that fits best for the target.
 

I think it's better than trying to come up with a scheme that so-so fits
all.  My idea would look like:

// merge_paired_loadstore.h
class merge_paired_loadstore : public rtl_opt_pass
{
public:
  struct delegate
  {
virtual bool merge_paired_loadstore (rtx x, rtx y, ...) = 0;
...
  };

  merge_paired_loadstore (gcc::context* ctx, const char* name,
  delegate* d);
  ...
};

// target.cc

#include merge_paired_loadstore.h

static struct target_merge_loadstore_delegate :
merge_paired_loadstore::delegate
{
  virtual bool merge_paired_loadstore (...)
  {
 // code as if this was a freestanding target hook function
  };
} g_merge_loadstore_delegate;


static void target_register_passes (void)
{
  register_pass (
    new merge_paired_loadstore (g, "merge_ls",
				&g_merge_loadstore_delegate),
    PASS_POS_INSERT_AFTER, "other pass", 1);
}


Then, later, maybe sometime in the future, if there's something like a
class target, it'd look like:

class my_target : public target,
  merge_paired_loadstore::delegate
{
   ... 

   virtual bool merge_paired_loadstore (...);
};

Maybe it's a bit far fetched at the moment, but it would be a start.

Cheers,
Oleg



[PATCH] Use optimize_function_for_size_p to assign register frequency

2014-05-16 Thread Dehao Chen
This patch uses optimize_function_for_size_p to replace old
optimize_size check in regs.h and ira-int.h to make it consistent.

Bootstrapped and testing on-going.

OK for trunk if test passes?

Thanks,
Dehao

gcc/ChangeLog:
2014-05-16  Dehao Chen  de...@google.com

* ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Use optimize_function_for_size_p.
* regs.h (REG_FREQ_FROM_BB): Likewise.

Index: gcc/ira-int.h
===
--- gcc/ira-int.h   (revision 210527)
+++ gcc/ira-int.h   (working copy)
@@ -42,9 +42,8 @@ along with GCC; see the file COPYING3.  If not see
profile driven feedback is available and the function is never
executed, frequency is always equivalent.  Otherwise rescale the
edge frequency.  */
-#define REG_FREQ_FROM_EDGE_FREQ(freq)                           \
-  (optimize_size || (flag_branch_probabilities                  \
-                     && !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)  \
+#define REG_FREQ_FROM_EDGE_FREQ(freq)                           \
+  (optimize_function_for_size_p (cfun)                          \
    ? REG_FREQ_MAX : (freq * REG_FREQ_MAX / BB_FREQ_MAX)         \
    ? (freq * REG_FREQ_MAX / BB_FREQ_MAX) : 1)

Index: gcc/regs.h
===
--- gcc/regs.h  (revision 210527)
+++ gcc/regs.h  (working copy)
@@ -135,9 +135,7 @@ extern size_t reg_info_p_size;
or profile driven feedback is available and the function is never executed,
frequency is always equivalent.  Otherwise rescale the basic block
frequency.  */
-#define REG_FREQ_FROM_BB(bb) (optimize_size                     \
-                              || (flag_branch_probabilities     \
-                                  && !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count) \
+#define REG_FREQ_FROM_BB(bb) (optimize_function_for_size_p (cfun) \
                               ? REG_FREQ_MAX                    \
                               : ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX) \
                               ? ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX) \


Re: [PATCH 5/5] add libcc1

2014-05-16 Thread Tom Tromey
 Joseph == Joseph S Myers jos...@codesourcery.com writes:

+  self->args.push_back ("gcc");

Joseph seems wrong - at least you should use the appropriate compiler
Joseph name after transformation for cross compilers /
Joseph --program-transform-name.

Ok, we'll look into it.

Joseph I'm not clear on which library does what - does libcc1 depend in
Joseph any way on the target or GCC version?

It doesn't depend on the target at all.  It does rely on the GCC plugin
using the same .def file -- in fact we should probably add a handshake
at the start so that we don't wind up with a version mismatch there.
The plugin is of course exposed to GCC internals changes as all plugins
are; but this is why we think it should be part of GCC.

Joseph How are the compiler multilib options (e.g. -m32 / -m64)
Joseph specified?  Is that something GDB passes through, based on
Joseph examining properties of the binary being debugged?
Joseph (Unfortunately there may not be a good way in general for GDB to
Joseph tell what the required options for a particular binary are.  It
Joseph can at least reasonably default to using a copy of the plugin
Joseph for the same target triplet as GDB was configured for.)

gdb extracts some information from the debuginfo.
Some information is provided by a new gdbarch method; for example this
is where -m32 / -m64 / -m31 (hi S390) come from.

I am not sure if we can get the proper $(target_noncanonical) for a
given gdbarch.  We'll look into it.

Tom

