[PATCH][4.8] S/390: Transaction optimization fixes

2013-10-04 Thread Andreas Krebbel
Hi,

the attached patch fixes two issues with the TX backend optimization
trying to get rid of FPR save/restore operations in some cases.

The first is that the stack pointer might get clobbered on 64 bit if
the backend was able to get rid of all the FPR saves/restores and
these were the only things requiring stack space. For more details
please see the testcase comment.

The second is that the optimization did not work as expected on 31 bit,
since the loop setting the FPR save bits covered only the 64 bit
call-saved FPRs instead of using the call-clobbered information.

Both issues are covered by the new testcase.

Bootstrapped and regtested on s390 and s390x with --with-arch=zEC12.
No regressions.

I'm going to apply the patch to 4.8 and mainline after waiting a few
days for comments.

Bye,

-Andreas-

2013-10-04  Andreas Krebbel  <andreas.kreb...@de.ibm.com>

* config/s390/s390.c (s390_register_info): Make the call-saved FPR
loop work also for the 31 bit ABI.
Save the stack pointer for frame_size > 0.

2013-10-04  Andreas Krebbel  <andreas.kreb...@de.ibm.com>

* gcc.target/s390/htm-nofloat-2.c: New testcase.

---
 gcc/config/s390/s390.c|   21 
 gcc/testsuite/gcc.target/s390/htm-nofloat-2.c |   55 ++
 2 files changed, 57 insertions(+), 8 deletions(-), 11 modifications(!)

Index: gcc/config/s390/s390.c
===
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*** s390_register_info (int clobbered_regs[]
*** 7509,7516 
  {
cfun_frame_layout.fpr_bitmap = 0;
cfun_frame_layout.high_fprs = 0;
!   if (TARGET_64BIT)
!   for (i = FPR8_REGNUM; i <= FPR15_REGNUM; i++)
  /* During reload we have to use the df_regs_ever_live infos
 since reload is marking FPRs used as spill slots there as
 live before actually making the code changes.  Without
--- 7509,7519 
  {
cfun_frame_layout.fpr_bitmap = 0;
cfun_frame_layout.high_fprs = 0;
! 
!   for (i = FPR0_REGNUM; i <= FPR15_REGNUM; i++)
!   {
! if (call_really_used_regs[i])
!   continue;
  /* During reload we have to use the df_regs_ever_live infos
 since reload is marking FPRs used as spill slots there as
 live before actually making the code changes.  Without
*** s390_register_info (int clobbered_regs[]
*** 7523,7530 
   && !global_regs[i])
{
  cfun_set_fpr_save (i);
! cfun_frame_layout.high_fprs++;
}
  }
  
  for (i = 0; i < 16; i++)
--- 7526,7536 
   && !global_regs[i])
{
  cfun_set_fpr_save (i);
! 
! if (i >= FPR8_REGNUM)
!   cfun_frame_layout.high_fprs++;
}
+   }
  }
  
  for (i = 0; i < 16; i++)
*** s390_register_info (int clobbered_regs[]
*** 7554,7559 
--- 7560,7566 
|| TARGET_TPF_PROFILING
|| cfun_save_high_fprs_p
	|| get_frame_size () > 0
+   || (reload_completed && cfun_frame_layout.frame_size > 0)
|| cfun-calls_alloca
|| cfun-stdarg);
  
*** s390_register_info (int clobbered_regs[]
*** 7652,7665 
cfun_set_fpr_save (i + FPR0_REGNUM);
}
  }
- 
-   if (!TARGET_64BIT)
- {
-   if (df_regs_ever_live_p (FPR4_REGNUM) && !global_regs[FPR4_REGNUM])
-   cfun_set_fpr_save (FPR4_REGNUM);
-   if (df_regs_ever_live_p (FPR6_REGNUM) && !global_regs[FPR6_REGNUM])
-   cfun_set_fpr_save (FPR6_REGNUM);
- }
  }
  
  /* Fill cfun-machine with info about frame of current function.  */
--- 7659,7664 
Index: gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
===
*** /dev/null
--- gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
***
*** 0 
--- 1,55 
+ /* { dg-do run } */
+ /* { dg-options "-O3 -mhtm -Wa,-march=zEC12 --save-temps" } */
+ 
+ /* __builtin_tbegin has to emit clobbers for all FPRs since the tbegin
+    instruction does not automatically preserve them.  If the
+transaction body is fully contained in a function the backend tries
+after reload to get rid of the FPR save/restore operations
+triggered by the clobbers.  This testcase failed since the backend
+    was able to get rid of all FPR saves/restores and, since these were
+    the only users of stack space, of the entire stack frame.  So even
+the save/restore of the stack pointer was omitted in the end.
+However, since the frame layout has been fixed before, the prologue
+still generated the stack pointer decrement making foo return with
+a modified stack pointer.  */
+ 
+ void abort(void);
+ 
+ void __attribute__((noinline))
+ foo (int a)
+ {
+   /* This is just to prevent the tbegin code from actually being
+  executed.  That way the test may even run 

[SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Oleg Endo
Hello,

Some of the things I've done in 4.8 to improve SH T bit handling turned
out to produce wrong code.  The attached patch fixes that by introducing
an SH specific RTL pass.

Tested on rev 202876 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Additional test cases will follow.
OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/ifcvt_sh.cc: New SH specific RTL pass.
* config.gcc (SH extra_objs): Add ifcvt_sh.o.
* config/sh/t-sh (ifcvt_sh.o): New entry.
* config/sh/sh.c (sh_fixed_condition_code_regs): New function 
that implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
(register_sh_passes): New function.  Register ifcvt_sh pass.
(sh_option_override): Invoke it.
(sh_canonicalize_comparison): Handle op0_preserve_value.
* sh.md (*cbranch_t): Do not try to optimize missed test and 
branch opportunities.  Canonicalize branch condition.
(nott): Allow only if pseudos can be created for non-SH2A.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 202876)
+++ gcc/config.gcc	(working copy)
@@ -462,6 +462,7 @@
 	cpu_type=sh
 	need_64bit_hwint=yes
 	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="${extra_objs} ifcvt_sh.o"
 	;;
 v850*-*-*)
 	cpu_type=v850
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 202876)
+++ gcc/config/sh/sh.c	(working copy)
@@ -53,6 +53,9 @@
 #include "alloc-pool.h"
 #include "tm-constrs.h"
 #include "opts.h"
+#include "tree-pass.h"
+#include "pass_manager.h"
+#include "context.h"
 
 #include <sstream>
 #include <vector>
@@ -311,6 +314,7 @@
 static void sh_canonicalize_comparison (int *, rtx *, rtx *, bool);
 static void sh_canonicalize_comparison (enum rtx_code, rtx, rtx,
 	enum machine_mode, bool);
+static bool sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2);
 
 static void sh_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
 
@@ -587,6 +591,9 @@
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON	sh_canonicalize_comparison
 
+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS sh_fixed_condition_code_regs
+
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION	(SYMBOL_FLAG_MACH_DEP << 0)
 
@@ -710,6 +717,34 @@
 #undef err_ret
 }
 
+/* Register SH specific RTL passes.  */
+extern opt_pass* make_pass_ifcvt_sh (gcc::context* ctx, bool split_insns,
+ const char* name);
+static void
+register_sh_passes (void)
+{
+  if (!TARGET_SH1)
+return;
+
+/* Running the ifcvt_sh pass after ce1 generates better code when
+   comparisons are combined and reg-reg moves are introduced, because
+   reg-reg moves will be eliminated afterwards.  However, there are quite
+   some cases where combine will be unable to fold comparison related insns,
+   thus for now don't do it.
+  register_pass (make_pass_ifcvt_sh (g, false, "ifcvt1_sh"),
+		 PASS_POS_INSERT_AFTER, "ce1", 1);
+*/
+
+  /*  Run ifcvt_sh pass after combine but before register allocation.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt2_sh"),
+		 PASS_POS_INSERT_AFTER, "split1", 1);
+
+  /* Run ifcvt_sh pass after register allocation and basic block reordering
+ as this sometimes creates new opportunities.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt3_sh"),
+		 PASS_POS_INSERT_AFTER, "split4", 1);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE macro.  Validate and override 
various options, and do some machine dependent initialization.  */
 static void
@@ -1022,6 +1057,8 @@
  target CPU.  */
   selected_atomic_model_
 = parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
 }
 
 /* Print the operand address in x to the stream.  */
@@ -1908,7 +1945,7 @@
 static void
 sh_canonicalize_comparison (enum rtx_code cmp, rtx op0, rtx op1,
 			enum machine_mode mode,
-			bool op0_preserve_value ATTRIBUTE_UNUSED)
+			bool op0_preserve_value)
 {
   /* When invoked from within the combine pass the mode is not specified,
  so try to get it from one of the operands.  */
@@ -1928,6 +1965,9 @@
   // Make sure that the constant operand is the second operand.
   if (CONST_INT_P (op0) && !CONST_INT_P (op1))
 {
+  if (op0_preserve_value)
+	return;
+
   std::swap (op0, op1);
   cmp = swap_condition (cmp);
 }
@@ -2016,6 +2056,14 @@
   *code = (int)tmp_code;
 }
 
+bool
+sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
+{
+  *p1 = T_REG;
+  *p2 = INVALID_REGNUM;
+  return true;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
 			  enum rtx_code comparison)
Index: gcc/config/sh/sh.md
===
--- 

[C++ Patch] PR 58560

2013-10-04 Thread Paolo Carlini

Hi,

this error recovery ICE (a low priority regression) can be easily 
avoided by checking the TREE_TYPE of exp too. Tested x86_64-linux.
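For readers unfamiliar with the helper: error_operand_p treats a tree as erroneous if either the node itself or its TREE_TYPE is error_mark_node, which is exactly the extra case the old `exp == error_mark_node` test missed. A minimal stand-alone model of that check (the struct and sentinel here are illustrative, not GCC's actual tree layout):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for GCC's tree nodes: each node carries a type
   pointer, and one shared sentinel plays the role of error_mark_node.  */
struct node { struct node *type; };

struct node error_mark;   /* models error_mark_node */

/* Models error_operand_p: true if the node is the error sentinel itself,
   or if its type is the error sentinel.  */
int model_error_operand_p (const struct node *t)
{
  return t == &error_mark || (t != NULL && t->type == &error_mark);
}
```

The patch's point is the second disjunct: a node that is not error_mark_node can still carry an erroneous type, as the auto39.C testcase produces during error recovery.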


Thanks,
Paolo.

///
/cp
2013-10-04  Paolo Carlini  <paolo.carl...@oracle.com>

PR c++/58560
* typeck2.c (build_functional_cast): Use error_operand_p on exp.

/testsuite
2013-10-04  Paolo Carlini  <paolo.carl...@oracle.com>

PR c++/58560
* g++.dg/cpp0x/auto39.C: New.
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 203200)
+++ cp/typeck2.c(working copy)
@@ -1757,7 +1757,7 @@ build_functional_cast (tree exp, tree parms, tsubs
   tree type;
  vec<tree, va_gc> *parmvec;
 
-  if (exp == error_mark_node || parms == error_mark_node)
+  if (error_operand_p (exp) || parms == error_mark_node)
 return error_mark_node;
 
   if (TREE_CODE (exp) == TYPE_DECL)
Index: testsuite/g++.dg/cpp0x/auto39.C
===
--- testsuite/g++.dg/cpp0x/auto39.C (revision 0)
+++ testsuite/g++.dg/cpp0x/auto39.C (working copy)
@@ -0,0 +1,6 @@
+// PR c++/58560
+// { dg-do compile { target c++11 } }
+
+typedef auto T; // { dg-error "typedef declared 'auto'" }
+
+void foo() { T(); }


Enable SSE math on i386 with -Ofast

2013-10-04 Thread Jan Hubicka
Hi,
this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on
SPECfp (2000 and 2006), even though, for example, the following
float a(float b)
{
   return b+10;
}

results in somewhat ridiculous
a:
.LFB0:  
.cfi_startproc
subl$4, %esp
.cfi_def_cfa_offset 8
movss   .LC0, %xmm0
addss   8(%esp), %xmm0
movss   %xmm0, (%esp)
flds(%esp)
addl$4, %esp
.cfi_def_cfa_offset 4
ret

I wonder if we can get rid of at least the redundant stack alignment on ESP...

Bootstrapped/regtested x86_64-linux, will commit it on weekend if there are no
complains.  I wonder if -ffast-math should do the same - it is documented as 
enabling
explicit set of options, bu that can be changed I guess.

* invoke.texi (Ofast): Update documentation.
* i386.h (TARGET_FPMATH_DEFAULT): Enable SSE math with -Ofast.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 203161)
+++ doc/invoke.texi (working copy)
@@ -6796,6 +6796,7 @@ Disregard strict standards compliance.
 valid for all standard-compliant programs.
 It turns on @option{-ffast-math} and the Fortran-specific
 @option{-fno-protect-parens} and @option{-fstack-arrays}.
+On the i386 target it also enables @option{-mfpmath=sse}.
 
 @item -Og
 @opindex Og
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 203161)
+++ config/i386/i386.h  (working copy)
@@ -209,7 +209,8 @@ extern const struct processor_costs ix86
 
 #ifndef TARGET_FPMATH_DEFAULT
 #define TARGET_FPMATH_DEFAULT \
-  (TARGET_64BIT && TARGET_SSE ? FPMATH_SSE : FPMATH_387)
+  ((TARGET_64BIT && TARGET_SSE) \
+   || (TARGET_SSE && optimize_fast) ? FPMATH_SSE : FPMATH_387)
 #endif
 
 #define TARGET_FLOAT_RETURNS_IN_80387 TARGET_FLOAT_RETURNS


Re: [C++ Patch] PR 58448

2013-10-04 Thread Paolo Carlini

... and this is a more straightforward approach. Also tested x86_64-linux.

Thanks!
Paolo.

/
/cp
2013-10-04  Paolo Carlini  <paolo.carl...@oracle.com>

PR c++/58448
* pt.c (tsubst): Use error_operand_p on parameter t.

/testsuite
2013-10-04  Paolo Carlini  <paolo.carl...@oracle.com>

PR c++/58448
* g++.dg/template/crash117.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 203200)
+++ cp/pt.c (working copy)
@@ -11272,7 +11272,7 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   enum tree_code code;
   tree type, r = NULL_TREE;
 
-  if (t == NULL_TREE || t == error_mark_node
+  if (t == NULL_TREE
   || t == integer_type_node
   || t == void_type_node
   || t == char_type_node
@@ -11281,6 +11281,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   || TREE_CODE (t) == TRANSLATION_UNIT_DECL)
 return t;
 
+  if (error_operand_p (t))
+return error_mark_node;
+
   if (DECL_P (t))
 return tsubst_decl (t, args, complain);
 
Index: testsuite/g++.dg/template/crash117.C
===
--- testsuite/g++.dg/template/crash117.C(revision 0)
+++ testsuite/g++.dg/template/crash117.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/58448
+
+class SmallVector; struct Types4;
+template <typename, typename, typename, typename> struct Types {
+  typedef Types4::Constructable // { dg-error "template|typedef|expected" }
+} Types<SmallVector, SmallVector, SmallVector, SmallVector>::  // { dg-error 
"expected" }


Re: patch to canonicalize tree-csts.

2013-10-04 Thread Kenneth Zadeck
So here is a patch with the change. As before, bootstrapped an tested on 
x86-64.


On 10/03/2013 12:16 PM, Richard Sandiford wrote:

Kenneth Zadeck <zad...@naturalbridge.com> writes:

Changing the representation of unsigned constants is only worthwhile
if we can avoid the force_to_size for some unsigned cases.  I think we can
avoid it for precision = xprecision  !small_prec.  Either we should take
the hit of doing that comparison (but see below) or the change isn't
worthwhile.

i think that it is something closer to precision >= xprecision +
HOST_BITS_PER_WIDE_INT ...
The problem is that the tree cst may have one extra block beyond the
precision.

Ah, OK.


I was thinking that we should always be able to use the constant as-is
for max_wide_int-based and addr_wide_int-based operations.  The small_prec

again, you can get edge cased to death here.  I think it would work
for max because that really is bigger than anything else, but it is
possible (though unlikely) to have something big converted to an address
by truncation.

But I'd have expected that conversion to be represented by an explicit
CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
something that isn't bit- or byte-address-sized.  It'd be the C equivalent
of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific
precision, it seems odd for one of the inputs to be wider and yet
not have an explicit truncation.
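The analogy can be checked directly in C: under the usual arithmetic conversions the narrower int operand is widened to long, so mixed arithmetic yields the wider type, and narrowing only happens through an explicit conversion. A small sketch (function names are illustrative):

```c
#include <assert.h>

/* Nonzero if (int + long) has the size of long, i.e. the usual
   arithmetic conversions widened the int operand rather than
   truncating the long one.  */
int int_plus_long_is_long (void)
{
  int i = 1;
  long l = 2L;
  return sizeof (i + l) == sizeof (long);
}

/* Truncation in the other direction requires an explicit cast - the
   C analogue of representing it with a CONVERT_EXPR/NOP_EXPR.  */
int explicit_truncation (long v)
{
  return (int) v;
}
```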

Thanks,
Richard


Index: gcc/tree.c
===
--- gcc/tree.c	(revision 203039)
+++ gcc/tree.c	(working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tree t;
   int ix = -1;
   int limit = 0;
-  int i;
+  unsigned int i;
 
   gcc_assert (type);
-  int prec = TYPE_PRECISION (type);
+  unsigned int prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);
 
   /* Verify that everything is canonical.  */
@@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
 }
 
   wide_int cst = wide_int::from (pcst, prec, sgn);
-  int len = int (cst.get_len ());
-  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  unsigned int len = int (cst.get_len ());
+  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
- && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
- && small_prec;
+ && small_prec
+ && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;
 
   switch (TREE_CODE (type))
 {
@@ -1235,7 +1235,7 @@ wide_int_to_tree (tree type, const wide_
 
 case INTEGER_TYPE:
 case OFFSET_TYPE:
-  if (TYPE_UNSIGNED (type))
+  if (TYPE_SIGN (type) == UNSIGNED)
 	{
 	  /* Cache 0..N */
 	  limit = INTEGER_SHARE_LIMIT;
@@ -1294,7 +1294,7 @@ wide_int_to_tree (tree type, const wide_
 	 must be careful here because tree-csts and wide-ints are
 	 not canonicalized in the same way.  */
 	  gcc_assert (TREE_TYPE (t) == type);
-	  gcc_assert (TREE_INT_CST_NUNITS (t) == len);
+	  gcc_assert (TREE_INT_CST_NUNITS (t) == (int)len);
 	  if (recanonize)
 	{
 	  len--;
@@ -1321,7 +1321,10 @@ wide_int_to_tree (tree type, const wide_
 	  TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix) = t;
 	}
 }
-  else if (cst.get_len () == 1)
+  else if (cst.get_len () == 1
+	   && (TYPE_SIGN (type) == SIGNED
+	       || recanonize
+	       || cst.elt (0) >= 0))
 {
   /* 99.99% of all int csts will fit in a single HWI.  Do that one
 	 efficiently.  */
@@ -1351,14 +1354,29 @@ wide_int_to_tree (tree type, const wide_
 	 for the gc to take care of.  There will not be enough of them
 	 to worry about.  */
   void **slot;
-  tree nt = make_int_cst (len);
-  TREE_INT_CST_NUNITS (nt) = len;
+  tree nt;
+  if (!recanonize
+	   && TYPE_SIGN (type) == UNSIGNED
+	   && cst.elt (len - 1) < 0)
+	{
+	  unsigned int blocks_needed 
+	= (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
+
+	  nt = make_int_cst (blocks_needed + 1);
+	  for (i = len; i < blocks_needed; i++)
+	TREE_INT_CST_ELT (nt, i) = (HOST_WIDE_INT)-1;
+
+	  TREE_INT_CST_ELT (nt, blocks_needed) = 0;
+	}
+  else
+	nt = make_int_cst (len);
   if (recanonize)
 	{
 	  len--;
 	  TREE_INT_CST_ELT (nt, len) = zext_hwi (cst.elt (len), small_prec);
 	}
-  for (int i = 0; i < len; i++)
+	
+  for (i = 0; i < len; i++)
 	TREE_INT_CST_ELT (nt, i) = cst.elt (i);
   TREE_TYPE (nt) = type;
 
@@ -10556,7 +10574,8 @@ widest_int_cst_value (const_tree x)
 
 #if HOST_BITS_PER_WIDEST_INT > HOST_BITS_PER_WIDE_INT
   gcc_assert (HOST_BITS_PER_WIDEST_INT >= HOST_BITS_PER_DOUBLE_INT);
-  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2);
+  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2
+	  || (TREE_INT_CST_NUNITS (x) == 3 && TREE_INT_CST_ELT (x, 2) == 0));
   
   if (TREE_INT_CST_NUNITS (x) == 1)
 val = ((HOST_WIDEST_INT)val << HOST_BITS_PER_WIDE_INT) >> HOST_BITS_PER_WIDE_INT;
@@ -10565,7 +10584,8 @@ 

Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Ramana Radhakrishnan

On 10/02/13 23:49, Rong Xu wrote:

Here is the new patch. Honaz: Could you take a look?

Thanks,

-Rong

On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka <hubi...@ucw.cz> wrote:

Thanks for the suggestion. This is much cleaner than using a binary parameter.

Just want to make sure I understand it correctly about the original hitrate:
you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
the one specified in the builtin-expect-probability parameter.


Yes.


Should I use 90% as the default? It's hard to fit current value 0.9996
in percent form.


Yes, 90% seems fine.  The original value was set quite arbitrarily and no real
performance study was made as far as I know except yours. I think users that
are sure they use expect to guard completely cold edges may just use 100%
instead of 0.9996, so I would not worry much about the precision.
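For context, __builtin_expect evaluates to its first argument unchanged; the second argument is only a hint, and the probability attached to the hinted edge is what the builtin-expect-probability parameter controls (90% proposed here instead of the old 0.9996). A minimal usage sketch, with an illustrative function of my own:

```c
#include <assert.h>

/* __builtin_expect (e, c) evaluates to e; c only hints that e is
   expected to equal c.  The branch hinted as unlikely gets probability
   1 - builtin-expect-probability under this patch.  */
int checked_div (int a, int b)
{
  if (__builtin_expect (b == 0, 0))   /* hinted as the cold path */
    return 0;
  return a / b;
}
```

The hint never changes the computed value, only the block layout and frequency estimates, which is why retiring the hard-coded hitrate in favor of the parameter is purely a heuristics change.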

Honza


-Rong


OK with that change.

Honza



This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug 
further yet but still reducing the testcase.


See

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

for details.

Can you please deal with this appropriately ?

regards
Ramana




Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Andrew Sutton
 +  // Do not permit the declaration of constrained friend
 +  // function declarations. They cannot be instantiated since
 +  // the resulting declaration would never match the definition,
 +  // which must be a non-template and cannot be constrained.


 You're in the template-id code here, so must be a non-template is
 confusing:

 template <class T> void f();

 struct A {
   friend void f<int>(); // matches a template
 };

 Perhaps you mean that it must match a fully-instantiated function, so any
 constraints on the templates were considered during
 determine_specialization.

This seems like a simple comment fix, but there's a longer explanation
of what I want (see below). Would this be more appropriate?

  // Do not allow constrained friend template specializations.

The intent is stronger than to say it must match something. I don't
want to allow any declarations of the form

template<typename T>
struct X {
  friend void f(T x) requires C<T>; // Error.
};

This should never even get to determine_specialization since the
original declaration is never actually pushed.

We could use those constraints to match the specialization to one of
several constrained overloads, as you mentioned earlier, but I'd
rather avoid that for now. Solving that problem in general would
require that we allow constrained (explicit) specializations and
define a method of matching instantiated constraints to dependent
constraints, and that we do so as an alternative to the usual
constraint checking during template argument deduction.

Maybe it's a useful feature, but it's really hard to gauge how much
use it would actually get. We can always revisit that in the future.
Somebody else can write that paper :)

Andrew


Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC

2013-10-04 Thread Joseph S. Myers
On Thu, 3 Oct 2013, David Malcolm wrote:

 Right now all you get back from the result is a void* which you're
 meant to cast to machine code for the CPU.   I guess we could add an

And right now the library is calling dlopen.  Which means that although 
what the user gets is a void *, the dynamic linker processing has dealt 
with registering eh_frame information and the right hooks have been passed 
through for GDB to see that an object has been loaded and access its debug 
information.  Is this use of dlopen intended to be part of the interface, 
or just a temporary hack with users eventually needing to use other 
interfaces for eh_frame and debug info handling?  (GDB has some support 
for JITs, but I haven't looked into the details of how it works.)

 option on the gcc_jit_context for setting which ISA you want code for.

Even apart from completely separate ISAs, there's also the matter of other 
command-line options such as -march=, which I'd think users should be able 
to specify.  And the complication that default options to cc1 can be 
determined by specs, whether in .h files or from configure options such as 
--with-arch=, so cc1's defaults may not be the same as the defaults when 
you run the driver (which are likely to be better defaults for the JIT 
than cc1's).  And if the user wants to use -march=native (which seems 
sensible for the common use case of a JIT, generating code for the same 
system as runs the compiler), that's handled through specs.

Maybe the driver should be converted to library code so it's possible to 
run command-line options through it and generate the command line that 
would get passed to cc1 (but then use that to call a function in the same 
process - with a different copy of global options data, diagnostic context 
etc. to avoid any issues from those being initialized twice - rather than 
running a subprocess).  That would also allow generating the right options 
for the assembler to pass to the library version of the assembler.

 * There are some grotesque kludges in internal-api.c, especially in
   how we go from .s assembler files to a DSO (grep for "gross hack" ;) )
  
  Apart from making the assembler into a shared library itself, it would 
  also be nice to avoid needing those files at all (via an API for writing 
  assembler text that doesn't depend on a FILE * pointer, although on GNU 
  hosts you can always use in-memory files via open_memstream, which would 
  make the internal changes much smaller).  But in the absence of such a 
  cleanup, running the assembler via the driver should work, although 
  inefficient.
 
 (nods)   Note that all of the kludgery (if that's a word) is hidden
 inside the library, so we can fix it all up without affecting client
 code: the client-facing API doesn't expose any of this.
 
 FWIW I added timevars for measuring the invocation of the driver; that
 kludge makes up about 50% of the overall time taken.  I haven't yet
 broken that down into assembler vs linker vs fork/exec overhead, but
 clearly this is something that could use optimization - leading to
 (very) vague thoughts involving turning parts of binutils into libraries
 also.

First I guess it might simply be a library that receives blocks of text 
that currently go to asm_out_file - but eventually there might be 
efficiency to be gained by passing binary data to the assembler in some 
cases (e.g. for large blocks of debug info) rather than generating and 
then parsing text.  (So some asm_out methods would gain implementations 
passing such data instead of generating text.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING] [PATCH] Add a new option -ftree-bitfield-merge (patch / doc inside)

2013-10-04 Thread Zoran Jovanovic
Just to ping this patch.

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01829.html

Regards,
Zoran Jovanovic

From: Zoran Jovanovic
Sent: Tuesday, September 24, 2013 11:59 PM
To: gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: RE: [PATCH] Add a new option -ftree-bitfield-merge (patch / doc 
inside)

Hello,
This is new patch version.
Comments from Bernhard Reutner-Fischer review applied.
Also, test case bitfildmrg2.c modified - it is now execute test.


Example:

Original code:
  <unnamed-unsigned:3> D.1351;
  <unnamed-unsigned:9> D.1350;
  <unnamed-unsigned:7> D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
  <unnamed-unsigned:19> D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;

The algorithm works on the basic block level and consists of the following 3 major steps:
1. Go through the basic block statement list. If there are statement pairs that 
implement a copy of bit field content from one memory location to another, record 
the statement pointers and other necessary data in a corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Modify trees accordingly.
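In C source, the pattern the pass targets looks like the following; each field assignment becomes one bit-field load/store pair at the tree level, and the optimization replaces the three pairs with a single 19-bit access (field widths taken from the dump above):

```c
#include <assert.h>
#include <string.h>

/* Three adjacent bit-fields, 7 + 9 + 3 = 19 bits, matching the dump.  */
struct s
{
  unsigned int f1 : 7;
  unsigned int f2 : 9;
  unsigned int f3 : 3;
};

/* Field-by-field copy: without -ftree-bitfield-merge this is three
   separate bit-field loads and stores; with it, one merged access.  */
void copy_fields (struct s *p2, const struct s *p1)
{
  p2->f1 = p1->f1;
  p2->f2 = p1->f2;
  p2->f3 = p1->f3;
}
```

The merge is only an internal-representation change; the observable copy semantics are identical either way.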

New command line option -ftree-bitfield-merge is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in : Added tree-sra.c to GTFILES.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to -ftree-bitfield-merge.
  * tree-sra.c (ssa_bitfield_merge): New function.
  Entry for (-ftree-bitfield-merge).
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (cmp_access): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2e3f7a..54aa8e7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@

 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 202e169..afac514 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2164,6 +2164,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA-normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa0f4ed..e588cae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7679,6 +7679,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.

+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 95049e4..e74db17 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned 
int);
 static tree field_type (const_tree);
 

Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Xinliang David Li
Dehao, can you take a look?

David

 On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan <ramra...@arm.com> wrote:
 On 10/02/13 23:49, Rong Xu wrote:

 Here is the new patch. Honaz: Could you take a look?

 Thanks,

 -Rong

  On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka <hubi...@ucw.cz> wrote:

 Thanks for the suggestion. This is much cleaner than using a binary
 parameter.

 Just want to make sure I understand it correctly about the original
 hitrate:
 you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
 the one specified in the builtin-expect-probability parameter.


 Yes.


 Should I use 90% as the default? It's hard to fit current value 0.9996
 in percent form.


 Yes, 90% seems fine.  The original value was set quite arbitrarily and no
 real
 performance study was made as far as I know except yours. I think users
 that
 are sure they use expect to guard completely cold edges may just use
 100%
 instead of 0.9996, so I would not worry much about the precision.

 Honza


 -Rong


 OK with that change.

 Honza



 This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further
 yet but still reducing the testcase.

 See

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

 for details.

 Can you please deal with this appropriately ?

 regards
 Ramana




Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-04 Thread Xinliang David Li
On Thu, Oct 3, 2013 at 10:15 PM, Teresa Johnson <tejohn...@google.com> wrote:
 This patch handles the fact that small profile count values sometimes
 get truncated down to 0 during updates due to integer arithmetic. This
 causes sub-optimal function splitting under
 -freorder-blocks-and-partition.

 The first part fixes the logic in probably_never_executed that looks
 at the bb frequency when the bb count is zero. It now correctly
 compares the count computed from the frequency to the number of
 profile runs instead of to REG_BR_PROB_BASE. The minimum ratio of
 profile counts to training runs required for a block to not be
 considered cold is now encoded in a parameter, with the default value
 of 4 to match the existing code.

 The second part ensures that frequencies are correctly updated after
 inlining. The problem is that after inlining the frequencies were
 being recomputed directly from the corresponding bb counts in
 rebuild_frequencies. If the counts had been truncated to 0, then the
 recomputed frequencies would become 0 as well (often leading to
 inconsistencies in the frequencies between blocks). Then the above
 change to probably_never_executed would not help identify these blocks
 as non-cold from the frequencies. The fix was to use the existing
 logic used during static profile rebuilding to also
 recompute/propagate the frequencies from the branch probabilities in
 the profile feedback case. I also renamed a number of estimate_*
 routines to compute_* and updated the comments to reflect the fact
 that these routines are not doing estimation (from a static profile),
 but in fact recomputing/propagating frequencies from the existing
 (either guessed or profile-feedback-based) probabilities.

For the second part, it seems to assume the branch probabilities are
better maintained than bb counts?

David


 Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
 (One more round of regression testing in progress after making some
 slight cleanup changes.)

 Thanks,
 Teresa

 2013-10-03  Teresa Johnson  tejohn...@google.com

 * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
 * predict.c (probably_never_executed): Compare frequency-based
 count to number of training runs.
 (tree_estimate_probability): Update function call to new name
 compute_bb_frequencies and pass new parameter.
 (compute_loops_at_level): Renamed.
 (compute_loops): Ditto.
 (compute_bb_frequencies): Renamed, and new parameter to
 force recomputing frequencies.
 (rebuild_frequencies): Recompute bb frequencies from the
 probabilities instead of from counts due to truncation issues.
 * predict.h (compute_bb_frequencies): Update declaration.

 Index: params.def
 ===
 --- params.def  (revision 203152)
 +++ params.def  (working copy)
 @@ -373,6 +373,12 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
  Select fraction of the maximal frequency of executions of
 basic block in function given basic block needs to have to be
 considered hot,
  1000, 0, 0)

  +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
  +	 "min-hot-run-ratio",
  +	 "The minimum ratio of profile runs to basic block execution count "
  +	 "for the block to be considered hot",
  +	 4, 0, 1)
 +
  DEFPARAM (PARAM_ALIGN_THRESHOLD,
   align-threshold,
   Select fraction of the maximal frequency of executions of
 basic block in function given basic block get alignment,
 Index: predict.c
 ===
 --- predict.c   (revision 203152)
 +++ predict.c   (working copy)
 @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
gcc_checking_assert (fun);
if (profile_status_for_function (fun) == PROFILE_READ)
  {
  -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
  +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
  +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
 	return false;
    if (!frequency)
 	return true;
    if (!ENTRY_BLOCK_PTR->frequency)
 	return false;
  -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
  +  if (ENTRY_BLOCK_PTR->count)
  	{
  -	  return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
  -			ENTRY_BLOCK_PTR->frequency)
  -		  < REG_BR_PROB_BASE / 4);
  +	  gcov_type scaled_count
  +	    = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
  +	  gcov_type computed_count;
  +	  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
  +	     be large enough to do the division first without losing much
  +	     precision.  */
  +	  if (scaled_count / frequency / min_run_ratio != ENTRY_BLOCK_PTR->count)
  +	    {
  +	      computed_count = RDIV (ENTRY_BLOCK_PTR->count,
 + 

Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini

Hi,

On 10/04/2013 04:13 PM, Tim Shen wrote:

This is based on discussions here
(http://gcc.gnu.org/ml/libstdc++/2013-10/msg00034.html).

And it successfully found a bug in regex_executor.tcc :)

Booted and tested under -m32, -m64 and debug before the change in
regex_executor.tcc;
-m32 and -m64 only for regex_executor.tcc, but I'll start a bootstrap now.
Can you please add short comments inline in the code explaining the 
semantics of the new functions? A short comment before each one. In 
particular, for the new *_testsuite functions, it isn't immediately clear 
how they differ from the non-_testsuite variants. In any case we should 
figure out a better name, maybe even *_internal, if we can't find 
anything more accurate, but it should be self-explaining in terms of 
their C++ semantics.


Please also document in a comment before the definition of 
__regex_search_impl the __policy parameter (not just in the body of the 
function). And I don't see why we can't use an enumeration instead of 
integers, like


enum class _RegexExecutorPolicy : int { _AutoExecutor, _DFSExecutor, 
_BFSExecutor };


More trivial issues: inline functions (like all the new _testsuite ones) 
should be in .h, not in .tcc. Some names are unuglified, like res and res2.


Paolo.


Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini

On 10/04/2013 06:04 PM, Paolo Carlini wrote:
In particular, for the new *_testsuite functions, it isn't immediately clear 
how they differ from the non-_testsuite variants. In any case we 
should figure out a better name, maybe even *_internal, if we can't 
find anything more accurate, but it should be self-explaining in terms 
of their C++ semantics.
Thus, the functions are comparing two different executors. So say that 
in the name, like __regex_search_debug or __regex_search_compare. And 
please say what they are for in a comment before at least the primary 
overload, the one actually carrying out the comparison.


All the other comments still stand.

Paolo.



[patch] Fix unresolved symbol with -gsplit-dwarf

2013-10-04 Thread Cary Coutant
When building the location list for a variable that has been optimized
by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a
piece of an expression, but then abandons it for some reason.  When
abandoning it, we neglected to release any addr_table entries created
for DW_OP_addr_index opcodes, occasionally resulting in a link-time
unresolved symbol.  This patch releases the addr_table entries before
abandoning a location expression.

Bootstrapped and regression tested on x86-64.
Committed to trunk at r203206.

-cary


2013-10-03  Cary Coutant  ccout...@google.com

gcc/
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
discarding a location list expression (or a piece of one).

Index: dwarf2out.c
===
--- dwarf2out.c (revision 203183)
+++ dwarf2out.c (working copy)
@@ -13492,6 +13492,9 @@ dw_sra_loc_expr (tree decl, rtx loc)
   if (last != NULL && opsize != bitsize)
{
  padsize += bitsize;
+ /* Discard the current piece of the descriptor and release any
+addr_table entries it uses.  */
+ remove_loc_list_addr_table_entries (cur_descr);
  continue;
}
 
@@ -13500,18 +13503,24 @@ dw_sra_loc_expr (tree decl, rtx loc)
   if (padsize)
{
  if (padsize > decl_size)
-   return NULL;
+   {
+ remove_loc_list_addr_table_entries (cur_descr);
+ goto discard_descr;
+   }
  decl_size -= padsize;
  *descr_tail = new_loc_descr_op_bit_piece (padsize, 0);
  if (*descr_tail == NULL)
-   return NULL;
+   {
+ remove_loc_list_addr_table_entries (cur_descr);
+ goto discard_descr;
+   }
  descr_tail = &(*descr_tail)->dw_loc_next;
  padsize = 0;
}
   *descr_tail = cur_descr;
   descr_tail = tail;
   if (bitsize > decl_size)
-   return NULL;
+   goto discard_descr;
   decl_size -= bitsize;
   if (last == NULL)
{
@@ -13547,9 +13556,9 @@ dw_sra_loc_expr (tree decl, rtx loc)
{
  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
   && (memsize > BITS_PER_WORD || bitsize > BITS_PER_WORD))
-   return NULL;
+   goto discard_descr;
  if (memsize < bitsize)
-   return NULL;
+   goto discard_descr;
  if (BITS_BIG_ENDIAN)
offset = memsize - bitsize;
}
@@ -13557,7 +13566,7 @@ dw_sra_loc_expr (tree decl, rtx loc)
 
  *descr_tail = new_loc_descr_op_bit_piece (bitsize, offset);
  if (*descr_tail == NULL)
-   return NULL;
+   goto discard_descr;
  descr_tail = &(*descr_tail)->dw_loc_next;
}
 }
@@ -13568,9 +13577,14 @@ dw_sra_loc_expr (tree decl, rtx loc)
 {
   *descr_tail = new_loc_descr_op_bit_piece (decl_size, 0);
   if (*descr_tail == NULL)
-   return NULL;
+   goto discard_descr;
 }
   return descr;
+
+discard_descr:
+  /* Discard the descriptor and release any addr_table entries it uses.  */
+  remove_loc_list_addr_table_entries (descr);
+  return NULL;
 }
 
 /* Return the dwarf representation of the location list LOC_LIST of


Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini
.. a final one: if you don't like all those *_debug functions around, 
both in regex.h and regex.tcc, you could move all of them to a new 
header matching the new naming scheme, like regex_debug.h. For the time 
being I would recommend putting it in bits/, installing it, including it 
from regex, exactly like all the other regex bits.


A completely different option, which I also like a lot in fact, would be 
putting the new *_testsuite functions inside the already existing 
testsuite/util/testsuite/regex.h. There you would use namespace 
__gnu_test, would not require strict uglification, etc.


Paolo.


[Patch, Fortran, committed] GCC 4.8 backporting of defined assignment patches

2013-10-04 Thread Tobias Burnus
I have committed the attached patch to the GCC 4.8 branch, backporting 
some defined assignment patches. Committed as Rev. 203207/203208.


GCC 4.8 added defined assignment for components during intrinsic 
assignment, which had some issues.


a) PR 57697/PR58469
Patch: http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html
Approval: http://gcc.gnu.org/ml/fortran/2013-09/msg00056.html

b) http://gcc.gnu.org/ml/fortran/2013-09/msg00016.html
c) http://gcc.gnu.org/ml/fortran/2013-09/msg00026.html
d) http://gcc.gnu.org/ml/fortran/2013-09/msg00038.html

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 203206)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,13 @@
+2013-10-04  Tobias Burnus  bur...@net-b.de
+
+	Backport from mainline
+	2013-09-25  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/57697
+	PR fortran/58469
+	* resolve.c (generate_component_assignments): Avoid double free
+	at runtime and freeing a still-being used expr.
+
 2013-08-24  Mikael Morin  mik...@gcc.gnu.org
 
 	PR fortran/57798
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(Revision 203206)
+++ gcc/fortran/resolve.c	(Arbeitskopie)
@@ -9997,6 +9997,26 @@
 		  temp_code = build_assignment (EXEC_ASSIGN,
 						t1, (*code)->expr1,
 						NULL, NULL, (*code)->loc);
+
+		  /* For allocatable LHS, check whether it is allocated.  Note
+		     that allocatable components with defined assignment are
+		     not yet supported.  See PR 57696.  */
+		  if ((*code)->expr1->symtree->n.sym->attr.allocatable)
+		    {
+		      gfc_code *block;
+		      gfc_expr *e =
+			gfc_lval_expr_from_sym ((*code)->expr1->symtree->n.sym);
+		      block = gfc_get_code ();
+		      block->op = EXEC_IF;
+		      block->block = gfc_get_code ();
+		      block->block->op = EXEC_IF;
+		      block->block->expr1
+			  = gfc_build_intrinsic_call (ns,
+				GFC_ISYM_ALLOCATED, "allocated",
+				(*code)->loc, 1, e);
+		      block->block->next = temp_code;
+		      temp_code = block;
+		    }
 		  add_code_to_chain (&temp_code, &tmp_head, &tmp_tail);
 		}
 
@@ -10005,8 +10025,37 @@
 	  gfc_free_expr (this_code->ext.actual->expr);
 	  this_code->ext.actual->expr = gfc_copy_expr (t1);
 	  add_comp_ref (this_code->ext.actual->expr, comp1);
+
+	  /* If the LHS variable is allocatable and wasn't allocated and
+	     the temporary is allocatable, pointer assign the address of
+	     the freshly allocated LHS to the temporary.  */
+	  if ((*code)->expr1->symtree->n.sym->attr.allocatable
+	      && gfc_expr_attr ((*code)->expr1).allocatable)
+	    {
+	      gfc_code *block;
+	      gfc_expr *cond;
+
+	      cond = gfc_get_expr ();
+	      cond->ts.type = BT_LOGICAL;
+	      cond->ts.kind = gfc_default_logical_kind;
+	      cond->expr_type = EXPR_OP;
+	      cond->where = (*code)->loc;
+	      cond->value.op.op = INTRINSIC_NOT;
+	      cond->value.op.op1 = gfc_build_intrinsic_call (ns,
+			GFC_ISYM_ALLOCATED, "allocated",
+			(*code)->loc, 1, gfc_copy_expr (t1));
+	      block = gfc_get_code ();
+	      block->op = EXEC_IF;
+	      block->block = gfc_get_code ();
+	      block->block->op = EXEC_IF;
+	      block->block->expr1 = cond;
+	      block->block->next = build_assignment (EXEC_POINTER_ASSIGN,
+			t1, (*code)->expr1,
+			NULL, NULL, (*code)->loc);
+	      add_code_to_chain (&block, &head, &tail);
+	    }
 	}
-	  }
+	}
   else if (this_code->op == EXEC_ASSIGN && !this_code->next)
 	{
 	  /* Don't add intrinsic assignments since they are already
@@ -10028,13 +10077,6 @@
 	}
 }
 
-  /* This is probably not necessary.  */
-  if (this_code)
-{
-  gfc_free_statements (this_code);
-  this_code = NULL;
-}
-
   /* Put the temporary assignments at the top of the generated code.  */
   if (tmp_head  component_assignment_level == 1)
 {
@@ -10043,6 +10085,30 @@
   tmp_head = tmp_tail = NULL;
 }
 
+  // If we did a pointer assignment - thus, we need to ensure that the LHS is
+  // not accidentally deallocated. Hence, nullify t1.
+  if (t1 && (*code)->expr1->symtree->n.sym->attr.allocatable
+      && gfc_expr_attr ((*code)->expr1).allocatable)
+{
+  gfc_code *block;
+  gfc_expr *cond;
+  gfc_expr *e;
+
+  e = gfc_lval_expr_from_sym ((*code)->expr1->symtree->n.sym);
+  cond = gfc_build_intrinsic_call (ns, GFC_ISYM_ASSOCIATED, "associated",
+				   (*code)->loc, 2, gfc_copy_expr (t1), e);
+  block = gfc_get_code ();
+  block->op = EXEC_IF;
+  block->block = gfc_get_code ();
+  block->block->op = EXEC_IF;
+  block->block->expr1 = cond;
+  block->block->next = build_assignment (EXEC_POINTER_ASSIGN,
+			t1, gfc_get_null_expr ((*code)->loc),
+			NULL, NULL, (*code)->loc);
+  gfc_append_code (tail, block);
+  tail = block;
+}
+
   /* Now attach the remaining code chain to the input code.  Step on
  to the end of the new code since resolution is complete.  */
   gcc_assert ((*code)->op == EXEC_ASSIGN);
@@ -10052,7 

Re: patch to canonize unsigned tree-csts

2013-10-04 Thread Richard Sandiford
I was hoping Richard would weigh in here.  In case not...

Kenneth Zadeck zad...@naturalbridge.com writes:
 I was thinking that we should always be able to use the constant as-is
 for max_wide_int-based and addr_wide_int-based operations.  The small_prec
 again, you can get edge cased to death here. I think it would work
 for max because that really is bigger than anything else, but it is
 possible (though unlikely) to have something big converted to an address
 by truncation.
 But I'd have expected that conversion to be represented by an explicit
 CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
 something that isn't bit- or byte-address-sized.  It'd be the C equivalent
 of int + long -> int rather than the expected int + long -> long.

 Same goes for wide_int.  If we're doing arithmetic at a specific
 precision, it seems odd for one of the inputs to be wider and yet
 not have an explicit truncation.
 you miss the second reason why we needed addr-wide-int.  A large amount 
 of the places where the addressing arithmetic is done are not type 
 safe.  Only the gimple and rtl that is translated from the source 
 code is really type safe.  In passes like the strength reduction 
 where they just grab things from all over, the addr-wide-int or the 
 max-wide-int are safe haven structures that are guaranteed to be large 
 enough to not matter.  So what I fear here is something like a very 
 wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type
to addr_wide_int.  It's not something we'd ever do in mainline, because
uses of addr_wide_int are replacing uses of double_int, and double_int
is our current maximum-width representation.

Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementating that optimisation should be to introduce
explicit truncations whenever we try to use addr_wide_int to operate
on inputs than might be wider than addr_wide_int.

So I still think the assert is the way to go.

Thanks,
Richard



[PATCH][ARM]Use cortex tune_params for armv8-a architecture

2013-10-04 Thread Renlin Li

Hi all,

This patch will change tune_params for armv8-a architecture to general 
cortex.

Change has been tested for arm-none-eabi on the model.

Ok for trunk?

Kind regards,
Renlin Li

gcc/ChangeLog:

2013-10-03  Renlin Li renlin...@arm.com

* config/arm/arm-cores.def (cortex-a53): Use cortex tuning.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 3d59fa6..17c9bf3 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,7 +129,7 @@ ARM_CORE(cortex-a7,	  cortexa7,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV
 ARM_CORE(cortex-a8,	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-a9,	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
 ARM_CORE(cortex-a15,	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
-ARM_CORE(cortex-a53,	  cortexa53,	8A, FL_LDSCHED, cortex_a5)
+ARM_CORE(cortex-a53,	  cortexa53,	8A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4,	  cortexr4,	7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f,	  cortexr4f,	7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5,	  cortexr5,	7R, FL_LDSCHED | FL_ARM_DIV, cortex)

Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Jason Merrill

On 10/04/2013 09:20 AM, Andrew Sutton wrote:

Perhaps you mean that it must match a fully-instantiated function, so any
constraints on the templates were considered during
determine_specialization.



This seems like a simple comment fix, but there's a longer explanation
of what I want (see below). Would this be more appropriate?

   // Do not allow constrained friend template specializations.

The intent is stronger than to say it must match something.


By "must" I meant that whatever it matches could only be a 
fully-instantiated function.


But I guess the main reason for disallowing constraints here is the same 
as for explicit specializations and non-template functions; 
non-dependent constraints don't really make any sense.


Jason



Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Andrew Sutton
 Perhaps you mean that it must match a fully-instantiated function, so
  any
 constraints on the templates were considered during
 determine_specialization.


 This seems like a simple comment fix, but there's a longer explanation
 of what I want (see below). Would this be more appropriate?

// Do not allow constrained friend template specializations.

 The intent is stronger than to say it must match something.


  By "must" I meant that whatever it matches could only be a
  fully-instantiated function.


I see what you mean. I was caught up in the wrong part of the
sentence. But yes, that's right.

 But I guess the main reason for disallowing constraints here is the same as
 for explicit specializations and non-template functions; non-dependent
 constraints don't really make any sense.

That's the intent.

Okay to commit?

Andrew


friends-3.patch
Description: Binary data


Re: [PATCH] alternative hitrate for builtin_expect

2013-10-04 Thread Rong Xu
My change on the probability of builtin_expect does have an impact on
the partial inline (more outlined functions will get inline back to
the original function).
I think this triggers an existing issue.
Dehao will explain this in his coming email.

-Rong

On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan ramra...@arm.com wrote:
 On 10/02/13 23:49, Rong Xu wrote:

 Here is the new patch. Honaz: Could you take a look?

 Thanks,

 -Rong

 On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka hubi...@ucw.cz wrote:

 Thanks for the suggestion. This is much cleaner than to use binary
 parameter.

  Just want to make sure I understand it correctly about the original
  hitrate:
  you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
  the one specified in the builtin-expect-probability parameter.


 Yes.


 Should I use 90% as the default? It's hard to fit current value 0.9996
 in percent form.


 Yes, 90% seems fine.  The original value was set quite arbitrarily and no
 real
 performance study was made as far as I know except yours. I think users
 that
  are sure they use expect to guard completely cold edges may just use
 100%
 instead of 0.9996, so I would not worry much about the precision.

 Honza


 -Rong


 OK with that change.

 Honza



 This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further
 yet but still reducing the testcase.

 See

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

 for details.

 Can you please deal with this appropriately ?

 regards
 Ramana




[google/gcc-4_8] Backport fix for unresolved symbol with -gsplit-dwarf

2013-10-04 Thread Cary Coutant
When building the location list for a variable that has been optimized
by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a
piece of an expression, but then abandons it for some reason.  When
abandoning it, we neglected to release any addr_table entries created
for DW_OP_addr_index opcodes, occasionally resulting in a link-time
unresolved symbol.  This patch releases the addr_table entries before
abandoning a location expression.

Backported from trunk r203206.

Google ref b/10833306.

2013-10-03  Cary Coutant  ccout...@google.com

gcc/
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
discarding a location list expression (or a piece of one).


Go patch committed: Use backend interface for references to temps

2013-10-04 Thread Ian Lance Taylor
This patch from Chris Manghane changes the Go frontend to use the
backend interface when building a reference to a temporary variable.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.8 branch.

Ian


2013-10-04  Chris Manghane  cm...@google.com

* go-gcc.cc (Backend::convert_expression): New function.


Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc	(revision 203037)
+++ gcc/go/go-gcc.cc	(revision 203038)
@@ -208,6 +208,16 @@ class Gcc_backend : public Backend
   Bexpression*
   zero_expression(Btype*);
 
+  Bexpression*
+  error_expression()
+  { return this->make_expression(error_mark_node); }
+
+  Bexpression*
+  var_expression(Bvariable* var, Location);
+
+  Bexpression*
+  indirect_expression(Bexpression* expr, bool known_valid, Location);
+
   // Statements.
 
   Bstatement*
@@ -848,6 +858,30 @@ Gcc_backend::zero_expression(Btype* btyp
   return tree_to_expr(ret);
 }
 
+// An expression that references a variable.
+
+Bexpression*
+Gcc_backend::var_expression(Bvariable* var, Location)
+{
+  tree ret = var->get_tree();
+  if (ret == error_mark_node)
+    return this->error_expression();
+  return tree_to_expr(ret);
+}
+
+// An expression that indirectly references an expression.
+
+Bexpression*
+Gcc_backend::indirect_expression(Bexpression* expr, bool known_valid,
+ Location location)
+{
+  tree ret = build_fold_indirect_ref_loc(location.gcc_location(),
+					 expr->get_tree());
+  if (known_valid)
+TREE_THIS_NOTRAP(ret) = 1;
+  return tree_to_expr(ret);
+}
+
 // An expression as a statement.
 
 Bstatement*
Index: gcc/go/gofrontend/backend.h
===
--- gcc/go/gofrontend/backend.h	(revision 203037)
+++ gcc/go/gofrontend/backend.h	(revision 203038)
@@ -231,6 +231,22 @@ class Backend
   virtual Bexpression*
   zero_expression(Btype*) = 0;
 
+  // Create an error expression. This is used for cases which should
+  // not occur in a correct program, in order to keep the compilation
+  // going without crashing.
+  virtual Bexpression*
+  error_expression() = 0;
+
+  // Create a reference to a variable.
+  virtual Bexpression*
+  var_expression(Bvariable* var, Location) = 0;
+
+  // Create an expression that indirects through the pointer expression EXPR
+  // (i.e., return the expression for *EXPR). KNOWN_VALID is true if the pointer
+  // is known to point to a valid memory location.
+  virtual Bexpression*
+  indirect_expression(Bexpression* expr, bool known_valid, Location) = 0;
+
   // Statements.
 
   // Create an error statement.  This is used for cases which should
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc	(revision 203037)
+++ gcc/go/gofrontend/expressions.cc	(revision 203038)
@@ -978,22 +978,19 @@ Var_expression::do_get_tree(Translate_co
 {
   Bvariable* bvar = this->variable_->get_backend_variable(context->gogo(),
 							  context->function());
-  tree ret = var_to_tree(bvar);
-  if (ret == error_mark_node)
-return error_mark_node;
   bool is_in_heap;
+  Location loc = this->location();
   if (this->variable_->is_variable())
     is_in_heap = this->variable_->var_value()->is_in_heap();
   else if (this->variable_->is_result_variable())
     is_in_heap = this->variable_->result_var_value()->is_in_heap();
   else
 go_unreachable();
+
+  Bexpression* ret = context->backend()->var_expression(bvar, loc);
   if (is_in_heap)
-{
-  ret = build_fold_indirect_ref_loc(this->location().gcc_location(), ret);
-  TREE_THIS_NOTRAP(ret) = 1;
-}
-  return ret;
+    ret = context->backend()->indirect_expression(ret, true, loc);
+  return expr_to_tree(ret);
 }
 
 // Ast dump for variable expression.


libgo patch committed: Fix calling Interface on MakeFunc value

2013-10-04 Thread Ian Lance Taylor
This patch to libgo fixes calling the Interface method on a Value
created by calling MakeFunc.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Ian

diff -r 52c01e7b81fe libgo/go/reflect/all_test.go
--- a/libgo/go/reflect/all_test.go	Fri Oct 04 11:00:40 2013 -0700
+++ b/libgo/go/reflect/all_test.go	Fri Oct 04 11:49:27 2013 -0700
@@ -1454,6 +1454,30 @@
 	}
 }
 
+func TestMakeFuncInterface(t *testing.T) {
+	switch runtime.GOARCH {
+	case "amd64", "386":
+	default:
+		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
+	fn := func(i int) int { return i }
+	incr := func(in []Value) []Value {
+		return []Value{ValueOf(int(in[0].Int() + 1))}
+	}
+	fv := MakeFunc(TypeOf(fn), incr)
+	ValueOf(fn).Elem().Set(fv)
+	if r := fn(2); r != 3 {
+		t.Errorf("Call returned %d, want 3", r)
+	}
+	if r := fv.Call([]Value{ValueOf(14)})[0].Int(); r != 15 {
+		t.Errorf("Call returned %d, want 15", r)
+	}
+	if r := fv.Interface().(func(int) int)(26); r != 27 {
+		t.Errorf("Call returned %d, want 27", r)
+	}
+}
+
 type Point struct {
 	x, y int
 }
diff -r 52c01e7b81fe libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Fri Oct 04 11:00:40 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Oct 04 11:49:27 2013 -0700
@@ -63,7 +63,7 @@
 
 	impl := makeFuncImpl{code: code, typ: ftyp, fn: fn}
 
-	return Value{t, unsafe.Pointer(&impl), flag(Func) << flagKindShift}
+	return Value{t, unsafe.Pointer(&impl), flag(Func<<flagKindShift) | flagIndir}
 }
 
 // makeFuncStub is an assembly function that is the code half of


Re: [PATCH] alternative hitrate for builtin_expect

2013-10-04 Thread Jan Hubicka
 I looked at this problem. Bug updated
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
 
 This is a bug when updating block during tree-inline. Basically, it is
 legal for *n to be NULL. E.g. when gimple_block (id->gimple_call) is
 NULL, remap_blocks_to_null will be called to set *n to NULL.

The NULL in gimple_block (gimple_call) comes from the call introduced by 
ipa-split?
I remember that ipa-split used to try to put the call into a block, since we 
were ICEing in similar ways previously, too.  Perhaps this has changed with 
the new BLOCK representation?

Honza


Re: patch to canonize unsigned tree-csts

2013-10-04 Thread Kenneth Zadeck

On 10/04/2013 01:00 PM, Richard Sandiford wrote:

I was hoping Richard would weigh in here.  In case not...

Kenneth Zadeck zad...@naturalbridge.com writes:

I was thinking that we should always be able to use the constant as-is
for max_wide_int-based and addr_wide_int-based operations.  The small_prec

again, you can get edge cased to death here. I think it would work
for max because that really is bigger than anything else, but it is
possible (though unlikely) to have something big converted to an address
by truncation.

But I'd have expected that conversion to be represented by an explicit
CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
something that isn't bit- or byte-address-sized.  It'd be the C equivalent
of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific
precision, it seems odd for one of the inputs to be wider and yet
not have an explicit truncation.

you miss the second reason why we needed addr-wide-int.  A large amount
of the places where the addressing arithmetic is done are not type
safe.  Only the gimple and rtl that is translated from the source
code is really type safe.  In passes like the strength reduction
where they just grab things from all over, the addr-wide-int or the
max-wide-int are safe haven structures that are guaranteed to be large
enough to not matter.  So what I fear here is something like a very
wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type
to addr_wide_int.  It's not something we'd ever do in mainline, because
uses of addr_wide_int are replacing uses of double_int, and double_int
is our current maximum-width representation.

Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementating that optimisation should be to introduce
explicit truncations whenever we try to use addr_wide_int to operate
on inputs than might be wider than addr_wide_int.

So I still think the assert is the way to go.
addr_wide_int is not as much of an optimization as it is documentation 
of what you are doing - i.e. this is addressing arithmetic.  My 
justification for putting it in was that we wanted a sort of an abstract 
type to say that this was not just user math, it was addressing 
arithmetic and that the ultimate result is going to be slammed into a 
target pointer.


I was only using that as an example to try to indicate that I did not 
think that it was wrong if someone did truncate.   In particular, would 
you want the assert to be that the value was truncated or that the type 
of the value would allow numbers that would be truncated?   I actually 
think neither.


If a programmer uses a long long on a 32 bit machine for some index 
variable and slams that into a pointer, he either knows what he is doing 
or has made a mistake.  Do you really think that the compiler should ICE?




Thanks,
Richard





Merge from GCC 4.8 branch to gccgo branch

2013-10-04 Thread Ian Lance Taylor
I merged revision 203214 of the GCC 4.8 branch to the gccgo branch.

Ian


Re: [PATCH] alternative hitrate for builtin_expect

2013-10-04 Thread Dehao Chen
On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka hubi...@ucw.cz wrote:
 I looked at this problem. Bug updated
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

 This is a bug when updating block during tree-inline. Basically, it is
  legal for *n to be NULL. E.g. when gimple_block (id->gimple_call) is
 NULL, remap_blocks_to_null will be called to set *n to NULL.

 The NULL in gimple_block (gimple_call) comes from the call introduced by 
 ipa-split?

That is correct.

  I remember that ipa-split used to try to put the call into a block, since we 
  were ICEing in similar ways previously, too.  Perhaps this has changed with 
  the new BLOCK representation?

The new BLOCK representation does not change this. I think it makes
sense to leave the block of newly introduced call_stmt as NULL because
when it's inlined back, we don't want to add additional block layers.

Dehao


 Honza


Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Jason Merrill

OK.

Jason


[gomp4] Target fallback ICV handling, ICV fixes

2013-10-04 Thread Jakub Jelinek
Hi!

I've committed the following patch to gomp-4.0-branch.

The omp-low.c changes are to fix some bugs with if clause on
#pragma omp target{, data, update}.
The c-cppbuiltin.c change is to finally announce OpenMP 4.0 support for C/C++.

The libgomp changes are:
1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment
   ICV, rather than global var
2) gomp_remaining_threads_count has been removed; instead, as required by
   the spec, ThreadsBusy is tracked per contention group inside
   of thread_pool; if there is just one contention group, then the new
   thr->thread_pool->threads_busy should be the difference between
   icv->thread_limit_var and the old gomp_remaining_threads_count;
   so, in gomp_resolve_num_threads we now add nthreads - 1 to it
   rather than subtracting it
3) apparently the old OMP_THREAD_LIMIT code was buggy, because
   GOMP_parallel_end was also subtracting from gomp_remaining_threads_count
   rather than adding to it (with the new code it is correct to subtract;
   when I get spare time I'll write a small alternative patch for the
   release branches together with thread-limit-1.c testcase)
4) as the threads_busy count is now per-contention group, if a parallel
   isn't nested, we actually don't need to atomically update the counter,
   because there is just one thread in the contention group
5) the gomp_managed_threads counter remains a global var; it is used
   to decide about spinning length, which is desirable to keep global,
   and it is not a user-observable thing covered by the standard; I've
   just renamed the mutex guarding it
6) for GOMP_target host fallback, the function will create a new initial
   thread by making a copy of the old TLS *gomp_thread () and clearing it
   (except for the affinity place, and reinitializing its place var to the
   whole place list), then restoring back
7) I've noticed that the thr->release semaphore is never used for the master
   threads, so there is no point initializing it; we were initializing
   it just for the first initial thread, e.g. not in subsequent user
   pthread_create created threads that encounter #pragma omp constructs;
   and the semaphore wasn't ever destroyed
8) GOMP_teams is now implemented for the host fallback just by adjusting
   icv->thread_limit_var
9) on the target-7.c testcase I found several issues in the var remapping
   code (some fields could be uninitialized in certain cases)

Tested on x86_64-linux, committed.

2013-10-04  Jakub Jelinek  ja...@redhat.com

* omp-low.c (expand_omp_target): When handling IF clause on
#pragma omp target, split new_bb rather than entry_bb.  If
not GF_OMP_TARGET_KIND_REGION, split new_bb right before
the GOMP_TARGET stmt, rather than after labels.
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to
201307 instead of 201107.
libgomp/
* libgomp.h (struct gomp_task_icv): Add thread_limit_var.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_lock): New variable.
(struct gomp_thread_pool): Add threads_busy field.
(gomp_free_thread): New prototype.
* parallel.c (gomp_resolve_num_threads): Adjust for
thread_limit now being in icv->thread_limit_var.  Use
UINT_MAX instead of ULONG_MAX as infinity.  If not nested,
just return minimum of max_num_threads and icv->thread_limit_var
and if thr->thread_pool, set threads_busy to the returned value.
Otherwise, don't update atomically gomp_remaining_threads_count,
but instead thr->thread_pool->threads_busy.
(GOMP_parallel_end): Adjust for thread_limit now being in
icv->thread_limit_var.  Use UINT_MAX instead of ULONG_MAX as
infinity.  Adjust threads_busy in the pool rather than
gomp_remaining_threads_count.  Remember team->nthreads and call
gomp_team_end before adjusting threads_busy, if not nested
afterwards, just set it to 1 non-atomically.
* team.c (gomp_thread_start): Clear thr->thread_pool and
thr->task before returning.
(gomp_free_pool_helper): Clear thr->thread_pool and thr->task
before calling pthread_exit.
(gomp_free_thread): No longer static.  Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_start): Set thr->thread_pool->threads_busy to
nthreads immediately after creating new pool.  Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_end): Use gomp_managed_threads_lock instead of
gomp_remaining_threads_lock.
(initialize_team): Don't call gomp_sem_init here.
* env.c (gomp_global_icv): Initialize thread_limit_var field.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_lock): New variable.

[PATCH] Refactor a bit of jump thread identification code

2013-10-04 Thread Jeff Law


This pulls out the code to search for a threading opportunity in a 
normal block into its own subroutine.   No functional changes.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 065ebf1..cf4a45c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-04  Jeff Law  l...@redhat.com
+
+   * tree-ssa-threadedge.c: Fix some trailing whitespace problems.
+
+   * tree-ssa-threadedge.c (thread_through_normal_block): Broken out of ...
+   (thread_across_edge): Here.  Call it.
+
 2013-10-04  Cary Coutant  ccout...@google.com
 
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39e921b..c2dd015 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -771,7 +771,7 @@ thread_around_empty_blocks (edge taken_edge,
   gsi = gsi_start_nondebug_bb (bb);
 
   /* If the block has no statements, but does have a single successor, then
- it's just a forwarding block and we can thread through it trivially. 
+ it's just a forwarding block and we can thread through it trivially.
 
  However, note that just threading through empty blocks with single
  successors is not inherently profitable.  For the jump thread to
@@ -779,7 +779,7 @@ thread_around_empty_blocks (edge taken_edge,
 
  By taking the return value from the recursive call, we get the
  desired effect of returning TRUE when we found a profitable jump
- threading opportunity and FALSE otherwise. 
+ threading opportunity and FALSE otherwise.
 
  This is particularly important when this routine is called after
  processing a joiner block.  Returning TRUE too aggressively in
@@ -844,13 +844,16 @@ thread_around_empty_blocks (edge taken_edge,
  path);
   return true;
 }
- 
+
   return false;
 }
-  
+
 /* We are exiting E->src, see if E->dest ends with a conditional
    jump which has a known value when reached via E.
 
+   E->dest can have arbitrary side effects which, if threading is
+   successful, will be maintained.
+
Special care is necessary if E is a back edge in the CFG as we
   may have already recorded equivalences for E->dest into our
various tables, including the result of the conditional at
@@ -858,11 +861,6 @@ thread_around_empty_blocks (edge taken_edge,
limited in that case to avoid short-circuiting the loop
incorrectly.
 
-   Note it is quite common for the first block inside a loop to
-   end with a conditional which is either always true or always
-   false when reached via the loop backedge.  Thus we do not want
-   to blindly disable threading across a loop backedge.
-
   DUMMY_COND is a shared cond_expr used by condition simplification as
   scratch, to avoid allocating memory.
 
@@ -873,17 +871,19 @@ thread_around_empty_blocks (edge taken_edge,
STACK is used to undo temporary equivalences created during the walk of
   E->dest.
 
-   SIMPLIFY is a pass-specific function used to simplify statements.  */
+   SIMPLIFY is a pass-specific function used to simplify statements.
 
-void
-thread_across_edge (gimple dummy_cond,
-   edge e,
-   bool handle_dominating_asserts,
-   vec<tree> *stack,
-   tree (*simplify) (gimple, gimple))
-{
-  gimple stmt;
+   Our caller is responsible for restoring the state of the expression
+   and const_and_copies stacks.  */
 
+static bool
+thread_through_normal_block (edge e,
+                             gimple dummy_cond,
+                             bool handle_dominating_asserts,
+                             vec<tree> *stack,
+                             tree (*simplify) (gimple, gimple),
+                             vec<jump_thread_edge *> *path)
+{
   /* If E is a backedge, then we want to verify that the COND_EXPR,
   SWITCH_EXPR or GOTO_EXPR at the end of e->dest is not affected
   by any statements in e->dest.  If it is affected, then it is not
@@ -891,20 +891,19 @@ thread_across_edge (gimple dummy_cond,
   if (e->flags & EDGE_DFS_BACK)
 {
   if (cond_arg_set_in_bb (e, e->dest))
-   goto fail;
+   return false;
 }
 
-  stmt_count = 0;
-
   /* PHIs create temporary equivalences.  */
   if (!record_temporary_equivalences_from_phis (e, stack))
-goto fail;
+return false;
 
   /* Now walk each statement recording any context sensitive
  temporary equivalences we can detect.  */
-  stmt = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
+  gimple stmt
+    = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
   if (!stmt)
-goto fail;
+return false;
 
   /* If we stopped at a COND_EXPR or SWITCH_EXPR, see if we know which arm
  will be taken.  */
@@ -927,9 +926,8 @@ thread_across_edge (gimple dummy_cond,
  /* DEST could be NULL for a computed 

Re: [C++ Patch] PR 58448

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [PATCH] alternative hirate for builtin_expect

2013-10-04 Thread Jan Hubicka
 On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka hubi...@ucw.cz wrote:
  I looked at this problem. Bug updated
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
 
  This is a bug when updating block during tree-inline. Basically, it is
  legal for *n to be NULL. E.g. when gimple_block (id->gimple_call) is
  NULL, remap_blocks_to_null will be called to set *n to NULL.
 
  The NULL in gimple_block (gimple_call) comes from the call introduced by 
  ipa-split?
 
 That is correct.
 
  I remember that ipa-split used to try to put the call into a block since we
  were ICEing in similar ways previously, too.  Perhaps this has changed with
  the new BLOCK representation?
 
 The new BLOCK representation does not change this. I think it makes
 sense to leave the block of newly introduced call_stmt as NULL because
 when it's inlined back, we don't want to add additional block layers.

You are right, it may be a result of Jakub's changes in the area (to improve
debug info after inlining back).  I guess the patch makes sense then.

Honza

 
 Dehao
 
 
  Honza


Re: Enable building of libatomic on AArch64

2013-10-04 Thread Andrew Pinski
On Thu, Oct 3, 2013 at 3:43 PM, Michael Hudson-Doyle
michael.hud...@linaro.org wrote:
 Hi,

 As libatomic builds for and the tests pass on AArch64 (built on x86_64
 but tested on a foundation model, logs and summary:

 http://people.linaro.org/~mwhudson/libatomic.sum.txt
 http://people.linaro.org/~mwhudson/runtest-log-v-2.txt

 ) this patch enables the build.

 Cheers,
 mwh
 (first time posting to this list, let me know if I'm doing it wrong)

 2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

   * configure.tgt: Add AArch64 support.

Replying here also: this is the same patch we have been using
internally, and I would like to see it approved.

Thanks,
Andrew Pinski




Re: [C++ Patch] PR 58560

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 58503

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 Some of the things I've done in 4.8 to improve SH T bit handling turned
 out to produce wrong code.  The attached patch fixes that by introducing
 an SH specific RTL pass.
 
 Tested on rev 202876 with
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
 
 and no new failures.
 Additional test cases will follow.
 OK for trunk?

I've got a new failure with the patch on sh4-unknown-linux-gnu:

New tests that FAIL:

gfortran.fortran-torture/execute/forall_7.f90 execution,  -O3 -g 

 ifcvt_sh

We usually use sh_/sh- prefix for target specific things, don't we?
I'm not sure that there is some rigid naming convention for them,
though.

Regards,
kaz


Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Paolo Carlini

Hi Andrew,

On 10/04/2013 07:36 PM, Andrew Sutton wrote:

+  if (!check_template_constraints (tmpl, args))
+    {
+      location_t loc = DECL_SOURCE_LOCATION (function);
+      error ("%qD is not a viable candidate", function);
+      diagnose_constraints (input_location, tmpl, args);
+      return error_mark_node;
+    }

Nit: loc seems unused.

Thanks,
Paolo.