Re: [patch, mips] Fix for PR target/56942

2013-04-29 Thread Richard Sandiford
Steve Ellcey  writes:
> OK, here is a patch to next_real_insn to keep the ordering property intact
> and fix the bug.  OK for checkin?

Thanks, looks good to me, but an rtl/middle-end/global maintainer
would need to approve it.

Richard


Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android

2013-04-29 Thread Alexander Ivchenko
2013/4/29 Mike Stump :
> On Jan 9, 2013, at 7:14 AM, Alexander Ivchenko  wrote:
>>  We have test fail for gcc.dg/cpp/trad/include.c on Android. The
>> reason for that is that
>> -ftraditional-cpp is not expected to work on Android due to variadic
>> macro (like #define __builtin_warning(x, y...))
>> in standard headers and traditional preprocessor cannot handle them.
>>  The attached patch disables that test.
>
> Be sure to ask, Ok? in your patch submittals.
>
> Ok.

Thank you! I thought I did ask:

> ...
> in standard headers and traditional preprocessor cannot handle them."
>
> is it ok for trunk?
>

Could someone please commit that patch? I don't have commit access.

Thanks,
Alexander


RE: [patch] cilkplus: Array notation for C patch

2013-04-29 Thread Joseph S. Myers
Here's a review of the changes to the compiler proper in this patch.
I don't think much more will come up from reviews of the compiler
changes - but I still need to review the testsuite changes against the
language specification to make sure that everything is properly
covered in the testsuite (which might in turn show up further things
needing to be addressed in the compiler).

> +   error_at (location, "__sec_implicit_index parameter must be a " 
> + "integer constant expression");

"an", not "a".

> diff --git a/gcc/c/ChangeLog.cilkplus b/gcc/c/ChangeLog.cilkplus

I believe the actual trunk commit, when this is ready to go in, should
simply add the ChangeLog entries for the committed changes to the top
of the existing ChangeLog files, rather than creating such a new
ChangeLog file.

> diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c

> +#include "gcc.h"

That header is for the compiler driver.  Including it in anything
built into cc1 is suspicious.

> +/* Given an FNDECL or an ADDR_EXPR, return the corresponding

I think you mean something like "Given FNDECL, a FUNCTION_DECL or an
ADDR_EXPR", rather than "Given an FNDECL or an ADDR_EXPR".

> +/* Set *RANK of expression ARRAY, ignoring array notation specific built-in 
> +   functions if IGNORE_BUILTIN_FN is true.  The ORIG_EXPR is printed out if 
> an
> +   error occured in the rank calculation.  The functions returns false if it
> +   encounters an error in rank calculation.
> +
> +   For example, an array notation of A[:][:] or B[0:10][0:5:2] or 
> C[5][:][1:0]
> +   all have a rank of 2.  */

This still doesn't seem to say anything about the semantics of the
value *RANK on entry to the function.  (I think it's something like
*RANK being either 0, or the rank of another subexpression that must
have the same rank as this one, but you need to say that.)
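
To make the question about *RANK concrete, here is a minimal sketch in C.
It is a toy model, not the patch's code: toy_rank and toy_merge_rank are
hypothetical names, the rank is computed from the source text by counting
top-level subscripts that contain a colon, and the entry contract for *RANK
follows the guess above (0 means "not yet known", otherwise it is the rank a
sibling subexpression already established).  Treating a rank-0 (scalar)
operand as compatible with any rank is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy rank: count top-level subscripts that contain a ':' section.
   Nested subscripts such as A[B[:]] are not modelled.  */
size_t
toy_rank (const char *expr)
{
  size_t rank = 0;
  int depth = 0;
  bool has_colon = false;

  for (const char *p = expr; *p != '\0'; p++)
    {
      if (*p == '[')
        {
          if (depth == 0)
            has_colon = false;
          depth++;
        }
      else if (*p == ']' && depth > 0)
        {
          depth--;
          if (depth == 0 && has_colon)
            rank++;
        }
      else if (*p == ':' && depth == 1)
        has_colon = true;
    }
  return rank;
}

/* Hypothetical contract: on entry *RANK is 0 (unknown) or the rank of a
   sibling subexpression.  Returns false on a rank mismatch.  */
bool
toy_merge_rank (size_t found, size_t *rank)
{
  if (found == 0)
    return true;                /* Scalars fit any rank (assumption).  */
  if (*rank == 0)
    {
      *rank = found;            /* First array notation seen.  */
      return true;
    }
  return *rank == found;
}
```

With this model the three expressions from the quoted comment all give 2,
and merging a rank-1 operand into an already established rank of 2 is
reported as a mismatch.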

> +/* Extracts all array notations in NODE and stores them in ARRAY_LIST.  If 
> +   IGNORE_BUILTIN_FN is set, then array notations inside array notation
> +   specific built-in functions are ignored.  The NODE can be anything from a
> +   full function to a single variable.   */

"can be anything"?  That seems rather ad hoc.  I'd think there should
be defined classes of trees - probably expressions and things that can
appear in them, but not tcc_exceptional or tcc_type - that can appear
here, and that you should check (in an assertion) for EXPR_P or one of
the other cases allowed.

In particular, you allow TREE_LIST in this function.  How can
TREE_LISTs get here and can they readily be avoided?  It's generally a
bad idea (and rare) to have places where something with the static
type "tree" can be either a TREE_LIST or some other kind of tree.  I
note that in the function replace_array_notations, which is presumably
intended to match this one, you *don't* handle TREE_LIST.

These functions recurse down into operands of trees.  But what about
into types?  If a type contains an expression that needs to be
evaluated as part of evaluating VLA sizes, that gets stored specially
by grokdeclarator, and in the end that expression gets put in a
statement somewhere to ensure that it does get evaluated.  But that's
for expressions with side effects involved in types.  Array notation
expressions may not necessarily have side effects.  And as I
understand it, even if an expression is extracted OK by
extract_array_notation_exprs because it appears somewhere that
function looks at, replace_array_notations will need to substitute it
everywhere - substituting a copy appearing directly in a statement /
expression, while missing a copy embedded in a type, won't suffice.
So maybe you need to recurse down into types in some way?  (Then I'm
not entirely sure when it's safe to modify an existing type and when
you'd need to build up a new, similar type with the expression
modified appropriately.)

Maybe an example would help.  I see nothing in the Cilk Plus
specification to rule out expressions of the form

a[:] = ((int (*)[b[:]][c[:]]) d[:])[1][2];

meaning that each element of the array d should be cast to a
pointer-to-VLA type, with the dimensions of the VLA coming from
corresponding elements of arrays b and c, and then element[1][2] of
that VLA extracted.  But the rules for determining rank don't really
seem to consider subexpressions that appear within types, so maybe
adjustments are needed there as well.  (Of course such type names can
appear within expressions in sizeof, or compound literals, or several
other cases in the syntax, not just in casts.)

It's possible that the above case does work despite types not being
adjusted, because the logic to multiply by array sizes when doing
pointer addition / array dereference may already have taken effect
while the expressions were constructed.  But leaving types unadjusted
still seems rather risky, and would seem likely to cause problems with
debug info (consider the case where a variable is actually being
declared with the type involving arra

[Fortran-Dev] Some ubounds -> extent changes

2013-04-29 Thread Tobias Burnus
This patch changes some ubounds to extent. The patch is relative to my
type patch - but it also applies without. It also fixes a bunch of
testsuite failures.


Built and regtested on x86-64-gnu-linux.
I intend to commit the patch soon. Comments and suggestions are welcome.
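
The arithmetic behind the ubound -> extent change can be sketched in plain
C.  This is only an illustration with hypothetical helper names (toy_*),
using long in place of gfc_array_index_type; it is not descriptor code.

```c
#include <stdbool.h>

/* Number of elements along one dimension, as in "to - from + 1".  */
long
toy_extent (long lbound, long ubound)
{
  return ubound - lbound + 1;
}

/* Recover the upper bound from a lower bound and an extent.  */
long
toy_ubound (long lbound, long extent)
{
  return lbound + extent - 1;
}

/* Old test: the dimension is non-empty if ubound >= lbound.  */
bool
toy_nonempty_by_bounds (long lbound, long ubound)
{
  return ubound >= lbound;
}

/* New test: the dimension is non-empty if extent > 0.  */
bool
toy_nonempty_by_extent (long extent)
{
  return extent > 0;
}

/* With a stored ubound, moving the lower bound means shifting ubound too.
   With a stored extent, only the lower bound changes.  */
long
toy_shift_ubound (long old_lbound, long old_ubound, long new_lbound)
{
  return old_ubound - old_lbound + new_lbound;
}
```

The last helper corresponds to the bound-shifting code the patch removes:
once the descriptor stores the extent, changing the lower bound leaves the
stored size untouched.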

Tobias
2013-04-29  Tobias Burnus  

	* trans-array.c (gfc_trans_dummy_array_bias, get_std_lbound,
	gfc_alloc_allocatable_for_assignment): Change ubound to extent.
	* trans-expr.c (gfc_trans_alloc_subarray_assign): Ditto.
	* trans-intrinsic.c (gfc_conv_intrinsic_bound): Ditto.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 49eaaae..34421df 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -8110,7 +8110,7 @@ static tree
 get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 {
   tree lbound;
-  tree ubound;
+  tree extent;
   tree stride;
   tree cond, cond1, cond3, cond4;
   tree tmp;
@@ -8120,10 +8120,10 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 {
   tmp = gfc_rank_cst[dim];
   lbound = gfc_conv_descriptor_lbound_get (desc, tmp);
-  ubound = gfc_conv_descriptor_ubound_get (desc, tmp);
+  extent = gfc_conv_descriptor_extent_get (desc, tmp);
   stride = gfc_conv_descriptor_stride_get (desc, tmp);
-  cond1 = fold_build2_loc (input_location, GE_EXPR, boolean_type_node,
-			   ubound, lbound);
+  cond1 = fold_build2_loc (input_location, GT_EXPR, boolean_type_node,
+			   extent, gfc_index_zero_node);
   cond3 = fold_build2_loc (input_location, GE_EXPR, boolean_type_node,
 			   stride, gfc_index_zero_node);
   cond3 = fold_build2_loc (input_location, TRUTH_AND_EXPR,
@@ -8240,7 +8240,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree tmp;
   tree tmp2;
   tree lbound;
-  tree ubound;
+  tree extent;
   tree desc;
   tree old_desc;
   tree desc2;
@@ -8248,7 +8248,6 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree jump_label1;
   tree jump_label2;
   tree neq_size;
-  tree lbd;
   int n;
   int dim;
   gfc_array_spec * as;
@@ -8411,37 +8410,24 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 
   for (n = 0; n < expr2->rank; n++)
 {
+  lbound = gfc_index_one_node;
   tmp = fold_build2_loc (input_location, MINUS_EXPR,
 			 gfc_array_index_type,
 			 loop->to[n], loop->from[n]);
-  tmp = fold_build2_loc (input_location, PLUS_EXPR,
+  extent = fold_build2_loc (input_location, PLUS_EXPR,
 			 gfc_array_index_type,
 			 tmp, gfc_index_one_node);
 
-  lbound = gfc_index_one_node;
-  ubound = tmp;
-
   if (as)
-	{
-	  lbd = get_std_lbound (expr2, desc2, n,
-as->type == AS_ASSUMED_SIZE);
-	  ubound = fold_build2_loc (input_location,
-MINUS_EXPR,
-gfc_array_index_type,
-ubound, lbound);
-	  ubound = fold_build2_loc (input_location,
-PLUS_EXPR,
-gfc_array_index_type,
-ubound, lbd);
-	  lbound = lbd;
-	}
+	lbound = get_std_lbound (expr2, desc2, n,
+ as->type == AS_ASSUMED_SIZE);
 
   gfc_conv_descriptor_lbound_set (&fblock, desc,
   gfc_rank_cst[n],
   lbound);
-  gfc_conv_descriptor_ubound_set (&fblock, desc,
+  gfc_conv_descriptor_extent_set (&fblock, desc,
   gfc_rank_cst[n],
-  ubound);
+  extent);
   gfc_conv_descriptor_stride_set (&fblock, desc,
   gfc_rank_cst[n],
   size1);
@@ -8455,7 +8441,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 offset, tmp2);
   size1 = fold_build2_loc (input_location, MULT_EXPR,
 			   gfc_array_index_type,
-			   tmp, size1);
+			   extent, size1);
 }
 
   /* Set the lhs descriptor and scalarizer offsets.  For rank > 1,
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index e21c3d2..2370f44 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5830,7 +5830,6 @@ gfc_trans_alloc_subarray_assign (tree dest, gfc_component * cm,
 
   for (n = 0; n < expr->rank; n++)
 {
-  tree span;
   tree lbound;
 
   /* Obtain the correct lbound - ISO/IEC TR 15581:2001 page 9.
@@ -5860,14 +5859,7 @@ gfc_trans_alloc_subarray_assign (tree dest, gfc_component * cm,
 
   lbound = fold_convert (gfc_array_index_type, lbound);
 
-  /* Shift the bounds and set the offset accordingly.  */
-  tmp = gfc_conv_descriptor_ubound_get (dest, gfc_rank_cst[n]);
-  span = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
-		tmp, gfc_conv_descriptor_lbound_get (dest, gfc_rank_cst[n]));
-  tmp = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type,
-			 span, lbound);
-  gfc_conv_descriptor_ubound_set (&block, dest,
-  gfc_rank_cst[n], tmp);
+  /* Shift the lower_bound and set the offset accordingly.  */
   gfc_conv_descriptor_lbound_set (&block, dest,
   gfc_rank_cst[n], lbound);
 
diff --git a/gcc/fortran/trans

MEM_REF representation problem, and folding fix

2013-04-29 Thread Bernd Schmidt
Currently, MEM_REF contains two pointer arguments, one which is supposed
to be a base object and another which is supposed to be a constant
offset. This representation is somewhat problematic, as not all machines
treat pointer values as essentially integers. On machines where size_t
is smaller than a pointer, for example m32c where it's due to
limitations in the compiler, or the port I've been working on recently
where pointers contain a segment selector that does not participate in
additions, this is not an accurate representation, and it does cause
real issues.

It would be better to use a representation more like POINTER_PLUS with a
pointer and a real sizetype integer. Can someone explain the comment in
tree.def which states that the type of the constant offset is used for
TBAA purposes? It states "MEM_REF <p, c> is equivalent to
((typeof(c))p)->x [...]", so why not represent it as MEM_REF <(desired
type)p, (size_t)c>?

The following patch works around one instance of the problem. When we
fold an offset addition, the addition must be performed in sizetype,
otherwise we may get unwanted overflow. This bug triggers on m32c for
example, where an offset of 65528 (representing -8) and an offset of 8
are added, yielding an offset of 65536 instead of zero. Solved by
performing the intermediate computation in sizetype.
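
The wraparound can be reproduced outside the compiler.  The sketch below
is an analogy only: toy_sizetype (16 bits) and toy_offset_type (32 bits)
are hypothetical stand-ins for a narrow sizetype and a wider
pointer-typed offset, not the real m32c types.

```c
#include <stdint.h>

typedef uint16_t toy_sizetype;      /* narrow, wraps at 65536 */
typedef uint32_t toy_offset_type;   /* wider type carrying the offset */

/* Adding directly in the wide type: no wraparound, so 65528 + 8
   stays 65536.  */
toy_offset_type
toy_add_in_offset_type (toy_offset_type a, toy_offset_type b)
{
  return a + b;
}

/* Convert to the narrow type, add there, convert back: the sum wraps
   modulo 65536, so 65528 (that is, -8) plus 8 gives 0.  */
toy_offset_type
toy_add_in_sizetype (toy_offset_type a, toy_offset_type b)
{
  toy_sizetype sa = (toy_sizetype) a;
  toy_sizetype sb = (toy_sizetype) b;
  toy_sizetype sum = (toy_sizetype) (sa + sb);
  return (toy_offset_type) sum;
}
```

The second function follows the same convert, add, convert-back shape as
the patched fold_binary_loc code.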

Bootstrapped and tested on x86_64-linux (all languages except Ada) with
no changes in the tests, and tested on m32c-elf where it fixes 22
failures. Ok?


Bernd
	* fold-const.c (fold_binary_loc): When folding an addition in the
	offset of a memref, use size_type to perform the arithmetic.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 59dbc03..6f092ab 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10025,15 +10025,17 @@ fold_binary_loc (location_t loc,
 	  && handled_component_p (TREE_OPERAND (arg0, 0)))
 	{
 	  tree base;
+	  tree type1 = TREE_TYPE (arg1);
 	  HOST_WIDE_INT coffset;
 	  base = get_addr_base_and_unit_offset (TREE_OPERAND (arg0, 0),
 		&coffset);
 	  if (!base)
 	return NULL_TREE;
-	  return fold_build2 (MEM_REF, type,
-			  build_fold_addr_expr (base),
-			  int_const_binop (PLUS_EXPR, arg1,
-	   size_int (coffset)));
+	  arg1 = fold_convert (size_type_node, arg1);
+	  arg1 = int_const_binop (PLUS_EXPR, arg1, size_int (coffset));
+	  base = build_fold_addr_expr (base);
+	  arg1 = fold_convert (type1, arg1);
+	  return fold_build2 (MEM_REF, type, base, arg1);
 	}
 
   return NULL_TREE;


Fwd: [PATCH] Fix PR56915

2013-04-29 Thread Shixiong Xu
This patch is for the ICE of PR56915
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56915), specific to gcc
4.9. Because this patch only touches the C++ frontend, I only ran the
g++ and libstdc++ testsuites with the newly added testcase. The test
results on Ubuntu x86_64 show no regressions in those testsuites.

The problem causing the ICE is that GCC sets DECL_INTERFACE_KNOWN too
early before 'start_preparsed_function', which generates the body of
the thunk and depends on the value of DECL_INTERFACE_KNOWN. Moving the
setting after the point of 'start_preparsed_function' or
'symtab_add_to_same_comdat_group' fixes the problem.

Shixiong
commit 1bc89baae77d7a6d4f98b70e6e603454e2837919
Author: Shixiong Xu 
Date:   Sun Apr 28 22:11:05 2013 +1200

PR c++/56915
* gcc/cp/semantics.c: Move down the setting to 
DECL_INTERFACE_KNOWN(...) after symtab_add_to_same_comdat_group(...).
* gcc/testsuite/g++.dg/torture/pr56915.C: New.


pr56915.patch
Description: Binary data


[PATCH, i386]: Fix PR44578, GCC generates MMX instructions but fails to generate "emms"

2013-04-29 Thread Uros Bizjak
Hello!

The attached patch fixes PR44578, where an MMX register was allocated for
the zero_extendsidi2 RTX. The patch adds "!" to the interfering
alternative, so RA won't choose the alternative involving an MMX register
unless absolutely necessary.
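
As a rough picture of what the extra "!" does, here is a toy penalty
count in C.  The function name and the weights TOY_SLIGHT and TOY_SEVERE
are made up for illustration and are not the values the register
allocator uses; the only point is that an alternative marked "?!" ends up
far more expensive than one marked "?" alone.

```c
/* Illustrative weights only; not taken from GCC.  */
enum { TOY_SLIGHT = 1, TOY_SEVERE = 1000 };

/* Sum a penalty for each disparaging modifier in a constraint string:
   '?' disparages slightly, '!' disparages severely.  */
int
toy_disparagement (const char *constraint)
{
  int penalty = 0;

  for (const char *p = constraint; *p != '\0'; p++)
    {
      if (*p == '?')
        penalty += TOY_SLIGHT;
      else if (*p == '!')
        penalty += TOY_SEVERE;
    }
  return penalty;
}
```

Applied to the two spellings in the patch, "?*y" scores 1 and "?!*y"
scores 1001 in this toy model.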

2013-04-29  Uros Bizjak  

PR target/44578
* config/i386/i386.md (*zero_extendsidi2): Add "!" to m->?*y
alternative.

testsuite/ChangeLog:

2013-04-29  Uros Bizjak  

PR target/44578
* gcc.target/i386/pr44578.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and was committed to mainline SVN.

The patch will be backported to 4.7 and 4.8 branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 198401)
+++ config/i386/i386.md (working copy)
@@ -3049,10 +3049,10 @@
 
 (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-   "=r,?r,?o,r   ,o,?*Ym,?*y,?*Yi,?*x")
+   "=r,?r,?o,r   ,o,?*Ym,?!*y,?*Yi,?*x")
(zero_extend:DI
 (match_operand:SI 1 "x86_64_zext_operand"
-   "0 ,rm,r ,rmWz,0,r   ,m  ,r   ,m")))]
+   "0 ,rm,r ,rmWz,0,r   ,m   ,r   ,m")))]
   ""
 {
   switch (get_attr_type (insn))
Index: testsuite/gcc.target/i386/pr44578.c
===
--- testsuite/gcc.target/i386/pr44578.c (revision 0)
+++ testsuite/gcc.target/i386/pr44578.c (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mtune=athlon64" } */
+
+extern void abort (void);
+
+long double
+__attribute__((noinline, noclone))
+test (float num)
+{
+  unsigned int i;
+
+  if (num < 0.0)
+num = 0.0;
+
+  __builtin_memcpy (&i, &num, sizeof(unsigned int));
+
+  return (long double)(unsigned long long) i;
+}
+
+int
+main ()
+{
+  long double x;
+
+  x = test (0.0);
+
+  if (x != 0.0)
+abort ();
+
+  return 0;
+}


Re: [patch, mips] Fix for PR target/56942

2013-04-29 Thread Steve Ellcey
On Sat, 2013-04-27 at 08:56 +0100, Richard Sandiford wrote:

> >> But using next_real_insn was at least as correct (IMO, more correct)
> >> as next_active_insn before r197266.  It seems counterintuitive that
> >> something can be "active" but not "real".
> >> 
> >> Richard
> >
> > So should we put the active_insn_p hack/FIXME into next_real_insn?  That
> > doesn't seem like much of a win but it would probably fix the problem.
> 
> Yeah, I think so.  If "=>" means "accepts more than", then there used
> to be a nice total order:
> 
>  next_insn
>   => next_nonnote_insn
>   => next_real_insn
>   => next_active_insn
> 
> I think we should keep that if possible, even during the transition period.
> 
> Thanks,
> Richard

OK, here is a patch to next_real_insn to keep the ordering property intact
and fix the bug.  OK for checkin?

Steve Ellcey
sell...@imgtec.com
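
The ordering property can be checked on a toy model.  The sketch below is
a simplification with hypothetical names (toy_kind, accepts_*): it uses
five made-up insn kinds instead of real rtx codes, and a flag to switch
the jump-table case in next_real_insn on or off.

```c
#include <stdbool.h>

/* Toy insn kinds; TOY_USE_INSN stands for an insn that is "real" but
   not "active".  */
enum toy_kind
{
  TOY_NOTE,
  TOY_BARRIER,
  TOY_INSN,
  TOY_USE_INSN,
  TOY_JUMP_TABLE_DATA,
  TOY_KIND_COUNT
};

bool
accepts_next_insn (enum toy_kind k)
{
  (void) k;
  return true;                  /* Stops at anything.  */
}

bool
accepts_next_nonnote_insn (enum toy_kind k)
{
  return k != TOY_NOTE;
}

bool
accepts_next_real_insn (enum toy_kind k, bool patched)
{
  return k == TOY_INSN || k == TOY_USE_INSN
         || (patched && k == TOY_JUMP_TABLE_DATA);
}

bool
accepts_next_active_insn (enum toy_kind k)
{
  return k == TOY_INSN || k == TOY_JUMP_TABLE_DATA;
}

/* True if every kind accepted by a later predicate in the chain is also
   accepted by the earlier ones.  */
bool
ordering_holds (bool patched)
{
  for (int i = 0; i < TOY_KIND_COUNT; i++)
    {
      enum toy_kind k = (enum toy_kind) i;
      bool all = accepts_next_insn (k);
      bool nonnote = accepts_next_nonnote_insn (k);
      bool real = accepts_next_real_insn (k, patched);
      bool active = accepts_next_active_insn (k);

      if ((active && !real) || (real && !nonnote) || (nonnote && !all))
        return false;
    }
  return true;
}
```

In this model the chain breaks at TOY_JUMP_TABLE_DATA when the flag is
off (active but not real) and holds again when it is on.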



2013-04-29  Andrew Bennett 
Steve Ellcey  

PR target/56942
* emit-rtl.c (next_real_insn): Accept jump table data
as 'real' (like next_active_insn does).


diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 538b1ec..9de3f1e 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3248,7 +3248,8 @@ next_real_insn (rtx insn)
   while (insn)
 {
   insn = NEXT_INSN (insn);
-  if (insn == 0 || INSN_P (insn))
+  if (insn == 0 || INSN_P (insn)
+ || JUMP_TABLE_DATA_P (insn)) /* FIXME */
break;
 }
 






patch to fix PR57097

2013-04-29 Thread Vladimir Makarov

The following patch fixes:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57097

The patch was successfully bootstrapped and tested on x86/x86-64.

Committed as rev. 198432.

2013-04-29  Vladimir Makarov  

PR target/57097
* lra-constraints.c (process_alt_operands): Discourage a bit more
using memory for pseudos.  Print cost dump for alternatives.
Modify cost values for conflicts with early clobbers.
(curr_insn_transform): Spill pseudos reassigned to NO_REGS.

2013-04-29  Vladimir Makarov  

PR target/57097
* gcc.target/i386/pr57097.c: New test.
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 198422)
+++ lra-constraints.c   (working copy)
@@ -2013,7 +2013,7 @@ process_alt_operands (int only_alternati
 although it might takes the same number of
 reloads.  */
  if (no_regs_p && REG_P (op))
-   reject++;
+   reject += 2;
 
 #ifdef SECONDARY_MEMORY_NEEDED
  /* If reload requires moving value through secondary
@@ -2044,7 +2044,13 @@ process_alt_operands (int only_alternati
 or non-important thing to be worth to do it.  */
  overall = losers * LRA_LOSER_COST_FACTOR + reject;
  if ((best_losers == 0 || losers != 0) && best_overall < overall)
-   goto fail;
+{
+  if (lra_dump_file != NULL)
+   fprintf (lra_dump_file,
+"  alt=%d,overall=%d,losers=%d -- reject\n",
+nalt, overall, losers);
+  goto fail;
+}
 
  curr_alt[nop] = this_alternative;
  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
@@ -2139,7 +2145,10 @@ process_alt_operands (int only_alternati
  curr_alt_dont_inherit_ops[curr_alt_dont_inherit_ops_num++]
= last_conflict_j;
  losers++;
- overall += LRA_LOSER_COST_FACTOR;
+ /* Early clobber was already reflected in REJECT. */
+ lra_assert (reject > 0);
+ reject--;
+ overall += LRA_LOSER_COST_FACTOR - 1;
}
  else
{
@@ -2163,7 +2172,10 @@ process_alt_operands (int only_alternati
}
  curr_alt_win[i] = curr_alt_match_win[i] = false;
  losers++;
- overall += LRA_LOSER_COST_FACTOR;
+ /* Early clobber was already reflected in REJECT. */
+ lra_assert (reject > 0);
+ reject--;
+ overall += LRA_LOSER_COST_FACTOR - 1;
}
}
   small_class_operands_num = 0;
@@ -2171,6 +2183,11 @@ process_alt_operands (int only_alternati
small_class_operands_num
  += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;
 
+  if (lra_dump_file != NULL)
+   fprintf (lra_dump_file, "  alt=%d,overall=%d,losers=%d,"
+"small_class_ops=%d,rld_nregs=%d\n",
+nalt, overall, losers, small_class_operands_num, reload_nregs);
+
   /* If this alternative can be made to work by reloading, and it
 needs less reloading than the others checked so far, record
 it as the chosen goal for reloading.  */
@@ -3136,7 +3153,15 @@ curr_insn_transform (void)
 spilled.  Spilled scratch pseudos are transformed
 back to scratches at the LRA end.  */
  && lra_former_scratch_operand_p (curr_insn, i))
-   change_class (REGNO (op), NO_REGS, "  Change", true);
+   {
+ int regno = REGNO (op);
+ change_class (regno, NO_REGS, "  Change", true);
+ if (lra_get_regno_hard_regno (regno) >= 0)
+   /* We don't have to mark all insn affected by the
+  spilled pseudo as there is only one such insn, the
+  current one.  */
+   reg_renumber[regno] = -1;
+   }
  continue;
}
 
Index: testsuite/gcc.target/i386/pr57097.c
===
--- testsuite/gcc.target/i386/pr57097.c (revision 0)
+++ testsuite/gcc.target/i386/pr57097.c (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIC" } */
+extern double ad[], bd[], cd[], dd[];
+extern long long all[], bll[], cll[], dll[];
+
+int
+main (int i, char **a)
+{
+  bd[i] = i + 64;
+  if (i % 3 == 0)
+{
+  cd[i] = i;
+}
+  dd[i] = i / 2;
+  ad[i] = i * 2;
+  if (i % 3 == 1)
+{
+  dll[i] = 127;
+}
+  dll[i] = i;
+  cll[i] = i * 2;
+  switch (i % 3)
+{
+case 0:
+  bll[i] = i + 64;
+}
+  all[i] = i / 2;
+  return 0;
+}
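
The early-clobber hunks in process_alt_operands above keep the relation
overall = losers * LRA_LOSER_COST_FACTOR + reject intact while moving one
unit of cost from REJECT into LOSERS.  A minimal sketch of that
bookkeeping, with a hypothetical struct and an assumed factor of 6 (the
real value lives in the LRA headers and is not shown in the patch):

```c
#include <assert.h>

enum { TOY_LOSER_COST_FACTOR = 6 };     /* assumed value */

struct toy_alt_cost
{
  int losers;
  int reject;
  int overall;
};

/* The cost formula used when an alternative is first evaluated.  */
int
toy_overall (int losers, int reject)
{
  return losers * TOY_LOSER_COST_FACTOR + reject;
}

/* Patched adjustment: the early clobber was already counted in REJECT,
   so turn that unit into a loser instead of counting it twice.  */
void
toy_early_clobber_loser (struct toy_alt_cost *c)
{
  assert (c->reject > 0);       /* Mirrors lra_assert (reject > 0).  */
  c->losers++;
  c->reject--;
  c->overall += TOY_LOSER_COST_FACTOR - 1;
}
```

Starting from losers = 1 and reject = 2 (overall 8), one adjustment gives
losers = 2, reject = 1 and overall 13, which still matches the formula.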


Re: [patch] Fix node weight updates during ipa-cp (issue7812053)

2013-04-29 Thread H.J. Lu
On Mon, Apr 29, 2013 at 10:31 AM, Teresa Johnson  wrote:
> FYI, Fixed in r198416.
>
> Thanks,
> Teresa
>

I noticed that sometimes GCC generates:

_8 = memcpy (ret_6, s_2(D), len_4);
_8 = memcpy (ret_6, s_2(D), len_4);
memcpy (_17, buffer_12(D), add_16);
memcpy (_17, buffer_12(D), add_16);
memcpy (_25, _28, _27);
memcpy (_25, _28, _27);
memcpy (_39, buffer_2, len_4);
memcpy (_39, buffer_2, len_4);
memcpy (_16, &fillbuf, pad_1);
memcpy (_16, &fillbuf, pad_1);
...



--
H.J.


[PATCH] Don't instrument with -fsanitize=thread accesses to DECL_HARD_REGISTER vars (PR tree-optimization/57104)

2013-04-29 Thread Jakub Jelinek
Hi!

DECL_HARD_REGISTER vars don't live in memory, thus they can't be
addressable.

The following patch fixes the ICE, ok for trunk/4.8?

2013-04-29  Jakub Jelinek  

PR tree-optimization/57104
* tsan.c (instrument_expr): Don't instrument accesses to
DECL_HARD_REGISTER VAR_DECLs.

* gcc.dg/pr57104.c: New test.

--- gcc/tsan.c.jj   2013-04-24 12:07:12.0 +0200
+++ gcc/tsan.c  2013-04-29 21:06:48.975888478 +0200
@@ -128,7 +128,9 @@ instrument_expr (gimple_stmt_iterator gs
return false;
 }
 
-  if (TREE_READONLY (base))
+  if (TREE_READONLY (base)
+  || (TREE_CODE (base) == VAR_DECL
+ && DECL_HARD_REGISTER (base)))
 return false;
 
   if (size == 0
--- gcc/testsuite/gcc.dg/pr57104.c.jj   2013-04-29 21:09:46.812948131 +0200
+++ gcc/testsuite/gcc.dg/pr57104.c  2013-04-29 21:09:39.0 +0200
@@ -0,0 +1,12 @@
+/* PR tree-optimization/57104 */
+/* { dg-do compile { target { x86_64-*-linux* && lp64 } } } */
+/* { dg-options "-fsanitize=thread" } */
+
+register int r asm ("r14");
+int v;
+
+int
+foo (void)
+{
+  return r + v;
+}

Jakub


Re: Make m32c build, fix PSImode truncation

2013-04-29 Thread DJ Delorie

> Sorry for missing the truncation patterns, I should have grepped
> more than m32c.md.  They look a lot like normal moves though.  Is
> truncation really not a noop, or are the patterns there to work
> around something (probably this :-))?

Not sure which pattern you're talking about, but in general, the
m32c's registers are either 16-bit or 24-bit.  You can move a pair of
16-bit registers into a 24-bit register and it truncates as part of
the move, likewise from 32-bit memory to 24-bit reg.  Note that moves
to other 32-bit destinations do *not* truncate, nor can 24-bit
registers hold 32-bit values (duh).  The 24-bit registers may also
hold a 16-bit value.

If you move a 16-bit value into a 24-bit register, it zero_extends.
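
The move semantics described above can be modelled with ordinary integer
arithmetic.  This is a sketch under the assumption that a 24-bit register
is represented by the low 24 bits of a uint32_t; the toy_* names are
hypothetical and nothing here is m32c code.

```c
#include <stdint.h>

#define TOY_MASK_24 UINT32_C (0xFFFFFF)

/* 32-bit value into a 24-bit register: the top 8 bits are dropped.  */
uint32_t
toy_move_32_to_24 (uint32_t value)
{
  return value & TOY_MASK_24;
}

/* Pair of 16-bit registers into a 24-bit register: same truncation.  */
uint32_t
toy_move_pair_to_24 (uint16_t high, uint16_t low)
{
  return (((uint32_t) high << 16) | low) & TOY_MASK_24;
}

/* 16-bit value into a 24-bit register: zero-extended, never
   sign-extended.  */
uint32_t
toy_move_16_to_24 (uint16_t value)
{
  return (uint32_t) value;
}
```

For example, 0x12345678 becomes 0x345678, and 0x8000 becomes 0x008000
rather than 0xFF8000.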


[WIP RFH] #pragma omp declare simd (aka OpenMP elemental functions) parsing

2013-04-29 Thread Jakub Jelinek
Hi!

The following patch contains some WIP steps towards #pragma omp declare simd
parsing.  The spec is a little bit vague; it says only that (a sequence of)
#pragma omp declare simd pragmas have to immediately precede a function
declaration or definition and that the arguments referred to in its clauses
are the argument names of that function declaration or definition.
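
For reference, a minimal sketch of the construct being parsed, written in
C rather than C++ so that it stands alone.  The function scale_add and its
parameters are hypothetical; the uniform clause names one of the
function's own arguments, which is the lookup the parser has to get
right.  A compiler without OpenMP support enabled ignores the pragma.

```c
/* The pragma immediately precedes the function definition, and
   "scale" in the uniform clause refers to the third parameter.  */
#pragma omp declare simd uniform(scale) notinbranch
float
scale_add (float x, float y, float scale)
{
  return x * scale + y;
}
```

Since the clause mentions scale before the parameter list has been seen,
its arguments can only be resolved once the following declarator is
parsed.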

ATM the patch just throws that info away completely in
cp_finish_omp_declare_simd after calling finish_omp_clauses on it;
the plan is to create, for each clause list, some artificial attribute
(say "omp declare simd" with the spaces) and put the clauses as its
argument.

Now, my current problem is that in the declare-simd-1.C testcase
unfortunately on 2 lines I get 3 errors each; the problem is that this
is an explicit specialization and the original decl has no param names (or
could have different parameter names), and before start_decl returns
grokdeclarator -> grokfndecl -> check_explicit_specialization calls
duplicate_decls and throws away the new parameter names (if the new
explicit specialization isn't a definition).
Any suggestions what to do?  The problem is that
grokdeclarator, grokfndecl, check_explicit_specialization are in decl.c,
and have no access to the cp_parser structure which contains the vector.
Should I copy the parser->omp_declare_simd_clauses vector pointer
say into cp_declarator structure so that grokfndecl could grab it from
there?  Also, for the attributes I wonder if it wouldn't be better to
finally replace the PARM_DECLs in the clauses say with parameter indexes,
because otherwise it might be difficult to adjust those during instantiation
etc.

Other comments?

2013-04-29  Jakub Jelinek  

* parser.h (struct cp_parser): Add omp_declare_simd_clauses field.
* parser.c (cp_ensure_no_omp_declare_simd): New function.
(enum pragma_context): Add pragma_member and pragma_objc_icode.
(cp_parser_linkage_specification, cp_parser_namespace_definition,
cp_parser_class_specifier_1):
Call cp_ensure_no_omp_declare_simd.
(cp_parser_init_declarator, cp_parser_member_declaration,
cp_parser_function_definition_from_specifiers_and_declarator,
cp_parser_save_member_function_body): Call cp_finish_omp_declare_simd.
(cp_parser_member_specification_opt): Pass pragma_member instead
of pragma_external to cp_parser_pragma.
(cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead
of pragma_external to cp_parser_pragma.
(cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses,
just cp_parser_identifier the argument names.
(cp_parser_omp_all_clauses): Don't call finish_omp_clauses for
parser->omp_declare_simd_clauses.
(OMP_DECLARE_SIMD_CLAUSE_MASK): Define.
(cp_parser_omp_declare_simd, cp_finish_omp_declare_simd,
cp_parser_omp_declare): New functions.
(cp_parser_pragma): Call cp_ensure_no_omp_declare_simd.  Handle
PRAGMA_OMP_DECLARE_REDUCTION.  Replace == pragma_external with
!= pragma_stmt and != pragma_compound.

* g++.dg/gomp/declare-simd-1.C: New test.
* g++.dg/gomp/declare-simd-2.C: New test.

--- gcc/cp/parser.h.jj  2013-03-20 10:07:19.0 +0100
+++ gcc/cp/parser.h 2013-04-29 12:17:55.445392454 +0200
@@ -340,6 +340,10 @@ typedef struct GTY(()) cp_parser {
   /* The number of template parameter lists that apply directly to the
  current declaration.  */
   unsigned num_template_parameter_lists;
+
+  /* When parsing #pragma omp declare simd, this is a vector of
+ the clauses.  */
+  vec *omp_declare_simd_clauses;
 } cp_parser;
 
 /* In parser.c  */
--- gcc/cp/parser.c.jj  2013-04-24 15:24:45.0 +0200
+++ gcc/cp/parser.c 2013-04-29 19:55:05.987702600 +0200
@@ -1169,6 +1169,19 @@ cp_token_cache_new (cp_token *first, cp_
   return cache;
 }
 
+/* Diagnose if #pragma omp declare simd isn't followed immediately
+   by function declaration or definition.  */
+
+static inline void
+cp_ensure_no_omp_declare_simd (cp_parser *parser)
+{
+  if (parser->omp_declare_simd_clauses)
+{
+  error ("%<#pragma omp declare simd%> not immediately followed by "
+"function declaration or definition");
+  parser->omp_declare_simd_clauses = NULL;
+}
+}
 
 /* Decl-specifiers.  */
 
@@ -2149,7 +2162,13 @@ static bool cp_parser_function_transacti
 static tree cp_parser_transaction_cancel
   (cp_parser *);
 
-enum pragma_context { pragma_external, pragma_stmt, pragma_compound };
+enum pragma_context {
+  pragma_external,
+  pragma_member,
+  pragma_objc_icode,
+  pragma_stmt,
+  pragma_compound
+};
 static bool cp_parser_pragma
   (cp_parser *, enum pragma_context);
 
@@ -11154,6 +11173,8 @@ cp_parser_linkage_specification (cp_pars
  production.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
 {
+  cp_ensure_no_omp_declare_simd (parser);
+
   /* Consume the `{' token.  */
  

[PATCH, i386]: Fix PR57098, ICE with -mcmodel=large -msse4 and __builtin_shuffle()

2013-04-29 Thread Uros Bizjak
Hello!

2013-04-29  Uros Bizjak  

PR target/57098
* config/i386/i386.c (ix86_expand_vec_perm): Validize constant memory.

2013-04-29  Uros Bizjak  

PR target/57098
* gcc.target/i386/pr57098.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

The patch will be backported to 4.7 and 4.8 branches.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 198401)
+++ config/i386/i386.c  (working copy)
@@ -20559,7 +20559,7 @@ ix86_expand_vec_perm (rtx operands[])
  vec[i * 2 + 1] = const1_rtx;
}
  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
- vt = force_const_mem (maskmode, vt);
+ vt = validize_mem (force_const_mem (maskmode, vt));
  t1 = expand_simple_binop (maskmode, PLUS, t1, vt, t1, 1,
OPTAB_DIRECT);
 
@@ -20756,7 +20756,7 @@ ix86_expand_vec_perm (rtx operands[])
   for (i = 0; i < 16; ++i)
vec[i] = GEN_INT (i/e * e);
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
-  vt = force_const_mem (V16QImode, vt);
+  vt = validize_mem (force_const_mem (V16QImode, vt));
   if (TARGET_XOP)
emit_insn (gen_xop_pperm (mask, mask, mask, vt));
   else
@@ -20767,7 +20767,7 @@ ix86_expand_vec_perm (rtx operands[])
   for (i = 0; i < 16; ++i)
vec[i] = GEN_INT (i % e);
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
-  vt = force_const_mem (V16QImode, vt);
+  vt = validize_mem (force_const_mem (V16QImode, vt));
   emit_insn (gen_addv16qi3 (mask, mask, vt));
 }
 
Index: testsuite/gcc.target/i386/pr57098.c
===
--- testsuite/gcc.target/i386/pr57098.c (revision 0)
+++ testsuite/gcc.target/i386/pr57098.c (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-msse4 -mcmodel=large" } */
+
+typedef int V __attribute__((vector_size(16)));
+
+void foo (V *p, V *mask)
+{
+  *p = __builtin_shuffle (*p, *mask);
+}


[Patch, Fortran, committed] PR57114 correct intrinsic.texi (RANK) bug.

2013-04-29 Thread Tobias Burnus

Committed as obvious as Rev. 198429

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 198428)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,5 +1,11 @@
 2013-04-28  Tobias Burnus  
 
+	PR fortran/57114
+	* intrinsic.texi (RANK): Correct syntax description and
+	expected result.
+
+2013-04-28  Tobias Burnus  
+
 	PR fortran/57093
 	* trans-types.c (gfc_get_element_type): Fix handling
 	of scalar coarrays of type character.
Index: gcc/fortran/intrinsic.texi
===
--- gcc/fortran/intrinsic.texi	(Revision 198428)
+++ gcc/fortran/intrinsic.texi	(Arbeitskopie)
@@ -10279,7 +10279,7 @@
 Inquiry function
 
 @item @emph{Syntax}:
-@code{RESULT = RANGE(A)}
+@code{RESULT = RANK(A)}
 
 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
@@ -10296,7 +10296,7 @@
   integer :: a
   real, allocatable :: b(:,:)
 
-  print *, rank(a), rank(b) ! Prints:  0  3
+  print *, rank(a), rank(b) ! Prints:  0  2
 end program test_rank
 @end smallexample
 


Re: GCC does not support *mmintrin.h with function specific opts

2013-04-29 Thread Sriraman Tallam
On Thu, Apr 25, 2013 at 12:41 PM, Joseph S. Myers
 wrote:
> On Tue, 16 Apr 2013, Sriraman Tallam wrote:
>
>> Ok, it is on by default now.  There is a way to turn it off, with
>> -mno-generate-builtins.
>
> Any new option needs documenting in invoke.texi.

Added and new patch attached.

Thanks
Sri

>
> --
> Joseph S. Myers
> jos...@codesourcery.com
* config/i386/i386.c (construct_container): Do not issue SSE
return error for extern gnu_inline functions.
(def_builtin): Do not generate builtins when -mno-generate-builtins
is used.
* doc/invoke.texi: Document option -mgenerate-builtins.
* config/i386/i386.opt (mgenerate-builtins): New target option.
* config/i386/i386-c.c (ix86_target_macros_internal): Define macro
__ALL_ISA__ when generate_target_builtins is true.
* testsuite/gcc.target/i386/intrinsics_1.c: New test.
* testsuite/gcc.target/i386/intrinsics_2.c: Ditto.
* testsuite/gcc.target/i386/intrinsics_3.c: Ditto.
* testsuite/gcc.target/i386/intrinsics_4.c: Ditto.
* testsuite/gcc.target/i386/intrinsics_5.c: Ditto.
* config/i386/lzcntintrin.h: Expose header when __ALL_ISA__ is defined.
* config/i386/lwpintrin.h: Ditto.
* config/i386/xopintrin.h: Ditto.
* config/i386/fmaintrin.h: Ditto.
* config/i386/bmiintrin.h: Ditto.
* config/i386/fma4intrin.h: Ditto.
* config/i386/nmmintrin.h: Ditto.
* config/i386/tbmintrin.h: Ditto.
* config/i386/smmintrin.h: Ditto.
* config/i386/wmmintrin.h: Ditto.
* config/i386/popcntintrin.h: Ditto.
* config/i386/f16cintrin.h: Ditto.
* config/i386/pmmintrin.h: Ditto.
* config/i386/bmi2intrin.h: Ditto.
* config/i386/tmmintrin.h: Ditto.
* config/i386/xmmintrin.h: Ditto.
* config/i386/mmintrin.h: Ditto.
* config/i386/ammintrin.h: Ditto.
* config/i386/emmintrin.h: Ditto.

Index: config/i386/smmintrin.h
===
--- config/i386/smmintrin.h (revision 198212)
+++ config/i386/smmintrin.h (working copy)
@@ -27,7 +27,7 @@
 #ifndef _SMMINTRIN_H_INCLUDED
 #define _SMMINTRIN_H_INCLUDED
 
-#ifndef __SSE4_1__
+#if !defined (__SSE4_1__) && !defined (__ALL_ISA__)
 # error "SSE4.1 instruction set not enabled"
 #else
 
Index: config/i386/f16cintrin.h
===
--- config/i386/f16cintrin.h(revision 198212)
+++ config/i386/f16cintrin.h(working copy)
@@ -25,7 +25,7 @@
 # error "Never use <f16cintrin.h> directly; include <x86intrin.h> or <immintrin.h> instead."
 #endif
 
-#ifndef __F16C__
+#if !defined (__F16C__) && !defined (__ALL_ISA__)
 # error "F16C instruction set not enabled"
 #else
 
Index: config/i386/wmmintrin.h
===
--- config/i386/wmmintrin.h (revision 198212)
+++ config/i386/wmmintrin.h (working copy)
@@ -30,7 +30,7 @@
 /* We need definitions from the SSE2 header file.  */
 #include <emmintrin.h>
 
-#if !defined (__AES__) && !defined (__PCLMUL__)
+#if !defined (__AES__) && !defined (__PCLMUL__) && !defined (__ALL_ISA__)
 # error "AES/PCLMUL instructions not enabled"
 #else
 
Index: config/i386/bmi2intrin.h
===
--- config/i386/bmi2intrin.h(revision 198212)
+++ config/i386/bmi2intrin.h(working copy)
@@ -25,7 +25,7 @@
 # error "Never use <bmi2intrin.h> directly; include <x86intrin.h> instead."
 #endif
 
-#ifndef __BMI2__
+#if !defined (__BMI2__) && !defined (__ALL_ISA__)
 # error "BMI2 instruction set not enabled"
 #endif /* __BMI2__ */
 
Index: config/i386/pmmintrin.h
===
--- config/i386/pmmintrin.h (revision 198212)
+++ config/i386/pmmintrin.h (working copy)
@@ -27,7 +27,7 @@
 #ifndef _PMMINTRIN_H_INCLUDED
 #define _PMMINTRIN_H_INCLUDED
 
-#ifndef __SSE3__
+#if !defined (__SSE3__) && !defined (__ALL_ISA__)
 # error "SSE3 instruction set not enabled"
 #else
 
Index: config/i386/lzcntintrin.h
===
--- config/i386/lzcntintrin.h   (revision 198212)
+++ config/i386/lzcntintrin.h   (working copy)
@@ -25,7 +25,7 @@
 # error "Never use <lzcntintrin.h> directly; include <x86intrin.h> instead."
 #endif
 
-#ifndef __LZCNT__
+#if !defined (__LZCNT__) && !defined (__ALL_ISA__)
 # error "LZCNT instruction is not enabled"
 #endif /* __LZCNT__ */
 
Index: config/i386/tmmintrin.h
===
--- config/i386/tmmintrin.h (revision 198212)
+++ config/i386/tmmintrin.h (working copy)
@@ -27,7 +27,7 @@
 #ifndef _TMMINTRIN_H_INCLUDED
 #define _TMMINTRIN_H_INCLUDED
 
-#ifndef __SSSE3__
+#if !defined (__SSSE3__) && !defined (__ALL_ISA__)
 # error "SSSE3 instruction set not enabled"
 #else
 
Index: config/i386/xmmintrin.h
===
--- config/i386/xmmin

Re: [patch] Fix node weight updates during ipa-cp (issue7812053)

2013-04-29 Thread Teresa Johnson
FYI, Fixed in r198416.

Thanks,
Teresa

On Thu, Apr 25, 2013 at 10:19 PM, Teresa Johnson  wrote:
> Reproduced. This looks like another instance of a case I found testing
> my follow-on patch: the helper routines have some assertion checking
> that is too strict for the broader usage where we may be scaling
> counts up and not just down. I am verifying and will send a patch in
> the morning that suppresses this assert, which is the approach I am
> taking in the follow-on patch also coming tomorrow.
>
> Teresa
>
> On Thu, Apr 25, 2013 at 3:29 PM, H.J. Lu  wrote:
>> On Fri, Apr 5, 2013 at 7:18 AM, Teresa Johnson  wrote:
>>> On Thu, Mar 28, 2013 at 2:27 AM, Richard Biener
>>>  wrote:
>>>> On Wed, Mar 27, 2013 at 6:22 PM, Teresa Johnson wrote:
>>>>> I found that the node weight updates on cloned nodes during ipa-cp were
>>>>> leading to incorrect/insane weights. Both the original and new node weight
>>>>> computations used truncating divides, leading to a loss of total node
>>>>> weight.
>>>>> I have fixed this by making both rounding integer divides.
>>>>>
>>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>>>>
>>>> I'm sure we can outline a rounding integer divide inline function on
>>>> gcov_type.  To gcov-io.h, I suppose.
>>>>
>>>> Otherwise this looks ok to me.
>>>
>>> Thanks. I went ahead and worked on outlining this functionality. In
>>> the process of doing so, I discovered that there was already a method
>>> in basic-block.h to do part of this: apply_probability(), which does
>>> the rounding divide by REG_BR_PROB_BASE. There is a related function
>>> combine_probabilities() that takes 2 int probabilities instead of a
>>> gcov_type and an int probability. I decided to use apply_probability()
>>> in ipa-cp, and add a new macro GCOV_COMPUTE_SCALE to basic-block.h to
>>> compute the scale factor/probability via a rounding divide. So the
>>> ipa-cp changes I made use both GCOV_COMPUTE_SCALE and
>>> apply_probability.
>>>
>>> I then went through all the code to look for instances where we were
>>> computing scale factors/probabilities and performing scaling. I found
>>> a mix of existing uses of apply/combine_probabilities, uses of RDIV,
>>> inlined rounding divides, and truncating divides. I think it would be
>>> good to unify all of this. As a first step, I replaced all inline code
>>> sequences that were already doing rounding divides to compute scale
>>> factors/probabilities or do the scaling, to instead use the
>>> appropriate helper function/macro described above. For these
>>> locations, there should be no change to behavior.
>>>
>>> There are a number of places where there are truncating divides right
>>> now. Since changing those may impact the resulting behavior, for this
>>> patch I simply added a comment as to which helper they should use. As
>>> soon as this patch goes in I am planning to change those to use the
>>> appropriate helper and test performance, and then will send that patch
>>> for review. So for this patch, the only place where behavior is
>>> changed is in ipa-cp which was my original change.
>>>
>>> New patch is attached. Bootstrapped (both bootstrap and
>>> profiledbootstrap) and tested on x86-64-unknown-linux-gnu. Ok for
>>> trunk?
>>>
>>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57077
>>
>>
>> H.J.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
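
The thread above contrasts truncating divides with rounding divides when
scaling profile counts. A minimal sketch of that difference follows. It is
not GCC's code: count_t stands in for gcov_type, PROB_BASE plays the role of
REG_BR_PROB_BASE, and all four helper names are hypothetical. It assumes
non-negative counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gcov_type; the real type lives in GCC.  */
typedef int64_t count_t;

/* Assumed fixed-point base, playing the role of REG_BR_PROB_BASE.  */
#define PROB_BASE 10000

/* Truncating divide: the remainder is simply dropped.  */
static inline count_t
div_trunc (count_t num, count_t den)
{
  assert (num >= 0 && den > 0);
  return num / den;
}

/* Rounding divide: add half the divisor before dividing.  */
static inline count_t
div_round (count_t num, count_t den)
{
  assert (num >= 0 && den > 0);
  return (num + den / 2) / den;
}

/* Scale COUNT by PROB / PROB_BASE, truncating the result.  */
static inline count_t
scale_count_trunc (count_t count, int prob)
{
  return div_trunc (count * prob, PROB_BASE);
}

/* Scale COUNT by PROB / PROB_BASE, rounding to nearest.  */
static inline count_t
scale_count_round (count_t count, int prob)
{
  return div_round (count * prob, PROB_BASE);
}
```

Splitting a count of 101 between two clones with probabilities 3333 and 6667
gives 33 + 67 = 100 with the truncating helper, so one unit of weight is
lost, and 34 + 67 = 101 with the rounding helper.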


Re: Trivial testsuite fix

2013-04-29 Thread Andreas Schwab
Jeff Law  writes:

> commit 07373396d21b65f975c2354e7c6ab454200b40af
> Author: Jeff Law 

You should set the author accordingly.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, i386]: Enable SSE -> GPR moves for generic x86 targets (PR target/54349)

2013-04-29 Thread Teresa Johnson
Hi Uros,

I was just updating an old bug (GCC generates MMX instructions but
fails to generate "emms"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44578) with a case where I
am now hitting this same issue. But as I was including reproducer info
I found that the problem went away with a compiler I just updated and
rebuilt this morning. Turns out that your patch makes this problem
disappear. I think this is just a side-effect of your change though.
Specifically, enabling TARGET_INTER_UNIT_MOVES_FROM_VEC for Generic
and using that in inline_secondary_memory_needed() is changing
instruction selection in a lucky way for my test case (confirmed by
hand-modifying the code to use TARGET_INTER_UNIT_MOVES_TO_VEC instead
of TARGET_INTER_UNIT_MOVES_FROM_VEC). The code is the same going into
reload, but in reload I get the following difference for a
*zero_extendsidi2:

< Choosing alt 6 in insn 11:  (0) ?*y  (1) m
<   Creating newreg=74 from oldreg=67, assigning class MMX_REGS to r74
---
> Choosing alt 8 in insn 11:  (0) ?*x  (1) m
>   Creating newreg=74 from oldreg=67, assigning class SSE_REGS to r74

Would you agree that this is just a lucky side-effect and that there
is still a bug here? I think I will go ahead and update the bug (I can
still reproduce it with -mtune=athlon64). Here is the test case:

---
I have another instance of this issue. Trunk is generating move
instructions to implement an inlined memcpy. The move instructions use
the MMX registers, but no EMMS instruction is generated. My testcase
then calls a libm function that uses the FPU, which returns incorrect
results. This worked with an older gcc 4.7 based compiler, which
didn't use MMX registers.

The compiler was configured for x86_64-unknown-linux-gnu. The testcase
was compiled with -O2.

$ cat test.cc
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

namespace {
volatile double dd = 0.080553657784353652;
double dds, ddc;
}

unsigned long long test(float num) {
  if (num < 0) {
    num = 0;
  }

  unsigned int i;
  memcpy(&i, &num, sizeof(unsigned int));
  unsigned long long a = i;

  sincos(dd, &dds, &ddc);
  if (isnan(dds) || isnan(ddc))
  {
    printf ("Failed\n");
    exit (1);
  }
  return a;
}

$ cat test_main.cc
#include <stdio.h>

extern unsigned long long test(float num);
int main()
{
  unsigned long long h = test(1);
  printf ("Passed\n");
}

$ g++ -O2 test*.cc -mtune=athlon64

$ a.out
Failed
---

Thanks,
Teresa

On Mon, Apr 29, 2013 at 4:08 AM, Uros Bizjak  wrote:
> Hello!
>
> Attached patch enables SSE -> general register moves for generic x86
> targets. The patch splits TARGET_INTER_UNIT_MOVES to
> TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC
> tuning flags and updates gcc sources accordingly.
>
> According to AMD optimization manuals, direct moves *FROM* SSE (and
> MMX) registers *TO* general registers should be used for AMD K10
> family and later families. Since Intel targets are unaffected by this
> change, I have also changed generic setting to enable these moves for
> a generic target tuning.
>
> 2013-04-29  Uros Bizjak  
>
> PR target/54349
> * config/i386/i386.h (enum ix86_tune_indices)
> <X86_TUNE_INTER_UNIT_MOVES_TO_VEC, X86_TUNE_INTER_UNIT_MOVES_FROM_VEC>:
> New, split from X86_TUNE_INTER_UNIT_MOVES.
> <X86_TUNE_INTER_UNIT_MOVES>: Remove.
> (TARGET_INTER_UNIT_MOVES_TO_VEC): New define.
> (TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto.
> (TARGET_INTER_UNIT_MOVES): Remove.
> * config/i386/i386.c (initial_ix86_tune_features): Update.
> Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only.
> (ix86_expand_convert_uns_didf_sse): Use
> TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
> (ix86_expand_vector_init_one_nonzero): Ditto.
> (ix86_expand_vector_init_interleave): Ditto.
> (inline_secondary_memory_needed): Return true for moves from SSE class
> registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves
> to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets.
> * config/i386/constraints.md (Yi, Ym): Depend on
> TARGET_INTER_UNIT_MOVES_TO_VEC.
> (Yj, Yn): New constraints.
> * config/i386/i386.md (*movdi_internal): Change constraints of
> operand 1 from Yi to Yj and from Ym to Yn.
> (*movsi_internal): Ditto.
> (*movdf_internal): Ditto.
> (*movsf_internal): Ditto.
> (*float2_1): Use
> TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
> (*float2_1 splitters): Ditto.
> (floatdi2_i387_with_xmm): Ditto.
> (floatdi2_i387_with_xmm splitters): Ditto.
> * config/i386/sse.md (movdi_to_sse): Ditto.
> (sse2_stored): Change constraint of operand 1 from Yi to Yj.
> Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of
> TARGET_INTER_UNIT_MOVES.
> (sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj.
> (sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC
> instead of TARGET_INTER_UNIT_MOVES.
> * config/i386/mmx.md (*mov_internal): Change constraint 

Trivial testsuite fix

2013-04-29 Thread Jeff Law


A private message from Kai to me included this patch to fix an
out-of-bounds array access in the testsuite.


Installed on the trunk for Kai.


commit 07373396d21b65f975c2354e7c6ab454200b40af
Author: Jeff Law 
Date:   Mon Apr 29 10:22:11 2013 -0600

   * gcc.c-torture/execute/pr55875.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 3364efc..73eeaf2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-04-29  Kai Tietz  
+
+   * gcc.c-torture/execute/pr55875.c
+
 2013-04-29  Richard Biener  
 
PR middle-end/57075
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr55875.c 
b/gcc/testsuite/gcc.c-torture/execute/pr55875.c
index 4a0ce1b..4e56f7c 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr55875.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr55875.c
@@ -1,4 +1,4 @@
-int a[250];
+int a[251];
 __attribute__ ((noinline))
 t(int i)
 {


Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android

2013-04-29 Thread Mike Stump
On Jan 9, 2013, at 7:14 AM, Alexander Ivchenko  wrote:
>  We have test fail for gcc.dg/cpp/trad/include.c on Android. The
> reason for that is that
> -ftraditional-cpp is not expected to work on Android due to variadic
> macro (like #define __builtin_warning(x, y...))
> in standard headers and traditional preprocessor cannot handle them.
>  The attached patch disables that test.

Be sure to ask, Ok? in your patch submittals.

Ok.


Re: [C++ Patch/RFC] PR 57092

2013-04-29 Thread Jason Merrill

On 04/29/2013 05:05 AM, Paolo Carlini wrote:

> in this 4.8/4.9 Regression, finish_decltype_type doesn't handle
> ADDR_EXPR.


Hmm...we're seeing the regression because previously 
finish_decltype_type would have just returned the type of the template 
parameter so it wouldn't ever see the ADDR_EXPR at instantiation time. 
But we want to form a DECLTYPE_TYPE so that the mangling is correct. 
Perhaps the right solution is to handle this case specially in 
tsubst/DECLTYPE_TYPE: If id is true and the original expr is a 
TEMPLATE_PARM_INDEX, just instantiate the type of the template parm 
rather than its value.


Jason



[PING] SLSR for conditional candidates

2013-04-29 Thread Bill Schmidt
Half-hearted ping for
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01291.html ...

I promise this is the last major code dump for SLSR. ;)

Thanks,
Bill




Re: [PATCH] Redesign pthread in LIB_SPEC for systems without libpthread

2013-04-29 Thread Alexander Ivchenko
*ping*

thank you,
Alexander

2013/4/15 Pavel Chupin :
> On Tue, Apr 2, 2013 at 1:59 PM, Pavel Chupin  wrote:
>> On Mon, Apr 1, 2013 at 7:07 PM, Pavel Chupin  
>> wrote:
>>> On Android pthread is integrated into libc.
>>> Attached patch fixes configures for this case by trying to build test
>>> without -pthread -lpthread.
>>>
>>> 2013-04-01  Pavel Chupin  
>>>
>>> Fix libatomic and libgomp configure for systems without libpthread
>>> * libatomic/configure.ac: Add test without -pthread -lpthread.
>>> * libgomp/configure.ac: Ditto.
>>> * libatomic/configure: Regenerate.
>>> * libgomp/configure: Regenerate.
>>>
>>> OK for trunk?
>>>
>>
>> I think I made a better fix:
>>
>> 2013-04-02  Pavel Chupin  
>>
>> Redesign pthread in LIB_SPEC for systems without libpthread
>> * gcc/config/gnu-user.h: Remove pthread from GNU_USER_TARGET_LIB_SPEC
>> but keep in default LIB_SPEC
>> * gcc/config/linux-android.h: Add pthread to ANDROID_LIB_SPEC
>>
>> Is it OK for trunk?
>
> Ping
>
> --
> Pavel Chupin
> Intel Corporation


[Patch, testsuite] Add -gdwarf to debug/dwarf2 testcases (Take 2)

2013-04-29 Thread Senthil Kumar Selvaraj
This patch adds -gdwarf to the flags passed to all tests run by
dwarf2.exp. In the first attempt, I'd added the flag to dg-options in
individual testcases. Jakub then suggested adding it to the exp file
instead.

Does this look ok? If yes, could someone commit please, I don't have
commit access.

Regards
Senthil

gcc/testsuite/ChangeLog

2013-04-29  Senthil Kumar Selvaraj 

* gcc.dg/debug/dwarf2/dwarf2.exp: Replace -gdwarf-2 with -gdwarf and
force -gdwarf when invoking dg-runtest.


diff --git gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp 
gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp
index 829840c..f161787 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp
+++ gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp
@@ -22,7 +22,7 @@ load_lib gcc-dg.exp
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
-set DEFAULT_CFLAGS " -ansi -pedantic-errors -gdwarf-2"
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
 }
 
 # Initialize `dg'.
@@ -31,12 +31,12 @@ dg-init
 # Main loop.
 set comp_output [gcc_target_compile \
 "$srcdir/$subdir/../trivial.c" "trivial.S" assembly \
-"additional_flags=-gdwarf-2"]
+"additional_flags=-gdwarf"]
 if { ! [string match "*: target system does not support the * debug format*" \
 $comp_output] } {
 remove-build-file "trivial.S"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] 
$srcdir/c-c++-common/dwarf2/*.c]] \
-   "" $DEFAULT_CFLAGS
+   " -gdwarf " $DEFAULT_CFLAGS
 }
 
 # All done.


[PATCH] Fix PR57075

2013-04-29 Thread Richard Biener

When the fix for PR57036 made the inliner stop adding abnormal edges
from calls to non-local labels or setjmps, I also made it stop splitting
the blocks after the possible source of abnormal control flow.
That turns out to upset the CFG verifier, so the following
reinstates the splitting of blocks.

Bootstrap & regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2013-04-29  Richard Biener  

PR middle-end/57075
* tree-inline.c (copy_edges_for_bb): Still split the bbs,
even if not adding abnormal edges for calls that can make
abnormal gotos.

* gcc.dg/torture/pr57075.c: New testcase.

Index: gcc/tree-inline.c
===
*** gcc/tree-inline.c   (revision 198409)
--- gcc/tree-inline.c   (working copy)
*** copy_edges_for_bb (basic_block bb, gcov_
*** 1923,1933 
   into a COMPONENT_REF which doesn't.  If the copy
   can throw, the original could also throw.  */
can_throw = stmt_can_throw_internal (copy_stmt);
!   /* If the call we inline cannot make abnormal goto do not add
!  additional abnormal edges but only retain those already present
!in the original function body.  */
!   nonlocal_goto
!   = can_make_abnormal_goto && stmt_can_make_abnormal_goto (copy_stmt);
  
if (can_throw || nonlocal_goto)
{
--- 1927,1933 
   into a COMPONENT_REF which doesn't.  If the copy
   can throw, the original could also throw.  */
can_throw = stmt_can_throw_internal (copy_stmt);
!   nonlocal_goto = stmt_can_make_abnormal_goto (copy_stmt);
  
if (can_throw || nonlocal_goto)
{
*** copy_edges_for_bb (basic_block bb, gcov_
*** 1955,1960 
--- 1955,1964 
else if (can_throw)
make_eh_edges (copy_stmt);
  
+   /* If the call we inline cannot make abnormal goto do not add
+  additional abnormal edges but only retain those already present
+in the original function body.  */
+   nonlocal_goto &= can_make_abnormal_goto;
if (nonlocal_goto)
make_abnormal_goto_edges (gimple_bb (copy_stmt), true);
  
Index: gcc/testsuite/gcc.dg/torture/pr57075.c
===
*** gcc/testsuite/gcc.dg/torture/pr57075.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr57075.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ 
+ extern int baz (void) __attribute__ ((returns_twice));
+ int __attribute__ ((__leaf__))
+ foo (void)
+ {
+   return __builtin_printf ("$");
+ }
+ 
+ void
+ bar ()
+ {
+   foo ();
+   baz ();
+ }


Re: [Patch] Emit error for negative _Alignas alignment values

2013-04-29 Thread Joseph S. Myers
On Thu, 25 Apr 2013, Senthil Kumar Selvaraj wrote:

> On Wed, Apr 24, 2013 at 03:18:51PM +, Joseph S. Myers wrote:
> > On Wed, 3 Apr 2013, Senthil Kumar Selvaraj wrote:
> > 
> > > 2013-04-03Senthil Kumar Selvaraj  
> > > 
> > > 
> > >   * c-common.c (check_user_alignment): Emit error for negative values
> > > 
> > >   * gcc.dg/c1x-align-3.c: Add test for negative power of 2
> > 
> > OK (but note there should be a "." at the end of each ChangeLog entry).
> > 
> 
> Fixed now. I also moved the test case change into its own Changelog. Could
> someone commit it for me please, as I don't have commit access?

Thanks, committed.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make m32c build, fix PSImode truncation

2013-04-29 Thread Richard Sandiford
Richard Sandiford  writes:
> Bernd Schmidt  writes:
>> On 04/27/2013 10:39 AM, Richard Sandiford wrote:
>>> Argh, that's unfortunate.  The point of that change was to make
>>> simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg.
>>> Would the equivalent lowpart simplify_gen_subreg call succeed
>>> (return nonnull)?  If so, I think we want truncate to do the same.
>>> 
>>> What simplification is this blocking, and why does it lead to
>>> reload failures?
>>
>> There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which
>> currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during
>> cse1. Reload fails because the subreg gets propagated into a memory
>> address, which requires a class of A_REGS, but A_REGS can only hold
>> PSImode values, not SImode.  This shows that the truncation is not
>> always a no-op: in this case it involves a register move, but there's no
>> way to describe this using TRULY_NOOP_TRUNCATION.
>
> Hmm, but isn't this a reload bug?  We have:
>
> (insn 53 51 54 10 (set (reg:HI 0 r0 [orig:26 D.2817 ] [26])
> (zero_extend:HI (mem/u/j:QI (plus:PSI (subreg:PSI (reg:SI 44 [ D.2818 
> ]) 0)
> (symbol_ref:PSI ("__clz_tab") [flags 0x40] <var_decl 0x7f2c253d42f8 __clz_tab>)) [0 __clz_tab S1 A8])))
> /home/richards/gcc/HEAD/gcc/libgcc/libgcc2.c:520 115 {zero_extendqihi2}
>  (expr_list:REG_DEAD (reg:SI 44 [ D.2818 ])
> (nil)))
>
> Reloads for insn # 53
> Reload 0: reload_in (SI) = (reg:SI 44 [ D.2818 ])
> A_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0)
> reload_in_reg: (reg:SI 44 [ D.2818 ])
>
> find_reloads_address_1 is reloading the SUBREG_REG rather than the
> SUBREG itself, even though SImode is not valid for BASE_REGS == A_REGS:
>
>   if (GET_CODE (op0) == SUBREG)
>     {
>       op0 = SUBREG_REG (op0);
>       code0 = GET_CODE (op0);
>       if (code0 == REG && REGNO (op0) < FIRST_PSEUDO_REGISTER)
>         op0 = gen_rtx_REG (word_mode,
>                            (REGNO (op0) +
>                             subreg_regno_offset (REGNO (SUBREG_REG (orig_op0)),
>                                                  GET_MODE (SUBREG_REG (orig_op0)),
>                                                  SUBREG_BYTE (orig_op0),
>                                                  GET_MODE (orig_op0))));
>     }
>
> push_reloads would specifically not convert a SUBREG reload to a
> REG reload in this case.  In principle, I think address subregs
> should be handled in the same way.
>
> So is the problem really that (subreg:PSI (reg:SI ...)) isn't a valid
> truncation on m32c?  Without TRULY_NOOP_TRUNCATION, I don't see what
> forces most code to use (truncate:PSI (reg:SI ...)) instead.  Many places
> would call gen_lowpart directly.
>
> Sorry for missing the truncation patterns, I should have grepped more
> than m32c.md.  They look a lot like normal moves though.  Is truncation
> really not a noop, or are the patterns there to work around something
> (probably this :-))?

Even if that's true, I suppose it isn't worth trying to fix such a
sensitive part of reload at this stage.  I think LRA already handles
it correctly.

In the meantime, we could work around the problem by disallowing subregs
in m32c addresses.  I think all non-paradoxical subregs[*] are going to
need a reload anyway, so it should also produce better code.

 [*] Paradoxical subregs imply an address has don't-care bits, so should
 be rare.

FWIW, the proof-of-concept patch below restores the build for me.
I realise it might fail muster on style grounds though.

Richard

gcc/
* config/m32c/m32c.c (address_pattern_p): New variable.
(encode_pattern_1): Include subregs if address_pattern_p.
(encode_pattern): Add address_p parameter.
(m32c_legitimate_address_p): Update accordingly.

Index: gcc/config/m32c/m32c.c
===
--- gcc/config/m32c/m32c.c  2013-04-29 14:07:50.0 +0100
+++ gcc/config/m32c/m32c.c  2013-04-29 14:07:51.207987093 +0100
@@ -113,6 +113,7 @@ static int class_contents[LIM_REG_CLASSE
 /* These are all to support encode_pattern().  */
 static char pattern[30], *patternp;
 static GTY(()) rtx patternr[30];
+static bool address_pattern_p;
 #define RTX_IS(x) (streq (pattern, x))
 
 /* Some macros to simplify the logic throughout this file.  */
@@ -166,8 +167,9 @@ encode_pattern_1 (rtx x)
   *patternp++ = 'r';
   break;
 case SUBREG:
-  if (GET_MODE_SIZE (GET_MODE (x)) !=
-      GET_MODE_SIZE (GET_MODE (XEXP (x, 0))))
+  if (address_pattern_p
+      || (GET_MODE_SIZE (GET_MODE (x))
+          != GET_MODE_SIZE (GET_MODE (XEXP (x, 0)))))
*patternp++ = 'S';
   encode_pattern_1 (XEXP (x, 0));
   break;
@@ -254,9 +256,10 @@ encode_pattern_1 (rtx x)
 }
 
 static void
-encode_pattern (rtx x)
+encode_pattern (rtx x, bool address_p = false)
 {
   patternp = pattern;
+

Re: [Patch, Ping] Emit error for negative _Alignas alignment values

2013-04-29 Thread Senthil Kumar Selvaraj
Ping - could you commit it for me please, I don't have commit
access.

Regards
Senthil

On Thu, Apr 25, 2013 at 12:10:06PM +0530, Senthil Kumar Selvaraj wrote:
> On Wed, Apr 24, 2013 at 03:18:51PM +, Joseph S. Myers wrote:
> > On Wed, 3 Apr 2013, Senthil Kumar Selvaraj wrote:
> > 
> > > 2013-04-03Senthil Kumar Selvaraj  
> > > 
> > > 
> > >   * c-common.c (check_user_alignment): Emit error for negative values
> > > 
> > >   * gcc.dg/c1x-align-3.c: Add test for negative power of 2
> > 
> > OK (but note there should be a "." at the end of each ChangeLog entry).
> > 
> 
> Fixed now. I also moved the test case change into its own Changelog. Could
> someone commit it for me please, as I don't have commit access?
> 
> Regards
> Senthil
> 
> gcc/c-family/ChangeLog
> 
> 2013-04-03  Senthil Kumar Selvaraj  
> 
>   * c-common.c (check_user_alignment): Emit error for negative values.
> 
> gcc/testsuite/ChangeLog
> 
> 2013-04-03  Senthil Kumar Selvaraj  
> 
>   * gcc.dg/c1x-align-3.c: Add test for negative power of 2.
> 
> 
> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
> index c7cdd0f..dfdfbb6 100644
> --- gcc/c-family/c-common.c
> +++ gcc/c-family/c-common.c
> @@ -7308,9 +7308,10 @@ check_user_alignment (const_tree align, bool 
> allow_zero)
>  }
>else if (allow_zero && integer_zerop (align))
>  return -1;
> -  else if ((i = tree_log2 (align)) == -1)
> +  else if (tree_int_cst_sgn (align) == -1
> +   || (i = tree_log2 (align)) == -1)
>  {
> -  error ("requested alignment is not a power of 2");
> +  error ("requested alignment is not a positive power of 2");
>return -1;
>  }
>else if (i >= HOST_BITS_PER_INT - BITS_PER_UNIT_LOG)
> diff --git gcc/testsuite/gcc.dg/c1x-align-3.c 
> gcc/testsuite/gcc.dg/c1x-align-3.c
> index 0b2a77f..b97351c 100644
> --- gcc/testsuite/gcc.dg/c1x-align-3.c
> +++ gcc/testsuite/gcc.dg/c1x-align-3.c
> @@ -23,6 +23,7 @@ _Alignas (-(__LONG_LONG_MAX__-1)/4) char i3; /* { dg-error 
> "too large|power of 2
>  _Alignas (-(__LONG_LONG_MAX__-1)/8) char i4; /* { dg-error "too large|power 
> of 2" } */
>  _Alignas (-(__LONG_LONG_MAX__-1)/16) char i5; /* { dg-error "too large|power 
> of 2" } */
>  _Alignas (-1) char j; /* { dg-error "power of 2" } */
> +_Alignas (-2) char j; /* { dg-error "positive power of 2" } */
>  _Alignas (3) char k; /* { dg-error "power of 2" } */
> 
>  _Alignas ((void *) 1) char k; /* { dg-error "integer constant" } */
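
The patch above adds a sign check in front of the log2 test, so negative
values are rejected with a "positive power of 2" message. A minimal
stand-alone sketch of that kind of check follows. It is not GCC's
check_user_alignment: both function names are hypothetical, it works on
plain long long values rather than trees, and it rejects zero outright,
whereas the patched function handles zero separately through allow_zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true only for 1, 2, 4, 8, ...  */
static inline bool
is_positive_power_of_2 (long long align)
{
  /* Reject zero and negative values first, mirroring the sign check
     that the patch places before the log2 test.  */
  if (align <= 0)
    return false;
  /* A power of two has exactly one bit set.  */
  return (align & (align - 1)) == 0;
}

/* Hypothetical helper: base-2 log of a valid alignment.  */
static inline int
alignment_log2 (long long align)
{
  assert (is_positive_power_of_2 (align));
  int log = 0;
  while (align > 1)
    {
      align >>= 1;
      ++log;
    }
  return log;
}
```

With these helpers, 16 is accepted and has log2 4, while -2, -1, 0 and 3 are
all rejected before any log2 computation takes place.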


Re: Make m32c build, fix PSImode truncation

2013-04-29 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 04/27/2013 10:39 AM, Richard Sandiford wrote:
>> Argh, that's unfortunate.  The point of that change was to make
>> simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg.
>> Would the equivalent lowpart simplify_gen_subreg call succeed
>> (return nonnull)?  If so, I think we want truncate to do the same.
>> 
>> What simplification is this blocking, and why does it lead to
>> reload failures?
>
> There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which
> currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during
> cse1. Reload fails because the subreg gets propagated into a memory
> address, which requires a class of A_REGS, but A_REGS can only hold
> PSImode values, not SImode.  This shows that the truncation is not
> always a no-op: in this case it involves a register move, but there's no
> way to describe this using TRULY_NOOP_TRUNCATION.

Hmm, but isn't this a reload bug?  We have:

(insn 53 51 54 10 (set (reg:HI 0 r0 [orig:26 D.2817 ] [26])
(zero_extend:HI (mem/u/j:QI (plus:PSI (subreg:PSI (reg:SI 44 [ D.2818 
]) 0)
(symbol_ref:PSI ("__clz_tab") [flags 0x40] <var_decl 0x7f2c253d42f8 __clz_tab>)) [0 __clz_tab S1 A8])))
/home/richards/gcc/HEAD/gcc/libgcc/libgcc2.c:520 115 {zero_extendqihi2}
 (expr_list:REG_DEAD (reg:SI 44 [ D.2818 ])
(nil)))

Reloads for insn # 53
Reload 0: reload_in (SI) = (reg:SI 44 [ D.2818 ])
A_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0)
reload_in_reg: (reg:SI 44 [ D.2818 ])

find_reloads_address_1 is reloading the SUBREG_REG rather than the
SUBREG itself, even though SImode is not valid for BASE_REGS == A_REGS:

if (GET_CODE (op0) == SUBREG)
  {
    op0 = SUBREG_REG (op0);
    code0 = GET_CODE (op0);
    if (code0 == REG && REGNO (op0) < FIRST_PSEUDO_REGISTER)
      op0 = gen_rtx_REG (word_mode,
                         (REGNO (op0) +
                          subreg_regno_offset (REGNO (SUBREG_REG (orig_op0)),
                                               GET_MODE (SUBREG_REG (orig_op0)),
                                               SUBREG_BYTE (orig_op0),
                                               GET_MODE (orig_op0))));
  }

push_reloads would specifically not convert a SUBREG reload to a
REG reload in this case.  In principle, I think address subregs
should be handled in the same way.

So is the problem really that (subreg:PSI (reg:SI ...)) isn't a valid
truncation on m32c?  Without TRULY_NOOP_TRUNCATION, I don't see what
forces most code to use (truncate:PSI (reg:SI ...)) instead.  Many places
would call gen_lowpart directly.

Sorry for missing the truncation patterns, I should have grepped more
than m32c.md.  They look a lot like normal moves though.  Is truncation
really not a noop, or are the patterns there to work around something
(probably this :-))?

Richard


Re: [PATCH][ARM] Restrict store_minmaxsi

2013-04-29 Thread Richard Earnshaw

On 29/04/13 12:33, Kyrylo Tkachov wrote:

> Hi all,
>
> With this patch, we now only use the store_minmaxsi pattern when we're not
> in a hot path.
>
> We have found that this pattern can cause memory access bottlenecks in some
> cases (one benchmark was 45% slower when this pattern was enabled).
>
> Tested arm-none-eabi on qemu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2013-04-29  Kyrylo Tkachov
>
> * config/arm/arm.md (store_minmaxsi): Use only when
> optimize_insn_for_size_p.




OK.

R.




[PATCH] Update tail-merge header comment.

2013-04-29 Thread Tom de Vries
Steven,

I answered your question in tree-ssa-tail-merge.c about why tail-merge is not a
stand-alone gimple pass.

Committed to trunk.

Thanks,
- Tom

2013-04-29  Tom de Vries  

* tree-ssa-tail-merge.c: Update header comment.


[PATCH] Fix PR57103

2013-04-29 Thread Richard Biener

The following fixes a thinko in move_stmt_op regarding block
updates.  It also makes the two copies look the same and removes
redundant checking.
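
As a rough illustration of the corrected condition, here is a standalone
sketch. It is not GCC code: struct block, should_update_block and
block_is_within are hypothetical stand-ins for a tree BLOCK, the test in
move_stmt_op, and the ENABLE_CHECKING walk up BLOCK_SUPERCONTEXT. With the
patch, an expression's block is replaced when it is exactly the original
block, or when no original block was given (the first disjunct then covers
a NULL block, the second a non-NULL one).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexical BLOCK; only the parent link matters.  */
struct block
{
  struct block *supercontext;
};

/* Mirrors the patched test: update when BLOCK is the original block, or
   when no original block was given and BLOCK is non-NULL.  */
bool
should_update_block (const struct block *block,
                     const struct block *orig_block)
{
  return block == orig_block
         || (orig_block == NULL && block != NULL);
}

/* Mirrors the checking code: a block that is left alone is expected to be
   nested somewhere inside the original block.  */
bool
block_is_within (const struct block *block, const struct block *orig_block)
{
  while (block != NULL && block != orig_block)
    block = block->supercontext;
  return block == orig_block;
}
```

In this model a block nested inside the original one is left untouched by
should_update_block, and block_is_within is what the assert would verify
for it.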

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2013-04-29  Richard Biener  

PR middle-end/57103
* tree-cfg.c (move_stmt_op): Fix condition under which to update
TREE_BLOCK.
(move_stmt_r): Remove redundant checking.

* gcc.dg/autopar/pr57103.c: New testcase.

Index: gcc/tree-cfg.c
===
*** gcc/tree-cfg.c  (revision 198409)
--- gcc/tree-cfg.c  (working copy)
*** move_stmt_op (tree *tp, int *walk_subtre
*** 6099,6108 
  
if (EXPR_P (t))
  {
!   if (TREE_BLOCK (t) == p->orig_block
  || (p->orig_block == NULL_TREE
! && TREE_BLOCK (t) == NULL_TREE))
TREE_SET_BLOCK (t, p->new_block);
  }
else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
  {
--- 6099,6117 
  
if (EXPR_P (t))
  {
!   tree block = TREE_BLOCK (t);
!   if (block == p->orig_block
  || (p->orig_block == NULL_TREE
! && block != NULL_TREE))
TREE_SET_BLOCK (t, p->new_block);
+ #ifdef ENABLE_CHECKING
+   else if (block != NULL_TREE)
+   {
+ while (block && TREE_CODE (block) == BLOCK && block != p->orig_block)
+   block = BLOCK_SUPERCONTEXT (block);
+ gcc_assert (block == p->orig_block);
+   }
+ #endif
  }
else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME)
  {
*** move_stmt_r (gimple_stmt_iterator *gsi_p
*** 6187,6204 
gimple stmt = gsi_stmt (*gsi_p);
tree block = gimple_block (stmt);
  
!   if (p->orig_block == NULL_TREE
!   || block == p->orig_block
!   || block == NULL_TREE)
  gimple_set_block (stmt, p->new_block);
- #ifdef ENABLE_CHECKING
-   else if (block != p->new_block)
- {
-   while (block && block != p->orig_block)
-   block = BLOCK_SUPERCONTEXT (block);
-   gcc_assert (block);
- }
- #endif
  
switch (gimple_code (stmt))
  {
--- 6196,6205 
gimple stmt = gsi_stmt (*gsi_p);
tree block = gimple_block (stmt);
  
!   if (block == p->orig_block
!   || (p->orig_block == NULL_TREE
! && block != NULL_TREE))
  gimple_set_block (stmt, p->new_block);
  
switch (gimple_code (stmt))
  {
*** move_block_to_fn (struct function *dest_
*** 6426,6439 
  e->goto_locus = d->new_block ?
  COMBINE_LOCATION_DATA (line_table, e->goto_locus, d->new_block) :
  LOCATION_LOCUS (e->goto_locus);
- #ifdef ENABLE_CHECKING
-   else if (block != d->new_block)
- {
-   while (block && block != d->orig_block)
- block = BLOCK_SUPERCONTEXT (block);
-   gcc_assert (block);
- }
- #endif
}
  }
  
--- 6427,6432 
Index: gcc/testsuite/gcc.dg/autopar/pr57103.c
===
*** gcc/testsuite/gcc.dg/autopar/pr57103.c  (revision 0)
--- gcc/testsuite/gcc.dg/autopar/pr57103.c  (working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ /* { dg-options "-O -ftree-parallelize-loops=4" } */
+ 
+ int d[1024];
+ 
+ static inline int foo (void)
+ {
+   int s = 0;
+   int i = 0;
+   for (; i < 1024; i++)
+ s += d[i];
+   return s;
+ }
+ 
+ void bar (void)
+ {
+   if (foo ())
+ __builtin_abort ();
+ }


Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-29 Thread Jakub Jelinek
On Mon, Apr 29, 2013 at 03:39:58PM +0400, Maksim Kuznetsov wrote:
> 2013/4/29 Jakub Jelinek :
> > Also, why are you handling just %{ and %}, and
> > not also %| ?  I mean, if you want to print say {|} into assembly for both
> > dialects, don't you need:
> > asm ("{dialect1%{%|%}|%{%|%}dialect2}");
> > or similar?  If you use just | instead of %|, it would be handled as
> > separator of the dialects.
> 
> Sure. %| was removed due to concerns that some target architectures
> already use it, but now %| is under ASSEMBLER_DIALECT and doesn't seem
> to affect them.
> 
> ChangeLog:
> 
> 2013-04-29  Maxim Kuznetsov  
> * final.c (do_assembler_dialects): Don't handle curly braces and
> vertical bar escaped by % as dialect delimiters.
> (output_asm_insn): Print curly braces and vertical bar if escaped
> by % and ASSEMBLER_DIALECT defined.
> * doc/tm.texi (ASSEMBLER_DIALECT): Document new standard escapes.
> 
> testsuite/ChangeLog:
> 
> 2013-04-29  Maxim Kuznetsov  
> 
> * gcc.target/i386/asm-dialect-2.c: New testcase.

Ok, thanks.

Jakub


Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-29 Thread Maksim Kuznetsov
2013/4/29 Jakub Jelinek :
> Also, why are you handling just %{ and %}, and
> not also %| ?  I mean, if you want to print say {|} into assembly for both
> dialects, don't you need:
> asm ("{dialect1%{%|%}|%{%|%}dialect2}");
> or similar?  If you use just | instead of %|, it would be handled as
> separator of the dialects.

Sure. %| was removed due to concerns that some target architectures
already use it, but now %| is under ASSEMBLER_DIALECT and doesn't seem
to affect them.

ChangeLog:

2013-04-29  Maxim Kuznetsov  
* final.c (do_assembler_dialects): Don't handle curly braces and
vertical bar escaped by % as dialect delimiters.
(output_asm_insn): Print curly braces and vertical bar if escaped
by % and ASSEMBLER_DIALECT defined.
* doc/tm.texi (ASSEMBLER_DIALECT): Document new standard escapes.

testsuite/ChangeLog:

2013-04-29  Maxim Kuznetsov  

* gcc.target/i386/asm-dialect-2.c: New testcase.

--
Maxim Kuznetsov


curly_braces_20130429-2.patch
Description: Binary data


[PATCH][ARM] Restrict store_minmaxsi

2013-04-29 Thread Kyrylo Tkachov
Hi all,

With this patch, we now only use the store_minmaxsi pattern when we're not
in a hot path.

We have found that this pattern can cause memory access bottlenecks in some
cases (one benchmark was 45% slower when this pattern was enabled).

Tested arm-none-eabi on qemu.

Ok for trunk?

Thanks,
Kyrill

2013-04-29  Kyrylo Tkachov  

* config/arm/arm.md (store_minmaxsi): Use only when
optimize_insn_for_size_p.

disable_store_minmaxsi.patch
Description: Binary data


[PATCH][committed]: Fix typo in predict.c

2013-04-29 Thread Kyrylo Tkachov
Hi all,

I've committed this typo fix in predict.c as r198408.

Thanks,
Kyrill

2013-04-29  Kyrylo Tkachov  

* predict.c: Fix typo in comment above #define PROB_VERY_UNLIKELY.

predict-spelling.patch
Description: Binary data


Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android

2013-04-29 Thread Alexander Ivchenko
*ping*

thanks,
Alexander

2013/3/26 Alexander Ivchenko :
> Hi,
>
> Could you please take a look at the attached fixinclude patch
> that addresses the problem:
>
> "  We have test fail for gcc.dg/cpp/trad/include.c on Android. The
> reason for that is that
> -ftraditional-cpp is not expected to work on Android due to variadic
> macro (like #define __builtin_warning(x, y...))
> in standard headers and traditional preprocessor cannot handle them."
>
> is it ok for trunk?
>
> thanks,
> Alexander
>
> 2013/1/9 Andrew Pinski :
>> On Wed, Jan 9, 2013 at 7:14 AM, Alexander Ivchenko  
>> wrote:
>>> Hi,
>>>
>>>   We have test fail for gcc.dg/cpp/trad/include.c on Android. The
>>> reason for that is that
>>> -ftraditional-cpp is not expected to work on Android due to variadic
>>> macro (like #define __builtin_warning(x, y...))
>>> in standard headers and traditional preprocessor cannot handle them.
>>>   The attached patch disables that test.
>>
>> It sounds like it is better to fix the system headers instead.  Via a
>> fixincludes for older headers and have the android folks fix them for
>> newer releases.
>>
>> Thanks,
>> Andrew Pinski


[PATCH, i386]: Enable SSE -> GPR moves for generic x86 targets (PR target/54349)

2013-04-29 Thread Uros Bizjak
Hello!

Attached patch enables SSE -> general register moves for generic x86
targets. The patch splits TARGET_INTER_UNIT_MOVES to
TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC
tuning flags and updates gcc sources accordingly.

According to AMD optimization manuals, direct moves *FROM* SSE (and
MMX) registers *TO* general registers should be used for AMD K10
family and later families. Since Intel targets are unaffected by this
change, I have also changed generic setting to enable these moves for
a generic target tuning.
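
To make the split concrete, here is a much simplified sketch of the
direction-dependent decision described above. The names tune_flags,
reg_kind and needs_memory_move are made up for illustration; the real
inline_secondary_memory_needed in i386.c checks considerably more (modes,
register sizes, x87/MMX classes), so treat this only as a model of the
to-vector/from-vector distinction.

```c
#include <stdbool.h>

/* Hypothetical model of the two tuning flags introduced by the patch.  */
struct tune_flags
{
  bool moves_to_vec;    /* direct GPR -> vector register moves allowed */
  bool moves_from_vec;  /* direct vector register -> GPR moves allowed */
};

enum reg_kind { REG_GENERAL, REG_VECTOR };

/* Return true when a move between FROM and TO has to go through memory
   because the direct inter-unit move in that direction is disabled.  */
bool
needs_memory_move (enum reg_kind from, enum reg_kind to,
                   struct tune_flags tune)
{
  if (from == to)
    return false;               /* not an inter-unit move at all */
  if (from == REG_VECTOR)
    return !tune.moves_from_vec;
  return !tune.moves_to_vec;
}
```

With a single flag, a target that only wants to avoid one direction had to
give up both; with two flags each direction is decided independently.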

2013-04-29  Uros Bizjak  

PR target/54349
* config/i386/i386.h (enum ix86_tune_indices)
<X86_TUNE_INTER_UNIT_MOVES_TO_VEC, X86_TUNE_INTER_UNIT_MOVES_FROM_VEC>:
New, split from X86_TUNE_INTER_UNIT_MOVES.
<X86_TUNE_INTER_UNIT_MOVES>: Remove.
(TARGET_INTER_UNIT_MOVES_TO_VEC): New define.
(TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto.
(TARGET_INTER_UNIT_MOVES): Remove.
* config/i386/i386.c (initial_ix86_tune_features): Update.
Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only.
(ix86_expand_convert_uns_didf_sse): Use
TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_interleave): Ditto.
(inline_secondary_memory_needed): Return true for moves from SSE class
registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves
to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets.
* config/i386/constraints.md (Yi, Ym): Depend on
TARGET_INTER_UNIT_MOVES_TO_VEC.
(Yj, Yn): New constraints.
* config/i386/i386.md (*movdi_internal): Change constraints of
operand 1 from Yi to Yj and from Ym to Yn.
(*movsi_internal): Ditto.
(*movdf_internal): Ditto.
(*movsf_internal): Ditto.
(*float2_1): Use
TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES.
(*float2_1 splitters): Ditto.
(floatdi2_i387_with_xmm): Ditto.
(floatdi2_i387_with_xmm splitters): Ditto.
* config/i386/sse.md (movdi_to_sse): Ditto.
(sse2_stored): Change constraint of operand 1 from Yi to Yj.
Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of
TARGET_INTER_UNIT_MOVES.
(sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj.
(sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC
instead of TARGET_INTER_UNIT_MOVES.
* config/i386/mmx.md (*mov_internal): Change constraint of
operand 1 from Yi to Yj and from Ym to Yn.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and committed to mainline SVN.

Uros.
Index: constraints.md
===
--- constraints.md  (revision 198390)
+++ constraints.md  (working copy)
@@ -87,8 +87,10 @@
 
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z  First SSE register.
-;;  i  SSE2 inter-unit moves enabled
-;;  m  MMX inter-unit moves enabled
+;;  i  SSE2 inter-unit moves to SSE register enabled
+;;  j  SSE2 inter-unit moves from SSE register enabled
+;;  m  MMX inter-unit moves to MMX register enabled
+;;  n  MMX inter-unit moves from MMX register enabled
 ;;  a  Integer register when zero extensions with AND are disabled
 ;;  p  Integer register when TARGET_PARTIAL_REG_STALL is disabled
 ;;  d  Integer register when integer DFmode moves are enabled
@@ -99,13 +101,21 @@
  "First SSE register (@code{%xmm0}).")
 
 (define_register_constraint "Yi"
- "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES ? SSE_REGS : NO_REGS"
- "@internal Any SSE register, when SSE2 and inter-unit moves are enabled.")
+ "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC ? SSE_REGS : NO_REGS"
+ "@internal Any SSE register, when SSE2 and inter-unit moves to vector registers are enabled.")
 
+(define_register_constraint "Yj"
+ "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_FROM_VEC ? SSE_REGS : NO_REGS"
+ "@internal Any SSE register, when SSE2 and inter-unit moves from vector registers are enabled.")
+
 (define_register_constraint "Ym"
- "TARGET_MMX && TARGET_INTER_UNIT_MOVES ? MMX_REGS : NO_REGS"
- "@internal Any MMX register, when inter-unit moves are enabled.")
+ "TARGET_MMX && TARGET_INTER_UNIT_MOVES_TO_VEC ? MMX_REGS : NO_REGS"
+ "@internal Any MMX register, when inter-unit moves to vector registers are enabled.")
 
+(define_register_constraint "Yn"
+ "TARGET_MMX && TARGET_INTER_UNIT_MOVES_FROM_VEC ? MMX_REGS : NO_REGS"
+ "@internal Any MMX register, when inter-unit moves from vector registers are enabled.")
+
 (define_register_constraint "Yp"
  "TARGET_PARTIAL_REG_STALL ? NO_REGS : GENERAL_REGS"
  "@internal Any integer register when TARGET_PARTIAL_REG_STALL is disabled.")
Index: i386.c
===
--- i386.c  (revision 198390)
+++ i386.c  (working copy)
@@ -1931,9 +1931,12 @@ static unsigned int initial_ix86_tune_features[X86
   /* X86_TUNE_USE_FFREEP */
   m_AMD_MULTIPLE,
 
-  /* X86_TUNE_INTER_UNIT_MOVES */
+  /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC */
   ~(m_

Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-29 Thread Jakub Jelinek
On Mon, Apr 29, 2013 at 02:31:40PM +0400, Maksim Kuznetsov wrote:
> Jakub, Richard, thank you for your feedback!
> 
> > I wonder if it %{ and %} shouldn't be better handled in final.c
> > for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific.
> 
> I moved %{ and %} cases to output_asm_insn in final.c
> 
> > Also:
> > *(p + 1)
> > should be better written as p[1] (more readable).
> 
> Fixed.
> 
> I also documented new escapes.
> Could you please have a look?

ChangeLog entry is missing.  Also, why are you handling just %{ and %}, and
not also %| ?  I mean, if you want to print say {|} into assembly for both
dialects, don't you need:
asm ("{dialect1%{%|%}|%{%|%}dialect2}");
or similar?  If you use just | instead of %|, it would be handled as
separator of the dialects.  Otherwise it looks good to me.
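
For readers who want to see the intended behaviour, here is a toy expander
for such a template. It is only a sketch under my own assumptions, not the
code in final.c: expand_dialect is a hypothetical helper that picks
alternative number `dialect` from each {a|b} group and treats %{, %| and %}
as literal characters; every other character (including other % escapes)
is copied through unchanged.

```c
#include <stddef.h>

/* Expand TMPL for alternative DIALECT (0-based) into OUT, which has room
   for OUTSZ bytes.  Returns the length written, or -1 if OUT is too
   small.  Hypothetical helper, for illustration only.  */
int
expand_dialect (const char *tmpl, int dialect, char *out, size_t outsz)
{
  size_t n = 0;
  int in_braces = 0;    /* inside a {...} group */
  int alt = 0;          /* index of the alternative being scanned */

  for (const char *p = tmpl; *p != '\0'; p++)
    {
      char c = *p;
      int literal = 0;

      /* %{, %| and %} stand for the plain characters.  */
      if (c == '%' && (p[1] == '{' || p[1] == '|' || p[1] == '}'))
        {
          c = *++p;
          literal = 1;
        }

      if (!literal)
        {
          if (c == '{' && !in_braces) { in_braces = 1; alt = 0; continue; }
          if (c == '|' && in_braces)  { alt++; continue; }
          if (c == '}' && in_braces)  { in_braces = 0; continue; }
        }

      /* Text in a non-selected alternative is dropped.  */
      if (in_braces && alt != dialect)
        continue;

      if (n + 1 >= outsz)
        return -1;
      out[n++] = c;
    }

  if (outsz == 0)
    return -1;
  out[n] = '\0';
  return (int) n;
}
```

Fed the template from the example above, this model yields "dialect1{|}"
for dialect 0 and "{|}dialect2" for dialect 1; an unescaped | inside the
braces would instead be consumed as the separator.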

Jakub


Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-29 Thread Maksim Kuznetsov
Jakub, Richard, thank you for your feedback!

> I wonder if it %{ and %} shouldn't be better handled in final.c
> for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific.

I moved %{ and %} cases to output_asm_insn in final.c

> Also:
> *(p + 1)
> should be better written as p[1] (more readable).

Fixed.

I also documented new escapes.
Could you please have a look?

--
Maxim Kuznetsov


curly_braces_20130429.patch
Description: Binary data


Re: [PATCH SH] Fix PR57108

2013-04-29 Thread Kaz Kojima
Christian Bruel  wrote:
> This patch sets the correct operand mode for tstsi_t_zero_extract_eq,
> to avoid reload generating a move between a constant and a void register.
> 
> Reg tested for sh-elf. No performance impact
> 
> OK for 4.7, 4.8 and trunk ?

OK.

Regards,
kaz


Re: [PATCH v2] gcc: arm: linux-eabi: fix handling of armv4 bx fixups when linking

2013-04-29 Thread Richard Earnshaw

On 28/04/13 04:52, Mike Frysinger wrote:

The bpabi.h header already sets up defines to automatically use the
--fix-v4bx flag with the assembler & linker as needed, and creates a
default assembly & linker spec which uses those.  Unfortunately, the
linux-eabi.h header clobbers the LINK_SPEC define and doesn't include
the v4bx define when setting up its own.  So while the assembler spec
is retained and works fine to generate the right relocs, building for
armv4 targets doesn't invoke the linker correctly so all the relocs
get processed as if we had an armv4t target.

You can see this with -dumpspecs when configuring gcc for an armv4
target and using --with-arch=armv4:
$ armv4l-unknown-linux-gnueabi-gcc -dumpspecs |& grep -B 1 fix-v4bx
*subtarget_extra_asm_spec:
 
%{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}
 ...

With this fix in place, we also get the link spec:
$ armv4l-unknown-linux-gnueabi-gcc -dumpspecs |& grep -B 1 fix-v4bx
*link:
...  
%{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}
 ...

And all my hello world tests / glibc builds automatically turn the
bx insn into the 'mov pc, lr' insn and all is right in the world.

Signed-off-by: Mike Frysinger 

2013-04-27  Mike Frysinger  

* config/arm/bpabi.h (EABI_LINK_SPEC): Define.
(BPABI_LINK_SPEC): Use new EABI_LINK_SPEC.
* config/arm/linux-eabi.h (LINK_SPEC): Replace BE8_LINK_SPEC
with EABI_LINK_SPEC.


OK.

R.




Re: [PATCH] Fix PR57089

2013-04-29 Thread Richard Biener
On Mon, 29 Apr 2013, Richard Biener wrote:

> 
> I've tried to follow where the scalar loop appears in
> expand_omp_for_static_nochunk but got lost quickly.  So the following
> papers over the lack of OMP expansion populating the loop tree
> as I've done in the original patch introducing loops to it.
> 
> If the OMP expansion code knows at some point "here is a new loop
> and this is the header block and this is the latch block" I can
> write a helper that properly updates the loop tree with that
> information (call alloc_loop, init ->header and ->latch and
> call add_loop).  But at the moment I have no idea where to call
> that function ...

After discussion on IRC I am now testing the following (only
the degenerate case still uses fixup, I'm not sure how to
reliably get at loops here - eventually we want to revisit
that loops-with-abnormal-entries issue again).

Richard.

2013-04-29  Richard Biener  

PR middle-end/57089
* omp-low.c (expand_omp_taskreg): If the parent function had
a broken loop tree make sure to schedule a fixup for the child
as well.
(expand_omp_for_generic): Properly add loops.
(expand_omp_for_static_nochunk): Likewise.
(expand_omp_for_static_chunk): Likewise.
(expand_omp_for): For the degenerate case fixup loops.
(expand_omp_sections): Fix default bb placement in loops.
(expand_omp_atomic_pipeline): Properly add loops.

* gfortran.dg/gomp/pr57089.f90: New testcase.

Index: gcc/omp-low.c
===
*** gcc/omp-low.c   (revision 198389)
--- gcc/omp-low.c   (working copy)
*** expand_omp_taskreg (struct omp_region *r
*** 3571,3581 
new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
if (exit_bb)
single_succ_edge (new_bb)->flags = EDGE_FALLTHRU;
!   /* ???  As the OMP expansion process does not update the loop
!  tree of the original function before outlining the region to
!the new child function we need to discover loops in the child.
!Arrange for that.  */
!   child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
  
/* Remove non-local VAR_DECLs from child_cfun->local_decls list.  */
num = vec_safe_length (child_cfun->local_decls);
--- 3571,3580 
new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
if (exit_bb)
single_succ_edge (new_bb)->flags = EDGE_FALLTHRU;
!   /* When the OMP expansion process cannot guarantee an up-to-date
!  loop tree arrange for the child function to fixup loops.  */
!   if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
!   child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
  
/* Remove non-local VAR_DECLs from child_cfun->local_decls list.  */
num = vec_safe_length (child_cfun->local_decls);
*** expand_omp_for_generic (struct omp_regio
*** 4148,4153 
--- 4147,4162 
   recompute_dominator (CDI_DOMINATORS, l0_bb));
set_immediate_dominator (CDI_DOMINATORS, l1_bb,
   recompute_dominator (CDI_DOMINATORS, l1_bb));
+ 
+   struct loop *outer_loop = alloc_loop ();
+   outer_loop->header = l0_bb;
+   outer_loop->latch = l2_bb;
+   add_loop (outer_loop, l0_bb->loop_father);
+ 
+   struct loop *loop = alloc_loop ();
+   loop->header = l1_bb;
+   /* The loop may have multiple latches.  */
+   add_loop (loop, outer_loop);
  }
  }
  
*** expand_omp_for_static_nochunk (struct om
*** 4370,4375 
--- 4379,4389 
   recompute_dominator (CDI_DOMINATORS, body_bb));
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
   recompute_dominator (CDI_DOMINATORS, fin_bb));
+ 
+   struct loop *loop = alloc_loop ();
+   loop->header = body_bb;
+   loop->latch = cont_bb;
+   add_loop (loop, body_bb->loop_father);
  }
  
  
*** expand_omp_for_static_chunk (struct omp_
*** 4671,4676 
--- 4685,4700 
   recompute_dominator (CDI_DOMINATORS, seq_start_bb));
set_immediate_dominator (CDI_DOMINATORS, body_bb,
   recompute_dominator (CDI_DOMINATORS, body_bb));
+ 
+   struct loop *trip_loop = alloc_loop ();
+   trip_loop->header = iter_part_bb;
+   trip_loop->latch = trip_update_bb;
+   add_loop (trip_loop, iter_part_bb->loop_father);
+ 
+   struct loop *loop = alloc_loop ();
+   loop->header = body_bb;
+   loop->latch = cont_bb;
+   add_loop (loop, trip_loop);
  }
  
  
*** expand_omp_for (struct omp_region *regio
*** 4698,4703 
--- 4722,4732 
BRANCH_EDGE (region->cont)->flags &= ~EDGE_ABNORMAL;
FALLTHRU_EDGE (region->cont)->flags &= ~EDGE_ABNORMAL;
  }
+   else
+ /* If there isn't a continue then this is a degenerate case where
+the introduction of abnormal e

Re: [PATCH, AArch64] Support LDR/STR to/from S and D registers

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:38, Ian Bolton  wrote:
> This patch allows us to load to and store from the S and D registers,
> which helps with doing scalar operations in those registers.
>
> This has been regression tested on bare-metal and linux.
>
> OK for trunk?
>
> Cheers,
> Ian
>
>
> 2013-04-26  Ian Bolton  
>
> * config/aarch64/aarch64.md (movsi_aarch64): Support LDR/STR
> from/to S register.
> (movdi_aarch64): Support LDR/STR from/to D register.

OK
/Marcus


Re: [AArch64] Vectorize over more math.h functions.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:28, James Greenhalgh  wrote:
>
> Hi,
>
> This patch adds float -> int builtins to the set
> of builtins we can try to vectorize in aarch64_builtin_vectorized_function.
>
> In particular, we add BUILT_IN_IFLOORF, BUILT_IN_ICEILF, BUILT_IN_LROUND,
> BUILT_IN_IROUNDF.
>
> The BUILT_IN_LROUND cases won't be triggered unless -ffast-math
> or something else which turns off inexact errors is enabled.
>
> Regression tested for aarch64-none-elf with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_builtin_vectorized_function): Vectorize over ifloorf,
> iceilf, lround, iroundf.


OK
/Marcus


Re: [AArch64] Implement vector float->double widening and double->float narrowing.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:25, James Greenhalgh  wrote:
>
> Hi,
>
> gcc.dg/vect/vect-float-truncate-1.c and
> gcc.dg/vect/vect-float-extend-1.c
>
> Were failing because widening and narrowing of floats to doubles was
> not wired up.
>
> This patch fixes that by implementing the standard names:
>
> vec_pack_trunc_v2df
> Taking two vectors of V2DFmode and returning one vector of V4SF mode.
>
> `vec_unpacks_float_hi_v4sf', `vec_unpacks_float_lo_v4sf'
> Taking one vector of V4SF mode and splitting it to two vectors of V2DF mode.
>
> Patch regression tested on aarch64-none-elf with no regressions,
> and shown to fix the bug.
>
> Thanks,
> James
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi_): New.
> (float_truncate_hi_): Likewise.
> (float_extend_lo_): Likewise.
> (float_truncate_lo_): Likewise.
> * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): New.
> (aarch64_float_extend_lo_v2df): Likewise.
> (vec_unpacks_hi_v4sf): Likewise.
> (aarch64_float_truncate_lo_v2sf): Likewise.
> (aarch64_float_truncate_hi_v4sf): Likewise.
> (vec_pack_trunc_v2df): Likewise.
> (vec_pack_trunc_df): Likewise.

OK
/Marcus


Re: [AArch64] Add vector int to float conversions.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:22, James Greenhalgh  wrote:
>
> Hi,
>
> This patch wires up builtins for int to float conversions in
> Tree, and uint to float conversions in RTL.
>
> Regression tested for aarch64-none-elf with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_fold_builtin): Fold float conversions.
> * config/aarch64/aarch64-simd-builtins.def
> (floatv2si, floatv4si, floatv2di): New.
> (floatunsv2si, floatunsv4si, floatunsv2di): Likewise.
> * config/aarch64/aarch64-simd.md
> (2): New, expands to float and 
> floatuns.
> * config/aarch64/iterators.md (FLOATUORS): New.
> (optab): Add float, floatuns.
> (su_optab): Likewise.


OK
/Marcus


Re: Make m32c build, fix PSImode truncation

2013-04-29 Thread Bernd Schmidt
On 04/27/2013 10:39 AM, Richard Sandiford wrote:
> Argh, that's unfortunate.  The point of that change was to make
> simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg.
> Would the equivalent lowpart simplify_gen_subreg call succeed
> (return nonnull)?  If so, I think we want truncate to do the same.
> 
> What simplification is this blocking, and why does it lead to
> reload failures?

There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which
currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during
cse1. Reload fails because the subreg gets propagated into a memory
address, which requires a class of A_REGS, but A_REGS can only hold
PSImode values, not SImode.  This shows that the truncation is not
always a no-op: in this case it involves a register move, but there's no
way to describe this using TRULY_NOOP_TRUNCATION.


Bernd



Re: [AArch64] Map fcvt intrinsics to builtin name directly.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:12, James Greenhalgh  wrote:
>
> Hi,
>
> This patch uses the new builtin-mapping infrastructure
> to map the fcvt family of builtins directly to their
> GCC standard pattern name.
>
> Regression tested on aarch64-none-elf with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_builtin_vectorized_function): Use new names for
> fcvt builtins.
> * config/aarch64/aarch64-simd-builtins.def (fcvtzs): Split as...
> (lbtruncv2sf, lbtruncv4sf, lbtruncv2df): ...This.
> (fcvtzu): Split as...
> (lbtruncuv2sf, lbtruncuv4sf, lbtruncuv2df): ...This.
> (fcvtas): Split as...
> (lroundv2sf, lroundv4sf, lroundv2df, lroundsf, lrounddf): ...This.
> (fcvtau): Split as...
> (lrounduv2sf, lrounduv4sf, lrounduv2df, lroundusf, lroundudf): 
> ...This.
> (fcvtps): Split as...
> (lceilv2sf, lceilv4sf, lceilv2df): ...This.
> (fcvtpu): Split as...
> (lceiluv2sf, lceiluv4sf, lceiluv2df, lceilusf, lceiludf): ...This.
> (fcvtms): Split as...
> (lfloorv2sf, lfloorv4sf, lfloorv2df): ...This.
> (fcvtmu): Split as...
> (lflooruv2sf, lflooruv4sf, lflooruv2df, lfloorusf, lfloorudf): 
> ...This.
> (lfrintnv2sf, lfrintnv4sf, lfrintnv2df, lfrintnsf, lfrintndf): New.
> (lfrintnuv2sf, lfrintnuv4sf, lfrintnuv2df): Likewise.
> (lfrintnusf, lfrintnudf): Likewise.
> * config/aarch64/aarch64-simd.md
> (l2): Convert to
> define_insn.
> (aarch64_fcvt): Remove.
> * config/aarch64/iterators.md (FCVT): Include UNSPEC_FRINTN.
> (fcvt_pattern): Likewise.

OK
/Marcus


[PATCH SH] Fix PR57108

2013-04-29 Thread Christian Bruel
Hello,

This patch sets the correct operand mode for tstsi_t_zero_extract_eq,
to avoid reload generating a move between a constant and a void register.

Reg tested for sh-elf. No performance impact

OK for 4.7, 4.8 and trunk ?

Thanks


2013-04-26  Christian Bruel  

	PR target/57108
	* sh.md (tstsi_t_zero_extract_eq): Set mode for operand 0.

2013-04-26  Christian Bruel  

	PR target/57108
	* gcc.target/sh/pr57108.c: New test.

Index: gcc/testsuite/gcc.target/sh/pr57108.c
===
--- gcc/testsuite/gcc.target/sh/pr57108.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr57108.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+
+void __assert_func (void) __attribute__ ((__noreturn__)) ;
+
+void ATATransfer (int num, int buffer)
+{
+ int wordCount;
+
+ while (num > 0)
+  {
+wordCount = num * 512 / sizeof (int);
+
+((0 == (buffer & 63)) ? (void)0 : __assert_func () );
+((0 == (wordCount & 31)) ? (void)0 : __assert_func ());
+  }
+
+
+ }

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 198287)
+++ gcc/config/sh/sh.md	(working copy)
@@ -689,7 +689,7 @@
 ;; Extract contiguous bits and compare them against zero.
 (define_insn "tstsi_t_zero_extract_eq"
   [(set (reg:SI T_REG)
-	(eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z")
+	(eq:SI (zero_extract:SI (match_operand:SI 0 "logical_operand" "z")
 (match_operand:SI 1 "const_int_operand")
 (match_operand:SI 2 "const_int_operand"))
 	   (const_int 0)))]


Re: [AArch64][Testsuite] Enable vect_uintfloat_cvt for AArch64.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:36, James Greenhalgh  wrote:
>
> Hi,
>
> While modifying all the vcvt builtins we've fixed enough bugs
> that we can now enable vect_uintfloat_cvt for AArch64. Do that.
>
> Patch tested to ensure all newly enabled tests succeed.
>
> James
> ---
> gcc/testsuite/
>
> 2013-04-26  James Greenhalgh  
>
> * lib/target-supports.exp (vect_uintfloat_cvt): Enable for AArch64.

OK
/Marcus


Re: [AArch64] fcvt instructions - arm_neon.h changes.

2013-04-29 Thread Marcus Shawcroft
On 26 April 2013 14:34, James Greenhalgh  wrote:
>
> This patch updates the implementation in arm_neon.h of the vcvt
> intrinsics. Where appropriate we use C statements, and where not
> possible we fall back to builtins.
>
> There were a number of errors with names and types in the current
> revision of the file. These have been corrected.
>
> Regression tested with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/arm_neon.h
> (vcvt_f<32,64>_s<32,64>): Rewrite in C.
> (vcvt_f<32,64>_s<32,64>): Rewrite using builtins.
> (vcvt__f<32,64>_f<32,64>): Likewise.
> (vcvt_<32,64>_f<32,64>): Likewise.
> (vcvta_<32,64>_f<32,64>): Likewise.
> (vcvtm_<32,64>_f<32,64>): Likewise.
> (vcvtn_<32,64>_f<32,64>): Likewise.
> (vcvtp_<32,64>_f<32,64>): Likewise.
>
> gcc/testsuite/
>
> 2013-04-26  James Greenhalgh  
>
> * gcc.target/aarch64/vect-vcvt.c: New.


OK
/Marcus


Re: [AArch64] Add vector fix, fixuns, fix_trunc, fixuns_trunc standard patterns

2013-04-29 Thread Marcus Shawcroft
OK
/Marcus

On 26 April 2013 14:30, James Greenhalgh  wrote:
>
> Hi,
>
> This patch enables vectorization over conversions by implementing the
> fix, fixuns, fix_trunc, fixuns_trunc, and ftrunc standard pattern names.
>
> Each of these is implemented by the frintz instruction.
> (Round towards 0)
>
> The expanders for these are blank as they are already
> implemented by the lrint standard patterns. We are
> just connecting the dots for another set of standard names.
>
> Regression tested for aarch64-none-elf with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-26  James Greenhalgh  
>
> * config/aarch64/aarch64-simd.md
> (2): New, maps to fix, fixuns.
> (2): New, maps to
> fix_trunc, fixuns_trunc.
> (ftrunc2): New.
> * config/aarch64/iterators.md (optab): Add fix, fixuns.
> (fix_trunc_optab): New.
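
For context, the kind of source loop these patterns are about looks roughly
like the sketch below. The function name convert_trunc is made up; the only
thing relied on is the C rule that a float-to-int conversion discards the
fractional part, i.e. rounds towards zero. Whether a given compiler version
actually vectorizes this loop depends on options and target, so no claim is
made about the generated code.

```c
#include <stddef.h>

/* Convert N floats to ints.  The conversion truncates towards zero, which
   is the semantics the fix_trunc-style patterns have to provide.
   Hypothetical example function.  */
void
convert_trunc (const float *in, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int) in[i];
}
```

Note that negative inputs move up towards zero (-1.7f becomes -1), which is
what distinguishes truncation from a floor conversion.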


Re: [Patch, fortran] PR 56981 Improve unbuffered unformatted performance

2013-04-29 Thread Janne Blomqvist
On Mon, Apr 29, 2013 at 1:46 AM, Jerry DeLisle  wrote:
> OK Janne and thanks for the patch.

Thanks for the review, committed (as well as the system_clock patch).

> What are your thoughts about special casing nul devices/

Hmm, I'm not that eager. It starts to smell of "benchmarketing"..

One thing I have been thinking of which could help would be to
implement a "start of the current record" marker in the buffering
implementation, and when flushing then only flush up to that marker.
Currently when writing small sequential unformatted records what often
happens when looking at strace output is something like

write(3, "\4\0\0\0`\0263I\4\0\0\0\4\0\0\0p\0263I\4\0\0\0\4\0\0\0\200\0263I"..., 8192) = 8192
write(3, "\4\0\0\0", 4) = 4
lseek(3, 8810688, SEEK_SET) = 8810688
write(3, "\4\0\0\0", 4) = 4
lseek(3, 8810700, SEEK_SET) = 8810700
write(3, "\4\0\0\0\20A3I\4\0\0\0\4\0\0\0 A3I\4\0\0\0\4\0\0\A3I"..., 8192) = 8192

i.e. the buffer fills up in the middle of a record, we flush it (the
8192 byte writes), but then we have to seek back and forth to fix the
record markers. This also means that we cannot fully buffer
non-seekable files (such as /dev/null) because the records around the
buffer boundaries are corrupted (which of course doesn't matter for
the particular case of /dev/null, but otherwise..). So if this issue
is fixed we could buffer those as well and get essentially the same
performance as for regular files.

E.g. something like

/* Reserve space in the buffer (flush existing data if necessary), up
to some reasonable max size (e.g. 4 KB) unless the size is small and
known upfront (e.g. direct access), and set the current_record_start
marker at the current position. Should be called when preparing a new
record (st_write()). */
void sreserve(int size);

Write the record data as usual...

/* Finish the record, i.e. set current_record_start marker to -1 to
mark that there is no current record. size should be <= reserved size.
Should be called when finishing a write (st_write_done()). */
void scommit(int size);

Well, that's a rough idea. I don't know when or if I'll have time and
motivation to implement it, though..
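
The reserve/commit idea above can be sketched roughly as follows. This is
only an illustration, not libgfortran code: the names rec_buf, sink,
rb_reserve, rb_commit and rb_write_record are made up for the example, the
buffer is deliberately tiny, and an in-memory sink stands in for the file
descriptor. The point is that a record never straddles a flush, so the
leading record marker can be fixed up in memory instead of with lseek().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 64 };   /* tiny on purpose, so the example flushes often */

/* Hypothetical stand-in for the output file; counts "write" calls.  */
typedef struct { char bytes[512]; size_t len; int writes; } sink;

typedef struct {
  char data[BUF_SIZE];
  size_t pos;          /* number of buffered bytes */
  long record_start;   /* offset of the open record, or -1 if none */
  sink *out;
} rec_buf;

static int sink_write(sink *s, const void *p, size_t n)
{
  if (s->len + n > sizeof s->bytes)
    return -1;
  memcpy(s->bytes + s->len, p, n);
  s->len += n;
  s->writes++;
  return 0;
}

static void rb_init(rec_buf *b, sink *out)
{
  b->pos = 0;
  b->record_start = -1;
  b->out = out;
}

/* Flush complete records only: stop at the start of the open record
   (if any) and move that partial record to the front of the buffer.  */
static int rb_flush(rec_buf *b)
{
  size_t n = b->record_start >= 0 ? (size_t) b->record_start : b->pos;
  if (n == 0)
    return 0;
  if (sink_write(b->out, b->data, n) != 0)
    return -1;
  memmove(b->data, b->data + n, b->pos - n);
  b->pos -= n;
  if (b->record_start >= 0)
    b->record_start = 0;
  return 0;
}

/* Counterpart of sreserve(): make room for SIZE bytes, open a record.  */
static int rb_reserve(rec_buf *b, size_t size)
{
  assert(b->record_start < 0);
  if (size > BUF_SIZE)
    return -1;   /* the sketch has no fallback for oversized records */
  if (b->pos + size > BUF_SIZE && rb_flush(b) != 0)
    return -1;
  b->record_start = (long) b->pos;
  return 0;
}

static void rb_write(rec_buf *b, const void *p, size_t n)
{
  assert(b->pos + n <= BUF_SIZE);
  memcpy(b->data + b->pos, p, n);
  b->pos += n;
}

/* Counterpart of scommit(): there is no current record any more.  */
static void rb_commit(rec_buf *b)
{
  b->record_start = -1;
}

/* Write marker, payload, marker; patch the leading marker in memory.  */
static int rb_write_record(rec_buf *b, const void *payload, uint32_t len)
{
  uint32_t placeholder = 0;
  if (rb_reserve(b, (size_t) len + 2 * sizeof len) != 0)
    return -1;
  rb_write(b, &placeholder, sizeof placeholder);
  rb_write(b, payload, len);
  rb_write(b, &len, sizeof len);
  memcpy(b->data + b->record_start, &len, sizeof len);
  rb_commit(b);
  return 0;
}
```

With 20-byte payloads each record takes 28 bytes, so two records fit in the
64-byte buffer and every flush hands the sink a whole number of records.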

--
Janne Blomqvist


[C++ Patch/RFC] PR 57092

2013-04-29 Thread Paolo Carlini

Hi,

in this 4.8/4.9 Regression, finish_decltype_type doesn't handle 
ADDR_EXPR. In 4.7, finish_decltype_type deals with a TEMPLATE_PARM_INDEX 
and the testcase compiles fine, but it's quite easy - see c++/52282 - to 
trigger the same ICE there too (it would be nice to make progress on the 
latter too).


The patchlet below passes testing, not sure whether there is something 
deeper about this issue.


Thanks,
Paolo.

/
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 198381)
+++ cp/semantics.c  (working copy)
@@ -5389,6 +5389,7 @@ finish_decltype_type (tree expr, bool id_expressio
 case PARM_DECL:
 case RESULT_DECL:
 case TEMPLATE_PARM_INDEX:
+   case ADDR_EXPR:
  expr = mark_type_use (expr);
   type = TREE_TYPE (expr);
   break;
Index: testsuite/g++.dg/cpp0x/decltype53.C
===
--- testsuite/g++.dg/cpp0x/decltype53.C (revision 0)
+++ testsuite/g++.dg/cpp0x/decltype53.C (working copy)
@@ -0,0 +1,11 @@
+// PR c++/57092
+// { dg-do compile { target c++11 } }
+
+template 
+class B {
+  decltype(F) v;
+};
+
+void foo(int) {}
+
+B o;


[DWARF] Fix multiple register spanning location.

2013-04-29 Thread Christian Bruel
Hello,

We noticed a few failures with the gdb testsuite due to incorrect
mapping of floating point, noticed on SH that defines both
TARGET_DWARF_REGISTER_SPAN and DBX_REGISTER_NUMBER.

The problem was that the converted pseudo reg was never converted to the
dbx format when fed from 'multiple_reg_loc_descriptor'

Regression tested for sh-elf (including gdb). Bootstrap OK for arm-none-eabi,
sh64-elf and x86_64-unknown-linux-gnu.

Note that this could apply to the ARM, C6X, RS6000 and MIPS targets, which
also define the same macro combination. Although I am asking for approval from
the DWARF maintainers, feedback from the respective arch maintainers
would be appreciated as I don't run the gdb testsuite on those targets.

Many thanks,

Christian
2013-04-26  Christian Bruel  

	* dwarf2out.c (multiple_reg_loc_descriptor): Use DBX_REGISTER_NUMBER
	 for spanning registers.

2013-04-26  Christian Bruel  

	* gcc.dg/debug/dwarf2/dwarf_span.c: New test case.

Index: dwarf2out.c
===
--- dwarf2out.c	(revision 198287)
+++ dwarf2out.c	(working copy)
@@ -10656,7 +10656,8 @@ multiple_reg_loc_descriptor (rtx rtl, rtx regs,
 {
   dw_loc_descr_ref t;
 
-  t = one_reg_loc_descriptor (REGNO (XVECEXP (regs, 0, i)),
+  reg = REGNO (XVECEXP (regs, 0, i));
+  t = one_reg_loc_descriptor (DBX_REGISTER_NUMBER (reg),
   VAR_INIT_STATUS_INITIALIZED);
   add_loc_descr (&loc_result, t);
   size = GET_MODE_SIZE (GET_MODE (XVECEXP (regs, 0, 0)));

Index: testsuite/gcc.dg/debug/dwarf2/dwarf_span.c
===
--- testsuite/gcc.dg/debug/dwarf2/dwarf_span.c	(revision 0)
+++ testsuite/gcc.dg/debug/dwarf2/dwarf_span.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-g -dA" } */
+/* { dg-final { scan-assembler-times "DW_OP_regx" 4 } } */
+
+double
+add_double (register double u, register double v)
+{
+  return u + v;
+}
+
+double
+wack_double (register double u, register double v)
+{
+  register double l = u, r = v;
+  l = add_double (l, r);
+  return l + r;
+}



[PATCH] Fix PR57089

2013-04-29 Thread Richard Biener

I've tried to follow where the scalar loop appears in
expand_omp_for_static_nochunk but got lost quickly.  So the following
papers over the lack of OMP expansion populating the loop tree
as I've done in the original patch introducing loops to it.

If the OMP expansion code knows at some point "here is a new loop
and this is the header block and this is the latch block" I can
write a helper that properly updates the loop tree with that
information (call alloc_loop, init ->header and ->latch and
call add_loop).  But at the moment I have no idea where to call
that function ...

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2013-04-29  Richard Biener  

PR middle-end/57089
* omp-low.c (expand_omp_for_static_nochunk): Mark loops for
fixup.

* gfortran.dg/gomp/pr57089.f90: New testcase.

Index: gcc/omp-low.c
===
*** gcc/omp-low.c   (revision 198389)
--- gcc/omp-low.c   (working copy)
*** expand_omp_for_static_nochunk (struct om
*** 4370,4375 
--- 4370,4380 
   recompute_dominator (CDI_DOMINATORS, body_bb));
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
   recompute_dominator (CDI_DOMINATORS, fin_bb));
+ 
+   /* ???  The scalar loop that remains in the body is not registered
+  with the loop tree.  Mark that for fixup.  */
+   if (current_loops)
+ loops_state_set (LOOPS_NEED_FIXUP);
  }
  
  
Index: gcc/testsuite/gfortran.dg/gomp/pr57089.f90
===
*** gcc/testsuite/gfortran.dg/gomp/pr57089.f90  (revision 0)
--- gcc/testsuite/gfortran.dg/gomp/pr57089.f90  (working copy)
***
*** 0 
--- 1,12 
+ ! PR middle-end/57089
+ ! { dg-do compile }
+ ! { dg-options "-O -fopenmp" }
+   SUBROUTINE T()
+ INTEGER:: npoints, grad_deriv
+ SELECT CASE(grad_deriv)
+ CASE (0)
+!$omp do
+DO ii=1,npoints
+END DO
+ END SELECT
+   END SUBROUTINE 


Re: mips SNaN/QNaN is swapped

2013-04-29 Thread Thomas Schwinge
Hi!

Ping.

On Mon, 22 Apr 2013 11:52:23 +0200, I wrote:
> On Fri, 5 Apr 2013 23:55:37 +0100, "Maciej W. Rozycki" 
>  wrote:
> > On Fri, 5 Apr 2013, Thomas Schwinge wrote:
> > > > Index: gcc/config/fp-bit.c
> > > > ===
> > > > RCS file: /cvs/uberbaum/gcc/config/fp-bit.c,v
> > > > retrieving revision 1.39
> > > > diff -u -p -r1.39 fp-bit.c
> > > > --- gcc/config/fp-bit.c 26 Jan 2003 10:06:57 - 1.39
> > > > +++ gcc/config/fp-bit.c 1 Apr 2003 21:35:00 -
> > > > @@ -210,7 +210,11 @@ pack_d ( fp_number_type *  src)
> > > >exp = EXPMAX;
> > > >if (src->class == CLASS_QNAN || 1)
> > > > {
> > > > +#ifdef QUIET_NAN_NEGATED
> > > > + fraction |= QUIET_NAN - 1;
> > > > +#else
> > > >   fraction |= QUIET_NAN;
> > > > +#endif
> 
> >  I think the intent of this code is to preserve a NaN's payload (it 
> > certainly does for non-QUIET_NAN_NEGATED targets)
> 
> I agree.  For preserving the payload, both the unpack/pack code also has
> to shift by NGARDS.
> 
> > Complementing the change above I think it will also make 
> > sense to clear the qNaN bit when extracting a payload from fraction in 
> > unpack_d as the class of a NaN being handled is stored separately.
> 
> I agree.
> 
> >  Also I find the "|| 1" clause in the condition immediately above the 
> > pack_d piece concerned suspicious -- why is a qNaN returned for sNaN 
> > input?  Likewise why are __thenan_sf, etc. encoded as sNaNs rather than 
> > qNaNs?  Does anybody know?
> 
> I also stumbled over that, but for all these, I suppose the idea is that
> when a sNaN is "arithmetically processed" (which includes datatype
> conversion), an INVALID exception is to be raised (though, »[fp-bit]
> implements IEEE 754 format arithmetic, but does not provide a mechanism
> [...] for generating or handling exceptions«), and then converted into a
> qNaN.
> 
> Also, I found that the bit to look at for distinguishing qNaN/sNaN is
> defined wrongly for float.  Giving me some "interesting" test results...
> ;-)
> 
> Manual testing looks good.  Automated testing is still running; in case
> nothing turns up, is this OK to check in?
> 
> libgcc/
>   * fp-bit.c (unpack_d, pack_d): Properly preserve and restore a
>   NaN's payload.
>   * fp-bit.h [FLOAT] (QUIET_NAN): Correct value.
> 
> Index: libgcc/fp-bit.c
> ===
> --- libgcc/fp-bit.c   (revision 402061)
> +++ libgcc/fp-bit.c   (working copy)
> @@ -214,11 +214,18 @@ pack_d (const fp_number_type *src)
>else if (isnan (src))
>  {
>exp = EXPMAX;
> +  /* Restore the NaN's payload.  */
> +  fraction >>= NGARDS;
> +  fraction &= QUIET_NAN - 1;
>if (src->class == CLASS_QNAN || 1)
>   {
>  #ifdef QUIET_NAN_NEGATED
> -   fraction |= QUIET_NAN - 1;
> +   /* The quiet/signaling bit remains unset.  */
> +   /* Make sure the fraction has a non-zero value.  */
> +   if (fraction == 0)
> + fraction |= QUIET_NAN - 1;
>  #else
> +   /* Set the quiet/signaling bit.  */
> fraction |= QUIET_NAN;
>  #endif
>   }
> @@ -574,8 +581,10 @@ unpack_d (FLO_union_type * src, fp_number_type * d
>   {
> dst->class = CLASS_SNAN;
>   }
> -   /* Keep the fraction part as the nan number */
> -   dst->fraction.ll = fraction;
> +   /* Now that we know which kind of NaN we got, discard the
> +  quiet/signaling bit, but do preserve the NaN payload.  */
> +   fraction &= ~QUIET_NAN;
> +   dst->fraction.ll = fraction << NGARDS;
>   }
>  }
>else
> Index: libgcc/fp-bit.h
> ===
> --- libgcc/fp-bit.h   (revision 402061)
> +++ libgcc/fp-bit.h   (working copy)
> @@ -190,7 +190,7 @@ typedef unsigned int UTItype __attribute__ ((mode
>  #define EXPBIAS 127
>  #define FRACBITS 23
>  #define EXPMAX (0xff)
> -#define QUIET_NAN 0x10L
> +#define QUIET_NAN 0x40L
>  #define FRAC_NBITS 32
>  #define FRACHIGH  0x8000L
>  #define FRACHIGH2 0xc000L
> @@ -298,7 +298,7 @@ typedef unsigned int UTItype __attribute__ ((mode
>  /* numeric parameters */
>  /* F_D_BITOFF is the number of bits offset between the MSB of the mantissa
> of a float and of a double. Assumes there are only two float types.
> -   (double::FRAC_BITS+double::NGARDS-(float::FRAC_BITS-float::NGARDS))
> +   (double::FRAC_BITS+double::NGARDS-(float::FRAC_BITS+float::NGARDS))
>   */
>  #define F_D_BITOFF (52+8-(23+7))
>  


Grüße,
 Thomas
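
For illustration, the payload handling discussed above can be sketched in a
few lines of standalone C for single precision. This is a simplified model,
not the fp-bit.c code: the type unpacked_nan and the functions unpack_nan and
pack_nan are hypothetical, only the path without QUIET_NAN_NEGATED is shown,
and the QUIET_NAN value used here is an assumption taken from the IEEE 754
binary32 layout (most significant of the 23 fraction bits), not copied from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FRACBITS  23
#define NGARDS    7           /* guard bits kept in the unpacked fraction */
#define QUIET_NAN 0x400000u   /* assumed: top fraction bit of binary32 */

typedef enum { CLASS_SNAN, CLASS_QNAN } nan_class;

typedef struct {
  nan_class cls;       /* quiet or signaling, stored separately */
  uint32_t fraction;   /* payload only, shifted left by NGARDS */
} unpacked_nan;

/* Classify by the quiet bit, then drop that bit and keep the payload.  */
static unpacked_nan unpack_nan(uint32_t bits)
{
  uint32_t fraction = bits & ((1u << FRACBITS) - 1);
  unpacked_nan u;
  u.cls = (fraction & QUIET_NAN) ? CLASS_QNAN : CLASS_SNAN;
  u.fraction = (fraction & ~QUIET_NAN) << NGARDS;
  return u;
}

/* Restore the payload and always produce a quiet NaN, which mirrors the
   "|| 1" in the quoted condition.  */
static uint32_t pack_nan(unpacked_nan u, int sign)
{
  uint32_t fraction = (u.fraction >> NGARDS) & (QUIET_NAN - 1);
  fraction |= QUIET_NAN;
  return ((uint32_t) (sign != 0) << 31) | (0xffu << FRACBITS) | fraction;
}
```

Under these assumptions a quiet NaN round-trips unchanged, and a signaling
NaN comes back quieted with the same payload.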




Re: [RFA][PATCH] Eliminate more unnecessary type conversions

2013-04-29 Thread Richard Biener
On Fri, Apr 26, 2013 at 8:53 PM, Jeff Law  wrote:
>
> So looking at more dumps made it pretty obvious that my previous patch to
> tree-vrp.c to eliminate useless casts to boolean types which fed into
> comparisons could and should be generalized.
>
> Given:
>
>   x1 = (T1) x0;
>   if (x1 COND CONST)
>
> If the known value range for x0 fits into T1, then we can rewrite as
>
>   x1 = (T1) x0;
>   if (x0 COND (T)CONST)
>
> Which typically makes the first statement dead and may allow further
> simplifications.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for the
> trunk?

Ok.

Thanks,
Richard.
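
As a small standalone illustration of the rewrite quoted below (not the VRP
code itself), take T1 = unsigned char and an int x0. The helper names
range_fits_uchar, cond_narrow, cond_wide and conds_agree_on_range are made up
for this sketch. When the range of x0 fits in unsigned char, comparing the
converted value and comparing x0 directly give the same answer; once the
range does not fit, they can differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical analogue of the range check: does [min, max] fit in
   unsigned char?  */
static bool range_fits_uchar(int min, int max)
{
  return min >= 0 && max <= UCHAR_MAX;
}

/* Original form: x1 = (T1) x0; if (x1 > c) ...  */
static bool cond_narrow(int x0, unsigned char c)
{
  unsigned char x1 = (unsigned char) x0;
  return x1 > c;
}

/* Rewritten form: if (x0 > (T) c) ..., the conversion is now dead.  */
static bool cond_wide(int x0, unsigned char c)
{
  return x0 > (int) c;
}

/* Exhaustively compare both forms for every x0 in [min, max] and every
   possible constant.  */
static bool conds_agree_on_range(int min, int max)
{
  for (int x0 = min; x0 <= max; x0++)
    for (int c = 0; c <= UCHAR_MAX; c++)
      if (cond_narrow(x0, (unsigned char) c)
          != cond_wide(x0, (unsigned char) c))
        return false;
  return true;
}
```

For example, x0 = 256 converts to 0, so "x1 > 0" is false while "x0 > 0" is
true, which is why the range check has to come first.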

>
> commit ad290c7270201042bfc3cde1d84c12e639e4bff7
> Author: Jeff Law 
> Date:   Fri Apr 26 12:52:06 2013 -0600
>
> * tree-vrp.c (range_fits_type_p): Move to earlier point in file.
> (simplify_cond_using_ranges): Generalize code to simplify
> COND_EXPRs where one argument is a constant and the other
> is an SSA_NAME created by an integral type conversion.
>
> * gcc.dg/tree-ssa/vrp88.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index d06eee6..f9b207c 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2013-04-26  Jeff Law  
> +
> +   * tree-vrp.c (range_fits_type_p): Move to earlier point in file.
> +   (simplify_cond_using_ranges): Generalize code to simplify
> +   COND_EXPRs where one argument is a constant and the other
> +   is an SSA_NAME created by an integral type conversion.
> +
>  2013-04-26  Vladimir Makarov  
>
> * rtl.h (struct rtx_def): Add comment for field jump.
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index bbea9fa..6d7839f 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-04-26  Jeff Law  
> +
> +   * gcc.dg/tree-ssa/vrp88.c: New test.
> +
>  2013-04-26  Jakub Jelinek  
>
> PR go/57045
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c
> new file mode 100644
> index 000..e43bdff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +
> +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
> +
> +
> +typedef const struct bitmap_head_def *const_bitmap;
> +typedef unsigned long BITMAP_WORD;
> +typedef struct bitmap_element_def {
> +  struct bitmap_element_def *next;
> +  BITMAP_WORD bits[((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u))];
> +} bitmap_element;
> +typedef struct bitmap_head_def {
> +  bitmap_element *first;
> +} bitmap_head;
> +unsigned char
> +bitmap_single_bit_set_p (const_bitmap a)
> +{
> +  unsigned long count = 0;
> +  const bitmap_element *elt;
> +  unsigned ix;
> +  if ((!(a)->first))
> +return 0;
> +  elt = a->first;
> +  if (elt->next != ((void *)0))
> +return 0;
> +  for (ix = 0; ix != ((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u)); ix++)
> +{
> +  count += __builtin_popcountl (elt->bits[ix]);
> +  if (count > 1)
> + return 0;
> +}
> +  return count == 1;
> +}
> +
> +/* Verify that VRP simplified an "if" statement.  */
> +/* { dg-final { scan-tree-dump "Folded into: if.*" "vrp1"} } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> +
> +
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index cb4a09a..07e3e01 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -8509,6 +8509,57 @@ test_for_singularity (enum tree_code cond_code, tree op0,
>return NULL;
>  }
>
> +/* Return whether the value range *VR fits in an integer type specified
> +   by PRECISION and UNSIGNED_P.  */
> +
> +static bool
> +range_fits_type_p (value_range_t *vr, unsigned precision, bool unsigned_p)
> +{
> +  tree src_type;
> +  unsigned src_precision;
> +  double_int tem;
> +
> +  /* We can only handle integral and pointer types.  */
> +  src_type = TREE_TYPE (vr->min);
> +  if (!INTEGRAL_TYPE_P (src_type)
> +  && !POINTER_TYPE_P (src_type))
> +return false;
> +
> +  /* An extension is fine unless VR is signed and unsigned_p,
> + and so is an identity transform.  */
> +  src_precision = TYPE_PRECISION (TREE_TYPE (vr->min));
> +  if ((src_precision < precision
> +   && !(unsigned_p && !TYPE_UNSIGNED (src_type)))
> +  || (src_precision == precision
> + && TYPE_UNSIGNED (src_type) == unsigned_p))
> +return true;
> +
> +  /* Now we can only handle ranges with constant bounds.  */
> +  if (vr->type != VR_RANGE
> +  || TREE_CODE (vr->min) != INTEGER_CST
> +  || TREE_CODE (vr->max) != INTEGER_CST)
> +return false;
> +
> +  /* For sign changes, the MSB of the double_int has to be clear.
> + An unsigned value with its MSB set cannot be represented by
> + a signed double_int, while a negative value cannot be represented
> + by an unsigned double_int.  */
> +  if (TYPE_UNSIGNED (src_type) != unsigned_p
> +  && (TREE_INT_CST_HIGH (vr->min) | TREE_INT_CST_HIGH (vr->max)) < 0)
> +return false;
> +
> +  /* Then we can 

Re: [PATCH] Fix PR57077 (issue8840045)

2013-04-29 Thread Richard Biener
On Fri, Apr 26, 2013 at 8:52 PM, Teresa Johnson  wrote:
> This patch fixes PR57077. Certain new uses of apply_probability
> are actually scaling the counts up, and the scale factor should not
> be treated as a probability as the value may exceed REG_BR_PROB_BASE.
> One example (from the PR) is when scaling counts up in LTO when merging
> profiles. Another example I found when preparing the patch to use
> the rounding divide in more places is when inlining COMDAT functions.
>
> Add new helper function apply_scale that does the scaling without
> the probability range check. I audited the new uses of apply_probability
> and changed the calls as appropriate.
>
> Profilebootstrapped and tested on x86_64-unknown-linux-gnu. Verified that this
> fixes the lto-bootstrap issue. Ok for trunk?

Ok.

Thanks,
Richard.
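
A rough standalone sketch of the distinction described in the quoted patch,
for readers without the tree at hand. It is not copied from basic-block.h:
the rounding helper rdiv and the use of a plain assert for the range check
are assumptions made for the example, and non-negative counts are assumed so
that the rounding divide behaves as expected.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BR_PROB_BASE 10000

typedef int64_t gcov_type;

/* Rounding divide, assuming non-negative operands.  */
static gcov_type rdiv(gcov_type x, gcov_type y)
{
  return (x + y / 2) / y;
}

/* Scale a count by SCALE / REG_BR_PROB_BASE.  SCALE may exceed
   REG_BR_PROB_BASE, i.e. the count may grow.  */
static gcov_type apply_scale(gcov_type count, int scale)
{
  return rdiv(count * scale, REG_BR_PROB_BASE);
}

/* Same arithmetic, but PROB must be a real probability.  */
static gcov_type apply_probability(gcov_type count, int prob)
{
  assert(prob >= 0 && prob <= REG_BR_PROB_BASE);
  return apply_scale(count, prob);
}
```

Scaling a count of 100 by 25000 gives 250, which is fine for apply_scale but
would trip the range check in apply_probability.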

> 2013-04-26  Teresa Johnson  
>
> * basic-block.h (apply_scale): New function.
> (apply_probability): Use apply_scale.
> * gimple-streamer-in.c (input_bb): Ditto.
> * lto-streamer-in.c (input_cfg): Ditto.
> * lto-cgraph.c (merge_profile_summaries): Ditto.
> * tree-optimize.c (execute_fixup_cfg): Ditto.
> * tree-inline.c (copy_bb): Update comment to use
> apply_scale.
> (copy_edges_for_bb): Ditto.
> (copy_cfg_body): Ditto.
>
> Index: gimple-streamer-in.c
> ===
> --- gimple-streamer-in.c(revision 198344)
> +++ gimple-streamer-in.c(working copy)
> @@ -329,8 +329,8 @@ input_bb (struct lto_input_block *ib, enum LTO_tag
>index = streamer_read_uhwi (ib);
>bb = BASIC_BLOCK_FOR_FUNCTION (fn, index);
>
> -  bb->count = apply_probability (streamer_read_gcov_count (ib),
> - count_materialization_scale);
> +  bb->count = apply_scale (streamer_read_gcov_count (ib),
> +   count_materialization_scale);
>bb->frequency = streamer_read_hwi (ib);
>bb->flags = streamer_read_hwi (ib);
>
> Index: lto-streamer-in.c
> ===
> --- lto-streamer-in.c   (revision 198344)
> +++ lto-streamer-in.c   (working copy)
> @@ -635,8 +635,8 @@ input_cfg (struct lto_input_block *ib, struct func
>
>   dest_index = streamer_read_uhwi (ib);
>   probability = (int) streamer_read_hwi (ib);
> - count = apply_probability ((gcov_type) streamer_read_gcov_count (ib),
> - count_materialization_scale);
> + count = apply_scale ((gcov_type) streamer_read_gcov_count (ib),
> +   count_materialization_scale);
>   edge_flags = streamer_read_uhwi (ib);
>
>   dest = BASIC_BLOCK_FOR_FUNCTION (fn, dest_index);
> Index: tree-inline.c
> ===
> --- tree-inline.c   (revision 198344)
> +++ tree-inline.c   (working copy)
> @@ -1519,7 +1519,7 @@ copy_bb (copy_body_data *id, basic_block bb, int f
>   basic_block_info automatically.  */
>copy_basic_block = create_basic_block (NULL, (void *) 0,
>   (basic_block) prev->aux);
> -  /* Update to use apply_probability().  */
> +  /* Update to use apply_scale().  */
>copy_basic_block->count = bb->count * count_scale / REG_BR_PROB_BASE;
>
>/* We are going to rebuild frequencies from scratch.  These values
> @@ -1891,7 +1891,7 @@ copy_edges_for_bb (basic_block bb, gcov_type count
> && old_edge->dest->aux != EXIT_BLOCK_PTR)
>   flags |= EDGE_FALLTHRU;
> new_edge = make_edge (new_bb, (basic_block) old_edge->dest->aux, flags);
> -/* Update to use apply_probability().  */
> +/* Update to use apply_scale().  */
> new_edge->count = old_edge->count * count_scale / REG_BR_PROB_BASE;
> new_edge->probability = old_edge->probability;
>}
> @@ -2278,7 +2278,7 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
> incoming_frequency += EDGE_FREQUENCY (e);
> incoming_count += e->count;
>   }
> -  /* Update to use apply_probability().  */
> +  /* Update to use apply_scale().  */
>incoming_count = incoming_count * count_scale / REG_BR_PROB_BASE;
>/* Update to use EDGE_FREQUENCY.  */
>incoming_frequency
> Index: tree-optimize.c
> ===
> --- tree-optimize.c (revision 198344)
> +++ tree-optimize.c (working copy)
> @@ -131,15 +131,15 @@ execute_fixup_cfg (void)
>  ENTRY_BLOCK_PTR->count);
>
>ENTRY_BLOCK_PTR->count = cgraph_get_node (current_function_decl)->count;
> -  EXIT_BLOCK_PTR->count = apply_probability (EXIT_BLOCK_PTR->count,
> - count_scale);
> +  EXIT_BLOCK_PTR->count = apply_scale (EXIT_BLOCK_PTR->count,
> +  

Re: [PATCH] Preserve loops from CFG build until after RTL loop opts

2013-04-29 Thread Richard Biener
On Sun, 28 Apr 2013, Tom de Vries wrote:

> On 26/04/13 16:27, Tom de Vries wrote:
> > On 25/04/13 16:19, Richard Biener wrote:
> > 
> >> and compared to the previous patch changed the tree-ssa-tailmerge.c
> >> part to deal with merging of loop latch and loop preheader (even
> >> if that's a really bad idea) to not regress gcc.dg/pr50763.c.
> >> Any suggestion on how to improve that part welcome.
> 
> > So I think this is really a corner case, and we should disregard it if
> > that makes things simpler.
> >
> > Rather than fixing up the loop structure, we could prevent tail-merge in
> > these cases.
> >
> > The current fix tests for current_loops == NULL, and I'm not sure that can
> > still happen there, given that we have PROP_loops.
> 
> Richard,
> 
> I've found that it happens in these g++ test-cases:
>   g++.dg/ext/mv1.C
>   g++.dg/ext/mv12.C
>   g++.dg/ext/mv2.C
>   g++.dg/ext/mv5.C
>   g++.dg/torture/covariant-1.C
>   g++.dg/torture/pr43068.C
>   g++.dg/torture/pr47714.C
> This seems rare enough to just bail out of tail-merge in those cases.
> 
> > It's not evident to me that the test bb2->loop_father->latch == bb2 is
> > sufficient. Before calling tail_merge_optimize, we call
> > loop_optimizer_finalize in which we assert that
> > LOOPS_MAY_HAVE_MULTIPLE_LATCHES from there on, so in theory we might miss
> > some latches.
> >
> > But I guess that pre (having started out with simple latches) maintains
> > simple latches throughout, and that tail-merge does the same.
> 
> I've added a comment related to this in the patch.
> 
> Bootstrapped and reg-tested (ada inclusive) on x86_64.
> 
> OK for trunk?

+  if (bb == NULL
+      /* Be conservative with loop structure.  It's not evident that this test
+         is sufficient.  Before tail-merge, we've just called
+         loop_optimizer_finalize, and LOOPS_MAY_HAVE_MULTIPLE_LATCHES is now
+         set, so there's no guarantee that the loop->latch value is still
+         valid.  But we assume that, since we've forced
+         LOOPS_HAVE_SIMPLE_LATCHES at the start of pre, we've kept that
+         property intact throughout pre, and are keeping it throughout
+         tail-merge using this test.  */
+      || bb->loop_father->latch == bb)
 return;

A more complete test would be to use what the bb_loop_header_p predicate
does - skip latch _edges_.  Not sure if that's easily possible in
the loop looking at succs

  FOR_EACH_EDGE (e, ei, bb->succs)
{
  int index = e->dest->index;
  bitmap_set_bit (same->succs, index);
  same_succ_edge_flags[index] = e->flags;
}

but we'd skip all edges for which

dominated_by_p (CDI_DOMINATORS, e->src, e->dest)

of course that's equal to skipping the whole basic-block if the
above is true.

I suppose the patch is ok as-is for now, but let's keep the above
in mind (I want to audit the whole bootstrap process for loops
that vanish and eventually re-appear, I just didn't get around
thinking about a proper way to efficiently instrument for that).

Thanks,
Richard.


Re: [PATCH] Fix VRP LSHIFT_EXPR non-singleton shift count handling (PR tree-optimization/57083)

2013-04-29 Thread Richard Biener
On Sat, 27 Apr 2013, Jakub Jelinek wrote:

> Hi!
> 
> If shift count range is [0, 1], then for unsigned LSHIFT_EXPR
> bound is the topmost bit, but as llshift method always sign-extends
> the result into double_int, the test doesn't properly find out that
> deriving the value range is unsafe.  In this case
> vr0 is [0x7fff8001, 0x8001], thus when shifting up by 0 or one bit
> we might shift out either zero or 1.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.

Thanks,
Richard.
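
To make the sign-extension problem concrete, here is a small standalone
model of the check. It is a sketch, not the tree-vrp.c code: a 64-bit
integer stands in for double_int, the helpers zext, sext and shift_is_safe
are hypothetical, and the sample maximum 0x80000001 is just an illustrative
32-bit value with its top bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zero-extend the low PREC bits of V.  */
static uint64_t zext(uint64_t v, unsigned prec)
{
  return prec >= 64 ? v : v & ((UINT64_C(1) << prec) - 1);
}

/* Sign-extend the low PREC bits of V.  */
static uint64_t sext(uint64_t v, unsigned prec)
{
  if (prec >= 64)
    return v;
  uint64_t sign = UINT64_C(1) << (prec - 1);
  return (zext(v, prec) ^ sign) - sign;
}

/* Model of the unsigned test: shifting is considered safe only if the
   range maximum is below the bound.  */
static bool shift_is_safe(uint64_t vr_max, uint64_t low_bound)
{
  return vr_max < low_bound;
}
```

For a 32-bit type and a shift count of at most 1, the bound is 1 << 31.
Sign-extended, it becomes 0xffffffff80000000, so a maximum of 0x80000001
looks safely below it; zero-extended, it stays 0x80000000 and the check
correctly reports that a set bit could be shifted out.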

> 2013-04-26  Jakub Jelinek  
> 
>   PR tree-optimization/57083
>   * tree-vrp.c (extract_range_from_binary_expr_1): For LSHIFT_EXPR with
>   non-singleton shift count range, zero extend low_bound for uns case.
> 
>   * gcc.dg/torture/pr57083.c: New test.
> 
> --- gcc/tree-vrp.c.jj 2013-04-24 12:07:07.0 +0200
> +++ gcc/tree-vrp.c2013-04-26 17:59:41.077938198 +0200
> @@ -2837,7 +2837,7 @@ extract_range_from_binary_expr_1 (value_
>  
> if (uns)
>   {
> -   low_bound = bound;
> +   low_bound = bound.zext (prec);
> high_bound = complement.zext (prec);
> if (tree_to_double_int (vr0.max).ult (low_bound))
>   {
> --- gcc/testsuite/gcc.dg/torture/pr57083.c.jj 2013-04-26 18:09:05.396031875 +0200
> +++ gcc/testsuite/gcc.dg/torture/pr57083.c 2013-04-26 18:08:51.0 +0200
> @@ -0,0 +1,15 @@
> +/* PR tree-optimization/57083 */
> +/* { dg-do run { target int32plus } } */
> +
> +extern void abort (void);
> +short x = 1;
> +int y = 0;
> +
> +int
> +main ()
> +{
> +  unsigned t = (0x7fff8001U - x) << (y == 0);
> +  if (t != 0xU)
> +abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend