Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 11:32 PM, Teresa Johnson  wrote:

>>> An issue turned up in our internal 4.6 based testing that has been
>>> fixed on trunk. This patch backports the fix to 4.6. I also have a
>>> small test case that I will add to both 4.6 and 4.7.
>>>
>>> Bootstrapped and checked with x86_64-unknown-linux-gnu.
>>>
>>> 2011-12-07  Teresa Johnson  
>>>
>>>       Backport from mainline:
>>>
>>>       2011-08-05  Uros Bizjak  
>>>
>>>       * config/i386/i386.md (*movdi_internal_rex64): Use "!o" constraint
>>>       instead of "!m" for operand 0, alternative 4.
>>>       (*movdf_internal_rex64): Ditto for operand 0, alernative 6.
>>>
>>> 2011-12-07  Teresa Johnson  
>>>
>>>       * gcc.target/i386/movdi-rex64.c: New.
>>
>> Index: testsuite/gcc.target/i386/movdi-rex64.c
>> ===
>> --- testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
>> +++ testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-fPIE -Wwrite-strings" } */
>> +
>> +#include 
>> +static __thread char buffer[25];
>> +const char * error_message (void)
>> +{
>> +oops:
>> +    strcpy (buffer, "Unknown code ");
>> +    return 0;
>> +}
>>
>> You don't need #include for compile tests, just use:
>>
>> --cut here--
>> /* { dg-do compile } */
>> /* { dg-options "-fPIE" } */
>>
>> char *strcpy (char *dest, const char *src);
>>
>> static __thread char buffer[25];
>>
>> const char
>> * error_message (void)
>> {
>>  strcpy (buffer, "Unknown code ");
>>  return 0;
>> }
>> --cut here--
>>
>> Also this can be compiled everywhere, not just linux.
>
> Ok, I will change the testcase to replace the include and retest.
>
> Regarding the linux target restriction, though, I was concerned about
> the -fPIE option used for the test case. I noticed that in 4.7 there
> is a "pie" effective target keyword (check_effective_target_pie in
> testsuite/lib/target-supports.exp). However, that does not yet exist
> in 4.6, so rather than backport that as well I added the linux
> restriction. I see the same restriction in the other tests that use
> -fpie in gcc.target/i386 (pr39013-[12].c). What do you think?

Ah, I see. Then pleasee add back linux target selctor for 4.6 and add
fpie effective target check for 4.7.

Thanks,
Uros.


Re: [4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)

2011-12-07 Thread Andrew Pinski
On Wed, Dec 7, 2011 at 5:07 PM, Han Shen  wrote:
> +  /* Examine each basic block for address taking of local variables. */

I don't think you need to look at the basic blocks to figure out if a
local variable has its address taken.  You can look through referenced
variables and see if it is a local variable and TREE_ADDRESSABLE is
set.  This should speed up the code a lot.  Though that might has some
false positives with arrays but that is ok because you are checking if
there are any arrays already anyways.

Thanks,
Andrew Pinski


[4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)

2011-12-07 Thread Han Shen
Hi, this patch provides a new stack protection option - 
"fstack-protector-strong".

Background - some times stack-protector is too-simple while stack-protector-all 
over-kills, for example, to build one of our core systems, we forcibly add 
"-fstack-protector-all" to all compile commands, which brings big performance 
penalty (due to extra stack guard/check insns on function prologue and 
epilogue) on both atom and arm. To use "-fstack-protector" is just regarded as 
not secure enough (only "protects" <2% functions) by the system secure team. So 
I'd like to add the option "-fstack-protector-strong", that hits the balance 
between "-fstack-protector" and "-fstack-protector-all".

Benefit - gain big performance while sacrificing little security (for scenarios 
using -fstack-protector-all)

Status - implemented internally, to be up-streamed or merged to google branch 
only.

Detail - 
https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU/edit?hl=en_US

Tested - manually, built chrome browser using the modified compiler with 
"-fstack-protector-strong".

=
gcc/ChangeLog:
* cfgexpand.c (expand_used_vars): Add logic handling 
stack-protector-strong.
(is_local_address_taken): Internal function that returns true when 
gimple contains
an address taken on function local variables.
(record_or_union_type_has_array): New, tests if a record or union type 
contains an array.
* common.opt (fstack-protector-all): New option.

gcc/testsuite/ChangeLog
* gcc.dg/fstack-protector-strong.c: New.

==

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 8684721..1d9df87 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1507,6 +1507,50 @@ estimated_stack_frame_size (struct cgraph_node *node)
   return size;
 }
 
+static int is_local_address_taken(tree t) {
+  if (t && TREE_CODE(t) == ADDR_EXPR) {
+int i;
+tree local_var;
+tree v = TREE_OPERAND(t, 0);
+switch (TREE_CODE(v)) {
+case MEM_REF:
+  for (i = 0; i < TREE_OPERAND_LENGTH(v); ++i)
+if (is_local_address_taken(TREE_OPERAND(v, i)))
+  return 1;
+  return 0;
+case COMPONENT_REF:
+  while (v && TREE_CODE(v) == COMPONENT_REF)
+v = TREE_OPERAND(v, 0);
+  break;
+case VAR_DECL:
+default:
+  ;
+}
+if (v && TREE_CODE(v) == VAR_DECL && !is_global_var(v)) {
+  FOR_EACH_VEC_ELT(tree, cfun->local_decls, i, local_var) {
+if (local_var == v)
+  return 1;
+  }
+}
+  }
+  return 0;
+}
+
+static int record_or_union_type_has_array(tree tree_type) {
+  tree fields = TYPE_FIELDS(tree_type);
+  tree f;
+  for (f = fields; f; f = DECL_CHAIN (f)) {
+if (TREE_CODE(f) == FIELD_DECL) {
+  tree field_type = TREE_TYPE(f);
+  if (RECORD_OR_UNION_TYPE_P(field_type))
+return record_or_union_type_has_array(field_type);
+  if (TREE_CODE(field_type) == ARRAY_TYPE)
+return 1;
+}
+  }
+  return 0;
+}
+
 /* Expand all variables used in the function.  */
 
 static void
@@ -1516,6 +1560,8 @@ expand_used_vars (void)
   VEC(tree,heap) *maybe_local_decls = NULL;
   unsigned i;
   unsigned len;
+  int gen_stack_protect_signal = 0;
+  basic_block bb;
 
   /* Compute the phase of the stack frame for this function.  */
   {
@@ -1548,6 +1594,63 @@ expand_used_vars (void)
}
 }
 
+  /* Examine each basic block for address taking of local variables. */
+  FOR_EACH_BB(bb) {
+gimple_stmt_iterator si;
+/* Scanning phis. */
+for (si = gsi_start_phis(bb); !gsi_end_p(si); gsi_next(&si)) {
+  gimple phi_stmt = gsi_stmt(si);
+  unsigned int i;
+  for (i = 0; i < gimple_phi_num_args(phi_stmt); ++i)
+if (is_local_address_taken(gimple_phi_arg(phi_stmt, i)->def))
+  ++gen_stack_protect_signal;
+}
+/* Scanning assignments and calls. */
+for (si = gsi_start_bb(bb); !gen_stack_protect_signal && !gsi_end_p(si);
+ gsi_next(&si)) {
+  gimple stmt = gsi_stmt (si);
+  if (is_gimple_assign(stmt)) {
+switch(gimple_assign_rhs_class(stmt)) {
+case GIMPLE_TERNARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs3(stmt))) {
+++gen_stack_protect_signal;
+break;
+  }
+  /* Otherwise, fall through. */
+case GIMPLE_BINARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs2(stmt))) {
+++gen_stack_protect_signal;
+break;
+  }
+case GIMPLE_SINGLE_RHS:
+case GIMPLE_UNARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs1(stmt)))
+++gen_stack_protect_signal;
+  break;
+case GIMPLE_INVALID_RHS:
+  break;
+}
+  }
+
+  if (!gen_stack_protect_signal && is_gimple_call(stmt)) {
+int ii, num_arg = gimple_call_num_args(stmt);
+for (ii = 0; !gen_stack_protect_signal && ii < num_arg; ++ii)
+

Re: [C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)

2011-12-07 Thread Jason Merrill

OK.

Jason


Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Xinliang David Li
On Wed, Dec 7, 2011 at 4:25 PM, Andrew Pinski  wrote:
> On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li  wrote:
>> Build the test case in the patch file with -finstrument-functions, the
>> link will fail with unsat. The problem is gcc instruments the
>> artificial wrappers that will won't be emitted. The patch fixes the
>> problem. Bootstrap and tested on x86-64/linux.
>>
>
> I think you really should be checking for artificial attribute and not
> looking at always_inline.

Interesting -- I thought I added that. The following is the revised one.

thanks,

David

>
> Thanks,
> Andrew Pinski
Index: gimplify.c
===
--- gimplify.c	(revision 182083)
+++ gimplify.c	(working copy)
@@ -8048,6 +8048,13 @@ flag_instrument_functions_exclude_p (tre
 	  return true;
 }
 
+/* Avoid instrumenting wrapper functions to builtins.  */
+
+if (DECL_ARTIFICIAL (fndecl)
+&& DECL_DISREGARD_INLINE_LIMITS (fndecl)
+&& lookup_attribute ("always_inline", DECL_ATTRIBUTES (fndecl)))
+  return true;
+
   return false;
 }
 
Index: testsuite/gcc.target/i386/instrument-func.c
===
--- testsuite/gcc.target/i386/instrument-func.c	(revision 0)
+++ testsuite/gcc.target/i386/instrument-func.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse } */
+/* { dg-options "-O2  -msse -finstrument-functions" } */
+
+#include 
+
+__attribute__((noinline)) __m128 foo( float *row, int x)
+{
+  __m128 vals = _mm_loadu_ps(row + x);
+  return vals;
+}
+
+int main()
+{
+  float f[10];
+  foo(f, 5);
+  return 0;
+}


Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Andrew Pinski
On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li  wrote:
> Build the test case in the patch file with -finstrument-functions, the
> link will fail with unsat. The problem is gcc instruments the
> artificial wrappers that will won't be emitted. The patch fixes the
> problem. Bootstrap and tested on x86-64/linux.
>

I think you really should be checking for artificial attribute and not
looking at always_inline.

Thanks,
Andrew Pinski


-finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Xinliang David Li
Build the test case in the patch file with -finstrument-functions, the
link will fail with unsat. The problem is gcc instruments the
artificial wrappers that will won't be emitted. The patch fixes the
problem. Bootstrap and tested on x86-64/linux.

Ok for mainline?

thanks,

David
Index: gimplify.c
===
--- gimplify.c	(revision 182083)
+++ gimplify.c	(working copy)
@@ -8048,6 +8048,12 @@ flag_instrument_functions_exclude_p (tre
 	  return true;
 }
 
+/* Avoid instrumenting wrapper functions to builtins.  */
+
+if (DECL_DISREGARD_INLINE_LIMITS (fndecl)
+&& lookup_attribute ("always_inline", DECL_ATTRIBUTES (fndecl)))
+  return true;
+
   return false;
 }
 
Index: testsuite/gcc.target/i386/instrument-func.c
===
--- testsuite/gcc.target/i386/instrument-func.c	(revision 0)
+++ testsuite/gcc.target/i386/instrument-func.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse } */
+/* { dg-options "-O2  -msse -finstrument-functions" } */
+
+#include 
+
+__attribute__((noinline)) __m128 foo( float *row, int x)
+{
+  __m128 vals = _mm_loadu_ps(row + x);
+  return vals;
+}
+
+int main()
+{
+  float f[10];
+  foo(f, 5);
+  return 0;
+}


cg1
Description: Binary data


cg2
Description: Binary data


Re: [v3] RFC: rename __calculate_memory_order

2011-12-07 Thread Benjamin Kosnik

> * include/bits/atomic_base.h (__calculate_memory_order):
> Rename to... (__cmpexch_failure_order): This, and rewrite as
> constexpr function. (compare_exchange_strong, compare_exchange_weak):
> Use it.
> * include/std/atomic (compare_exchange_strong,
> compare_exchange_weak): Likewise.
> 
> Tested x86_64-linux.

looks great to me. More constexpr, what's not to like?

-benjamin


[Committed] Fix PR libffi/50051 (MIPS libffi does not compile for mips64octeon-linux-gnu)

2011-12-07 Thread Andrew Pinski
Hi,
  The problem here is Cavium's octeon assembler rejects floating point
if the arch is set to either octeon or octeon2.  This fixes the issue
by adding:
.set mips4 so that floating point instructions are enabled.

Committed as approved by Anthony in the bugzilla.

Thanks,
Andrew Pinski

libffi/ChangeLog:
* src/mips/n32.S: Add ".set mips4".
Index: src/mips/n32.S
===
--- src/mips/n32.S  (revision 182083)
+++ src/mips/n32.S  (working copy)
@@ -43,6 +43,7 @@
 #ifdef __GNUC__
.abicalls
 #endif
+   .set mips4
.text
.align  2
.globl  ffi_call_N32


Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Teresa Johnson
On Wed, Dec 7, 2011 at 2:12 PM, Uros Bizjak  wrote:
> Hello!
>
>> An issue turned up in our internal 4.6 based testing that has been
>> fixed on trunk. This patch backports the fix to 4.6. I also have a
>> small test case that I will add to both 4.6 and 4.7.
>>
>> Bootstrapped and checked with x86_64-unknown-linux-gnu.
>>
>> 2011-12-07  Teresa Johnson  
>>
>>       Backport from mainline:
>>
>>       2011-08-05  Uros Bizjak  
>>
>>       * config/i386/i386.md (*movdi_internal_rex64): Use "!o" constraint
>>       instead of "!m" for operand 0, alternative 4.
>>       (*movdf_internal_rex64): Ditto for operand 0, alernative 6.
>>
>> 2011-12-07  Teresa Johnson  
>>
>>       * gcc.target/i386/movdi-rex64.c: New.
>
> Index: testsuite/gcc.target/i386/movdi-rex64.c
> ===
> --- testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
> +++ testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-fPIE -Wwrite-strings" } */
> +
> +#include 
> +static __thread char buffer[25];
> +const char * error_message (void)
> +{
> +oops:
> +    strcpy (buffer, "Unknown code ");
> +    return 0;
> +}
>
> You don't need #include for compile tests, just use:
>
> --cut here--
> /* { dg-do compile } */
> /* { dg-options "-fPIE" } */
>
> char *strcpy (char *dest, const char *src);
>
> static __thread char buffer[25];
>
> const char
> * error_message (void)
> {
>  strcpy (buffer, "Unknown code ");
>  return 0;
> }
> --cut here--
>
> Also this can be compiled everywhere, not just linux.

Ok, I will change the testcase to replace the include and retest.

Regarding the linux target restriction, though, I was concerned about
the -fPIE option used for the test case. I noticed that in 4.7 there
is a "pie" effective target keyword (check_effective_target_pie in
testsuite/lib/target-supports.exp). However, that does not yet exist
in 4.6, so rather than backport that as well I added the linux
restriction. I see the same restriction in the other tests that use
-fpie in gcc.target/i386 (pr39013-[12].c). What do you think?

Thanks!
Teresa

>
> OK with the testcase above.
>
> Thanks,
> Uros.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Uros Bizjak
Hello!

> An issue turned up in our internal 4.6 based testing that has been
> fixed on trunk. This patch backports the fix to 4.6. I also have a
> small test case that I will add to both 4.6 and 4.7.
>
> Bootstrapped and checked with x86_64-unknown-linux-gnu.
>
> 2011-12-07  Teresa Johnson  
>
>   Backport from mainline:
>
>   2011-08-05  Uros Bizjak  
>
>   * config/i386/i386.md (*movdi_internal_rex64): Use "!o" constraint
>   instead of "!m" for operand 0, alternative 4.
>   (*movdf_internal_rex64): Ditto for operand 0, alernative 6.
>
> 2011-12-07  Teresa Johnson  
>
>   * gcc.target/i386/movdi-rex64.c: New.

Index: testsuite/gcc.target/i386/movdi-rex64.c
===
--- testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
+++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-fPIE -Wwrite-strings" } */
+
+#include 
+static __thread char buffer[25];
+const char * error_message (void)
+{
+oops:
+strcpy (buffer, "Unknown code ");
+return 0;
+}

You don't need #include for compile tests, just use:

--cut here--
/* { dg-do compile } */
/* { dg-options "-fPIE" } */

char *strcpy (char *dest, const char *src);

static __thread char buffer[25];

const char
* error_message (void)
{
  strcpy (buffer, "Unknown code ");
  return 0;
}
--cut here--

Also this can be compiled everywhere, not just linux.

OK with the testcase above.

Thanks,
Uros.


[Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Teresa Johnson
An issue turned up in our internal 4.6 based testing that has been
fixed on trunk. This patch backports the fix to 4.6. I also have a
small test case that I will add to both 4.6 and 4.7.

Bootstrapped and checked with x86_64-unknown-linux-gnu.

Can someone review?

Thanks,
Teresa

2011-12-07  Teresa Johnson  

Backport from mainline:

2011-08-05  Uros Bizjak  

* config/i386/i386.md (*movdi_internal_rex64): Use "!o" constraint
instead of "!m" for operand 0, alternative 4.
(*movdf_internal_rex64): Ditto for operand 0, alernative 6.

2011-12-07  Teresa Johnson  

* gcc.target/i386/movdi-rex64.c: New.


Index: 
config/i386/i386.md===---
config/i386/i386.md (revision 182083)+++ config/i386/i386.md(working
copy)@@ -1960,7 +1960,7 @@  (define_insn "*movdi_internal_rex64"
[(set (match_operand:DI 0 "nonimmediate_operand"- "=r,r  ,r,m
,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")+ "=r,r
,r,m ,!o,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym")
(match_operand:DI 1 "general_operand"     "Z ,rem,i,re,n ,C
,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m ,*Ym,*x"))]   "TARGET_64BIT &&
!(MEM_P (operands[0]) && MEM_P (operands[1]))"@@ -2905,7 +2905,7 @@
(define_insn "*movdf_internal_rex64"   [(set (match_operand:DF 0
"nonimmediate_operand"- "=f,m,f,r ,m,!r,!m,Y2*x,Y2*x,Y2*x,m   ,Yi,r
")+ "=f,m,f,r ,m,!r,!o,Y2*x,Y2*x,Y2*x,m   ,Yi,r ")  
(match_operand:DF
1 "general_operand" "fm,f,G,rm,r,F ,F ,C   ,Y2*x,m   ,Y2*x,r ,Yi"))]
  "TARGET_64BIT && !(MEM_P (operands[0]) && MEM_P (operands[1]))

Index: testsuite/gcc.target/i386/movdi-rex64.c
===
--- testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
+++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-fPIE -Wwrite-strings" } */
+
+#include 
+static __thread char buffer[25];
+const char * error_message (void)
+{
+oops:
+strcpy (buffer, "Unknown code ");
+return 0;
+}


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)

2011-12-07 Thread Richard Henderson
On 12/07/2011 12:20 PM, Jakub Jelinek wrote:
>   PR bootstrap/50237
>   * internal.h (_cpp_init_lexer): New prototype.
>   * init.c (init_library): Call it.
>   * lex.c (init_vectorized_lexer): Remove constructor attribute,
>   add inline keyword.
>   (HAVE_init_vectorized_lexer): Define.
>   (_cpp_init_lexer): New function.

Ok.


r~


Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 14:53:20 Tobias Burnus wrote:
> ** PING **
> 
> On 11/29/2011 07:35 PM, Tobias Burnus wrote:
> > gfortran had an ICE when trying to insert a check whether the
> > character length between actual and dummy argument matches. This check
> > is pointless for deferred-length character arguments - thus, no bounds
> > check should be generated.
> > 
> > Build and regtested on x86-64-linux.
> > OK for the trunk?
> > 
OK, though I would have merged the skip for deferred lengths into the toplevel 
if condition (either is fine anyway).

Mikael


[C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)

2011-12-07 Thread Jakub Jelinek
Hi!

By my reading auto is not valid on non-static data members, if they aren't
NSDMI, then we'd error on them (but differently between templates and
non-templates), but with NSDMI in template we just ICE because init is
DEFAULT_ARG.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  

PR c++/51401
* decl.c (grokdeclarator): Error for auto on non-static data members.

* g++.dg/cpp0x/auto7.C: Adjust expected error message.
* g++.dg/cpp0x/auto29.C: New test.

--- gcc/cp/decl.c.jj2011-12-07 12:51:17.0 +0100
+++ gcc/cp/decl.c   2011-12-07 18:09:09.113310949 +0100
@@ -9971,6 +9971,12 @@ grokdeclarator (const cp_declarator *dec
   }
 else if (decl_context == FIELD)
   {
+   if (!staticp && type_uses_auto (type))
+ {
+   error ("non-static data member declared %");
+   type = error_mark_node;
+ }
+
/* The C99 flexible array extension.  */
if (!staticp && TREE_CODE (type) == ARRAY_TYPE
&& TYPE_DOMAIN (type) == NULL_TREE)
--- gcc/testsuite/g++.dg/cpp0x/auto7.C.jj   2011-05-31 08:02:58.0 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/auto7.C  2011-12-07 18:14:18.011549947 +0100
@@ -9,5 +9,5 @@ template struct A
 {
   static auto k = 7;   // { dg-error "non-const" }
   static auto l;   // { dg-error "has no initializer" }
-  auto m;  // { dg-error "has no initializer" }
+  auto m;  // { dg-error "non-static data member declared" }
 };
--- gcc/testsuite/g++.dg/cpp0x/auto29.C.jj  2011-12-07 18:12:22.181184910 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/auto29.C 2011-12-07 18:11:52.0 +0100
@@ -0,0 +1,25 @@
+// PR c++/51401
+// { dg-do compile }
+// { dg-options "-std=c++11" }
+
+template 
+struct A
+{
+  auto i;  // { dg-error "non-static data member declared" }
+};
+
+template 
+struct B
+{
+  auto i = 0;  // { dg-error "non-static data member declared" }
+};
+
+struct C
+{
+  auto i;  // { dg-error "non-static data member declared" }
+};
+
+struct D
+{
+  auto i = 0;  // { dg-error "non-static data member declared" }
+};

Jakub


Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Andrew Stubbs

On 07/12/11 16:25, Richard Earnshaw wrote:

2011-12-06  Andrew Stubbs

gcc/
* config/arm/arm.md (one_cmpldi2): Rename to ...
(one_cmpldi2_core): ... this, and modify it to prevent it being
used for NEON.
(one_cmpldi2): New define_expand.
* config/arm/neon.md (one_cmpldi2_neon): New define_insn.




+(define_insn_and_split "*one_cmpldi2_core"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "=&r,&r")
+   (not:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")))]


Thinking about it, for an operation with one input and one output, there's no 
need for the
earlyclobber marker when the input is tied to the output (there's no other 
operand that can be
clobbered).

Otherwise OK.


Thanks I'll test and commit the attached updated patch once stage 1 
opens again.


Andrew
2011-12-07  Andrew Stubbs  

	gcc/
	* config/arm/arm.md (one_cmpldi2): Rename to ...
	(one_cmpldi2_core): ... this, and modify it to prevent it being
	used for NEON.
	(one_cmpldi2): New define_expand.
	* config/arm/neon.md (one_cmpldi2_neon): New define_insn.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4199,10 +4199,16 @@
   "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_VFP_DOUBLE)"
   "")
 
-(define_insn_and_split "one_cmpldi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-	(not:DI (match_operand:DI 1 "s_register_operand" "0,r")))]
+(define_expand "one_cmpldi2"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(not:DI (match_operand:DI 1 "s_register_operand" "")))]
   "TARGET_32BIT"
+  "")
+
+(define_insn_and_split "*one_cmpldi2_core"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "=r,&r")
+	(not:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")))]
+  "TARGET_32BIT && !TARGET_NEON"
   "#"
   "TARGET_32BIT && reload_completed"
   [(set (match_dup 0) (not:SI (match_dup 1)))
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -896,6 +896,20 @@
   [(set_attr "neon_type" "neon_int_1")]
 )
 
+(define_insn "*one_cmpldi2_neon"
+  [(set (match_operand:DI 0 "s_register_operand"	 "=w,?r,?&r,?w")
+	(not:DI (match_operand:DI 1 "s_register_operand" " w, 0,  r, w")))]
+  "TARGET_NEON"
+  "@
+  vmvn\t%P0, %P1
+  #
+  #
+  vmvn\t%P0, %P1"
+  [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
+   (set_attr "length" "*,8,8,*")
+   (set_attr "arch" "nota8,*,*,onlya8")]
+)
+
 (define_insn "abs2"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
 	(abs:VDQW (match_operand:VDQW 1 "s_register_operand" "w")))]


Re: [C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)

2011-12-07 Thread Jason Merrill

OK.

Jason


Re: [C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)

2011-12-07 Thread Jason Merrill

On 12/07/2011 03:25 PM, Jakub Jelinek wrote:

Hi!

We already error out on .foo = style designators for arrays, but for
[N] = style designators for classes we'd just ICE.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


The last one is a accepts-invalid even for C.  Jason, do you think you could
look at that on the C++ side, I'll look at the C one?


At some point, sure.  Would you open a PR?

Jason



[C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)

2011-12-07 Thread Jakub Jelinek
Hi!

This testcase regressed recently when Paolo added the user friendlier
invalid use of member function vs. invalid use of member distinction,
when TREE_OPERAND (value, 1) is BASELINK, DECL_FUNCTION_MEMBER_P on it
dies with a checking ICE.
The following patch calls get_first_fn if it is overloaded_fn.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  

PR c++/51429
* typeck2.c (cxx_incomplete_type_diagnostic): Don't
ICE if TREE_OPERAND (value, 1) is overloaded.

* g++.dg/parse/error45.C: New test.

--- gcc/cp/typeck2.c.jj 2011-11-07 12:40:45.0 +0100
+++ gcc/cp/typeck2.c2011-12-07 15:31:30.506795286 +0100
@@ -428,15 +428,20 @@ cxx_incomplete_type_diagnostic (const_tr
 
 case OFFSET_TYPE:
 bad_member:
-  if (DECL_FUNCTION_MEMBER_P (TREE_OPERAND (value, 1))
- && ! flag_ms_extensions)
-   emit_diagnostic (diag_kind, input_location, 0,
-"invalid use of member function "
-"(did you forget the %<()%> ?)");
-  else
-   emit_diagnostic (diag_kind, input_location, 0,
-"invalid use of member "
-"(did you forget the %<&%> ?)");
+  {
+   tree member = TREE_OPERAND (value, 1);
+   if (is_overloaded_fn (member))
+ member = get_first_fn (member);
+   if (DECL_FUNCTION_MEMBER_P (member)
+   && ! flag_ms_extensions)
+ emit_diagnostic (diag_kind, input_location, 0,
+  "invalid use of member function "
+  "(did you forget the %<()%> ?)");
+   else
+ emit_diagnostic (diag_kind, input_location, 0,
+  "invalid use of member "
+  "(did you forget the %<&%> ?)");
+  }
   break;
 
 case TEMPLATE_TYPE_PARM:
--- gcc/testsuite/g++.dg/parse/error45.C.jj 2011-12-07 15:45:28.732286838 
+0100
+++ gcc/testsuite/g++.dg/parse/error45.C2011-12-07 15:44:20.0 
+0100
@@ -0,0 +1,9 @@
+// PR c++/51429
+// { dg-do compile }
+
+struct A
+{
+  void foo (double);
+  void foo (int);
+  A () { foo = 0; }// { dg-error "invalid use of member function" }
+};

Jakub


[C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)

2011-12-07 Thread Jakub Jelinek
Hi!

We already error out on .foo = style designators for arrays, but for
[N] = style designators for classes we'd just ICE.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

We still have a bunch of accepts-invalid, no idea how to solve them,
because reshape_init_r is called often several times with the same d->cur.
IMHO we should error on all of:
struct A { int i; };
int a = { .foo = 6 };
int b = { [0] = 1 };
_Complex float c = { .foo = 0,  1 };
_Complex float d = { [0] = 0,  1 };
_Complex float e = { 0, .foo = 1 };
_Complex float f = { 0, [0] = 1 };
char g[] = { [7] = "abcd" };
The last one is a accepts-invalid even for C.  Jason, do you think you could
look at that on the C++ side, I'll look at the C one?

2011-12-07  Jakub Jelinek  

PR c++/51229
* decl.c (reshape_init_class): Complain if d->cur->index is
INTEGER_CST.
* parser.c (cp_parser_initializer_list): If cp_parser_parse_definitely
fails, clear designator.

* g++.dg/ext/desig3.C: New test.

--- gcc/cp/decl.c.jj2011-12-01 11:45:04.0 +0100
+++ gcc/cp/decl.c   2011-12-07 12:51:17.455762864 +0100
@@ -5078,6 +5078,14 @@ reshape_init_class (tree type, reshape_i
   /* Handle designated initializers, as an extension.  */
   if (d->cur->index)
{
+ if (TREE_CODE (d->cur->index) == INTEGER_CST)
+   {
+ if (complain & tf_error)
+   error ("%<[%E] =%> used in a GNU-style designated initializer"
+  " for class %qT", d->cur->index, type);
+ return error_mark_node;
+   }
+
  field = lookup_field_1 (type, d->cur->index, /*want_type=*/false);
 
  if (!field || TREE_CODE (field) != FIELD_DECL)
--- gcc/cp/parser.c.jj  2011-11-28 17:58:00.0 +0100
+++ gcc/cp/parser.c 2011-12-07 11:31:33.525651711 +0100
@@ -17713,7 +17713,8 @@ cp_parser_initializer_list (cp_parser* p
  designator = cp_parser_constant_expression (parser, false, NULL);
  cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE);
  cp_parser_require (parser, CPP_EQ, RT_EQ);
- cp_parser_parse_definitely (parser);
+ if (!cp_parser_parse_definitely (parser))
+   designator = NULL_TREE;
}
   else
designator = NULL_TREE;
--- gcc/testsuite/g++.dg/ext/desig3.C.jj2011-12-07 12:55:34.894323455 
+0100
+++ gcc/testsuite/g++.dg/ext/desig3.C   2011-12-07 12:56:32.172001945 +0100
@@ -0,0 +1,9 @@
+// PR c++/51229
+// { dg-do compile }
+// { dg-options "" }
+
+struct A { int i; };
+
+int a[5] = { .foo = 7 };// { dg-error "used in a GNU-style designated 
initializer for an array" }
+int b[] = { .foo = 8 };// { dg-error "used in a GNU-style designated 
initializer for an array" }
+A c = { [0] = {} };// { dg-error "used in a GNU-style designated 
initializer for class" }

Jakub


[PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)

2011-12-07 Thread Jakub Jelinek
Hi!

While this isn't real fix for this PR, which should be fixed in configury,
I think avoiding ctors in host code is useful for portability and Eric
agreed with that in the PR.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  

PR bootstrap/50237
* internal.h (_cpp_init_lexer): New prototype.
* init.c (init_library): Call it.
* lex.c (init_vectorized_lexer): Remove constructor attribute,
add inline keyword.
(HAVE_init_vectorized_lexer): Define.
(_cpp_init_lexer): New function.

--- libcpp/internal.h.jj2011-11-04 07:49:56.0 +0100
+++ libcpp/internal.h   2011-11-28 16:01:13.467831488 +0100
@@ -653,6 +653,7 @@ extern int _cpp_equiv_tokens (const cpp_
 extern void _cpp_init_tokenrun (tokenrun *, unsigned int);
 extern cpp_hashnode *_cpp_lex_identifier (cpp_reader *, const char *);
 extern int _cpp_remaining_tokens_num_in_context (cpp_context *);
+extern void _cpp_init_lexer (void);
 
 /* In init.c.  */
 extern void _cpp_maybe_push_include_file (cpp_reader *);
--- libcpp/init.c.jj2011-10-31 20:44:15.0 +0100
+++ libcpp/init.c   2011-11-28 16:01:42.168044257 +0100
@@ -134,6 +134,8 @@ init_library (void)
 {
   initialized = 1;
 
+  _cpp_init_lexer ();
+
   /* Set up the trigraph map.  This doesn't need to do anything if
 we were compiled with a compiler that supports C99 designated
 initializers.  */
--- libcpp/lex.c.jj 2011-10-27 08:43:10.0 +0200
+++ libcpp/lex.c2011-11-28 16:04:42.624703698 +0100
@@ -477,7 +477,8 @@ search_line_sse42 (const uchar *s, const
 typedef const uchar * (*search_line_fast_type) (const uchar *, const uchar *);
 static search_line_fast_type search_line_fast;
 
-static void __attribute__((constructor))
+#define HAVE_init_vectorized_lexer 1
+static inline void
 init_vectorized_lexer (void)
 {
   unsigned dummy, ecx = 0, edx = 0;
@@ -638,6 +639,16 @@ search_line_fast (const uchar *s, const
 
 #endif
 
+/* Initialize the lexer if needed.  */
+
+void
+_cpp_init_lexer (void)
+{
+#ifdef HAVE_init_vectorized_lexer
+  init_vectorized_lexer ();
+#endif
+}
+
 /* Returns with a logical line that contains no escaped newlines or
trigraphs.  This is a time-critical inner loop.  */
 void

Jakub


Re: [Patch, Fortran] PR 51378 Fix component-access check

2011-12-07 Thread Mikael Morin
On Friday 02 December 2011 22:01:19 Tobias Burnus wrote:
> Found via Reinhold Bader's test suite: If a component is public, it
> remains public even if the extended type has PRIVATE.
> 
> Build and regtested on x86-64-linux.
> OK for the trunk?
> 
OK.

Mikael


Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 14:54:36 Tobias Burnus wrote:
> * ping * ?
> 
> On 12/04/2011 06:46 PM, Tobias Burnus wrote:
> > Hi all,
> > 
> > as Dominique has found, Fortran 2008 allows the BOZ edit descriptors
> > now also with REAL and COMPLEX arguments. (See PR for quotes from the
> > standard.)
> > 
> > Build and regtested on x86-64-linux.
> > OK for the trunk?
> > 
OK.

Mikael


Re: PR 51386

2011-12-07 Thread François Dumont

Attached patch applied:

2011-12-07  François Dumont 

PR libstdc++/51386
* include/bits/hashtable_policy.h 
(_Prime_rehash_policy::_M_next_bkt):

Fix computation of _M_prev_resize so that hashtable do not keep on
being rehashed when _M_max_load_factor is lower than 1.

François

On 12/07/2011 11:21 AM, Paolo Carlini wrote:

Hi,

Ok to commit ?
Yes, thanks a lot for handling this. Please remember to add a proper 
header to the ChangeLog entry.


Thanks again,
Paolo.



Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 181975)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -300,23 +300,30 @@
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned long __fast_bkt[12]
+static const unsigned char __fast_bkt[12]
   = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
 
+if (__n <= 11)
+  {
+	_M_prev_resize = 0;
+	_M_next_resize
+	  = __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
+	return __fast_bkt[__n];
+  }
+
 const unsigned long* __p
-  = __n <= 11 ? __fast_bkt + __n
-		  : std::lower_bound(__prime_list + 5,
- __prime_list + _S_n_primes, __n);
+  = std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes, __n);
 
-_M_prev_resize = __builtin_floor(*__p * (long double)_M_max_load_factor);
-if (__p != __fast_bkt)
-  _M_prev_resize = std::min(_M_prev_resize,
-static_cast(*(__p - 1)));
-// Lets guaranty a minimal grow step of 11:
+// Shrink will take place only if the number of elements is small enough
+// so that the prime number 2 steps before __p is large enough to still
+// conform to the max load factor:
+_M_prev_resize
+  = __builtin_floor(*(__p - 2) * (long double)_M_max_load_factor);
+
+// Let's guaranty that a minimal grow step of 11 is used
 if (*__p - __n < 11)
-  __p = std::lower_bound(__prime_list + 5,
-			 __prime_list + _S_n_primes, __n + 11);
-_M_next_resize = __builtin_floor(*__p * (long double)_M_max_load_factor);
+  __p = std::lower_bound(__p, __prime_list + _S_n_primes, __n + 11);
+_M_next_resize = __builtin_ceil(*__p * (long double)_M_max_load_factor);
 return *__p;
   }
 


Re: [Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 17:45:52 Tobias Burnus wrote:
> This fixes a -frealloc-lhs regression where the RHS is handled by a
> "conversion function" whose argument has component refs.
> 
> Build and regtested on x86-64-linux.
> OK for the trunk and 4.7?
> 
> Tobias
OK.

Mikael


Re: [PATCH] [MIPS] Add -march=octeon+ support for GCC

2011-12-07 Thread Andrew Pinski
On Tue, Dec 6, 2011 at 6:28 PM, Andrew Pinski  wrote:
> Hi,
>  This patch adds -march=octeon+ to GCC.
> OK? Bootstrapped and tested on mips64-linux-gnu configured with
> --with-arch=octeon+ .
>
> Thanks,
> Andrew Pinski
>
> gcc/ChangeLog:
> * mips/mips-cpus.def (octeon+): New CPU.
> * config/mips/mips-tables.opt: Regenerate.
> * config/mips/mips.h (MIPS_CPP_SET_PROCESSOR): Emit '+' as 'P'.
>
> testsuite/ChangeLog:
> * gcc.target/mips/mult-1.c: Forbit all Octeon processors.
> * gcc.target/mips/dmult-1.c: Likewise.
> * gcc.target/mips/branch-1.c: Likewise.
> * gcc.target/mips/extend-1.c: Likewise.

Woops I forgot the patch.

Thanks,
Andrew Pinski
Index: testsuite/gcc.target/mips/mult-1.c
===
--- testsuite/gcc.target/mips/mult-1.c  (revision 182066)
+++ testsuite/gcc.target/mips/mult-1.c  (working copy)
@@ -1,6 +1,6 @@
 /* For SI->DI widening multiplication we should use DINS to combine the two
halves.  For Octeon use DMUL with explicit widening.  */
-/* { dg-options "-O -mgp64 isa_rev>=2 forbid_cpu=octeon" } */
+/* { dg-options "-O -mgp64 isa_rev>=2 forbid_cpu=octeon\[\+0-9\]*" } */
 /* { dg-final { scan-assembler "\tdins\t" } } */
 /* { dg-final { scan-assembler-not "\tdsll\t" } } */
 /* { dg-final { scan-assembler-not "\tdsrl\t" } } */
Index: testsuite/gcc.target/mips/dmult-1.c
===
--- testsuite/gcc.target/mips/dmult-1.c (revision 182066)
+++ testsuite/gcc.target/mips/dmult-1.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "forbid_cpu=octeon -mgp64" } */
+/* { dg-options "forbid_cpu=octeon\[\+0-9\]* -mgp64" } */
 /* { dg-final { scan-assembler "\tdmult\t" } } */
 /* { dg-final { scan-assembler "\tmflo\t" } } */
 /* { dg-final { scan-assembler-not "\tdmul\t" } } */
Index: testsuite/gcc.target/mips/branch-1.c
===
--- testsuite/gcc.target/mips/branch-1.c(revision 182066)
+++ testsuite/gcc.target/mips/branch-1.c(working copy)
@@ -2,7 +2,7 @@
but we test for "bbit" elsewhere.  On other targets, we should implement
the "if" statements using an "andi" instruction followed by a branch
on zero.  */
-/* { dg-options "-O2 forbid_cpu=octeon" } */
+/* { dg-options "-O2 forbid_cpu=octeon\[\+0-9\]*" } */
 
 void bar (void);
 NOMIPS16 void f1 (int x) { if (x & 4) bar (); }
Index: testsuite/gcc.target/mips/extend-1.c
===
--- testsuite/gcc.target/mips/extend-1.c(revision 182066)
+++ testsuite/gcc.target/mips/extend-1.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O -mgp64 forbid_cpu=octeon" } */
+/* { dg-options "-O -mgp64 forbid_cpu=octeon\[\+0-9\]*" } */
 /* { dg-final { scan-assembler-times "\tdsll\t" 5 } } */
 /* { dg-final { scan-assembler-times "\tdsra\t" 5 } } */
 /* { dg-final { scan-assembler-not "\tsll\t" } } */
Index: config/mips/mips-tables.opt
===
--- config/mips/mips-tables.opt (revision 182066)
+++ config/mips/mips-tables.opt (working copy)
@@ -603,3 +603,6 @@
 EnumValue
 Enum(mips_arch_opt_value) String(octeon) Value(80) Canonical
 
+EnumValue
+Enum(mips_arch_opt_value) String(octeon+) Value(81) Canonical
+
Index: config/mips/mips-cpus.def
===
--- config/mips/mips-cpus.def   (revision 182066)
+++ config/mips/mips-cpus.def   (working copy)
@@ -145,3 +145,4 @@
 
 /* MIPS64 Release 2 processors.  */
 MIPS_CPU ("octeon", PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY)
+MIPS_CPU ("octeon+", PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY)
Index: config/mips/mips.h
===
--- config/mips/mips.h  (revision 182066)
+++ config/mips/mips.h  (working copy)
@@ -329,7 +329,10 @@
\
   macro = concat ((PREFIX), "_", (INFO)->name, NULL);  \
   for (p = macro; *p != 0; p++)\
-   *p = TOUPPER (*p);  \
+if (*p == '+')  \
+  *p = 'P'; \
+else\
+  *p = TOUPPER (*p);\
\
   builtin_define (macro);  \
   builtin_define_with_value ((PREFIX), (INFO)->name, 1);   \


Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 7:58 PM, Iain Sandoe
 wrote:

>>> Currently we are failing...
>>>
>>> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread
>>> simulation test
>>> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread
>>> simulation test
>>> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread
>>> simulation test
>>> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread
>>> simulation test
>>>
>>> on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread
>>> of
>>> gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20
>>> seconds
>>> eliminates the failures (as these test take ~16 seconds on
>>> x86_64-apple-darwin11).
>>> Okay for gcc trunk?
>
>
> if it's only one test can't you use { dg-timeout-factor 2.0  ?
>
>
>> As said elsewhere, this will double the amount of already large
>> logfile in case of failed test.
>>
>> Do we really need such detailed log?
>
>
> anything to optimize what's in the logs would be welcome in debugging

I fully agree, but it is trivial to re-run the test in the debugger
outside the testsuite run. IMO, logging a couple of lines for
execution of every instruction (in a loop!) is a bit excessive.

Uros.


Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Iain Sandoe


On 7 Dec 2011, at 18:47, Uros Bizjak wrote:

On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth  
 wrote:

Currently we are failing...

FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread  
simulation test


on x86_64-apple-darwin11 due to the 10 second timeout in simulate- 
thread of
gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout  
to 20 seconds
eliminates the failures (as these test take ~16 seconds on x86_64- 
apple-darwin11).

Okay for gcc trunk?


if it's only one test can't you use { dg-timeout-factor 2.0  ?


As said elsewhere, this will double the amount of already large
logfile in case of failed test.

Do we really need such detailed log?


anything to optimize what's in the logs would be welcome in debugging

Iain



Re: Update to Fortran "invoke" documentation about the features -finit- *really* provides.

2011-12-07 Thread Toon Moene

On 12/06/2011 08:32 PM, Steve Kargl wrote:


On Mon, Dec 05, 2011 at 07:21:59PM +0100, Toon Moene wrote:


2011-12-05  Toon Moene

PR/51310
invoke.texi: Itemize the cases for which -finit-  doesn't
work.

OK for trunk ? (and perhaps later for the 4.6 branch ?



Looks good to me.  You can apply it to the 4.6 branch
if you have time.


And then  shortly before applying it, I realized that the proper 
documentation of the limitations might be dependent on the 
-fno-automatic, -fstack-arrays and -fmax-stack-var-size=n compiler flags 
used.


So I'll come back tomorrow with version 2.0 of this patch, after 
checking out all of the above (the documentation of 4.6 and 4.7 will be 
different if using -fstack-arrays makes a difference, because that 
option only exists in 4.7).


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth  wrote:
> Currently we are failing...
>
> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread simulation 
> test
> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread simulation 
> test
> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread simulation 
> test
> FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread simulation 
> test
>
> on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of
> gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 
> seconds
> eliminates the failures (as these test take ~16 seconds on 
> x86_64-apple-darwin11).
> Okay for gcc trunk?

As said elsewhere, this will double the amount of already large
logfile in case of failed test.

Do we really need such detailed log?

Uros.


[PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Jack Howarth
Currently we are failing...

FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread simulation 
test

on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of
gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds
eliminates the failures (as these test take ~16 seconds on 
x86_64-apple-darwin11).
Okay for gcc trunk?
 Jack

gcc/testsuite/

2011-12-07  Jack Howarth 

* lib/gcc-simulate-thread.exp (simulate-thread): Increase timeout
to 20 seconds.


Index: gcc/testsuite/lib/gcc-simulate-thread.exp
===
--- gcc/testsuite/lib/gcc-simulate-thread.exp   (revision 182083)
+++ gcc/testsuite/lib/gcc-simulate-thread.exp   (working copy)
@@ -56,8 +56,8 @@ proc simulate-thread { args } {
 
 set gdb_worked 0
 
-# Set timeout to 10 seconds due to huge amount of generated log.
-remote_expect target 10 {
+# Set timeout to 20 seconds due to huge amount of generated log.
+remote_expect target 20 {
# Too old GDB
-re "Unhandled dwarf expression|Error in sourced command file" {
unsupported "$testcase $message"


Re: Fix a bug in ThreadSanitizer pass (issue 5448109)

2011-12-07 Thread Xinliang David Li
ok for google/main.

David

On Wed, Dec 7, 2011 at 3:04 AM,   wrote:
> On 2011/12/05 17:05:17, dvyukov wrote:
>>
>> This is for google-main branch.
>> Fix taking address of SSA_NAME in ThreadSanitizer pass.
>
>
>> Index: gcc/tree-tsan.c
>> ===
>> --- gcc/tree-tsan.c     (revision 182014)
>> +++ gcc/tree-tsan.c     (working copy)
>> @@ -726,13 +726,20 @@
>>    struct mop_desc mop;
>>    unsigned fld_off;
>>    unsigned fld_size;
>> +  tree base;
>
>
>> +
>> +  base = get_base_address (expr);
>> +  if (base == NULL_TREE
>> +      || TREE_CODE (base) == SSA_NAME
>> +      || TREE_CODE (base) == STRING_CST)
>> +    return;
>> +
>>    tcode = TREE_CODE (expr);
>
>
>>    /* Below are things we do not instrument
>>       (no possibility of races or not implemented yet).  */
>>    if ((func_ignore & (tsan_ignore_mop | tsan_ignore_rec))
>> -      || get_base_address (expr) == NULL
>>        /* Compiler-emitted artificial variables.  */
>>        || (DECL_P (expr) && DECL_ARTIFICIAL (expr))
>>        /* The var does not live in memory -> no possibility of races.
>
> */
>>
>> Index: gcc/ChangeLog.google-main
>> ===
>> --- gcc/ChangeLog.google-main   (revision 182014)
>> +++ gcc/ChangeLog.google-main   (working copy)
>> @@ -1,3 +1,8 @@
>> +2011-12-05   Dmitriy Vyukov  
>>
>> +
>> +       Fix taking address of SSA_NAME in ThreadSanitizer pass.
>> +        * gcc/tree-tsan.c (handle_expr): Add the additional check.
>> +
>>  2011-11-30   Dmitriy Vyukov  
>
>
>>        Add directives to run ThreadSanitizer tests
>
>
>> --
>> This patch is available for review at
>
> http://codereview.appspot.com/5448109
>
> Sorry, I forgot to provide details about the crash.
>
> The compiler crashes on the following code (dump just before tsan pass):
>
> store_param_double (struct NET * net, struct MYSQL_BIND * param)
> {
>  double value;
>  unsigned char * D.26882;
>  long int D.26881;
>  long int D.26879;
>  unsigned char * D.26877;
>  double value.122;
>  void * D.26875;
>
> :
>  D.26875_2 = param_1(D)->buffer;
>  value.122_3 = MEM[(double *)D.26875_2];
>  D.26877_5 = net_4(D)->write_pos;
>  D.26879_7 = VIEW_CONVERT_EXPR doubleget_union>(value.122_3).m[0];
>  MEM[(long int *)D.26877_5] = D.26879_7;
>  D.26877_8 = net_4(D)->write_pos;
>  D.26881_11 = VIEW_CONVERT_EXPR doubleget_union>(value.122_3).m[1];
>  MEM[(long int *)D.26877_8 + 4B] = D.26881_11;
>  D.26877_12 = net_4(D)->write_pos;
>  D.26882_13 = D.26877_12 + 8;
>  net_4(D)->write_pos = D.26882_13;
>  return;
> }
>
> with the following error message:
>
> libmysql.c: In function 'store_param_double':
> libmysql.c:2314:13: error: conversion of an SSA_NAME on the left hand
> side
> VIEW_CONVERT_EXPR(D.29825_23);
> D.29824_24 = &VIEW_CONVERT_EXPR(D.29825_23).m[0];
> libmysql.c:2314:13: internal compiler error: verify_gimple failed
>
> Is it a correct fix for the issue? If base_address refers to SSA_NAME we
> just do not instrument it, right?
>
>
> http://codereview.appspot.com/5448109/


Re: [C++ PATCH] ICE with invalid user-defined literals (PR c++/51420)

2011-12-07 Thread Jason Merrill

Applied, thanks.

Jason


[Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function

2011-12-07 Thread Tobias Burnus
This fixes a -frealloc-lhs regression where the RHS is handled by a 
"conversion function" whose argument has component refs.


Build and regtested on x86-64-linux.
OK for the trunk and 4.7?

Tobias
2011-12-07  Tobias Burnus  

	PR fortran/51448
	* fortran/trans-array.c (get_std_lbound): Fix handling of
	conversion functions.

2011-12-07  Tobias Burnus  

	PR fortran/51448
	* gfortran.dg/realloc_on_assign_8.f90: New.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index ee8f896..2fd5749 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7428,7 +7584,16 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 			  gfc_array_index_type, cond,
 			  lbound, gfc_index_one_node);
 }
-  else if (expr->expr_type == EXPR_VARIABLE)
+
+  if (expr->expr_type == EXPR_FUNCTION)
+{
+  /* A conversion function, so use the argument.  */
+  gcc_assert (expr->value.function.isym
+		  && expr->value.function.isym->conversion);
+  expr = expr->value.function.actual->expr;
+}
+
+  if (expr->expr_type == EXPR_VARIABLE)
 {
   tmp = TREE_TYPE (expr->symtree->n.sym->backend_decl);
   for (ref = expr->ref; ref; ref = ref->next)
@@ -7441,15 +7606,6 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 	}
   return GFC_TYPE_ARRAY_LBOUND(tmp, dim);
 }
-  else if (expr->expr_type == EXPR_FUNCTION)
-{
-  /* A conversion function, so use the argument.  */
-  expr = expr->value.function.actual->expr;
-  if (expr->expr_type != EXPR_VARIABLE)
-	return gfc_index_one_node;
-  desc = TREE_TYPE (expr->symtree->n.sym->backend_decl);
-  return get_std_lbound (expr, desc, dim, assumed_size);
-}
 
   return gfc_index_one_node;
 }
--- /dev/null	2011-12-07 07:00:28.727532040 +0100
+++ gcc/gcc/testsuite/gfortran.dg/realloc_on_assign_8.f90	2011-12-07 16:34:46.0 +0100
@@ -0,0 +1,17 @@
+! { dg-do compile }
+!
+! PR fortran/51448
+!
+! Contribued by François Willot
+!
+  PROGRAM MAIN
+  IMPLICIT NONE
+  TYPE mytype
+REAL b(2)
+  END TYPE mytype
+  TYPE(mytype) a
+  DOUBLE PRECISION, ALLOCATABLE :: x(:)
+  ALLOCATE(x(2))
+  a%b=0.0E0
+  x=a%b
+  END


Re: [patch] ARM: Fix miscompilation in arm.md:*minmax_arithsi. (PR target/51408)

2011-12-07 Thread Richard Earnshaw
On 05/12/11 10:42, Richard Earnshaw wrote:
> On 04/12/11 13:32, kazu_hir...@mentor.com wrote:
>> Hi,
>>
>> Attached is a patch to fix miscompilation in
>> arm.md:*minmax_arithsi.
>>
>> The following testcase, reduced from efgcvt_r.c:fcvt_r in glibc, gets
>> miscompiled:
>>
>> extern void abort (void);
>>
>> int __attribute__((noinline))
>> foo (int a, int b)
>> {
>>   int max = (b > 0) ? b : 0;
>>   return max - a;
>> }
>>
>> int
>> main (void)
>> {
>>   if (foo (3, -1) != -3)
>> abort ();
>>   return 0;
>> }
>>
>> arm-none-eabi-gcc -O1 generates:
>>
>> foo:
>>  @ Function supports interworking.
>>  @ args = 0, pretend = 0, frame = 0
>>  @ frame_needed = 0, uses_anonymous_args = 0
>>  @ link register save eliminated.
>>  cmp r1, #0
>>  rsbge   r0, r0, r1
>>  bx  lr
>>
>> This would be equivalent to:
>>
>>   return b >= 0 ? b - a : a;
>>
>> which is different from:
>>
>>   return b >= 0 ? b - a : -a;
>>
>> That is, in assembly code, we should have an "else" clause like so:
>>
>>  cmp r1, #0
>>  rsbge   r0, r0, r1  <- then clause
>>  rsblt   r0, r0, #0  <- else clause
>>  bx  lr
>>
>> The problem comes from the fact that arm.md:*minmax_arithsi does not
>> add the "else" clause even though MINUS is not commutative.
>>
>> The patch fixes the problem by always requiring the "else" clause in
>> the MINUS case.
>>
>> Tested by running gcc testsuite on various ARM subarchitectures.  OK
>> to apply?
>>
>> Kazu Hirata
>>
>> gcc/
>> 2011-12-04  Kazu Hirata  
>>
>>  PR target/51408
>>  * config/arm/arm.md (*minmax_arithsi): Always require the else
>>  clause in the MINUS case.
>>
>> gcc/testsuite/
>> 2011-12-04  Kazu Hirata  
>>
>>  PR target/51408
>>  * gcc.dg/pr51408.c: New.
>>
> 
> OK.
> 
> R.

BTW, I would expect this to also exist in all the release branches.  Could you 
back-port it where
needed please.

TIA,

R.



Re: Tidy up MD_INCLUDES in config/arm/t-arm

2011-12-07 Thread Richard Earnshaw
On 01/12/11 11:09, Georg-Johann Lay wrote:
> Richard Earnshaw wrote:
>> On 29/11/11 09:42, Matthew Gretton-Dann wrote:
>>> All,
>>>
>>> Whilst developing the Cortex-A15 integer pipeline patch it was noted 
>>> that the MD_INCLUDES variable in config/arm/t-arm has not been kept 
>>> up-to-date.
>>>
>>> The attached patch fixes this, and rearranges the list of md files into 
>>> alphabetical order.
>>>
>>> The list was generated using `ls -1 *.md | grep -v arm\\.md`.
>>>
>>> Tested by doing a arm-none-eabi build.
>>>
>>> Can someone please review, and if appropriate apply?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> gcc/ChangeLog:
>>> 2011-11-29  Matthew Gretton-Dann  
>>>
>>> * config/arm/t-arm (MD_INCLUDES): Ensure all md files are
>>> listed.
>>>
>>
>> OK.
>>
>> R.
> 
> Is each entry mandatory in that list?
> 
> I thought gen-tools already arrange for great part of MD_INCLUDES?
> 
> For example, after adding (include "avr-dimode.md") to avr.md, ./gcc/mddeps.mk
> reads:
> 
> MD_INCLUDES = \
>   ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md \
>   ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md \
>   ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md
> 
> ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md:
> 
> ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md:
> 
> ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md:
> 
> so that maintaining such a list might be considerable easier.
> 

Indeed.  I hadn't realised that that feature had been added since I first added 
that macro.  It does
indeed look as though the whole definition of MD_INCLUDES is now redundant.

R.




Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 04:16:25PM +, Joseph S. Myers wrote:
> On Wed, 7 Dec 2011, Richard Guenther wrote:
> 
> > code - well, that's ok.  Pointing people to the latest official
> > release series (in this case 4.6.x) is also ok, we're keeping too
> > many branches active IMNSHO.
> 
> As I recall we agreed in London that both 4.3 and 4.4 should be closed 
> (after a final release from each branch) - and in general two active 
> release branches was enough - but so far have only closed 4.3.
> 
> (Closed = no datestamp updates, no snapshots, no releases, no tracking 
> regression status in Bugzilla, not absolutely disallowing commits if 
> someone feels it's still useful to put some fix on an old branch.)

I'll handle 4.4.7 and closing the branch in the coming weeks.

Jakub


Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Richard Earnshaw
On 06/12/11 17:59, Andrew Stubbs wrote:
> This patch adds a one's complement pattern for doing DImode 'not' in 
> NEON registers.
> 
> There are already patterns for doing one's complement of vectors, and 
> even though it boils down to the same instruction, the DImode case was 
> missing.
> 
> The patch needs to be a little more complicated than using a mode 
> iterator that includes DI because it needs to coexist with the non-neon 
> one_cmpldi2 (renamed by this patch to "one_cmpldi2_core").
> 
> OK for when stage 1 opens again?
> 
> Andrew
> 
> 
> neon-not.patch
> 
> 
> 2011-12-06  Andrew Stubbs  
> 
>   gcc/
>   * config/arm/arm.md (one_cmpldi2): Rename to ...
>   (one_cmpldi2_core): ... this, and modify it to prevent it being
>   used for NEON.
>   (one_cmpldi2): New define_expand.
>   * config/arm/neon.md (one_cmpldi2_neon): New define_insn.


> +(define_insn_and_split "*one_cmpldi2_core"
> +  [(set (match_operand:DI 0 "arm_general_register_operand" "=&r,&r")
> + (not:DI (match_operand:DI 1 "arm_general_register_operand" "0,r")))]

Thinking about it, for an operation with one input and one output, there's no 
need for the
earlyclobber marker when the input is tied to the output (there's no other 
operand that can be
clobbered).

Otherwise OK.

R.



Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo
On Wed, Dec 7, 2011 at 11:16, Richard Guenther  wrote:

> I'm going to apply it tomorrow, when full testing hopefully finished

Sure.  But remember the zombie kitties!


Diego.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

> On 12/07/11 10:54, Richard Guenther wrote:
> > On Wed, 7 Dec 2011, Diego Novillo wrote:
> > 
> > > On 12/07/11 10:46, Richard Guenther wrote:
> > > > On Wed, 7 Dec 2011, Diego Novillo wrote:
> > > > 
> > > > > On 12/07/11 09:52, Richard Guenther wrote:
> > > > > 
> > > > > > Index: gcc/lto-streamer-out.c
> > > > > > ===
> > > > > > *** gcc/lto-streamer-out.c  (revision 182081)
> > > > > > --- gcc/lto-streamer-out.c  (working copy)
> > > > > > *** tree_is_indexable (tree t)
> > > > > > *** 129,134 
> > > > > > --- 129,144 
> > > > > >else if (TREE_CODE (t) == VAR_DECL&&decl_function_context
> > > > > > (t)
> > > > > > &&!TREE_STATIC (t))
> > > > > >  return false;
> > > > > > +   /* If this is a decl generated for block local externs for
> > > > > > +  debug info generation, stream it unshared alongside
> > > > > > BLOCK_VARS.
> > > > > > */
> > > > > > +   else if (VAR_OR_FUNCTION_DECL_P (t)
> > > > > > +  /* ???  The following tests are a literal match on what
> > > > > > + c-decl.c:pop_scope does.  */
> > > > > 
> > > > > Factor it into a common routine then?
> > > > 
> > > > pop_scope of course _sets_ the values that way, it doesn't test them.
> > > 
> > > Yes.  I meant factoring, so future users have something to call, instead
> > > of
> > > three seemingly random flag checks.  pop_scope could also be calling some
> > > complementary setter instead of doing the seemingly random flag setting.
> > 
> > I don't see a good way to factor out the flags setting.  Factoring out
> > the copying maybe.  But well... we can do that when a 2nd user
> > comes along?
> 
> The problem is that the 2nd user will cut-n-paste from this one. However, if
> you find adding a little function too strenuous, I guess it's not too big a
> deal.

Testing this kind of patches turns out to be quite time-consuming
(I do a LTO bootstrap and two SPEC 2k6 builds (-g and -g0)), so
yeah ...

The following is what I ended up LTO bootstrapping (finished ok),
testing is still in progress, as is SPEC 2k6 build.

It fixes PR49945 in a similar way, by streaming VLA types locally as well.

I'm going to apply it tomorrow, when full testing hopefully finished

Thanks,
Richard.

2011-12-08  Richard Guenther  

PR lto/48437
PR lto/49945
* lto-streamer-out.c (tree_is_indexable): Exclude block-local
extern declarations.

PR lto/48437
* gcc.dg/lto/20111207-2_0.c: New testcase.
* gcc.dg/guality/pr48437.c: Likewise.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,147 
else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
   && !TREE_STATIC (t))
  return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */
+  && TREE_PUBLIC (t)
+  && DECL_EXTERNAL (t)
+  && DECL_CONTEXT (t)
+  && TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL)
+ return false;
+   else if (TYPE_P (t)
+  && variably_modified_type_p (t, NULL_TREE))
+ return false;
else
  return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
  }
Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c
===
*** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
--- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
***
*** 0 
--- 1,17 
+ /* { dg-lto-do run } */
+ 
+ int
+ test (void)
+ {
+   int f (void);
+   return 0;
+ }
+ 
+ int
+ main (void)
+ {
+   int f (void);
+   int test (void);
+ 
+   return test ();
+ }
Index: gcc/testsuite/gcc.dg/guality/pr48437.c
===
*** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
--- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
***
*** 0 
--- 1,17 
+ /* PR lto/48437 */
+ /* { dg-do run } */
+ /* { dg-options "-g" } */
+ 
+ #include "../nop.h"
+ 
+ int i __attribute__((used));
+ int main()
+ {
+   volatile int i;
+   for (i = 3; i < 7; ++i)
+ {
+   extern int i;
+   asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test 14 "i" "0" 
} } */
+ }
+   return 0;
+ }


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Joseph S. Myers
On Wed, 7 Dec 2011, Richard Guenther wrote:

> code - well, that's ok.  Pointing people to the latest official
> release series (in this case 4.6.x) is also ok, we're keeping too
> many branches active IMNSHO.

As I recall we agreed in London that both 4.3 and 4.4 should be closed 
(after a final release from each branch) - and in general two active 
release branches was enough - but so far have only closed 4.3.

(Closed = no datestamp updates, no snapshots, no releases, no tracking 
regression status in Bugzilla, not absolutely disallowing commits if 
someone feels it's still useful to put some fix on an old branch.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 10:54, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 10:46, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
   else if (TREE_CODE (t) == VAR_DECL&&decl_function_context (t)
&&!TREE_STATIC (t))
 return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.
*/
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?


pop_scope of course _sets_ the values that way, it doesn't test them.


Yes.  I meant factoring, so future users have something to call, instead of
three seemingly random flag checks.  pop_scope could also be calling some
complementary setter instead of doing the seemingly random flag setting.


I don't see a good way to factor out the flags setting.  Factoring out
the copying maybe.  But well... we can do that when a 2nd user
comes along?


The problem is that the 2nd user will cut-n-paste from this one. 
However, if you find adding a little function too strenuous, I guess 
it's not too big a deal.



Diego.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

> On 12/07/11 10:46, Richard Guenther wrote:
> > On Wed, 7 Dec 2011, Diego Novillo wrote:
> > 
> > > On 12/07/11 09:52, Richard Guenther wrote:
> > > 
> > > > Index: gcc/lto-streamer-out.c
> > > > ===
> > > > *** gcc/lto-streamer-out.c  (revision 182081)
> > > > --- gcc/lto-streamer-out.c  (working copy)
> > > > *** tree_is_indexable (tree t)
> > > > *** 129,134 
> > > > --- 129,144 
> > > >   else if (TREE_CODE (t) == VAR_DECL&&   decl_function_context (t)
> > > > &&   !TREE_STATIC (t))
> > > > return false;
> > > > +   /* If this is a decl generated for block local externs for
> > > > +  debug info generation, stream it unshared alongside BLOCK_VARS.
> > > > */
> > > > +   else if (VAR_OR_FUNCTION_DECL_P (t)
> > > > +  /* ???  The following tests are a literal match on what
> > > > + c-decl.c:pop_scope does.  */
> > > 
> > > Factor it into a common routine then?
> > 
> > pop_scope of course _sets_ the values that way, it doesn't test them.
> 
> Yes.  I meant factoring, so future users have something to call, instead of
> three seemingly random flag checks.  pop_scope could also be calling some
> complementary setter instead of doing the seemingly random flag setting.

I don't see a good way to factor out the flags setting.  Factoring out
the copying maybe.  But well... we can do that when a 2nd user
comes along?

Richard.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 10:46, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
  else if (TREE_CODE (t) == VAR_DECL&&   decl_function_context (t)
&&   !TREE_STATIC (t))
return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?


pop_scope of course _sets_ the values that way, it doesn't test them.


Yes.  I meant factoring, so future users have something to call, instead 
of three seemingly random flag checks.  pop_scope could also be calling 
some complementary setter instead of doing the seemingly random flag 
setting.



Diego.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

> On 12/07/11 09:52, Richard Guenther wrote:
> 
> > Index: gcc/lto-streamer-out.c
> > ===
> > *** gcc/lto-streamer-out.c  (revision 182081)
> > --- gcc/lto-streamer-out.c  (working copy)
> > *** tree_is_indexable (tree t)
> > *** 129,134 
> > --- 129,144 
> >  else if (TREE_CODE (t) == VAR_DECL&&  decl_function_context (t)
> > &&  !TREE_STATIC (t))
> >return false;
> > +   /* If this is a decl generated for block local externs for
> > +  debug info generation, stream it unshared alongside BLOCK_VARS.  */
> > +   else if (VAR_OR_FUNCTION_DECL_P (t)
> > +  /* ???  The following tests are a literal match on what
> > + c-decl.c:pop_scope does.  */
> 
> Factor it into a common routine then?

pop_scope of course _sets_ the values that way, it doesn't test them.

Richard.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
 else if (TREE_CODE (t) == VAR_DECL&&  decl_function_context (t)
&&  !TREE_STATIC (t))
   return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?

Diego.


[SMS, DDG] Correctly delete anti-dep edges for renaming

2011-12-07 Thread Roman Zhuykov
This letter contains second separate patch for ddg.c
It prevents removing some edges in data dependency graph
when renaming is allowed.

2011/10/12 Ayal Zaks :
>>> The other bug happens only with -fmodulo-sched-allow-regmoves.  Here we
>>> eliminate some anti-dependence edges in data dependency graph in order to
>>> resolve them later by adding some register moves (renaming instructions).  
>>> But
>>> in situation as in example above compiler gives an ICE because it can't 
>>> create
>>> a register move, when regR is hardware flag register.  So we have to know 
>>> which
>>> register(s) cause anti-dependency in order to understand whether we can 
>>> ignore
>>> it.  I can't find any easy way to gather this information, so I create my 
>>> own
>>> structures to store this info and had implemented my own hooks for
>>> sched_analyze function.  This leads to more complex interconnection between
>>> ddg.c and modulo-sched.c.
>
> Well, having ddg.c's/create_ddg_dep_from_intra_loop_link() consult
> "Register sets from modulo scheduler structures" to determine if an
> anti-dep can be eliminated, seems awkward. One should be able to build
> a ddg independent of any modulo scheduler structures.

I think now I found a proper way to gather information about
the register(s) causing anti-dependency (see the patch attached).

> One simple solution is to keep all anti-dep edges of registers that
> are clobbered anywhere in the loop. This might be overly conservative,
> but perhaps not so if "On x86_64 it is clobbered by most of arithmetic
> instructions".
> If an anti-dep between a use and a dep of a clobber register is to be
> removed, a dependence should be created between the use and a
> clobbering instruction following the def. Hope this is clear.
>
> This too should be solved by a dedicated (separate) patch, for easier 
> digestion.
>
> Presumably, the ddg already has all intra-iteration edges preventing
> motion of clobbering instructions within an iteration, and we are only
> missing inter-iteration deps or deps surfaced by eliminating
> anti-deps, right?
>
> You might consider introducing a new type of dependence for such
> edges, "Clobber", if it would help.
>

Ayal, I suppose here we have some misunderstanding.
This is not directly related to the problem of clobbers discussed previously.
Here I talk about the need to find out the register causing the
anti-dependence (for instance, we need to check whether it can be renamed).
Currently, SMS simply attempts to take the RHS of the second instruction (via
single_set()), and if it's a register, SMS assumes it's a register causing the
dependency. This breaks in a following scenario:

insn1: ... (read flags)
insn2: regA = regB - regC; update flags

Here, SMS would wrongly take regA as the dependency register, while it was in
fact the flags register. So, I rewrote this part in a different way and make
a separate patch.

--
Roman Zhuykov
zhr...@ispras.ru
2011-12-07  Roman Zhuykov  
	* ddg.c (create_ddg_dep_from_intra_loop_link): Correclty determine
	which register causes anti-dependency.  Prevent removing anti-dep
	edge from ddg when hard_register_p.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..0a3726f 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -204,23 +204,34 @@ create_ddg_dep_from_intra_loop_link (ddg_ptr g, ddg_node_ptr src_node,
   && (t == ANTI_DEP && dt == REG_DEP)
   && !autoinc_var_is_used_p (dest_node->insn, src_node->insn))
 {
-  rtx set;
+  df_ref *def_rec;
+  bool can_delete_dep = true;
 
-  set = single_set (dest_node->insn);
-  /* TODO: Handle registers that REG_P is not true for them, i.e.
+  /* TODO: Handle !REG_P register correctly, i.e.
  subregs and special registers.  */
-  if (set && REG_P (SET_DEST (set)))
-{
-  int regno = REGNO (SET_DEST (set));
-  df_ref first_def;
-  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
+  for (def_rec = DF_INSN_DEFS (dest_node->insn); *def_rec; def_rec++)
+	{
+	  df_ref first_def, def = *def_rec, *use_rec;
+	  unsigned regno = DF_REF_REGNO (def);
+	  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
+	  bool src_uses = false;
 
-  first_def = df_bb_regno_first_def_find (g->bb, regno);
-  gcc_assert (first_def);
+	  for (use_rec = DF_INSN_USES (src_node->insn); *use_rec; use_rec++)
+	if (regno == DF_REF_REGNO (*use_rec))
+	  src_uses = true;
 
-  if (bitmap_bit_p (&bb_info->gen, DF_REF_ID (first_def)))
-return;
-}
+	  if (!src_uses)
+	continue;
+
+	  first_def = df_bb_regno_first_def_find (g->bb, regno);
+	  gcc_assert (first_def);
+
+	  if (HARD_REGISTER_NUM_P (regno)
+	  || !bitmap_bit_p (&bb_info->gen, DF_REF_ID (first_def)))
+	can_delete_dep = false;
+	}
+  if (can_delete_dep)
+	return;
 }
 
latency = dep_cost (link);


[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mails.

This letter contains the first separate patch for ddg.c
It creates additional nessesary anti-dep edges in data dependency graph.

2011/10/12 Ayal Zaks :
>>> The following situation happens when SMS is enabled without register 
>>> renaming
>>> (-fno-modulo-sched-allow-regmoves).  When data dependency graph is built, 
>>> there
>>> is a step when we generate anti-dependencies from last register use to first
>>> write of this register at the next iteration.
>
> is a step when we generate anti-dependencies from all last registers
> uses (i.e., of last register def) to first write of this register at
> the next iteration.
>

Right.

>>> At this moment we should also
>>> create such dependencies to all instructions which clobber the register to
>>> prevent this clobbers being before last use is new schedule.
>>>
>
> well, we simply need to connect these last uses to either the first
> write *or* the first clobber of this register at the next iteration,
> according to whichever is first, no?

No, is works now just how you describe. Clobber is considered as a write,
and last uses are connected with first write or first clobber.
But this is not sufficient: similarly to how there's no dependencies
between last uses,
there shall be no dependency between "first" clobbers (i.e. clobbers
of a register
can be permuted freely). So at least we have to make an additional dependency
edge to each clobber before first write.

>>> Here is an model of example:
>>>
>>> loop {
>>> set1 regR
>>> use1 regR
>>> clobber regR
>>> set2 regR
>>> use2 regR
>>> }
>>>
>>> If we create only use2->set1 anti-dependency (and no use2->cloober) the
>>> following wrong schedule is possible:
>>>
>>> prologue {
>>> set1 regR
>>> use1 regR
>>> clobber regR
>>> }
>>> kernel {
>>> set2 regR
>>> clobber regR (instruction from next iteration in terms of old loop kernel)
>>> use2 regR
>>> set1 regR (also from next iteration)
>>> use1 regR (also from next iteration)
>>> }
>>> epilogue {
>>> set2 regR
>>> use2 regR
>>> }
>>>
>
> strange; according to prolog (and epilog), clobber should appear after
> use1 in the kernel, no? Aren't there (intra-iteration) dependencies
> preventing the clobber to skip over use1 and/or set1?
>

Yeah, this is bad example - I wrongly suggested that there is no
anti-dep between use1 and clobber.

New example:
loop {
 clobber1
 clobber2
 set
 use
}
When there is no "use->clobber2" anti-dep - the following wrong schedule
is possible. Clobber2 is on stage0, other instructions are on stage1.
prologue {
 clobber2
}
kernel {
 clobber1
 set
 clobber2
 use
}
epilogue {
 clobber1
 set
 use
}

>>> This problem was succesfully fixed by creating a vector of all clobbering
>>> instructions together with first write and adding all needed dependencies.
>>>
>
> seems like an overkill to me; we didn't draw an edge between every
> last use and every write, because writes are kept in order by having
> output dependences between them. So should be the case with clobbers.

Clobbers themselves aren't kept in order - there are no output dependences
between them. They may be scheduled in any order.

> Presumably, the ddg already has all intra-iteration edges preventing
> motion of clobbering instructions within an iteration, and we are only
> missing inter-iteration deps or deps surfaced by eliminating
> anti-deps, right?

I think the new version of a fix suits this reasoning.
Now it creates dependencies only to clobber instructions before first write.
And certainly to first write instruction itself.

--
Roman Zhuykov
zhr...@ispras.ru
2011-12-07  Roman Zhuykov  
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
 #endif
-  df_ref first_def = df_bb_reg

Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jason Merrill

On 12/07/2011 10:18 AM, Jakub Jelinek wrote:

On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote:

On 12/06/2011 02:48 PM, Jakub Jelinek wrote:

-  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
+  gcc_assert (!processing_template_decl
+ || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));


How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.


That would be readable, but expensive - SCALAR_TYPE_P evaluates
its argument 11 times.


Good point.  Your patch is OK, then.

Jasno



Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote:
> On 12/06/2011 02:48 PM, Jakub Jelinek wrote:
> >-  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
> >+  gcc_assert (!processing_template_decl
> >+  || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));
> 
> How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.

That would be readable, but expensive - SCALAR_TYPE_P evaluates
its argument 11 times.

Jakub


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Jakub Jelinek wrote:

> On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote:
> > *** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
> > --- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
> > ***
> > *** 0 
> > --- 1,15 
> > + /* PR lto/48437 */
> > + /* { dg-do run } */
> > + /* { dg-options "-g" } */
> > + 
> > + int i __attribute__((used));
> > + int main()
> > + {
> > +   volatile int i;
> > +   for (i = 3; i < 7; ++i)
> > + {
> > +   extern int i; /* { dg-final { gdb-test 7 "i" "0" } } */
> > +   asm volatile ("" : : : "memory");
> 
> Do you really want to test it on line 7 ({ of after main())?

Oops - so much for adding the /* dg */ stuff late ...

> I'd expect you want to look at it on the line where asm volatile is,
> and make sure you can put a breakpoint on it:
> #include "../nop.h"
> early and use NOP instead of "" as the asm pattern.

Changed.

Thanks,
Richard.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote:
> *** gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0)
> --- gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0)
> ***
> *** 0 
> --- 1,15 
> + /* PR lto/48437 */
> + /* { dg-do run } */
> + /* { dg-options "-g" } */
> + 
> + int i __attribute__((used));
> + int main()
> + {
> +   volatile int i;
> +   for (i = 3; i < 7; ++i)
> + {
> +   extern int i; /* { dg-final { gdb-test 7 "i" "0" } } */
> +   asm volatile ("" : : : "memory");

Do you really want to test it on line 7 ({ of after main())?
I'd expect you want to look at it on the line where asm volatile is,
and make sure you can put a breakpoint on it:
#include "../nop.h"
early and use NOP instead of "" as the asm pattern.

Jakub


Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jason Merrill

On 12/06/2011 02:48 PM, Jakub Jelinek wrote:

-  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
+  gcc_assert (!processing_template_decl
+ || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));


How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.

Jason



[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mail.

This letter contains the first separate patch for ddg.cIt creates
additional nessesary anti-dep edges in data dependency graph.
2011/10/12 Ayal Zaks :>>> The following situation
happens when SMS is enabled without register renaming>>>
(-fno-modulo-sched-allow-regmoves).  When data dependency graph is
built, there>>> is a step when we generate anti-dependencies from last
register use to first>>> write of this register at the next
iteration.>> is a step when we generate anti-dependencies from all
last registers> uses (i.e., of last register def) to first write of
this register at> the next iteration.>
Right.
>>> At this moment we should also>>> create such dependencies to all 
>>> instructions which clobber the register to>>> prevent this clobbers being 
>>> before last use is new schedule.> well, we simply need to connect these 
>>> last uses to either the first> write *or* the first clobber of this 
>>> register at the next iteration,> according to whichever is first, no?
No, is works now just how you describe. Clobber is considered as a
write,and last uses are connected with first write or first
clobber.But this is not sufficient: similarly to how there's no
dependenciesbetween last uses,there shall be no dependency between
"first" clobbers (i.e. clobbersof a registercan be permuted freely).
So at least we have to make an additional dependencyedge to each
clobber before first write.
>>> Here is an model of example:>> loop {>>> set1 regR>>> use1 regR>>> 
>>> clobber regR>>> set2 regR>>> use2 regR>>> }>> If we create only 
>>> use2->set1 anti-dependency (and no use2->cloober) the>>> following wrong 
>>> schedule is possible:>> prologue {>>> set1 regR>>> use1 regR>>> clobber 
>>> regR>>> }>>> kernel {>>> set2 regR>>> clobber regR (instruction from next 
>>> iteration in terms of old loop kernel)>>> use2 regR>>> set1 regR (also from 
>>> next iteration)>>> use1 regR (also from next iteration)>>> }>>> epilogue 
>>> {>>> set2 regR>>> use2 regR>>> }> strange; according to prolog (and 
>>> epilog), clobber should appear after> use1 in the kernel, no? Aren't there 
>>> (intra-iteration) dependencies> preventing the clobber to skip over use1 
>>> and/or set1?>
Yeah, this is bad example - I wrongly suggested that there is
noanti-dep between use1 and clobber.
New example:loop { clobber1 clobber2 set use}When there is no
"use->clobber2" anti-dep - the following wrong scheduleis possible.
Clobber2 is on stage0, other instructions are on stage1.prologue {
clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set
use}
>>> This problem was succesfully fixed by creating a vector of all 
>>> clobbering>>> instructions together with first write and adding all needed 
>>> dependencies.> seems like an overkill to me; we didn't draw an edge 
>>> between every> last use and every write, because writes are kept in order 
>>> by having> output dependences between them. So should be the case with 
>>> clobbers.
Clobbers themselves aren't kept in order - there are no output
dependencesbetween them. They may be scheduled in any order.
> Presumably, the ddg already has all intra-iteration edges preventing> motion 
> of clobbering instructions within an iteration, and we are only> missing 
> inter-iteration deps or deps surfaced by eliminating> anti-deps, right?
I think the new version of a fix suits this reasoning.Now it creates
dependencies only to clobber instructions before first write.And
certainly to first write instruction itself.
--Roman zhuykovzhr...@ispras.ru
2011-12-07  Roman Zhuykov  
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
 #endif
-  df_ref first_def = df_bb_regno_first_def_find

Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 3:32 PM, Eric Botcazou  wrote:
>> You are basically (but non-optimally) locally re-implementing
>> what expr.c:get_object_or_type_alignment does, for the
>> bare MEM_REF case (you don't consider offsets that
>> do not change the alignment, nor alignment information
>> present on the SSA name).
>
> Adjusted version attached, regression-tested on SPARC/Solaris.  I'll do a full
> testing cycle if it is OK for mainline.

Yes, the patch is ok for mainline.

> get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders 
> for
> MEM_REF and TARGET_MEM_REF do:
>
>  MAX (TYPE_ALIGN (TREE_TYPE (exp)),
>       get_object_alignment (exp, BIGGEST_ALIGNMENT)))
>
> instead, so I presume I should do the same for them in tree_non_aligned_mem_p?

Yeah, you can do that.

Thanks,
Richard.

>
> 2011-12-07  Eric Botcazou  
>
>        PR tree-optimization/51315
>        * tree.h (get_object_or_type_alignment): Declare.
>        * expr.c (get_object_or_type_alignment): Move to...
>        * builtins.c (get_object_or_type_alignment): ...here.  Add assertion.
>        * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to...
>        (tree_non_aligned_mem_p): ...this.  Add ALIGN parameter.  Look into
>        MEM_REFs and use get_object_or_type_alignment for them.
>        (build_accesses_from_assign): Adjust for above change.
>        (access_precludes_ipa_sra_p): Likewise.
>
>
>
> --
> Eric Botcazou


[PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther

I'm testing the following patch for PR48437 - the C frontend generates
decl copies for placing them in BLOCK_VARS for

int
test (void)
{
  extern int f (void);
  return 0;
}

we currently end up putting those into the decl merging machinery,
causing DECL_CHAIN re-use and subsequent crashes.  By not indexing
them we make sure to keep them separate.

LTO bootstrap & regtest pending on x86_64-unknown-linux-gnu, ok?

[Similar fix has to be employed for VLA types]

Thanks,
Richard.

2011-12-07  Richard Guenther  

PR lto/48437
* lto-streamer-out.c (tree_is_indexable): Exclude block-local
extern declarations.

* gcc.dg/lto/20111207-2_0.c: New testcase.
* gcc.dg/guality/pr48437.c: Likewise.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
else if (TREE_CODE (t) == VAR_DECL && decl_function_context (t)
   && !TREE_STATIC (t))
  return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */
+  && TREE_PUBLIC (t)
+  && DECL_EXTERNAL (t)
+  && DECL_CONTEXT (t)
+  && TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL)
+ return false;
else
  return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
  }
Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c
===
*** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
--- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
***
*** 0 
--- 1,17 
+ /* { dg-lto-do run } */
+ 
+ int
+ test (void)
+ {
+   int f (void);
+   return 0;
+ }
+ 
+ int
+ main (void)
+ {
+   int f (void);
+   int test (void);
+ 
+   return test ();
+ }
Index: gcc/testsuite/gcc.dg/guality/pr48437.c
===
*** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
--- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
***
*** 0 
--- 1,15 
+ /* PR lto/48437 */
+ /* { dg-do run } */
+ /* { dg-options "-g" } */
+ 
+ int i __attribute__((used));
+ int main()
+ {
+   volatile int i;
+   for (i = 3; i < 7; ++i)
+ {
+   extern int i; /* { dg-final { gdb-test 7 "i" "0" } } */
+   asm volatile ("" : : : "memory");
+ }
+   return 0;
+ }


[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
This letter contains the first separate patch for ddg.cIt creates
additional nessesary anti-dep edges in data dependency graph.
2011/10/12 Ayal Zaks :>>> The following situation
happens when SMS is enabled without register renaming>>>
(-fno-modulo-sched-allow-regmoves).  When data dependency graph is
built, there>>> is a step when we generate anti-dependencies from last
register use to first>>> write of this register at the next
iteration.>> is a step when we generate anti-dependencies from all
last registers> uses (i.e., of last register def) to first write of
this register at> the next iteration.>
Right.
>>> At this moment we should also>>> create such dependencies to all 
>>> instructions which clobber the register to>>> prevent this clobbers being 
>>> before last use is new schedule.> well, we simply need to connect these 
>>> last uses to either the first> write *or* the first clobber of this 
>>> register at the next iteration,> according to whichever is first, no?
No, is works now just how you describe. Clobber is considered as a
write,and last uses are connected with first write or first
clobber.But this is not sufficient: similarly to how there's no
dependenciesbetween last uses,there shall be no dependency between
"first" clobbers (i.e. clobbersof a registercan be permuted freely).
So at least we have to make an additional dependencyedge to each
clobber before first write.
>>> Here is an model of example:>> loop {>>> set1 regR>>> use1 regR>>> 
>>> clobber regR>>> set2 regR>>> use2 regR>>> }>> If we create only 
>>> use2->set1 anti-dependency (and no use2->cloober) the>>> following wrong 
>>> schedule is possible:>> prologue {>>> set1 regR>>> use1 regR>>> clobber 
>>> regR>>> }>>> kernel {>>> set2 regR>>> clobber regR (instruction from next 
>>> iteration in terms of old loop kernel)>>> use2 regR>>> set1 regR (also from 
>>> next iteration)>>> use1 regR (also from next iteration)>>> }>>> epilogue 
>>> {>>> set2 regR>>> use2 regR>>> }> strange; according to prolog (and 
>>> epilog), clobber should appear after> use1 in the kernel, no? Aren't there 
>>> (intra-iteration) dependencies> preventing the clobber to skip over use1 
>>> and/or set1?>
Yeah, this is bad example - I wrongly suggested that there is
noanti-dep between use1 and clobber.
New example:loop { clobber1 clobber2 set use}When there is no
"use->clobber2" anti-dep - the following wrong scheduleis possible.
Clobber2 is on stage0, other instructions are on stage1.prologue {
clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set
use}
>>> This problem was succesfully fixed by creating a vector of all 
>>> clobbering>>> instructions together with first write and adding all needed 
>>> dependencies.> seems like an overkill to me; we didn't draw an edge 
>>> between every> last use and every write, because writes are kept in order 
>>> by having> output dependences between them. So should be the case with 
>>> clobbers.
Clobbers themselves aren't kept in order - there are no output
dependencesbetween them. They may be scheduled in any order.
> Presumably, the ddg already has all intra-iteration edges preventing> motion 
> of clobbering instructions within an iteration, and we are only> missing 
> inter-iteration deps or deps surfaced by eliminating> anti-deps, right?
I think the new version of a fix suits this reasoning.Now it creates
dependencies only to clobber instructions before first write.And
certainly to first write instruction itself.
--Roman zhuykovzhr...@ispras.ru
2011-12-07  Roman Zhuykov  
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
 #endif
-  df_ref first_def = df_bb_regno_first_def_find (g->bb, regno);
+  static VEC(df_ref,heap) *c

Re: [SMS] Support new loop pattern

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mail.

2011/10/12 Ayal Zaks :
>>> - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF 
>>> is
>
> you'd probably want to bump to next instruction if falling through,
> e.g., pc=(regF!=0)?label:pc+4
>

It is considered that program counter is increased automatically on
hardware level.
Otherwise we should add something like "pc=pc+4" in parallel to each
instruction in RTL.

>>>  flag register;
>>> - the last instruction which sets regF should be: regF=COMPARE(regC,X), 
>>> where X
>>>  is a constant, or maybe a register, which is not changed inside a loop;
>>> - only one instruction modifies regC inside a loop (other can use regC, but 
>>> not
>>>  write), and it should simply adjust it by a constant: regC=regC+step, where
>>>  step is a constant.
>>
>>> When doloop is succesfully scheduled by SMS, its number of
>>> iterations of loop kernel should be decreased by the number of stages in a
>>> schedule minus one, while other iterations expand to prologue and epilogue.
>>> In new supported loops such approach can't be used, because some
>>> instructions can use count register (regC).  Instead of this,
>>> the final register value X in compare instruction regF=COMPARE(regC,X)
>>> is changed to another value Y respective to the stage this instruction
>>> is scheduled (Y = X - stage * step).
>
> making sure this does not underflow; i.e., that the number of
> iterations is no less than stage (you've addressed this towards the
> end below).
>

Yes, this situation is processed correctly.

>>
>> The main difference from doloop case is that regC can be used by some
>> instructions in loop body.
>> That's why we are unable to simply adjust regC initial value, but have
>> to keep it's value correct on each particular iteration.
>> So, we change comparison instruction accordingly.
>>
>> An example:
>> int a[100];
>> int main()
>> {
>>  int i;
>>  for (i = 85; i > 12; i -= 5)
>>      a[i] = i * i;
>>  return a[15]-225;
>> }
>> ARM assembler with "-O2 -fno-auto-inc-dec":
>>        ldr     r0, .L5
>>        mov     r3, #85
>>        mov     r2, r0
>> .L2:
>>        mul     r1, r3, r3
>>        sub     r3, r3, #5
>>        cmp     r3, #10
>>        str     r1, [r2, #340]
>>        sub     r2, r2, #20
>>        bne     .L2
>>        ldr     r0, [r0, #60]
>>        sub     r0, r0, #225
>>        bx      lr
>> .L5:
>>        .word   a
>>
>> Loop body is executed 15 times.
>> When compiling with SMS, it finds a schedule with ii=7, stage_count=3
>> and following times:
>> Stage  Time       Insn
>> 0          5      mul     r1, r3, r3
>> 1         10     sub     r3, r3, #5
>> 1         11     cmp     r3, #10
>> 1         11     str     r1, [r2, #340]
>> 1         13     bne     .L2
>> 2         16     sub     r2, r2, #20
>>
>
> branch is not scheduled last?
>

Yes, branch schedule time is smaller then sub's one.
This mean that "sub r2, r2, $20" instruction from original iteration
number K will be executed after
"bne .L2" from original iteration number K.
But certainly bne remains to be the last instuction in new loop body.
Below you can see how it looks after SMS.

>> To make new schedule correct the loop body
>> should be executed 14 times and we change compare instruction:
>
> the loop itself should execute 13 times.

with i =
85, 80, 75, 70, 65
60, 55, 50, 45, 40
35, 30, 25, 20, 15
this gives total 15 iterations (15 stores to memory).
And new loop body will be executed 13 times (one store goes to
epilogue and one - to prologue).

>> regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
>> In our example regC is r3, X is 10, step = -5, compare instruction
>> is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.
>>
>
> right. In general, if the compare is on stage s (starting from 0), it
> will be executed s times in the epilog, so it should exit the loop
> upon reaching Y = X - s * step.
>
>> So, after SMS it looks like:
>>        ldr     r0, .L5
>>        mov     r3, #85
>>        mov     r2, r0
>> ;;prologue
>>        mul     r1, r3, r3      ;;from stage 0 first iteration
>>        sub     r3, r3, #5      ;;3 insns from stage 1 first iteration
>>        cmp     r3, #10
>>        str     r1, [r2, #340]
>>        mul     r1, r3, r3      ;;from stage 0 second iteration
>> ;;body
>> .L2:
>>        sub     r3, r3, #5
>>        sub     r2, r2, #20
>>        cmp     r3, #15         ;; new value to compare with is Y=15
>>        str     r1, [r2, #340]
>>        mul     r1, r3, r3
>>        bne     .L2
>> ;;epilogue
>>        sub     r2, r2, #20     ;;from stage 2 pre-last iteration
>>        sub     r3, r3, #5      ;;3 insns from stage 1 last iteration
>>        cmp     r3, #10
>>        str     r1, [r2, #340]
>>        sub     r2, r2, #20     ;;from stage 2 last iteration
>>
>>        ldr     r0, [r0, #60]
>>        sub     r0, r0, #225
>>        bx      lr
>> .L5:
>>        .word   a
>>

Here in comments I mention why insn was copied to prol

Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Andrew Stubbs

On Wed 07 Dec 2011 14:20:43 GMT, Richard Earnshaw wrote:

Would it not require an unspec to prevent 'clever things' happening to
the negative shift, if we were to encode these in the machine
description? I'm not too clear on what these 'clever things' might be in
the case of shift-by-register (presumably value-range propagation is
one), but I know the NEON shifts are encoded this way for safety.



Given the way the shift patterns in the compiler are written today, quite 
possibly.  Though in the
general case of a non-constant shift the optimizer probably wouldn't be able to 
safely make any
assumptions that would break things.


I've noticed that the right-shift NEON insns have an "EQUIV" entry in 
the dumps, so I'm suspicious that even then we're not totally safe from 
"optimization".



I suspect that the shift patterns should really be changed to make the shift be 
by a QImode value;
this would then correctly describe the number of bits in the register that are 
really involved in
the shift.  Further, we could then say that, for core registers, the full value 
in that QI register
was used to determine the shift.  It would be quite a lot of churn to fix this 
though.


Yeah, I considered this, but I couldn't figure out how to make the 
details work, and anyway, I've seen the compiler trying to truncate 
things (unnecessarily) for that sort of thing, so I haven't actually 
tried it.


Maybe I'll have a go and see what happens. I suspect it's need extra 
patterns to combine the truncate seamlessly, and allow actual QImode 
input also?



None of this directly helps with your neon usage, but it does show that we
really don't need to clobber the condition code register to get an
efficient sequence.


Except that it doesn't in the case of a shift by one where there is a
two-instruction sequence that clobbers CC. Presumably this special case
can be treated differently though, right from expand.



All of the sequences above can be simplified significantly if the shift amount 
is constant and I
think then, that with the exception of the special case you mention (which is 
only for shift right
by 1) you never need the condition codes and you never need more than 3 ARM 
instructions:


Actually, there are "1bit" patterns for all the shift types.


shifts<  32

LSL AH, AH, #n
ORR AH, AH, AL, LSR #(32 - n)
LSL AL, AL, #n

shifts>= 32
LSL AH, AL, #(n - 32)
MOV AL, #0

In fact both of the above sequences are equally good for Thumb2.  If we lost 
the RRX tweak it
wouldn't be a major loss (we could even put it back as a peephole2 to handle 
the common case where
the condition code registers were known to be dead).


Yes, these are what the compiler currently generates. With my patch, 
*sh*di3 never fails to expand (not if TARGET_NEON is true, anyway), so 
the compiler doesn't do it automatically any more, so I have added 
splitters to do it manually.


Andrew


Re: [SMS] Support new loop pattern

2011-12-07 Thread Roman Zhuykov
2011/10/12 Ayal Zaks :>>> - the last jump
instruction should look like:  pc=(regF!=0)?label:pc, regF is>> you'd
probably want to bump to next instruction if falling through,> e.g.,
pc=(regF!=0)?label:pc+4>
It is considered that program counter is increased automatically
onhardware level.Otherwise we should add something like "pc=pc+4" in
parallel to eachinstruction in RTL.
>>>  flag register;>>> - the last instruction which sets regF should be: 
>>> regF=COMPARE(regC,X), where X>>>  is a constant, or maybe a register, which 
>>> is not changed inside a loop;>>> - only one instruction modifies regC 
>>> inside a loop (other can use regC, but not>>>  write), and it should simply 
>>> adjust it by a constant: regC=regC+step, where>>>  step is a constant.> 
>>> When doloop is succesfully scheduled by SMS, its number of>>> iterations of 
>>> loop kernel should be decreased by the number of stages in a>>> schedule 
>>> minus one, while other iterations expand to prologue and epilogue.>>> In 
>>> new supported loops such approach can't be used, because some>>> 
>>> instructions can use count register (regC).  Instead of this,>>> the final 
>>> register value X in compare instruction regF=COMPARE(regC,X)>>> is changed 
>>> to another value Y respective to the stage this instruction>>> is scheduled 
>>> (Y = X - stage * step).>> making sure this does not underflow; i.e., that 
>>> the number of> iterations is no less than stage (you've addressed this 
>>> towards the> end below).>
Yes, this situation is processed correctly.
 The main difference from doloop case is that regC can be used by some>> 
 instructions in loop body.>> That's why we are unable to simply adjust 
 regC initial value, but have>> to keep it's value correct on each 
 particular iteration.>> So, we change comparison instruction 
 accordingly. An example:>> int a[100];>> int main()>> {>>  int i;>>  
 for (i = 85; i > 12; i -= 5)>>      a[i] = i * i;>>  return a[15]-225;>> 
 }>> ARM assembler with "-O2 -fno-auto-inc-dec":>>        ldr     r0, .L5>> 
        mov     r3, #85>>        mov     r2, r0>> .L2:>>        mul     r1, 
 r3, r3>>        sub     r3, r3, #5>>        cmp     r3, #10>>        str   
   r1, [r2, #340]>>        sub     r2, r2, #20>>        bne     .L2>>       
  ldr     r0, [r0, #60]>>        sub     r0, r0, #225>>        bx      lr>> 
 .L5:>>        .word   a Loop body is executed 15 times.>> When 
 compiling with SMS, it finds a schedule with ii=7, stage_count=3>> and 
 following times:>> Stage  Time       Insn>> 0          5      mul     r1, 
 r3, r3>> 1         10     sub     r3, r3, #5>> 1         11     cmp     
 r3, #10>> 1         11     str     r1, [r2, #340]>> 1         13     bne   
   .L2>> 2         16     sub     r2, r2, #20 branch is not scheduled 
 last?>
Yes, branch schedule time is smaller then sub's one.This mean that
"sub r2, r2, $20" instruction from original iterationnumber K will be
executed after"bne .L2" from original iteration number K.But certainly
bne remains to be the last instuction in new loop body.Below you can
see how it looks after SMS.
>> To make new schedule correct the loop body>> should be executed 14 times and 
>> we change compare instruction:>> the loop itself should execute 13 times.
with i =85, 80, 75, 70, 6560, 55, 50, 45, 4035, 30, 25, 20, 15this
gives total 15 iterations (15 stores to memory).And new loop body will
be executed 13 times (one store goes toepilogue and one - to
prologue).
>> regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.>> 
>> In our example regC is r3, X is 10, step = -5, compare instruction>> is 
>> scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. right. In 
>> general, if the compare is on stage s (starting from 0), it> will be 
>> executed s times in the epilog, so it should exit the loop> upon reaching Y 
>> = X - s * step.>>> So, after SMS it looks like:>>        ldr     r0, .L5>>   
>>      mov     r3, #85>>        mov     r2, r0>> ;;prologue>>        mul     
>> r1, r3, r3      ;;from stage 0 first iteration>>        sub     r3, r3, #5   
>>    ;;3 insns from stage 1 first iteration>>        cmp     r3, #10>>        
>> str     r1, [r2, #340]>>        mul     r1, r3, r3      ;;from stage 0 
>> second iteration>> ;;body>> .L2:>>        sub     r3, r3, #5>>        sub    
>>  r2, r2, #20>>        cmp     r3, #15         ;; new value to compare with 
>> is Y=15>>        str     r1, [r2, #340]>>        mul     r1, r3, r3>>        
>> bne     .L2>> ;;epilogue>>        sub     r2, r2, #20     ;;from stage 2 
>> pre-last iteration>>        sub     r3, r3, #5      ;;3 insns from stage 1 
>> last iteration>>        cmp     r3, #10>>        str     r1, [r2, #340]>>    
>>     sub     r2, r2, #20     ;;from stage 2 last iteration        ldr     
>> r0, [r0, #60]>>        sub     r0, r0, #225>>        bx      lr>> .L5:>>     
>>    .word   a>>
He

Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Eric Botcazou
> You are basically (but non-optimally) locally re-implementing
> what expr.c:get_object_or_type_alignment does, for the
> bare MEM_REF case (you don't consider offsets that
> do not change the alignment, nor alignment information
> present on the SSA name).

Adjusted version attached, regression-tested on SPARC/Solaris.  I'll do a full 
testing cycle if it is OK for mainline.

get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders for 
MEM_REF and TARGET_MEM_REF do:

  MAX (TYPE_ALIGN (TREE_TYPE (exp)),
   get_object_alignment (exp, BIGGEST_ALIGNMENT)))

instead, so I presume I should do the same for them in tree_non_aligned_mem_p?


2011-12-07  Eric Botcazou  

PR tree-optimization/51315
* tree.h (get_object_or_type_alignment): Declare.
* expr.c (get_object_or_type_alignment): Move to...
* builtins.c (get_object_or_type_alignment): ...here.  Add assertion.
* tree-sra.c (tree_non_mode_aligned_mem_p): Rename to...
(tree_non_aligned_mem_p): ...this.  Add ALIGN parameter.  Look into
MEM_REFs and use get_object_or_type_alignment for them.
(build_accesses_from_assign): Adjust for above change.
(access_precludes_ipa_sra_p): Likewise.



-- 
Eric Botcazou
Index: tree.h
===
--- tree.h	(revision 181993)
+++ tree.h	(working copy)
@@ -5463,6 +5463,7 @@ extern rtx builtin_memset_read_str (void
 extern bool is_builtin_fn (tree);
 extern unsigned int get_object_alignment_1 (tree, unsigned HOST_WIDE_INT *);
 extern unsigned int get_object_alignment (tree);
+extern unsigned int get_object_or_type_alignment (tree);
 extern unsigned int get_pointer_alignment (tree);
 extern tree fold_call_stmt (gimple, bool);
 extern tree gimple_fold_builtin_snprintf_chk (gimple, tree, enum built_in_function);
Index: builtins.c
===
--- builtins.c	(revision 181993)
+++ builtins.c	(working copy)
@@ -452,6 +452,31 @@ get_object_alignment (tree exp)
   return align;
 }
 
+/* Return the alignment of object EXP, also considering its type when we do
+   not know of explicit misalignment.  Only handle MEM_REF and TARGET_MEM_REF.
+
+   ??? Note that, in the general case, the type of an expression is not kept
+   consistent with misalignment information by the front-end, for example when
+   taking the address of a member of a packed structure.  However, in most of
+   the cases, expressions have the alignment of their type so we optimistically
+   fall back to this alignment when we cannot compute a misalignment.  */
+
+unsigned int
+get_object_or_type_alignment (tree exp)
+{
+  unsigned HOST_WIDE_INT misalign;
+  unsigned int align = get_object_alignment_1 (exp, &misalign);
+
+  gcc_assert (TREE_CODE (exp) == MEM_REF || TREE_CODE (exp) == TARGET_MEM_REF);
+
+  if (misalign != 0)
+align = (misalign & -misalign);
+  else
+align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align);
+
+  return align;
+}
+
 /* Return the alignment in bits of EXP, a pointer valued expression.
The alignment returned is, by default, the alignment of the thing that
EXP points to.  If it is not a POINTER_TYPE, 0 is returned.
Index: expr.c
===
--- expr.c	(revision 181993)
+++ expr.c	(working copy)
@@ -4544,27 +4544,6 @@ get_bit_range (unsigned HOST_WIDE_INT *b
 }
 }
 
-/* Return the alignment of the object EXP, also considering its type
-   when we do not know of explicit misalignment.
-   ???  Note that, in the general case, the type of an expression is not kept
-   consistent with misalignment information by the front-end, for
-   example when taking the address of a member of a packed structure.
-   However, in most of the cases, expressions have the alignment of
-   their type, so we optimistically fall back to the alignment of the
-   type when we cannot compute a misalignment.  */
-
-static unsigned int
-get_object_or_type_alignment (tree exp)
-{
-  unsigned HOST_WIDE_INT misalign;
-  unsigned int align = get_object_alignment_1 (exp, &misalign);
-  if (misalign != 0)
-align = (misalign & -misalign);
-  else
-align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align);
-  return align;
-}
-
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
is true, try generating a nontemporal store.  */
 
Index: tree-sra.c
===
--- tree-sra.c	(revision 181993)
+++ tree-sra.c	(working copy)
@@ -1067,26 +1067,29 @@ disqualify_ops_if_throwing_stmt (gimple
   return false;
 }
 
-/* Return true iff type of EXP is not sufficiently aligned.  */
+/* Return true if EXP is a memory reference less aligned than ALIGN.  This is
+   invoked only on strict-alignment targets.  */
 
 static bool
-tree_non_mode_aligned_mem_p (tree exp)
+tree_non_aligned_mem_p (tree exp, unsigned int align)
 {
-  enum machine_mode m

Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Richard Earnshaw
On 07/12/11 14:03, Andrew Stubbs wrote:
> On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:
>> So it looks like the code generated for core registers with thumb2 is
>> pretty rubbish (no real surprise there -- to get the best code you need
>> to make use of the fact that on ARM a shift by a small negative number
>> (<  -128) will give zero.  This gives us sequences like:
>>
>> For ARM state it's something like (untested)
>>
>>  @ shft<  32 , 
>> shft>= 32
>> __ashldi3_v3:
>>  sub r3, r2, #32 @ -ve   , shft 
>> - 32
>>  lsl ah, ah, r2  @ ah<<  shft, 0
>>  rsb ip, r2, #32 @ 32 - shft , -ve
>>  orr ah, ah, al, lsl r3  @ ah<<  shft, al<<  
>> shft - 32
>>  orr ah, ah, al, lsr ip  @ ah<<  shft | al>>  32 - shft  , al<<  
>> shft - 32
>>  lsl al, al, r2  @ al<<  shft, 0
>>
>> For Thumb2 (where there is no orr with register shift)
>>
>>  lslsah, ah, r2  @ ah<<  shft, 0
>>  sub r3, r2, #32 @ -ve   , shft 
>> - 32
>>  lsl ip, al, r3  @ 0 , al<<  
>> shft - 32
>>  negsr3, r3  @ 32 - shft , -ve
>>  orr ah, ah, ip  @ ah<<  shft, al<<  
>> shft - 32
>>  lsr r3, al, r3  @ al>>  32 - shft   , 0
>>  orrsah, ah, r3  @ ah<<  shft | al>>  32 - shft  , al<<  
>> shft - 32
>>  lslsal, al, r2  @ al<<  shft, 0
>>
>> Neither of which needs the condition flags during execution (and indeed
>> is probably better in both cases than the code currently in lib1funcs.asm
>> for a modern core).  The flag clobbering behaviour in the thumb2 variant
>> is only for code size saving; that would normally be added by a late
>> optimization pass.
> 
> OK, those are interesting, and I can look into making it happen, with or 
> without NEON.
> 
> Would it not require an unspec to prevent 'clever things' happening to 
> the negative shift, if we were to encode these in the machine 
> description? I'm not too clear on what these 'clever things' might be in 
> the case of shift-by-register (presumably value-range propagation is 
> one), but I know the NEON shifts are encoded this way for safety.
> 

Given the way the shift patterns in the compiler are written today, quite 
possibly.  Though in the
general case of a non-constant shift the optimizer probably wouldn't be able to 
safely make any
assumptions that would break things.

I suspect that the shift patterns should really be changed to make the shift be 
by a QImode value;
this would then correctly describe the number of bits in the register that are 
really involved in
the shift.  Further, we could then say that, for core registers, the full value 
in that QI register
was used to determine the shift.  It would be quite a lot of churn to fix this 
though.

>> None of this directly helps with your neon usage, but it does show that we
>> really don't need to clobber the condition code register to get an
>> efficient sequence.
> 
> Except that it doesn't in the case of a shift by one where there is a 
> two-instruction sequence that clobbers CC. Presumably this special case 
> can be treated differently though, right from expand.
> 

All of the sequences above can be simplified significantly if the shift amount 
is constant and I
think then, that with the exception of the special case you mention (which is 
only for shift right
by 1) you never need the condition codes and you never need more than 3 ARM 
instructions:

shifts < 32

LSL AH, AH, #n
ORR AH, AH, AL, LSR #(32 - n)
LSL AL, AL, #n

shifts >= 32
LSL AH, AL, #(n - 32)
MOV AL, #0

In fact both of the above sequences are equally good for Thumb2.  If we lost 
the RRX tweak it
wouldn't be a major loss (we could even put it back as a peephole2 to handle 
the common case where
the condition code registers were known to be dead).

R.



RFA: Fix PR middle-end/40154

2011-12-07 Thread Joern Rennecke

Having a value in the wrong mode is not the only hazard that users of
set_unique_reg_note should anticipate; it could also be that the value
is computed as a constant by the compiler, and thus the last instruction
obtained with get_last_insn is completely (or somewhat) unrelated; or an
expander could decide to save / restore some hard register around the
actual operation.
So, in general, the test should be if the candidate insn actually sets the
expected target.  There are already a number of users of set_unique_reg_note
that do that.  Others assume that a move, add or other expander emits a
single insn, or a sequence of insns that ends with a single insn, which
sets the target.  Although that is often the case, this is not a safe
assumption to make.  Fixing all these places with inlined code would
mean a lot of cut & paste programming.  So instead, I've added a new
wrapper for set_unique_reg_note to be used in these circumstances, and
also used it to replace the existing inlined checks unless something special
was going on (e.g. testing two different target / value pairs).

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
2011-12-07  Joern Rennecke  

PR middle-end/40154
* emit-rtl.c (set_dst_reg_note): New function.
* rtl.h (set_dst_reg_note): Declare.
* optabs.c (expand_binop, expand_absneg_bit): Use set_dst_reg_note.
(emit_libcall_block, expand_fix): Likewise.
* function.c (assign_parm_setup_reg, expand_function_start): Likewise.
* expmed.c (expand_mult_const, expand_divmod): Likewise.
* reload1.c (gen_reload): Likewise.

Index: optabs.c
===
--- optabs.c(revision 182075)
+++ optabs.c(working copy)
@@ -2052,11 +2052,11 @@ expand_binop (enum machine_mode mode, op
{
  rtx temp = emit_move_insn (target, xtarget);
 
- set_unique_reg_note (temp,
-  REG_EQUAL,
-  gen_rtx_fmt_ee (binoptab->code, mode,
-  copy_rtx (xop0),
-  copy_rtx (xop1)));
+ set_dst_reg_note (temp, REG_EQUAL,
+   gen_rtx_fmt_ee (binoptab->code, mode,
+   copy_rtx (xop0),
+   copy_rtx (xop1)),
+   target);
}
  else
target = xtarget;
@@ -2104,11 +2104,12 @@ expand_binop (enum machine_mode mode, op
  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
{
  temp = emit_move_insn (target ? target : product, product);
- set_unique_reg_note (temp,
-  REG_EQUAL,
-  gen_rtx_fmt_ee (MULT, mode,
-  copy_rtx (op0),
-  copy_rtx (op1)));
+ set_dst_reg_note (temp,
+   REG_EQUAL,
+   gen_rtx_fmt_ee (MULT, mode,
+   copy_rtx (op0),
+   copy_rtx (op1)),
+   target ? target : product);
}
  return product;
}
@@ -2966,8 +2967,9 @@ expand_absneg_bit (enum rtx_code code, e
   gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
   target = lowpart_subreg_maybe_copy (mode, temp, imode);
 
-  set_unique_reg_note (get_last_insn (), REG_EQUAL,
-  gen_rtx_fmt_e (code, mode, copy_rtx (op0)));
+  set_dst_reg_note (get_last_insn (), REG_EQUAL,
+   gen_rtx_fmt_e (code, mode, copy_rtx (op0)),
+   target);
 }
 
   return target;
@@ -3899,8 +3901,7 @@ emit_libcall_block (rtx insns, rtx targe
 }
 
   last = emit_move_insn (target, result);
-  if (optab_handler (mov_optab, GET_MODE (target)) != CODE_FOR_nothing)
-set_unique_reg_note (last, REG_EQUAL, copy_rtx (equiv));
+  set_dst_reg_note (last, REG_EQUAL, copy_rtx (equiv), target);
 
   if (final_dest != target)
 emit_move_insn (final_dest, target);
@@ -5213,11 +5214,10 @@ expand_fix (rtx to, rtx from, int unsign
{
  /* Make a place for a REG_NOTE and add it.  */
  insn = emit_move_insn (to, to);
- set_unique_reg_note (insn,
-  REG_EQUAL,
-  gen_rtx_fmt_e (UNSIGNED_FIX,
- GET_MODE (to),
- copy_rtx (from)));
+ set_dst_reg_note (insn, REG_EQUAL,
+   gen_rtx_fmt_e (UNSIGNED_FIX, GET_MODE (to),
+   

Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Andrew Stubbs

On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:

So it looks like the code generated for core registers with thumb2 is
pretty rubbish (no real surprise there -- to get the best code you need
to make use of the fact that on ARM a shift by a small negative number
(<  -128) will give zero.  This gives us sequences like:

For ARM state it's something like (untested)

@ shft<  32  , shft>= 32
__ashldi3_v3:
sub r3, r2, #32 @ -ve   , shft 
- 32
lsl ah, ah, r2  @ ah<<  shft  , 0
rsb ip, r2, #32 @ 32 - shft , -ve
orr ah, ah, al, lsl r3  @ ah<<  shft  , al<<  shft 
- 32
orr ah, ah, al, lsr ip  @ ah<<  shft | al>>  32 - shft  , 
al<<  shft - 32
lsl al, al, r2  @ al<<  shft  , 0

For Thumb2 (where there is no orr with register shift)

lslsah, ah, r2  @ ah<<  shft  , 0
sub r3, r2, #32 @ -ve   , shft 
- 32
lsl ip, al, r3  @ 0 , al<<  
shft - 32
negsr3, r3  @ 32 - shft , -ve
orr ah, ah, ip  @ ah<<  shft  , al<<  shft 
- 32
lsr r3, al, r3  @ al>>  32 - shft , 0
orrsah, ah, r3  @ ah<<  shft | al>>  32 - shft  , 
al<<  shft - 32
lslsal, al, r2  @ al<<  shft  , 0

Neither of which needs the condition flags during execution (and indeed
is probably better in both cases than the code currently in lib1funcs.asm
for a modern core).  The flag clobbering behaviour in the thumb2 variant
is only for code size saving; that would normally be added by a late
optimization pass.


OK, those are interesting, and I can look into making it happen, with or 
without NEON.


Would it not require an unspec to prevent 'clever things' happening to 
the negative shift, if we were to encode these in the machine 
description? I'm not too clear on what these 'clever things' might be in 
the case of shift-by-register (presumably value-range propagation is 
one), but I know the NEON shifts are encoded this way for safety.



None of this directly helps with your neon usage, but it does show that we
really don't need to clobber the condition code register to get an
efficient sequence.


Except that it doesn't in the case of a shift by one where there is a 
two-instruction sequence that clobbers CC. Presumably this special case 
can be treated differently though, right from expand.


Andrew


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 2:32 PM, Richard Earnshaw  wrote:
> On 07/12/11 13:02, Richard Guenther wrote:
>> On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek  wrote:
>>> On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
     This is a backport for two upstream patches into our 4.6 branch.
 I submitted the first patch by Julian a while ago for backport but
 Richard Earnshaw pointed out a problem with the first patch.  The second
 patch from Joey fixes that problem.  This was tested on x86 and ARM.
>>>
>>> Why hasn't this been proposed for upstream 4.6 instead?
>>> See PR51442.
>>
>> It's indeed annoying to see arm related backports only going to
>> vendor branches, not the last officially maintained GCC release
>> branch (4.6).  I keep getting local requests to include random
>> patches to our 4.6 build to make "arm work at all".  It seriously
>> seems like having arm-linux-gnueabi as a primary target is a lie to our 
>> users.
>>
>> Arm maintainers - please consider maintaining at least the current
>> release series and shift focus away from your vendor branches.
>>
>
> So this, to some extent seems to conflict with your rules for only fixing
> regressions.  This code has always been broken in one way or another,
> so technically this doesn't qualify for the 4.6 branch.
>
> I think we need clearer rules (on the web site, not in a mailing list post)
> that describes which patches are considered acceptable on the branch and
> which are not.

We generally accept wrong-code fixes (or rejects-valid) that are
low-risk.  Target maintainers have complete freedom for their
targets provided a fix is really target-only (we even accept new
ports or new ISAs on branches, well - in theory at least).

If a maintainer thinks backporting a fix is important he can always
defer to a release manager for a final decision.

What we generally do not want is new middle-end functionality.
And we generally raise the barrier for "low-risk" the more mature
a branch is.

As a general rule, if you'd point users to "use the arm-embedded
branch" for bugreports you get, then you are doing sth wrong.
If you say, "use the arm-embedded branch" to get smaller/faster
code - well, that's ok.  Pointing people to the latest official
release series (in this case 4.6.x) is also ok, we're keeping too
many branches active IMNSHO.

Richard.

> R.
>


Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX

2011-12-07 Thread Tobias Burnus

* ping * ?

On 12/04/2011 06:46 PM, Tobias Burnus wrote:

Hi all,

as Dominique has found, Fortran 2008 allows the BOZ edit descriptors 
now also with REAL and COMPLEX arguments. (See PR for quotes from the 
standard.)


Build and regtested on x86-64-linux.
OK for the trunk?

Tobias

PS: Thank you, Mikael, for reviewing my ASSOCIATE patch!




Re: [Patch, Fortran] PR50923 - [4.4-4.7] Print warning function does not return a value

2011-12-07 Thread Tobias Burnus

** PING **

On 11/30/2011 07:19 PM, Tobias Burnus wrote:
The bug has been introduced when changing the warning system to do 
more in the front end. The problem is that for:

  module m
  contains
function f()
end
  end
the sym->attr.referenced gets set - and no warning is printed. I now 
ignore the sym->attr.referenced attribute as the RESULT == NULL_TREE 
should only be reached if the variable has not been assigned a value.


Additionally, I now only set TREE_NO_WARNING() if a warning has been 
printed; that way, some other useful middle-end warnings might be 
still printed. I have to set the NO_WARNING also for sym != 
sym->result as it otherwise prints a middle end warning into addition 
to the FE warning. But the FE warning has been printed already before 
for the result variable.


Build and regtested on x86-64-linux.
OK for the trunk? How far should this be backported - all the way down 
to 4.4?


Tobias




Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings

2011-12-07 Thread Tobias Burnus

** PING **

On 11/29/2011 07:35 PM, Tobias Burnus wrote:
gfortran had an ICE when trying to insert a check whether the 
character length between actual and dummy argument matches. This check 
is pointless for deferred-length character arguments - thus, no bounds 
check should be generated.


Build and regtested on x86-64-linux.
OK for the trunk?

Tobias




[PATCH] Testcase for PR48100

2011-12-07 Thread Richard Guenther

Which got fixed on trunk.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-12-07  Richard Guenther  

PR lto/48100
* gcc.dg/lto/20111207-1_0.c: New testcase.
* gcc.dg/lto/20111207-1_1.c: Likewise.
* gcc.dg/lto/20111207-1_2.c: Likewise.
* gcc.dg/lto/20111207-1_3.c: Likewise.

Index: gcc/testsuite/gcc.dg/lto/20111207-1_0.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0)
@@ -0,0 +1,4 @@
+/* { dg-lto-do run } */
+/* { dg-lto-options { { -flto } } } */
+/* { dg-require-linker-plugin "" } */
+/* { dg-extra-ld-options "-fuse-linker-plugin" } */
Index: gcc/testsuite/gcc.dg/lto/20111207-1_1.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0)
@@ -0,0 +1,3 @@
+/* { dg-options "-fno-lto" } */
+
+int i;
Index: gcc/testsuite/gcc.dg/lto/20111207-1_2.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0)
@@ -0,0 +1 @@
+int i;
Index: gcc/testsuite/gcc.dg/lto/20111207-1_3.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0)
@@ -0,0 +1,6 @@
+extern int i;
+
+int main()
+{
+  return i;
+}


Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Richard Earnshaw
On 02/12/11 12:42, Andrew Stubbs wrote:
> Hi All,
> 
> I'm trying to implement DImode shifts using ARM NEON instructions. This 
> wouldn't be difficult in itself, but making it play nice with the 
> existing implementation is causing me problems. I'd like a few 
> suggestions/pointers/comments to help me get this right, please.
> 
> The existing shift mechanisms must be kept, partly because the NEON unit 
> is optional, and partly because it does not permit the full range of 
> DImode operations, so sometimes it's more efficient to do 64-bit 
> operations in core-registers, rather than copy all the values over to 
> NEON, do the operation, and move the result back. Which set of patterns 
> are used is determined by the register allocator and its costs mechanism.
> 
> The late decision means that the patterns may only use the post-reload 
> splitter, and so cannot rely on many of the usual passes to sort out 
> inefficiencies. In particular, the lack of combine makes it hard to 
> detect and optimize extend-and-copy sequences.
> 
> So, I've attached two patches. The first is neon-shifts.patch, and does 
> most of the work. The second is extendsidi2_neon.patch, and is intended 
> to aid moving the shift amount from SImode registers, but doesn't go as 
> far as I'd like.
> 
> I've not actually tested any of the output code just yet, so there may 
> be logic errors, but those are easily fixed later, and what I'm trying 
> to get right here is the GCC machine description.
> 
> Given this testcase:
> 
> void
> f (long long *a, int b)
> {
>   *a = *a << b;
> }
> 
> Without any patches, GCC gives this output, using only ARM core 
> registers (in thumb2 mode):
> 
> f:
>   ldr r2, [r0, #0]
>   ldr r3, [r0, #4]
>   push{r4, r5, r6}
>   rsb r6, r1, #32
>   sub r4, r1, #32
>   lsrsr6, r2, r6
>   lslsr5, r2, r4
>   lslsr3, r3, r1
>   lslsr1, r2, r1
>   orrsr3, r3, r6
>   str r1, [r0, #0]
>   andsr4, r3, r4, asr #32
>   it  cc
>   movcc   r4, r5
>   str r4, [r0, #4]
>   pop {r4, r5, r6}
>   bx  lr
> 
> With just neon-shifts.patch, we get this output, now with NEON shifts:
> 
> f:
>  flddd17, [r0, #0]   @ int
>  mov r2, r1
>  movsr3, #0
>  push{r4, r5}
>  fmdrr   d18, r2, r3 @ int
>  vshl.i64d16, d17, d18
>  fstdd16, [r0, #0]   @ int
>  pop {r4, r5}
>  bx  lr
> 
> 
> As you can see, the shift is much improved, but the shift amount is 
> first extended into two SImode registers, and then moved to a NEON 
> DImode register, which increases core-register pressure unnecessarily.
> 
> With both patches, we now get this:
> 
> f:
>  flddd17, [r0, #0]   @ int
>  vdup.32 d16, r1
>  vshr.u64d16, d16, #32   <-- still unnecessary
>  vshl.i64d16, d17, d16
>  fstdd16, [r0, #0]   @ int
>  bx  lr
> 
> Now the value is copied and then extended. I have chosen to use vdup.32 
> instead of vmov.i32 because the latter can only target half the DImode 
> registers. The right shift is necessary for a general zero-extend, but 
> is not useful in this case as only the bottom 8 bits are interesting, 
> and vdup has already done the right thing.
> 
> Note that the examples I've given are for left shifts. Right shifts are 
> also implemented, but are a little more complicated (in the 
> shift-by-register case) because the shift must be implemented as a left 
> shift by a negative amount, and so an unspec is used to prevent the 
> compiler doing anything 'clever'. Apart from an extra negation, the end 
> result is much the same, but the patterns look different.
> 
> 
> All this is a nice improvement, but I'm not happy:
> 
> 1. The post-reload split means that I've had to add a clobber for CC to 
> all the patterns, even though only some of them really need it. I think 
> I've convinced myself that this is ok because it doesn't matter before 
> scheduling, and after splitting the clobbers are only retained if 
> they're really needed, but it still feels wrong.
> 
> 2. The extend optimization is fine for general case extends, but it can 
> be improved for the shift-amount case because we actually only need the 
> bottom 8 bits, as indicated above. The problem is that there's no 
> obvious way to achieve this:
> - there's no combine pass after this point, so a pattern that 
> recognises and re-splits the extend, move and shift can't be used.
> - I don't believe there can be a pattern that uses SImode for the 
> shift amount because the value needs to be in a DImode register 
> eventually, and that means one needs to have been allocated before it 
> gets split, and that means the extend needs to be separate.
> 
> 3. The type of the shift-amoun

Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Earnshaw
On 07/12/11 13:02, Richard Guenther wrote:
> On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
>>> This is a backport for two upstream patches into our 4.6 branch.
>>> I submitted the first patch by Julian a while ago for backport but
>>> Richard Earnshaw pointed out a problem with the first patch.  The second
>>> patch from Joey fixes that problem.  This was tested on x86 and ARM.
>>
>> Why hasn't this been proposed for upstream 4.6 instead?
>> See PR51442.
> 
> It's indeed annoying to see arm related backports only going to
> vendor branches, not the last officially maintained GCC release
> branch (4.6).  I keep getting local requests to include random
> patches to our 4.6 build to make "arm work at all".  It seriously
> seems like having arm-linux-gnueabi as a primary target is a lie to our users.
> 
> Arm maintainers - please consider maintaining at least the current
> release series and shift focus away from your vendor branches.
> 

So this, to some extent seems to conflict with your rules for only fixing
regressions.  This code has always been broken in one way or another,
so technically this doesn't qualify for the 4.6 branch.

I think we need clearer rules (on the web site, not in a mailing list post)
that describes which patches are considered acceptable on the branch and
which are not.

R.



Re: [coverage] Some restructuring

2011-12-07 Thread Markus Trippelsdorf
On 2011.12.04 at 18:35 +, Nathan Sidwell wrote:
> I've committed this patch to break apart the gcov finalization routines, I 
> believe this will make it easier to fix the problem shown up by bug 51113 -- 
> although this patch does not.  Notable changes:
> 
> * rename coverage_begin_output to coverage_begin_function for consistency 
> with 
> coverage_end_function
> 
> * Only call it once from the profile code -- I suspect in the past it was 
> called 
> from several locations where it wasn't possible to statically determine which 
> would be the first call.  Now it's obvious.
> 
> * Move the opening of the notes file to the coverage initialization.  This 
> does 
> mean that we'll always create a notes file when generating coverage data, 
> even 
> if there are no functions in the resultant object file.  I don't think that's 
> unsurprising -- and will stop gcov itself complaining about a missing notes 
> file 
> in this case.
> 
> * Replace the trailing array of the gcov_info object with a pointer to an 
> array. 
>   This doesn't require changes to libgcov.c's source but will of course 
> change 
> the resultant object file it compiles to and constitutes an ABI change.  This 
> change allows the creation of the gcov_info data type without knowing the 
> number 
> of functions being instrumented.
> 
> tested on i686-pc-linux-gnu with profiledbootstrap.

Nathan,

this patch breaks profiled build of tramp3d:

 % c++ -w -Ofast -fprofile-generate -march=native tramp3d-v4.cpp
/tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined 
reference to 
'__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv'
/tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined 
reference to 
'__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv'
...

see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51449


-- 
Markus


Re: [PATCH] Remove dead labels to increase superblock scope

2011-12-07 Thread Tom de Vries
On 06/12/11 14:25, Michael Matz wrote:
> Hi,
> 
> On Tue, 6 Dec 2011, Tom de Vries wrote:
> 
>> what should be the 'next' returned by delete_insn?
> 
> There are exactly two calls of delete_insn that take the return value.  
> One (delete_insn_and_edges) just uses it to return it itself (and there 
> are no calls to delete_insn_and_edges that use the returned value), the 
> other (delete_insn_chain) wants to have the original next insn (before 
> reordering), even if that means that it can see the insn twice (once as 
> label, once as note, the latter would be skipped).
> 
> So, return the original next one.  Even better would be to somehow clean 
> up the single last use of the return value in delete_insn_chain, and make 
> delete_insn return nothing.
> 

I did that now. The only problem I ran into was an assert in remove_insn:
...
  if (BB_HEAD (bb) == insn)
{
  /* Never ever delete the basic block note without deleting whole
 basic block.  */
  gcc_assert (!NOTE_P (insn));
  BB_HEAD (bb) = next;
}
...

The problematic case was a removing a basic_block using delete_insn_chain:
...
  (code_label 33 26 34 4 1 "" [0 uses])

  (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

  (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG)
...

Normally, first the code_label was replaced by a note, then the BASIC_BLOCK note
was removed. The assert would not trigger while removing this note because the
note was not BB_HEAD.

With the fixup, when removing the BASIC_BLOCK note, the note would be at the
head and the assert would trigger:
...
  (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

  (note 33 26 34 4 1 "" NOTE_INSN_DELETED_LABEL)

  (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG)
...

I solved this by removing backwards rather than forwards in delete_insn_chain.

Bootstrapped and reg-tested on x86_64.

OK for next stage1?

Thanks,
- Tom

2011-12-07  Tom de Vries  

* cfgcleanup.c (try_optimize_cfg): Replace call to delete_isns_chain by
call to delete_insn.  Remove code to reorder BASIC_BLOCK note and
DELETED_LABEL note, and move it to ...
* cfg_rtl.c (delete_insn):  Change return type to void.
(delete_insn_and_edges): Likewise.
(delete_insn_chain): Handle new return type of delete_insn.  Delete
chain backwards rather than forwards.
* rtl.h (delete_insn, delete_insn_and_edges): Change return type to
void.
* cfglayout.c (fixup_reorder_chain): Delete unused label.

* gcc.dg/superblock.c: New test.
Index: gcc/cfgcleanup.c
===
--- gcc/cfgcleanup.c (revision 181652)
+++ gcc/cfgcleanup.c (working copy)
@@ -2632,20 +2632,7 @@ try_optimize_cfg (int mode)
 		  || ! label_is_jump_target_p (BB_HEAD (b),
 		   BB_END (single_pred (b)
 		{
-		  rtx label = BB_HEAD (b);
-
-		  delete_insn_chain (label, label, false);
-		  /* If the case label is undeletable, move it after the
-		 BASIC_BLOCK note.  */
-		  if (NOTE_KIND (BB_HEAD (b)) == NOTE_INSN_DELETED_LABEL)
-		{
-		  rtx bb_note = NEXT_INSN (BB_HEAD (b));
-
-		  reorder_insns_nobb (label, label, bb_note);
-		  BB_HEAD (b) = bb_note;
-		  if (BB_END (b) == bb_note)
-			BB_END (b) = label;
-		}
+		  delete_insn (BB_HEAD (b));
 		  if (dump_file)
 		fprintf (dump_file, "Deleted label in block %i.\n",
 			 b->index);
Index: gcc/cfglayout.c
===
--- gcc/cfglayout.c (revision 181652)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,9 @@ fixup_reorder_chain (void)
    (e_taken->src, e_taken->dest));
 		  e_taken->flags |= EDGE_FALLTHRU;
 		  update_br_prob_note (bb);
+		  if (LABEL_NUSES (ret_label) == 0
+		  && single_pred_p (e_taken->dest))
+		delete_insn (ret_label);
 		  continue;
 		}
 	}
Index: gcc/rtl.h
===
--- gcc/rtl.h (revision 181652)
+++ gcc/rtl.h (working copy)
@@ -2460,12 +2460,12 @@ extern void add_insn_before (rtx, rtx, s
 extern void add_insn_after (rtx, rtx, struct basic_block_def *);
 extern void remove_insn (rtx);
 extern rtx emit (rtx);
-extern rtx delete_insn (rtx);
+extern void delete_insn (rtx);
 extern rtx entry_of_function (void);
 extern void emit_insn_at_entry (rtx);
 extern void delete_insn_chain (rtx, rtx, bool);
 extern rtx unlink_insn_chain (rtx, rtx);
-extern rtx delete_insn_and_edges (rtx);
+extern void delete_insn_and_edges (rtx);
 extern rtx gen_lowpart_SUBREG (enum machine_mode, rtx);
 extern rtx gen_const_mem (enum machine_mode, rtx);
 extern rtx gen_frame_mem (enum machine_mode, rtx);
Index: gcc/cfgrtl.c
===
--- gcc/cfgrtl.c (revision 181652)
+++ gcc/cfgrtl.c (working copy)
@@ -111,12 +111,11 @@ can_delete_label_p (const_rtx label)
 	  && !in_expr_list_p (forced_labels, label));
 }
 
-/* Delete INSN by p

Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek  wrote:
> On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
>>     This is a backport for two upstream patches into our 4.6 branch.
>> I submitted the first patch by Julian a while ago for backport but
>> Richard Earnshaw pointed out a problem with the first patch.  The second
>> patch from Joey fixes that problem.  This was tested on x86 and ARM.
>
> Why hasn't this been proposed for upstream 4.6 instead?
> See PR51442.

It's indeed annoying to see arm related backports only going to
vendor branches, not the last officially maintained GCC release
branch (4.6).  I keep getting local requests to include random
patches to our 4.6 build to make "arm work at all".  It seriously
seems like having arm-linux-gnueabi as a primary target is a lie to our users.

Arm maintainers - please consider maintaining at least the current
release series and shift focus away from your vendor branches.

Thanks,
Richard.

>> 2011-11-22   Doug Kwan  
>>
>>       Backport r171347 and r181549 from trunk.
>>
>>       gcc/
>>       2011-03-23  Julian Brown  
>>
>>               * expr.c (expand_expr_real_1): Only use BLKmode for volatile
>>               accesses which are not naturally aligned.
>>
>>       2011-11-20  Joey Ye  
>>
>>               * expr.c (expand_expr_real_1): Correctly handle strict volatile
>>               bitfield loads smaller than mode size.
>>
>>       gcc/testsuite/
>>       2011-11-20  Joey Ye  
>>
>>               * gcc.dg/volatile-bitfields-1.c: New.
>
>        Jakub


Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Eric Botcazou
> You are basically (but non-optimally) locally re-implementing
> what expr.c:get_object_or_type_alignment does, for the
> bare MEM_REF case (you don't consider offsets that
> do not change the alignment, nor alignment information
> present on the SSA name).

Gosh.  And now I distinctly remember suggesting that the function be moved to 
the same file as the others...

> Note that in expr.c gete_object_or_type_alignment is only
> called on [TARGET_]MEM_REFs (I didn't export that function
> exactly because of its restrictions, if we want to export it
> we should assert it is only called for [TARGET_]MEM_REFs).

OK, let's export it, move it to builtins.c and and add the assert.

> Thus, apart from the exp_align computation the patch looks ok
> (it's only non-optimal, not incorrect).

I'll post an adjusted version.  Thanks for the review.

-- 
Eric Botcazou


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Jakub Jelinek
On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
> This is a backport for two upstream patches into our 4.6 branch.
> I submitted the first patch by Julian a while ago for backport but
> Richard Earnshaw pointed out a problem with the first patch.  The second
> patch from Joey fixes that problem.  This was tested on x86 and ARM.

Why hasn't this been proposed for upstream 4.6 instead?
See PR51442.

> 2011-11-22   Doug Kwan  
> 
>   Backport r171347 and r181549 from trunk.
> 
>   gcc/
>   2011-03-23  Julian Brown  
> 
>   * expr.c (expand_expr_real_1): Only use BLKmode for volatile
>   accesses which are not naturally aligned.
>   
>   2011-11-20  Joey Ye  
> 
>   * expr.c (expand_expr_real_1): Correctly handle strict volatile
>   bitfield loads smaller than mode size.
> 
>   gcc/testsuite/
>   2011-11-20  Joey Ye  
> 
>   * gcc.dg/volatile-bitfields-1.c: New.

Jakub


Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 11:27 AM, Michael Zolotukhin
 wrote:
> Thanks, Richard.
> Should somebody else approve the patch or is it ok for commit to trunk?

It's ok to commit.

Richard.

> On 5 December 2011 18:04, Richard Guenther  wrote:
>> On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin
>>  wrote:
 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.
>>> Richard, I made the tests this way - please check them in the attached
>>> patch (it happened to be quite big).
>>
>> Works for me.
>>
>> Thanks,
>> Richard.
>>
 There is vect_multiple_sizes for such cases.
>>> Ira, thanks! This flag would be useful to avoid fails on the original
>>> tests when they are compiled with mavx/mavx2 - I'll prepare a patch
>>> for this soon.
>>>
>>> On 5 December 2011 13:10, Richard Guenther  
>>> wrote:
 On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen  wrote:
>
>
> gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM:
>
>> From: Michael Zolotukhin 
>> To: Richard Guenther 
>> Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com
>> Date: 05/12/2011 10:39 AM
>> Subject: Re: [Patch] Increase array sizes in vect-tests to enable
>> 256-bit vectorization
>> Sent by: gcc-patches-ow...@gcc.gnu.org
>>
>> On 5 December 2011 10:14, Michael Zolotukhin
>>  wrote:
>> > Ok, will several tests with short arrays be enough for that or should
>> > we keep all the original tests plus new ones with longer arrays?
>>
>> BTW, there is another problem with current tests with short arrays -
>> scans are expecting specific number of some diagnostic messages like
>> "not vectorized: unsupported unaligned store", and that number would
>> be different if several vector-lengths are available - so we'll have
>> fails in those tests.
>
> There is vect_multiple_sizes for such cases.

 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.

 Richard.
>>>
>>>
>>>
>>> --
>>> ---
>>> Best regards,
>>> Michael V. Zolotukhin,
>>> Software Engineer
>>> Intel Corporation.
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.


Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Richard Guenther
On Tue, Dec 6, 2011 at 11:24 PM, Eric Botcazou  wrote:
> Hi,
>
> this is a regression present on mainline and 4.6 branch at -O for the SPARC.
> The compiler generates an unaligned access for the memcpy call in:
>
>  char *s
>
>  union
>  {
>    int32_t i;
>    char a[sizeof (int32_t)];
>  }
>  v;
>
>  memcpy (v.a, s, sizeof (int32_t));
>
> The memcpy call is folded to an assignment between a couple of MEM_REFs:
>
>  MEM[(char * {ref-all})&v] = MEM[(char * {ref-all})s_1];
>
> with type 'char a[4]'.  The problem is that SRA turns this into:
>
>  v$i_27 = MEM[(char * {ref-all})s_1].i;
>
> but s isn't aligned enough for an int32_t access.
>
>
> I don't think we can scalarize in this case on strict-alignment targets.  The
> SRA pass already contains an alignment check, but it is purely internal to the
> RHS and this is clearly not sufficient anymore with MEM_REFs.
>
> The proposed fix is to enhance this alignment check to take into account both
> the RHS and the LHS (much like the memcpy folder).  I think it subsumes the
> previous check, which could be viewed as a check involving the RHS and the
> type of the LHS.  But there is a hitch: get_object_alignment is conservative
> for MEM_REF (unlike for INDIRECT_REF) so a trick is needed in order not to
> pessimize in some cases.
>
> Tested on SPARC/Solaris.  OK for mainline and 4.6 branch?

You are basically (but non-optimally) locally re-implementing
what expr.c:get_object_or_type_alignment does, for the
bare MEM_REF case (you don't consider offsets that
do not change the alignment, nor alignment information
present on the SSA name).

Note that in expr.c gete_object_or_type_alignment is only
called on [TARGET_]MEM_REFs (I didn't export that function
exactly because of its restrictions, if we want to export it
we should assert it is only called for [TARGET_]MEM_REFs).

Thus, apart from the exp_align computation the patch looks ok
(it's only non-optimal, not incorrect).

Thanks,
Richard.

>
> 2011-12-06  Eric Botcazou  
>
>        PR tree-optimization/51315
>        * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to...
>        (tree_non_aligned_mem_p): ...this.  Add ALIGN parameter.  Look into
>        MEM_REFs and deal with simple dereferences specially.
>        (build_accesses_from_assign): Adjust for above change.
>        (access_precludes_ipa_sra_p): Likewise.
>
>
> 2011-12-06  Eric Botcazou  
>
>        * gcc.c-torture/execute/20111206-1.c: New test.
>
>
> --
> Eric Botcazou


[testsuite]: Move pr45830.c to gcc.dg/torture and make it pass for AVR

2011-12-07 Thread Georg-Johann Lay
In principle, test case pr45830.c works for target avr, but there is an issue
with the -ftree-switch-conversion optimization activated at higher optimization
levels: It transforms code size into .data usage and thus exceeds AVRs' RAM
size because of big CSWTCH lookup tables located in RAM.

The patch moves the test case from gcc.c-torture/execute/ to more modern
gcc.dg/torture and adds magic dg-comments as needed.  Patch lightly tested with
my avr-tools.

Ok for trunk?

Johann

PR tree-optimization/45830
* gcc.c-torture/execute/pr45830.c: Move file from here to...
* gcc.dg/torture/pr45830.c: ...here.  Add dg-do and
dg-additional-options magic.
Index: gcc.c-torture/execute/pr45830.c
===
--- gcc.c-torture/execute/pr45830.c	(revision 182043)
+++ gcc.c-torture/execute/pr45830.c	(working copy)
@@ -1,97 +0,0 @@
-/* PR tree-optimization/45830 */
-
-extern void abort (void);
-
-long long va, vb, vc, vd, ve;
-
-__attribute__((noinline)) int
-foo (int x)
-{
-  long long a, b, c, d, e;
-  switch (x)
-{
-case 0:
-case 3:
-case 1:
-case 2:
-case 4:
-  a = 1;
-  b = 129;
-  c = -12;
-  d = -4;
-  e = 128;
-  break;
-case 23:
-case 26:
-case 19:
-case 65:
-case 5:
-  a = 2;
-  b = 138;
-  c = 115;
-  d = 128;
-  e = -16;
-  break;
-case 21:
-case 20:
-case 22:
-case 38:
-case 27:
-case 66:
-case 45:
-case 47:
-  a = 3;
-  b = 6;
-  c = 127;
-  d = 25;
-  e = 257;
-  break;
-default:
-  a = 0;
-  b = 18;
-  c = 0;
-  d = 64;
-  e = 32768L;
-  break;
-}
-  va = a;
-  vb = b;
-  vc = c;
-  vd = d;
-  ve = e;
-}
-
-int
-bar (int x)
-{
-  if (x < 0)
-return 3;
-  if (x < 5)
-return 0;
-  if (x == 5 || x == 19 || x == 23 | x == 26 || x == 65)
-return 1;
-  if ((x >= 20 && x <= 22) || x == 27 || x == 38
-  || x == 45 || x == 47 || x == 66)
-return 2;
-  return 3;
-}
-
-long long expected[] =
-{ 1, 129, -12, -4, 128, 2, 138, 115, 128, -16,
-  3, 6, 127, 25, 257, 0, 18, 0, 64, 32768L };
-
-int
-main (void)
-{
-  int i, v;
-  for (i = -4; i < 70; i++)
-{
-  foo (i);
-  v = bar (i);
-  if (va != expected[5 * v] || vb != expected[5 * v + 1]
-	  || vc != expected[5 * v + 2] || vd != expected[5 * v + 3]
-	  || ve != expected[5 * v + 4])
-	abort ();
-}
-  return 0;
-}
Index: gcc.dg/torture/pr45830.c
===
--- gcc.dg/torture/pr45830.c	(revision 182043)
+++ gcc.dg/torture/pr45830.c	(working copy)
@@ -1,4 +1,6 @@
 /* PR tree-optimization/45830 */
+/* { dg-do run } */
+/* { dg-additional-options "-fno-tree-switch-conversion" { target avr-*-* } } */
 
 extern void abort (void);
 


Re: [PATCH] Fix PR50823 and its gazillion dups

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Jan Hubicka wrote:

> Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther
> :
> 
> > 
> > This removes accounting for the number of remaining calls in
> > the inlining edge badness calculation (as discussed in private
> > with Honza a long time ago - and yes, we disagreed).  This
> 
> Well, the main reason for disagreement was that is makes programs bigger.
> The basic idea of including badness in the cost is to make inliner preffer to
> fully inline functions that can be fully inline with little effort.  This
> seems
> to work well to limit size growths when inlining.

Well, the theory might work for functions called once, but I doubt
it works for a function called 3 times and one called 2 times.
For functions called once we have the inline-functions-called-once
stuff, idependent on the fibheap priority driven inlining.

> > fixes the various ICEs of the edge badness update and caching
> > code checking which are now present for over one month.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > A SPEC 2k6 LTO run is also in progress where I hope it will fix
> > the 403.gcc and 435.gromacs builds which fail with the same
> > ICE since a month.
> > 
> > Honza, I guess you have some 12h to come up with a different fix now ;)
> 
> I have patch for maintaining the growth via edge removal/addition hooks for
> some time now that  speeds things up, too, but needs some cleanups (in
> particular it messes up heap updating somewhere and I did not work out where,
> yet).  I think I can do it over weekend, but I am fine if this patch gets into
> tree beforehand. At least we will see how important the global metric still
> is.

SPEC works, shows no regressions (in fact, binary size improves slightly).

I have applied the patch for now.  You can resort to reverting it
with a more proper fix, but there is no reason to keep this broken
during stage3.

Thanks,
Richard.


[Dodji Seketeli] Re: [PATCH] PR c++/51289 - ICE with alias template for bound template

2011-12-07 Thread Dodji Seketeli
Friendly pinging this patch.

--- Begin Message ---
Dodji Seketeli  writes:

> Jason Merrill  writes:
>
>> I guess let's check DECL_ORIGINAL_TYPE instead of TREE_TYPE for alias
>> templates.
>
> Like the below that I am currently bootstrapping?

Finally this is what passed bootstrap and testing on
x86_64-unknown-linux-gnu against trunk.

From: Dodji Seketeli 
Date: Sat, 26 Nov 2011 11:50:43 +0100
Subject: [PATCH] PR c++/51289 - ICE with alias template for bound template
 template parm

gcc/cp/

PR c++/51289
* cp-tree.h (TYPE_TEMPLATE_INFO): Rewrite this accessor macro to
better support aliased types.
(TYPE_ALIAS_P): Don't crash on TYPE_NAME nodes that are not
TYPE_DECL.
* pt.c (find_parameter_packs_r): Handle types aliases.
(push_template_decl_real): Check for bare parameter packs in the
underlying type of an alias template.

gcc/PR51289/gcc/testsuite/

PR c++/51289
* g++.dg/cpp0x/alias-decl-17.C: New test.
---
 gcc/cp/cp-tree.h   |   28 ++--
 gcc/cp/pt.c|   19 ++-
 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C |   21 +
 3 files changed, 57 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f4f408..b821928 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2553,6 +2553,7 @@ extern void decl_shadowed_for_var_insert (tree, tree);
 #define TYPE_ALIAS_P(NODE) \
   (TYPE_P (NODE)   \
&& TYPE_NAME (NODE) \
+   && TREE_CODE (TYPE_NAME (NODE)) == TYPE_DECL\
&& TYPE_DECL_ALIAS_P (TYPE_NAME (NODE)))
 
 /* For a class type: if this structure has many fields, we'll sort them
@@ -2605,17 +2606,24 @@ extern void decl_shadowed_for_var_insert (tree, tree);
   (LANG_TYPE_CLASS_CHECK (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK (NODE)) \
->template_info)
 
-/* Template information for an ENUMERAL_, RECORD_, or UNION_TYPE.  */
+/* Template information for an ENUMERAL_, RECORD_, UNION_TYPE, or
+   BOUND_TEMPLATE_TEMPLATE_PARM type.  Note that if NODE is a
+   specialization of an alias template, this accessor returns the
+   template info for the alias template, not the one (if any) for the
+   template of the underlying type.  */
 #define TYPE_TEMPLATE_INFO(NODE)   \
-  (TREE_CODE (NODE) == ENUMERAL_TYPE   \
-   ? ENUM_TEMPLATE_INFO (NODE) :   \
-   (TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM   \
-? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) :\
-((CLASS_TYPE_P (NODE) && !TYPE_ALIAS_P (NODE)) \
- ? CLASSTYPE_TEMPLATE_INFO (NODE)  \
- : ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\
-   ? (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)))   \
-   : NULL_TREE
+  (TYPE_ALIAS_P (NODE) \
+   ? ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))  \
+  ? DECL_TEMPLATE_INFO (TYPE_NAME (NODE))  \
+  : NULL_TREE) \
+   : ((TREE_CODE (NODE) == ENUMERAL_TYPE)  \
+  ? ENUM_TEMPLATE_INFO (NODE)  \
+  : ((TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM)\
+? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE)  \
+: (CLASS_TYPE_P (NODE) \
+   ? CLASSTYPE_TEMPLATE_INFO (NODE)\
+   : NULL_TREE
+
 
 /* Set the template information for an ENUMERAL_, RECORD_, or
UNION_TYPE to VAL.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4725080..ee3a3ab 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2976,6 +2976,20 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 (struct find_parameter_pack_data*)data;
   bool parameter_pack_p = false;
 
+  /* Handle type aliases/typedefs.  */
+  if (TYPE_P (t)
+  && TYPE_NAME (t)
+  && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL
+  && TYPE_DECL_ALIAS_P (TYPE_NAME (t)))
+{
+  if (TYPE_TEMPLATE_INFO (t))
+   cp_walk_tree (&TYPE_TI_ARGS (t),
+ &find_parameter_packs_r,
+ ppd, ppd->visited);
+  *walk_subtrees = 0;
+  return NULL_TREE;
+}
+
   /* Identify whether this is a parameter pack or not.  */
   switch (TREE_CODE (t))
 {
@@ -4905,7 +4919,10 @@ push_template_decl_real (tree decl, bool is_friend)
   if (check_for_bare_parameter_packs (TYPE_RAISES_EXCEPTIONS (type)))
TYPE_RAISES_EXCEPTIONS (type) = NULL_TREE;
 }
-  else if (check_for_bare_

Re: Ping #1: [Patch,AVR] Light-weight DImode implementation.

2011-12-07 Thread Georg-Johann Lay
Georg-Johann Lay wrote:
> Denis Chertykov wrote:
> 
> The only question that remains is what the -m64 option should be like?
>
> [ ] Omit it altogether
> [ ] Leave it as is (off per default)
> [ ] Set it on per default
>
> As soon as the direction is clear, I'll post a follow-up patch to
> add the missing bits like, e.g., documentation for the new switch.
 I'll leave the decision to Denis, but I'm for omitting it.
>>> I will also defer to Denis, but I'd rather avoid having another option,
>>> if we can. Keep it simple for the users.
> 
> It might also be a hidden option like -morder2 and on per default.
> Such thing is nice for developers to play :-)
> 
>> I'm agree with Richard. I'm for omitting it.
>> Denis.

Here is a combined patch if that is more convenient for review.

The variable avr_have_dimode that was originally set in avr.opt is still there
but always set to true in avr.c. Option -m64 is omitted and thus avr.opt is
unchanged.

Passed without regressions.

Ok?

Johann

gcc/
* config/avr/avr-dimode.md: New file.
* config/avr/avr.md: Include it.
(adjust_len): Add alternatives: plus64, compare64.
(HIDI): Remove code iterator.
(code_stdname): New code attribute.
(rotx, rotsmode): Remove DI from interators.
(rotl3, *rotw, *rotb): Use HISI instead of HIDI
as code iterator.
* config/avr/avr-protos.h (avr_have_dimode): New.
(avr_out_plus64, avr_out_compare64): New.
* config/avr/avr.c (avr_out_compare): Handle DImode.
(avr_have_dimode): New variable definition and initialization.
(avr_out_compare64, avr_out_plus64): New functions.
(avr_out_plus_1): Use simplify_unary_operation to negate xval.
(adjust_insn_length): Handle ADJUST_LEN_COMPARE64, ADJUST_LEN_PLUS64.
(avr_compare_pattern): Skip DImode comparisons.

libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _adddi3, _adddi3_s8,
_subdi3, _cmpdi2, _cmpdi2_s8, _rotldi3.
* config/avr/lib1funcs.S (__adddi3, __adddi3_s8, __subdi3,
__cmpdi2, __cmpdi2_s8, __rotldi3): New functions.
Index: gcc/config/avr/avr-dimode.md
===
--- gcc/config/avr/avr-dimode.md	(revision 0)
+++ gcc/config/avr/avr-dimode.md	(revision 0)
@@ -0,0 +1,283 @@
+;;   Machine description for GNU compiler,
+;;   for Atmel AVR micro controllers.
+;;   Copyright (C) 1998 - 2011
+;;   Free Software Foundation, Inc.
+;;   Contributed by Georg Lay (a...@gjlay.de)
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;;
+
+;; The purpose of this file is to provide a light-weight DImode
+;; implementation for AVR.  The trouble with DImode is that tree -> RTL
+;; lowering leads to really unpleasant code for operations that don't
+;; work byte-wise like NEG, PLUS, MINUS, etc.  Defining optabs entries for
+;; them won't help because the optab machinery assumes these operations
+;; are cheap and does not check if a libgcc implementation is available.
+;;
+;; The DImode insns are all straight forward -- except movdi.  The approach
+;; of this implementation is to provide DImode insns without the burden of
+;; introducing movdi.
+;; 
+;; The caveat is that if there are insns for some mode, there must also be a
+;; respective move insn that describes reloads.  Therefore, this
+;; implementation uses an accumulator-based model with two hard-coded,
+;; accumulator-like registers
+;;
+;;A[] = reg:DI 18
+;;B[] = reg:DI 10
+;;
+;; so that no DImode insn contains pseudos or needs reloading.
+
+(define_constants
+  [(ACC_A	18)
+   (ACC_B	10)])
+
+;;
+;; Addition
+;;
+
+(define_expand "adddi3"
+  [(parallel [(match_operand:DI 0 "general_operand" "")
+  (match_operand:DI 1 "general_operand" "")
+  (match_operand:DI 2 "general_operand" "")])]
+  "avr_have_dimode"
+  {
+rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+emit_move_insn (acc_a, operands[1]);
+
+if (s8_operand (operands[2], VOIDmode))
+  {
+emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
+emit_insn (gen_adddi3_co

Re: [PATCH] Fix PR50823 and its gazillion dups

2011-12-07 Thread Jan Hubicka
Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther  
:




This removes accounting for the number of remaining calls in
the inlining edge badness calculation (as discussed in private
with Honza a long time ago - and yes, we disagreed).  This


Well, the main reason for disagreement was that is makes programs bigger.
The basic idea of including badness in the cost is to make inliner  
preffer to fully inline functions that can be fully inline with little  
effort.  This seems

to work well to limit size growths when inlining.


fixes the various ICEs of the edge badness update and caching
code checking which are now present for over one month.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
A SPEC 2k6 LTO run is also in progress where I hope it will fix
the 403.gcc and 435.gromacs builds which fail with the same
ICE since a month.

Honza, I guess you have some 12h to come up with a different fix now ;)


I have patch for maintaining the growth via edge removal/addition  
hooks for some time now that  speeds things up, too, but needs some  
cleanups (in particular it messes up heap updating somewhere and I did  
not work out where, yet).  I think I can do it over weekend, but I am  
fine if this patch gets into tree beforehand. At least we will see how  
important the global metric still is.


Thanks!
Honza



Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization

2011-12-07 Thread Michael Zolotukhin
Thanks, Richard.
Should somebody else approve the patch or is it ok for commit to trunk?

On 5 December 2011 18:04, Richard Guenther  wrote:
> On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin
>  wrote:
>>> I'd just duplicate the tests you want to change to a larger array
>>> size and change those duplicates accordingly, leaving the original
>>> tests alone.
>> Richard, I made the tests this way - please check them in the attached
>> patch (it happened to be quite big).
>
> Works for me.
>
> Thanks,
> Richard.
>
>>> There is vect_multiple_sizes for such cases.
>> Ira, thanks! This flag would be useful to avoid fails on the original
>> tests when they are compiled with mavx/mavx2 - I'll prepare a patch
>> for this soon.
>>
>> On 5 December 2011 13:10, Richard Guenther  
>> wrote:
>>> On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen  wrote:


 gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM:

> From: Michael Zolotukhin 
> To: Richard Guenther 
> Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com
> Date: 05/12/2011 10:39 AM
> Subject: Re: [Patch] Increase array sizes in vect-tests to enable
> 256-bit vectorization
> Sent by: gcc-patches-ow...@gcc.gnu.org
>
> On 5 December 2011 10:14, Michael Zolotukhin
>  wrote:
> > Ok, will several tests with short arrays be enough for that or should
> > we keep all the original tests plus new ones with longer arrays?
>
> BTW, there is another problem with current tests with short arrays -
> scans are expecting specific number of some diagnostic messages like
> "not vectorized: unsupported unaligned store", and that number would
> be different if several vector-lengths are available - so we'll have
> fails in those tests.

 There is vect_multiple_sizes for such cases.
>>>
>>> I'd just duplicate the tests you want to change to a larger array
>>> size and change those duplicates accordingly, leaving the original
>>> tests alone.
>>>
>>> Richard.
>>
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: [PATCH] _GCC_PICFLAG: use -fPIC for s390x targets

2011-12-07 Thread Rainer Orth
Mike Frysinger  writes:

> Building newer libiberty for s390x targets fails with relocation errors:
>   libiberty/pic/libiberty.a(hashtab.o): In function 'htab_create':
>   libiberty/hashtab.c:408:(.text+0x5e4): relocation truncated to fit:
>   R_390_GOT12 against symbol 'xcalloc' defined in .text section in
>   libiberty/pic/libiberty.a(xmalloc.o)
>   libiberty/pic/libiberty.a(hashtab.o): In function 'htab_try_create':
>   libiberty/hashtab.c:414:(.text+0x61c): relocation truncated to fit:
>   R_390_GOT12 against symbol 'calloc@@GLIBC_2.2' defined in .text
>   section in /lib/libc.so.6
>   collect2: ld returned 1 exit status
>
> Building with larger GOT (-fPIC rather than -fpic) fixes this.
>
> CC: Aurelien Jarno 
> CC: Martin Schwidefsky 
> Signed-off-by: Mike Frysinger 
>
> config/:
> 2011-12-06  Mike Frysinger  
>
>   * picflag.m4 (_GCC_PICFLAG): Set $1 to -fPIC for s390x*-*-*.
>
> gcc/:
> libada/:
> libgcc/:
> libiberty/:
> 2011-12-06  Mike Frysinger  
>
>   * configure: Regenerate.
> ---
>  config/picflag.m4 |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/config/picflag.m4 b/config/picflag.m4
> index f6f1b44..db2ce0f 100644
> --- a/config/picflag.m4
> +++ b/config/picflag.m4
> @@ -51,6 +51,9 @@ case "${$2}" in
>  m68k-*-*)
>   $1=-fpic
>   ;;
> +s390x*-*-*)
> + $1=-fpic
> + ;;

This doesn't match the ChangeLog/description, and certainly needs an
explanation.

>  s390*-*-*)
>   $1=-fpic
>   ;;

Perhaps it's better to remove both s390* cases and use the -fPIC default
everywhere, as does libtool.  picflag.m4 is supposed to be usable
everywhere.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Andrew Stubbs

On 06/12/11 23:07, Richard Henderson wrote:

On 12/06/2011 01:42 PM, Andrew Stubbs wrote:

On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote:

On 12/06/2011 09:59 AM, Andrew Stubbs wrote:

+(define_insn "*one_cmpldi2_neon"
+  [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w")
+(not:DI (match_operand:DI 1 "s_register_operand" " w,  0,  r, w")))]


alternative 0 == alternative 3?


Yes and no. This is an idiom used in several places in neon.md. They
are the same, but only one or other is enabled at the same time (see
the "arch" attribute) so while both 'w' and 'r' options are always
available, the order of preference is different.


Except that without *, I don't think this is what is actually achieved.
Perhaps I'm mistaken about how register preferencing is handled in the
current state of the register allocator...


Well  I can demonstrate that it does work in a simple testcase, and 
it has a visible affect in benchmark figures.


Of course, that doesn't mean it doing it the right way. As far as I 
understand it the current implementation relies on the left-to-right 
priority of the constraints.


On A9, and other A-series parts, we want it to prefer 'w'-constraint 
registers, but still use 'r'-constraint registers if those are more 
convenient.


On A8, we want it to prefer 'r' registers, all else being equal, but 
still have the freedom to use 'w' registers if the values are already there.


From reading the internals manual, it's not clear to me what the '*' 
constraint modifier really means, or how it would work in this case? 
Could you enlighten me?


Andrew