Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Andrew Stubbs

On 06/12/11 23:07, Richard Henderson wrote:

On 12/06/2011 01:42 PM, Andrew Stubbs wrote:

On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote:

On 12/06/2011 09:59 AM, Andrew Stubbs wrote:

+(define_insn *one_cmpldi2_neon
+  [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w)
+(not:DI (match_operand:DI 1 s_register_operand  w,  0,  r, w)))]


alternative 0 == alternative 3?


Yes and no. This is an idiom used in several places in neon.md. They
are the same, but only one or other is enabled at the same time (see
the arch attribute) so while both 'w' and 'r' options are always
available, the order of preference is different.


Except that without *, I don't think this is what is actually achieved.
Perhaps I'm mistaken about how register preferencing is handled in the
current state of the register allocator...


Well  I can demonstrate that it does work in a simple testcase, and 
it has a visible affect in benchmark figures.


Of course, that doesn't mean it doing it the right way. As far as I 
understand it the current implementation relies on the left-to-right 
priority of the constraints.


On A9, and other A-series parts, we want it to prefer 'w'-constraint 
registers, but still use 'r'-constraint registers if those are more 
convenient.


On A8, we want it to prefer 'r' registers, all else being equal, but 
still have the freedom to use 'w' registers if the values are already there.


From reading the internals manual, it's not clear to me what the '*' 
constraint modifier really means, or how it would work in this case? 
Could you enlighten me?


Andrew


Re: [PATCH] _GCC_PICFLAG: use -fPIC for s390x targets

2011-12-07 Thread Rainer Orth
Mike Frysinger vap...@gentoo.org writes:

 Building newer libiberty for s390x targets fails with relocation errors:
   libiberty/pic/libiberty.a(hashtab.o): In function 'htab_create':
   libiberty/hashtab.c:408:(.text+0x5e4): relocation truncated to fit:
   R_390_GOT12 against symbol 'xcalloc' defined in .text section in
   libiberty/pic/libiberty.a(xmalloc.o)
   libiberty/pic/libiberty.a(hashtab.o): In function 'htab_try_create':
   libiberty/hashtab.c:414:(.text+0x61c): relocation truncated to fit:
   R_390_GOT12 against symbol 'calloc@@GLIBC_2.2' defined in .text
   section in /lib/libc.so.6
   collect2: ld returned 1 exit status

 Building with larger GOT (-fPIC rather than -fpic) fixes this.

 CC: Aurelien Jarno aurel...@aurel32.net
 CC: Martin Schwidefsky schwidef...@de.ibm.com
 Signed-off-by: Mike Frysinger vap...@gentoo.org

 config/:
 2011-12-06  Mike Frysinger  vap...@gentoo.org

   * picflag.m4 (_GCC_PICFLAG): Set $1 to -fPIC for s390x*-*-*.

 gcc/:
 libada/:
 libgcc/:
 libiberty/:
 2011-12-06  Mike Frysinger  vap...@gentoo.org

   * configure: Regenerate.
 ---
  config/picflag.m4 |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/config/picflag.m4 b/config/picflag.m4
 index f6f1b44..db2ce0f 100644
 --- a/config/picflag.m4
 +++ b/config/picflag.m4
 @@ -51,6 +51,9 @@ case ${$2} in
  m68k-*-*)
   $1=-fpic
   ;;
 +s390x*-*-*)
 + $1=-fpic
 + ;;

This doesn't match the ChangeLog/description, and certainly needs an
explanation.

  s390*-*-*)
   $1=-fpic
   ;;

Perhaps it's better to remove both s390* cases and use the -fPIC default
everywhere, as does libtool.  picflag.m4 is supposed to be usable
everywhere.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization

2011-12-07 Thread Michael Zolotukhin
Thanks, Richard.
Should somebody else approve the patch or is it ok for commit to trunk?

On 5 December 2011 18:04, Richard Guenther richard.guent...@gmail.com wrote:
 On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.
 Richard, I made the tests this way - please check them in the attached
 patch (it happened to be quite big).

 Works for me.

 Thanks,
 Richard.

 There is vect_multiple_sizes for such cases.
 Ira, thanks! This flag would be useful to avoid fails on the original
 tests when they are compiled with mavx/mavx2 - I'll prepare a patch
 for this soon.

 On 5 December 2011 13:10, Richard Guenther richard.guent...@gmail.com 
 wrote:
 On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen i...@il.ibm.com wrote:


 gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM:

 From: Michael Zolotukhin michael.v.zolotuk...@gmail.com
 To: Richard Guenther richard.guent...@gmail.com
 Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com
 Date: 05/12/2011 10:39 AM
 Subject: Re: [Patch] Increase array sizes in vect-tests to enable
 256-bit vectorization
 Sent by: gcc-patches-ow...@gcc.gnu.org

 On 5 December 2011 10:14, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
  Ok, will several tests with short arrays be enough for that or should
  we keep all the original tests plus new ones with longer arrays?

 BTW, there is another problem with current tests with short arrays -
 scans are expecting specific number of some diagnostic messages like
 not vectorized: unsupported unaligned store, and that number would
 be different if several vector-lengths are available - so we'll have
 fails in those tests.

 There is vect_multiple_sizes for such cases.

 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.

 Richard.



 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


Re: [PATCH] Fix PR50823 and its gazillion dups

2011-12-07 Thread Jan Hubicka
Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther  
rguent...@suse.de:




This removes accounting for the number of remaining calls in
the inlining edge badness calculation (as discussed in private
with Honza a long time ago - and yes, we disagreed).  This


Well, the main reason for disagreement was that is makes programs bigger.
The basic idea of including badness in the cost is to make inliner  
preffer to fully inline functions that can be fully inline with little  
effort.  This seems

to work well to limit size growths when inlining.


fixes the various ICEs of the edge badness update and caching
code checking which are now present for over one month.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
A SPEC 2k6 LTO run is also in progress where I hope it will fix
the 403.gcc and 435.gromacs builds which fail with the same
ICE since a month.

Honza, I guess you have some 12h to come up with a different fix now ;)


I have patch for maintaining the growth via edge removal/addition  
hooks for some time now that  speeds things up, too, but needs some  
cleanups (in particular it messes up heap updating somewhere and I did  
not work out where, yet).  I think I can do it over weekend, but I am  
fine if this patch gets into tree beforehand. At least we will see how  
important the global metric still is.


Thanks!
Honza



Re: Ping #1: [Patch,AVR] Light-weight DImode implementation.

2011-12-07 Thread Georg-Johann Lay
Georg-Johann Lay wrote:
 Denis Chertykov wrote:
 
 The only question that remains is what the -m64 option should be like?

 [ ] Omit it altogether
 [ ] Leave it as is (off per default)
 [ ] Set it on per default

 As soon as the direction is clear, I'll post a follow-up patch to
 add the missing bits like, e.g., documentation for the new switch.
 I'll leave the decision to Denis, but I'm for omitting it.
 I will also defer to Denis, but I'd rather avoid having another option,
 if we can. Keep it simple for the users.
 
 It might also be a hidden option like -morder2 and on per default.
 Such thing is nice for developers to play :-)
 
 I'm agree with Richard. I'm for omitting it.
 Denis.

Here is a combined patch if that is more convenient for review.

The variable avr_have_dimode that was originally set in avr.opt is still there
but always set to true in avr.c. Option -m64 is omitted and thus avr.opt is
unchanged.

Passed without regressions.

Ok?

Johann

gcc/
* config/avr/avr-dimode.md: New file.
* config/avr/avr.md: Include it.
(adjust_len): Add alternatives: plus64, compare64.
(HIDI): Remove code iterator.
(code_stdname): New code attribute.
(rotx, rotsmode): Remove DI from interators.
(rotlmode3, *rotwmode, *rotbmode): Use HISI instead of HIDI
as code iterator.
* config/avr/avr-protos.h (avr_have_dimode): New.
(avr_out_plus64, avr_out_compare64): New.
* config/avr/avr.c (avr_out_compare): Handle DImode.
(avr_have_dimode): New variable definition and initialization.
(avr_out_compare64, avr_out_plus64): New functions.
(avr_out_plus_1): Use simplify_unary_operation to negate xval.
(adjust_insn_length): Handle ADJUST_LEN_COMPARE64, ADJUST_LEN_PLUS64.
(avr_compare_pattern): Skip DImode comparisons.

libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _adddi3, _adddi3_s8,
_subdi3, _cmpdi2, _cmpdi2_s8, _rotldi3.
* config/avr/lib1funcs.S (__adddi3, __adddi3_s8, __subdi3,
__cmpdi2, __cmpdi2_s8, __rotldi3): New functions.
Index: gcc/config/avr/avr-dimode.md
===
--- gcc/config/avr/avr-dimode.md	(revision 0)
+++ gcc/config/avr/avr-dimode.md	(revision 0)
@@ -0,0 +1,283 @@
+;;   Machine description for GNU compiler,
+;;   for Atmel AVR micro controllers.
+;;   Copyright (C) 1998 - 2011
+;;   Free Software Foundation, Inc.
+;;   Contributed by Georg Lay (a...@gjlay.de)
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; http://www.gnu.org/licenses/.
+
+;;
+
+;; The purpose of this file is to provide a light-weight DImode
+;; implementation for AVR.  The trouble with DImode is that tree - RTL
+;; lowering leads to really unpleasant code for operations that don't
+;; work byte-wise like NEG, PLUS, MINUS, etc.  Defining optabs entries for
+;; them won't help because the optab machinery assumes these operations
+;; are cheap and does not check if a libgcc implementation is available.
+;;
+;; The DImode insns are all straight forward -- except movdi.  The approach
+;; of this implementation is to provide DImode insns without the burden of
+;; introducing movdi.
+;; 
+;; The caveat is that if there are insns for some mode, there must also be a
+;; respective move insn that describes reloads.  Therefore, this
+;; implementation uses an accumulator-based model with two hard-coded,
+;; accumulator-like registers
+;;
+;;A[] = reg:DI 18
+;;B[] = reg:DI 10
+;;
+;; so that no DImode insn contains pseudos or needs reloading.
+
+(define_constants
+  [(ACC_A	18)
+   (ACC_B	10)])
+
+;;
+;; Addition
+;;
+
+(define_expand adddi3
+  [(parallel [(match_operand:DI 0 general_operand )
+  (match_operand:DI 1 general_operand )
+  (match_operand:DI 2 general_operand )])]
+  avr_have_dimode
+  {
+rtx acc_a = gen_rtx_REG (DImode, ACC_A);
+
+emit_move_insn (acc_a, operands[1]);
+
+if (s8_operand (operands[2], VOIDmode))
+  {
+emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]);
+emit_insn (gen_adddi3_const8_insn ());
+  }
+else if (CONST_INT_P 

[Dodji Seketeli] Re: [PATCH] PR c++/51289 - ICE with alias template for bound template

2011-12-07 Thread Dodji Seketeli
Friendly pinging this patch.

---BeginMessage---
Dodji Seketeli do...@redhat.com writes:

 Jason Merrill ja...@redhat.com writes:

 I guess let's check DECL_ORIGINAL_TYPE instead of TREE_TYPE for alias
 templates.

 Like the below that I am currently bootstrapping?

Finally this is what passed bootstrap and testing on
x86_64-unknown-linux-gnu against trunk.

From: Dodji Seketeli do...@redhat.com
Date: Sat, 26 Nov 2011 11:50:43 +0100
Subject: [PATCH] PR c++/51289 - ICE with alias template for bound template
 template parm

gcc/cp/

PR c++/51289
* cp-tree.h (TYPE_TEMPLATE_INFO): Rewrite this accessor macro to
better support aliased types.
(TYPE_ALIAS_P): Don't crash on TYPE_NAME nodes that are not
TYPE_DECL.
* pt.c (find_parameter_packs_r): Handle types aliases.
(push_template_decl_real): Check for bare parameter packs in the
underlying type of an alias template.

gcc/PR51289/gcc/testsuite/

PR c++/51289
* g++.dg/cpp0x/alias-decl-17.C: New test.
---
 gcc/cp/cp-tree.h   |   28 ++--
 gcc/cp/pt.c|   19 ++-
 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C |   21 +
 3 files changed, 57 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3f4f408..b821928 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2553,6 +2553,7 @@ extern void decl_shadowed_for_var_insert (tree, tree);
 #define TYPE_ALIAS_P(NODE) \
   (TYPE_P (NODE)   \
 TYPE_NAME (NODE) \
+TREE_CODE (TYPE_NAME (NODE)) == TYPE_DECL\
 TYPE_DECL_ALIAS_P (TYPE_NAME (NODE)))
 
 /* For a class type: if this structure has many fields, we'll sort them
@@ -2605,17 +2606,24 @@ extern void decl_shadowed_for_var_insert (tree, tree);
   (LANG_TYPE_CLASS_CHECK (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK (NODE)) \
-template_info)
 
-/* Template information for an ENUMERAL_, RECORD_, or UNION_TYPE.  */
+/* Template information for an ENUMERAL_, RECORD_, UNION_TYPE, or
+   BOUND_TEMPLATE_TEMPLATE_PARM type.  Note that if NODE is a
+   specialization of an alias template, this accessor returns the
+   template info for the alias template, not the one (if any) for the
+   template of the underlying type.  */
 #define TYPE_TEMPLATE_INFO(NODE)   \
-  (TREE_CODE (NODE) == ENUMERAL_TYPE   \
-   ? ENUM_TEMPLATE_INFO (NODE) :   \
-   (TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM   \
-? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) :\
-((CLASS_TYPE_P (NODE)  !TYPE_ALIAS_P (NODE)) \
- ? CLASSTYPE_TEMPLATE_INFO (NODE)  \
- : ((TYPE_NAME (NODE)  DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\
-   ? (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)))   \
-   : NULL_TREE
+  (TYPE_ALIAS_P (NODE) \
+   ? ((TYPE_NAME (NODE)  DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))  \
+  ? DECL_TEMPLATE_INFO (TYPE_NAME (NODE))  \
+  : NULL_TREE) \
+   : ((TREE_CODE (NODE) == ENUMERAL_TYPE)  \
+  ? ENUM_TEMPLATE_INFO (NODE)  \
+  : ((TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM)\
+? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE)  \
+: (CLASS_TYPE_P (NODE) \
+   ? CLASSTYPE_TEMPLATE_INFO (NODE)\
+   : NULL_TREE
+
 
 /* Set the template information for an ENUMERAL_, RECORD_, or
UNION_TYPE to VAL.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4725080..ee3a3ab 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2976,6 +2976,20 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 (struct find_parameter_pack_data*)data;
   bool parameter_pack_p = false;
 
+  /* Handle type aliases/typedefs.  */
+  if (TYPE_P (t)
+   TYPE_NAME (t)
+   TREE_CODE (TYPE_NAME (t)) == TYPE_DECL
+   TYPE_DECL_ALIAS_P (TYPE_NAME (t)))
+{
+  if (TYPE_TEMPLATE_INFO (t))
+   cp_walk_tree (TYPE_TI_ARGS (t),
+ find_parameter_packs_r,
+ ppd, ppd-visited);
+  *walk_subtrees = 0;
+  return NULL_TREE;
+}
+
   /* Identify whether this is a parameter pack or not.  */
   switch (TREE_CODE (t))
 {
@@ -4905,7 +4919,10 @@ push_template_decl_real (tree decl, bool is_friend)
   if (check_for_bare_parameter_packs (TYPE_RAISES_EXCEPTIONS (type)))
TYPE_RAISES_EXCEPTIONS (type) = NULL_TREE;
 }
-  else if 

Re: [PATCH] Fix PR50823 and its gazillion dups

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Jan Hubicka wrote:

 Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther
 rguent...@suse.de:
 
  
  This removes accounting for the number of remaining calls in
  the inlining edge badness calculation (as discussed in private
  with Honza a long time ago - and yes, we disagreed).  This
 
 Well, the main reason for disagreement was that is makes programs bigger.
 The basic idea of including badness in the cost is to make inliner preffer to
 fully inline functions that can be fully inline with little effort.  This
 seems
 to work well to limit size growths when inlining.

Well, the theory might work for functions called once, but I doubt
it works for a function called 3 times and one called 2 times.
For functions called once we have the inline-functions-called-once
stuff, idependent on the fibheap priority driven inlining.

  fixes the various ICEs of the edge badness update and caching
  code checking which are now present for over one month.
  
  Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
  A SPEC 2k6 LTO run is also in progress where I hope it will fix
  the 403.gcc and 435.gromacs builds which fail with the same
  ICE since a month.
  
  Honza, I guess you have some 12h to come up with a different fix now ;)
 
 I have patch for maintaining the growth via edge removal/addition hooks for
 some time now that  speeds things up, too, but needs some cleanups (in
 particular it messes up heap updating somewhere and I did not work out where,
 yet).  I think I can do it over weekend, but I am fine if this patch gets into
 tree beforehand. At least we will see how important the global metric still
 is.

SPEC works, shows no regressions (in fact, binary size improves slightly).

I have applied the patch for now.  You can resort to reverting it
with a more proper fix, but there is no reason to keep this broken
during stage3.

Thanks,
Richard.


[testsuite]: Move pr45830.c to gcc.dg/torture and make it pass for AVR

2011-12-07 Thread Georg-Johann Lay
In principle, test case pr45830.c works for target avr, but there is an issue
with the -ftree-switch-conversion optimization activated at higher optimization
levels: It transforms code size into .data usage and thus exceeds AVRs' RAM
size because of big CSWTCH lookup tables located in RAM.

The patch moves the test case from gcc.c-torture/execute/ to more modern
gcc.dg/torture and adds magic dg-comments as needed.  Patch lightly tested with
my avr-tools.

Ok for trunk?

Johann

PR tree-optimization/45830
* gcc.c-torture/execute/pr45830.c: Move file from here to...
* gcc.dg/torture/pr45830.c: ...here.  Add dg-do and
dg-additional-options magic.
Index: gcc.c-torture/execute/pr45830.c
===
--- gcc.c-torture/execute/pr45830.c	(revision 182043)
+++ gcc.c-torture/execute/pr45830.c	(working copy)
@@ -1,97 +0,0 @@
-/* PR tree-optimization/45830 */
-
-extern void abort (void);
-
-long long va, vb, vc, vd, ve;
-
-__attribute__((noinline)) int
-foo (int x)
-{
-  long long a, b, c, d, e;
-  switch (x)
-{
-case 0:
-case 3:
-case 1:
-case 2:
-case 4:
-  a = 1;
-  b = 129;
-  c = -12;
-  d = -4;
-  e = 128;
-  break;
-case 23:
-case 26:
-case 19:
-case 65:
-case 5:
-  a = 2;
-  b = 138;
-  c = 115;
-  d = 128;
-  e = -16;
-  break;
-case 21:
-case 20:
-case 22:
-case 38:
-case 27:
-case 66:
-case 45:
-case 47:
-  a = 3;
-  b = 6;
-  c = 127;
-  d = 25;
-  e = 257;
-  break;
-default:
-  a = 0;
-  b = 18;
-  c = 0;
-  d = 64;
-  e = 32768L;
-  break;
-}
-  va = a;
-  vb = b;
-  vc = c;
-  vd = d;
-  ve = e;
-}
-
-int
-bar (int x)
-{
-  if (x  0)
-return 3;
-  if (x  5)
-return 0;
-  if (x == 5 || x == 19 || x == 23 | x == 26 || x == 65)
-return 1;
-  if ((x = 20  x = 22) || x == 27 || x == 38
-  || x == 45 || x == 47 || x == 66)
-return 2;
-  return 3;
-}
-
-long long expected[] =
-{ 1, 129, -12, -4, 128, 2, 138, 115, 128, -16,
-  3, 6, 127, 25, 257, 0, 18, 0, 64, 32768L };
-
-int
-main (void)
-{
-  int i, v;
-  for (i = -4; i  70; i++)
-{
-  foo (i);
-  v = bar (i);
-  if (va != expected[5 * v] || vb != expected[5 * v + 1]
-	  || vc != expected[5 * v + 2] || vd != expected[5 * v + 3]
-	  || ve != expected[5 * v + 4])
-	abort ();
-}
-  return 0;
-}
Index: gcc.dg/torture/pr45830.c
===
--- gcc.dg/torture/pr45830.c	(revision 182043)
+++ gcc.dg/torture/pr45830.c	(working copy)
@@ -1,4 +1,6 @@
 /* PR tree-optimization/45830 */
+/* { dg-do run } */
+/* { dg-additional-options -fno-tree-switch-conversion { target avr-*-* } } */
 
 extern void abort (void);
 


Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 11:27 AM, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 Thanks, Richard.
 Should somebody else approve the patch or is it ok for commit to trunk?

It's ok to commit.

Richard.

 On 5 December 2011 18:04, Richard Guenther richard.guent...@gmail.com wrote:
 On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.
 Richard, I made the tests this way - please check them in the attached
 patch (it happened to be quite big).

 Works for me.

 Thanks,
 Richard.

 There is vect_multiple_sizes for such cases.
 Ira, thanks! This flag would be useful to avoid fails on the original
 tests when they are compiled with mavx/mavx2 - I'll prepare a patch
 for this soon.

 On 5 December 2011 13:10, Richard Guenther richard.guent...@gmail.com 
 wrote:
 On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen i...@il.ibm.com wrote:


 gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM:

 From: Michael Zolotukhin michael.v.zolotuk...@gmail.com
 To: Richard Guenther richard.guent...@gmail.com
 Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com
 Date: 05/12/2011 10:39 AM
 Subject: Re: [Patch] Increase array sizes in vect-tests to enable
 256-bit vectorization
 Sent by: gcc-patches-ow...@gcc.gnu.org

 On 5 December 2011 10:14, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
  Ok, will several tests with short arrays be enough for that or should
  we keep all the original tests plus new ones with longer arrays?

 BTW, there is another problem with current tests with short arrays -
 scans are expecting specific number of some diagnostic messages like
 not vectorized: unsupported unaligned store, and that number would
 be different if several vector-lengths are available - so we'll have
 fails in those tests.

 There is vect_multiple_sizes for such cases.

 I'd just duplicate the tests you want to change to a larger array
 size and change those duplicates accordingly, leaving the original
 tests alone.

 Richard.



 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.



 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Jakub Jelinek
On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
 This is a backport for two upstream patches into our 4.6 branch.
 I submitted the first patch by Julian a while ago for backport but
 Richard Earnshaw pointed out a problem with the first patch.  The second
 patch from Joey fixes that problem.  This was tested on x86 and ARM.

Why hasn't this been proposed for upstream 4.6 instead?
See PR51442.

 2011-11-22   Doug Kwan  dougk...@google.com
 
   Backport r171347 and r181549 from trunk.
 
   gcc/
   2011-03-23  Julian Brown  jul...@codesourcery.com
 
   * expr.c (expand_expr_real_1): Only use BLKmode for volatile
   accesses which are not naturally aligned.
   
   2011-11-20  Joey Ye  joey...@arm.com
 
   * expr.c (expand_expr_real_1): Correctly handle strict volatile
   bitfield loads smaller than mode size.
 
   gcc/testsuite/
   2011-11-20  Joey Ye  joey...@arm.com
 
   * gcc.dg/volatile-bitfields-1.c: New.

Jakub


Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Eric Botcazou
 You are basically (but non-optimally) locally re-implementing
 what expr.c:get_object_or_type_alignment does, for the
 bare MEM_REF case (you don't consider offsets that
 do not change the alignment, nor alignment information
 present on the SSA name).

Gosh.  And now I distinctly remember suggesting that the function be moved to 
the same file as the others...

 Note that in expr.c gete_object_or_type_alignment is only
 called on [TARGET_]MEM_REFs (I didn't export that function
 exactly because of its restrictions, if we want to export it
 we should assert it is only called for [TARGET_]MEM_REFs).

OK, let's export it, move it to builtins.c and and add the assert.

 Thus, apart from the exp_align computation the patch looks ok
 (it's only non-optimal, not incorrect).

I'll post an adjusted version.  Thanks for the review.

-- 
Eric Botcazou


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
     This is a backport for two upstream patches into our 4.6 branch.
 I submitted the first patch by Julian a while ago for backport but
 Richard Earnshaw pointed out a problem with the first patch.  The second
 patch from Joey fixes that problem.  This was tested on x86 and ARM.

 Why hasn't this been proposed for upstream 4.6 instead?
 See PR51442.

It's indeed annoying to see arm related backports only going to
vendor branches, not the last officially maintained GCC release
branch (4.6).  I keep getting local requests to include random
patches to our 4.6 build to make arm work at all.  It seriously
seems like having arm-linux-gnueabi as a primary target is a lie to our users.

Arm maintainers - please consider maintaining at least the current
release series and shift focus away from your vendor branches.

Thanks,
Richard.

 2011-11-22   Doug Kwan  dougk...@google.com

       Backport r171347 and r181549 from trunk.

       gcc/
       2011-03-23  Julian Brown  jul...@codesourcery.com

               * expr.c (expand_expr_real_1): Only use BLKmode for volatile
               accesses which are not naturally aligned.

       2011-11-20  Joey Ye  joey...@arm.com

               * expr.c (expand_expr_real_1): Correctly handle strict volatile
               bitfield loads smaller than mode size.

       gcc/testsuite/
       2011-11-20  Joey Ye  joey...@arm.com

               * gcc.dg/volatile-bitfields-1.c: New.

        Jakub


Re: [PATCH] Remove dead labels to increase superblock scope

2011-12-07 Thread Tom de Vries
On 06/12/11 14:25, Michael Matz wrote:
 Hi,
 
 On Tue, 6 Dec 2011, Tom de Vries wrote:
 
 what should be the 'next' returned by delete_insn?
 
 There are exactly two calls of delete_insn that take the return value.  
 One (delete_insn_and_edges) just uses it to return it itself (and there 
 are no calls to delete_insn_and_edges that use the returned value), the 
 other (delete_insn_chain) wants to have the original next insn (before 
 reordering), even if that means that it can see the insn twice (once as 
 label, once as note, the latter would be skipped).
 
 So, return the original next one.  Even better would be to somehow clean 
 up the single last use of the return value in delete_insn_chain, and make 
 delete_insn return nothing.
 

I did that now. The only problem I ran into was an assert in remove_insn:
...
  if (BB_HEAD (bb) == insn)
{
  /* Never ever delete the basic block note without deleting whole
 basic block.  */
  gcc_assert (!NOTE_P (insn));
  BB_HEAD (bb) = next;
}
...

The problematic case was a removing a basic_block using delete_insn_chain:
...
  (code_label 33 26 34 4 1  [0 uses])

  (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

  (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG)
...

Normally, first the code_label was replaced by a note, then the BASIC_BLOCK note
was removed. The assert would not trigger while removing this note because the
note was not BB_HEAD.

With the fixup, when removing the BASIC_BLOCK note, the note would be at the
head and the assert would trigger:
...
  (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

  (note 33 26 34 4 1  NOTE_INSN_DELETED_LABEL)

  (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG)
...

I solved this by removing backwards rather than forwards in delete_insn_chain.

Bootstrapped and reg-tested on x86_64.

OK for next stage1?

Thanks,
- Tom

2011-12-07  Tom de Vries  t...@codesourcery.com

* cfgcleanup.c (try_optimize_cfg): Replace call to delete_isns_chain by
call to delete_insn.  Remove code to reorder BASIC_BLOCK note and
DELETED_LABEL note, and move it to ...
* cfg_rtl.c (delete_insn):  Change return type to void.
(delete_insn_and_edges): Likewise.
(delete_insn_chain): Handle new return type of delete_insn.  Delete
chain backwards rather than forwards.
* rtl.h (delete_insn, delete_insn_and_edges): Change return type to
void.
* cfglayout.c (fixup_reorder_chain): Delete unused label.

* gcc.dg/superblock.c: New test.
Index: gcc/cfgcleanup.c
===
--- gcc/cfgcleanup.c (revision 181652)
+++ gcc/cfgcleanup.c (working copy)
@@ -2632,20 +2632,7 @@ try_optimize_cfg (int mode)
 		  || ! label_is_jump_target_p (BB_HEAD (b),
 		   BB_END (single_pred (b)
 		{
-		  rtx label = BB_HEAD (b);
-
-		  delete_insn_chain (label, label, false);
-		  /* If the case label is undeletable, move it after the
-		 BASIC_BLOCK note.  */
-		  if (NOTE_KIND (BB_HEAD (b)) == NOTE_INSN_DELETED_LABEL)
-		{
-		  rtx bb_note = NEXT_INSN (BB_HEAD (b));
-
-		  reorder_insns_nobb (label, label, bb_note);
-		  BB_HEAD (b) = bb_note;
-		  if (BB_END (b) == bb_note)
-			BB_END (b) = label;
-		}
+		  delete_insn (BB_HEAD (b));
 		  if (dump_file)
 		fprintf (dump_file, Deleted label in block %i.\n,
 			 b-index);
Index: gcc/cfglayout.c
===
--- gcc/cfglayout.c (revision 181652)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,9 @@ fixup_reorder_chain (void)
    (e_taken-src, e_taken-dest));
 		  e_taken-flags |= EDGE_FALLTHRU;
 		  update_br_prob_note (bb);
+		  if (LABEL_NUSES (ret_label) == 0
+		   single_pred_p (e_taken-dest))
+		delete_insn (ret_label);
 		  continue;
 		}
 	}
Index: gcc/rtl.h
===
--- gcc/rtl.h (revision 181652)
+++ gcc/rtl.h (working copy)
@@ -2460,12 +2460,12 @@ extern void add_insn_before (rtx, rtx, s
 extern void add_insn_after (rtx, rtx, struct basic_block_def *);
 extern void remove_insn (rtx);
 extern rtx emit (rtx);
-extern rtx delete_insn (rtx);
+extern void delete_insn (rtx);
 extern rtx entry_of_function (void);
 extern void emit_insn_at_entry (rtx);
 extern void delete_insn_chain (rtx, rtx, bool);
 extern rtx unlink_insn_chain (rtx, rtx);
-extern rtx delete_insn_and_edges (rtx);
+extern void delete_insn_and_edges (rtx);
 extern rtx gen_lowpart_SUBREG (enum machine_mode, rtx);
 extern rtx gen_const_mem (enum machine_mode, rtx);
 extern rtx gen_frame_mem (enum machine_mode, rtx);
Index: gcc/cfgrtl.c
===
--- gcc/cfgrtl.c (revision 181652)
+++ gcc/cfgrtl.c (working copy)
@@ -111,12 +111,11 @@ can_delete_label_p (const_rtx label)
 	   !in_expr_list_p (forced_labels, label));
 }
 
-/* Delete INSN by patching it 

Re: [coverage] Some restructuring

2011-12-07 Thread Markus Trippelsdorf
On 2011.12.04 at 18:35 +, Nathan Sidwell wrote:
 I've committed this patch to break apart the gcov finalization routines, I 
 believe this will make it easier to fix the problem shown up by bug 51113 -- 
 although this patch does not.  Notable changes:
 
 * rename coverage_begin_output to coverage_begin_function for consistency 
 with 
 coverage_end_function
 
 * Only call it once from the profile code -- I suspect in the past it was 
 called 
 from several locations where it wasn't possible to statically determine which 
 would be the first call.  Now it's obvious.
 
 * Move the opening of the notes file to the coverage initialization.  This 
 does 
 mean that we'll always create a notes file when generating coverage data, 
 even 
 if there are no functions in the resultant object file.  I don't think that's 
 unsurprising -- and will stop gcov itself complaining about a missing notes 
 file 
 in this case.
 
 * Replace the trailing array of the gcov_info object with a pointer to an 
 array. 
   This doesn't require changes to libgcov.c's source but will of course 
 change 
 the resultant object file it compiles to and constitutes an ABI change.  This 
 change allows the creation of the gcov_info data type without knowing the 
 number 
 of functions being instrumented.
 
 tested on i686-pc-linux-gnu with profiledbootstrap.

Nathan,

this patch breaks profiled build of tramp3d:

 % c++ -w -Ofast -fprofile-generate -march=native tramp3d-v4.cpp
/tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined 
reference to 
'__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv'
/tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined 
reference to 
'__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv'
...

see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51449


-- 
Markus


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Earnshaw
On 07/12/11 13:02, Richard Guenther wrote:
 On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
 This is a backport for two upstream patches into our 4.6 branch.
 I submitted the first patch by Julian a while ago for backport but
 Richard Earnshaw pointed out a problem with the first patch.  The second
 patch from Joey fixes that problem.  This was tested on x86 and ARM.

 Why hasn't this been proposed for upstream 4.6 instead?
 See PR51442.
 
 It's indeed annoying to see arm related backports only going to
 vendor branches, not the last officially maintained GCC release
 branch (4.6).  I keep getting local requests to include random
 patches to our 4.6 build to make arm work at all.  It seriously
 seems like having arm-linux-gnueabi as a primary target is a lie to our users.
 
 Arm maintainers - please consider maintaining at least the current
 release series and shift focus away from your vendor branches.
 

So this, to some extent seems to conflict with your rules for only fixing
regressions.  This code has always been broken in one way or another,
so technically this doesn't qualify for the 4.6 branch.

I think we need clearer rules (on the web site, not in a mailing list post)
that describes which patches are considered acceptable on the branch and
which are not.

R.



[PATCH] Testcase for PR48100

2011-12-07 Thread Richard Guenther

Which got fixed on trunk.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-12-07  Richard Guenther  rguent...@suse.de

PR lto/48100
* gcc.dg/lto/20111207-1_0.c: New testcase.
* gcc.dg/lto/20111207-1_1.c: Likewise.
* gcc.dg/lto/20111207-1_2.c: Likewise.
* gcc.dg/lto/20111207-1_3.c: Likewise.

Index: gcc/testsuite/gcc.dg/lto/20111207-1_0.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0)
@@ -0,0 +1,4 @@
+/* { dg-lto-do run } */
+/* { dg-lto-options { { -flto } } } */
+/* { dg-require-linker-plugin  } */
+/* { dg-extra-ld-options -fuse-linker-plugin } */
Index: gcc/testsuite/gcc.dg/lto/20111207-1_1.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0)
@@ -0,0 +1,3 @@
+/* { dg-options -fno-lto } */
+
+int i;
Index: gcc/testsuite/gcc.dg/lto/20111207-1_2.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0)
@@ -0,0 +1 @@
+int i;
Index: gcc/testsuite/gcc.dg/lto/20111207-1_3.c
===
--- gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0)
@@ -0,0 +1,6 @@
+extern int i;
+
+int main()
+{
+  return i;
+}


Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings

2011-12-07 Thread Tobias Burnus

** PING **

On 11/29/2011 07:35 PM, Tobias Burnus wrote:
gfortran had an ICE when trying to insert a check whether the 
character length between actual and dummy argument matches. This check 
is pointless for deferred-length character arguments - thus, no bounds 
check should be generated.


Build and regtested on x86-64-linux.
OK for the trunk?

Tobias




Re: [Patch, Fortran] PR50923 - [4.4-4.7] Print warning function does not return a value

2011-12-07 Thread Tobias Burnus

** PING **

On 11/30/2011 07:19 PM, Tobias Burnus wrote:
The bug has been introduced when changing the warning system to do 
more in the front end. The problem is that for:

  module m
  contains
function f()
end
  end
the sym-attr.referenced gets set - and no warning is printed. I now 
ignore the sym-attr.referenced attribute as the RESULT == NULL_TREE 
should only be reached if the variable has not been assigned a value.


Additionally, I now only set TREE_NO_WARNING() if a warning has been 
printed; that way, some other useful middle-end warnings might be 
still printed. I have to set the NO_WARNING also for sym != 
sym-result as it otherwise prints a middle end warning into addition 
to the FE warning. But the FE warning has been printed already before 
for the result variable.


Build and regtested on x86-64-linux.
OK for the trunk? How far should this be backported - all the way down 
to 4.4?


Tobias




Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX

2011-12-07 Thread Tobias Burnus

* ping * ?

On 12/04/2011 06:46 PM, Tobias Burnus wrote:

Hi all,

as Dominique has found, Fortran 2008 allows the BOZ edit descriptors 
now also with REAL and COMPLEX arguments. (See PR for quotes from the 
standard.)


Build and regtested on x86-64-linux.
OK for the trunk?

Tobias

PS: Thank you, Mikael, for reviewing my ASSOCIATE patch!




Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 2:32 PM, Richard Earnshaw rearn...@arm.com wrote:
 On 07/12/11 13:02, Richard Guenther wrote:
 On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote:
     This is a backport for two upstream patches into our 4.6 branch.
 I submitted the first patch by Julian a while ago for backport but
 Richard Earnshaw pointed out a problem with the first patch.  The second
 patch from Joey fixes that problem.  This was tested on x86 and ARM.

 Why hasn't this been proposed for upstream 4.6 instead?
 See PR51442.

 It's indeed annoying to see arm related backports only going to
 vendor branches, not the last officially maintained GCC release
 branch (4.6).  I keep getting local requests to include random
 patches to our 4.6 build to make arm work at all.  It seriously
 seems like having arm-linux-gnueabi as a primary target is a lie to our 
 users.

 Arm maintainers - please consider maintaining at least the current
 release series and shift focus away from your vendor branches.


 So this, to some extent seems to conflict with your rules for only fixing
 regressions.  This code has always been broken in one way or another,
 so technically this doesn't qualify for the 4.6 branch.

 I think we need clearer rules (on the web site, not in a mailing list post)
 that describes which patches are considered acceptable on the branch and
 which are not.

We generally accept wrong-code fixes (or rejects-valid) that are
low-risk.  Target maintainers have complete freedom for their
targets provided a fix is really target-only (we even accept new
ports or new ISAs on branches, well - in theory at least).

If a maintainer thinks backporting a fix is important he can always
defer to a release manager for a final decision.

What we generally do not want is new middle-end functionality.
And we generally raise the barrier for low-risk the more mature
a branch is.

As a general rule, if you'd point users to use the arm-embedded
branch for bugreports you get, then you are doing sth wrong.
If you say, use the arm-embedded branch to get smaller/faster
code - well, that's ok.  Pointing people to the latest official
release series (in this case 4.6.x) is also ok, we're keeping too
many branches active IMNSHO.

Richard.

 R.



Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Andrew Stubbs

On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:

So it looks like the code generated for core registers with thumb2 is
pretty rubbish (no real surprise there -- to get the best code you need
to make use of the fact that on ARM a shift by a small negative number
(  -128) will give zero.  This gives us sequences like:

For ARM state it's something like (untested)

@ shft  32  , shft= 32
__ashldi3_v3:
sub r3, r2, #32 @ -ve   , shft 
- 32
lsl ah, ah, r2  @ ah  shft  , 0
rsb ip, r2, #32 @ 32 - shft , -ve
orr ah, ah, al, lsl r3  @ ah  shft  , al  shft 
- 32
orr ah, ah, al, lsr ip  @ ah  shft | al  32 - shft  , 
al  shft - 32
lsl al, al, r2  @ al  shft  , 0

For Thumb2 (where there is no orr with register shift)

lslsah, ah, r2  @ ah  shft  , 0
sub r3, r2, #32 @ -ve   , shft 
- 32
lsl ip, al, r3  @ 0 , al  
shft - 32
negsr3, r3  @ 32 - shft , -ve
orr ah, ah, ip  @ ah  shft  , al  shft 
- 32
lsr r3, al, r3  @ al  32 - shft , 0
orrsah, ah, r3  @ ah  shft | al  32 - shft  , 
al  shft - 32
lslsal, al, r2  @ al  shft  , 0

Neither of which needs the condition flags during execution (and indeed
is probably better in both cases than the code currently in lib1funcs.asm
for a modern core).  The flag clobbering behaviour in the thumb2 variant
is only for code size saving; that would normally be added by a late
optimization pass.


OK, those are interesting, and I can look into making it happen, with or 
without NEON.


Would it not require an unspec to prevent 'clever things' happening to 
the negative shift, if we were to encode these in the machine 
description? I'm not too clear on what these 'clever things' might be in 
the case of shift-by-register (presumably value-range propagation is 
one), but I know the NEON shifts are encoded this way for safety.



None of this directly helps with your neon usage, but it does show that we
really don't need to clobber the condition code register to get an
efficient sequence.


Except that it doesn't in the case of a shift by one where there is a 
two-instruction sequence that clobbers CC. Presumably this special case 
can be treated differently though, right from expand.


Andrew


RFA: Fix PR middle-end/40154

2011-12-07 Thread Joern Rennecke

Having a value in the wrong mode is not the only hazard that users of
set_unique_reg_note should anticipate; it could also be that the value
is computed as a constant by the compiler, and thus the last instruction
obtained with get_last_insn is completely (or somewhat) unrelated; or an
expander could decide to save / restore some hard register around the
actual operation.
So, in general, the test should be if the candidate insn actually sets the
expected target.  There are already a number of users of set_unique_reg_note
that do that.  Others assume that a move, add or other expander emits a
single insn, or a sequence of insns that ends with a single insn, which
sets the target.  Although that is often the case, this is not a safe
assumption to make.  Fixing all these places with inlined code would
mean a lot of cut  paste programming.  So instead, I've added a new
wrapper for set_unique_reg_note to be used in these circumstances, and
also used it to replace the existing inlined checks unless something special
was going on (e.g. testing two different target / value pairs).

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
2011-12-07  Joern Rennecke  joern.renne...@embecosm.com

PR middle-end/40154
* emit-rtl.c (set_dst_reg_note): New function.
* rtl.h (set_dst_reg_note): Declare.
* optabs.c (expand_binop, expand_absneg_bit): Use set_dst_reg_note.
(emit_libcall_block, expand_fix): Likewise.
* function.c (assign_parm_setup_reg, expand_function_start): Likewise.
* expmed.c (expand_mult_const, expand_divmod): Likewise.
* reload1.c (gen_reload): Likewise.

Index: optabs.c
===
--- optabs.c(revision 182075)
+++ optabs.c(working copy)
@@ -2052,11 +2052,11 @@ expand_binop (enum machine_mode mode, op
{
  rtx temp = emit_move_insn (target, xtarget);
 
- set_unique_reg_note (temp,
-  REG_EQUAL,
-  gen_rtx_fmt_ee (binoptab-code, mode,
-  copy_rtx (xop0),
-  copy_rtx (xop1)));
+ set_dst_reg_note (temp, REG_EQUAL,
+   gen_rtx_fmt_ee (binoptab-code, mode,
+   copy_rtx (xop0),
+   copy_rtx (xop1)),
+   target);
}
  else
target = xtarget;
@@ -2104,11 +2104,12 @@ expand_binop (enum machine_mode mode, op
  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
{
  temp = emit_move_insn (target ? target : product, product);
- set_unique_reg_note (temp,
-  REG_EQUAL,
-  gen_rtx_fmt_ee (MULT, mode,
-  copy_rtx (op0),
-  copy_rtx (op1)));
+ set_dst_reg_note (temp,
+   REG_EQUAL,
+   gen_rtx_fmt_ee (MULT, mode,
+   copy_rtx (op0),
+   copy_rtx (op1)),
+   target ? target : product);
}
  return product;
}
@@ -2966,8 +2967,9 @@ expand_absneg_bit (enum rtx_code code, e
   gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
   target = lowpart_subreg_maybe_copy (mode, temp, imode);
 
-  set_unique_reg_note (get_last_insn (), REG_EQUAL,
-  gen_rtx_fmt_e (code, mode, copy_rtx (op0)));
+  set_dst_reg_note (get_last_insn (), REG_EQUAL,
+   gen_rtx_fmt_e (code, mode, copy_rtx (op0)),
+   target);
 }
 
   return target;
@@ -3899,8 +3901,7 @@ emit_libcall_block (rtx insns, rtx targe
 }
 
   last = emit_move_insn (target, result);
-  if (optab_handler (mov_optab, GET_MODE (target)) != CODE_FOR_nothing)
-set_unique_reg_note (last, REG_EQUAL, copy_rtx (equiv));
+  set_dst_reg_note (last, REG_EQUAL, copy_rtx (equiv), target);
 
   if (final_dest != target)
 emit_move_insn (final_dest, target);
@@ -5213,11 +5214,10 @@ expand_fix (rtx to, rtx from, int unsign
{
  /* Make a place for a REG_NOTE and add it.  */
  insn = emit_move_insn (to, to);
- set_unique_reg_note (insn,
-  REG_EQUAL,
-  gen_rtx_fmt_e (UNSIGNED_FIX,
- GET_MODE (to),
- copy_rtx (from)));
+ set_dst_reg_note (insn, REG_EQUAL,
+   gen_rtx_fmt_e (UNSIGNED_FIX, GET_MODE (to),
+   

Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Richard Earnshaw
On 07/12/11 14:03, Andrew Stubbs wrote:
 On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote:
 So it looks like the code generated for core registers with thumb2 is
 pretty rubbish (no real surprise there -- to get the best code you need
 to make use of the fact that on ARM a shift by a small negative number
 (  -128) will give zero.  This gives us sequences like:

 For ARM state it's something like (untested)

  @ shft  32 , 
 shft= 32
 __ashldi3_v3:
  sub r3, r2, #32 @ -ve   , shft 
 - 32
  lsl ah, ah, r2  @ ah  shft, 0
  rsb ip, r2, #32 @ 32 - shft , -ve
  orr ah, ah, al, lsl r3  @ ah  shft, al  
 shft - 32
  orr ah, ah, al, lsr ip  @ ah  shft | al  32 - shft  , al  
 shft - 32
  lsl al, al, r2  @ al  shft, 0

 For Thumb2 (where there is no orr with register shift)

  lslsah, ah, r2  @ ah  shft, 0
  sub r3, r2, #32 @ -ve   , shft 
 - 32
  lsl ip, al, r3  @ 0 , al  
 shft - 32
  negsr3, r3  @ 32 - shft , -ve
  orr ah, ah, ip  @ ah  shft, al  
 shft - 32
  lsr r3, al, r3  @ al  32 - shft   , 0
  orrsah, ah, r3  @ ah  shft | al  32 - shft  , al  
 shft - 32
  lslsal, al, r2  @ al  shft, 0

 Neither of which needs the condition flags during execution (and indeed
 is probably better in both cases than the code currently in lib1funcs.asm
 for a modern core).  The flag clobbering behaviour in the thumb2 variant
 is only for code size saving; that would normally be added by a late
 optimization pass.
 
 OK, those are interesting, and I can look into making it happen, with or 
 without NEON.
 
 Would it not require an unspec to prevent 'clever things' happening to 
 the negative shift, if we were to encode these in the machine 
 description? I'm not too clear on what these 'clever things' might be in 
 the case of shift-by-register (presumably value-range propagation is 
 one), but I know the NEON shifts are encoded this way for safety.
 

Given the way the shift patterns in the compiler are written today, quite 
possibly.  Though in the
general case of a non-constant shift the optimizer probably wouldn't be able to 
safely make any
assumptions that would break things.

I suspect that the shift patterns should really be changed to make the shift be 
by a QImode value;
this would then correctly describe the number of bits in the register that are 
really involved in
the shift.  Further, we could then say that, for core registers, the full value 
in that QI register
was used to determine the shift.  It would be quite a lot of churn to fix this 
though.

 None of this directly helps with your neon usage, but it does show that we
 really don't need to clobber the condition code register to get an
 efficient sequence.
 
 Except that it doesn't in the case of a shift by one where there is a 
 two-instruction sequence that clobbers CC. Presumably this special case 
 can be treated differently though, right from expand.
 

All of the sequences above can be simplified significantly if the shift amount 
is constant and I
think then, that with the exception of the special case you mention (which is 
only for shift right
by 1) you never need the condition codes and you never need more than 3 ARM 
instructions:

shifts  32

LSL AH, AH, #n
ORR AH, AH, AL, LSR #(32 - n)
LSL AL, AL, #n

shifts = 32
LSL AH, AL, #(n - 32)
MOV AL, #0

In fact both of the above sequences are equally good for Thumb2.  If we lost 
the RRX tweak it
wouldn't be a major loss (we could even put it back as a peephole2 to handle 
the common case where
the condition code registers were known to be dead).

R.



Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Eric Botcazou
 You are basically (but non-optimally) locally re-implementing
 what expr.c:get_object_or_type_alignment does, for the
 bare MEM_REF case (you don't consider offsets that
 do not change the alignment, nor alignment information
 present on the SSA name).

Adjusted version attached, regression-tested on SPARC/Solaris.  I'll do a full 
testing cycle if it is OK for mainline.

get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders for 
MEM_REF and TARGET_MEM_REF do:

  MAX (TYPE_ALIGN (TREE_TYPE (exp)),
   get_object_alignment (exp, BIGGEST_ALIGNMENT)))

instead, so I presume I should do the same for them in tree_non_aligned_mem_p?


2011-12-07  Eric Botcazou  ebotca...@adacore.com

PR tree-optimization/51315
* tree.h (get_object_or_type_alignment): Declare.
* expr.c (get_object_or_type_alignment): Move to...
* builtins.c (get_object_or_type_alignment): ...here.  Add assertion.
* tree-sra.c (tree_non_mode_aligned_mem_p): Rename to...
(tree_non_aligned_mem_p): ...this.  Add ALIGN parameter.  Look into
MEM_REFs and use get_object_or_type_alignment for them.
(build_accesses_from_assign): Adjust for above change.
(access_precludes_ipa_sra_p): Likewise.



-- 
Eric Botcazou
Index: tree.h
===
--- tree.h	(revision 181993)
+++ tree.h	(working copy)
@@ -5463,6 +5463,7 @@ extern rtx builtin_memset_read_str (void
 extern bool is_builtin_fn (tree);
 extern unsigned int get_object_alignment_1 (tree, unsigned HOST_WIDE_INT *);
 extern unsigned int get_object_alignment (tree);
+extern unsigned int get_object_or_type_alignment (tree);
 extern unsigned int get_pointer_alignment (tree);
 extern tree fold_call_stmt (gimple, bool);
 extern tree gimple_fold_builtin_snprintf_chk (gimple, tree, enum built_in_function);
Index: builtins.c
===
--- builtins.c	(revision 181993)
+++ builtins.c	(working copy)
@@ -452,6 +452,31 @@ get_object_alignment (tree exp)
   return align;
 }
 
+/* Return the alignment of object EXP, also considering its type when we do
+   not know of explicit misalignment.  Only handle MEM_REF and TARGET_MEM_REF.
+
+   ??? Note that, in the general case, the type of an expression is not kept
+   consistent with misalignment information by the front-end, for example when
+   taking the address of a member of a packed structure.  However, in most of
+   the cases, expressions have the alignment of their type so we optimistically
+   fall back to this alignment when we cannot compute a misalignment.  */
+
+unsigned int
+get_object_or_type_alignment (tree exp)
+{
+  unsigned HOST_WIDE_INT misalign;
+  unsigned int align = get_object_alignment_1 (exp, misalign);
+
+  gcc_assert (TREE_CODE (exp) == MEM_REF || TREE_CODE (exp) == TARGET_MEM_REF);
+
+  if (misalign != 0)
+align = (misalign  -misalign);
+  else
+align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align);
+
+  return align;
+}
+
 /* Return the alignment in bits of EXP, a pointer valued expression.
The alignment returned is, by default, the alignment of the thing that
EXP points to.  If it is not a POINTER_TYPE, 0 is returned.
Index: expr.c
===
--- expr.c	(revision 181993)
+++ expr.c	(working copy)
@@ -4544,27 +4544,6 @@ get_bit_range (unsigned HOST_WIDE_INT *b
 }
 }
 
-/* Return the alignment of the object EXP, also considering its type
-   when we do not know of explicit misalignment.
-   ???  Note that, in the general case, the type of an expression is not kept
-   consistent with misalignment information by the front-end, for
-   example when taking the address of a member of a packed structure.
-   However, in most of the cases, expressions have the alignment of
-   their type, so we optimistically fall back to the alignment of the
-   type when we cannot compute a misalignment.  */
-
-static unsigned int
-get_object_or_type_alignment (tree exp)
-{
-  unsigned HOST_WIDE_INT misalign;
-  unsigned int align = get_object_alignment_1 (exp, misalign);
-  if (misalign != 0)
-align = (misalign  -misalign);
-  else
-align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align);
-  return align;
-}
-
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
is true, try generating a nontemporal store.  */
 
Index: tree-sra.c
===
--- tree-sra.c	(revision 181993)
+++ tree-sra.c	(working copy)
@@ -1067,26 +1067,29 @@ disqualify_ops_if_throwing_stmt (gimple
   return false;
 }
 
-/* Return true iff type of EXP is not sufficiently aligned.  */
+/* Return true if EXP is a memory reference less aligned than ALIGN.  This is
+   invoked only on strict-alignment targets.  */
 
 static bool
-tree_non_mode_aligned_mem_p (tree exp)
+tree_non_aligned_mem_p (tree exp, unsigned int align)
 {
-  enum 

Re: [SMS] Support new loop pattern

2011-12-07 Thread Roman Zhuykov
2011/10/12 Ayal Zaks ayal.z...@gmail.com: - the last jump
instruction should look like:  pc=(regF!=0)?label:pc, regF is you'd
probably want to bump to next instruction if falling through, e.g.,
pc=(regF!=0)?label:pc+4
It is considered that program counter is increased automatically
onhardware level.Otherwise we should add something like pc=pc+4 in
parallel to eachinstruction in RTL.
  flag register; - the last instruction which sets regF should be: 
 regF=COMPARE(regC,X), where X  is a constant, or maybe a register, which 
 is not changed inside a loop; - only one instruction modifies regC 
 inside a loop (other can use regC, but not  write), and it should simply 
 adjust it by a constant: regC=regC+step, where  step is a constant. 
 When doloop is succesfully scheduled by SMS, its number of iterations of 
 loop kernel should be decreased by the number of stages in a schedule 
 minus one, while other iterations expand to prologue and epilogue. In 
 new supported loops such approach can't be used, because some 
 instructions can use count register (regC).  Instead of this, the final 
 register value X in compare instruction regF=COMPARE(regC,X) is changed 
 to another value Y respective to the stage this instruction is scheduled 
 (Y = X - stage * step). making sure this does not underflow; i.e., that 
 the number of iterations is no less than stage (you've addressed this 
 towards the end below).
Yes, this situation is processed correctly.
 The main difference from doloop case is that regC can be used by some 
 instructions in loop body. That's why we are unable to simply adjust 
 regC initial value, but have to keep it's value correct on each 
 particular iteration. So, we change comparison instruction 
 accordingly. An example: int a[100]; int main() {  int i;  
 for (i = 85; i  12; i -= 5)      a[i] = i * i;  return a[15]-225; 
 } ARM assembler with -O2 -fno-auto-inc-dec:        ldr     r0, .L5 
        mov     r3, #85        mov     r2, r0 .L2:        mul     r1, 
 r3, r3        sub     r3, r3, #5        cmp     r3, #10        str   
   r1, [r2, #340]        sub     r2, r2, #20        bne     .L2       
  ldr     r0, [r0, #60]        sub     r0, r0, #225        bx      lr 
 .L5:        .word   a Loop body is executed 15 times. When 
 compiling with SMS, it finds a schedule with ii=7, stage_count=3 and 
 following times: Stage  Time       Insn 0          5      mul     r1, 
 r3, r3 1         10     sub     r3, r3, #5 1         11     cmp     
 r3, #10 1         11     str     r1, [r2, #340] 1         13     bne   
   .L2 2         16     sub     r2, r2, #20 branch is not scheduled 
 last?
Yes, branch schedule time is smaller then sub's one.This mean that
sub r2, r2, $20 instruction from original iterationnumber K will be
executed afterbne .L2 from original iteration number K.But certainly
bne remains to be the last instuction in new loop body.Below you can
see how it looks after SMS.
 To make new schedule correct the loop body should be executed 14 times and 
 we change compare instruction: the loop itself should execute 13 times.
with i =85, 80, 75, 70, 6560, 55, 50, 45, 4035, 30, 25, 20, 15this
gives total 15 iterations (15 stores to memory).And new loop body will
be executed 13 times (one store goes toepilogue and one - to
prologue).
 regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step. 
 In our example regC is r3, X is 10, step = -5, compare instruction is 
 scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. right. In 
 general, if the compare is on stage s (starting from 0), it will be 
 executed s times in the epilog, so it should exit the loop upon reaching Y 
 = X - s * step. So, after SMS it looks like:        ldr     r0, .L5   
      mov     r3, #85        mov     r2, r0 ;;prologue        mul     
 r1, r3, r3      ;;from stage 0 first iteration        sub     r3, r3, #5   
    ;;3 insns from stage 1 first iteration        cmp     r3, #10        
 str     r1, [r2, #340]        mul     r1, r3, r3      ;;from stage 0 
 second iteration ;;body .L2:        sub     r3, r3, #5        sub    
  r2, r2, #20        cmp     r3, #15         ;; new value to compare with 
 is Y=15        str     r1, [r2, #340]        mul     r1, r3, r3        
 bne     .L2 ;;epilogue        sub     r2, r2, #20     ;;from stage 2 
 pre-last iteration        sub     r3, r3, #5      ;;3 insns from stage 1 
 last iteration        cmp     r3, #10        str     r1, [r2, #340]    
     sub     r2, r2, #20     ;;from stage 2 last iteration        ldr     
 r0, [r0, #60]        sub     r0, r0, #225        bx      lr .L5:     
    .word   a
Here in comments I mention why insn was copied to prolog and
epilog.Only branch is not copied at all.
 Testing of this appoach reveals two bugs, which do not appear while SMS 
 was used only for doloop loops.  Both these bugs happen due to the 
 nature of the flag register.  On x86_64 it is clobbered by most of 
 arithmetic instructions. This should ideally be 

Re: RFC: ARM 64-bit shifts in NEON

2011-12-07 Thread Andrew Stubbs

On Wed 07 Dec 2011 14:20:43 GMT, Richard Earnshaw wrote:

Would it not require an unspec to prevent 'clever things' happening to
the negative shift, if we were to encode these in the machine
description? I'm not too clear on what these 'clever things' might be in
the case of shift-by-register (presumably value-range propagation is
one), but I know the NEON shifts are encoded this way for safety.



Given the way the shift patterns in the compiler are written today, quite 
possibly.  Though in the
general case of a non-constant shift the optimizer probably wouldn't be able to 
safely make any
assumptions that would break things.


I've noticed that the right-shift NEON insns have an EQUIV entry in 
the dumps, so I'm suspicious that even then we're not totally safe from 
optimization.



I suspect that the shift patterns should really be changed to make the shift be 
by a QImode value;
this would then correctly describe the number of bits in the register that are 
really involved in
the shift.  Further, we could then say that, for core registers, the full value 
in that QI register
was used to determine the shift.  It would be quite a lot of churn to fix this 
though.


Yeah, I considered this, but I couldn't figure out how to make the 
details work, and anyway, I've seen the compiler trying to truncate 
things (unnecessarily) for that sort of thing, so I haven't actually 
tried it.


Maybe I'll have a go and see what happens. I suspect it's need extra 
patterns to combine the truncate seamlessly, and allow actual QImode 
input also?



None of this directly helps with your neon usage, but it does show that we
really don't need to clobber the condition code register to get an
efficient sequence.


Except that it doesn't in the case of a shift by one where there is a
two-instruction sequence that clobbers CC. Presumably this special case
can be treated differently though, right from expand.



All of the sequences above can be simplified significantly if the shift amount 
is constant and I
think then, that with the exception of the special case you mention (which is 
only for shift right
by 1) you never need the condition codes and you never need more than 3 ARM 
instructions:


Actually, there are 1bit patterns for all the shift types.


shifts  32

LSL AH, AH, #n
ORR AH, AH, AL, LSR #(32 - n)
LSL AL, AL, #n

shifts= 32
LSL AH, AL, #(n - 32)
MOV AL, #0

In fact both of the above sequences are equally good for Thumb2.  If we lost 
the RRX tweak it
wouldn't be a major loss (we could even put it back as a peephole2 to handle 
the common case where
the condition code registers were known to be dead).


Yes, these are what the compiler currently generates. With my patch, 
*sh*di3 never fails to expand (not if TARGET_NEON is true, anyway), so 
the compiler doesn't do it automatically any more, so I have added 
splitters to do it manually.


Andrew


Re: [SMS] Support new loop pattern

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mail.

2011/10/12 Ayal Zaks ayal.z...@gmail.com:
 - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF 
 is

 you'd probably want to bump to next instruction if falling through,
 e.g., pc=(regF!=0)?label:pc+4


It is considered that program counter is increased automatically on
hardware level.
Otherwise we should add something like pc=pc+4 in parallel to each
instruction in RTL.

  flag register;
 - the last instruction which sets regF should be: regF=COMPARE(regC,X), 
 where X
  is a constant, or maybe a register, which is not changed inside a loop;
 - only one instruction modifies regC inside a loop (other can use regC, but 
 not
  write), and it should simply adjust it by a constant: regC=regC+step, where
  step is a constant.

 When doloop is succesfully scheduled by SMS, its number of
 iterations of loop kernel should be decreased by the number of stages in a
 schedule minus one, while other iterations expand to prologue and epilogue.
 In new supported loops such approach can't be used, because some
 instructions can use count register (regC).  Instead of this,
 the final register value X in compare instruction regF=COMPARE(regC,X)
 is changed to another value Y respective to the stage this instruction
 is scheduled (Y = X - stage * step).

 making sure this does not underflow; i.e., that the number of
 iterations is no less than stage (you've addressed this towards the
 end below).


Yes, this situation is processed correctly.


 The main difference from doloop case is that regC can be used by some
 instructions in loop body.
 That's why we are unable to simply adjust regC initial value, but have
 to keep it's value correct on each particular iteration.
 So, we change comparison instruction accordingly.

 An example:
 int a[100];
 int main()
 {
  int i;
  for (i = 85; i  12; i -= 5)
      a[i] = i * i;
  return a[15]-225;
 }
 ARM assembler with -O2 -fno-auto-inc-dec:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 .L2:
        mul     r1, r3, r3
        sub     r3, r3, #5
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20
        bne     .L2
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Loop body is executed 15 times.
 When compiling with SMS, it finds a schedule with ii=7, stage_count=3
 and following times:
 Stage  Time       Insn
 0          5      mul     r1, r3, r3
 1         10     sub     r3, r3, #5
 1         11     cmp     r3, #10
 1         11     str     r1, [r2, #340]
 1         13     bne     .L2
 2         16     sub     r2, r2, #20


 branch is not scheduled last?


Yes, branch schedule time is smaller then sub's one.
This mean that sub r2, r2, $20 instruction from original iteration
number K will be executed after
bne .L2 from original iteration number K.
But certainly bne remains to be the last instuction in new loop body.
Below you can see how it looks after SMS.

 To make new schedule correct the loop body
 should be executed 14 times and we change compare instruction:

 the loop itself should execute 13 times.

with i =
85, 80, 75, 70, 65
60, 55, 50, 45, 40
35, 30, 25, 20, 15
this gives total 15 iterations (15 stores to memory).
And new loop body will be executed 13 times (one store goes to
epilogue and one - to prologue).

 regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
 In our example regC is r3, X is 10, step = -5, compare instruction
 is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.


 right. In general, if the compare is on stage s (starting from 0), it
 will be executed s times in the epilog, so it should exit the loop
 upon reaching Y = X - s * step.

 So, after SMS it looks like:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 ;;prologue
        mul     r1, r3, r3      ;;from stage 0 first iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 first iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        mul     r1, r3, r3      ;;from stage 0 second iteration
 ;;body
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15         ;; new value to compare with is Y=15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
 ;;epilogue
        sub     r2, r2, #20     ;;from stage 2 pre-last iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 last iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20     ;;from stage 2 last iteration

        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a


Here in comments I mention why insn was copied to prolog and epilog.
Only branch is not copied at all.

 Testing of this appoach reveals two bugs, which do not appear while SMS was
 used only for doloop loops.  Both these bugs happen due to the nature of the
 flag register.  On 

[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
This letter contains the first separate patch for ddg.cIt creates
additional nessesary anti-dep edges in data dependency graph.
2011/10/12 Ayal Zaks ayal.z...@gmail.com: The following situation
happens when SMS is enabled without register renaming
(-fno-modulo-sched-allow-regmoves).  When data dependency graph is
built, there is a step when we generate anti-dependencies from last
register use to first write of this register at the next
iteration. is a step when we generate anti-dependencies from all
last registers uses (i.e., of last register def) to first write of
this register at the next iteration.
Right.
 At this moment we should also create such dependencies to all 
 instructions which clobber the register to prevent this clobbers being 
 before last use is new schedule. well, we simply need to connect these 
 last uses to either the first write *or* the first clobber of this 
 register at the next iteration, according to whichever is first, no?
No, is works now just how you describe. Clobber is considered as a
write,and last uses are connected with first write or first
clobber.But this is not sufficient: similarly to how there's no
dependenciesbetween last uses,there shall be no dependency between
first clobbers (i.e. clobbersof a registercan be permuted freely).
So at least we have to make an additional dependencyedge to each
clobber before first write.
 Here is an model of example: loop { set1 regR use1 regR 
 clobber regR set2 regR use2 regR } If we create only 
 use2-set1 anti-dependency (and no use2-cloober) the following wrong 
 schedule is possible: prologue { set1 regR use1 regR clobber 
 regR } kernel { set2 regR clobber regR (instruction from next 
 iteration in terms of old loop kernel) use2 regR set1 regR (also from 
 next iteration) use1 regR (also from next iteration) } epilogue 
 { set2 regR use2 regR } strange; according to prolog (and 
 epilog), clobber should appear after use1 in the kernel, no? Aren't there 
 (intra-iteration) dependencies preventing the clobber to skip over use1 
 and/or set1?
Yeah, this is bad example - I wrongly suggested that there is
noanti-dep between use1 and clobber.
New example:loop { clobber1 clobber2 set use}When there is no
use-clobber2 anti-dep - the following wrong scheduleis possible.
Clobber2 is on stage0, other instructions are on stage1.prologue {
clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set
use}
 This problem was succesfully fixed by creating a vector of all 
 clobbering instructions together with first write and adding all needed 
 dependencies. seems like an overkill to me; we didn't draw an edge 
 between every last use and every write, because writes are kept in order 
 by having output dependences between them. So should be the case with 
 clobbers.
Clobbers themselves aren't kept in order - there are no output
dependencesbetween them. They may be scheduled in any order.
 Presumably, the ddg already has all intra-iteration edges preventing motion 
 of clobbering instructions within an iteration, and we are only missing 
 inter-iteration deps or deps surfaced by eliminating anti-deps, right?
I think the new version of a fix suits this reasoning.Now it creates
dependencies only to clobber instructions before first write.And
certainly to first write instruction itself.
--Roman zhuykovzhr...@ispras.ru
2011-12-07  Roman Zhuykov  zhr...@ispras.ru
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb);
 #endif
-  df_ref first_def = df_bb_regno_first_def_find (g-bb, regno);
+  static VEC(df_ref,heap) *clobbers;
+  df_ref first_write_def = NULL;
+  df_ref *def_rec;
+  unsigned int uid;
+  
+  clobbers = VEC_alloc (df_ref, heap, 0);
+
+  FOR_BB_INSNS (g-bb, insn)
+{
+  

[PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther

I'm testing the following patch for PR48437 - the C frontend generates
decl copies for placing them in BLOCK_VARS for

int
test (void)
{
  extern int f (void);
  return 0;
}

we currently end up putting those into the decl merging machinery,
causing DECL_CHAIN re-use and subsequent crashes.  By not indexing
them we make sure to keep them separate.

LTO bootstrap  regtest pending on x86_64-unknown-linux-gnu, ok?

[Similar fix has to be employed for VLA types]

Thanks,
Richard.

2011-12-07  Richard Guenther  rguent...@suse.de

PR lto/48437
* lto-streamer-out.c (tree_is_indexable): Exclude block-local
extern declarations.

* gcc.dg/lto/20111207-2_0.c: New testcase.
* gcc.dg/guality/pr48437.c: Likewise.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
else if (TREE_CODE (t) == VAR_DECL  decl_function_context (t)
!TREE_STATIC (t))
  return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */
+   TREE_PUBLIC (t)
+   DECL_EXTERNAL (t)
+   DECL_CONTEXT (t)
+   TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL)
+ return false;
else
  return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
  }
Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c
===
*** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
--- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
***
*** 0 
--- 1,17 
+ /* { dg-lto-do run } */
+ 
+ int
+ test (void)
+ {
+   int f (void);
+   return 0;
+ }
+ 
+ int
+ main (void)
+ {
+   int f (void);
+   int test (void);
+ 
+   return test ();
+ }
Index: gcc/testsuite/gcc.dg/guality/pr48437.c
===
*** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
--- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
***
*** 0 
--- 1,15 
+ /* PR lto/48437 */
+ /* { dg-do run } */
+ /* { dg-options -g } */
+ 
+ int i __attribute__((used));
+ int main()
+ {
+   volatile int i;
+   for (i = 3; i  7; ++i)
+ {
+   extern int i; /* { dg-final { gdb-test 7 i 0 } } */
+   asm volatile ( : : : memory);
+ }
+   return 0;
+ }


Re: [PATCH] Fix PR tree-optimization/51315

2011-12-07 Thread Richard Guenther
On Wed, Dec 7, 2011 at 3:32 PM, Eric Botcazou ebotca...@adacore.com wrote:
 You are basically (but non-optimally) locally re-implementing
 what expr.c:get_object_or_type_alignment does, for the
 bare MEM_REF case (you don't consider offsets that
 do not change the alignment, nor alignment information
 present on the SSA name).

 Adjusted version attached, regression-tested on SPARC/Solaris.  I'll do a full
 testing cycle if it is OK for mainline.

Yes, the patch is ok for mainline.

 get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders 
 for
 MEM_REF and TARGET_MEM_REF do:

  MAX (TYPE_ALIGN (TREE_TYPE (exp)),
       get_object_alignment (exp, BIGGEST_ALIGNMENT)))

 instead, so I presume I should do the same for them in tree_non_aligned_mem_p?

Yeah, you can do that.

Thanks,
Richard.


 2011-12-07  Eric Botcazou  ebotca...@adacore.com

        PR tree-optimization/51315
        * tree.h (get_object_or_type_alignment): Declare.
        * expr.c (get_object_or_type_alignment): Move to...
        * builtins.c (get_object_or_type_alignment): ...here.  Add assertion.
        * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to...
        (tree_non_aligned_mem_p): ...this.  Add ALIGN parameter.  Look into
        MEM_REFs and use get_object_or_type_alignment for them.
        (build_accesses_from_assign): Adjust for above change.
        (access_precludes_ipa_sra_p): Likewise.



 --
 Eric Botcazou


[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mail.

This letter contains the first separate patch for ddg.cIt creates
additional nessesary anti-dep edges in data dependency graph.
2011/10/12 Ayal Zaks ayal.z...@gmail.com: The following situation
happens when SMS is enabled without register renaming
(-fno-modulo-sched-allow-regmoves).  When data dependency graph is
built, there is a step when we generate anti-dependencies from last
register use to first write of this register at the next
iteration. is a step when we generate anti-dependencies from all
last registers uses (i.e., of last register def) to first write of
this register at the next iteration.
Right.
 At this moment we should also create such dependencies to all 
 instructions which clobber the register to prevent this clobbers being 
 before last use is new schedule. well, we simply need to connect these 
 last uses to either the first write *or* the first clobber of this 
 register at the next iteration, according to whichever is first, no?
No, is works now just how you describe. Clobber is considered as a
write,and last uses are connected with first write or first
clobber.But this is not sufficient: similarly to how there's no
dependenciesbetween last uses,there shall be no dependency between
first clobbers (i.e. clobbersof a registercan be permuted freely).
So at least we have to make an additional dependencyedge to each
clobber before first write.
 Here is an model of example: loop { set1 regR use1 regR 
 clobber regR set2 regR use2 regR } If we create only 
 use2-set1 anti-dependency (and no use2-cloober) the following wrong 
 schedule is possible: prologue { set1 regR use1 regR clobber 
 regR } kernel { set2 regR clobber regR (instruction from next 
 iteration in terms of old loop kernel) use2 regR set1 regR (also from 
 next iteration) use1 regR (also from next iteration) } epilogue 
 { set2 regR use2 regR } strange; according to prolog (and 
 epilog), clobber should appear after use1 in the kernel, no? Aren't there 
 (intra-iteration) dependencies preventing the clobber to skip over use1 
 and/or set1?
Yeah, this is bad example - I wrongly suggested that there is
noanti-dep between use1 and clobber.
New example:loop { clobber1 clobber2 set use}When there is no
use-clobber2 anti-dep - the following wrong scheduleis possible.
Clobber2 is on stage0, other instructions are on stage1.prologue {
clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set
use}
 This problem was succesfully fixed by creating a vector of all 
 clobbering instructions together with first write and adding all needed 
 dependencies. seems like an overkill to me; we didn't draw an edge 
 between every last use and every write, because writes are kept in order 
 by having output dependences between them. So should be the case with 
 clobbers.
Clobbers themselves aren't kept in order - there are no output
dependencesbetween them. They may be scheduled in any order.
 Presumably, the ddg already has all intra-iteration edges preventing motion 
 of clobbering instructions within an iteration, and we are only missing 
 inter-iteration deps or deps surfaced by eliminating anti-deps, right?
I think the new version of a fix suits this reasoning.Now it creates
dependencies only to clobber instructions before first write.And
certainly to first write instruction itself.
--Roman zhuykovzhr...@ispras.ru
2011-12-07  Roman Zhuykov  zhr...@ispras.ru
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb);
 #endif
-  df_ref first_def = df_bb_regno_first_def_find (g-bb, regno);
+  static VEC(df_ref,heap) *clobbers;
+  df_ref first_write_def = NULL;
+  df_ref *def_rec;
+  unsigned int uid;
+  
+  clobbers = VEC_alloc (df_ref, heap, 0);

Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jason Merrill

On 12/06/2011 02:48 PM, Jakub Jelinek wrote:

-  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
+  gcc_assert (!processing_template_decl
+ || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));


How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.

Jason



Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote:
 *** gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0)
 --- gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0)
 ***
 *** 0 
 --- 1,15 
 + /* PR lto/48437 */
 + /* { dg-do run } */
 + /* { dg-options -g } */
 + 
 + int i __attribute__((used));
 + int main()
 + {
 +   volatile int i;
 +   for (i = 3; i  7; ++i)
 + {
 +   extern int i; /* { dg-final { gdb-test 7 i 0 } } */
 +   asm volatile ( : : : memory);

Do you really want to test it on line 7 ({ of after main())?
I'd expect you want to look at it on the line where asm volatile is,
and make sure you can put a breakpoint on it:
#include ../nop.h
early and use NOP instead of  as the asm pattern.

Jakub


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Jakub Jelinek wrote:

 On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote:
  *** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
  --- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
  ***
  *** 0 
  --- 1,15 
  + /* PR lto/48437 */
  + /* { dg-do run } */
  + /* { dg-options -g } */
  + 
  + int i __attribute__((used));
  + int main()
  + {
  +   volatile int i;
  +   for (i = 3; i  7; ++i)
  + {
  +   extern int i; /* { dg-final { gdb-test 7 i 0 } } */
  +   asm volatile ( : : : memory);
 
 Do you really want to test it on line 7 ({ of after main())?

Oops - so much for adding the /* dg */ stuff late ...

 I'd expect you want to look at it on the line where asm volatile is,
 and make sure you can put a breakpoint on it:
 #include ../nop.h
 early and use NOP instead of  as the asm pattern.

Changed.

Thanks,
Richard.


Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote:
 On 12/06/2011 02:48 PM, Jakub Jelinek wrote:
 -  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
 +  gcc_assert (!processing_template_decl
 +  || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));
 
 How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.

That would be readable, but expensive - SCALAR_TYPE_P evaluates
its argument 11 times.

Jakub


Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)

2011-12-07 Thread Jason Merrill

On 12/07/2011 10:18 AM, Jakub Jelinek wrote:

On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote:

On 12/06/2011 02:48 PM, Jakub Jelinek wrote:

-  gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type));
+  gcc_assert (!processing_template_decl
+ || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE));


How about SCALAR_TYPE_P (strip_array_types (type))?  OK with that change.


That would be readable, but expensive - SCALAR_TYPE_P evaluates
its argument 11 times.


Good point.  Your patch is OK, then.

Jasno



[SMS, DDG] Additional edges to instructions with clobber

2011-12-07 Thread Roman Zhuykov
Apologies for the messed up previous e-mails.

This letter contains the first separate patch for ddg.c
It creates additional nessesary anti-dep edges in data dependency graph.

2011/10/12 Ayal Zaks ayal.z...@gmail.com:
 The following situation happens when SMS is enabled without register 
 renaming
 (-fno-modulo-sched-allow-regmoves).  When data dependency graph is built, 
 there
 is a step when we generate anti-dependencies from last register use to first
 write of this register at the next iteration.

 is a step when we generate anti-dependencies from all last registers
 uses (i.e., of last register def) to first write of this register at
 the next iteration.


Right.

 At this moment we should also
 create such dependencies to all instructions which clobber the register to
 prevent this clobbers being before last use is new schedule.


 well, we simply need to connect these last uses to either the first
 write *or* the first clobber of this register at the next iteration,
 according to whichever is first, no?

No, is works now just how you describe. Clobber is considered as a write,
and last uses are connected with first write or first clobber.
But this is not sufficient: similarly to how there's no dependencies
between last uses,
there shall be no dependency between first clobbers (i.e. clobbers
of a register
can be permuted freely). So at least we have to make an additional dependency
edge to each clobber before first write.

 Here is an model of example:

 loop {
 set1 regR
 use1 regR
 clobber regR
 set2 regR
 use2 regR
 }

 If we create only use2-set1 anti-dependency (and no use2-cloober) the
 following wrong schedule is possible:

 prologue {
 set1 regR
 use1 regR
 clobber regR
 }
 kernel {
 set2 regR
 clobber regR (instruction from next iteration in terms of old loop kernel)
 use2 regR
 set1 regR (also from next iteration)
 use1 regR (also from next iteration)
 }
 epilogue {
 set2 regR
 use2 regR
 }


 strange; according to prolog (and epilog), clobber should appear after
 use1 in the kernel, no? Aren't there (intra-iteration) dependencies
 preventing the clobber to skip over use1 and/or set1?


Yeah, this is bad example - I wrongly suggested that there is no
anti-dep between use1 and clobber.

New example:
loop {
 clobber1
 clobber2
 set
 use
}
When there is no use-clobber2 anti-dep - the following wrong schedule
is possible. Clobber2 is on stage0, other instructions are on stage1.
prologue {
 clobber2
}
kernel {
 clobber1
 set
 clobber2
 use
}
epilogue {
 clobber1
 set
 use
}

 This problem was succesfully fixed by creating a vector of all clobbering
 instructions together with first write and adding all needed dependencies.


 seems like an overkill to me; we didn't draw an edge between every
 last use and every write, because writes are kept in order by having
 output dependences between them. So should be the case with clobbers.

Clobbers themselves aren't kept in order - there are no output dependences
between them. They may be scheduled in any order.

 Presumably, the ddg already has all intra-iteration edges preventing
 motion of clobbering instructions within an iteration, and we are only
 missing inter-iteration deps or deps surfaced by eliminating
 anti-deps, right?

I think the new version of a fix suits this reasoning.
Now it creates dependencies only to clobber instructions before first write.
And certainly to first write instruction itself.

--
Roman Zhuykov
zhr...@ispras.ru
2011-12-07  Roman Zhuykov  zhr...@ispras.ru
	* ddg.c: New VEC.
	(add_cross_iteration_register_deps): Store information about
	clobbers before first write to a register.  Use collected
	information to create anti-dependencies from last uses.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..e3e065c 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -49,6 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 /* A flag indicating that a ddg edge belongs to an SCC or not.  */
 enum edge_flag {NOT_IN_SCC = 0, IN_SCC};
 
+/* A vector of dependencies needed while processing clobbers.  */
+DEF_VEC_P(df_ref);
+DEF_VEC_ALLOC_P(df_ref,heap);
+
 /* Forward declarations.  */
 static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr);
 static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr);
@@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to,
 static void
 add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def)
 {
-  int regno = DF_REF_REGNO (last_def);
+  unsigned int regno = DF_REF_REGNO (last_def);
   struct df_link *r_use;
   int has_use_in_bb_p = false;
-  rtx def_insn = DF_REF_INSN (last_def);
+  rtx insn, def_insn = DF_REF_INSN (last_def);
   ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn);
   ddg_node_ptr use_node;
 #ifdef ENABLE_CHECKING
   struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb);
 #endif
-  df_ref first_def = df_bb_regno_first_def_find (g-bb, regno);
+  static VEC(df_ref,heap) *clobbers;
+  df_ref first_write_def = NULL;
+  df_ref 

[SMS, DDG] Correctly delete anti-dep edges for renaming

2011-12-07 Thread Roman Zhuykov
This letter contains second separate patch for ddg.c
It prevents removing some edges in data dependency graph
when renaming is allowed.

2011/10/12 Ayal Zaks ayal.z...@gmail.com:
 The other bug happens only with -fmodulo-sched-allow-regmoves.  Here we
 eliminate some anti-dependence edges in data dependency graph in order to
 resolve them later by adding some register moves (renaming instructions).  
 But
 in situation as in example above compiler gives an ICE because it can't 
 create
 a register move, when regR is hardware flag register.  So we have to know 
 which
 register(s) cause anti-dependency in order to understand whether we can 
 ignore
 it.  I can't find any easy way to gather this information, so I create my 
 own
 structures to store this info and had implemented my own hooks for
 sched_analyze function.  This leads to more complex interconnection between
 ddg.c and modulo-sched.c.

 Well, having ddg.c's/create_ddg_dep_from_intra_loop_link() consult
 Register sets from modulo scheduler structures to determine if an
 anti-dep can be eliminated, seems awkward. One should be able to build
 a ddg independent of any modulo scheduler structures.

I think now I found a proper way to gather information about
the register(s) causing anti-dependency (see the patch attached).

 One simple solution is to keep all anti-dep edges of registers that
 are clobbered anywhere in the loop. This might be overly conservative,
 but perhaps not so if On x86_64 it is clobbered by most of arithmetic
 instructions.
 If an anti-dep between a use and a dep of a clobber register is to be
 removed, a dependence should be created between the use and a
 clobbering instruction following the def. Hope this is clear.

 This too should be solved by a dedicated (separate) patch, for easier 
 digestion.

 Presumably, the ddg already has all intra-iteration edges preventing
 motion of clobbering instructions within an iteration, and we are only
 missing inter-iteration deps or deps surfaced by eliminating
 anti-deps, right?

 You might consider introducing a new type of dependence for such
 edges, Clobber, if it would help.


Ayal, I suppose here we have some misunderstanding.
This is not directly related to the problem of clobbers discussed previously.
Here I talk about the need to find out the register causing the
anti-dependence (for instance, we need to check whether it can be renamed).
Currently, SMS simply attempts to take the RHS of the second instruction (via
single_set()), and if it's a register, SMS assumes it's a register causing the
dependency. This breaks in a following scenario:

insn1: ... (read flags)
insn2: regA = regB - regC; update flags

Here, SMS would wrongly take regA as the dependency register, while it was in
fact the flags register. So, I rewrote this part in a different way and make
a separate patch.

--
Roman Zhuykov
zhr...@ispras.ru
2011-12-07  Roman Zhuykov  zhr...@ispras.ru
	* ddg.c (create_ddg_dep_from_intra_loop_link): Correclty determine
	which register causes anti-dependency.  Prevent removing anti-dep
	edge from ddg when hard_register_p.
---

diff --git a/gcc/ddg.c b/gcc/ddg.c
index 2b1cfe8..0a3726f 100644
--- a/gcc/ddg.c
+++ b/gcc/ddg.c
@@ -204,23 +204,34 @@ create_ddg_dep_from_intra_loop_link (ddg_ptr g, ddg_node_ptr src_node,
(t == ANTI_DEP  dt == REG_DEP)
!autoinc_var_is_used_p (dest_node-insn, src_node-insn))
 {
-  rtx set;
+  df_ref *def_rec;
+  bool can_delete_dep = true;
 
-  set = single_set (dest_node-insn);
-  /* TODO: Handle registers that REG_P is not true for them, i.e.
+  /* TODO: Handle !REG_P register correctly, i.e.
  subregs and special registers.  */
-  if (set  REG_P (SET_DEST (set)))
-{
-  int regno = REGNO (SET_DEST (set));
-  df_ref first_def;
-  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb);
+  for (def_rec = DF_INSN_DEFS (dest_node-insn); *def_rec; def_rec++)
+	{
+	  df_ref first_def, def = *def_rec, *use_rec;
+	  unsigned regno = DF_REF_REGNO (def);
+	  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb);
+	  bool src_uses = false;
 
-  first_def = df_bb_regno_first_def_find (g-bb, regno);
-  gcc_assert (first_def);
+	  for (use_rec = DF_INSN_USES (src_node-insn); *use_rec; use_rec++)
+	if (regno == DF_REF_REGNO (*use_rec))
+	  src_uses = true;
 
-  if (bitmap_bit_p (bb_info-gen, DF_REF_ID (first_def)))
-return;
-}
+	  if (!src_uses)
+	continue;
+
+	  first_def = df_bb_regno_first_def_find (g-bb, regno);
+	  gcc_assert (first_def);
+
+	  if (HARD_REGISTER_NUM_P (regno)
+	  || !bitmap_bit_p (bb_info-gen, DF_REF_ID (first_def)))
+	can_delete_dep = false;
+	}
+  if (can_delete_dep)
+	return;
 }
 
latency = dep_cost (link);


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
 else if (TREE_CODE (t) == VAR_DECL  decl_function_context (t)
  !TREE_STATIC (t))
   return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?

Diego.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

 On 12/07/11 09:52, Richard Guenther wrote:
 
  Index: gcc/lto-streamer-out.c
  ===
  *** gcc/lto-streamer-out.c  (revision 182081)
  --- gcc/lto-streamer-out.c  (working copy)
  *** tree_is_indexable (tree t)
  *** 129,134 
  --- 129,144 
   else if (TREE_CODE (t) == VAR_DECL  decl_function_context (t)
!TREE_STATIC (t))
 return false;
  +   /* If this is a decl generated for block local externs for
  +  debug info generation, stream it unshared alongside BLOCK_VARS.  */
  +   else if (VAR_OR_FUNCTION_DECL_P (t)
  +  /* ???  The following tests are a literal match on what
  + c-decl.c:pop_scope does.  */
 
 Factor it into a common routine then?

pop_scope of course _sets_ the values that way, it doesn't test them.

Richard.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 10:46, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
  else if (TREE_CODE (t) == VAR_DECL   decl_function_context (t)
   !TREE_STATIC (t))
return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?


pop_scope of course _sets_ the values that way, it doesn't test them.


Yes.  I meant factoring, so future users have something to call, instead 
of three seemingly random flag checks.  pop_scope could also be calling 
some complementary setter instead of doing the seemingly random flag 
setting.



Diego.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

 On 12/07/11 10:46, Richard Guenther wrote:
  On Wed, 7 Dec 2011, Diego Novillo wrote:
  
   On 12/07/11 09:52, Richard Guenther wrote:
   
Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
  else if (TREE_CODE (t) == VAR_DECL   decl_function_context (t)
   !TREE_STATIC (t))
return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.
*/
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */
   
   Factor it into a common routine then?
  
  pop_scope of course _sets_ the values that way, it doesn't test them.
 
 Yes.  I meant factoring, so future users have something to call, instead of
 three seemingly random flag checks.  pop_scope could also be calling some
 complementary setter instead of doing the seemingly random flag setting.

I don't see a good way to factor out the flags setting.  Factoring out
the copying maybe.  But well... we can do that when a 2nd user
comes along?

Richard.


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo

On 12/07/11 10:54, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 10:46, Richard Guenther wrote:

On Wed, 7 Dec 2011, Diego Novillo wrote:


On 12/07/11 09:52, Richard Guenther wrote:


Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,144 
   else if (TREE_CODE (t) == VAR_DECLdecl_function_context (t)
!TREE_STATIC (t))
 return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.
*/
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */


Factor it into a common routine then?


pop_scope of course _sets_ the values that way, it doesn't test them.


Yes.  I meant factoring, so future users have something to call, instead of
three seemingly random flag checks.  pop_scope could also be calling some
complementary setter instead of doing the seemingly random flag setting.


I don't see a good way to factor out the flags setting.  Factoring out
the copying maybe.  But well... we can do that when a 2nd user
comes along?


The problem is that the 2nd user will cut-n-paste from this one. 
However, if you find adding a little function too strenuous, I guess 
it's not too big a deal.



Diego.


Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Joseph S. Myers
On Wed, 7 Dec 2011, Richard Guenther wrote:

 code - well, that's ok.  Pointing people to the latest official
 release series (in this case 4.6.x) is also ok, we're keeping too
 many branches active IMNSHO.

As I recall we agreed in London that both 4.3 and 4.4 should be closed 
(after a final release from each branch) - and in general two active 
release branches was enough - but so far have only closed 4.3.

(Closed = no datestamp updates, no snapshots, no releases, no tracking 
regression status in Bugzilla, not absolutely disallowing commits if 
someone feels it's still useful to put some fix on an old branch.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Richard Guenther
On Wed, 7 Dec 2011, Diego Novillo wrote:

 On 12/07/11 10:54, Richard Guenther wrote:
  On Wed, 7 Dec 2011, Diego Novillo wrote:
  
   On 12/07/11 10:46, Richard Guenther wrote:
On Wed, 7 Dec 2011, Diego Novillo wrote:

 On 12/07/11 09:52, Richard Guenther wrote:
 
  Index: gcc/lto-streamer-out.c
  ===
  *** gcc/lto-streamer-out.c  (revision 182081)
  --- gcc/lto-streamer-out.c  (working copy)
  *** tree_is_indexable (tree t)
  *** 129,134 
  --- 129,144 
 else if (TREE_CODE (t) == VAR_DECLdecl_function_context
  (t)
  !TREE_STATIC (t))
   return false;
  +   /* If this is a decl generated for block local externs for
  +  debug info generation, stream it unshared alongside
  BLOCK_VARS.
  */
  +   else if (VAR_OR_FUNCTION_DECL_P (t)
  +  /* ???  The following tests are a literal match on what
  + c-decl.c:pop_scope does.  */
 
 Factor it into a common routine then?

pop_scope of course _sets_ the values that way, it doesn't test them.
   
   Yes.  I meant factoring, so future users have something to call, instead
   of
   three seemingly random flag checks.  pop_scope could also be calling some
   complementary setter instead of doing the seemingly random flag setting.
  
  I don't see a good way to factor out the flags setting.  Factoring out
  the copying maybe.  But well... we can do that when a 2nd user
  comes along?
 
 The problem is that the 2nd user will cut-n-paste from this one. However, if
 you find adding a little function too strenuous, I guess it's not too big a
 deal.

Testing this kind of patches turns out to be quite time-consuming
(I do a LTO bootstrap and two SPEC 2k6 builds (-g and -g0)), so
yeah ...

The following is what I ended up LTO bootstrapping (finished ok),
testing is still in progress, as is SPEC 2k6 build.

It fixes PR49945 in a similar way, by streaming VLA types locally as well.

I'm going to apply it tomorrow, when full testing hopefully finished

Thanks,
Richard.

2011-12-08  Richard Guenther  rguent...@suse.de

PR lto/48437
PR lto/49945
* lto-streamer-out.c (tree_is_indexable): Exclude block-local
extern declarations.

PR lto/48437
* gcc.dg/lto/20111207-2_0.c: New testcase.
* gcc.dg/guality/pr48437.c: Likewise.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 182081)
--- gcc/lto-streamer-out.c  (working copy)
*** tree_is_indexable (tree t)
*** 129,134 
--- 129,147 
else if (TREE_CODE (t) == VAR_DECL  decl_function_context (t)
!TREE_STATIC (t))
  return false;
+   /* If this is a decl generated for block local externs for
+  debug info generation, stream it unshared alongside BLOCK_VARS.  */
+   else if (VAR_OR_FUNCTION_DECL_P (t)
+  /* ???  The following tests are a literal match on what
+ c-decl.c:pop_scope does.  */
+   TREE_PUBLIC (t)
+   DECL_EXTERNAL (t)
+   DECL_CONTEXT (t)
+   TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL)
+ return false;
+   else if (TYPE_P (t)
+   variably_modified_type_p (t, NULL_TREE))
+ return false;
else
  return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
  }
Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c
===
*** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
--- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0)
***
*** 0 
--- 1,17 
+ /* { dg-lto-do run } */
+ 
+ int
+ test (void)
+ {
+   int f (void);
+   return 0;
+ }
+ 
+ int
+ main (void)
+ {
+   int f (void);
+   int test (void);
+ 
+   return test ();
+ }
Index: gcc/testsuite/gcc.dg/guality/pr48437.c
===
*** gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
--- gcc/testsuite/gcc.dg/guality/pr48437.c  (revision 0)
***
*** 0 
--- 1,17 
+ /* PR lto/48437 */
+ /* { dg-do run } */
+ /* { dg-options -g } */
+ 
+ #include ../nop.h
+ 
+ int i __attribute__((used));
+ int main()
+ {
+   volatile int i;
+   for (i = 3; i  7; ++i)
+ {
+   extern int i;
+   asm volatile (NOP : : : memory); /* { dg-final { gdb-test 14 i 0 
} } */
+ }
+   return 0;
+ }


Re: [PATCH][LTO] Fix PR48437

2011-12-07 Thread Diego Novillo
On Wed, Dec 7, 2011 at 11:16, Richard Guenther rguent...@suse.de wrote:

 I'm going to apply it tomorrow, when full testing hopefully finished

Sure.  But remember the zombie kitties!


Diego.


Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Richard Earnshaw
On 06/12/11 17:59, Andrew Stubbs wrote:
 This patch adds a one's complement pattern for doing DImode 'not' in 
 NEON registers.
 
 There are already patterns for doing one's complement of vectors, and 
 even though it boils down to the same instruction, the DImode case was 
 missing.
 
 The patch needs to be a little more complicated than using a mode 
 iterator that includes DI because it needs to coexist with the non-neon 
 one_cmpldi2 (renamed by this patch to one_cmpldi2_core).
 
 OK for when stage 1 opens again?
 
 Andrew
 
 
 neon-not.patch
 
 
 2011-12-06  Andrew Stubbs  a...@codesourcery.com
 
   gcc/
   * config/arm/arm.md (one_cmpldi2): Rename to ...
   (one_cmpldi2_core): ... this, and modify it to prevent it being
   used for NEON.
   (one_cmpldi2): New define_expand.
   * config/arm/neon.md (one_cmpldi2_neon): New define_insn.


 +(define_insn_and_split *one_cmpldi2_core
 +  [(set (match_operand:DI 0 arm_general_register_operand =r,r)
 + (not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))]

Thinking about it, for an operation with one input and one output, there's no 
need for the
earlyclobber marker when the input is tied to the output (there's no other 
operand that can be
clobbered).

Otherwise OK.

R.



Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)

2011-12-07 Thread Jakub Jelinek
On Wed, Dec 07, 2011 at 04:16:25PM +, Joseph S. Myers wrote:
 On Wed, 7 Dec 2011, Richard Guenther wrote:
 
  code - well, that's ok.  Pointing people to the latest official
  release series (in this case 4.6.x) is also ok, we're keeping too
  many branches active IMNSHO.
 
 As I recall we agreed in London that both 4.3 and 4.4 should be closed 
 (after a final release from each branch) - and in general two active 
 release branches was enough - but so far have only closed 4.3.
 
 (Closed = no datestamp updates, no snapshots, no releases, no tracking 
 regression status in Bugzilla, not absolutely disallowing commits if 
 someone feels it's still useful to put some fix on an old branch.)

I'll handle 4.4.7 and closing the branch in the coming weeks.

Jakub


Re: Tidy up MD_INCLUDES in config/arm/t-arm

2011-12-07 Thread Richard Earnshaw
On 01/12/11 11:09, Georg-Johann Lay wrote:
 Richard Earnshaw wrote:
 On 29/11/11 09:42, Matthew Gretton-Dann wrote:
 All,

 Whilst developing the Cortex-A15 integer pipeline patch it was noted 
 that the MD_INCLUDES variable in config/arm/t-arm has not been kept 
 up-to-date.

 The attached patch fixes this, and rearranges the list of md files into 
 alphabetical order.

 The list was generated using `ls -1 *.md | grep -v arm\\.md`.

 Tested by doing a arm-none-eabi build.

 Can someone please review, and if appropriate apply?

 Thanks,

 Matt

 gcc/ChangeLog:
 2011-11-29  Matthew Gretton-Dann  matthew.gretton-d...@arm.com

 * config/arm/t-arm (MD_INCLUDES): Ensure all md files are
 listed.


 OK.

 R.
 
 Is each entry mandatory in that list?
 
 I thought gen-tools already arrange for great part of MD_INCLUDES?
 
 For example, after adding (include avr-dimode.md) to avr.md, ./gcc/mddeps.mk
 reads:
 
 MD_INCLUDES = \
   ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md \
   ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md \
   ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md
 
 ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md:
 
 ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md:
 
 ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md:
 
 so that maintaining such a list might be considerable easier.
 

Indeed.  I hadn't realised that that feature had been added since I first added 
that macro.  It does
indeed look as though the whole definition of MD_INCLUDES is now redundant.

R.




Re: [patch] ARM: Fix miscompilation in arm.md:*minmax_arithsi. (PR target/51408)

2011-12-07 Thread Richard Earnshaw
On 05/12/11 10:42, Richard Earnshaw wrote:
 On 04/12/11 13:32, kazu_hir...@mentor.com wrote:
 Hi,

 Attached is a patch to fix miscompilation in
 arm.md:*minmax_arithsi.

 The following testcase, reduced from efgcvt_r.c:fcvt_r in glibc, gets
 miscompiled:

 extern void abort (void);

 int __attribute__((noinline))
 foo (int a, int b)
 {
   int max = (b  0) ? b : 0;
   return max - a;
 }

 int
 main (void)
 {
   if (foo (3, -1) != -3)
 abort ();
   return 0;
 }

 arm-none-eabi-gcc -O1 generates:

 foo:
  @ Function supports interworking.
  @ args = 0, pretend = 0, frame = 0
  @ frame_needed = 0, uses_anonymous_args = 0
  @ link register save eliminated.
  cmp r1, #0
  rsbge   r0, r0, r1
  bx  lr

 This would be equivalent to:

   return b = 0 ? b - a : a;

 which is different from:

   return b = 0 ? b - a : -a;

 That is, in assembly code, we should have an else clause like so:

  cmp r1, #0
  rsbge   r0, r0, r1  - then clause
  rsblt   r0, r0, #0  - else clause
  bx  lr

 The problem comes from the fact that arm.md:*minmax_arithsi does not
 add the else clause even though MINUS is not commutative.

 The patch fixes the problem by always requiring the else clause in
 the MINUS case.

 Tested by running gcc testsuite on various ARM subarchitectures.  OK
 to apply?

 Kazu Hirata

 gcc/
 2011-12-04  Kazu Hirata  k...@codesourcery.com

  PR target/51408
  * config/arm/arm.md (*minmax_arithsi): Always require the else
  clause in the MINUS case.

 gcc/testsuite/
 2011-12-04  Kazu Hirata  k...@codesourcery.com

  PR target/51408
  * gcc.dg/pr51408.c: New.

 
 OK.
 
 R.

BTW, I would expect this to also exist in all the release branches.  Could you 
back-port it where
needed please.

TIA,

R.



[Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function

2011-12-07 Thread Tobias Burnus
This fixes a -frealloc-lhs regression where the RHS is handled by a 
conversion function whose argument has component refs.


Build and regtested on x86-64-linux.
OK for the trunk and 4.7?

Tobias
2011-12-07  Tobias Burnus  bur...@net-b.de

	PR fortran/51448
	* fortran/trans-array.c (get_std_lbound): Fix handling of
	conversion functions.

2011-12-07  Tobias Burnus  bur...@net-b.de

	PR fortran/51448
	* gfortran.dg/realloc_on_assign_8.f90: New.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index ee8f896..2fd5749 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7428,7 +7584,16 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 			  gfc_array_index_type, cond,
 			  lbound, gfc_index_one_node);
 }
-  else if (expr-expr_type == EXPR_VARIABLE)
+
+  if (expr-expr_type == EXPR_FUNCTION)
+{
+  /* A conversion function, so use the argument.  */
+  gcc_assert (expr-value.function.isym
+		   expr-value.function.isym-conversion);
+  expr = expr-value.function.actual-expr;
+}
+
+  if (expr-expr_type == EXPR_VARIABLE)
 {
   tmp = TREE_TYPE (expr-symtree-n.sym-backend_decl);
   for (ref = expr-ref; ref; ref = ref-next)
@@ -7441,15 +7606,6 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 	}
   return GFC_TYPE_ARRAY_LBOUND(tmp, dim);
 }
-  else if (expr-expr_type == EXPR_FUNCTION)
-{
-  /* A conversion function, so use the argument.  */
-  expr = expr-value.function.actual-expr;
-  if (expr-expr_type != EXPR_VARIABLE)
-	return gfc_index_one_node;
-  desc = TREE_TYPE (expr-symtree-n.sym-backend_decl);
-  return get_std_lbound (expr, desc, dim, assumed_size);
-}
 
   return gfc_index_one_node;
 }
--- /dev/null	2011-12-07 07:00:28.727532040 +0100
+++ gcc/gcc/testsuite/gfortran.dg/realloc_on_assign_8.f90	2011-12-07 16:34:46.0 +0100
@@ -0,0 +1,17 @@
+! { dg-do compile }
+!
+! PR fortran/51448
+!
+! Contribued by François Willot
+!
+  PROGRAM MAIN
+  IMPLICIT NONE
+  TYPE mytype
+REAL b(2)
+  END TYPE mytype
+  TYPE(mytype) a
+  DOUBLE PRECISION, ALLOCATABLE :: x(:)
+  ALLOCATE(x(2))
+  a%b=0.0E0
+  x=a%b
+  END


Re: [C++ PATCH] ICE with invalid user-defined literals (PR c++/51420)

2011-12-07 Thread Jason Merrill

Applied, thanks.

Jason


Re: Fix a bug in ThreadSanitizer pass (issue 5448109)

2011-12-07 Thread Xinliang David Li
ok for google/main.

David

On Wed, Dec 7, 2011 at 3:04 AM,  dvyu...@google.com wrote:
 On 2011/12/05 17:05:17, dvyukov wrote:

 This is for google-main branch.
 Fix taking address of SSA_NAME in ThreadSanitizer pass.


 Index: gcc/tree-tsan.c
 ===
 --- gcc/tree-tsan.c     (revision 182014)
 +++ gcc/tree-tsan.c     (working copy)
 @@ -726,13 +726,20 @@
    struct mop_desc mop;
    unsigned fld_off;
    unsigned fld_size;
 +  tree base;


 +
 +  base = get_base_address (expr);
 +  if (base == NULL_TREE
 +      || TREE_CODE (base) == SSA_NAME
 +      || TREE_CODE (base) == STRING_CST)
 +    return;
 +
    tcode = TREE_CODE (expr);


    /* Below are things we do not instrument
       (no possibility of races or not implemented yet).  */
    if ((func_ignore  (tsan_ignore_mop | tsan_ignore_rec))
 -      || get_base_address (expr) == NULL
        /* Compiler-emitted artificial variables.  */
        || (DECL_P (expr)  DECL_ARTIFICIAL (expr))
        /* The var does not live in memory - no possibility of races.

 */

 Index: gcc/ChangeLog.google-main
 ===
 --- gcc/ChangeLog.google-main   (revision 182014)
 +++ gcc/ChangeLog.google-main   (working copy)
 @@ -1,3 +1,8 @@
 +2011-12-05   Dmitriy Vyukov  mailto:dvyu...@google.com

 +
 +       Fix taking address of SSA_NAME in ThreadSanitizer pass.
 +        * gcc/tree-tsan.c (handle_expr): Add the additional check.
 +
  2011-11-30   Dmitriy Vyukov  mailto:dvyu...@google.com


        Add directives to run ThreadSanitizer tests


 --
 This patch is available for review at

 http://codereview.appspot.com/5448109

 Sorry, I forgot to provide details about the crash.

 The compiler crashes on the following code (dump just before tsan pass):

 store_param_double (struct NET * net, struct MYSQL_BIND * param)
 {
  double value;
  unsigned char * D.26882;
  long int D.26881;
  long int D.26879;
  unsigned char * D.26877;
  double value.122;
  void * D.26875;

 bb 2:
  D.26875_2 = param_1(D)-buffer;
  value.122_3 = MEM[(double *)D.26875_2];
  D.26877_5 = net_4(D)-write_pos;
  D.26879_7 = VIEW_CONVERT_EXPRunion
 doubleget_union(value.122_3).m[0];
  MEM[(long int *)D.26877_5] = D.26879_7;
  D.26877_8 = net_4(D)-write_pos;
  D.26881_11 = VIEW_CONVERT_EXPRunion
 doubleget_union(value.122_3).m[1];
  MEM[(long int *)D.26877_8 + 4B] = D.26881_11;
  D.26877_12 = net_4(D)-write_pos;
  D.26882_13 = D.26877_12 + 8;
  net_4(D)-write_pos = D.26882_13;
  return;
 }

 with the following error message:

 libmysql.c: In function 'store_param_double':
 libmysql.c:2314:13: error: conversion of an SSA_NAME on the left hand
 side
 VIEW_CONVERT_EXPRunion doubleget_union(D.29825_23);
 D.29824_24 = VIEW_CONVERT_EXPRunion doubleget_union(D.29825_23).m[0];
 libmysql.c:2314:13: internal compiler error: verify_gimple failed

 Is it a correct fix for the issue? If base_address refers to SSA_NAME we
 just do not instrument it, right?


 http://codereview.appspot.com/5448109/


[PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Jack Howarth
Currently we are failing...

FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread simulation 
test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread simulation 
test

on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of
gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds
eliminates the failures (as these test take ~16 seconds on 
x86_64-apple-darwin11).
Okay for gcc trunk?
 Jack

gcc/testsuite/

2011-12-07  Jack Howarth howa...@bromo.med.uc.edu

* lib/gcc-simulate-thread.exp (simulate-thread): Increase timeout
to 20 seconds.


Index: gcc/testsuite/lib/gcc-simulate-thread.exp
===
--- gcc/testsuite/lib/gcc-simulate-thread.exp   (revision 182083)
+++ gcc/testsuite/lib/gcc-simulate-thread.exp   (working copy)
@@ -56,8 +56,8 @@ proc simulate-thread { args } {
 
 set gdb_worked 0
 
-# Set timeout to 10 seconds due to huge amount of generated log.
-remote_expect target 10 {
+# Set timeout to 20 seconds due to huge amount of generated log.
+remote_expect target 20 {
# Too old GDB
-re Unhandled dwarf expression|Error in sourced command file {
unsupported $testcase $message


Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth howa...@bromo.med.uc.edu wrote:
 Currently we are failing...

 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread simulation 
 test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread simulation 
 test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread simulation 
 test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread simulation 
 test

 on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of
 gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 
 seconds
 eliminates the failures (as these test take ~16 seconds on 
 x86_64-apple-darwin11).
 Okay for gcc trunk?

As said elsewhere, this will double the amount of already large
logfile in case of failed test.

Do we really need such detailed log?

Uros.


Re: Update to Fortran invoke documentation about the features -finit-type *really* provides.

2011-12-07 Thread Toon Moene

On 12/06/2011 08:32 PM, Steve Kargl wrote:


On Mon, Dec 05, 2011 at 07:21:59PM +0100, Toon Moene wrote:


2011-12-05  Toon Moenet...@moene.org

PR/51310
invoke.texi: Itemize the cases for which -finit-type  doesn't
work.

OK for trunk ? (and perhaps later for the 4.6 branch ?



Looks good to me.  You can apply it to the 4.6 branch
if you have time.


And then  shortly before applying it, I realized that the proper 
documentation of the limitations might be dependent on the 
-fno-automatic, -fstack-arrays and -fmax-stack-var-size=n compiler flags 
used.


So I'll come back tomorrow with version 2.0 of this patch, after 
checking out all of the above (the documentation of 4.6 and 4.7 will be 
different if using -fstack-arrays makes a difference, because that 
option only exists in 4.7).


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Iain Sandoe


On 7 Dec 2011, at 18:47, Uros Bizjak wrote:

On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth  
howa...@bromo.med.uc.edu wrote:

Currently we are failing...

FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread  
simulation test
FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread  
simulation test


on x86_64-apple-darwin11 due to the 10 second timeout in simulate- 
thread of
gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout  
to 20 seconds
eliminates the failures (as these test take ~16 seconds on x86_64- 
apple-darwin11).

Okay for gcc trunk?


if it's only one test can't you use { dg-timeout-factor 2.0  ?


As said elsewhere, this will double the amount of already large
logfile in case of failed test.

Do we really need such detailed log?


anything to optimize what's in the logs would be welcome in debugging

Iain



Re: [PATCH] increase timeout in simulate-thread gdb test

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 7:58 PM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:

 Currently we are failing...

 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O1 -g  thread
 simulation test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O2 -g  thread
 simulation test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -O3 -g  thread
 simulation test
 FAIL: gcc.dg/simulate-thread/atomic-load-int128.c  -Os -g  thread
 simulation test

 on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread
 of
 gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20
 seconds
 eliminates the failures (as these test take ~16 seconds on
 x86_64-apple-darwin11).
 Okay for gcc trunk?


 if it's only one test can't you use { dg-timeout-factor 2.0  ?


 As said elsewhere, this will double the amount of already large
 logfile in case of failed test.

 Do we really need such detailed log?


 anything to optimize what's in the logs would be welcome in debugging

I fully agree, but it is trivial to re-run the test in the debugger
outside the testsuite run. IMO, logging a couple of lines for
execution of every instruction (in a loop!) is a bit excessive.

Uros.


Re: [PATCH] [MIPS] Add -march=octeon+ support for GCC

2011-12-07 Thread Andrew Pinski
On Tue, Dec 6, 2011 at 6:28 PM, Andrew Pinski pins...@gmail.com wrote:
 Hi,
  This patch adds -march=octeon+ to GCC.
 OK? Bootstrapped and tested on mips64-linux-gnu configured with
 --with-arch=octeon+ .

 Thanks,
 Andrew Pinski

 gcc/ChangeLog:
 * mips/mips-cpus.def (octeon+): New CPU.
 * config/mips/mips-tables.opt: Regenerate.
 * config/mips/mips.h (MIPS_CPP_SET_PROCESSOR): Emit '+' as 'P'.

 testsuite/ChangeLog:
 * gcc.target/mips/mult-1.c: Forbit all Octeon processors.
 * gcc.target/mips/dmult-1.c: Likewise.
 * gcc.target/mips/branch-1.c: Likewise.
 * gcc.target/mips/extend-1.c: Likewise.

Woops I forgot the patch.

Thanks,
Andrew Pinski
Index: testsuite/gcc.target/mips/mult-1.c
===
--- testsuite/gcc.target/mips/mult-1.c  (revision 182066)
+++ testsuite/gcc.target/mips/mult-1.c  (working copy)
@@ -1,6 +1,6 @@
 /* For SI-DI widening multiplication we should use DINS to combine the two
halves.  For Octeon use DMUL with explicit widening.  */
-/* { dg-options -O -mgp64 isa_rev=2 forbid_cpu=octeon } */
+/* { dg-options -O -mgp64 isa_rev=2 forbid_cpu=octeon\[\+0-9\]* } */
 /* { dg-final { scan-assembler \tdins\t } } */
 /* { dg-final { scan-assembler-not \tdsll\t } } */
 /* { dg-final { scan-assembler-not \tdsrl\t } } */
Index: testsuite/gcc.target/mips/dmult-1.c
===
--- testsuite/gcc.target/mips/dmult-1.c (revision 182066)
+++ testsuite/gcc.target/mips/dmult-1.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options forbid_cpu=octeon -mgp64 } */
+/* { dg-options forbid_cpu=octeon\[\+0-9\]* -mgp64 } */
 /* { dg-final { scan-assembler \tdmult\t } } */
 /* { dg-final { scan-assembler \tmflo\t } } */
 /* { dg-final { scan-assembler-not \tdmul\t } } */
Index: testsuite/gcc.target/mips/branch-1.c
===
--- testsuite/gcc.target/mips/branch-1.c(revision 182066)
+++ testsuite/gcc.target/mips/branch-1.c(working copy)
@@ -2,7 +2,7 @@
but we test for bbit elsewhere.  On other targets, we should implement
the if statements using an andi instruction followed by a branch
on zero.  */
-/* { dg-options -O2 forbid_cpu=octeon } */
+/* { dg-options -O2 forbid_cpu=octeon\[\+0-9\]* } */
 
 void bar (void);
 NOMIPS16 void f1 (int x) { if (x  4) bar (); }
Index: testsuite/gcc.target/mips/extend-1.c
===
--- testsuite/gcc.target/mips/extend-1.c(revision 182066)
+++ testsuite/gcc.target/mips/extend-1.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-options -O -mgp64 forbid_cpu=octeon } */
+/* { dg-options -O -mgp64 forbid_cpu=octeon\[\+0-9\]* } */
 /* { dg-final { scan-assembler-times \tdsll\t 5 } } */
 /* { dg-final { scan-assembler-times \tdsra\t 5 } } */
 /* { dg-final { scan-assembler-not \tsll\t } } */
Index: config/mips/mips-tables.opt
===
--- config/mips/mips-tables.opt (revision 182066)
+++ config/mips/mips-tables.opt (working copy)
@@ -603,3 +603,6 @@
 EnumValue
 Enum(mips_arch_opt_value) String(octeon) Value(80) Canonical
 
+EnumValue
+Enum(mips_arch_opt_value) String(octeon+) Value(81) Canonical
+
Index: config/mips/mips-cpus.def
===
--- config/mips/mips-cpus.def   (revision 182066)
+++ config/mips/mips-cpus.def   (working copy)
@@ -145,3 +145,4 @@
 
 /* MIPS64 Release 2 processors.  */
 MIPS_CPU (octeon, PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY)
+MIPS_CPU (octeon+, PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY)
Index: config/mips/mips.h
===
--- config/mips/mips.h  (revision 182066)
+++ config/mips/mips.h  (working copy)
@@ -329,7 +329,10 @@
\
   macro = concat ((PREFIX), _, (INFO)-name, NULL);  \
   for (p = macro; *p != 0; p++)\
-   *p = TOUPPER (*p);  \
+if (*p == '+')  \
+  *p = 'P'; \
+else\
+  *p = TOUPPER (*p);\
\
   builtin_define (macro);  \
   builtin_define_with_value ((PREFIX), (INFO)-name, 1);   \


Re: [Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 17:45:52 Tobias Burnus wrote:
 This fixes a -frealloc-lhs regression where the RHS is handled by a
 conversion function whose argument has component refs.
 
 Build and regtested on x86-64-linux.
 OK for the trunk and 4.7?
 
 Tobias
OK.

Mikael


Re: PR 51386

2011-12-07 Thread François Dumont

Attached patch applied:

2011-12-07  François Dumont fdum...@gcc.gnu.org

PR libstdc++/51386
* include/bits/hashtable_policy.h 
(_Prime_rehash_policy::_M_next_bkt):

Fix computation of _M_prev_resize so that hashtable do not keep on
being rehashed when _M_max_load_factor is lower than 1.

François

On 12/07/2011 11:21 AM, Paolo Carlini wrote:

Hi,

Ok to commit ?
Yes, thanks a lot for handling this. Please remember to add a proper 
header to the ChangeLog entry.


Thanks again,
Paolo.



Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 181975)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -300,23 +300,30 @@
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned long __fast_bkt[12]
+static const unsigned char __fast_bkt[12]
   = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
 
+if (__n = 11)
+  {
+	_M_prev_resize = 0;
+	_M_next_resize
+	  = __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
+	return __fast_bkt[__n];
+  }
+
 const unsigned long* __p
-  = __n = 11 ? __fast_bkt + __n
-		  : std::lower_bound(__prime_list + 5,
- __prime_list + _S_n_primes, __n);
+  = std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes, __n);
 
-_M_prev_resize = __builtin_floor(*__p * (long double)_M_max_load_factor);
-if (__p != __fast_bkt)
-  _M_prev_resize = std::min(_M_prev_resize,
-static_caststd::size_t(*(__p - 1)));
-// Lets guaranty a minimal grow step of 11:
+// Shrink will take place only if the number of elements is small enough
+// so that the prime number 2 steps before __p is large enough to still
+// conform to the max load factor:
+_M_prev_resize
+  = __builtin_floor(*(__p - 2) * (long double)_M_max_load_factor);
+
+// Let's guaranty that a minimal grow step of 11 is used
 if (*__p - __n  11)
-  __p = std::lower_bound(__prime_list + 5,
-			 __prime_list + _S_n_primes, __n + 11);
-_M_next_resize = __builtin_floor(*__p * (long double)_M_max_load_factor);
+  __p = std::lower_bound(__p, __prime_list + _S_n_primes, __n + 11);
+_M_next_resize = __builtin_ceil(*__p * (long double)_M_max_load_factor);
 return *__p;
   }
 


Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 14:54:36 Tobias Burnus wrote:
 * ping * ?
 
 On 12/04/2011 06:46 PM, Tobias Burnus wrote:
  Hi all,
  
  as Dominique has found, Fortran 2008 allows the BOZ edit descriptors
  now also with REAL and COMPLEX arguments. (See PR for quotes from the
  standard.)
  
  Build and regtested on x86-64-linux.
  OK for the trunk?
  
OK.

Mikael


Re: [Patch, Fortran] PR 51378 Fix component-access check

2011-12-07 Thread Mikael Morin
On Friday 02 December 2011 22:01:19 Tobias Burnus wrote:
 Found via Reinhold Bader's test suite: If a component is public, it
 remains public even if the extended type has PRIVATE.
 
 Build and regtested on x86-64-linux.
 OK for the trunk?
 
OK.

Mikael


[PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)

2011-12-07 Thread Jakub Jelinek
Hi!

While this isn't real fix for this PR, which should be fixed in configury,
I think avoiding ctors in host code is useful for portability and Eric
agreed with that in the PR.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  ja...@redhat.com

PR bootstrap/50237
* internal.h (_cpp_init_lexer): New prototype.
* init.c (init_library): Call it.
* lex.c (init_vectorized_lexer): Remove constructor attribute,
add inline keyword.
(HAVE_init_vectorized_lexer): Define.
(_cpp_init_lexer): New function.

--- libcpp/internal.h.jj2011-11-04 07:49:56.0 +0100
+++ libcpp/internal.h   2011-11-28 16:01:13.467831488 +0100
@@ -653,6 +653,7 @@ extern int _cpp_equiv_tokens (const cpp_
 extern void _cpp_init_tokenrun (tokenrun *, unsigned int);
 extern cpp_hashnode *_cpp_lex_identifier (cpp_reader *, const char *);
 extern int _cpp_remaining_tokens_num_in_context (cpp_context *);
+extern void _cpp_init_lexer (void);
 
 /* In init.c.  */
 extern void _cpp_maybe_push_include_file (cpp_reader *);
--- libcpp/init.c.jj2011-10-31 20:44:15.0 +0100
+++ libcpp/init.c   2011-11-28 16:01:42.168044257 +0100
@@ -134,6 +134,8 @@ init_library (void)
 {
   initialized = 1;
 
+  _cpp_init_lexer ();
+
   /* Set up the trigraph map.  This doesn't need to do anything if
 we were compiled with a compiler that supports C99 designated
 initializers.  */
--- libcpp/lex.c.jj 2011-10-27 08:43:10.0 +0200
+++ libcpp/lex.c2011-11-28 16:04:42.624703698 +0100
@@ -477,7 +477,8 @@ search_line_sse42 (const uchar *s, const
 typedef const uchar * (*search_line_fast_type) (const uchar *, const uchar *);
 static search_line_fast_type search_line_fast;
 
-static void __attribute__((constructor))
+#define HAVE_init_vectorized_lexer 1
+static inline void
 init_vectorized_lexer (void)
 {
   unsigned dummy, ecx = 0, edx = 0;
@@ -638,6 +639,16 @@ search_line_fast (const uchar *s, const
 
 #endif
 
+/* Initialize the lexer if needed.  */
+
+void
+_cpp_init_lexer (void)
+{
+#ifdef HAVE_init_vectorized_lexer
+  init_vectorized_lexer ();
+#endif
+}
+
 /* Returns with a logical line that contains no escaped newlines or
trigraphs.  This is a time-critical inner loop.  */
 void

Jakub


[C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)

2011-12-07 Thread Jakub Jelinek
Hi!

We already error out on .foo = style designators for arrays, but for
[N] = style designators for classes we'd just ICE.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

We still have a bunch of accepts-invalid, no idea how to solve them,
because reshape_init_r is called often several times with the same d-cur.
IMHO we should error on all of:
struct A { int i; };
int a = { .foo = 6 };
int b = { [0] = 1 };
_Complex float c = { .foo = 0,  1 };
_Complex float d = { [0] = 0,  1 };
_Complex float e = { 0, .foo = 1 };
_Complex float f = { 0, [0] = 1 };
char g[] = { [7] = abcd };
The last one is a accepts-invalid even for C.  Jason, do you think you could
look at that on the C++ side, I'll look at the C one?

2011-12-07  Jakub Jelinek  ja...@redhat.com

PR c++/51229
* decl.c (reshape_init_class): Complain if d-cur-index is
INTEGER_CST.
* parser.c (cp_parser_initializer_list): If cp_parser_parse_definitely
fails, clear designator.

* g++.dg/ext/desig3.C: New test.

--- gcc/cp/decl.c.jj2011-12-01 11:45:04.0 +0100
+++ gcc/cp/decl.c   2011-12-07 12:51:17.455762864 +0100
@@ -5078,6 +5078,14 @@ reshape_init_class (tree type, reshape_i
   /* Handle designated initializers, as an extension.  */
   if (d-cur-index)
{
+ if (TREE_CODE (d-cur-index) == INTEGER_CST)
+   {
+ if (complain  tf_error)
+   error (%[%E] =% used in a GNU-style designated initializer
+   for class %qT, d-cur-index, type);
+ return error_mark_node;
+   }
+
  field = lookup_field_1 (type, d-cur-index, /*want_type=*/false);
 
  if (!field || TREE_CODE (field) != FIELD_DECL)
--- gcc/cp/parser.c.jj  2011-11-28 17:58:00.0 +0100
+++ gcc/cp/parser.c 2011-12-07 11:31:33.525651711 +0100
@@ -17713,7 +17713,8 @@ cp_parser_initializer_list (cp_parser* p
  designator = cp_parser_constant_expression (parser, false, NULL);
  cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE);
  cp_parser_require (parser, CPP_EQ, RT_EQ);
- cp_parser_parse_definitely (parser);
+ if (!cp_parser_parse_definitely (parser))
+   designator = NULL_TREE;
}
   else
designator = NULL_TREE;
--- gcc/testsuite/g++.dg/ext/desig3.C.jj2011-12-07 12:55:34.894323455 
+0100
+++ gcc/testsuite/g++.dg/ext/desig3.C   2011-12-07 12:56:32.172001945 +0100
@@ -0,0 +1,9 @@
+// PR c++/51229
+// { dg-do compile }
+// { dg-options  }
+
+struct A { int i; };
+
+int a[5] = { .foo = 7 };// { dg-error used in a GNU-style designated 
initializer for an array }
+int b[] = { .foo = 8 };// { dg-error used in a GNU-style designated 
initializer for an array }
+A c = { [0] = {} };// { dg-error used in a GNU-style designated 
initializer for class }

Jakub


[C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)

2011-12-07 Thread Jakub Jelinek
Hi!

This testcase regressed recently when Paolo added the user friendlier
invalid use of member function vs. invalid use of member distinction,
when TREE_OPERAND (value, 1) is BASELINK, DECL_FUNCTION_MEMBER_P on it
dies with a checking ICE.
The following patch calls get_first_fn if it is overloaded_fn.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  ja...@redhat.com

PR c++/51429
* typeck2.c (cxx_incomplete_type_diagnostic): Don't
ICE if TREE_OPERAND (value, 1) is overloaded.

* g++.dg/parse/error45.C: New test.

--- gcc/cp/typeck2.c.jj 2011-11-07 12:40:45.0 +0100
+++ gcc/cp/typeck2.c2011-12-07 15:31:30.506795286 +0100
@@ -428,15 +428,20 @@ cxx_incomplete_type_diagnostic (const_tr
 
 case OFFSET_TYPE:
 bad_member:
-  if (DECL_FUNCTION_MEMBER_P (TREE_OPERAND (value, 1))
-  ! flag_ms_extensions)
-   emit_diagnostic (diag_kind, input_location, 0,
-invalid use of member function 
-(did you forget the %()% ?));
-  else
-   emit_diagnostic (diag_kind, input_location, 0,
-invalid use of member 
-(did you forget the %% ?));
+  {
+   tree member = TREE_OPERAND (value, 1);
+   if (is_overloaded_fn (member))
+ member = get_first_fn (member);
+   if (DECL_FUNCTION_MEMBER_P (member)
+! flag_ms_extensions)
+ emit_diagnostic (diag_kind, input_location, 0,
+  invalid use of member function 
+  (did you forget the %()% ?));
+   else
+ emit_diagnostic (diag_kind, input_location, 0,
+  invalid use of member 
+  (did you forget the %% ?));
+  }
   break;
 
 case TEMPLATE_TYPE_PARM:
--- gcc/testsuite/g++.dg/parse/error45.C.jj 2011-12-07 15:45:28.732286838 
+0100
+++ gcc/testsuite/g++.dg/parse/error45.C2011-12-07 15:44:20.0 
+0100
@@ -0,0 +1,9 @@
+// PR c++/51429
+// { dg-do compile }
+
+struct A
+{
+  void foo (double);
+  void foo (int);
+  A () { foo = 0; }// { dg-error invalid use of member function }
+};

Jakub


Re: [C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)

2011-12-07 Thread Jason Merrill

On 12/07/2011 03:25 PM, Jakub Jelinek wrote:

Hi!

We already error out on .foo = style designators for arrays, but for
[N] = style designators for classes we'd just ICE.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


The last one is a accepts-invalid even for C.  Jason, do you think you could
look at that on the C++ side, I'll look at the C one?


At some point, sure.  Would you open a PR?

Jason



Re: [C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)

2011-12-07 Thread Jason Merrill

OK.

Jason


Re: [PATCH][ARM] one_cmpldi2 in NEON

2011-12-07 Thread Andrew Stubbs

On 07/12/11 16:25, Richard Earnshaw wrote:

2011-12-06  Andrew Stubbsa...@codesourcery.com

gcc/
* config/arm/arm.md (one_cmpldi2): Rename to ...
(one_cmpldi2_core): ... this, and modify it to prevent it being
used for NEON.
(one_cmpldi2): New define_expand.
* config/arm/neon.md (one_cmpldi2_neon): New define_insn.




+(define_insn_and_split *one_cmpldi2_core
+  [(set (match_operand:DI 0 arm_general_register_operand =r,r)
+   (not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))]


Thinking about it, for an operation with one input and one output, there's no 
need for the
earlyclobber marker when the input is tied to the output (there's no other 
operand that can be
clobbered).

Otherwise OK.


Thanks I'll test and commit the attached updated patch once stage 1 
opens again.


Andrew
2011-12-07  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/arm.md (one_cmpldi2): Rename to ...
	(one_cmpldi2_core): ... this, and modify it to prevent it being
	used for NEON.
	(one_cmpldi2): New define_expand.
	* config/arm/neon.md (one_cmpldi2_neon): New define_insn.

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4199,10 +4199,16 @@
   TARGET_32BIT  TARGET_HARD_FLOAT  (TARGET_FPA || TARGET_VFP_DOUBLE)
   )
 
-(define_insn_and_split one_cmpldi2
-  [(set (match_operand:DI 0 s_register_operand =r,r)
-	(not:DI (match_operand:DI 1 s_register_operand 0,r)))]
+(define_expand one_cmpldi2
+  [(set (match_operand:DI 0 s_register_operand )
+	(not:DI (match_operand:DI 1 s_register_operand )))]
   TARGET_32BIT
+  )
+
+(define_insn_and_split *one_cmpldi2_core
+  [(set (match_operand:DI 0 arm_general_register_operand =r,r)
+	(not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))]
+  TARGET_32BIT  !TARGET_NEON
   #
   TARGET_32BIT  reload_completed
   [(set (match_dup 0) (not:SI (match_dup 1)))
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -896,6 +896,20 @@
   [(set_attr neon_type neon_int_1)]
 )
 
+(define_insn *one_cmpldi2_neon
+  [(set (match_operand:DI 0 s_register_operand	 =w,?r,?r,?w)
+	(not:DI (match_operand:DI 1 s_register_operand  w, 0,  r, w)))]
+  TARGET_NEON
+  @
+  vmvn\t%P0, %P1
+  #
+  #
+  vmvn\t%P0, %P1
+  [(set_attr neon_type neon_int_1,*,*,neon_int_1)
+   (set_attr length *,8,8,*)
+   (set_attr arch nota8,*,*,onlya8)]
+)
+
 (define_insn absmode2
   [(set (match_operand:VDQW 0 s_register_operand =w)
 	(abs:VDQW (match_operand:VDQW 1 s_register_operand w)))]


[C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)

2011-12-07 Thread Jakub Jelinek
Hi!

By my reading auto is not valid on non-static data members, if they aren't
NSDMI, then we'd error on them (but differently between templates and
non-templates), but with NSDMI in template we just ICE because init is
DEFAULT_ARG.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-07  Jakub Jelinek  ja...@redhat.com

PR c++/51401
* decl.c (grokdeclarator): Error for auto on non-static data members.

* g++.dg/cpp0x/auto7.C: Adjust expected error message.
* g++.dg/cpp0x/auto29.C: New test.

--- gcc/cp/decl.c.jj2011-12-07 12:51:17.0 +0100
+++ gcc/cp/decl.c   2011-12-07 18:09:09.113310949 +0100
@@ -9971,6 +9971,12 @@ grokdeclarator (const cp_declarator *dec
   }
 else if (decl_context == FIELD)
   {
+   if (!staticp  type_uses_auto (type))
+ {
+   error (non-static data member declared %auto%);
+   type = error_mark_node;
+ }
+
/* The C99 flexible array extension.  */
if (!staticp  TREE_CODE (type) == ARRAY_TYPE
 TYPE_DOMAIN (type) == NULL_TREE)
--- gcc/testsuite/g++.dg/cpp0x/auto7.C.jj   2011-05-31 08:02:58.0 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/auto7.C  2011-12-07 18:14:18.011549947 +0100
@@ -9,5 +9,5 @@ templateint struct A
 {
   static auto k = 7;   // { dg-error non-const }
   static auto l;   // { dg-error has no initializer }
-  auto m;  // { dg-error has no initializer }
+  auto m;  // { dg-error non-static data member declared }
 };
--- gcc/testsuite/g++.dg/cpp0x/auto29.C.jj  2011-12-07 18:12:22.181184910 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/auto29.C 2011-12-07 18:11:52.0 +0100
@@ -0,0 +1,25 @@
+// PR c++/51401
+// { dg-do compile }
+// { dg-options -std=c++11 }
+
+template int
+struct A
+{
+  auto i;  // { dg-error non-static data member declared }
+};
+
+template int
+struct B
+{
+  auto i = 0;  // { dg-error non-static data member declared }
+};
+
+struct C
+{
+  auto i;  // { dg-error non-static data member declared }
+};
+
+struct D
+{
+  auto i = 0;  // { dg-error non-static data member declared }
+};

Jakub


Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings

2011-12-07 Thread Mikael Morin
On Wednesday 07 December 2011 14:53:20 Tobias Burnus wrote:
 ** PING **
 
 On 11/29/2011 07:35 PM, Tobias Burnus wrote:
  gfortran had an ICE when trying to insert a check whether the
  character length between actual and dummy argument matches. This check
  is pointless for deferred-length character arguments - thus, no bounds
  check should be generated.
  
  Build and regtested on x86-64-linux.
  OK for the trunk?
  
OK, though I would have merged the skip for deferred lengths into the toplevel 
if condition (either is fine anyway).

Mikael


Re: [PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)

2011-12-07 Thread Richard Henderson
On 12/07/2011 12:20 PM, Jakub Jelinek wrote:
   PR bootstrap/50237
   * internal.h (_cpp_init_lexer): New prototype.
   * init.c (init_library): Call it.
   * lex.c (init_vectorized_lexer): Remove constructor attribute,
   add inline keyword.
   (HAVE_init_vectorized_lexer): Define.
   (_cpp_init_lexer): New function.

Ok.


r~


[Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Teresa Johnson
An issue turned up in our internal 4.6 based testing that has been
fixed on trunk. This patch backports the fix to 4.6. I also have a
small test case that I will add to both 4.6 and 4.7.

Bootstrapped and checked with x86_64-unknown-linux-gnu.

Can someone review?

Thanks,
Teresa

2011-12-07  Teresa Johnson  tejohn...@google.com

Backport from mainline:

2011-08-05  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (*movdi_internal_rex64): Use !o constraint
instead of !m for operand 0, alternative 4.
(*movdf_internal_rex64): Ditto for operand 0, alernative 6.

2011-12-07  Teresa Johnson  tejohn...@google.com

* gcc.target/i386/movdi-rex64.c: New.


Index: 
config/i386/i386.md===---
config/i386/i386.md (revision 182083)+++ config/i386/i386.md(working
copy)@@ -1960,7 +1960,7 @@  (define_insn *movdi_internal_rex64
[(set (match_operand:DI 0 nonimmediate_operand- =r,r  ,r,m
,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym)+ =r,r
,r,m ,!o,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym)
(match_operand:DI 1 general_operand     Z ,rem,i,re,n ,C
,*y,*Ym,*y,r   ,m  ,C ,*x,*Yi,*x,r  ,m ,*Ym,*x))]   TARGET_64BIT 
!(MEM_P (operands[0])  MEM_P (operands[1]))@@ -2905,7 +2905,7 @@
(define_insn *movdf_internal_rex64   [(set (match_operand:DF 0
nonimmediate_operand- =f,m,f,r ,m,!r,!m,Y2*x,Y2*x,Y2*x,m   ,Yi,r
)+ =f,m,f,r ,m,!r,!o,Y2*x,Y2*x,Y2*x,m   ,Yi,r )  
(match_operand:DF
1 general_operand fm,f,G,rm,r,F ,F ,C   ,Y2*x,m   ,Y2*x,r ,Yi))]
  TARGET_64BIT  !(MEM_P (operands[0])  MEM_P (operands[1]))

Index: testsuite/gcc.target/i386/movdi-rex64.c
===
--- testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
+++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options -fPIE -Wwrite-strings } */
+
+#include string.h
+static __thread char buffer[25];
+const char * error_message (void)
+{
+oops:
+strcpy (buffer, Unknown code );
+return 0;
+}


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Uros Bizjak
Hello!

 An issue turned up in our internal 4.6 based testing that has been
 fixed on trunk. This patch backports the fix to 4.6. I also have a
 small test case that I will add to both 4.6 and 4.7.

 Bootstrapped and checked with x86_64-unknown-linux-gnu.

 2011-12-07  Teresa Johnson  tejohn...@google.com

   Backport from mainline:

   2011-08-05  Uros Bizjak  ubiz...@gmail.com

   * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint
   instead of !m for operand 0, alternative 4.
   (*movdf_internal_rex64): Ditto for operand 0, alernative 6.

 2011-12-07  Teresa Johnson  tejohn...@google.com

   * gcc.target/i386/movdi-rex64.c: New.

Index: testsuite/gcc.target/i386/movdi-rex64.c
===
--- testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
+++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options -fPIE -Wwrite-strings } */
+
+#include string.h
+static __thread char buffer[25];
+const char * error_message (void)
+{
+oops:
+strcpy (buffer, Unknown code );
+return 0;
+}

You don't need #include for compile tests, just use:

--cut here--
/* { dg-do compile } */
/* { dg-options -fPIE } */

char *strcpy (char *dest, const char *src);

static __thread char buffer[25];

const char
* error_message (void)
{
  strcpy (buffer, Unknown code );
  return 0;
}
--cut here--

Also this can be compiled everywhere, not just linux.

OK with the testcase above.

Thanks,
Uros.


Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Teresa Johnson
On Wed, Dec 7, 2011 at 2:12 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 An issue turned up in our internal 4.6 based testing that has been
 fixed on trunk. This patch backports the fix to 4.6. I also have a
 small test case that I will add to both 4.6 and 4.7.

 Bootstrapped and checked with x86_64-unknown-linux-gnu.

 2011-12-07  Teresa Johnson  tejohn...@google.com

       Backport from mainline:

       2011-08-05  Uros Bizjak  ubiz...@gmail.com

       * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint
       instead of !m for operand 0, alternative 4.
       (*movdf_internal_rex64): Ditto for operand 0, alernative 6.

 2011-12-07  Teresa Johnson  tejohn...@google.com

       * gcc.target/i386/movdi-rex64.c: New.

 Index: testsuite/gcc.target/i386/movdi-rex64.c
 ===
 --- testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
 +++ testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
 @@ -0,0 +1,11 @@
 +/* { dg-do compile { target *-*-linux* } } */
 +/* { dg-options -fPIE -Wwrite-strings } */
 +
 +#include string.h
 +static __thread char buffer[25];
 +const char * error_message (void)
 +{
 +oops:
 +    strcpy (buffer, Unknown code );
 +    return 0;
 +}

 You don't need #include for compile tests, just use:

 --cut here--
 /* { dg-do compile } */
 /* { dg-options -fPIE } */

 char *strcpy (char *dest, const char *src);

 static __thread char buffer[25];

 const char
 * error_message (void)
 {
  strcpy (buffer, Unknown code );
  return 0;
 }
 --cut here--

 Also this can be compiled everywhere, not just linux.

Ok, I will change the testcase to replace the include and retest.

Regarding the linux target restriction, though, I was concerned about
the -fPIE option used for the test case. I noticed that in 4.7 there
is a pie effective target keyword (check_effective_target_pie in
testsuite/lib/target-supports.exp). However, that does not yet exist
in 4.6, so rather than backport that as well I added the linux
restriction. I see the same restriction in the other tests that use
-fpie in gcc.target/i386 (pr39013-[12].c). What do you think?

Thanks!
Teresa


 OK with the testcase above.

 Thanks,
 Uros.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Committed] Fix PR libffi/50051 (MIPS libffi does not compile for mips64octeon-linux-gnu)

2011-12-07 Thread Andrew Pinski
Hi,
  The problem here is Cavium's octeon assembler rejects floating point
if the arch is set to either octeon or octeon2.  This fixes the issue
by adding:
.set mips4 so that floating point instructions are enabled.

Committed as approved by Anthony in the bugzilla.

Thanks,
Andrew Pinski

libffi/ChangeLog:
* src/mips/n32.S: Add .set mips4.
Index: src/mips/n32.S
===
--- src/mips/n32.S  (revision 182083)
+++ src/mips/n32.S  (working copy)
@@ -43,6 +43,7 @@
 #ifdef __GNUC__
.abicalls
 #endif
+   .set mips4
.text
.align  2
.globl  ffi_call_N32


Re: [v3] RFC: rename __calculate_memory_order

2011-12-07 Thread Benjamin Kosnik

 * include/bits/atomic_base.h (__calculate_memory_order):
 Rename to... (__cmpexch_failure_order): This, and rewrite as
 constexpr function. (compare_exchange_strong, compare_exchange_weak):
 Use it.
 * include/std/atomic (compare_exchange_strong,
 compare_exchange_weak): Likewise.
 
 Tested x86_64-linux.

looks great to me. More constexpr, what's not to like?

-benjamin


-finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Xinliang David Li
Build the test case in the patch file with -finstrument-functions, the
link will fail with unsat. The problem is gcc instruments the
artificial wrappers that will won't be emitted. The patch fixes the
problem. Bootstrap and tested on x86-64/linux.

Ok for mainline?

thanks,

David
Index: gimplify.c
===
--- gimplify.c	(revision 182083)
+++ gimplify.c	(working copy)
@@ -8048,6 +8048,12 @@ flag_instrument_functions_exclude_p (tre
 	  return true;
 }
 
+/* Avoid instrumenting wrapper functions to builtins.  */
+
+if (DECL_DISREGARD_INLINE_LIMITS (fndecl)
+ lookup_attribute (always_inline, DECL_ATTRIBUTES (fndecl)))
+  return true;
+
   return false;
 }
 
Index: testsuite/gcc.target/i386/instrument-func.c
===
--- testsuite/gcc.target/i386/instrument-func.c	(revision 0)
+++ testsuite/gcc.target/i386/instrument-func.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse } */
+/* { dg-options -O2  -msse -finstrument-functions } */
+
+#include xmmintrin.h
+
+__attribute__((noinline)) __m128 foo( float *row, int x)
+{
+  __m128 vals = _mm_loadu_ps(row + x);
+  return vals;
+}
+
+int main()
+{
+  float f[10];
+  foo(f, 5);
+  return 0;
+}


cg1
Description: Binary data


cg2
Description: Binary data


Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Andrew Pinski
On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li davi...@google.com wrote:
 Build the test case in the patch file with -finstrument-functions, the
 link will fail with unsat. The problem is gcc instruments the
 artificial wrappers that will won't be emitted. The patch fixes the
 problem. Bootstrap and tested on x86-64/linux.


I think you really should be checking for artificial attribute and not
looking at always_inline.

Thanks,
Andrew Pinski


Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers

2011-12-07 Thread Xinliang David Li
On Wed, Dec 7, 2011 at 4:25 PM, Andrew Pinski pins...@gmail.com wrote:
 On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li davi...@google.com wrote:
 Build the test case in the patch file with -finstrument-functions, the
 link will fail with unsat. The problem is gcc instruments the
 artificial wrappers that will won't be emitted. The patch fixes the
 problem. Bootstrap and tested on x86-64/linux.


 I think you really should be checking for artificial attribute and not
 looking at always_inline.

Interesting -- I thought I added that. The following is the revised one.

thanks,

David


 Thanks,
 Andrew Pinski
Index: gimplify.c
===
--- gimplify.c	(revision 182083)
+++ gimplify.c	(working copy)
@@ -8048,6 +8048,13 @@ flag_instrument_functions_exclude_p (tre
 	  return true;
 }
 
+/* Avoid instrumenting wrapper functions to builtins.  */
+
+if (DECL_ARTIFICIAL (fndecl)
+ DECL_DISREGARD_INLINE_LIMITS (fndecl)
+ lookup_attribute (always_inline, DECL_ATTRIBUTES (fndecl)))
+  return true;
+
   return false;
 }
 
Index: testsuite/gcc.target/i386/instrument-func.c
===
--- testsuite/gcc.target/i386/instrument-func.c	(revision 0)
+++ testsuite/gcc.target/i386/instrument-func.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sse } */
+/* { dg-options -O2  -msse -finstrument-functions } */
+
+#include xmmintrin.h
+
+__attribute__((noinline)) __m128 foo( float *row, int x)
+{
+  __m128 vals = _mm_loadu_ps(row + x);
+  return vals;
+}
+
+int main()
+{
+  float f[10];
+  foo(f, 5);
+  return 0;
+}


Re: [C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)

2011-12-07 Thread Jason Merrill

OK.

Jason


[4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)

2011-12-07 Thread Han Shen
Hi, this patch provides a new stack protection option - 
fstack-protector-strong.

Background - some times stack-protector is too-simple while stack-protector-all 
over-kills, for example, to build one of our core systems, we forcibly add 
-fstack-protector-all to all compile commands, which brings big performance 
penalty (due to extra stack guard/check insns on function prologue and 
epilogue) on both atom and arm. To use -fstack-protector is just regarded as 
not secure enough (only protects 2% functions) by the system secure team. So 
I'd like to add the option -fstack-protector-strong, that hits the balance 
between -fstack-protector and -fstack-protector-all.

Benefit - gain big performance while sacrificing little security (for scenarios 
using -fstack-protector-all)

Status - implemented internally, to be up-streamed or merged to google branch 
only.

Detail - 
https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU/edit?hl=en_US

Tested - manually, built chrome browser using the modified compiler with 
-fstack-protector-strong.

=
gcc/ChangeLog:
* cfgexpand.c (expand_used_vars): Add logic handling 
stack-protector-strong.
(is_local_address_taken): Internal function that returns true when 
gimple contains
an address taken on function local variables.
(record_or_union_type_has_array): New, tests if a record or union type 
contains an array.
* common.opt (fstack-protector-all): New option.

gcc/testsuite/ChangeLog
* gcc.dg/fstack-protector-strong.c: New.

==

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 8684721..1d9df87 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1507,6 +1507,50 @@ estimated_stack_frame_size (struct cgraph_node *node)
   return size;
 }
 
+static int is_local_address_taken(tree t) {
+  if (t  TREE_CODE(t) == ADDR_EXPR) {
+int i;
+tree local_var;
+tree v = TREE_OPERAND(t, 0);
+switch (TREE_CODE(v)) {
+case MEM_REF:
+  for (i = 0; i  TREE_OPERAND_LENGTH(v); ++i)
+if (is_local_address_taken(TREE_OPERAND(v, i)))
+  return 1;
+  return 0;
+case COMPONENT_REF:
+  while (v  TREE_CODE(v) == COMPONENT_REF)
+v = TREE_OPERAND(v, 0);
+  break;
+case VAR_DECL:
+default:
+  ;
+}
+if (v  TREE_CODE(v) == VAR_DECL  !is_global_var(v)) {
+  FOR_EACH_VEC_ELT(tree, cfun-local_decls, i, local_var) {
+if (local_var == v)
+  return 1;
+  }
+}
+  }
+  return 0;
+}
+
+static int record_or_union_type_has_array(tree tree_type) {
+  tree fields = TYPE_FIELDS(tree_type);
+  tree f;
+  for (f = fields; f; f = DECL_CHAIN (f)) {
+if (TREE_CODE(f) == FIELD_DECL) {
+  tree field_type = TREE_TYPE(f);
+  if (RECORD_OR_UNION_TYPE_P(field_type))
+return record_or_union_type_has_array(field_type);
+  if (TREE_CODE(field_type) == ARRAY_TYPE)
+return 1;
+}
+  }
+  return 0;
+}
+
 /* Expand all variables used in the function.  */
 
 static void
@@ -1516,6 +1560,8 @@ expand_used_vars (void)
   VEC(tree,heap) *maybe_local_decls = NULL;
   unsigned i;
   unsigned len;
+  int gen_stack_protect_signal = 0;
+  basic_block bb;
 
   /* Compute the phase of the stack frame for this function.  */
   {
@@ -1548,6 +1594,63 @@ expand_used_vars (void)
}
 }
 
+  /* Examine each basic block for address taking of local variables. */
+  FOR_EACH_BB(bb) {
+gimple_stmt_iterator si;
+/* Scanning phis. */
+for (si = gsi_start_phis(bb); !gsi_end_p(si); gsi_next(si)) {
+  gimple phi_stmt = gsi_stmt(si);
+  unsigned int i;
+  for (i = 0; i  gimple_phi_num_args(phi_stmt); ++i)
+if (is_local_address_taken(gimple_phi_arg(phi_stmt, i)-def))
+  ++gen_stack_protect_signal;
+}
+/* Scanning assignments and calls. */
+for (si = gsi_start_bb(bb); !gen_stack_protect_signal  !gsi_end_p(si);
+ gsi_next(si)) {
+  gimple stmt = gsi_stmt (si);
+  if (is_gimple_assign(stmt)) {
+switch(gimple_assign_rhs_class(stmt)) {
+case GIMPLE_TERNARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs3(stmt))) {
+++gen_stack_protect_signal;
+break;
+  }
+  /* Otherwise, fall through. */
+case GIMPLE_BINARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs2(stmt))) {
+++gen_stack_protect_signal;
+break;
+  }
+case GIMPLE_SINGLE_RHS:
+case GIMPLE_UNARY_RHS:
+  if (is_local_address_taken(gimple_assign_rhs1(stmt)))
+++gen_stack_protect_signal;
+  break;
+case GIMPLE_INVALID_RHS:
+  break;
+}
+  }
+
+  if (!gen_stack_protect_signal  is_gimple_call(stmt)) {
+int ii, num_arg = gimple_call_num_args(stmt);
+for (ii = 0; !gen_stack_protect_signal  ii  num_arg; ++ii)
+  if 

Re: [4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)

2011-12-07 Thread Andrew Pinski
On Wed, Dec 7, 2011 at 5:07 PM, Han Shen shen...@google.com wrote:
 +  /* Examine each basic block for address taking of local variables. */

I don't think you need to look at the basic blocks to figure out if a
local variable has its address taken.  You can look through referenced
variables and see if it is a local variable and TREE_ADDRESSABLE is
set.  This should speed up the code a lot.  Though that might has some
false positives with arrays but that is ok because you are checking if
there are any arrays already anyways.

Thanks,
Andrew Pinski


Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase

2011-12-07 Thread Uros Bizjak
On Wed, Dec 7, 2011 at 11:32 PM, Teresa Johnson tejohn...@google.com wrote:

 An issue turned up in our internal 4.6 based testing that has been
 fixed on trunk. This patch backports the fix to 4.6. I also have a
 small test case that I will add to both 4.6 and 4.7.

 Bootstrapped and checked with x86_64-unknown-linux-gnu.

 2011-12-07  Teresa Johnson  tejohn...@google.com

       Backport from mainline:

       2011-08-05  Uros Bizjak  ubiz...@gmail.com

       * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint
       instead of !m for operand 0, alternative 4.
       (*movdf_internal_rex64): Ditto for operand 0, alernative 6.

 2011-12-07  Teresa Johnson  tejohn...@google.com

       * gcc.target/i386/movdi-rex64.c: New.

 Index: testsuite/gcc.target/i386/movdi-rex64.c
 ===
 --- testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
 +++ testsuite/gcc.target/i386/movdi-rex64.c     (revision 0)
 @@ -0,0 +1,11 @@
 +/* { dg-do compile { target *-*-linux* } } */
 +/* { dg-options -fPIE -Wwrite-strings } */
 +
 +#include string.h
 +static __thread char buffer[25];
 +const char * error_message (void)
 +{
 +oops:
 +    strcpy (buffer, Unknown code );
 +    return 0;
 +}

 You don't need #include for compile tests, just use:

 --cut here--
 /* { dg-do compile } */
 /* { dg-options -fPIE } */

 char *strcpy (char *dest, const char *src);

 static __thread char buffer[25];

 const char
 * error_message (void)
 {
  strcpy (buffer, Unknown code );
  return 0;
 }
 --cut here--

 Also this can be compiled everywhere, not just linux.

 Ok, I will change the testcase to replace the include and retest.

 Regarding the linux target restriction, though, I was concerned about
 the -fPIE option used for the test case. I noticed that in 4.7 there
 is a pie effective target keyword (check_effective_target_pie in
 testsuite/lib/target-supports.exp). However, that does not yet exist
 in 4.6, so rather than backport that as well I added the linux
 restriction. I see the same restriction in the other tests that use
 -fpie in gcc.target/i386 (pr39013-[12].c). What do you think?

Ah, I see. Then pleasee add back linux target selctor for 4.6 and add
fpie effective target check for 4.7.

Thanks,
Uros.