Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-20 Thread Uros Bizjak
On Thu, Oct 20, 2011 at 6:39 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Updated patch is attached. The test fails without the fix and passes with it.

 ChangeLog entry:
 2011-10-20  Kirill Yukhin  kirill.yuk...@intel.com

        PR target/50766
        * config/i386/i386.md (bmi_bextr_<mode>): Update register/
        memory operand order.
        (bmi2_bzhi_<mode>3): Ditto.
        (bmi2_pdep_<mode>3): Ditto.
        (bmi2_pext_<mode>3): Ditto.

 testsuite/ChangeLog entry:
 2011-10-20  Kirill Yukhin  kirill.yuk...@intel.com

        PR target/50766
        * gcc.target/i386/pr50766.c: New test.

 Could you please have a look?

OK.

Thanks,
Uros.


[Patch, gcc, testsuite] Adjust optimization levels for some cases.

2011-10-20 Thread Terry Guo
Hello,

These four tests check the number of occurrences of the desired
instructions. At -O2, factors such as loop unrolling can increase that
number. This patch proposes lowering the optimization level to -O1 (the
minimal requirement) to avoid that effect, which makes the tests more robust.
Regression testing was performed on the arm-none-eabi target; no regressions
were found. Is it OK for trunk?

BR,
Terry

2011-10-20  Terry Guo  terry@arm.com

* gcc.target/arm/wmul-1.c: Adjust optimization levels.
* gcc.target/arm/wmul-2.c: Ditto.
* gcc.target/arm/wmul-3.c: Ditto.
* gcc.target/arm/wmul-4.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index 426c939..d50 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O1 -fexpensive-optimizations" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 898b5f0..2ea55f9 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O1 -fexpensive-optimizations" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 83f73fb..144b553 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O1 -fexpensive-optimizations" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-4.c b/gcc/testsuite/gcc.target/arm/wmul-4.c
index a297bda..68f9866 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-4.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O1 -fexpensive-optimizations" } */
 
 int mac(const int *a, const int *b, long long sqr, long long *sum)
 {





RE: PING: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change

2011-10-20 Thread Xinyu Qi
Ping

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01100.html

* config/arm/arm.c (arm_option_override): Enable use of iWMMXt with VFP.
Disable use of iwMMXt and Neon.
(arm_expand_binop_builtin): Accept VOIDmode op.
* config/arm/arm.md (*arm_movdi): Remove check for TARGET_IWMMXT.
(*arm_movsi_insn): Likewise.
(iwmmxt.md): Include earlier.


RE: PING: [PATCH, ARM, iWMMXt][3/5]: built in define and expand

2011-10-20 Thread Xinyu Qi
Ping

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01103.html

* config/arm/arm.c (enum arm_builtins): Revise built-in fcode.
(builtin_description bdesc_2arg): Revise built in declaration.
(builtin_description bdesc_1arg): Likewise.
(arm_init_iwmmxt_builtins): Revise built in initialization.
(arm_expand_builtin): Revise built in expansion.


RE: PING: [PATCH, ARM, iWMMXt][4/5]: WMMX machine description

2011-10-20 Thread Xinyu Qi
Ping

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00279.html

* config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): Declare.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant.
(iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete.
(iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi): Likewise.
(*iwmmxt_clrv8qi, *iwmmxt_clrv4hi, *iwmmxt_clrv2si): Likewise.
(tbcstv8qi, tbcstv4hi, tbsctv2si): New pattern.
(iwmmxt_clrv8qi, iwmmxt_clrv4hi, iwmmxt_clrv2si): Likewise.
(*and<mode>3_iwmmxt, *ior<mode>3_iwmmxt, *xor<mode>3_iwmmxt): Likewise.
(ror<mode>3, ashr<mode>3_iwmmxt, lshr<mode>3_iwmmxt): Likewise.
(ashl<mode>3_iwmmxt, iwmmxt_waligni, iwmmxt_walignr): Likewise.
(iwmmxt_walignr0, iwmmxt_walignr1): Likewise.
(iwmmxt_walignr2, iwmmxt_walignr3): Likewise.
(iwmmxt_setwcgr0, iwmmxt_setwcgr1): Likewise.
(iwmmxt_setwcgr2, iwmmxt_setwcgr3): Likewise.
(iwmmxt_getwcgr0, iwmmxt_getwcgr1): Likewise.
(iwmmxt_getwcgr2, iwmmxt_getwcgr3): Likewise.
(All instruction patterns): Add wtype attribute.
(*iwmmxt_arm_movdi, *iwmmxt_movsi_insn): iWMMXt coexist with vfp. 
(iwmmxt_uavgrndv8qi3, iwmmxt_uavgrndv4hi3): Revise the pattern.
(iwmmxt_uavgv8qi3, iwmmxt_uavgv4hi3): Likewise.
(iwmmxt_tinsrb, iwmmxt_tinsrh, iwmmxt_tinsrw): Likewise.
(eqv8qi3, eqv4hi3, eqv2si3, gtuv8qi3): Likewise.
(gtuv4hi3, gtuv2si3, gtv8qi3, gtv4hi3, gtv2si3): Likewise.
(iwmmxt_wunpckihh, iwmmxt_wunpckihw, iwmmxt_wunpckilh): Likewise.
(iwmmxt_wunpckilw, iwmmxt_wunpckehub, iwmmxt_wunpckehuh): Likewise.
(iwmmxt_wunpckehuw, iwmmxt_wunpckehsb, iwmmxt_wunpckehsh): Likewise.
(iwmmxt_wunpckehsw, iwmmxt_wunpckelub, iwmmxt_wunpckeluh): Likewise.
(iwmmxt_wunpckeluw, iwmmxt_wunpckelsb, iwmmxt_wunpckelsh): Likewise.
(iwmmxt_wunpckelsw, iwmmxt_wmadds, iwmmxt_wmaddu): Likewise.
(iwmmxt_wsadb, iwmmxt_wsadh, iwmmxt_wsadbz, iwmmxt_wsadhz): Likewise.
(iwmmxt2.md): Include.
* config/arm/iwmmxt2.md: New file.
* config/arm/iterators.md (VMMX2): New mode_iterator.
* config/arm/arm.md (wtype): New attribute.
(UNSPEC_WMADDS, UNSPEC_WMADDU): Delete.
(UNSPEC_WALIGNI): New unspec.
* config/arm/t-arm (MD_INCLUDES): Add iwmmxt2.md.


RE: PING: [PATCH, ARM, iWMMXt][5/5]: pipeline description

2011-10-20 Thread Xinyu Qi
Ping

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01106.html

* config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md.
* config/arm/marvell-f-iwmmxt.md: New file.
* config/arm/arm.md (marvell-f-iwmmxt.md): Include.


[PATCH] Loop IM cost TLC

2011-10-20 Thread Richard Guenther

We've got new tree codes; the following makes the Loop IM cost model
treat as expensive those for which that makes sense.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-10-19  Richard Guenther  rguent...@suse.de

* tree-ssa-loop-im.c (stmt_cost): Add WIDEN_*, FMA_EXPR
and rotates to the set of expensive operations.

Index: gcc/tree-ssa-loop-im.c
===
*** gcc/tree-ssa-loop-im.c  (revision 180191)
--- gcc/tree-ssa-loop-im.c  (working copy)
*** stmt_cost (gimple stmt)
*** 549,554 
--- 549,559 
switch (gimple_assign_rhs_code (stmt))
  {
  case MULT_EXPR:
+ case WIDEN_MULT_EXPR:
+ case WIDEN_MULT_PLUS_EXPR:
+ case WIDEN_MULT_MINUS_EXPR:
+ case DOT_PROD_EXPR:
+ case FMA_EXPR:
  case TRUNC_DIV_EXPR:
  case CEIL_DIV_EXPR:
  case FLOOR_DIV_EXPR:
*** stmt_cost (gimple stmt)
*** 565,570 
--- 570,578 
  
  case LSHIFT_EXPR:
  case RSHIFT_EXPR:
+ case WIDEN_LSHIFT_EXPR:
+ case LROTATE_EXPR:
+ case RROTATE_EXPR:
cost += 20;
break;
  


[PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Dodji Seketeli
Hello,

The below fixes an embarrassing thinko of mine that breaks bootstrap on
SPU and PPC targets (at very least).  I am surprised it doesn't break
more code.  :-(

I have lightly tested it on SPU in a cross compiled environment (so I
couldn't bootstrap it there) and I have bootstrapped it on
x86_64-unknown-linux-gnu.  One person confirmed in the audit trail of
the PR that it fixes the issue for him on PPC, so I am proposing the
patch even if I don't know if it bootstraps on SPU or PPC in general.

OK for trunk?

From: Dodji Seketeli do...@redhat.com
Date: Thu, 20 Oct 2011 09:43:49 +0200
Subject: [PATCH] Fix thinko in _cpp_remaining_tokens_num_in_context

libcpp/

* lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of
number of tokens in direct tokens contexts.
---
 libcpp/ChangeLog |6 ++
 libcpp/lex.c |3 +--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libcpp/lex.c b/libcpp/lex.c
index cd6ae9f..cf8ef7d 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1710,8 +1710,7 @@ _cpp_remaining_tokens_num_in_context (cpp_reader *pfile)
 {
   cpp_context *context = pfile->context;
   if (context->tokens_kind == TOKENS_KIND_DIRECT)
-return ((LAST (context).token - FIRST (context).token)
-   / sizeof (cpp_token));
+return (LAST (context).token - FIRST (context).token);
   else if (context->tokens_kind == TOKENS_KIND_INDIRECT
   || context->tokens_kind == TOKENS_KIND_EXTENDED)
 return ((LAST (context).ptoken - FIRST (context).ptoken)
-- 
1.7.6.4

-- 
Dodji


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-20 Thread Sergey Ostanevich
 Please provide a patch which can be applied.  Cut/paste doesn't create
 a working patch.  Please attach it.

 --
 H.J.


Will this work?
Sergos.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6c73404..e21cf86 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-10-20  Sergey Ostanevich  sergos@gmail.com
+
+   * config/i386/i386.c (processor_target_table): Change Atom
+   align_loops_max_skip to 15.
+
 2011-10-17  Michael Spertus  mike_sper...@symantec.com

* gcc/c-family/c-common.c (c_common_reswords): Add __bases,
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
-  {atom_cost, 16, 7, 16, 7, 16}
+  {atom_cost, 16, 15, 16, 7, 16}
 };

 static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =


Re: [PATCH] Account for devirtualization opportunities in inliner

2011-10-20 Thread Richard Guenther
On Wed, Oct 19, 2011 at 11:59 PM, Maxim Kuvyrkov ma...@codesourcery.com wrote:
 On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote:

 Jan,

 The following patch starts a series of patches which improve 
 devirtualization optimizations in GCC.

 This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing 
 parameters and jump functions and adds basic estimation of devirtualization 
 benefit from inlining an edge.  E.g., if inlining A across edge E into B 
 will allow some of the indirect edges of A to be resolved, then inlining 
 cost of edge E is reduced.
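
 As a rough C analogue of the situation described above (a sketch only: the
 patch itself targets GCC's IPA infrastructure, and the names `twice`,
 `apply` and `caller` are hypothetical), consider an indirect call through a
 parameter that becomes resolvable once the callee is inlined into a caller
 that passes a known function:

```c
#include <assert.h>

typedef int (*op_fn) (int);

static int twice (int x) { return 2 * x; }

/* "A": contains an indirect edge, the call through OP.  */
static int apply (op_fn op, int x) { return op (x); }

/* "B": passes a known function.  Once apply is inlined here, the
   indirect call can be resolved to a direct call to twice, which is
   the kind of benefit the inliner would account for.  */
static int caller (int x) { return apply (twice, x); }
```

 The same reasoning applies to C++ virtual calls when the dynamic type of
 the object is known at the call site.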

 The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 
 and -m64 multilibs.

 OK to commit?

 Ping.

 The primary change of this patch is to make evaluate_conditions_for_edge 
 output KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a 
 callsite.  KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of 
 estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check 
 whether it will be possible to devirtualize any of the indirect edges within 
 the callee.  If so, then *size and *time returned by 
 estimate_calls_size_and_time are reduced to account for the devirtualization 
 benefits.

 OK for trunk?

I'm missing testcase(s).  Any assessment of how this improves devirtualization
in practice (for example for Mozilla)?

Thanks,
Richard.

 --
 Maxim Kuvyrkov
 CodeSourcery / Mentor Graphics




Re: Avoid gcc.dg/tree-prof/val-prof-7.c dependence on strings.h

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 1:25 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 The testcase gcc.dg/tree-prof/val-prof-7.c includes strings.h to get
 a declaration of bzero.  This causes it to fail on targets where bzero
 (a legacy function removed in the latest version of POSIX) is not
 declared in that header; declaring it explicitly in the testcase is
 more reliable.  This patch changes the include to an explicit
 declaration.

 Tested with cross to i686-mingw32 (where the header just includes
 string.h and does not provide a declaration of bzero).  OK to
 commit?

Ok.

Thanks,
Richard.

 2011-10-19  Joseph Myers  jos...@codesourcery.com

        * gcc.dg/tree-prof/val-prof-7.c: Declare bzero instead of
        including strings.h.

 Index: gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
 ===
 --- gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 180200)
 +++ gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (working copy)
 @@ -1,7 +1,7 @@
  /* { dg-options "-O2 -fdump-ipa-profile -mtune=core2" } */
  /* { dg-skip-if "" { ! { i?86-*-* x86_64-*-* } } { "*" } { "" } } */

 -#include <strings.h>
 +extern void bzero (void *, __SIZE_TYPE__);

  int foo(int len)
  {

 --
 Joseph S. Myers
 jos...@codesourcery.com



[PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math

2011-10-20 Thread Uros Bizjak
Hello!

This patch builds on recent patch by Michael (that implemented
fine-grained control on -mrecip option) and with -ffast-math emits
reciprocal sequences with additional NR step for vectorized SFmode
division and vectorized sqrtf(x).
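
For readers unfamiliar with the technique: one Newton-Raphson step refines
a reciprocal estimate x0 of 1/a as x1 = x0 * (2 - a * x0), which roughly
squares the relative error.  The scalar sketch below only illustrates that
arithmetic; the function name is hypothetical, and the starting estimate is
passed in by the caller as a stand-in for the hardware estimate instruction.
It is not the code the back end emits.

```c
#include <assert.h>

/* One Newton-Raphson refinement step for 1/a, starting from the
   estimate X0.  If x0 = (1 - e) / a, the result is (1 - e*e) / a.  */
static float
recip_nr_step (float a, float x0)
{
  return x0 * (2.0f - a * x0);
}
```

For example, starting from the estimate 0.3 for 1/3, a single step gives
0.3 * (2 - 0.9) = 0.33, which is ten times closer to the exact value.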

2011-10-20  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
* config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT.
* doc/invoke.texi (mrecip): Document that GCC implements vectorized
single float division and vectorized sqrtf(x) with reciprocal sequence
with additional Newton-Raphson step with -ffast-math.

The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
to check whether I messed something up in the options handling.

The effect of the patch is 7% faster gas_dyn from polyhedron testsuite
on corei7-avx.

Uros.
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 180176)
+++ config/i386/i386.h  (working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT	0x08
 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
 | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV   ((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT  ((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===
--- config/i386/i386.opt (revision 180176)
+++ config/i386/i386.opt (working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 180176)
+++ doc/invoke.texi (working copy)
@@ -12927,6 +12927,11 @@
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
 
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized sqrtf(x) already with
+@option{-ffast-math} (or the above option combination), and doesn't need
+@option{-mrecip}.
+
 @item -mrecip=@var{opt}
 @opindex mrecip=opt
 This option allows to control which reciprocal estimate instructions


Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote:
 x...@google.com (Rong Xu) writes:

 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
    messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

This doesn't look scalable if you consider that each pass would print
as much of a mess as -fvectorizer-verbose=5.

I think =2 and =3 should be omitted - we do have dump-files for a reason.

Also the coverage/profile cases you changed do not at all match
"... with large performance impact".  In fact the impact is completely
unknown (as would usually be the case).

I'd rather have a way to make dump-files more structured (so, following
some standard reporting scheme) than introducing yet another way
of output.  [after making dump-files more consistent it will be easy
to revisit patches like this, there would be a natural general central
way to implement it]

So, please fix dump-files instead.  And for coverage/profiling, fill
in stuff in a dump-file!

Richard.

 It would be interesting to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only



Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Ulrich Weigand
Dodji Seketeli wrote:

 I have lightly tested it on SPU in a cross compiled environment (so I
 couldn't bootstrap it there) and I have bootstrapped it on
 x86_64-unknown-linux-gnu.  One person confirmed in the audit trail of
 the PR that it fixes the issue for him on PPC, so I am proposing the
 patch even if I don't know if it bootstraps on SPU or PPC in general.

Well, SPU doesn't bootstrap as such (it's a target-only architecture),
but I can confirm that the patch does fix the newlib build failure I
was seeing on SPU.

Thanks for the quick fix!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-20 Thread Kirill Yukhin

 OK.

 Thanks,
 Uros.

Great,
could anybody please commit that?

K


Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Ulrich Weigand
Dodji Seketeli wrote:

cpp_context *context = pfile->context;
if (context->tokens_kind == TOKENS_KIND_DIRECT)
 -return ((LAST (context).token - FIRST (context).token)
 - / sizeof (cpp_token));
 +return (LAST (context).token - FIRST (context).token);
else if (context->tokens_kind == TOKENS_KIND_INDIRECT
  || context->tokens_kind == TOKENS_KIND_EXTENDED)
  return ((LAST (context).ptoken - FIRST (context).ptoken)

B.t.w. isn't the same thinko also present in the else if path:

  else if (context->tokens_kind == TOKENS_KIND_INDIRECT
   || context->tokens_kind == TOKENS_KIND_EXTENDED)
return ((LAST (context).ptoken - FIRST (context).ptoken)
/ sizeof (cpp_token *));

ptoken seems to be of type const cpp_token **, so the pointer
subtraction already divides by sizeof (cpp_token *).
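
As a standalone illustration of the rule (not libcpp code; `struct tok` and
both helper names are made-up stand-ins), subtracting two pointers into the
same array yields an element count, whatever the element size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cpp_token; only its size matters here.  */
struct tok { int kind; char payload[20]; };

/* Element count between two pointers into an array of tokens.  */
static ptrdiff_t
count_direct (const struct tok *first, const struct tok *last)
{
  return last - first;
}

/* Element count between two pointers into an array of token pointers.  */
static ptrdiff_t
count_indirect (const struct tok **first, const struct tok **last)
{
  return last - first;
}
```

Dividing either result by a sizeof again shrinks the count, typically to
zero for short token runs, which matches the failure mode discussed here.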

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Avoid -mno-accumulate-outgoing-args in tests on Windows target

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 2:03 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 The -mno-accumulate-outgoing-args option does not work with the stack
 probing used on Windows targets, giving a warning and so causing tests
 using that option to fail.  This patch makes three tests not use that
 option on affected targets, like sse-10.c (see
 http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00180.html for the
 introduction of the warning and the sse-10.c change).

 Tested with cross to i686-mingw32.  OK to commit?

Ok.

Thanks,
Richard.

 2011-10-19  Joseph Myers  jos...@codesourcery.com

        * gcc.target/i386/pr40906-1.c, gcc.target/i386/pr40906-2.c,
        gcc.target/i386/pr46226.c: Do not use
        -mno-accumulate-outgoing-args.

 Index: gcc/testsuite/gcc.target/i386/pr40906-2.c
 ===
 --- gcc/testsuite/gcc.target/i386/pr40906-2.c   (revision 180200)
 +++ gcc/testsuite/gcc.target/i386/pr40906-2.c   (working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do run } */
  /* { dg-require-effective-target ia32 } */
  /* { dg-options "-O2 -Wno-psabi -fomit-frame-pointer 
 -fno-asynchronous-unwind-tables -mpush-args -mno-accumulate-outgoing-args 
 -m128bit-long-double" } */
 +/* { dg-options "-O2 -Wno-psabi -fomit-frame-pointer 
 -fno-asynchronous-unwind-tables -mpush-args -m128bit-long-double" { target 
 *-*-mingw* *-*-cygwin* } } */

  void abort (void);

 Index: gcc/testsuite/gcc.target/i386/pr46226.c
 ===
 --- gcc/testsuite/gcc.target/i386/pr46226.c     (revision 180200)
 +++ gcc/testsuite/gcc.target/i386/pr46226.c     (working copy)
 @@ -1,5 +1,6 @@
  /* { dg-do run } */
  /* { dg-options "-Os -fomit-frame-pointer -mno-accumulate-outgoing-args 
 -fno-asynchronous-unwind-tables" } */
 +/* { dg-options "-Os -fomit-frame-pointer -fno-asynchronous-unwind-tables" { 
 target *-*-mingw* *-*-cygwin* } } */

  extern void abort(void);

 Index: gcc/testsuite/gcc.target/i386/pr40906-1.c
 ===
 --- gcc/testsuite/gcc.target/i386/pr40906-1.c   (revision 180200)
 +++ gcc/testsuite/gcc.target/i386/pr40906-1.c   (working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do run } */
  /* { dg-require-effective-target ia32 } */
  /* { dg-options "-O2 -fomit-frame-pointer -fno-asynchronous-unwind-tables 
 -mpush-args -mno-accumulate-outgoing-args" } */
 +/* { dg-options "-O2 -fomit-frame-pointer -fno-asynchronous-unwind-tables 
 -mpush-args" { target *-*-mingw* *-*-cygwin* } } */

  void abort (void);


 --
 Joseph S. Myers
 jos...@codesourcery.com



Re: Use of vector instructions in memmov/memset expanding

2011-10-20 Thread Michael Zolotukhin
Middle-end part of the patch is attached.

On 20 October 2011 12:34, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 I fixed the tests, updated my branch, and fixed the bugs introduced in the
 process.
 Here is the fixed, complete patch (the other parts will be sent in subsequent
 emails).

 The changes passed bootstrap and make check.

 On 29 September 2011 15:21, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote:
 +/* { dg-options "-O2 -march=atom -mtune=atom -m64 -dp" } */

 The testcases are wrong, -m64 or -m32 should never appear in dg-options,
 instead if the testcase is specific to -m64, it should be guarded with
 /* { dg-do compile { target lp64 } } */
 resp. ia32 (or ilp32, depending on what exactly should be done for -mx32),
 if you have the same testcase for -m32 and -m64, but just want different
 scan-assembler for the two cases, then just guard the scan-assembler
 with lp64 resp. ia32/ilp32 target and add second one for the other target.
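
 A minimal sketch of the second layout described above, with one testcase
 shared by both multilibs and a separate scan-assembler per target.  The two
 patterns are placeholders, not the real expected output, and `copy_block`
 is a hypothetical function name:

```c
/* { dg-do compile } */
/* { dg-options "-O2 -march=atom -mtune=atom -dp" } */
/* PATTERN_FOR_64BIT and PATTERN_FOR_32BIT are placeholders.  */
/* { dg-final { scan-assembler "PATTERN_FOR_64BIT" { target lp64 } } } */
/* { dg-final { scan-assembler "PATTERN_FOR_32BIT" { target ia32 } } } */

void
copy_block (char *dst, const char *src)
{
  __builtin_memcpy (dst, src, 64);
}
```

 Neither -m64 nor -m32 appears in dg-options; the multilib in use selects
 which scan-assembler directive applies.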

        Jakub

 --
 ---
 Best regards,
 Michael V. Zolotukhin,
 Software Engineer
 Intel Corporation.




-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


memfunc-mid-3.patch
Description: Binary data


Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Dodji Seketeli
Ulrich Weigand uweig...@de.ibm.com writes:

 I can confirm that the patch does fix the newlib build failure I was
 seeing on SPU.

Pheew, thank you.

Below is a better patch that I am bootstrapping at the moment.

From: Dodji Seketeli do...@redhat.com
Date: Thu, 20 Oct 2011 09:43:49 +0200
Subject: [PATCH] Fix thinko in _cpp_remaining_tokens_num_in_context

libcpp/

* lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of
number of tokens.
---
 libcpp/ChangeLog |6 ++
 libcpp/lex.c |6 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index bbb4085..128d3e1 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2011-10-20  Dodji Seketeli  do...@redhat.com
+
+   PR bootstrap/50801
+   * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of
+   number of tokens.
+
 2011-10-18  Dodji Seketeli  do...@redhat.com
 
PR bootstrap/50760
diff --git a/libcpp/lex.c b/libcpp/lex.c
index cd6ae9f..527368b 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1710,12 +1710,10 @@ _cpp_remaining_tokens_num_in_context (cpp_reader *pfile)
 {
   cpp_context *context = pfile->context;
   if (context->tokens_kind == TOKENS_KIND_DIRECT)
-return ((LAST (context).token - FIRST (context).token)
-   / sizeof (cpp_token));
+return (LAST (context).token - FIRST (context).token);
   else if (context->tokens_kind == TOKENS_KIND_INDIRECT
   || context->tokens_kind == TOKENS_KIND_EXTENDED)
-return ((LAST (context).ptoken - FIRST (context).ptoken)
-   / sizeof (cpp_token *));
+return (LAST (context).ptoken - FIRST (context).ptoken);
   else
   abort ();
 }
-- 
1.7.6.4



-- 
Dodji


Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote:
 From: Andi Kleen a...@linux.intel.com

 Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most
 convenient way to get this into existing Makefiles is using small
 wrappers that pass the plugin. This matches how other compilers
 (LLVM, icc) do this too.

 My previous attempt at using shell scripts for this
 http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
 was not approved. Here's another attempt using wrappers written
 in C. It's only a single wrapper which just adds a --plugin
 argument before calling the respective binutils utilities.
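
 The core of such a wrapper can be sketched as below.  This is an
 illustration under assumptions, not the code in gcc-ar.c: the function name
 is hypothetical, the tool and plugin paths are supplied by the caller, and
 locating them (the complicated part) is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "<tool> --plugin <plugin> <original args...>" from the
   wrapper's own ARGC/ARGV (ARGC is assumed to be at least 1).
   Returns a malloc'd, NULL-terminated vector suitable for exec,
   or NULL on allocation failure.  */
static char **
build_wrapped_argv (const char *tool, const char *plugin,
                    int argc, char **argv)
{
  /* tool + "--plugin" + plugin + (argc - 1) user args + NULL.  */
  char **nargv = malloc ((argc + 3) * sizeof (char *));
  int i;

  if (nargv == NULL)
    return NULL;
  nargv[0] = (char *) tool;
  nargv[1] = (char *) "--plugin";
  nargv[2] = (char *) plugin;
  for (i = 1; i < argc; i++)
    nargv[2 + i] = argv[i];   /* user arguments follow the plugin */
  nargv[2 + argc] = NULL;
  return nargv;
}
```

 Because the user's arguments are passed through unchanged, the wrapper
 keeps the same command-line syntax as the tool it wraps.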

Thanks for doing this.  How do they end up being used?  I suppose
Makefiles will need to call gcc-ar then instead of ar?  In which case
I wonder if ...

 The logic gcc.c uses to find the files is very complicated. I didn't
 try to replicate it 100% and left out some magic. I would be interested
 if this simple method works for everyone or if more code needs
 to be added. This only needs to support LTO supporting hosts of course.

;)

... using something like gcc --ar would be more convenient (as you
can then trivially share the find-the-files logic)?  Did you consider
factoring out the find-the-file logic to a shared file that you can re-use?

Thanks,
Richard.

 I didn't add any documentation because the syntax is exactly the same as
 the native ar/ranlib/nm.

 Passed bootstrap and test suite on x86_64-linux.

 gcc/:
 2011-10-19  Andi Kleen  a...@linux.intel.com

        * Makefile.in (MOSTLYCLEANFILES): Add gcc-ar/nm/ranlib.
        (native): Add gcc-ar.
        (AR_OBJS, AR_LIBS, gcc-ar, gcc-ar.o): Add.
        (install): Depend on install-gcc-ar.
        (install-gcc-ar): Add.
        (uninstall): Uninstall gcc-ar/nm/ranlib.
        * gcc-ar.c: Add new file.
 ---
  gcc/Makefile.in |   28 +--
  gcc/gcc-ar.c    |  109 
 +++
  2 files changed, 134 insertions(+), 3 deletions(-)
  create mode 100644 gcc/gcc-ar.c

 diff --git a/gcc/Makefile.in b/gcc/Makefile.in
 index 6b28ef5..7816243 100644
 --- a/gcc/Makefile.in
 +++ b/gcc/Makefile.in
 @@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h 
 insn-codes.h \
  genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \
  xgcc$(exeext) cpp$(exeext) cc1$(exeext) $(EXTRA_PASSES) \
  $(EXTRA_PARTS) $(EXTRA_PROGRAMS) gcc-cross$(exeext) \
 - $(SPECS) collect2$(exeext) lto-wrapper$(exeext) \
 + $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \
 + gcc-ranlib$(exeext) \
  gcov-iov$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \
  gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \
  libcommon-target.a libcommon.a libgcc.mk
 @@ -1791,7 +1792,8 @@ rest.encap: lang.rest.encap
  # This is what is made with the host's compiler
  # whether making a cross compiler or not.
  native: config.status auto-host.h build-@POSUB@ $(LANGUAGES) \
 -       $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext)
 +       $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) \
 +       gcc-ar$(exeext)

  ifeq ($(enable_plugin),yes)
  native: gengtype$(exeext)
 @@ -2049,6 +2051,17 @@ sbitmap.o: sbitmap.c sbitmap.h $(CONFIG_H) $(SYSTEM_H) 
 coretypes.h $(BASIC_BLOCK
  ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(EBITMAP_H)
  sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h $(CONFIG_H)

 +AR_OBJS = gcc-ar.o
 +AR_LIBS = @COLLECT2_LIBS@
 +gcc-ar$(exeext): $(AR_OBJS) $(LIBDEPS)
 +       +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
 +               $(AR_OBJS) $(LIBS) $(AR_LIBS)
 +
 +gcc-ar.o: gcc-ar.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
 +       $(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(DRIVER_DEFINES) \
 +       -DTARGET_MACHINE=\"$(target_noncanonical)\" \
 +       -c $(srcdir)/gcc-ar.c $(OUTPUT_OPTION) @TARGET_SYSTEM_ROOT_DEFINE@
 +
  COLLECT2_OBJS = collect2.o collect2-aix.o tlink.o
  COLLECT2_LIBS = @COLLECT2_LIBS@
  collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 @@ -4576,7 +4589,7 @@ maintainer-clean:
  # broken is small.
  install: install-common $(INSTALL_HEADERS) \
     install-cpp install-man install-info install-@POSUB@ \
 -    install-driver install-lto-wrapper
 +    install-driver install-lto-wrapper install-gcc-ar

  ifeq ($(enable_plugin),yes)
  install: install-plugin
 @@ -4901,6 +4914,12 @@ install-collect2: collect2 installdirs
  install-lto-wrapper: lto-wrapper$(exeext)
        $(INSTALL_PROGRAM) lto-wrapper$(exeext) 
 $(DESTDIR)$(libexecsubdir)/lto-wrapper$(exeext)

 +# XXX hardlink if system supports it
 +install-gcc-ar:
 +       $(INSTALL_PROGRAM) gcc-ar$(exeext) $(DESTDIR)$(bindir)/gcc-ar$(exeext)
 +       $(INSTALL_PROGRAM) gcc-ar$(exeext) $(DESTDIR)$(bindir)/gcc-nm$(exeext)
 +       $(INSTALL_PROGRAM) gcc-ar$(exeext) 
 $(DESTDIR)$(bindir)/gcc-ranlib$(exeext)
 +
  # Cancel installation by deleting the installed files.
  uninstall: lang.uninstall
        -rm -rf $(DESTDIR)$(libsubdir)
 @@ 

Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Dodji Seketeli
Ulrich Weigand uweig...@de.ibm.com writes:

 B.t.w. isn't the same thinko also present in the else if path:

Right.  Jakub spotted it as well.  Hence the followup patch in the other
subthread.

Thanks for watching.

-- 
Dodji


Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)

2011-10-20 Thread Dodji Seketeli
Dodji Seketeli do...@redhat.com writes:

 libcpp/

   * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of
   number of tokens.

Jakub OKed the patch on IRC, so I went ahead and committed to trunk

Thanks.

-- 
Dodji


Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO

2011-10-20 Thread Jan Hubicka
 On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote:
  From: Andi Kleen a...@linux.intel.com
 
  Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most
  convenient way to get this into existing Makefiles is using small
  wrappers that pass the plugin. This matches how other compilers
  (LLVM, icc) do this too.
 
  My previous attempt at using shell scripts for this
  http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
  was not approved. Here's another attempt using wrappers written
  in C. It's only a single wrapper which just adds a --plugin
  argument before calling the respective binutils utilities.
 
 Thanks for doing this.  How do they end up being used?  I suppose
 Makefiles will need to call gcc-ar then instead of ar?  In which case

Yes, it is what other compilers provide at the moment, too.

In the longer run, I would like the binutils plugin machinery to be able
to resolve this by itself for all compilers installed on the system.  This
is a bit tricky:
 1) binutils already has a default plugin search path.  We need to arrange
for our plugin to be installed there.
 2) It is not realistic to expect exactly one linker plugin on the system.
LLVM/Open64/ICC will eventually want to provide their own plugins on
that search path.
 3) Either we will need to install a plugin for every GCC release installed,
or we will need to make our plugin reasonably backward compatible.
This is probably not that big a deal, since the symbol table is a rather
simple part of the LTO machinery.  We broke compatibility between 4.5/4.6
and 4.7, but we could probably get more serious here.

 I wonder if ...
 
  The logic gcc.c uses to find the files is very complicated. I didn't
  try to replicate it 100% and left out some magic. I would be interested
  if this simple method works for everyone or if more code needs
  to be added. This only needs to support LTO supporting hosts of course.
 
 ;)
 
 ... using something like gcc --ar would be more convenient (as you
 can then trivially share the find-the-files logic)?  Did you consider
 factoring out the find-the-file logic to a shared file that you can re-use?

Hmm, these alternatives would work for me.
An ugly aspect of gcc --ar is that all options after --ar are passed to
the real ar and must be in ar's syntax.  That syntax is different from
ours (and from nm's or ranlib's), so the formal description of how
command line options work would get a bit tricky.
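
To make the wrapper idea discussed above concrete, here is a minimal sketch of the argument handling only: it prepends a --plugin option to the caller's arguments before the real tool is run.  The function name build_plugin_argv and the plugin path used in any call are hypothetical; the actual patch has to locate both the tool and the plugin, which this sketch does not attempt.

```c
#include <stdlib.h>
#include <string.h>

/* Build a new NULL-terminated argv for the real tool: TOOL, then
   "--plugin", PLUGIN_PATH, then the caller's arguments (argv[1..]).
   Both TOOL and PLUGIN_PATH are hypothetical inputs here; a real
   wrapper has to discover them.  Returns NULL on allocation failure.
   The caller frees the returned array (the strings are not copied).  */
static char **
build_plugin_argv (const char *tool, const char *plugin_path,
                   int argc, char **argv)
{
  /* tool + "--plugin" + path + (argc - 1) user args + NULL.  */
  size_t n = 3 + (size_t) (argc > 0 ? argc - 1 : 0) + 1;
  char **nargv = malloc (n * sizeof *nargv);
  size_t k = 0;
  int i;

  if (nargv == NULL)
    return NULL;
  nargv[k++] = (char *) tool;
  nargv[k++] = (char *) "--plugin";
  nargv[k++] = (char *) plugin_path;
  for (i = 1; i < argc; i++)
    nargv[k++] = argv[i];
  nargv[k] = NULL;
  return nargv;
}
```

On a POSIX host the resulting array could then be handed to execvp.  Sharing the file-finding logic with the driver, as suggested above, would replace the two hypothetical inputs.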

Honza


Re: [Patch ARM] Fix PR target/50106

2011-10-20 Thread Ramana Radhakrishnan
On 19 October 2011 20:38, Nathan Froyd nfr...@mozilla.com wrote:
 On 10/19/2011 3:27 PM, Ramana Radhakrishnan wrote:

 Index: gcc/config/arm/arm.c
 -      live_regs_mask |= extra_mask << (size / UNITS_PER_WORD);
 +      live_regs_mask |= extra_mask << ((size + 3) / UNITS_PER_WORD);

 IIUC, wouldn't ((size + UNITS_PER_WORD - 1) / UNITS_PER_WORD) be clearer?

 -Nathan



Doh!  Yes, this is what I committed.

Ramana

 2011-10-20  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

   PR target/50106
   * config/arm/arm.c (thumb_unexpanded_epilogue): Handle return
reg size from 1-3.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 180239)
+++ gcc/config/arm/arm.c(working copy)
@@ -21652,7 +21652,8 @@
   if (extra_pop > 0)
 {
   unsigned long extra_mask = (1 << extra_pop) - 1;
-  live_regs_mask |= extra_mask << (size / UNITS_PER_WORD);
+  live_regs_mask |= extra_mask << ((size + UNITS_PER_WORD - 1) 
+  / UNITS_PER_WORD);
 }
 
   /* The prolog may have pushed some high registers to use as
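
The rounding in this fix can be checked in isolation.  The sketch below is not the arm.c code; it assumes UNITS_PER_WORD is 4 (consistent with the "+ 3" in the first version of the patch) and uses hypothetical helper names.  It shows that with rounding up, a return value of 1 to 3 bytes still counts as one word, whereas truncating division would give zero.

```c
/* Word size in bytes; 4 is an assumption matching the "+ 3" in the
   first version of the patch.  */
#define UNITS_PER_WORD 4

/* Number of whole words needed to hold SIZE bytes, rounding up.  */
static unsigned
words_for_size (unsigned size)
{
  return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* Shift a mask of EXTRA_POP low bits past the words that hold the
   return value, mirroring the shape of the expression in the patch.  */
static unsigned long
shifted_extra_mask (unsigned extra_pop, unsigned size)
{
  unsigned long extra_mask = (1UL << extra_pop) - 1;
  return extra_mask << words_for_size (size);
}
```

For example, with extra_pop = 2 and size = 3 the mask becomes 0x6; with truncating division it would stay 0x3 and include bit 0.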


Re: [PATCH] Account for devirtualization opportunities in inliner

2011-10-20 Thread Jan Hubicka
Hi,
sorry for the delayed review.  I am still trying to get ipa-inline-analysis to
behave well on real codebases
and to make up my mind about how to get more advanced hints, like this one, into it.

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index bd4d2ea..5e88c2d 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -711,14 +711,23 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
 /* Work out what conditions might be true at invocation of E.  */
 
 static clause_t
-evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
+evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
+ bool inline_p,
+ VEC (tree, heap) **known_vals_ptr,
+ VEC (tree, heap) **known_binfos_ptr)

Hmm, I would also return the clause by reference, to be consistent, and perhaps
call the function something like edge_properties,
since it is not really only about evaluating the clause anymore.

-/* Increase SIZE and TIME for size and time needed to handle all calls in 
NODE.  */
+/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
+   KNOWN_BINFOS.  */
+
+static void
+estimate_edge_devirt_benefit (struct cgraph_edge *ie,
+ int *size, int *time, int prob,
+ VEC (tree, heap) *known_vals,
+ VEC (tree, heap) *known_binfos)

I think this whole logic should go into estimate_edge_time_and_size.  This way
we will save all the duplication of the scaling logic.
Just add the known_vals/binfos arguments.

I am not quite sure how to estimate the actual benefits.  estimate_num_insns
doesn't really distinguish between direct and indirect calls.

I see that it is a good idea to inline more when the destination is known and
inlinable.
This is an example where we have additional knowledge that we want to mix into
the badness metric but that does not directly translate to time/size.  There are
multiple cases like this.  I was thinking of adding a kind of bonus metric for
this purpose,
but I would suggest doing this incrementally.

What about
 1) extending the estimate_num_insns weights to account for direct calls differently
    from indirect calls (i.e. adding an indirect_call cost value to the eni weights).
    I would set it to 2 for size metrics and 15 for time metrics for a start.
 2) making estimate_edge_time_and_size subtract the difference between those two
    costs from the edge costs when the destination is direct.
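
As a rough illustration of the two steps above, the following sketch keeps a separate indirect-call cost next to the direct-call cost and subtracts the difference when the target is known.  The struct and function names are hypothetical, and only the indirect-call values (2 for size, 15 for time) come from the suggestion; the direct-call values are placeholders chosen for the example.

```c
/* Hypothetical cost table; only the indirect-call values (2 and 15)
   come from the suggestion above, the direct-call values are made up
   for illustration.  */
struct call_weights
{
  int call_cost;           /* cost of a direct call */
  int indirect_call_cost;  /* cost of an indirect call */
};

static const struct call_weights size_weights = { 1, 2 };
static const struct call_weights time_weights = { 10, 15 };

/* Return COST reduced by the indirect-vs-direct difference when the
   call target is known (i.e. the edge could be devirtualized).  */
static int
adjust_edge_cost (int cost, const struct call_weights *w, int target_known)
{
  if (target_known)
    cost -= w->indirect_call_cost - w->call_cost;
  return cost;
}
```

Any extra priority for direct calls with inlinable targets would be a separate term on top of this, as noted in the review.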

Incrementally we can think about how to give extra priority to direct calls with
inlinable targets.
+/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
+   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
+   site.  */
 
 static void
 estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
- clause_t possible_truths)
+ clause_t possible_truths,
+ VEC (tree, heap) *known_vals,
+ VEC (tree, heap) *known_binfos)
 {
   struct cgraph_edge *e;
   for (e = node->callees; e; e = e->next_callee)
@@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, 
int *size, int *time,
}
  else
estimate_calls_size_and_time (e->callee, size, time,
- possible_truths);
+ possible_truths,
+ /* TODO: remap KNOWN_VALS and
+KNOWN_BINFOS to E->CALLEE
+parameters, and use them.  */
+ NULL, NULL);

Remapping should not be needed here - the jump functions are merged after
marking the edge inline, so jump
functions in inlined functions actually refer to the parameters of the
function they are inlined into.

Honza


Re: [Patch ARM] Fix PR target/50106

2011-10-20 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 08:27:26PM +0100, Ramana Radhakrishnan wrote:
 Ok to backport to 4.6 branch given it is branch freeze time ? I'll be

Yeah (with the changes Nathan suggested).

 2011-10-19  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org
 
 PR target/50106
 * config/arm/arm.c (thumb_unexpanded_epilogue): Handle return
 reg size from 1-3.

Jakub


Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 10:56 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote:
  From: Andi Kleen a...@linux.intel.com
 
  Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most
  convenient way to get this into existing Makefiles is using small
  wrappers that pass the plugin. This matches how other compilers
  (LLVM, icc) do this too.
 
  My previous attempt at using shell scripts for this
  http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
  was not approved. Here's another attempt using wrappers written
  in C. It's only a single wrapper which just adds a --plugin
  argument before calling the respective binutils utilities.

 Thanks for doing this.  How do they end up being used?  I suppose
 Makefiles will need to call gcc-ar then instead of ar?  In which case

 Yes, it is what other compilers provide at the moment, too.

 In longer run, I would like to see binutils plugin machinery to be able
 to resolve this by itself for all installed compilers in the system.  This
 is bit tricky:
  1) binutils already has default plugin search path.  We need to arrange our
    plugin to install there
  2) it is not realistic to expect exactly one linker plugin on the system.
    LLVM/Open64/ICC eventually will want to provide their own plugins on
    that search path
  3) Either we will need to install plugin for every GCC release installed
    or we will need to make our plugin reasonably backward compatible.
    This is probably not that big deal since the symbol table is rather simple
    part of LTO machinery.  We broke compatibility in between 4.5/4.6 and 4.7,
    but we probably could get more serious here.

 I wonder if ...

  The logic gcc.c uses to find the files is very complicated. I didn't
  try to replicate it 100% and left out some magic. I would be interested
  if this simple method works for everyone or if more code needs
  to be added. This only needs to support LTO supporting hosts of course.

 ;)

 ... using something like gcc --ar would be more convenient (as you
 can then trivially share the find-the-files logic)?  Did you consider
 factoring out the find-the-file logic to a shared file that you can re-use?

 Hmm, these alternatives would work with me.
 Bit ugly feature about gcc --ar is the fact that all options after --ar are
 passed to real ar and must be in the ar's syntax. That one is different from
 ours (and different from nm or ranlib's), so the formal description of how
 command line options works would get bit tricky.

Yeah, maybe use it as `gcc --ar`, thus make it print the found ar plus
the plugin argument ...

At least somehow sharing the file finding code would be nice, but I don't
want to block the patch in its current form if sharing it does complicate
things more than it simplifies them by not duplicating code.

Richard.

 Honza



Honnor -fno-topleverl-reorder with whopr for vars and functions

2011-10-20 Thread Jan Hubicka
Hi,
this patch makes -fno-toplevel-reorder work better with WHOPR.
Functions and variables now come out in the proper order, which is currently
needed for the Linux kernel to boot with LTO because linker order is important
there for the kernel's initialization code.
I also used this code when comparing various code layout algorithms - the
default layout is not as bad as one might think in most cases.

The implementation is generally simple - lto_balanced_map already works on
a fixed order of functions.  However, it pulls each variable into the first
partition that refers to it, and variables that no partition refers to are all
homed in the last partition.

This needs to change: variables need to be inserted in order when the
corresponding function is inserted.  This is the reason for the lto_balanced_map
changes.
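
The ordering idea described above can be sketched outside of GCC as follows.  The struct node type and both function names are hypothetical stand-ins for the cgraph and varpool nodes, and this sketch sorts in ascending order, which may differ from the direction used in the patch.  It sorts two arrays of node pointers by their order field and then interleaves them, so that each variable is emitted at its original position relative to the functions.

```c
#include <stdlib.h>

/* Simplified stand-in for cgraph/varpool nodes: only the fields that
   matter here.  ORDER is the position in the original source.  */
struct node
{
  int order;
  int is_var;   /* 1 for a variable, 0 for a function */
};

/* qsort helper for an array of node pointers; ascending ORDER.
   Comparisons are used instead of subtraction to avoid overflow.  */
static int
node_order_cmp (const void *pa, const void *pb)
{
  const struct node *a = *(const struct node *const *) pa;
  const struct node *b = *(const struct node *const *) pb;
  return (a->order > b->order) - (a->order < b->order);
}

/* Merge FUNCS and VARS (each already sorted by ORDER) into OUT, so that
   every variable is emitted before the first function that follows it
   in the source.  Returns the number of entries written.  */
static size_t
merge_by_order (struct node **funcs, size_t n_funcs,
                struct node **vars, size_t n_vars,
                struct node **out)
{
  size_t f = 0, v = 0, k = 0;
  while (f < n_funcs || v < n_vars)
    {
      if (v < n_vars && (f == n_funcs || vars[v]->order < funcs[f]->order))
        out[k++] = vars[v++];
      else
        out[k++] = funcs[f++];
    }
  return k;
}
```

In the real partitioner the merge is interleaved with the partition-size bookkeeping; the sketch leaves that out.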

Also, we sort partitions by size in lto_wpa_write_files to make parallel make
finish faster.  This would scramble the linker order and needs to be disabled.
We could of course output separate linker and makefile orders, but I didn't bother
to do so.

Also, the patch won't output toplevel asm statements correctly - these are
still homed in the first partition.  I can look into this incrementally.
However, to make this useful, we probably ought to prevent lto_balanced_map
from breaking up partitions in the middle of an asm file.

This is not needed for the kernel, so I defer it to a later time.

Unfortunately the patch doesn't make the kernel build, since we hit a quite
involved bug in partitioning and variable promotion.  I am working on a fix but it
will take me a bit of time.
Well, extra stress on bugs in partitioning is another reason for this patch
to be interesting.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* lto/lto.c (node_cmp, varpool_node_cmp): New functions.
(lto_balanced_map): Honor -fno-toplevel-reorder of vars and functions.
(cmp_partitions): Rename to ...
(cmp_partitions_size): ... this one.
(cmp_partitions_order): New function.
(lto_wpa_write_files): Sort partitions by order when
-fno-toplevel-reorder is used.
Index: lto/lto.c
===
--- lto/lto.c   (revision 180181)
+++ lto/lto.c   (working copy)
@@ -1665,6 +1673,23 @@ lto_1_to_1_map (void)
 ltrans_partitions);
 }
 
+/* Helper function for qsort; sort nodes by order.  */
+static int
+node_cmp (const void *pa, const void *pb)
+{
+  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+  return b->order - a->order;
+}
+
+/* Helper function for qsort; sort nodes by order.  */
+static int
+varpool_node_cmp (const void *pa, const void *pb)
+{
+  const struct varpool_node *a = *(const struct varpool_node * const *) pa;
+  const struct varpool_node *b = *(const struct varpool_node * const *) pb;
+  return b->order - a->order;
+}
 
 /* Group cgraph nodes into equally-sized partitions.
 
@@ -1708,9 +1733,11 @@ static void
 lto_balanced_map (void)
 {
   int n_nodes = 0;
+  int n_varpool_nodes = 0, varpool_pos = 0;
   struct cgraph_node **postorder =
 XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid);
+  struct varpool_node **varpool_order = NULL;
   int i, postorder_len;
   struct cgraph_node *node;
   int total_size = 0, best_total_size = 0;
@@ -1722,6 +1749,7 @@ lto_balanced_map (void)
   int best_n_nodes = 0, best_n_varpool_nodes = 0, best_i = 0, best_cost =
 INT_MAX, best_internal = 0;
   int npartitions;
+  int current_order = -1;
 
   for (vnode = varpool_nodes; vnode; vnode = vnode->next)
 gcc_assert (!vnode->aux);
@@ -1731,6 +1759,7 @@ lto_balanced_map (void)
  multiple partitions, this is just an estimate of real size.  This is why
  we keep partition_size updated after every partition is finalized.  */
   postorder_len = ipa_reverse_postorder (postorder);
+
   for (i = 0; i < postorder_len; i++)
 {
   node = postorder[i];
@@ -1742,6 +1771,23 @@ lto_balanced_map (void)
 }
   free (postorder);
 
+  if (!flag_toplevel_reorder)
+{
+  qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
+
+  for (vnode = varpool_nodes; vnode; vnode = vnode->next)
+   if (partition_varpool_node_p (vnode))
+ n_varpool_nodes++;
+  varpool_order = XNEWVEC (struct varpool_node *, n_varpool_nodes);
+
+  n_varpool_nodes = 0;
+  for (vnode = varpool_nodes; vnode; vnode = vnode->next)
+   if (partition_varpool_node_p (vnode))
+ varpool_order[n_varpool_nodes++] = vnode;
+  qsort (varpool_order, n_varpool_nodes, sizeof (struct varpool_node *),
+varpool_node_cmp);
+}
+
   /* Compute partition size and create the first partition.  */
   partition_size = total_size / PARAM_VALUE (PARAM_LTO_PARTITIONS);
   if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
@@ -1756,8 +1802,20 @@ lto_balanced_map 

Re: Honnor -fno-topleverl-reorder with whopr for vars and functions

2011-10-20 Thread Richard Guenther
On Thu, 20 Oct 2011, Jan Hubicka wrote:

 Hi,
 this patch makes -fno-toplevel-reorder work better with WHOPR.
 Functions and variables now come out in the proper order, which is currently
 needed for the Linux kernel to boot with LTO because linker order is important
 there for the kernel's initialization code.
 I also used this code when comparing various code layout algorithms - the
 default layout is not as bad as one might think in most cases.
 
 The implementation is generally simple - lto_balanced_map already works on
 a fixed order of functions.  However, it pulls each variable into the first
 partition that refers to it, and variables that no partition refers to are all
 homed in the last partition.
 
 This needs to change: variables need to be inserted in order when the
 corresponding function is inserted.  This is the reason for the lto_balanced_map
 changes.
 
 Also, we sort partitions by size in lto_wpa_write_files to make parallel make
 finish faster.  This would scramble the linker order and needs to be disabled.
 We could of course output separate linker and makefile orders, but I didn't
 bother to do so.
 
 Also, the patch won't output toplevel asm statements correctly - these are
 still homed in the first partition.  I can look into this incrementally.
 However, to make this useful, we probably ought to prevent lto_balanced_map
 from breaking up partitions in the middle of an asm file.
 
 This is not needed for the kernel, so I defer it to a later time.
 
 Unfortunately the patch doesn't make the kernel build, since we hit a quite
 involved bug in partitioning and variable promotion.  I am working on a fix but
 it will take me a bit of time.
 Well, extra stress on bugs in partitioning is another reason for this patch
 to be interesting.
 
 Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.


Re: [PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math

2011-10-20 Thread Michael Matz
Hi,

On Thu, 20 Oct 2011, Uros Bizjak wrote:

 This patch builds on recent patch by Michael (that implemented 
 fine-grained control on -mrecip option) and with -ffast-math emits 
 reciprocal sequences with additional NR step for vectorized SFmode 
 division and vectorized sqrtf(x).
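
For readers unfamiliar with the "NR step" mentioned above: a Newton-Raphson step refines a low-precision reciprocal estimate.  The sketch below uses the textbook form of the iteration in scalar C; it is not the instruction sequence GCC emits, and the function names and the explicit estimate parameter are assumptions made to keep the example portable.

```c
/* One Newton-Raphson refinement step for 1/A, starting from the
   estimate X0:  x1 = x0 * (2 - a * x0).  */
static float
nr_recip_step (float a, float x0)
{
  return x0 * (2.0f - a * x0);
}

/* Approximate N / D as N * (refined reciprocal of D).  ESTIMATE stands
   in for the hardware's low-precision reciprocal instruction; it is a
   parameter here so the sketch stays portable.  */
static float
div_via_recip (float n, float d, float estimate)
{
  return n * nr_recip_step (d, estimate);
}
```

Starting from 0.24 as an estimate of 1/4, one step gives about 0.2496, so the error shrinks from 0.01 to about 0.0004.  The result is still not correctly rounded, which is why this is tied to -ffast-math.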

FWIW, I haven't yet gotten around to doing the same for cpu2006, but here are
the two results for polyhedron (sandybridge, with baseflags -Ofast -funroll-loops 
-fpeel-loops -march=corei7-avx -mveclibabi=svml -flto -fwhole-program, 
i.e. without increasing the inline limits, and linking against libimf and 
libsvml).  With the above flags:

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      4.68     4086864      6.16       2  0.0211
      aermod     68.22     5603956     13.40       5  0.1864
         air     10.46     4961134      3.78       5  0.2888
    capacita      3.74     4213850     19.24       3  0.0998
     channel      1.44     4808524      1.22       5  0.2898
       doduc     12.64     4288238     19.91       5  0.1128
     fatigue      4.47     4217301      3.71       5  0.0989
     gas_dyn      6.92     4211997      3.43       5  2.8640
      induct      7.44     4385543     10.33       5  0.2719
       linpk      1.28     4053798      5.88       2  0.0647
        mdbx      3.97     4114107      7.63       5  0.1365
          nf      4.89     4147809      7.90       2  0.0380
     protein     15.07     5049415     20.70       5  0.7615
      rnflow     11.89     4260434     16.05       5  0.1359
    test_fpu      8.11     4207868      3.69       5  0.6687
        tfft      0.99     4110713      0.84       5  0.3024

Geometric Mean Execution Time =   6.35 seconds

With the above flags plus -mrecip=vec-sqrt,vec-div:

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      3.85     4086864      6.17       2  0.0227
      aermod     68.31     5603956     13.38       2  0.0019
         air     10.92     4961134      3.77       5  0.1367
    capacita      3.71     4213850     18.68       2  0.0391
     channel      1.41     4808524      1.22       5  0.3327
       doduc     12.66     4288238     19.93       5  0.2391
     fatigue      4.36     4217301      3.70       2  0.0567
     gas_dyn      6.91     4211997      2.31       2  0.0867
      induct      7.46     4385543     10.31       5  0.1201
       linpk      1.70     4053798      5.88       2  0.0383
        mdbx      3.98     4114107      7.68       5  0.4000
          nf      4.89     4147809      7.89       2  0.0348
     protein     14.00     5049415     20.51       2  0.0478
      rnflow     11.89     4260434     16.05       4  0.0837
    test_fpu      8.09     4207868      3.71       5  0.7097
        tfft      1.13     4110713      0.83       5  0.2290

Geometric Mean Execution Time =   6.18 seconds

I.e. gas_dyn improves quite a bit (as expected), and the rest still works.  
I know that cpu2006 also works, but as said have no recent measurements 
for that, which I'm going to take now.
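
To put numbers on the comparison above, the relative change can be computed directly from the two tables.  The helper name below is hypothetical; the inputs in the usage note are the gas_dyn run times and the geometric means quoted in this message.

```c
/* Relative reduction of AFTER versus BEFORE, in percent.  */
static double
percent_reduction (double before, double after)
{
  return (before - after) / before * 100.0;
}
```

With the figures above, percent_reduction (3.43, 2.31) is roughly 33 % for gas_dyn, and percent_reduction (6.35, 6.18) is about 2.7 % for the geometric mean.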


Ciao,
Michael.


Re: [PATCH] Distribute inliner's size_time data across entries with similar predicates

2011-10-20 Thread Jan Hubicka
Hi,
 Jan,
 
 The following patch started as a one-liner for ipa-inline-analysis.c: 
 account_size_time() to merge predicates when we are adding data to entry[0] 
 (i.e., when space for 32 size_time entries is exhausted):
 
 @@ -537,6 +592,9 @@ account_size_time (struct inline_summary
  }
else
  {
 +  e->predicate = or_predicates (summary->conds, e->predicate, pred);
e->size += size;
e->time += time;
if (e->time > MAX_TIME * INLINE_TIME_SCALE)

As we discussed, this is not needed in the current form, because we arrange for
the first predicate to always be
true and thus we can always place there all the costs that did not fit
elsewhere.

The patch has a problem with the fact that the predicates must always be
conservative, i.e. when
they are proved to be false, the code must be unreachable after inlining.

We could either go with your patch, with the distance function modified to accept
only predicates such that the new predicate is implied by them.  If you are
willing to play with this, I have no problem with going for this.
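
The "implied by" requirement can be illustrated with a deliberately simplified model.  In the sketch below a predicate is just a bitmask of condition numbers, read as "at least one of these conditions holds"; this is an assumption made for the example and not the representation used in ipa-inline-analysis.c, and all names are hypothetical.  In this model, OR-ing two predicates gives a predicate that both inputs imply, so it stays conservative.

```c
/* A deliberately simplified predicate: a bitmask of condition numbers,
   read as "at least one of these conditions holds".  GCC's real
   predicates are richer; this only illustrates the implication test.  */
typedef unsigned int pred_t;

/* Merging by OR: the result holds whenever either input holds.  */
static pred_t
pred_or (pred_t a, pred_t b)
{
  return a | b;
}

/* Nonzero if A implies B, i.e. every condition listed in A is also
   listed in B.  */
static int
pred_implies (pred_t a, pred_t b)
{
  return (a & ~b) == 0;
}

/* Nonzero if P can hold given the set of conditions that may be true.  */
static int
pred_may_hold (pred_t p, pred_t possible_truths)
{
  return (p & possible_truths) != 0;
}
```

A merge target would then be acceptable only if pred_implies (old, merged) holds for every predicate folded into it: whenever the merged predicate is proved false, each original is false as well.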

The accounting is run at most once per statement, so the overall time
should not be too bad.

Or we could stay with the current logic until we hit real-world testcases that
demonstrate a need
for something like this, and add a comment in the code above explaining why the
OR is not needed,
to avoid confusion.

Honza


Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)

2011-10-20 Thread Richard Guenther
On Wed, 19 Oct 2011, Jakub Jelinek wrote:

 Hi!
 
 Similarly to casts of bool to integer, even stores into bool arrays
 can be handled similarly.  Just we need to ensure tree-vect-data-refs.c
 doesn't reject vectorization before tree-vect-patterns.c has a chance
 to optimize it.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok with ...

 2011-10-19  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/50596
   * tree-vect-stmts.c (vect_mark_relevant): Only use
   FOR_EACH_IMM_USE_FAST if lhs is SSA_NAME.
   (vectorizable_store): If is_pattern_stmt_p look through
   VIEW_CONVERT_EXPR on lhs.
   * tree-vect-patterns.c (vect_recog_bool_pattern): Optimize
   also stores into bool memory in addition to casts from bool
   to integral types.
   (vect_mark_pattern_stmts): If pattern_stmt already has vinfo
   created, don't create it again.
   * tree-vect-data-refs.c (vect_analyze_data_refs): For stores
   into bool memory use vectype for integral type corresponding
   to bool's mode.
   * tree-vect-loop.c (vect_determine_vectorization_factor): Give up
   if a store into bool memory hasn't been replaced by the pattern
   recognizer.
 
   * gcc.dg/vect/vect-cond-10.c: New test.
 
 --- gcc/tree-vect-stmts.c.jj  2011-10-18 23:52:07.0 +0200
 +++ gcc/tree-vect-stmts.c 2011-10-19 14:19:00.0 +0200
 @@ -159,19 +159,20 @@ vect_mark_relevant (VEC(gimple,heap) **w
/* This use is out of pattern use, if LHS has other uses that are
   pattern uses, we should mark the stmt itself, and not the 
 pattern
   stmt.  */
 -  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
 -{
 -  if (is_gimple_debug (USE_STMT (use_p)))
 -continue;
 -  use_stmt = USE_STMT (use_p);
 +   if (TREE_CODE (lhs) == SSA_NAME)
 + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
 +   {
 + if (is_gimple_debug (USE_STMT (use_p)))
 +   continue;
 + use_stmt = USE_STMT (use_p);
  
 -  if (vinfo_for_stmt (use_stmt)
 -      && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
 -{
 -  found = true;
 -  break;
 -}
 -}
 + if (vinfo_for_stmt (use_stmt)
 +     && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
 +   {
 + found = true;
 + break;
 +   }
 +   }
  }
  
if (!found)
 @@ -3656,6 +3657,9 @@ vectorizable_store (gimple stmt, gimple_
  return false;
  
scalar_dest = gimple_assign_lhs (stmt);
 +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
 +      && is_pattern_stmt_p (stmt_info))
 +scalar_dest = TREE_OPERAND (scalar_dest, 0);
if (TREE_CODE (scalar_dest) != ARRAY_REF
 && TREE_CODE (scalar_dest) != INDIRECT_REF
 && TREE_CODE (scalar_dest) != COMPONENT_REF

Just change the if () stmt to

 if (!handled_component_p (scalar_dest)
     && TREE_CODE (scalar_dest) != MEM_REF)
   return false;

 --- gcc/tree-vect-patterns.c.jj   2011-10-18 23:52:05.0 +0200
 +++ gcc/tree-vect-patterns.c  2011-10-19 13:55:27.0 +0200
 @@ -1933,6 +1933,50 @@ vect_recog_bool_pattern (VEC (gimple, he
VEC_safe_push (gimple, heap, *stmts, last_stmt);
return pattern_stmt;
  }
 +  else if (rhs_code == SSA_NAME
 +           && STMT_VINFO_DATA_REF (stmt_vinfo))
 +{
 +  stmt_vec_info pattern_stmt_info;
 +  vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
 +  gcc_assert (vectype != NULL_TREE);
 +  if (!check_bool_pattern (var, loop_vinfo))
 + return NULL;
 +
 +  rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
 +  if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
 + {
 +   lhs = copy_node (lhs);

We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
do it here.  In fact, just unconditionally do ...

 +   TREE_TYPE (lhs) = TREE_TYPE (vectype);
 + }
 +  else
 + lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);

... this (wrap it in a V_C_E).  No need to special-case any
MEM_REFs.

 +  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))

This should never be false, so you can as well unconditionally build
the conversion stmt.

 + {
 +   tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
 +   gimple cast_stmt
 + = gimple_build_assign_with_ops (NOP_EXPR, rhs2, rhs, NULL_TREE);
 +   STMT_VINFO_PATTERN_DEF_STMT (stmt_vinfo) = cast_stmt;
 +   rhs = rhs2;
 + }
 +  pattern_stmt
 + = gimple_build_assign_with_ops (SSA_NAME, lhs, rhs, NULL_TREE);
 +  pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL);
 +  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
 +  STMT_VINFO_DATA_REF 

[patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-20 Thread Jan Kratochvil
Hi,

with custom patched dwarf2out.c I got a crash on memory mangled by the garbage
collector.  With patched GTY there the crash no longer happened - but I do not
have a reproducer anymore, sorry if it is a bogus patch.

The memory corrupted later was initially allocated and stored into
mem_loc_result->dw_loc_oprnd1.v.val_loc.  I do not think there is any other
reference to it than that field with no GTY.

GIT 33e7b55c2549d655d88ec64c06c51912d0d07527
gcc (GCC) 4.7.0 20111002 (experimental)

11900 mem_loc_result->dw_loc_oprnd1.v.val_loc = op0;
(gdb) bt
#0  mem_loc_descriptor (rtl=, mode=SImode, mem_mode=VOIDmode, 
initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:11900
#1  in loc_descriptor (rtl=, mode=SImode, 
initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12790
#2  in loc_descriptor (rtl=, mode=SImode, 
initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12614
#3  in dw_loc_list_1 (loc=, varloc=, want_address=2, 
initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12889
#4  in dw_loc_list (loc_list=, decl=, want_address=2) at gcc/dwarf2out.c:13145
#5  in loc_list_from_tree (loc=, want_address=2) at gcc/dwarf2out.c:13538
#6  in add_location_or_const_value_attribute (die=, decl=, cache_p=0 '\000', 
attr=DW_AT_location) at gcc/dwarf2out.c:15048
#7  in gen_formal_parameter_die (node=, origin=0x0, emit_name_p=1 '\001', 
context_die=) at gcc/dwarf2out.c:16804
#8  in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19632
#9  in gen_subprogram_die (decl=, context_die=) at gcc/dwarf2out.c:17560
#10 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19545
#11 in dwarf2out_decl (decl=) at gcc/dwarf2out.c:19919
#12 in dwarf2out_function_decl (decl=) at gcc/dwarf2out.c:19927
#13 in rest_of_handle_final () at gcc/final.c:4252
#14 in execute_one_pass (pass=0x4dbe120) at gcc/passes.c:2064
#15 in execute_pass_list (pass=0x4dbe120) at gcc/passes.c:2119
#16 in execute_pass_list (pass=0x4dbef00) at gcc/passes.c:2120
#17 in execute_pass_list (pass=0x4dbeea0) at gcc/passes.c:2120
#18 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
#19 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
#20 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
#21 in cgraph_optimize () at gcc/cgraphunit.c:2133
#22 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
#23 in c_write_global_declarations () at gcc/c-decl.c:9936
#24 in compile_file () at gcc/toplev.c:581
#25 in do_compile () at gcc/toplev.c:1925
#26 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
#27 in main (argc=101, argv=) at gcc/main.c:36

It was later freed (watchpoint hit) by:

(gdb) bt
#0  __memset_sse2 () at ../sysdeps/x86_64/memset.S:333
#1  in poison_pages () at gcc/ggc-page.c:1845
#2  in ggc_collect () at gcc/ggc-page.c:1938
#3  in execute_todo (flags=2) at gcc/passes.c:1763
#4  in execute_one_pass (pass=0x4dbce80) at gcc/passes.c:2087
#5  in execute_pass_list (pass=0x4dbce80) at gcc/passes.c:2119
#6  in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
#7  in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
#8  in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
#9  in cgraph_optimize () at gcc/cgraphunit.c:2133
#10 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
#11 in c_write_global_declarations () at gcc/c-decl.c:9936
#12 in compile_file () at gcc/toplev.c:581
#13 in do_compile () at gcc/toplev.c:1925
#14 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
#15 in main (argc=101, argv=) at gcc/main.c:36

And later it crashed on the mangled memory.


OK to check it in?  No regression testing done.


Thanks,
Jan


gcc/
2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com

* dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr.

--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct {
   char *ll_symbol; /* Label for beginning of location list.
  Only on head of list */
   const char *section; /* Section this loclist is relative to */
-  dw_loc_descr_ref expr;
+  dw_loc_descr_ref GTY(()) expr;
   hashval_t hash;
   /* True if all addresses in this and subsequent lists are known to be
  resolved.  */


Re: [patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-20 Thread Richard Guenther
On Thu, Oct 20, 2011 at 12:14 PM, Jan Kratochvil
jan.kratoch...@redhat.com wrote:
 Hi,

 with custom patched dwarf2out.c I got a crash on memory mangled by the garbage
 collector.  With patched GTY there the crash no longer happened - but I do not
 have a reproducer anymore, sorry if it is a bogus patch.

 The memory corrupted later was initially allocated and stored into
 mem_loc_result->dw_loc_oprnd1.v.val_loc.  I do not think there is any other
 reference to it than that field with no GTY.

 GIT 33e7b55c2549d655d88ec64c06c51912d0d07527
 gcc (GCC) 4.7.0 20111002 (experimental)

 11900         mem_loc_result->dw_loc_oprnd1.v.val_loc = op0;
 (gdb) bt
 #0  mem_loc_descriptor (rtl=, mode=SImode, mem_mode=VOIDmode, 
 initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:11900
 #1  in loc_descriptor (rtl=, mode=SImode, 
 initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12790
 #2  in loc_descriptor (rtl=, mode=SImode, 
 initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12614
 #3  in dw_loc_list_1 (loc=, varloc=, want_address=2, 
 initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12889
 #4  in dw_loc_list (loc_list=, decl=, want_address=2) at gcc/dwarf2out.c:13145
 #5  in loc_list_from_tree (loc=, want_address=2) at gcc/dwarf2out.c:13538
 #6  in add_location_or_const_value_attribute (die=, decl=, cache_p=0 '\000', 
 attr=DW_AT_location) at gcc/dwarf2out.c:15048
 #7  in gen_formal_parameter_die (node=, origin=0x0, emit_name_p=1 '\001', 
 context_die=) at gcc/dwarf2out.c:16804
 #8  in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19632
 #9  in gen_subprogram_die (decl=, context_die=) at gcc/dwarf2out.c:17560
 #10 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19545
 #11 in dwarf2out_decl (decl=) at gcc/dwarf2out.c:19919
 #12 in dwarf2out_function_decl (decl=) at gcc/dwarf2out.c:19927
 #13 in rest_of_handle_final () at gcc/final.c:4252
 #14 in execute_one_pass (pass=0x4dbe120) at gcc/passes.c:2064
 #15 in execute_pass_list (pass=0x4dbe120) at gcc/passes.c:2119
 #16 in execute_pass_list (pass=0x4dbef00) at gcc/passes.c:2120
 #17 in execute_pass_list (pass=0x4dbeea0) at gcc/passes.c:2120
 #18 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
 #19 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
 #20 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
 #21 in cgraph_optimize () at gcc/cgraphunit.c:2133
 #22 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
 #23 in c_write_global_declarations () at gcc/c-decl.c:9936
 #24 in compile_file () at gcc/toplev.c:581
 #25 in do_compile () at gcc/toplev.c:1925
 #26 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
 #27 in main (argc=101, argv=) at gcc/main.c:36

 It was later freed (watchpoint hit) by:

 (gdb) bt
 #0  __memset_sse2 () at ../sysdeps/x86_64/memset.S:333
 #1  in poison_pages () at gcc/ggc-page.c:1845
 #2  in ggc_collect () at gcc/ggc-page.c:1938
 #3  in execute_todo (flags=2) at gcc/passes.c:1763
 #4  in execute_one_pass (pass=0x4dbce80) at gcc/passes.c:2087
 #5  in execute_pass_list (pass=0x4dbce80) at gcc/passes.c:2119
 #6  in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
 #7  in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
 #8  in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
 #9  in cgraph_optimize () at gcc/cgraphunit.c:2133
 #10 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
 #11 in c_write_global_declarations () at gcc/c-decl.c:9936
 #12 in compile_file () at gcc/toplev.c:581
 #13 in do_compile () at gcc/toplev.c:1925
 #14 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
 #15 in main (argc=101, argv=) at gcc/main.c:36

 And later it crashed on the mangled memory.


 OK to check it in?  No regression testing done.

I don't see how it can make any difference.

Richard.


 Thanks,
 Jan


 gcc/
 2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com

        * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;

 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct {
   char *ll_symbol; /* Label for beginning of location list.
                      Only on head of list */
   const char *section; /* Section this loclist is relative to */
 -  dw_loc_descr_ref expr;
 +  dw_loc_descr_ref GTY(()) expr;
   hashval_t hash;
   /* True if all addresses in this and subsequent lists are known to be
      resolved.  */



Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)

2011-10-20 Thread Jakub Jelinek
On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote:
  +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
  +      && is_pattern_stmt_p (stmt_info))
  +    scalar_dest = TREE_OPERAND (scalar_dest, 0);
     if (TREE_CODE (scalar_dest) != ARRAY_REF
         && TREE_CODE (scalar_dest) != INDIRECT_REF
         && TREE_CODE (scalar_dest) != COMPONENT_REF
 
 Just change the if () stmt to
 
  if (!handled_component_p (scalar_dest)
      && TREE_CODE (scalar_dest) != MEM_REF)
    return false;

That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCE outside of 
pattern stmts).
The VCEs I hope don't appear, but the first two might, and I'm not sure
we are prepared to handle them.  Certainly not BIT_FIELD_REFs.

  +  rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, 
  stmts);
  +  if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
  +   {
  + lhs = copy_node (lhs);
 
 We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
 do it here.  In fact, just unconditionally do ...
 
  + TREE_TYPE (lhs) = TREE_TYPE (vectype);
  +   }
  +  else
  +   lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
 
 ... this (wrap it in a V_C_E).  No need to special-case any
 MEM_REFs.

Ok.  After all it seems vectorizable_store pretty much ignores it
(except for the scalar_dest check above).  For aliasing it uses the type
from DR_REF and otherwise it uses the vectorized type.

  +  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
 
 This should never be false, so you can as well unconditionally build
 the conversion stmt.

You mean because currently adjust_bool_pattern will prefer signed types
over unsigned while here lhs will be unsigned?  I guess I should
change it to use signed type for the memory store too to avoid the extra
cast instead.  Both types can be certainly the same precision, e.g. for:
unsigned char a[N], b[N];
unsigned int d[N], e[N];
bool c[N];
...
  for (i = 0; i < N; ++i)
c[i] = a[i]  b[i];
or different precision, e.g. for:
  for (i = 0; i < N; ++i)
c[i] = d[i]  e[i];
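
To make the two cases concrete, here is a minimal self-contained sketch of the
loops above.  The comparison operator `<`, the value of N and the function
names are assumptions for illustration only; the archived message does not
preserve the operator that was actually used.

```c
#include <stdbool.h>
#include <stddef.h>

#define N 16  /* assumed size, not taken from the original message */

unsigned char a[N], b[N];
unsigned int d[N], e[N];
bool c[N];

/* Same precision on typical targets: char-sized operands are compared
   and a char-sized bool is stored.  */
void
store_same_precision (void)
{
  for (size_t i = 0; i < N; ++i)
    c[i] = a[i] < b[i];   /* `<` is an assumed operator */
}

/* Different precision: int-sized operands are compared, but the result
   is still stored into the (usually narrower) bool array.  */
void
store_different_precision (void)
{
  for (size_t i = 0; i < N; ++i)
    c[i] = d[i] < e[i];   /* `<` is an assumed operator */
}
```

In the second loop a vectorizer has to narrow the comparison results before
the store, which is where the choice of integer type for the bools matters.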

  @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo
gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
|| is_pattern_stmt_p (stmt_info));
vectype = STMT_VINFO_VECTYPE (stmt_info);
  + if (STMT_VINFO_DATA_REF (stmt_info))
  +   {
  + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
  + tree scalar_type = TREE_TYPE (DR_REF (dr));
  + /* vect_analyze_data_refs will allow bool writes through,
  +in order to allow vect_recog_bool_pattern to transform
  +those.  If they couldn't be transformed, give up now.  */
  + if (((TYPE_PRECISION (scalar_type) == 1
  +       && TYPE_UNSIGNED (scalar_type))
  +      || TREE_CODE (scalar_type) == BOOLEAN_TYPE)
 
 Shouldn't it be always possible to vectorize those?  For loads
 we can assume the memory contains only 1 or 0 (we assume that for
 scalar loads), for stores we can mask out all other bits explicitly
 if you add support for truncating conversions to non-mode precision
 (in fact, we could support non-mode precision vectorization that way,
 if not support bitfield loads or extending conversions).

Not without the pattern recognizer transforming it into something.
That is something we've discussed on IRC before I started working on the
first vect_recog_bool_pattern patch, we'd need to special case bool and
one-bit precision types in way too many places all around the vectorizer.
Another reason for that was that what vect_recog_bool_pattern does currently
is certainly way faster than what we would end up with if we just handled
bool as unsigned (or signed?) char with masking on casts and stores
- the ability to use any integer type for the bools rather than char
as appropriate means we can avoid many VEC_PACK_TRUNC_EXPRs and
corresponding VEC_UNPACK_{LO,HI}_EXPRs.
So the chosen solution was to attempt to transform some of the bool patterns
into something the vectorizer can handle easily.
What it handles can be extended over time.

The above just reflects it, probably just me trying to be too cautious,
the vectorization would likely fail on the stmt feeding the store, because
get_vectype_for_scalar_type would fail on it.

If we wanted to support general TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE)
vectorization (hopefully while still preserving the bool pattern recognizer
for the above stated reasons), we'd start with changing
get_vectype_for_scalar_type to handle those types (then the
tree-vect-data-refs.c and tree-vect-loop.c changes from this patch would
be unnecessary), but then we'd need to handle it in other places too
(I guess loads would be fine (unless BIT_FIELD_REF loads), but then
casts and stores need extra code).

Jakub


Re: [patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-20 Thread Jakub Jelinek
On Thu, Oct 20, 2011 at 12:21:58PM +0200, Richard Guenther wrote:
 I don't see how it can make any difference.

Indeed, I see no changes in gt-dwarf2out.h with the patch.
So it doesn't do anything.

  2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com
 
         * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;
 
  --- a/gcc/dwarf2out.c
  +++ b/gcc/dwarf2out.c
  @@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct {
    char *ll_symbol; /* Label for beginning of location list.
                       Only on head of list */
    const char *section; /* Section this loclist is relative to */
  -  dw_loc_descr_ref expr;
  +  dw_loc_descr_ref GTY(()) expr;
    hashval_t hash;
    /* True if all addresses in this and subsequent lists are known to be
       resolved.  */
 

Jakub


Plug some bogus used uninitialized warnings

2011-10-20 Thread Jan Hubicka
Hi,
last time I tried profiledbootstrap with LTO I had to plug the following
warnings.  Will commit it as obvious later today.

Honza

* pt.c (unify_pack_expansion): Initialize bad_old_arg
and bad_new_arg.
* parser.c (cp_parser_ctor_initializer_opt_and_function_body):
Initialize list.

* sched-deps.c (sched_get_condition_with_rev_uncached): Initialize tmp.

Index: cp/pt.c
===
*** cp/pt.c (revision 180241)
--- cp/pt.c (working copy)
*** unify_pack_expansion (tree tparms, tree
*** 15714,15720 
  }
else
{
! tree bad_old_arg, bad_new_arg;
  tree old_args = ARGUMENT_PACK_ARGS (old_pack);
  
  if (!comp_template_args_with_info (old_args, new_args,
--- 15714,15720 
  }
else
{
! tree bad_old_arg = NULL, bad_new_arg = NULL;
  tree old_args = ARGUMENT_PACK_ARGS (old_pack);
  
  if (!comp_template_args_with_info (old_args, new_args,
Index: cp/parser.c
===
*** cp/parser.c (revision 180241)
--- cp/parser.c (working copy)
*** cp_parser_function_body (cp_parser *pars
*** 16887,16893 
  static bool
  cp_parser_ctor_initializer_opt_and_function_body (cp_parser *parser)
  {
!   tree body, list;
bool ctor_initializer_p;
const bool check_body_p =
   DECL_CONSTRUCTOR_P (current_function_decl)
--- 16887,16893 
  static bool
  cp_parser_ctor_initializer_opt_and_function_body (cp_parser *parser)
  {
!   tree body, list = NULL;
bool ctor_initializer_p;
const bool check_body_p =
   DECL_CONSTRUCTOR_P (current_function_decl)
Index: sched-deps.c
===
*** sched-deps.c(revision 180241)
--- sched-deps.c(working copy)
*** sched_get_condition_with_rev_uncached (c
*** 544,550 
  static rtx
  sched_get_condition_with_rev (const_rtx insn, bool *rev)
  {
!   bool tmp;
  
if (INSN_LUID (insn) == 0)
  return sched_get_condition_with_rev_uncached (insn, rev);
--- 544,550 
  static rtx
  sched_get_condition_with_rev (const_rtx insn, bool *rev)
  {
!   bool tmp = false;
  
if (INSN_LUID (insn) == 0)
  return sched_get_condition_with_rev_uncached (insn, rev);


Re: Plug some bogus used uninitialized warnings

2011-10-20 Thread Jakub Jelinek
On Thu, Oct 20, 2011 at 12:35:39PM +0200, Jan Hubicka wrote:
 Hi,
 last time I tried profiledbootstrap with LTO I had to plug the following
 warnings.  Will commit it as obvious later today.

Please use NULL_TREE instead of NULL for tree initializers.

Jakub


[Ada] Fix couple of issues with pragma Source_Reference

2011-10-20 Thread Eric Botcazou
The GNAT specific pragma Source_Reference can alter the source line mapping,
leading to (logical) line numbers lower than 1 or greater than the maximum
number of lines in the file.  Dodji's patch shows that we weren't taking it 
into account in gigi.
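
As a rough illustration of the problem, the sketch below uses a toy model in
which the pragma shifts every physical line by a fixed offset.  That model and
both function names are assumptions made for this example; they are not GNAT's
actual mapping, only a way to show why a logical line can drop below 1 and why
a guard is needed before the line reaches the line map.

```c
/* Toy model (assumption): logical line = physical line + offset, where
   the offset introduced by the pragma may be negative.  */
int
toy_physical_to_logical (int physical, int offset)
{
  return physical + offset;
}

/* Same idea as the guard added to Sloc_to_locus in the patch: never pass
   a line number below 1 on to the line map.  */
int
toy_clamp_line (int line)
{
  if (line < 1)
    line = 1;
  return line;
}
```

For example, a first physical line mapped with an offset of -1 yields logical
line 0, which the guard turns into 1.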

Tested on i586-suse-linux, applied on the mainline.


2011-10-20  Eric Botcazou  ebotca...@adacore.com

* back_end.adb (Call_Back_End): Pass the maximum logical line number
instead of the maximum physical line number to gigi.
* gcc-interface/trans.c (Sloc_to_locus): Cope with line zero.


2011-10-20  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/source_ref1.adb: New test.
* gnat.dg/source_ref2.adb: Likewise.


-- 
Eric Botcazou
Index: back_end.adb
===
--- back_end.adb	(revision 180235)
+++ back_end.adb	(working copy)
@@ -114,9 +114,13 @@ package body Back_End is
  return;
   end if;
 
+  --  The back end needs to know the maximum line number that can appear
+  --  in a Sloc, in other words the maximum logical line number.
+
   for J in 1 .. Last_Source_File loop
  File_Info_Array (J).File_Name:= Full_Debug_Name (J);
- File_Info_Array (J).Num_Source_Lines := Num_Source_Lines (J);
+ File_Info_Array (J).Num_Source_Lines :=
+   Nat (Physical_To_Logical (Last_Source_Line (J), J));
   end loop;
 
   if Generate_SCIL then
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 180235)
+++ gcc-interface/trans.c	(working copy)
@@ -8393,6 +8393,10 @@ Sloc_to_locus (Source_Ptr Sloc, location
   Column_Number column = Get_Column_Number (Sloc);
   struct line_map *map = LINEMAPS_ORDINARY_MAP_AT (line_table, file - 1);
 
+  /* We can have zero if pragma Source_Reference is in effect.  */
+  if (line < 1)
+	line = 1;
+
   /* Translate the location.  */
   *locus = linemap_position_for_line_and_column (map, line, column);
 }
pragma Source_Reference (1, p2.adb);

procedure Source_Ref2 is
pragma Source_Reference (2, p2.adb);
begin
   null;
end;
pragma Source_Reference (3, p1.adb);

procedure Source_Ref1 is
begin
   null;
end;


[Ada] Housekeeping work in gigi (40/n)

2011-10-20 Thread Eric Botcazou
Tested on i586-suse-linux, applied on the mainline.


2011-10-20  Eric Botcazou  ebotca...@adacore.com

* gcc-interface/trans.c (lhs_or_actual_p): New predicate.
(unchecked_conversion_nop): Use it.
(gnat_to_gnu): Likewise.


-- 
Eric Botcazou
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 180242)
+++ gcc-interface/trans.c	(working copy)
@@ -4472,6 +4472,28 @@ Compilation_Unit_to_gnu (Node_Id gnat_no
   invalidate_global_renaming_pointers ();
 }
 
+/* Return true if GNAT_NODE is on the LHS of an assignment or an actual
+   parameter of a call.  */
+
+static bool
+lhs_or_actual_p (Node_Id gnat_node)
+{
+  Node_Id gnat_parent = Parent (gnat_node);
+  Node_Kind kind = Nkind (gnat_parent);
+
+  if (kind == N_Assignment_Statement && Name (gnat_parent) == gnat_node)
+    return true;
+
+  if ((kind == N_Procedure_Call_Statement || kind == N_Function_Call)
+      && Name (gnat_parent) != gnat_node)
+    return true;
+
+  if (kind == N_Parameter_Association)
+    return true;
+
+  return false;
+}
+
 /* Return true if GNAT_NODE, an unchecked type conversion, is a no-op as far
as gigi is concerned.  This is used to avoid conversions on the LHS.  */
 
@@ -4483,11 +4505,7 @@ unchecked_conversion_nop (Node_Id gnat_n
   /* The conversion must be on the LHS of an assignment or an actual parameter
  of a call.  Otherwise, even if the conversion was essentially a no-op, it
  could de facto ensure type consistency and this should be preserved.  */
-  if (!(Nkind (Parent (gnat_node)) == N_Assignment_Statement
-	&& Name (Parent (gnat_node)) == gnat_node)
-      && !((Nkind (Parent (gnat_node)) == N_Procedure_Call_Statement
-	    || Nkind (Parent (gnat_node)) == N_Function_Call)
-	   && Name (Parent (gnat_node)) != gnat_node))
+  if (!lhs_or_actual_p (gnat_node))
     return false;
 
   from_type = Etype (Expression (gnat_node));
@@ -6528,13 +6546,13 @@ gnat_to_gnu (Node_Id gnat_node)
   /* Now convert the result to the result type, unless we are in one of the
  following cases:
 
-   1. If this is the Name of an assignment statement or a parameter of
-	  a procedure call, return the result almost unmodified since the
-	  RHS will have to be converted to our type in that case, unless
-	  the result type has a simpler size.  Likewise if there is just
-	  a no-op unchecked conversion in-between.  Similarly, don't convert
-	  integral types that are the operands of an unchecked conversion
-	  since we need to ignore those conversions (for 'Valid).
+   1. If this is the LHS of an assignment or an actual parameter of a
+	  call, return the result almost unmodified since the RHS will have
+	  to be converted to our type in that case, unless the result type
+	  has a simpler size.  Likewise if there is just a no-op unchecked
+	  conversion in-between.  Similarly, don't convert integral types
+	  that are the operands of an unchecked conversion since we need
+	  to ignore those conversions (for 'Valid).
 
2. If we have a label (which doesn't have any well-defined type), a
 	  field or an error, return the result almost unmodified.  Similarly,
@@ -6549,13 +6567,9 @@ gnat_to_gnu (Node_Id gnat_node)
4. Finally, if the type of the result is already correct.  */
 
   if (Present (Parent (gnat_node))
-      && ((Nkind (Parent (gnat_node)) == N_Assignment_Statement
-	   && Name (Parent (gnat_node)) == gnat_node)
+      && (lhs_or_actual_p (gnat_node)
 	  || (Nkind (Parent (gnat_node)) == N_Unchecked_Type_Conversion
 	      && unchecked_conversion_nop (Parent (gnat_node)))
-	  || (Nkind (Parent (gnat_node)) == N_Procedure_Call_Statement
-	      && Name (Parent (gnat_node)) != gnat_node)
-	  || Nkind (Parent (gnat_node)) == N_Parameter_Association
 	  || (Nkind (Parent (gnat_node)) == N_Unchecked_Type_Conversion
 	      && !AGGREGATE_TYPE_P (gnu_result_type)
 	      && !AGGREGATE_TYPE_P (TREE_TYPE (gnu_result


Re: [PATCH, RFA] Pass address space to REGNO_MODE_CODE_OK_FOR_BASE_P

2011-10-20 Thread Georg-Johann Lay
Ulrich Weigand wrote:
 Hello,
 
 Georg-Johann Lay has proposed a patch to add named address space support
 to the AVR target here:
 http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00471.html
 
 Since the target needs to make register allocation decisions for
 address base registers depending on the target address space, a
 prerequisite for this is a patch of mine that I posted a while ago
 to add the address space to the MODE_CODE_BASE_REG_CLASS and
 REGNO_MODE_CODE_OK_FOR_BASE_P target macros.
 
 I've updated the patch for current mainline and re-tested on SPU
 with no regressions.

Meanwhile, there was some code clean-up in the avr backend. Would you add this?

Johann
* config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add address space
argument.
(REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
* config/avr/avr-protos.h (avr_mode_code_base_reg_class): Ditto.
(avr_regno_mode_code_ok_for_base_p): Ditto.
* config/avr/avr.c (avr_mode_code_base_reg_class): Ditto.
(avr_regno_mode_code_ok_for_base_p): Ditto.
(avr_reg_ok_for_addr_p): Pass AS down to
avr_regno_mode_code_ok_for_base_p.

 ChangeLog:
 
   * doc/tm.texi.in (MODE_CODE_BASE_REG_CLASS): Add address space
   argument.
   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
   * doc/tm.texi: Regenerate.
 
   * config/cris/cris.h (MODE_CODE_BASE_REG_CLASS): Add address
   space argument.
   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
   * config/bfin/bfin.h (MODE_CODE_BASE_REG_CLASS): Likewise.
   (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
 
   * addresses.h (base_reg_class): Add address space argument.
   Pass to MODE_CODE_BASE_REG_CLASS.
   (ok_for_base_p_1): Add address space argument.  Pass to
   REGNO_MODE_CODE_OK_FOR_BASE_P.
   (regno_ok_for_base_p): Add address space argument.  Pass to
   ok_for_base_p_1.
 
   * regrename.c (scan_rtx_address): Add address space argument.
   Pass address space to regno_ok_for_base_p and base_reg_class.
   Update recursive calls.
   (scan_rtx): Pass address space to scan_rtx_address.
   (build_def_use): Likewise.
   * regcprop.c (replace_oldest_value_addr): Add address space
   argument.  Pass to regno_ok_for_base_p and base_reg_class.
   Update recursive calls.
   (replace_oldest_value_mem): Pass address space to
   replace_oldest_value_addr.
   (copyprop_hardreg_forward_1): Likewise.
 
   * reload.c (find_reloads_address_1): Add address space argument.
   Pass address space to base_reg_class and regno_ok_for_base_p.
   Update recursive calls.
   (find_reloads_address): Pass address space to base_reg_class,
   regno_ok_for_base_p, and find_reloads_address_1.
   (find_reloads): Pass address space to base_reg_class.
   (find_reloads_subreg_address): Likewise.
 
   * ira-costs.c (record_reg_classes): Update calls to base_reg_class.
   (ok_for_base_p_nonstrict): Add address space argument.  Pass to
   ok_for_base_p_1.
   (record_address_regs): Add address space argument.  Pass to
   base_reg_class and ok_for_base_p_nonstrict.  Update recursive calls.
   (record_operand_costs): Pass address space to record_address_regs.
   (scan_one_insn): Likewise.
 
   * caller-save.c (init_caller_save): Update call to base_reg_class.
   * ira-conflicts.c (ira_build_conflicts): Likewise.
   * reload1.c (maybe_fix_stack_asms): Likewise.
Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 180193)
+++ config/avr/avr-protos.h	(working copy)
@@ -107,8 +107,8 @@ extern int avr_simplify_comparison_p (en
 extern RTX_CODE avr_normalize_condition (RTX_CODE condition);
 extern void out_shift_with_cnt (const char *templ, rtx insn,
 rtx operands[], int *len, int t_len);
-extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
-extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
+extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, addr_space_t, RTX_CODE, RTX_CODE);
+extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, addr_space_t, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 extern rtx avr_legitimize_reload_address (rtx, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
 #endif /* RTX_CODE */
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 180193)
+++ config/avr/avr.c	(working copy)
@@ -1213,12 +1213,12 @@ avr_cannot_modify_jumps_p (void)
 /* Helper function for `avr_legitimate_address_p'.  */
 
 static inline bool
-avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED,
+avr_reg_ok_for_addr_p (rtx reg, addr_space_t as,
RTX_CODE outer_code, bool strict)
 {
   return (REG_P (reg)
-   

Re: PR bootstrap/50709 (bootstrap miscompare)

2011-10-20 Thread Jan Hubicka
 @@ -1392,16 +1393,20 @@ inline_small_functions (void)
if (!edge->inline_failed)
   continue;
  
 -  /* Be sure that caches are maintained consistent.  */
  #ifdef ENABLE_CHECKING
 +  /* Be sure that caches are maintained conservatively consistent.
 +  This means that cached badness is allways smaller or equal
 +  to the real badness.  */
 +  cached_badness = edge_badness (edge, false);
 +#endif
reset_edge_growth_cache (edge);
reset_node_growth_cache (edge->callee);
 -#endif
  
/* When updating the edge costs, we only decrease badness in the keys.
Increases of badness are handled lazilly; when we see key with out
of date value on it, we re-insert it now.  */
current_badness = edge_badness (edge, false);
 +  gcc_assert (cached_badness == -1 || cached_badness <= current_badness);

This new check actually catches a bug that has been in the tree since the
introduction of the new ipa-inline-analysis code.  The inliner assumes that when
it produces a new inline copy, the overall growth estimates for all callees can
only degrade.  This is not quite true: when new knowledge is propagated, the
callees might actually become cheaper and reduce the growth.

This patch takes the easy but expensive way to plug the problem by forcing
an update of all keys in the queue.  It increases the LTO compile time of
Mozilla to 10 minutes, so I will need to develop a better solution (the trick
saving recomputation was originally introduced to reduce compile time,
particularly on this testcase).  I just should not keep the tree ICEing on many
C++ sources until I am done.
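
The invariant behind the new assert can be illustrated with a small
stand-alone model.  This is only a sketch: `struct candidate`, the linear
scan standing in for the fibheap, and all function names are assumptions made
for this example and are not the ipa-inline.c data structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct candidate
{
  int cached_badness;  /* key currently stored in the queue */
  int real_badness;    /* what recomputing the badness would give now */
  bool queued;
};

/* The property the assert checks: a cached key may be too small (stale),
   but it must never be larger than the real badness.  */
bool
cache_is_conservative (const struct candidate *c, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (c[i].queued && c[i].cached_badness > c[i].real_badness)
      return false;
  return true;
}

/* Index of the queued entry with the smallest cached key, or n if none.  */
static size_t
min_key (const struct candidate *c, size_t n)
{
  size_t best = n;
  for (size_t i = 0; i < n; ++i)
    if (c[i].queued
        && (best == n || c[i].cached_badness < c[best].cached_badness))
      best = i;
  return best;
}

/* Lazy extraction: increases of badness are only noticed when the entry
   reaches the front; it is then re-keyed and the search restarts.  A
   decrease that was never written back is not noticed at all.  */
size_t
extract_best (struct candidate *c, size_t n)
{
  for (;;)
    {
      size_t i = min_key (c, n);
      if (i == n)
        return n;
      if (c[i].cached_badness < c[i].real_badness)
        {
          c[i].cached_badness = c[i].real_badness;
          continue;
        }
      c[i].queued = false;
      return i;
    }
}
```

With entries whose real badness has dropped below the cached key, the model
returns the wrong candidate first, which is the situation the forced update
of all keys avoids.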

Bootstrapped/regtested x86_64-linux, committed.

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 180247)
+++ ChangeLog   (working copy)
@@ -1,5 +1,10 @@
 2011-10-19  Jan Hubicka  j...@suse.cz
 
+   * ipa-inline.c (inline_small_functions): Always update all callees after
+   inlining.
+
+2011-10-19  Jan Hubicka  j...@suse.cz
+
PR bootstrap/50709
* ipa-inline.c (inline_small_functions): Fix checking code to not make
effect on fibheap stability.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 180247)
+++ ipa-inline.c(working copy)
@@ -1515,8 +1515,13 @@ inline_small_functions (void)
 
  /* We inlined last offline copy to the body.  This might lead
 to callees of function having fewer call sites and thus they
-may need updating.  */
- if (callee->global.inlined_to)
+may need updating. 
+
+FIXME: the callee size could also shrink because more information
+is propagated from caller.  We don't track when this happen and
+thus we need to recompute everything all the time.  Once this is
+solved, || 1 should go away.  */
+ if (callee->global.inlined_to || 1)
update_all_callee_keys (heap, callee, updated_nodes);
  else
update_callee_keys (heap, edge->callee, updated_nodes);


Re: [patch] C6X unwinding/exception handling

2011-10-20 Thread Bernd Schmidt
On 10/17/11 16:10, Nicola Pero wrote:
 I checked the attached patch, test results at
 http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg01377.html

 which are the same as with my suggested patch.

 Ok for the trunk?

 I probably don't have authority to approve this, but looks OK to me.
 
 The libobjc bits are Ok for trunk.

This is just making sure libjava/libobjc match libsupc++, correct? OK if
Andrew doesn't object in the next day or so.


Bernd



Re: PING: [PATCH, ARM, iWMMXt][5/5]: pipeline description

2011-10-20 Thread Ramana Radhakrishnan
On 20 October 2011 08:42, Xinyu Qi x...@marvell.com wrote:
 Ping

 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01106.html
 Index: gcc/config/arm/marvell-f-iwmmxt.md
 ===
 --- gcc/config/arm/marvell-f-iwmmxt.md(revision 0)
 +++ gcc/config/arm/marvell-f-iwmmxt.md(revision 0)
@@ -0,0 +1,179 @@
+
+;; instructions classes

s/instructions/Instruction.

Otherwise OK.

Ramana


Re: PING: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change

2011-10-20 Thread Ramana Radhakrishnan
On 20 October 2011 08:35, Xinyu Qi x...@marvell.com wrote:
 Ping

 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01100.html

        * config/arm/arm.c (arm_option_override): Enable use of iWMMXt with 
 VFP.
        Disable use of iwMMXt and Neon.
        (arm_expand_binop_builtin): Accept VOIDmode op.
        * config/arm/arm.md (*arm_movdi): Remove check for TARGET_IWMMXT.
        (*arm_movsi_insn): Likewise.
        (iwmmxt.md): Include earlier.


OK.


cheers
Ramana


[patch tree-optimization]: allow branch-cost optimization for truth-and/or on mode-expanded simple boolean-operands

2011-10-20 Thread Kai Tietz
Hello,

this patch re-enables the branch-cost optimization on simple boolean-typed
operands that are cast to a wider integral type.  This happens because casts
from boolean types are preserved, while the FE might expand a simple expression
to a wider mode.

I added two tests for the already working branch-cost optimization on the
IA architecture and two that explicitly check boolean-typed operands.
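
At the source level, the transformation in question amounts to the
equivalence sketched below.  The function names are made up for this example,
and whether a given compiler emits one test or two depends on the target's
branch cost, so treat this as an illustration of the idea rather than of
GCC's actual output.

```c
#include <stdbool.h>

/* Short-circuit form: b is only looked at when a is true.  */
int
both_short_circuit (bool a, bool b)
{
  if (a && b)
    return 1;
  return 0;
}

/* Operands widened to int first, then combined with a bitwise AND and
   tested once.  This is safe here because reading a or b has no side
   effects and a bool converts to exactly 0 or 1.  */
int
both_single_test (bool a, bool b)
{
  int wide_a = a;
  int wide_b = b;
  if (wide_a & wide_b)
    return 1;
  return 0;
}
```

Both functions return the same value for every combination of inputs; the
second form trades a conditional branch for an unconditional AND.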

ChangeLog

2011-10-20  Kai Tietz  kti...@redhat.com

* fold-const.c (simple_operand_p_2): Handle integral
casts from boolean-operands.

2011-10-20  Kai Tietz  kti...@redhat.com

* gcc.target/i386/branch-cost1.c: New test.
* gcc.target/i386/branch-cost2.c: New test.
* gcc.target/i386/branch-cost3.c: New test.
* gcc.target/i386/branch-cost4.c: New test.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu for all 
languages including Ada and Obj-C++.  OK to apply?

Regards,
Kai

Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-gimple -mbranch-cost=2 } */
+
+extern int doo (void);
+
+int
+foo (int a, int b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times if  1 gimple } } */
+/* { dg-final { scan-tree-dump-times1 gimple } } */
+/* { dg-final { cleanup-tree-dump gimple } } */
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-gimple -mbranch-cost=2 } */
+
+extern int doo (void);
+
+int
+foo (_Bool a, _Bool b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times if  1 gimple } } */
+/* { dg-final { scan-tree-dump-times1 gimple } } */
+/* { dg-final { cleanup-tree-dump gimple } } */
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-gimple -mbranch-cost=0 } */
+
+extern int doo (void);
+
+int
+foo (_Bool a, _Bool b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times if  2 gimple } } */
+/* { dg-final { scan-tree-dump-notgimple } } */
+/* { dg-final { cleanup-tree-dump gimple } } */
Index: gcc-head/gcc/fold-const.c
===
--- gcc-head.orig/gcc/fold-const.c
+++ gcc-head/gcc/fold-const.c
@@ -3706,6 +3706,19 @@ simple_operand_p_2 (tree exp)
   /* Strip any conversions that don't change the machine mode.  */
   STRIP_NOPS (exp);
 
+  /* Handle integral widening casts from boolean-typed
+ expressions as simple.  This happens due casts from
+ boolean-types are preserved, but FE might expands
+ simple-expression to wider mode.  */
+  if (INTEGRAL_TYPE_P (TREE_TYPE (exp))
+      && CONVERT_EXPR_P (exp)
+      && TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0)))
+         == BOOLEAN_TYPE)
+{
+  exp = TREE_OPERAND (exp, 0);
+  STRIP_NOPS (exp);
+}
+
   code = TREE_CODE (exp);
 
   if (TREE_SIDE_EFFECTS (exp)
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-gimple -mbranch-cost=0 } */
+
+extern int doo (void);
+
+int
+foo (int a, int b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times if  2 gimple } } */
+/* { dg-final { scan-tree-dump-notgimple } } */
+/* { dg-final { cleanup-tree-dump gimple } } */


Re: regcprop.c bug fix

2011-10-20 Thread Bernd Schmidt
On 10/19/11 23:24, Mike Stump wrote:
 So while tracking down a hairy address reload for an output reload
 bug, copyprop_hardreg_forward_1 was faulting because it was trying to
 extract move patterns that didn't work out, and when it came back to
 the code, it then tries to access recog_data, but the problem is, the
 exploration of other instructions to see if they match, overwrites
 that data, and there is nothing that restores the data to a point in
 which the code below this point expects.  It uses
 recog_data.operand[i], where i is limited by n_ops, but that value
 corresponded to the old data in recog_data.  The recog and
 extract_insn in insn_invalid_p called from verify_changes called from
 apply_change_group called from validate_change wipes the `old'
 recog_data with new data.  This data, for example, might only have 2
 operands, with an invalid value for the third operand.  The old
 n_ops, might well be 3 from the original data.  Accessing that data
 can cause a crash.

I found that maximally confusing, so let me try to rephrase it to see if
I understood you. The two calls to validate_change clobber the
recog_data even if they fail. In case they failed, we want to continue
looking at data from the original insn, so we must recompute it.

If that's what you were trying to say, it looks like the right
diagnosis. Better to move the recomputation into the if statement that
contains the validate_change calls and possibly add a comment about the
effect of that function; otherwise OK.
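
A minimal sketch of the failure mode discussed above, assuming a toy stand-in for the global recog_data (the names toy_recog, toy_extract_insn, toy_validate_change and toy_process_insn are hypothetical and are not GCC's API). It only illustrates why data extracted for the original insn has to be recomputed after a failed validation attempt clobbers the shared scratch area:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a global "last extracted insn" scratch area.  */
struct toy_recog_data
{
  int n_operands;
  int operand[3];
};

static struct toy_recog_data toy_recog;

/* Fill the global scratch data for an insn with N_OPS operands.
   Unused slots are set to -1 to mark them as invalid.  */
static void
toy_extract_insn (int n_ops)
{
  toy_recog.n_operands = n_ops;
  for (int i = 0; i < 3; i++)
    toy_recog.operand[i] = i < n_ops ? 100 + i : -1;
}

/* Models a validation routine that re-extracts a candidate insn and
   therefore overwrites the global data even when it rejects the change.  */
static bool
toy_validate_change (int candidate_n_ops)
{
  toy_extract_insn (candidate_n_ops);
  return false;   /* Pretend the replacement did not match.  */
}

/* Process an insn with N_OPS operands and return how many of its operands
   are still valid after a failed change attempt.  */
static int
toy_process_insn (int n_ops, bool recompute_on_failure)
{
  toy_extract_insn (n_ops);
  int saved_n_ops = toy_recog.n_operands;

  /* Keep the recomputation next to the call that clobbers the data.  */
  if (!toy_validate_change (2) && recompute_on_failure)
    toy_extract_insn (n_ops);

  int valid = 0;
  for (int i = 0; i < saved_n_ops; i++)
    if (toy_recog.operand[i] != -1)
      valid++;
  return valid;
}
```

With the recomputation disabled, the loop walks three operand slots although only two were filled by the failed attempt; with it enabled, all three slots again describe the original insn.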


Bernd


Re: Plug some bogus used uninitialized warnings

2011-10-20 Thread Jeff Law
On 10/20/11 04:35, Jan Hubicka wrote:
 Hi, last time I tried profiledbootstrap with LTO I had to plug the
 following warnings.  Will commit it as obvious later today.
 
 Honza
 
 * pt.c (unify_pack_expansion): Initialize bad_old_arg and
 bad_new_arg.
 * parser.c
 (cp_parser_ctor_initializer_opt_and_function_body): Initialize
 list.
 
 * sched-deps.c (sched_get_condition_with_rev_uncached): Initialize
 tmp.

Could we somehow mark cases where we create an initialization to avoid
a bogus warning?  Just some kind of comment marker would be fine.

Jeff


[PATCH, PR50763] Fix for ICE in verify_gimple

2011-10-20 Thread Tom de Vries
Richard,

I have a fix for PR50763.

The second example from the PR looks like this:
...
int bar (int i);

void
foo (int c, int d)
{
  if (bar (c))
    bar (c);
  d = 33;
  while (c == d);
}
...

When compiled with -O2 -fno-tree-dominator-opts, the gimple representation
before the tail-merge pass (-ftree-tail-merge) looks like this:
...
foo (intD.6 cD.1606, intD.6 dD.1607)
{
  intD.6 D.2730;

  # BLOCK 2 freq:900
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # .MEMD.2733_6 = VDEF <.MEMD.2733_5(D)>
  # USE = nonlocal
  # CLB = nonlocal
  D.2730_2 = barD.1605 (cD.1606_1(D));
  if (D.2730_2 != 0)
    goto <bb 3>;
  else
    goto <bb 7>;
  # SUCC: 3 [29.0%]  (true,exec) 7 [71.0%]  (false,exec)

  # BLOCK 7 freq:639
  # PRED: 2 [71.0%]  (false,exec)
  goto <bb 4>;
  # SUCC: 4 [100.0%]  (fallthru)

  # BLOCK 3 freq:261
  # PRED: 2 [29.0%]  (true,exec)
  # .MEMD.2733_7 = VDEF <.MEMD.2733_6>
  # USE = nonlocal
  # CLB = nonlocal
  barD.1605 (cD.1606_1(D));
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:900
  # PRED: 7 [100.0%]  (fallthru) 3 [100.0%]  (fallthru,exec)
  # .MEMD.2733_4 = PHI <.MEMD.2733_6(7), .MEMD.2733_7(3)>
  if (cD.1606_1(D) == 33)
    goto <bb 8>;
  else
    goto <bb 9>;
  # SUCC: 8 [91.0%]  (true,exec) 9 [9.0%]  (false,exec)

  # BLOCK 9 freq:81
  # PRED: 4 [9.0%]  (false,exec)
  goto <bb 6>;
  # SUCC: 6 [100.0%]  (fallthru)

  # BLOCK 8 freq:819
  # PRED: 4 [91.0%]  (true,exec)
  # SUCC: 5 [100.0%]  (fallthru)

  # BLOCK 5 freq:9100
  # PRED: 8 [100.0%]  (fallthru) 10 [100.0%]  (fallthru)
  if (cD.1606_1(D) == 33)
    goto <bb 10>;
  else
    goto <bb 11>;
  # SUCC: 10 [91.0%]  (true,exec) 11 [9.0%]  (false,exec)

  # BLOCK 10 freq:8281
  # PRED: 5 [91.0%]  (true,exec)
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru)

  # BLOCK 11 freq:819
  # PRED: 5 [9.0%]  (false,exec)
  # SUCC: 6 [100.0%]  (fallthru)

  # BLOCK 6 freq:900
  # PRED: 11 [100.0%]  (fallthru) 9 [100.0%]  (fallthru)
  # VUSE <.MEMD.2733_4>
  return;
  # SUCC: EXIT [100.0%]

}
...

During the first iteration, tail_merge_optimize finds that blocks 9 and 11,
and blocks 8 and 10, are equal, and removes blocks 11 and 10.
During the second iteration it finds that block 4 and block 5 are equal, and it
removes block 5.

Since PRE had no effect, the responsibility for updating the vops lies with
tail_merge_optimize.

Block 4 starts with a virtual PHI which needs updating, but replace_block_by
decides that an update is not necessary, because vop_at_entry returns NULL_TREE
for block 5 (the vop_at_entry for block 4 is .MEMD.2733_4).
What is different from normal is that block 4 dominates block 5.

The patch makes sure that the vops are also updated if vop_at_entry is defined
for only one of bb1 and bb2.

This also forced me to rewrite the code that updates the uses, which now uses
dominator info. That in turn required keeping the dominator info up to date,
which meant moving the actual deletion of the basic block, and some additional
bookkeeping related to that, from purge_bbs to replace_block_by.

Additionally, I fixed the case that update_vuses leaves virtual phis with only
one argument (see unlink_virtual_phi).

bootstrapped and reg-tested on x86_64. The tested patch had one addition to the
attached patch: calling verify_dominators at the end of replace_block_by.

OK for trunk?

Thanks,
- Tom

2011-10-20  Tom de Vries  t...@codesourcery.com

PR tree-optimization/50763
* tree-ssa-tail-merge.c (same_succ_flush_bb): New function, factored out
of ...
(same_succ_flush_bbs): Use same_succ_flush_bb.
(purge_bbs): Remove argument.  Remove calls to same_succ_flush_bbs,
release_last_vdef and delete_basic_block.
(unlink_virtual_phi): New function.
(update_vuses): Add and use vuse1_phi_args argument.  Set var to
SSA_NAME_VAR of vuse1 or vuse2, and use var.  Handle case that def_stmt2
is NULL.  Use phi result as phi arg in case vuse1 or vuse2 is NULL_TREE.
Replace uses of vuse1 if vuse2 is NULL_TREE.  Fix code to limit
replacement of uses.  Propagate phi argument for phis with a single
argument.
(replace_block_by): Update vops if phi_vuse1 or phi_vuse2 is NULL_TREE.
Set vuse1_phi_args if vuse1 is a phi defined in bb1.  Add vuse1_phi_args
as argument to call to update_vuses.  Call release_last_vdef,
same_succ_flush_bb, delete_basic_block.  Update CDI_DOMINATORS info.
(tail_merge_optimize): Remove argument in call to purge_bbs.  Remove
call to free_dominance_info.  Only call calculate_dominance_info once.

* gcc.dg/pr50763.c: New test.
Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c (revision 180237)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -753,6 +753,19 @@ delete_basic_block_same_succ (basic_bloc
 bitmap_set_bit (deleted_bb_preds, e->src->index);
 }
 
+/* Removes BB from its corresponding same_succ.  */
+
+static void
+same_succ_flush_bb (basic_block bb)
+{
+ 

Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO

2011-10-20 Thread Andi Kleen
On Thu, Oct 20, 2011 at 10:45:31AM +0200, Richard Guenther wrote:
  My previous attempt at using shell scripts for this
  http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
  was not approved. Here's another attempt using wrappers written
  in C. It's only a single wrapper which just adds a --plugin
  argument before calling the respective binutils utilities.
 
 Thanks for doing this.  How do they end up being used?  I suppose
 Makefiles will need to call gcc-ar then instead of ar?  In which case
 I wonder if ...

Basically you use

make AR=gcc-ar RANLIB=gcc-ranlib NM=gcc-nm

For most makefiles, just specifying AR is enough.
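
The wrapper idea quoted above (add a --plugin argument, then call the respective binutils tool) can be sketched roughly as follows. This is only an illustration under stated assumptions: build_plugin_argv is a hypothetical helper, the tool name and plugin path are supplied by the caller, and none of the driver's file-finding logic is modelled.

```c
#include <stdlib.h>

/* Build a new NULL-terminated argument vector of the form
     TOOL --plugin PLUGIN_PATH ARGV[1] ... ARGV[ARGC-1]
   The returned array borrows all strings; the caller frees only the array.
   Returns NULL on allocation failure.  */
static char **
build_plugin_argv (const char *tool, const char *plugin_path,
                   int argc, char **argv)
{
  if (argc < 1)
    argc = 1;   /* Treat a missing argv[0] as "no extra arguments".  */

  /* argv[0] is replaced by TOOL; two extra slots hold --plugin and its
     value; one more slot holds the terminating NULL.  */
  char **nargv = malloc ((argc + 3) * sizeof *nargv);
  if (nargv == NULL)
    return NULL;

  int n = 0;
  nargv[n++] = (char *) tool;
  nargv[n++] = (char *) "--plugin";
  nargv[n++] = (char *) plugin_path;
  for (int i = 1; i < argc; i++)
    nargv[n++] = argv[i];
  nargv[n] = NULL;
  return nargv;
}
```

A real wrapper would then hand the result to something like execvp (nargv[0], nargv); locating the plugin and the underlying tool is the part the thread describes as complicated, and it is deliberately left out here.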

 
  The logic gcc.c uses to find the files is very complicated. I didn't
  try to replicate it 100% and left out some magic. I would be interested
  if this simple method works for everyone or if more code needs
  to be added. This only needs to support LTO supporting hosts of course.
 
 ;)
 
 ... using something like gcc --ar would be more convenient (as you

That's essentially what the old proposal did (gcc -print-plugin-name) 
plus a wrapper. You can see the old discussion here
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html


 can then trivially share the find-the-files logic)?  Did you consider
 factoring out the find-the-file logic to a shared file that you can re-use?

I did this first (with collect2), but it was quite messy. Still
have it as a branch. Then I settled on this simpler method which
works for me at least.

collect2 does not fully match what gcc.c does I think, so there's
already some divergence.

-Andi



Re: [patch#2] dwarf2out: Drop the size + performance overhead of DW_AT_sibling

2011-10-20 Thread Jan Kratochvil
On Tue, 18 Oct 2011 10:38:23 +0200, Jakub Jelinek wrote:
 On Tue, Oct 18, 2011 at 10:28:09AM +0200, Jan Kratochvil wrote:
  2011-10-12  Jan Kratochvil  jan.kratoch...@redhat.com
  
  Stop producing DW_AT_sibling without -gstrict-dwarf.
  * dwarf2out.c (dwarf2out_finish): Remove calls of
  add_sibling_attributes if !DWARF_STRICT.  Extend the comment with
  reason.
 
 This is ok for trunk.

FYI this patch has not yet been checked in, it has negative performance effect
on the systemtap DWARF consumer.
http://sourceware.org/ml/archer/2011-q4/msg4.html

I will post a patch removing only very short DW_AT_sibling skips later.


Thanks,
Jan


Re: [v3] tr2: bool_set, dynamic_bitset, ratio

2011-10-20 Thread Joseph S. Myers
On Wed, 19 Oct 2011, Ed Smith-Rowland wrote:

 I don't know if there is a paper yet.  I also did rational using the gmp
 library.  I'm wondering if rational should be a template class that could take

Having things in libstdc++ etc. using GMP runs into the same issues as 
libquadmath of not wanting to link in non-libc LGPL code unless required - 
in particular for -static-libstdc++, where you should be able to 
distribute a binary built with -static-libstdc++ without either it having 
a dependency on GMP (unless the relevant features are used) or including 
GMP code with the consequent complications to distribution requirements.  
(Quite apart from GMP changing its SONAME - not under our control, unlike 
libquadmath.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO

2011-10-20 Thread Joseph S. Myers
On Thu, 20 Oct 2011, Andi Kleen wrote:

 collect2 does not fully match what gcc.c does I think, so there's
 already some divergence.

collect2 is always called from within the gcc driver, so it can rely on 
environment variables set by the driver.  As I understand it, these 
wrappers are not called from within the driver - they are called in the 
same environment as the driver itself is called in.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math

2011-10-20 Thread Uros Bizjak
On Thu, Oct 20, 2011 at 4:45 PM, Joseph S. Myers
jos...@codesourcery.com wrote:

 The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
 to check if I didn't mess something with options handling.

 I have no comments on the option handling in this patch.

 +for vectorized single float division and vectorized sqrtf(x) already with

 @code{sqrtf (@var{x})}

Thanks - fixed, with a similar fix in the previous paragraph.

I also found a PR that deals with vectorized reciprocal, so I referred
to the PR in the ChangeLog entry:

2011-10-20  Uros Bizjak  ubiz...@gmail.com

PR target/47989
* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
* config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT.
* doc/invoke.texi (ix86 Options, -mrecip): Document that GCC
implements vectorized single float division and vectorized sqrtf(x)
with reciprocal sequence with additional Newton-Raphson step with
-ffast-math.
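
For readers unfamiliar with the "reciprocal sequence with additional Newton-Raphson step" mentioned in the ChangeLog, here is a scalar sketch of the arithmetic. It is not the code GCC emits: the initial estimates are passed in by the caller (on real hardware they would come from instructions such as RCPPS/RSQRTPS), and the function names are made up for illustration.

```c
/* One Newton-Raphson step for 1/a, starting from the estimate X0:
     x1 = x0 * (2 - a * x0)  */
static float
refine_recip (float a, float x0)
{
  return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from the estimate Y0:
     y1 = 0.5 * y0 * (3 - a * y0 * y0)  */
static float
refine_rsqrt (float a, float y0)
{
  return 0.5f * y0 * (3.0f - a * y0 * y0);
}

/* a / b computed as a * (refined estimate of 1/b).  */
static float
approx_div (float a, float b, float recip_estimate)
{
  return a * refine_recip (b, recip_estimate);
}

/* sqrt(a) computed as a * (refined estimate of 1/sqrt(a)).  */
static float
approx_sqrt (float a, float rsqrt_estimate)
{
  return a * refine_rsqrt (a, rsqrt_estimate);
}
```

Each step roughly doubles the number of correct bits in the estimate, which is why a single step after a low-precision hardware estimate gets close to, but not exactly at, full single precision.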

Attached is the patch that was committed to mainline SVN. Encouraged
by Michael's results, let's see what automated benchmark testers will
show.

Uros.
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 180255)
+++ config/i386/i386.h  (working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT    0x08
 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
                         | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV   ((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT  ((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===
--- config/i386/i386.opt  (revision 180255)
+++ config/i386/i386.opt  (working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 180255)
+++ doc/invoke.texi (working copy)
@@ -12922,7 +12922,12 @@
 of the non-reciprocal instruction, the precision of the sequence can be
 decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
 
-Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS)
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of RSQRTSS
+(or RSQRTPS) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
+
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized @code{sqrtf(@var{x})}
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
 


Re: [cxx-mem-model] compare_exchange implementation II

2011-10-20 Thread Richard Henderson
On 10/19/2011 05:43 PM, Andrew MacLeod wrote:
   * optabs.h (direct_optab_index): Replace DOI_atomic_compare_exchange
   with DOI_atomic_compare_and_swap.
   (direct_op): Add DOI_atomic_compare_and_swap.
   * genopinit.c: Set atomic_compare_and_swap_optab.
   * expr.h (expand_atomic_compare_exchange): Add parameter.
   * builtins.c (builtin_atomic_compare_exchange): Add weak parameter
   and verify it is a compile time constant.
   * optabs.c (expand_atomic_compare_exchange): Use atomic_compare_and_swap
   if present, otherwise use __sync_val_compare_and_swap.
   * builtin-types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT):
   Add the bool parameter.
   * sync-builtins.def (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_*): Use new
   prototype.
 
   * c-family/c-common.c (resolve_overloaded_builtin): Don't try to
   process a return value with an error mark.
 
   * libstdc++-v3/include/bits/atomic_2.h: Use __atomic_compare_exchange.
 
   * fortran/types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT):
   Add the bool parameter.
 
   * testsuite/gcc.dg/atomic-invalid.c: Add compare_exchange failures.
   * testsuite/gcc.dg/atomic-compare-exchange-{1-5}.c: New tests.

Ok.


r~


Re: [PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c

2011-10-20 Thread Uros Bizjak
On Wed, Oct 19, 2011 at 9:50 PM, Uros Bizjak ubiz...@gmail.com wrote:

 These two tests require non_strict_aligned effective target, since IPA
 fails in tree_non_mode_aligned_mem_p () for cow and calf
 candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32
 bytes, while data is aligned to 8 bytes.

 2011-10-19  Uros Bizjak  ubiz...@gmail.com

        * gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target
        non_strict_align.
        * gcc.dg/ipa/ipa-sra-6.c: Ditto.

 Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the
 patch fixes:

 FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace
 expr cow_.*D.-red with \\*ISRA
 FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace
 expr cow_.*D.-green with ISRA
 FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace
 expr calf_.*D.-red with \\*ISRA
 FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace
 expr calf_.*D.-green with ISRA
 FAIL: gcc.dg/ipa/ipa-sra-6.c scan-tree-dump-times eipa_sra foo  1

So, committed to SVN mainline and 4.6 branch under the obvious rule.

Uros.


Re: trunk (rev 180248) not buildable --with-gc=zone: undefined ggc_alloced_size_for_request

2011-10-20 Thread Dodji Seketeli
Basile Starynkevitch bas...@starynkevitch.net a écrit:

 libbackend.a(ggc-zone.o): In function `ggc_internal_alloc_zone_stat':
 /usr/src/Lang/gcc-trunk-bstarynk/gcc/ggc-zone.c:1105: undefined reference to 
 `ggc_alloced_size_for_request'

This is my fault.  I have tested and committed the below as per the
obvious rule.

Sorry for the inconvenience.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index eeed56d..83da507 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -10,6 +10,10 @@
 
 2011-10-20  Dodji Seketeli  do...@redhat.com
 
+   * ggc-zone.c (ggc_internal_alloc_zone_stat): Rename
+   ggc_alloced_size_order_for_request into ggc_round_alloc_size like
+   it was done in ggc-page.c.
+
PR other/50659
* doc/cppopts.texi: Use @smallexample/@end smallexample in
documentation for -fdebug-cpp instead of @quotation/@end quotation
diff --git a/gcc/ggc-zone.c b/gcc/ggc-zone.c
index 79c8c03..5257ada 100644
--- a/gcc/ggc-zone.c
+++ b/gcc/ggc-zone.c
@@ -1102,7 +1102,7 @@ ggc_internal_alloc_zone_stat (size_t orig_size, struct 
alloc_zone *zone
   struct small_page_entry *entry;
   struct alloc_chunk *chunk, **pp;
   void *result;
-  size_t size = ggc_alloced_size_for_request (orig_size);
+  size_t size = ggc_round_alloc_size (orig_size);
 
   /* Try to allocate the object from several different sources.  Each
  of these cases is responsible for setting RESULT and SIZE to

-- 
Dodji


[patch, testsuite] Fix vect-120.c failure on IA64

2011-10-20 Thread Steve Ellcey

I am going to check this change in as obvious later today.  The test
includes a conversion from float to int in the loop, and if a target does
not support that, the loop is not vectorized.  This test has been failing
on IA64 and perhaps on ARM too; there was a reference to it in PR 50150.
I didn't test the change on ARM, but it fixes the failure on IA64.
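
As an illustration only (this is not the contents of vect-120.c), a loop of the following shape contains the kind of float-to-int conversion that a target has to support in vector form before the vectorizer can handle it; float_to_int is a made-up name.

```c
/* Convert N floats to ints, truncating toward zero.  Vectorizing this
   loop needs a vector float-to-int conversion on the target.  */
static void
float_to_int (const float *in, int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (int) in[i];
}
```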

Steve Ellcey
s...@cup.hp.com


2011-10-20  Steve Ellcey  s...@cup.hp.com

* gcc.dg/vect/vect-120.c: Add vect_floatint_cvt requirement.


Index: gcc.dg/vect/vect-120.c
===
--- gcc.dg/vect/vect-120.c  (revision 180233)
+++ gcc.dg/vect/vect-120.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
 /* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_floatint_cvt } */
 
 static inline float
 i2f(int x)


Re: [PATCH] New port for TILEPro and TILE-Gx 2/7: changes in contrib

2011-10-20 Thread Walter Lee

Here is a resubmission of the contrib patch, adding the entries to
gcc_update to handle the multiply tables.

* config-list.mk (tilegx-linux-gnu): Add.
(tilepro-linux-gnu): Add.
* gcc_update (gcc/config/tilegx/mul-tables.c): New dependencies.
(gcc/config/tilepro/mul-tables.c): New dependencies.
diff -r -u -p -N /home/packages/gcc-4.7.0-180241/contrib/config-list.mk 
./contrib/config-list.mk
--- /home/packages/gcc-4.7.0-180241/contrib/config-list.mk  2011-10-14 
01:08:51.0 -0400
+++ ./contrib/config-list.mk2011-10-20 10:23:51.331484000 -0400
@@ -59,7 +59,8 @@ LIST = alpha-linux-gnu alpha-freebsd6 al
   sparc-leon3-linux-gnuOPT-enable-target=all sparc-netbsdelf \
   
sparc64-sun-solaris2.10OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
-  sparc64-netbsd sparc64-openbsd spu-elf v850e-elf v850-elf vax-linux-gnu \
+  sparc64-netbsd sparc64-openbsd spu-elf tilegx-linux-gnu tilepro-linux-gnu \
+  v850e-elf v850-elf vax-linux-gnu \
   vax-netbsdelf vax-openbsd x86_64-apple-darwin \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \
diff -r -u -p -N /home/packages/gcc-4.7.0-180241/contrib/gcc_update 
./contrib/gcc_update
--- /home/packages/gcc-4.7.0-180241/contrib/gcc_update  2011-10-14 
01:08:51.0 -0400
+++ ./contrib/gcc_update2011-10-20 10:23:51.337478000 -0400
@@ -88,6 +88,8 @@ gcc/config/c6x/c6x-mult.md: gcc/config/c
 gcc/config/m68k/m68k-tables.opt: gcc/config/m68k/m68k-devices.def 
gcc/config/m68k/m68k-isas.def gcc/config/m68k/m68k-microarchs.def 
gcc/config/m68k/genopt.sh
 gcc/config/mips/mips-tables.opt: gcc/config/mips/mips-cpus.def 
gcc/config/mips/genopt.sh
 gcc/config/rs6000/rs6000-tables.opt: gcc/config/rs6000/rs6000-cpus.def 
gcc/config/rs6000/genopt.sh
+gcc/config/tilegx/mul-tables.c: gcc/config/tilepro/gen-mul-tables.cc
+gcc/config/tilepro/mul-tables.c: gcc/config/tilepro/gen-mul-tables.cc
 # And then, language-specific files
 gcc/cp/cfns.h: gcc/cp/cfns.gperf
 gcc/java/keyword.h: gcc/java/keyword.gperf


Re: [PATCH] New port for TILEPro and TILE-Gx: 5/7 libgcc port

2011-10-20 Thread Walter Lee
Here is a resubmission of the libgcc patch, using soft-fp as the 
floating point library.  I plan to do the benchmarking between the 
implementations as suggested, but I'd like to decouple that from the 
initial submission.


* config.host: Handle tilegx and tilepro.
* config/tilegx/sfp-machine.h: New file.
* config/tilegx/sfp-machine32.h: New file.
* config/tilegx/sfp-machine64.h: New file.
* config/tilegx/t-softfp: New file.
* config/tilegx/t-tilegx: New file.
* config/tilepro/atomic.c: New file.
* config/tilepro/sfp-machine.h: New file.
* config/tilepro/softdivide.c: New file.
* config/tilepro/softmpy.S: New file.
* config/tilepro/t-tilepro: New file.



libgcc.diff.gz
Description: GNU Zip compressed data


[patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX

2011-10-20 Thread Steve Ellcey

I am going to check this change in as obvious later today if there are
no objections.  The test gives warnings on HP-UX because it calls
__builtin_return_address with arguments of 0 through 5, but 0 is the
only valid argument to __builtin_return_address on HP-UX.
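
For reference, a minimal sketch of the portable case: level 0 asks for the return address of the current function. The helper name caller_address is hypothetical, and the noinline attribute is only there so the function keeps its own frame.

```c
#include <stddef.h>

/* Return the address this function will return to.  Level 0 is the only
   level used here; higher levels walk further up the call chain and are
   not supported on every target.  */
__attribute__ ((noinline)) static void *
caller_address (void)
{
  return __builtin_return_address (0);
}
```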

Tested on IA64 and PA HP-UX.

Steve Ellcey
s...@cup.hp.com



2011-10-20  Steve Ellcey  s...@cup.hp.com

PR testsuite/50722
* gcc.dg/pr49994-3.c: Skip on HP-UX.


Index: gcc.dg/pr49994-3.c
===
--- gcc.dg/pr49994-3.c  (revision 180233)
+++ gcc.dg/pr49994-3.c  (working copy)
@@ -2,6 +2,7 @@
 /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
 /* { dg-options "-O2 -fsched2-use-superblocks -g -mbackchain" { target s390*-*-* } } */
 /* { dg-require-effective-target scheduling } */
+/* { dg-skip-if "" { *-*-hpux* } { "*" } { "" } } */
 
 void *
 foo (int offset)


Re: [patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX

2011-10-20 Thread Rainer Orth
Steve Ellcey s...@cup.hp.com writes:

 Index: gcc.dg/pr49994-3.c
 ===
 --- gcc.dg/pr49994-3.c(revision 180233)
 +++ gcc.dg/pr49994-3.c(working copy)
 @@ -2,6 +2,7 @@
  /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
  /* { dg-options "-O2 -fsched2-use-superblocks -g -mbackchain" { target s390*-*-* } } */
  /* { dg-require-effective-target scheduling } */
 +/* { dg-skip-if "" { *-*-hpux* } { "*" } { "" } } */

Would you please include either an explanation or a PR reference in the
dg-skip-if?  Having to search the archives for an explanation is tedious.

Btw., you should be able to omit both the "*" and "".

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][PING] Vectorize conversions directly

2011-10-20 Thread Richard Henderson
On 10/20/2011 09:24 AM, Dmitry Plotnikov wrote:
 gcc/
 * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions.
 * tree-vect-stmts.c (supportable_convert_operation): New function.
   (vectorizable_conversion): Call it.  Change condition and behavior
   for NONE modifier case.
 * tree-vectorizer.h (supportable_convert_operation): New prototype.
 * tree.h (VECTOR_INTEGER_TYPE_P): New macro.
 
 gcc/config/arm/
 * neon.md (floatv2siv2sf2): New.
   (floatunsv2siv2sf2): New.
   (fix_truncv2sfv2si2): New.
   (fix_truncunsv2sfv2si2): New.
   (floatv4siv4sf2): New.
   (floatunsv4siv4sf2): New.
   (fix_truncv4sfv4si2): New.
   (fix_truncunsv4sfv4si2): New.

 gcc/testsuite/
 * gcc.target/arm/vect-vcvt.c: New test.
 * gcc.target/arm/vect-vcvtq.c: New test.
 
 gcc/testsuite/lib/
 * target-supports.exp (check_effective_target_vect_intfloat_cvt): True
   for ARM NEON.
   (check_effective_target_vect_uintfloat_cvt): Likewise.
   (check_effective_target_vect_intfloat_cvt): Likewise.
   (check_effective_target_vect_floatuint_cvt): Likewise.
   (check_effective_target_vect_floatint_cvt): Likewise.
   (check_effective_target_vect_extract_even_odd): Likewise.

Please move supportable_convert_operation to optabs.c; eventually
we ought to use can_fix_p/can_float_p.

 +  if (code == FIX_TRUNC_EXPR)
 +optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : 
 sfixtrunc_optab;
 +  else if (code == FLOAT_EXPR)
 +optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab;
 +  
 +  m1 = TYPE_MODE (vectype_in);

Looks like a missing 

else
  gcc_unreachable()

there, since there's no check for optab1 != NULL later.
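
A self-contained sketch of the shape being asked for, using toy enums instead of GCC's tree codes and optabs (all names here are hypothetical, and abort () stands in for gcc_unreachable ()):

```c
#include <stdlib.h>

enum toy_code { TOY_FIX_TRUNC, TOY_FLOAT, TOY_OTHER };
enum toy_optab { TOY_SFIXTRUNC, TOY_UFIXTRUNC, TOY_SFLOAT, TOY_UFLOAT };

/* Pick the conversion "optab" for CODE.  Any code other than the two
   handled ones is a caller bug, so fail loudly instead of continuing
   with an uninitialized value.  */
static enum toy_optab
pick_optab (enum toy_code code, int out_unsigned, int in_unsigned)
{
  enum toy_optab optab;

  if (code == TOY_FIX_TRUNC)
    optab = out_unsigned ? TOY_UFIXTRUNC : TOY_SFIXTRUNC;
  else if (code == TOY_FLOAT)
    optab = in_unsigned ? TOY_UFLOAT : TOY_SFLOAT;
  else
    abort ();   /* Stands in for gcc_unreachable ().  */

  return optab;
}
```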

Otherwise the generic parts of the patch look good.
Please get separate approval for the arm portions of the patch.

After the generic parts of the patch go in I will endeavour to adjust the i386
and rs6000 backends to similarly populate the optabs, so that we can remove the
builtin path here.


r~


Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote:
 x...@google.com (Rong Xu) writes:

 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
    messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

 This doesn't look scalable if you consider that each pass would print
 as much of a mess like -fvectorizer-verbose=5.

What is not scalable? For level 1 dump, only the summary of
vectorization will be printed just like other loop transformations.


 I think =2 and =3 should be omitted - we do have dump-files for a reason.

Dump files are not easy to use -- they are big, and slow, especially for
people with large distributed build systems.  Having both level 2 and
3 is debatable, but it will be useful to have at least one level above
level 1. Dump files are mainly for compiler developers, while
-fopt-info is for compiler developers *and* power users who know
performance tuning.
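
To make the level scheme in the proposal concrete, here is a toy sketch of level-gated reporting. Everything in it is hypothetical (the variable toy_opt_info_level, the function toy_opt_info); it does not reflect how the patch under discussion is implemented.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: the N from -fopt-info=N; 0 means no messages.  */
static int toy_opt_info_level = 1;

/* Print the message to stderr if LEVEL is enabled.  Level 1 is for
   important messages, higher levels are progressively more detailed.
   Returns 1 if the message was emitted, 0 otherwise.  */
static int
toy_opt_info (int level, const char *fmt, ...)
{
  if (level < 1 || level > toy_opt_info_level)
    return 0;

  va_list ap;
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fputc ('\n', stderr);
  return 1;
}
```

With the level set to 1, a call such as toy_opt_info (1, "loop vectorized") prints, while a level 2 or 3 message is dropped without building any dump file.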

 Also the coverage/profile cases you changed do not at all match
 ... with large performance impact.  In fact the impact is completely
 unknown (as it would be the case usually).

Impact of any transformations is just 'potential', coverage problems
are no different from that.


 I'd rather have a way to make dump-files more structured (so, following
 some standard reporting scheme) than introducing yet another way
 of output.  [after making dump-files more consistent it will be easy
 to revisit patches like this, there would be a natural general central
 way to implement it]

Yes, I remember we have discussed this before -- currently dump
files are a big mess -- debug tracing and IR are all mixed up, but as I
said above, this is a different matter -- it is for compiler
developers.

For a more structured optimization report, we should use an option
-fopt-report which dumps optimization information based on category --
the info database can also be shared across modules:

Example:

[Loop Interchange]
File a, line x,   yyy
File b, line xx, yyy

File c, line z,   It is beneficial to interchange the loop, but not
done because of possible carried dependency (caused by false aliasing
...)

[Loop Vectorization]


[Loop Unroll]
...

[SRA]

[Alias summary]
  [Global Vars]
   a: addr exposed
   b: addr not exposed
   ..
  [Global Pointers]
..
  ...


Thanks,

David


 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only




Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Rong Xu
Richard, Thanks for the comments.

Let me give some background on the patch. The initial intention of the
patch was to suppress the verbose warnings and notes emitted in
profile-use compilation. These warnings/notes are caused by inconsistent
profiles due to data races (which are currently common in multi-threaded
programs) and by stale profiles (for example after adding a new
function). While valid, they can easily pollute the build log. In
addition, some of these warnings cannot be disabled by any option except
-Wno-error. We have many FDO users, and these verbose messages are one of
the issues they complain about most.

The first patch added another option to control the FDO-related
messages. Later we thought it better to do this in a more general way,
and that is how this patch came about.

I believe -fopt-info is very useful for tracking regressions in
certain important optimizations (inlining, loop optimizations, etc.).
The IR dump is just too big and adds too much overhead to compilation
(look how many files it creates).

-Rong



On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote:
 x...@google.com (Rong Xu) writes:

 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
    messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

 This doesn't look scalable if you consider that each pass would print
 as much of a mess like -fvectorizer-verbose=5.

 I think =2 and =3 should be omitted - we do have dump-files for a reason.

 Also the coverage/profile cases you changed do not at all match
 ... with large performance impact.  In fact the impact is completely
 unknown (as it would be the case usually).

 I'd rather have a way to make dump-files more structured (so, following
 some standard reporting scheme) than introducing yet another way
 of output.  [after making dump-files more consistent it will be easy
 to revisit patches like this, there would be a natural general central
 way to implement it]

 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only




Breakage with Update testsuite to run with slim LTO

2011-10-20 Thread Hans-Peter Nilsson
 Date: Tue, 27 Sep 2011 19:23:22 +0200
 From: Jan Hubicka hubi...@ucw.cz

 this patch updates testsuite to cover both fat and slim LTO when linker plugin
 is used and also both linker plugin and collect2 paths.  I didn't wanted to
 slow down testing too much so I just distributes the flags across existing 
 runs
 with aim to maximize the coverage of testing matrix that is bit large now.
 I believe it is sufficient and testsuite now runs a bit faster than previously
 since slim LTO saves some effort.
 
 sync and pr34850 tests don't pass with slim LTO. The reason is that they
 expect diagnostics that are output too late in compilation (usually at
 expansion time).  These should probably be fixed as a QOI issue but they are
 not a real bug - the diagnostics will be output at linktime.  I will open a PR
 tracking this.  We probably should output pretty much everything till end of
 early opts except for stuff that really looks for optimization results.
 Especially now when we handle always inline in early inlining.
 
 Honza
 
   * lib/lto.exp: When linker plugin is available test both
   plugin/non-plugin LTO paths as well as fat and slim LTO.
   * lib/c-torture.exp: Likewise.
   * lib/gcc-dg.exp: Likewise.

Looks like this patch broke, for cris-elf with TOT binutils:

Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof

which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
don't produce any code.  Is that expected? 

If so, and if the required update is as for the test-cases you
updated, to add:
+ /* { dg-options "-ffat-lto-objects" } */

then IIUC you need to patch *all* torture tests that use
scan-assembler and scan-assembler-not.  Alternatively, patch
somewhere else, like not passing it if certain directives are
used, like scan-assembler{,-not}.  And either way, is it safe to
add that option always, not just when also passing -flto or
something?

brgds, H-P


Re: Breakage with Update testsuite to run with slim LTO

2011-10-20 Thread Andi Kleen



Looks like this patch broke, for cris-elf with TOT binutils:

Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof
FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof

which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
don't produce any code.  Is that expected?


Yes. -fno-fat-lto-objects does only produce code at the final link.

-Andi


Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
While discussion of the trunk version is still ongoing, this is OK for the Google branches.

thanks,

David

On Wed, Oct 19, 2011 at 4:28 PM, Rong Xu x...@google.com wrote:
 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
   messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

 2011-10-19   Rong Xu  x...@google.com

        * gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto.
        * gcc/opts.c (common_handle_option): Handle OPT_fopt_info_.
        * gcc/flag-types.h (opt_info_verbosity_levels): New enum.
        * gcc/value-prof.c (check_ic_counter): guard warnings/notes by
          flag_opt_info.
          (find_func_by_funcdef_no): Ditto.
          (check_ic_target): Ditto.
          (check_counter): Ditto.
          (check_ic_counter): Ditto.
        * gcc/mcf.c (find_minimum_cost_flow): Ditto.
        * gcc/profile.c (read_profile_edge_counts): Ditto.
          (compute_branch_probabilities): Ditto.
        * gcc/coverage.c (read_counts_file): Ditto.
          (get_coverage_counts): Ditto.
        * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto.
          (maybe_issue_profile_use_note): Ditto.
          (optimize_reusedist): Ditto.
        * gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info.
        * gcc/testsuite/gcc.dg/pr40209.c: Ditto.
        * gcc/testsuite/gcc.dg/pr26570.c: Ditto.
        * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.


 Index: gcc/value-prof.c
 ===
 --- gcc/value-prof.c    (revision 180106)
 +++ gcc/value-prof.c    (working copy)
 @@ -472,9 +472,10 @@
               : DECL_SOURCE_LOCATION (current_function_decl);
       if (flag_profile_correction)
         {
 -         inform (locus, "correcting inconsistent value profile: "
 -                 "%s profiler overall count (%d) does not match BB count "
 -                  "(%d)", name, (int)*all, (int)bb_count);
 +          if (flag_opt_info >= OPT_INFO_MAX)
 +            inform (locus, "correcting inconsistent value profile: %s "
 +                   "profiler overall count (%d) does not match BB count "
 +                    "(%d)", name, (int)*all, (int)bb_count);
           *all = bb_count;
           if (*count > *all)
              *count = *all;
 @@ -510,33 +511,42 @@
   location_t locus;
   if (*count1 > all && flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, "Correcting inconsistent value profile: "
 -              "ic (topn) profiler top target count (%ld) exceeds "
 -             "BB count (%ld)", (long)*count1, (long)all);
 +      if (flag_opt_info >= OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, "Correcting inconsistent value profile: "
 +                  "ic (topn) profiler top target count (%ld) exceeds "
 +                  "BB count (%ld)", (long)*count1, (long)all);
 +        }
       *count1 = all;
     }
   if (*count2 > all && flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, "Correcting inconsistent value profile: "
 -              "ic (topn) profiler second target count (%ld) exceeds "
 -             "BB count (%ld)", (long)*count2, (long)all);
 +      if (flag_opt_info >= OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, "Correcting inconsistent value profile: "
 +                  "ic (topn) profiler second target count (%ld) exceeds "
 +                 "BB count (%ld)", (long)*count2, (long)all);
 +        }
       *count2 = all;
     }

   if (*count2 > *count1)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, "Corrupted topn ic value profile: "
 -             "first target count (%ld) is less than the second "
 -             "target count (%ld)", (long)*count1, (long)*count2);
 +      if (flag_opt_info >= OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, "Corrupted topn ic value profile: "
 +                 "first target count (%ld) is less than the second "
 +    

Remove target.vectorize.builtin_vec_perm

2011-10-20 Thread Richard Henderson
Since the vectorizer has been changed to emit VEC_PERM_EXPR,
I've now removed the hook and the implementations of that hook.

For the x86 target I also removed the builtins themselves.
For the rs6000 and spu targets, I've left that detail to the
port maintainers; I don't know what interfaces are actually
public.

Tested on x86_64-linux.  Committed.


r~
gcc/
+   * target.def (builtin_vec_perm): Remove.
+   * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove.
+
+   * config/i386/i386.c (ix86_expand_vec_perm_builtin): Remove.
+   (IX86_BUILTIN_VEC_PERM_*): Remove.
+   (bdesc_args): Remove vec_perm builtins
+   (ix86_expand_builtin): Likewise.
+   (ix86_expand_vec_perm_const_1): Rename from
+   ix86_expand_vec_perm_builtin_1.
+   (extract_vec_perm_cst): Merge into...
+   (ix86_vectorize_vec_perm_const_ok): ... here.  Rename from
+   ix86_vectorize_builtin_vec_perm_ok.
+   (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove.
+
+   * config/rs6000/rs6000.c (rs6000_builtin_vec_perm): Remove.
+   (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove.
+
+   * config/spu/spu.c (spu_builtin_vec_perm): Remove.
+   (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove.

gcc/testsuite/
+   * gcc.target/i386/vperm-v2df.c, gcc.target/i386/vperm-v2di.c,
+   gcc.target/i386/vperm-v4sf-1.c, gcc.target/i386/vperm-v4sf-2.c, 
+   gcc.target/i386/vperm-v4si-1.c, gcc.target/i386/vperm-v4si-2.c:
+   Use __builtin_shuffle.
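
For readers unfamiliar with the replacement interface, a minimal sketch of
__builtin_shuffle follows.  It assumes a GCC new enough to provide the builtin
(it is a GCC extension); the typedef and function names are made up for
illustration and are not taken from the vperm-*.c tests.

```c
/* Four 32-bit integers in one 16-byte vector (GCC vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* One-operand form: element I of the result is V[SEL[I]].
   The selector { 3, 2, 1, 0 } therefore reverses the lanes.  */
v4si
reverse_v4si (v4si v)
{
  v4si sel = { 3, 2, 1, 0 };
  return __builtin_shuffle (v, sel);
}

/* Two-operand form: selector values 0..3 pick from A and 4..7 pick
   from B, so { 0, 4, 1, 5 } interleaves the low halves of A and B.  */
v4si
interleave_low (v4si a, v4si b)
{
  v4si sel = { 0, 4, 1, 5 };
  return __builtin_shuffle (a, b, sel);
}
```

Because the selectors here are compile-time constants, this is the kind of
permutation that the constant-permutation path in the target is asked about.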




diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4af4e59..7750356 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2509,7 +2509,6 @@ static void ix86_compute_frame_layout (struct ix86_frame 
*);
 static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode,
 rtx, rtx, int);
 static void ix86_add_new_builtins (HOST_WIDE_INT);
-static rtx ix86_expand_vec_perm_builtin (tree);
 static tree ix86_canonical_va_list_type (tree);
 static void predict_jump (int);
 static unsigned int split_stack_prologue_scratch_regno (void);
@@ -25058,19 +25057,6 @@ enum ix86_builtins
 
   IX86_BUILTIN_CVTUDQ2PS,
 
-  IX86_BUILTIN_VEC_PERM_V2DF,
-  IX86_BUILTIN_VEC_PERM_V4SF,
-  IX86_BUILTIN_VEC_PERM_V2DI,
-  IX86_BUILTIN_VEC_PERM_V4SI,
-  IX86_BUILTIN_VEC_PERM_V8HI,
-  IX86_BUILTIN_VEC_PERM_V16QI,
-  IX86_BUILTIN_VEC_PERM_V2DI_U,
-  IX86_BUILTIN_VEC_PERM_V4SI_U,
-  IX86_BUILTIN_VEC_PERM_V8HI_U,
-  IX86_BUILTIN_VEC_PERM_V16QI_U,
-  IX86_BUILTIN_VEC_PERM_V4DF,
-  IX86_BUILTIN_VEC_PERM_V8SF,
-
   /* FMA4 instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
@@ -25779,19 +25765,6 @@ static const struct builtin_description bdesc_args[] =
   /* SSE2 */
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, "__builtin_ia32_shufpd", 
IX86_BUILTIN_SHUFPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT },
 
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2df", 
IX86_BUILTIN_VEC_PERM_V2DF, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI },
-  { OPTION_MASK_ISA_SSE, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4sf", 
IX86_BUILTIN_VEC_PERM_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2di", 
IX86_BUILTIN_VEC_PERM_V2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4si", 
IX86_BUILTIN_VEC_PERM_V4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v8hi", 
IX86_BUILTIN_VEC_PERM_V8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v16qi", 
IX86_BUILTIN_VEC_PERM_V16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v2di_u", 
IX86_BUILTIN_VEC_PERM_V2DI_U, UNKNOWN, (int) V2UDI_FTYPE_V2UDI_V2UDI_V2UDI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4si_u", 
IX86_BUILTIN_VEC_PERM_V4SI_U, UNKNOWN, (int) V4USI_FTYPE_V4USI_V4USI_V4USI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v8hi_u", 
IX86_BUILTIN_VEC_PERM_V8HI_U, UNKNOWN, (int) V8UHI_FTYPE_V8UHI_V8UHI_V8UHI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v16qi_u", 
IX86_BUILTIN_VEC_PERM_V16QI_U, UNKNOWN, (int) V16UQI_FTYPE_V16UQI_V16UQI_V16UQI 
},
-  { OPTION_MASK_ISA_AVX, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v4df", 
IX86_BUILTIN_VEC_PERM_V4DF, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DI },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_nothing, "__builtin_ia32_vec_perm_v8sf", 
IX86_BUILTIN_VEC_PERM_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SI },
-
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movmskpd, "__builtin_ia32_movmskpd", 
IX86_BUILTIN_MOVMSKPD, UNKNOWN, (int) INT_FTYPE_V2DF  },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmovmskb, 
"__builtin_ia32_pmovmskb128", IX86_BUILTIN_PMOVMSKB128, UNKNOWN, (int) 
INT_FTYPE_V16QI },
   { 

C++ PATCH for c++/41449 (EH cleanup of partially-aggregate-initialized objects)

2011-10-20 Thread Jason Merrill
The C++ standard says that if an exception is thrown during 
initialization of a class, any fully-constructed subobjects are 
destroyed.  We already handled that properly for objects initialized via 
constructor, but we weren't handling it properly for aggregate 
initialization.  This patch adds the necessary EH cleanups during 
initialization; conveniently, just using finish_eh_cleanup works here 
because we were already doing push/pop_stmt_list around the 
initialization as a whole.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 7d32708956095f3ddb6698fee9fa092f649d72d4
Author: Jason Merrill ja...@redhat.com
Date:   Thu Oct 20 15:00:49 2011 -0400

	PR c++/41449
	* typeck2.c (split_nonconstant_init_1): Handle EH cleanup of
	initialized subobjects.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 3accab6..580f669 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -567,6 +567,13 @@ split_nonconstant_init_1 (tree dest, tree init)
 	  code = build2 (INIT_EXPR, inner_type, sub, value);
 	  code = build_stmt (input_location, EXPR_STMT, code);
 	  add_stmt (code);
+	  if (!TYPE_HAS_TRIVIAL_DESTRUCTOR (inner_type))
+		{
+		  code = (build_special_member_call
+			  (sub, complete_dtor_identifier, NULL, inner_type,
+			   LOOKUP_NORMAL, tf_warning_or_error));
+		  finish_eh_cleanup (code);
+		}
 
 	  num_split_elts++;
 	}
diff --git a/gcc/testsuite/g++.dg/eh/partial1.C b/gcc/testsuite/g++.dg/eh/partial1.C
new file mode 100644
index 000..db73177
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/partial1.C
@@ -0,0 +1,37 @@
+// PR c++/41449
+// { dg-do run }
+
+struct A
+{
+  A() {}
+  A(const A&) { throw 1; }
+};
+
+int bs;
+struct B
+{
+  B() { ++bs; }
+  B(const B&) { ++bs; }
+  ~B() { --bs; }
+};
+
+struct C
+{
+  B b1;
+  A a;
+  B b2;
+};
+
+int main()
+{
+  {
+    B b1, b2;
+    A a;
+
+    try {
+      C c = { b1, a, b2 };
+    } catch (...) {}
+  }
+  if (bs != 0)
+    __builtin_abort ();
+}


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-20 Thread H.J. Lu
On Thu, Oct 20, 2011 at 1:05 AM, Sergey Ostanevich sergos@gmail.com wrote:
 Please provide a patch which can be applied.  Cut/paste doesn't create
 a working patch.  Please attach it.

 --
 H.J.


 Will that work?
 Sergos.

 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 6c73404..e21cf86 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,8 @@
 +2011-10-20  Sergey Ostanevich  sergos@gmail.com
 +
 +       * config/i386/i386.c (processor_target_table): Change Atom
 +       align_loops_max_skip to 15.
 +
  2011-10-17  Michael Spertus  mike_sper...@symantec.com

        * gcc/c-family/c-common.c (c_common_reswords): Add __bases,
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 2c53423..8c60086 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -2596,7 +2596,7 @@ static const struct ptt
 processor_target_table[PROCESSOR_max] =
   {bdver1_cost, 32, 24, 32, 7, 32},
   {bdver2_cost, 32, 24, 32, 7, 32},
   {btver1_cost, 32, 24, 32, 7, 32},
 -  {atom_cost, 16, 7, 16, 7, 16}
 +  {atom_cost, 16, 15, 16, 7, 16}
  };

  static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =


No, it doesn't work.  I had to apply it by hand for you.

1. You have to attach patches that other people are expected to apply
when you send them from Gmail.
2. Please don't include a diff of ChangeLog unless you are the only person
who changes it.  Such a patch rarely applies.
3. You should add PR target/50572 to the ChangeLog entry.


-- 
H.J.


Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Andi Kleen
 These warnings/notes are caused
 by inconsistent profile due to data race (which is currently common in
 multi-thread programs),

I never quite understood why the gcov counters are not simply marked
__thread. This would make the profiled programs faster too because
they wouldn't bounce cache lines that much. Especially on larger
systems (2S) frequent cache line bouncing can lead to extreme slowdowns,
and even on smaller systems it's very expensive.
This would also eliminate data races, except for signals and somesuch.

-Andi
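
To make the suggestion concrete, here is a minimal sketch of per-thread
counters using __thread.  It is not how gcov instruments code; the counter,
the iteration count, and the function names are all hypothetical, and it
assumes a POSIX threads implementation is available at link time.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* One instance of this counter exists per thread, so increments never
   contend with (or race against) another thread's increments.  */
static __thread long tls_counter;

/* Thread body: bump the thread-local counter, then publish the final
   value through ARG so the caller can merge the per-thread totals.  */
static void *
count_in_thread (void *arg)
{
  long *result = arg;
  for (int i = 0; i < ITERATIONS; i++)
    tls_counter++;
  *result = tls_counter;
  return NULL;
}

/* Run two counting threads and return the merged total, or -1 if a
   thread could not be created or joined.  */
long
merged_count_from_two_threads (void)
{
  pthread_t threads[2];
  long results[2] = { 0, 0 };
  int created = 0;
  int failed = 0;

  for (; created < 2; created++)
    if (pthread_create (&threads[created], NULL, count_in_thread,
                        &results[created]) != 0)
      {
        failed = 1;
        break;
      }

  /* Always join what was started before RESULTS goes out of scope.  */
  for (int i = 0; i < created; i++)
    if (pthread_join (threads[i], NULL) != 0)
      failed = 1;

  return failed ? -1 : results[0] + results[1];
}
```

The sketch also shows the cost of the approach: every thread carries its own
copy of the counter, and something has to merge the copies when a thread
finishes.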



Re: [PATCH] New port for TILEPro and TILE-Gx 3/7: gcc port

2011-10-20 Thread Joseph S. Myers
On Thu, 20 Oct 2011, Walter Lee wrote:

   +#undef MCOUNT_NAME
   +#define MCOUNT_NAME "mcount"
  
  For a new target it seems much better to define your ABI to use a name in
  the reserved namespace for this - that is, starting with two underscores.
 
 I've changed it to use _mcount with one underscore.  That seems to be what
 glibc supports by default, and it's consistent with x86, and we'd prefer to be
 consistent with x86 whenever possible.

x86 also has a newer version __fentry__ with -mfentry.  ARM has mcount and 
__gnu_mcount_nc.  I don't think consistency with the old x86 _mcount is 
particularly desirable.

   +/* For __clear_cache in libgcc2.c.  */
   +#ifdef IN_LIBGCC2
   +
   +#include <arch/icache.h>
  
  Where does this header come from?  Linux kernel, glibc, somewhere else?
  In general you want to condition header includes on inhibit_libc to
  facilitate bootstrapping (including building a partial static libgcc)
  before the libc headers are installed, since configuring glibc to install
  its headers requires a working compiler to run configure tests.
 
 We plan to include this as part of the Linux kernel, as the kernel itself
 depends on it.

So make headers_install for your architectures will install this header 
under that name?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
On Thu, Oct 20, 2011 at 12:53 PM, Andi Kleen a...@firstfloor.org wrote:
 These warnings/notes are caused
 by inconsistent profile due to data race (which is currently common in
 multi-thread programs),

 I never quite understood why the gcov counters are not simply marked
 __thread. This would make the profiled programs faster too because
 they wouldn't bounce cache lines that much. Especially on larger
 systems (2S) frequent cache line bouncing can lead to extreme slowdowns,
 and even on smaller systems it's very expensive.
 This would also eliminate data races, except for signals and somesuch.

It uses stack space and for programs with hundreds and thousands of
threads, it can be a big problem.

David


 -Andi




Use .exe suffix on LTO test executables

2011-10-20 Thread Joseph S. Myers
As I noted in
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00905.html, test
executables with no .exe or .something suffix are problematic for
testing on Windows targets.  This patch fixes the LTO tests to use
such suffixes, like other tests.

Tested with cross to i686-mingw32.  OK to commit?

2011-10-20  Joseph Myers  jos...@codesourcery.com

* lib/lto.exp (lto-execute): Use .exe suffix for test executable
names.

Index: gcc/testsuite/lib/lto.exp
===
--- gcc/testsuite/lib/lto.exp   (revision 180200)
+++ gcc/testsuite/lib/lto.exp   (working copy)
@@ -500,7 +500,7 @@
verbose Testing $testcase, $option
 
# There's a unique name for each executable we generate.
-   set execname ${execbase}-${count}1
+   set execname ${execbase}-${count}1.exe
incr count
 
file_on_host delete $execname

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Use .exe suffix on LTO test executables

2011-10-20 Thread Diego Novillo
On Thu, Oct 20, 2011 at 16:23, Joseph S. Myers jos...@codesourcery.com wrote:

 2011-10-20  Joseph Myers  jos...@codesourcery.com

        * lib/lto.exp (lto-execute): Use .exe suffix for test executable
        names.

OK.


Diego.


Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-20 Thread H.J. Lu
On Thu, Oct 20, 2011 at 1:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 OK.

 Thanks,
 Uros.

 Great,
 could anybody please commit that?


I checked it in for you.

-- 
H.J.


[cxx-mem-model] expand_atomic_load: Handle an empty target

2011-10-20 Thread Aldy Hernandez
Found this while testing the branch on ia64.  The call to 
expand_val_compare_and_swap() above returns NULL_RTX when it can't find 
a suitable instruction.


OK for branch?
* optabs.c (expand_atomic_load): Handle a NULL target.

Index: optabs.c
===
--- optabs.c(revision 180273)
+++ optabs.c(working copy)
@@ -7140,7 +7140,7 @@ expand_atomic_load (rtx target, rtx mem,
   return target;
 }
 
-  if (target == const0_rtx)
+  if (!target || target == const0_rtx)
 target = gen_reg_rtx (mode);
 
   /* Emit the appropriate barrier before the load.  */


Re: [cxx-mem-model] expand_atomic_load: Handle an empty target

2011-10-20 Thread Andrew MacLeod

On 10/20/2011 04:54 PM, Aldy Hernandez wrote:
Found this while testing the branch on ia64.  The call to 
expand_val_compare_and_swap() above returns NULL_RTX when it can't 
find a suitable instruction.


OK for branch?


yes.   btw, did you audit the other new expand routines to see if they 
handled a NULL return target as well?


Andrew


Fix gcc.dg/lto/pr46940_0.c for assembler name prefixes

2011-10-20 Thread Joseph S. Myers
gcc.dg/lto/pr46940_0.c needs fixing for targets using a prefix on
assembler names, similar to the fixes recently made by Joern to some
other testcases.  This patch fixes it in the same way as Joern fixed
gcc.dg/lto/20081222_1.c.
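
The prefixing idiom used by the patch can be sketched in isolation as
follows.  The three macros are the ones the patch adds; the two functions
around them are hypothetical helpers that exist only to show what string the
macros produce.

```c
#include <string.h>

/* Prepend the target's user label prefix (empty on most ELF targets,
   "_" on some others) to a C-level name given as a string literal.  */
#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
#define ASMNAME2(prefix, cname) STRING (prefix) cname
#define STRING(x) #x

/* The assembler-level spelling of the symbol "foo" on this target.  */
const char *
asm_name_of_foo (void)
{
  return ASMNAME ("foo");
}

/* Nonzero if NAME ends with SUFFIX.  */
int
ends_with (const char *name, const char *suffix)
{
  size_t n = strlen (name);
  size_t s = strlen (suffix);
  return n >= s && strcmp (name + (n - s), suffix) == 0;
}
```

The extra ASMNAME2 level is what lets __USER_LABEL_PREFIX__ expand before
STRING stringifies it; the result is two adjacent string literals that the
compiler concatenates.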

Tested with cross to i686-mingw32.  OK to commit?

2011-10-20  Joseph Myers  jos...@codesourcery.com

* gcc.dg/lto/pr46940_0.c (ASMNAME, ASMNAME2, STRING): Define.
(_moz_foo, EXT__foo): Use ASMNAME.

Index: gcc/testsuite/gcc.dg/lto/pr46940_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr46940_0.c(revision 180200)
+++ gcc/testsuite/gcc.dg/lto/pr46940_0.c(working copy)
@@ -2,10 +2,14 @@
 /* { dg-extra-ld-options "-fuse-linker-plugin" } */
 #include <stdio.h>
 
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) STRING (prefix) cname
+#define STRING(x) #x
+
 extern __attribute__((visibility("hidden"))) void _moz_foo (void);
-extern __typeof (_moz_foo) _moz_foo __asm__ ("" "INT__foo") 
__attribute__((__visibility__("hidden"))) ;
+extern __typeof (_moz_foo) _moz_foo __asm__ (ASMNAME ("INT__foo")) 
__attribute__((__visibility__("hidden"))) ;
 void _moz_foo(void)
 {
   printf ("blah\n");
 }
-extern __typeof (_moz_foo) EXT__foo __asm__("" "_moz_foo") 
__attribute__((__alias__("" "INT__foo")));
+extern __typeof (_moz_foo) EXT__foo __asm__(ASMNAME ("_moz_foo")) 
__attribute__((__alias__("" "INT__foo")));

-- 
Joseph S. Myers
jos...@codesourcery.com


[cxx-mem-model] Handle x86-64 with -m32

2011-10-20 Thread Aldy Hernandez
These operations don't exist on 32-bit x86, and when running multilibbed 
tests, the target triplet is still x86_64-unknown-linux-gnu even though 
code is generated for 32 bits when using -m32.


The following change checks that we are actually running in 64-bits 
before assuming sync_int_128 or sync_long_long exist on the target.
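
The same distinction is visible from C.  The sketch below is only an analogy
for the lp64 check, not what the testsuite itself runs; it assumes the
compiler predefines __LP64__ on LP64 targets (GCC does), and the function
names are hypothetical.

```c
/* Nonzero when the compiler reports an LP64 data model through its
   predefined macro.  With -m32 on an x86_64 toolchain the macro is
   absent even though the configured target triplet is unchanged.  */
int
built_for_lp64 (void)
{
#ifdef __LP64__
  return 1;
#else
  return 0;
#endif
}

/* The same property measured directly from the type sizes.  */
int
long_and_pointer_are_64_bit (void)
{
  return sizeof (long) == 8 && sizeof (void *) == 8;
}
```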


OK for branch?
* lib/target-supports.exp (check_effective_target_sync_int_128):
Only set when running in 64-bit mode.
(check_effective_target_sync_long_long): Same.

Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 180156)
+++ lib/target-supports.exp (working copy)
@@ -3456,7 +3456,7 @@ proc check_effective_target_sync_int_128
 verbose "check_effective_target_sync_int_128: using cached result" 2
 } else {
 set et_sync_int_128_saved 0
-if { [istarget x86_64-*-*] } {
+if { [istarget x86_64-*-*] && [is-effective-target lp64] } {
set et_sync_int_128_saved 1
 }
 }
@@ -3474,7 +3474,7 @@ proc check_effective_target_sync_long_lo
 verbose "check_effective_target_sync_long_long: using cached result" 2
 } else {
 set et_sync_long_long_saved 0
-if { [istarget x86_64-*-*] } {
+if { [istarget x86_64-*-*] && [is-effective-target lp64] } {
set et_sync_long_long_saved 1
 }
 }


Rename builtin_vec_perm_ok to vec_perm_const_ok

2011-10-20 Thread Richard Henderson
... since it no longer applies to a builtin.

Tested on x86_64-linux.


r~
* target.def (vec_perm_const_ok): Rename from builtin_vec_perm_ok.
* optabs.c (can_vec_perm_expr_p): Update to match.
(expand_vec_perm_expr): Likewise.
* config/i386/i386.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Rename
from TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK.
* doc/tm.texi.in: Likewise.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7750356..b7718e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -36446,7 +36446,7 @@ ix86_expand_vec_perm_const (rtx operands[4])
   return false;
 }
 
-/* Implement targetm.vectorize.builtin_vec_perm_ok.  */
+/* Implement targetm.vectorize.vec_perm_const_ok.  */
 
 static bool
 ix86_vectorize_vec_perm_const_ok (tree vec_type, tree mask)
@@ -37879,8 +37879,8 @@ ix86_autovectorize_vector_sizes (void)
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   ix86_builtin_vectorization_cost
-#undef TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK
-#define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \
+#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
+#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
   ix86_vectorize_vec_perm_const_ok
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c52753a..a43ce3d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5711,7 +5711,7 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the 
given type.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK (tree 
@var{vec_type}, tree @var{mask})
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (tree 
@var{vec_type}, tree @var{mask})
 Return true if a vector created for @code{vec_perm_const} is valid.
 @end deftypefn
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 22e82ee..cede91e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -5649,7 +5649,7 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the 
given type.
 @end deftypefn
 
-@hook TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK
+@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
 Return true if a vector created for @code{vec_perm_const} is valid.
 @end deftypefn
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 558c0fa..5036856 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6714,7 +6714,7 @@ can_vec_perm_expr_p (tree type, tree sel)
   if (sel == NULL || TREE_CODE (sel) == VECTOR_CST)
 {
   if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
-  && (sel == NULL || targetm.vectorize.builtin_vec_perm_ok (type, sel)))
+  && (sel == NULL || targetm.vectorize.vec_perm_const_ok (type, sel)))
return true;
 }
 
@@ -6808,7 +6808,7 @@ expand_vec_perm_expr (tree type, tree v0, tree v1, tree 
sel, rtx target)
 {
   icode = direct_optab_handler (vec_perm_const_optab, mode);
   if (icode != CODE_FOR_nothing
-  && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), sel)
+  && targetm.vectorize.vec_perm_const_ok (TREE_TYPE (v0), sel)
   && (tmp = expand_vec_perm_expr_1 (icode, target, v0_rtx,
v1_rtx, sel_rtx)) != NULL)
return tmp;
diff --git a/gcc/target.def b/gcc/target.def
index c9d6067..60fad2a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -985,9 +985,9 @@ DEFHOOK
  bool, (const_tree type, bool is_packed),
  default_builtin_vector_alignment_reachable)
 
-/* Return true if a vector created for builtin_vec_perm is valid.  */
+/* Return true if a vector created for vec_perm_const is valid.  */
 DEFHOOK
-(builtin_vec_perm_ok,
+(vec_perm_const_ok,
  "",
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)


Re: [cxx-mem-model] expand_atomic_load: Handle an empty target

2011-10-20 Thread Aldy Hernandez

On 10/20/11 15:56, Andrew MacLeod wrote:

On 10/20/2011 04:54 PM, Aldy Hernandez wrote:

Found this while testing the branch on ia64. The call to
expand_val_compare_and_swap() above returns NULL_RTX when it can't
find a suitable instruction.

OK for branch?


yes. btw, did you audit the other new expand routines to see if they
handled a NULL return target as well?

Andrew


They seem ok, but I am re-running tests on ia64 to see if I find other 
similar failures.  If I do, I will submit fixes.


Aldy


Re: Fix gcc.dg/lto/pr46940_0.c for assembler name prefixes

2011-10-20 Thread Diego Novillo
On Thu, Oct 20, 2011 at 17:01, Joseph S. Myers jos...@codesourcery.com wrote:
 gcc.dg/lto/pr46940_0.c needs fixing for targets using a prefix on
 assembler names, similar to the fixes recently made by Joern to some
 other testcases.  This patch fixes it in the same way as Joern fixed
 gcc.dg/lto/20081222_1.c.

 Tested with cross to i686-mingw32.  OK to commit?

 2011-10-20  Joseph Myers  jos...@codesourcery.com

        * gcc.dg/lto/pr46940_0.c (ASMNAME, ASMNAME2, STRING): Define.
        (_moz_foo, EXT__foo): Use ASMNAME.

OK.


Diego.


Re: [patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX

2011-10-20 Thread Steve Ellcey
On Thu, 2011-10-20 at 18:23 +0200, Rainer Orth wrote:
 Steve Ellcey s...@cup.hp.com writes:
 
  Index: gcc.dg/pr49994-3.c
  ===
  --- gcc.dg/pr49994-3.c  (revision 180233)
  +++ gcc.dg/pr49994-3.c  (working copy)
  @@ -2,6 +2,7 @@
   /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
   /* { dg-options "-O2 -fsched2-use-superblocks -g -mbackchain" { target 
  s390*-*-* } } */
   /* { dg-require-effective-target scheduling } */
  +/* { dg-skip-if "" { *-*-hpux* } { "*" } { "" } } */
 
 Would you please include either an explanation or a PR reference in the
 dg-skip-if?  Having to search the archives for an explanation is tedious.
 
 Btw., you should be able to omit both the "*" and "".
 
 Thanks.
 Rainer

I put PR testsuite/50722 in the comment section and removed the "*"
and "" after verifying that it works and then checked in the change.

Steve Ellcey
s...@cup.hp.com



[RFC PATCH] SLP vectorize calls

2011-10-20 Thread Jakub Jelinek
Hi!

While looking at *.vect dumps from Polyhedron, I've noticed the lack
of SLP vectorization of builtin calls.

This patch is an attempt to handle at least 1 and 2 operand builtin calls
(SLP doesn't handle ternary stmts either yet), where all the types are the
same.  E.g. it can handle
extern float copysignf (float, float);
extern float sqrtf (float);
float a[8], b[8], c[8], d[8];

void
foo (void)
{
  a[0] = copysignf (b[0], c[0]) + 1.0f + sqrtf (d[0]);
  a[1] = copysignf (b[1], c[1]) + 2.0f + sqrtf (d[1]);
  a[2] = copysignf (b[2], c[2]) + 3.0f + sqrtf (d[2]);
  a[3] = copysignf (b[3], c[3]) + 4.0f + sqrtf (d[3]);
  a[4] = copysignf (b[4], c[4]) + 5.0f + sqrtf (d[4]);
  a[5] = copysignf (b[5], c[5]) + 6.0f + sqrtf (d[5]);
  a[6] = copysignf (b[6], c[6]) + 7.0f + sqrtf (d[6]);
  a[7] = copysignf (b[7], c[7]) + 8.0f + sqrtf (d[7]);
}
and compile it into:
vmovaps .LC0(%rip), %ymm0
vandnps b(%rip), %ymm0, %ymm1
vandps  c(%rip), %ymm0, %ymm0
vorps   %ymm0, %ymm1, %ymm0
vsqrtps d(%rip), %ymm1
vaddps  %ymm1, %ymm0, %ymm0
vaddps  .LC1(%rip), %ymm0, %ymm0
vmovaps %ymm0, a(%rip)
I've bootstrapped/regtested it on x86_64-linux and i686-linux, but
am not 100% sure about all the changes, e.g. that
|| PURE_SLP_STMT (stmt_info) part.

2011-10-20  Jakub Jelinek  ja...@redhat.com

* tree-vect-stmts.c (vectorizable_call): Add SLP_NODE argument.
Handle vectorization of SLP calls.
(vect_analyze_stmt): Adjust caller, add call to it for SLP too.
(vect_transform_stmt): Adjust vectorizable_call caller, remove
assertion.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Handle one
and two argument calls too.
(vect_build_slp_tree): Allow CALL_EXPR.
(vect_get_slp_defs): Handle calls.

--- gcc/tree-vect-stmts.c.jj2011-10-20 14:13:34.0 +0200
+++ gcc/tree-vect-stmts.c   2011-10-20 18:02:43.0 +0200
@@ -1483,7 +1483,8 @@ vectorizable_function (gimple call, tree
Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 static bool
-vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt)
+vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt,
+  slp_tree slp_node)
 {
   tree vec_dest;
   tree scalar_dest;
@@ -1494,6 +1495,7 @@ vectorizable_call (gimple stmt, gimple_s
   int nunits_in;
   int nunits_out;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree fndecl, new_temp, def, rhs_type;
   gimple def_stmt;
   enum vect_def_type dt[3]
@@ -1505,19 +1507,12 @@ vectorizable_call (gimple stmt, gimple_s
   size_t i, nargs;
   tree lhs;
 
-  /* FORNOW: unsupported in basic block SLP.  */
-  gcc_assert (loop_vinfo);
-
-  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
 return false;
 
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
 return false;
 
-  /* FORNOW: SLP not supported.  */
-  if (STMT_SLP_TYPE (stmt_info))
-return false;
-
   /* Is STMT a vectorizable call?   */
   if (!is_gimple_call (stmt))
 return false;
@@ -1558,7 +1553,7 @@ vectorizable_call (gimple stmt, gimple_s
   if (!rhs_type)
rhs_type = TREE_TYPE (op);
 
-  if (!vect_is_simple_use_1 (op, loop_vinfo, NULL,
+  if (!vect_is_simple_use_1 (op, loop_vinfo, bb_vinfo,
 &def_stmt, &def, &dt[i], &opvectype))
{
  if (vect_print_dump_info (REPORT_DETAILS))
@@ -1620,7 +1615,13 @@ vectorizable_call (gimple stmt, gimple_s
 
   gcc_assert (!gimple_vuse (stmt));
 
-  if (modifier == NARROW)
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+{
+  if (modifier != NONE)
+   return false;
+  ncopies = 1;
+}
+  else if (modifier == NARROW)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
@@ -1659,6 +1660,43 @@ vectorizable_call (gimple stmt, gimple_s
  else
VEC_truncate (tree, vargs, 0);
 
+ if (slp_node)
+   {
+ VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
+
+ gcc_assert (j == 0);
+ if (nargs == 1)
+   vect_get_vec_defs (gimple_call_arg (stmt, 0), NULL_TREE, stmt,
+  &vec_oprnds0, &vec_oprnds1, slp_node);
+ else if (nargs == 2)
+   vect_get_vec_defs (gimple_call_arg (stmt, 0),
+  gimple_call_arg (stmt, 1), stmt,
+  &vec_oprnds0, &vec_oprnds1, slp_node);
+ else
+   gcc_unreachable ();
+
+ /* Arguments are ready.  Create the new vector stmt.  */
+ FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vec_oprnd0)
+   {
+ vec_oprnd1 = nargs == 2 ? VEC_index (tree, vec_oprnds1, i)
+   

Re: Breakage with Update testsuite to run with slim LTO

2011-10-20 Thread Jan Hubicka
  Date: Tue, 27 Sep 2011 19:23:22 +0200
  From: Jan Hubicka hubi...@ucw.cz
 
  this patch updates the testsuite to cover both fat and slim LTO when the
  linker plugin is used, and also both the linker plugin and collect2 paths.
  I didn't want to slow down testing too much, so I just distributed the
  flags across existing runs with the aim of maximizing coverage of the
  testing matrix, which is a bit large now.  I believe it is sufficient, and
  the testsuite now runs a bit faster than previously since slim LTO saves
  some effort.
  
  The sync and pr34850 tests don't pass with slim LTO.  The reason is that
  they expect diagnostics that are output too late in compilation (usually
  at expansion time).  These should probably be fixed as a QOI issue, but
  they are not a real bug: the diagnostics will be output at link time.  I
  will open a PR tracking this.  We probably should output pretty much
  everything by the end of early opts, except for stuff that really looks
  at optimization results, especially now that we handle always inline in
  early inlining.
  
  Honza
  
  * lib/lto.exp: When linker plugin is available test both
  plugin/non-plugin LTO paths as well as fat and slim LTO.
  * lib/c-torture.exp: Likewise.
  * lib/gcc-dg.exp: Likewise.
 
 Looks like this patch broke, for cris-elf with TOT binutils:
 
 Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp 
 ...
 FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof
 FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof
 FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof
 FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof
 
 which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
 don't produce any code.  Is that expected? 
 
 If so, and if the required update is as for the test-cases you
 updated, to add:
 + /* { dg-options -ffat-lto-objects } */

Yes, if we scan assembler, we likely want -ffat-lto-objects.
 
 then IIUC you need to patch *all* torture tests that use
 scan-assembler and scan-assembler-not.  Alternatively, patch
 somewhere else, like not passing it if certain directives are
 used, like scan-assembler{,-not}.  And either way, is it safe to
 add that option always, not just when also passing -flto or
 something?

Hmm, some of the assembler scans still work because they check for the
presence of symbols we output anyway, but indeed, it would make more
sense to automatically imply -ffat-lto-objects when scan-assembler
is used.  I am not sure my dejagnu skills are on par here, however.

Honza
 
 brgds, H-P


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-20 Thread Joseph S. Myers
On Thu, 20 Oct 2011, Aldy Hernandez wrote:

 These operations don't exist on 32-bit x86, and when running multilib
 tests the target triplet is still x86_64-unknown-linux-gnu even though the
 code being tested is 32-bit when using -m32.

Any test that only handles one of x86_64-* and i?86-* is automatically 
wrong; you can use -m64 with i?86-* targets.  You always need to handle 
both together.

Do these operations exist for x32 as well as for -m64?  If they do, then 
lp64 isn't the right test either; if not, then it is.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [cxx-mem-model] Handle x86-64 with -m32

2011-10-20 Thread H.J. Lu
On Thu, Oct 20, 2011 at 3:38 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Thu, 20 Oct 2011, Aldy Hernandez wrote:

 These operations don't exist on 32-bit x86, and when running multilib
 tests the target triplet is still x86_64-unknown-linux-gnu even though the
 code being tested is 32-bit when using -m32.

 Any test that only handles one of x86_64-* and i?86-* is automatically
 wrong; you can use -m64 with i?86-* targets.  You always need to handle
 both together.

 Do these operations exist for x32 as well as for -m64?  If they do, then
 lp64 isn't the right test either; if not, then it is.


X32 has native int64 and int128.
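
To make the distinction concrete, here is a small sketch of how the three
x86 ABIs discussed in this thread can be told apart with predefined
macros, independently of the configured triplet.  It assumes a GCC-style
compiler that defines __x86_64__, __i386__, __ILP32__ and
__SIZEOF_INT128__; the function names are invented for this sketch:

```c
/* Illustrative only: classify the x86 ABI being compiled for, using
   predefined macros.  The function names are made up for this sketch.  */
static const char *
x86_abi_name (void)
{
#if defined (__x86_64__) && defined (__ILP32__)
  return "x32";		/* 64-bit ISA, 32-bit long and pointers.  */
#elif defined (__x86_64__)
  return "lp64";	/* -m64.  */
#elif defined (__i386__)
  return "ia32";	/* -m32, regardless of the configured triplet.  */
#else
  return "not-x86";
#endif
}

/* Nonzero when the compiler provides a native 128-bit integer type.  */
static int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}
```

Under these assumptions an lp64 check would exclude x32 even though
have_int128 () is nonzero there, which is the mismatch being pointed out.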

-- 
H.J.


[v3] libstdc++/50196 - enable std::thread, std::mutex etc. on darwin

2011-10-20 Thread Jonathan Wakely
This patch should enable macosx support for thread and partial
support for mutex, by defining _GLIBCXX_HAS_GTHREADS on POSIX
systems without the _POSIX_TIMEOUTS option, and only disabling the
types which rely on the Timeouts option, std::timed_mutex and
std::recursive_timed_mutex, instead of disabling all thread support.
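
As a rough illustration of the kind of compile-time check involved (a
sketch only, not the actual configure test): POSIX lets a program ask
via <unistd.h> whether the Timeouts option, which covers
pthread_mutex_timedlock, is promised by the headers.  The function name
below is hypothetical:

```c
#include <unistd.h>

/* Sketch only; the function name is invented for illustration.
   Returns 1 when the headers promise the POSIX Timeouts option at
   compile time (so pthread_mutex_timedlock can be assumed), else 0.
   A value of 0 for _POSIX_TIMEOUTS would mean "ask sysconf at run
   time", which this sketch treats as not assumable.  */
static int
timedlock_assumed_available (void)
{
#if defined (_POSIX_TIMEOUTS) && _POSIX_TIMEOUTS > 0
  return 1;
#else
  return 0;
#endif
}
```

On a system where this returns 0, the idea of the patch is that only the
timed mutex types go away while the rest of the thread support stays.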

Paolo, Jakub, I'd appreciate it if you two could check this over, as
you were responsible for some of this autoconf stuff, via
http://gcc.gnu.org/PR49745

I've only tested this on x86-64-linux where everything is supported
anyway, but I did tweak the configure test for _POSIX_TIMEOUTS to
fail, so I could check the tests were correctly disabled when
_GTHREADS_HAS_MUTEX_TIMEDLOCK is zero.

If anyone can test this on darwin I'd be very grateful (you'll need to
run autoreconf in the libstdc++-v3 directory to regenerate configure
and config.h.in, or contact me and I'll mail you the regenerated
files)

ChangeLog:

* acinclude.m4 (GTHREADS_HAS_MUTEX_TIMEDLOCK): Don't depend on
_POSIX_TIMEOUTS.
* configure: Regenerate.
* config.h.in: Regenerate.
* include/std/mutex (timed_mutex, recursive_timed_mutex): Define
conditionally on GTHREADS_HAS_MUTEX_TIMEDLOCK.
* testsuite/lib/libstdc++.exp (check_v3_target_gthreads_timed): Define.
* testsuite/lib/dg-options.exp (dg-require-gthreads-timed): Define.
* testsuite/30_threads/recursive_timed_mutex/dest/destructor_locked.cc:
Use dg-require-gthreads-timed instead of dg-require-gthreads.
* testsuite/30_threads/recursive_timed_mutex/native_handle/
typesizes.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/native_handle/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/assign_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/cons/copy_neg.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/requirements/typedefs.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/lock/2.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/1.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/2.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_for/3.cc:
Likewise.
* testsuite/30_threads/timed_mutex/dest/destructor_locked.cc: Likewise.
* testsuite/30_threads/timed_mutex/native_handle/typesizes.cc:
Likewise.
* testsuite/30_threads/timed_mutex/native_handle/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_until/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_until/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/assign_neg.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/cons/copy_neg.cc: Likewise.
* testsuite/30_threads/timed_mutex/requirements/standard_layout.cc:
Likewise.
* testsuite/30_threads/timed_mutex/requirements/typedefs.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/lock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/unlock/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/1.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/2.cc: Likewise.
* testsuite/30_threads/timed_mutex/try_lock_for/3.cc: Likewise.
Index: acinclude.m4
===
--- acinclude.m4	(revision 180278)
+++ acinclude.m4	(working copy)
@@ -3358,7 +3358,7 @@
   ac_save_CXXFLAGS=$CXXFLAGS
   CXXFLAGS="$CXXFLAGS -fno-exceptions -I${toplevel_srcdir}/gcc"
 
-  AC_MSG_CHECKING([check whether it can be safely assumed that mutex_timedlock is available])
+  AC_MSG_CHECKING([whether it can be safely assumed that mutex_timedlock is available])
 
   AC_TRY_COMPILE([#include <unistd.h>],
 [
@@ -3382,20 +3382,11 @@
 
   AC_MSG_CHECKING([for gthreads library])
 
-  AC_TRY_COMPILE([
-  #include "gthr.h"
-		  #include <unistd.h>
- ],
+  AC_TRY_COMPILE([#include "gthr.h"],
 [
   #ifndef __GTHREADS_CXX0X
   #error
   #endif
-
-  // In case of POSIX threads check _POSIX_TIMEOUTS 

[PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO v2

2011-10-20 Thread Andi Kleen
From: Andi Kleen a...@linux.intel.com

Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most
convenient way to get this into existing Makefiles is using small
wrappers that pass the plugin. This matches how other compilers
(LLVM, icc) do this too.

My previous attempt at using shell scripts for this
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
was not approved. Here's another attempt using wrappers written
in C.  These wrappers add a --plugin argument before calling the
respective binutils utilities.
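
The core of such a wrapper can be sketched as follows.  This is not the
code from gcc-ar.c; the function name and the way the tool and plugin
paths are obtained are assumptions made for illustration.  It only
shows the argument rewriting, i.e. placing "--plugin <path>" ahead of
the user's arguments:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not the code from gcc-ar.c: build the argument
   vector for the real tool by placing "--plugin <path>" right after
   argv[0].  Returns a malloc'd, NULL-terminated array that borrows the
   strings it points to, or NULL on allocation failure.  */
static const char **
build_plugin_argv (const char *real_tool, const char *plugin_path,
		   int argc, char **argv)
{
  const char **nargv;
  int i;

  assert (argc >= 1);
  /* Three extra slots: "--plugin", the path, and the final NULL.  */
  nargv = malloc ((argc + 3) * sizeof *nargv);
  if (nargv == NULL)
    return NULL;

  nargv[0] = real_tool;
  nargv[1] = "--plugin";
  nargv[2] = plugin_path;
  for (i = 1; i < argc; i++)
    nargv[i + 2] = argv[i];
  nargv[argc + 2] = NULL;
  return nargv;
}
```

A real wrapper would then locate the binutils program and the LTO plugin
and hand the resulting vector to something like execvp or libiberty's
pex routines.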

The logic gcc.c uses to find the files is very complicated. I didn't
try to replicate it 100% and left out some magic. I would be interested
if this simple method works for everyone or if more code needs
to be added. This only needs to support LTO-supporting hosts, of course.

I didn't add any documentation because the syntax is exactly the same as
the native ar/ranlib/nm.

v2: Address review comments. Makefile follows go now, use own binaries
for each sub program.

Passed bootstrap and test suite on x86_64-linux.

gcc/:
2011-10-19  Andi Kleen  a...@linux.intel.com

* Makefile.in (MOSTLYCLEANFILES): Add gcc-ar/nm/ranlib.
(native): Add gcc-ar, gcc-nm, gcc-ranlib.
(AR_LIBS, gcc-ar, gcc-ar.o, gcc-ranlib, gcc-ranlib.o,
 gcc-nm, gcc-nm.o, gcc-ranlib.c, gcc-nm.c): Add.
(install): Depend on install-gcc-ar.
(install-gcc-ar): Add.
(uninstall): Uninstall gcc-ar, gcc-nm, gcc-ranlib.
* gcc-ar.c: Add new file.
---
 gcc/Makefile.in |   71 +++--
 gcc/gcc-ar.c|   96 +++
 2 files changed, 164 insertions(+), 3 deletions(-)
 create mode 100644 gcc/gcc-ar.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6b28ef5..1b9987a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
  genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \
  xgcc$(exeext) cpp$(exeext) cc1$(exeext) $(EXTRA_PASSES) \
  $(EXTRA_PARTS) $(EXTRA_PROGRAMS) gcc-cross$(exeext) \
- $(SPECS) collect2$(exeext) lto-wrapper$(exeext) \
+ $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \
+ gcc-ranlib$(exeext) \
  gcov-iov$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \
  gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \
  libcommon-target.a libcommon.a libgcc.mk
@@ -1791,7 +1792,8 @@ rest.encap: lang.rest.encap
 # This is what is made with the host's compiler
 # whether making a cross compiler or not.
 native: config.status auto-host.h build-@POSUB@ $(LANGUAGES) \
-   $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext)
+   $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) \
+   gcc-ar$(exeext) gcc-nm$(exeext) gcc-ranlib$(exeext)
 
 ifeq ($(enable_plugin),yes)
 native: gengtype$(exeext)
@@ -2049,6 +2051,46 @@ sbitmap.o: sbitmap.c sbitmap.h $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK
 ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(EBITMAP_H)
 sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h $(CONFIG_H)
 
+AR_LIBS = @COLLECT2_LIBS@
+
+gcc-ar$(exeext): gcc-ar.o $(LIBDEPS)
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ar.o -o $@ \
+   $(LIBS) $(AR_LIBS)
+
+gcc-nm$(exeext): gcc-nm.o $(LIBDEPS)
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-nm.o -o $@ \
+   $(LIBS) $(AR_LIBS)
+
+gcc-ranlib$(exeext): gcc-ranlib.o $(LIBDEPS)
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ranlib.o -o $@ \
+   $(LIBS) $(AR_LIBS)
+
+CFLAGS-gcc-ar.o += $(DRIVER_DEFINES) \
+   -DTARGET_MACHINE=\"$(target_noncanonical)\" \
+   @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ar\"
+
+gcc-ar.o: gcc-ar.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+CFLAGS-gcc-ranlib.o += $(DRIVER_DEFINES) \
+   -DTARGET_MACHINE=\"$(target_noncanonical)\" \
+   @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ranlib\"
+
+gcc-ranlib.o: gcc-ranlib.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+CFLAGS-gcc-nm.o += $(DRIVER_DEFINES) \
+   -DTARGET_MACHINE=\"$(target_noncanonical)\" \
+   @TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"nm\"
+
+gcc-nm.o: gcc-nm.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+# ??? the implicit rules dont trigger if the source file has a different name
+# so copy instead
+gcc-ranlib.c: gcc-ar.c
+   cp $^ $@
+
+gcc-nm.c: gcc-ar.c
+   cp $^ $@
+
 COLLECT2_OBJS = collect2.o collect2-aix.o tlink.o
 COLLECT2_LIBS = @COLLECT2_LIBS@
 collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
@@ -4576,7 +4618,7 @@ maintainer-clean:
 # broken is small.
 install: install-common $(INSTALL_HEADERS) \
 install-cpp install-man install-info install-@POSUB@ \
-install-driver install-lto-wrapper
+install-driver install-lto-wrapper install-gcc-ar
 
 ifeq ($(enable_plugin),yes)
 install: install-plugin
@@ -4901,6 +4943,23 @@ install-collect2: collect2 installdirs
 install-lto-wrapper: 

[C++ Patch] PR 50811 (rejects class-virt-specifier if class-head-name includes nested-name-specifier)

2011-10-20 Thread Ville Voutilainen

Tested on X86-32 linux.

2011-10-21 Ville Voutilainen ville.voutilai...@gmail.com

   PR c++/50811
   * parser.c (cp_parser_class_head): Parse virt-specifiers regardless
   of whether an id is present.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ea0c4dc..dd2357b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17576,8 +17576,8 @@ cp_parser_class_head (cp_parser* parser,
 {
   cp_parser_check_for_invalid_template_id (parser, id,
type_start_token->location);
-  virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
 }
+  virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
 
   /* If it's not a `:' or a `{' then we can't really be looking at a
  class-head, since a class-head only appears as part of a
diff --git a/gcc/testsuite/g++.dg/cpp0x/override2.C b/gcc/testsuite/g++.dg/cpp0x/override2.C
index 7f17504..0d8871d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/override2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/override2.C
@@ -28,6 +28,13 @@ struct B6 final final {}; // { dg-error duplicate virt-specifier }
 
 struct B7 override {}; // { dg-error cannot specify 'override' for a class }
 
+namespace N
+{
+  struct C;
+}
+
+struct N::C final{};
+
 int main()
 {
   D3B1 d;


[commit, spu] Fix vec_perm pattern (Re: [rs6000, spu] Add vec_perm named pattern)

2011-10-20 Thread Ulrich Weigand
Richard Henderson wrote:

 The generic support for vector permutation will allow for automatic
 lowering to V*QImode, so all we need to add to support these targets
 is the single V16QI pattern that represents the base permutation insn.
 
 I'm not touching any of the other ways that the permutation insn 
 could be generated.  After the generic support is added, I'll leave
 it to the port maintainers to determine what they want to keep.  I
 suspect in many cases using the generic __builtin_shuffle plus some
 casting in the target-specific header files would be sufficient,
 eliminating several dozen builtins.


Sorry I didn't get to this earlier, I got side-tracked by a number
of independent regressions on SPU ...

Unfortunately, the semantics of vec_perm do not match 100% those of the
SPU Shuffle Bytes instruction.  vec_perm assumes the selector elements
apply modulo 32, but shufb uses values >= 128 for special purposes.
See the ISA:

  Value in Register RC
  (Expressed in Binary)  Result Byte

  10xxxxxx               0x00
  110xxxxx               0xFF
  111xxxxx               0x80
  Otherwise              The byte of the concatenated register addressed by
                         the rightmost 5 bits of register RC


To implement the vec_perm semantics fully, we therefore need to reduce the
selector modulo 32 explicitly before using shufb.

Tested on spu-elf, fixes various vshuf test cases.
Committed to mainline.
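
For illustration, the difference can be modelled in plain C.  This is a
scalar sketch based on the ISA table above, not code from the port; the
function names are made up, and it assumes the concatenated register is
the first source followed by the second:

```c
#include <stdint.h>

/* Scalar model of SPU shufb, following the ISA table quoted above.
   A and B are the two 16-byte source registers, SEL the selector.  */
static void
shufb_model (uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
	     const uint8_t sel[16])
{
  int i;

  for (i = 0; i < 16; i++)
    {
      uint8_t s = sel[i];

      if ((s & 0xC0) == 0x80)		/* 10xxxxxx */
	out[i] = 0x00;
      else if ((s & 0xE0) == 0xC0)	/* 110xxxxx */
	out[i] = 0xFF;
      else if ((s & 0xE0) == 0xE0)	/* 111xxxxx */
	out[i] = 0x80;
      else
	{
	  /* Rightmost 5 bits index the 32-byte concatenation A:B.  */
	  unsigned idx = s & 0x1F;
	  out[i] = idx < 16 ? a[idx] : b[idx - 16];
	}
    }
}

/* vec_perm semantics: selector elements apply modulo 32, so mask each
   one with 31 first (the AND the patch adds), then shuffle.  */
static void
vec_perm_model (uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
		const uint8_t sel[16])
{
  uint8_t masked[16];
  int i;

  for (i = 0; i < 16; i++)
    masked[i] = sel[i] & 31;
  shufb_model (out, a, b, masked);
}
```

With a selector byte such as 0x91, the raw shufb model yields 0x00 while
the masked version picks byte 17 of the concatenation, which is the
behaviour vec_perm callers expect.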

Bye,
Ulrich


ChangeLog:

* config/spu/spu.md (vec_permv16qi): Reduce selector modulo 32
before using the shufb instruction.

Index: gcc/config/spu/spu.md
===
*** gcc/config/spu/spu.md   (revision 180240)
--- gcc/config/spu/spu.md   (working copy)
*** selb\t%0,%4,%0,%3
*** 4395,4410 
shufb\t%0,%1,%2,%3
[(set_attr type shuf)])
  
  (define_expand vec_permv16qi
!   [(set (match_operand:V16QI 0 spu_reg_operand )
(unspec:V16QI
  [(match_operand:V16QI 1 spu_reg_operand )
   (match_operand:V16QI 2 spu_reg_operand )
!  (match_operand:V16QI 3 spu_reg_operand )]
  UNSPEC_SHUFB))]

{
! operands[3] = gen_lowpart (TImode, operands[3]);
})
  
  (define_insn nop
--- 4395,4416 
shufb\t%0,%1,%2,%3
[(set_attr type shuf)])
  
+ ; The semantics of vec_permv16qi are nearly identical to those of the SPU
+ ; shufb instruction, except that we need to reduce the selector modulo 32.
  (define_expand vec_permv16qi
!   [(set (match_dup 4) (and:V16QI (match_operand:V16QI 3 spu_reg_operand )
!  (match_dup 6)))
!(set (match_operand:V16QI 0 spu_reg_operand )
(unspec:V16QI
  [(match_operand:V16QI 1 spu_reg_operand )
   (match_operand:V16QI 2 spu_reg_operand )
!  (match_dup 5)]
  UNSPEC_SHUFB))]

{
! operands[4] = gen_reg_rtx (V16QImode);
! operands[5] = gen_lowpart (TImode, operands[4]);
! operands[6] = spu_const (V16QImode, 31);
})
  
  (define_insn nop

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: regcprop.c bug fix

2011-10-20 Thread Mike Stump
On Oct 20, 2011, at 6:22 AM, Bernd Schmidt wrote:
 I found that maximally confusing, so let me try to rephrase it to see if
 I understood you. The two calls to validate_change clobber the
 recog_data even if they fail. In case they failed, we want to continue
 looking at data from the original insn, so we must recompute it.

Yes, exactly.

 diagnosis. Better to move the recomputation into the if statement that
 contains the validate_change calls and possibly add a comment about the
 effect of that function; otherwise OK.

Ok, I've updated the code and added some comments and finished testing and 
checked it in.  Thanks.

2011-10-20  Mike Stump  mikest...@comcast.net

* regcprop.c (copyprop_hardreg_forward_1): Update recog_data
after validate_change wipes it out.

Index: regcprop.c
===
--- regcprop.c  (revision 180265)
+++ regcprop.c  (working copy)
@@ -840,6 +840,12 @@ copyprop_hardreg_forward_1 (basic_block 
  changed = true;
  goto did_replacement;
}
+ /* We need to re-extract as validate_change clobbers
+recog_data.  */
+ extract_insn (insn);
+ if (! constrain_operands (1))
+   fatal_insn_not_found (insn);
+ preprocess_constraints ();
}
 
  /* Otherwise, try all valid registers and see if its valid.  */
@@ -862,6 +868,12 @@ copyprop_hardreg_forward_1 (basic_block 
  changed = true;
  goto did_replacement;
}
+ /* We need to re-extract as validate_change clobbers
+recog_data.  */
+ extract_insn (insn);
+ if (! constrain_operands (1))
+   fatal_insn_not_found (insn);
+ preprocess_constraints ();
}
}
}


[RFA:] fix breakage with Update testsuite to run with slim LTO

2011-10-20 Thread Hans-Peter Nilsson
 Date: Fri, 21 Oct 2011 00:19:32 +0200
 From: Jan Hubicka hubi...@ucw.cz
 Yes, if we scan assembler, we likely want -ffat-lto-objects.

  then IIUC you need to patch *all* torture tests that use
  scan-assembler and scan-assembler-not.  Alternatively, patch
  somewhere else, like not passing it if certain directives are
  used, like scan-assembler{,-not}.  And either way, is it safe to
  add that option always, not just when also passing -flto or
  something?
 
 Hmm, some of the assembler scans still work because they check for the
 presence of symbols we output anyway, but indeed, it would make more
 sense to automatically imply -ffat-lto-objects when scan-assembler
 is used.  I am not sure my dejagnu skills are on par here, however.

Maybe you could make amends ;) by testing the following, which
seems to work at least for dg-torture.exp and cris-elf/cris-sim.
With it, -ffat-lto-objects is automatically added for each
scan-assembler and scan-assembler-not test, and the mechanism is
extensible to other dg-final actions without polluting the files
with checks for LTO options and whatnot.  I checked (and corrected)
that it also works when !check_effective_target_lto by commenting
out the setting in the second chunk.

gcc/testsuite:

* lib/gcc-dg.exp (gcc_force_conventional_output): New global
variable, default empty, -ffat-lto-objects for effective_target_lto.
(gcc-dg-test-1): Add options from dg-final methods.
* lib/scanasm.exp (scan-assembler_required_options)
(scan-assembler-not_required_options): New procs.

Ok to commit?

Index: lib/gcc-dg.exp
===
--- lib/gcc-dg.exp  (revision 180270)
+++ lib/gcc-dg.exp  (working copy)
@@ -68,6 +68,13 @@ if [info exists ADDITIONAL_TORTURE_OPTIO
 }
 
 set LTO_TORTURE_OPTIONS ""
+
+# Some torture-options cause intermediate code output, unusable for
+# testing using e.g. scan-assembler.  In this variable are the options
+# how to force it, when needed.
+global gcc_force_conventional_output
+set gcc_force_conventional_output ""
+
 if [check_effective_target_lto] {
 # When having plugin test both slim and fat LTO and plugin/nonplugin
 # path.
@@ -76,6 +83,7 @@ if [check_effective_target_lto] {
  { -O2 -flto -fno-use-linker-plugin -flto-partition=none } \
  { -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects }
   ]
+  set gcc_force_conventional_output "-ffat-lto-objects"
 } else {
   set LTO_TORTURE_OPTIONS [list \
  { -O2 -flto -flto-partition=none } \
@@ -154,6 +162,19 @@ proc gcc-dg-test-1 { target_compile prog
default {
perror "$do_what: not a valid dg-do keyword"
return ""
+   }
+}
+
+# Let { dg-final { action } } force options as returned by an
+# optional proc ${action}_required_options.
+upvar 2 dg-final-code finalcode
+foreach x [split $finalcode "\n"] {
+   set finalcmd [lindex $x 0]
+   if { [info procs ${finalcmd}_required_options] != "" } {
+   set req [${finalcmd}_required_options]
+   if { $req != "" } {
+   lappend extra_tool_flags $req
+   }
}
 }
 
Index: lib/scanasm.exp
===
--- lib/scanasm.exp (revision 180270)
+++ lib/scanasm.exp (working copy)
@@ -85,6 +85,11 @@ proc scan-assembler { args } {
 dg-scan scan-assembler 1 $testcase $output_file $args
 }
 
+proc scan-assembler_required_options { args } {
+global gcc_force_conventional_output
+return $gcc_force_conventional_output
+}
+
 # Check that a pattern is not present in the .s file produced by the
 # compiler.  See dg-scan for details.
 
@@ -94,6 +99,11 @@ proc scan-assembler-not { args } {
 set output_file [file rootname [file tail $testcase]].s
 
 dg-scan scan-assembler-not 0 $testcase $output_file $args
+}
+
+proc scan-assembler-not_required_options { args } {
+global gcc_force_conventional_output
+return $gcc_force_conventional_output
 }
 
 # Return the scan for the assembly for hidden visibility. 

brgds, H-P


[pph] Re-organize handling of mergeable nodes (issue5318043)

2011-10-20 Thread Diego Novillo

This patch re-organizes the streaming of mergeable nodes so that we
can separate the process of merging the incoming ASTs into the current
compilation context from the reading of their contents.

This problem occurs when two or more PPH images in the same
compilation unit have embedded the same text header file.  For
example, in the following scenario files image1.h and image2.h can
be converted into PPH images, while text.h is a regular header
inclusion:

tu.cc:
#include "image1.h"
#include "image2.h"

image1.h:
#ifndef __IMAGE1_H
#define __IMAGE1_H
#include "text.h"
...
#endif

image2.h:
#ifndef __IMAGE2_H
#define __IMAGE2_H
#include "text.h"
...
#endif


Since the ASTs generated by text.h are embedded inside both image1.pph
and image2.pph, when we read those files we will read the same
symbols.  To avoid multiple definition errors, we need to merge the
symbols coming from these files.

This is not the usual merge done by the parser, however.  In here we
are merging full definitions, callgraph nodes, types, etc.

What the patch does is to break the writing of a tree into two pieces.
The first piece is the merge key, which is necessary for allocation
and used as a lookup key into the parser data structures to determine
if that object has already been seen by the parser. The second piece
is the merge body, which carries the bulk of the information,
particularly any information that causes circularity.

The serialization algorithm occurs in two phases. The first phase
walks the data structure emitting only merge keys for all mergeable
trees. The second phase walks the data structure emitting only bodies
for all mergeable trees. Non-mergeable trees will be emitted in whole
as a side effect of the second phase.

The de-serialization algorithm follows the same two phases. The first
phase reads the tree merge keys, searches for merge matches, and
either allocates a new tree or redirects references to an existing
tree. The second phase reads the tree merge bodies. When the body
corresponds to a matched tree, it must incorporate new information
into the existing tree, i.e. do merging. For example, a function
definition could come from the first pph file read, or the second pph
file read, but in either case the definition must reside in the final
data structures.
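
As a toy model of the two-phase read described above (not the pph
implementation; every type and function name here is hypothetical), the
following sketch allocates or matches symbols by key first and only
then fills in bodies, so references between records can be resolved even
when they are circular:

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16

/* Toy stand-ins for PPH concepts; every name here is hypothetical.  */
struct sym
{
  const char *name;	/* Merge key.  */
  struct sym *uses;	/* Merge body: may point anywhere, even in a cycle.  */
};

struct record
{
  const char *name;	/* Merge key as stored in the image.  */
  int uses_index;	/* Merge body: index of another record in the image.  */
};

static struct sym table[MAX_SYMS];
static int n_syms;

static struct sym *
lookup_or_alloc (const char *name)
{
  int i;

  for (i = 0; i < n_syms; i++)
    if (strcmp (table[i].name, name) == 0)
      return &table[i];		/* Already seen: redirect to it.  */
  assert (n_syms < MAX_SYMS);
  table[n_syms].name = name;
  table[n_syms].uses = NULL;
  return &table[n_syms++];
}

/* Read one image of N records (N <= MAX_SYMS).  */
static void
read_image (const struct record *recs, int n)
{
  struct sym *resolved[MAX_SYMS];
  int i;

  assert (n <= MAX_SYMS);
  /* Phase 1: keys only.  Every record gets a symbol to point at.  */
  for (i = 0; i < n; i++)
    resolved[i] = lookup_or_alloc (recs[i].name);

  /* Phase 2: bodies.  References can now be resolved even if they are
     circular; for a matched symbol, only fill in what is missing.  */
  for (i = 0; i < n; i++)
    if (resolved[i]->uses == NULL)
      resolved[i]->uses = resolved[recs[i].uses_index];
}
```

Reading two images that both embed the same pair of symbols then leaves
a single copy of each in the table, with later references redirected to
the first copy.  Real merging of bodies is of course far more involved
than the "fill in what is missing" rule used here.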

We are still not doing proper merging of everything.  This patch
simply implements the mechanics of splitting up the streaming of
mergeable nodes.  We will be working on the actual merge logic on top
of this change.

Tested on x86_64.  Committed to branch.


Diego.


* pph-streamer-in.c (ALLOC_AND_REGISTER_ALTERNATE): Remove.
(pph_in_tree_1): Remove.
(pph_in_tree): Rename from pph_in_tree_1.
(pph_in_chain): Call streamer_read_chain.
(pph_in_merge_key_chain): New.
(pph_in_merge_body_chain): Rename from pph_in_mergeable_chain.
(pph_in_binding_level_1): Switch the order of the arguments.
Update all users.
Read fields this_entity and static_decls.
(pph_in_mergeable_binding_level): Remove.
(pph_in_merge_key_tree): New.
(pph_in_tree): Handle PPH_RECORD_START_MERGE_BODY.
(pph_in_merge_keys): New.
(pph_in_global_binding): New.
(pph_read_file_1): Call it.
* pph-streamer-out.c (pph_get_marker_for):
(pph_out_start_tree_record): Handle PPH_RECORD_START_MERGE_BODY.
(pph_out_start_merge_key_record): New.
(pph_out_tree_1): Remove.
(pph_out_tree): Rename from pph_out_tree_1.
Handle PPH_RECORD_START_MERGE_BODY.
(pph_out_merge_key_vec): Rename from pph_out_mergeable_tree_vec.
Call pph_out_merge_key_tree.
(pph_out_merge_key_chain): Rename from pph_out_mergeable_chain_filtered.
(pph_out_binding_level_1): Handle fields this_entity and static_decls.
(pph_out_mergeable_binding_level): Remove.
(pph_out_merge_key_tree): New.
(pph_out_merge_keys): New.
(pph_out_global_binding): Call pph_out_merge_keys.
* pph-streamer.c (pph_cache_insert_at): Return the inserted entry.
Update all users.
(pph_cache_add): Likewise.
(pph_cache_lookup): Return the found entry.  Update all users.
(pph_cache_lookup_in_includes): Likewise.
(pph_merge_name): Ignore EXPRs with no lang-specific info.
* pph-streamer.h (pph_cache_entry): Add field needs_merge_body.
Update all users.
(pph_tag_to_tree_code): Remove.
(pph_tree_is_mergeable): New.
* pph.h (enum pph_record_marker): Add values PPH_RECORD_START_MERGE_KEY
and PPH_RECORD_START_MERGE_BODY.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index d8f17b9..ecb7182 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -58,20 +58,6 @@ static VEC(char_p,heap) *string_tables = NULL;
   pph_cache_insert_at (CACHE, 

Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-20 Thread Gabriel Charette
I just thought about something..

Earlier I said that ALL line_table issues were resolved after this
patch (as it ignores the re-included headers that were guarded, as the
non-pph compiler does naturally).

One problem remains, however: I'm pretty sure that line_table entries
for re-included non-pph'ed headers still show up multiple times (as
the direct non-pph children of a given pph_include have their
line_table entries copied one by one from the pph file).

I think we were talking about somehow remembering guards context in
which DECLs were declared and then ignoring DECLs streamed in if they
belong to a given header guard type that was previously seen in a
prior include using the same non-pph header, allowing us to ignore
those DECLs that are re-declared when they should have been guarded
out the second time.

I'm not sure whether there is machinery to handle non-pph re-includes
yet... but... at the very least, I'm pretty sure those non-pph entries
still show up multiple times in the line_table.

Now, we can't just remove/ignore those entries either... doing so
would alter the expected location offset (pph_loc_offset) applied to
all tokens streamed in directly from the pph header.

What we could potentially do is:
- ignore the repeated non-pph entry
- remember the number of locations this entry was supposed to take
(call that pph_loc_ignored_offset)
- then for DECLs imported after it we would then need an offset of
pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing
entries in the line_table
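
A rough sketch of that arithmetic, under the assumption that the start
and size of each ignored entry were known (which is exactly the open
question below).  The types and names are illustrative only and do not
exist in the branch:

```c
/* Hypothetical sketch: all names are illustrative, not from the pph
   branch.  A source location is modelled as an unsigned integer.  */
typedef unsigned int loc_t;

struct ignored_range
{
  loc_t start;	/* First location of an ignored line_table entry.  */
  loc_t size;	/* Number of locations the entry would have taken.  */
};

/* Sum the sizes of all ignored ranges that end at or before LOC; this
   plays the role of pph_loc_ignored_offset for that location.  */
static loc_t
ignored_before (const struct ignored_range *r, int n, loc_t loc)
{
  loc_t total = 0;
  int i;

  for (i = 0; i < n; i++)
    if (r[i].start + r[i].size <= loc)
      total += r[i].size;
  return total;
}

/* Apply pph_loc_offset - pph_loc_ignored_offset to LOC.  Locations
   that fall inside an ignored range are not compensated here; they
   would need to be redirected to the first copy of the header.  */
static loc_t
adjust_location (loc_t loc, loc_t pph_loc_offset,
		 const struct ignored_range *r, int n)
{
  return loc + pph_loc_offset - ignored_before (r, n, loc);
}
```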

The problem here obviously is that I don't think we have a way of
knowing which DECLs come before, inside, and after a given non-pph
header included in the parent pph header which we are currently
reading.

Furthermore, a DECL coming after the non-pph header could potentially
refer to something inside the ignored non-pph header, and the
source_location of the referred token would now be invalid. (That
might already be fixed by the cache hit, which would redirect the
token reference to the same token in the first included copy of that
header; that copy is valid, since it came first and was not skipped.)


On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote:
 @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
   int entries_offset = line_table->used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);

 -  pph_reading_includes++;
 -
   for (first = true; next_lt_marker != PPH_LINETABLE_END;
        next_lt_marker = pph_in_linetable_marker (stream))
     {
 @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
          else
            lm->included_from += entries_offset;


Also, if we do ignore some non-pph entries, the included_from
calculation is going to need some trickier logic as well (it's fine
for the pph includes though, as each child calculates its own offset).

 - gcc_assert (lm->included_from < (int) line_table->used);
 -

Also, I think this slipped in my previous comment, but I don't see how
this assert could trigger in the current code. If it did trigger,
something was definitely wrong, as it asserts that the offset
included_from refers to an entry that is actually in the
line_table...
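
For reference, the invariant that assert was checking can be sketched in
C as below. sketch_remap_included_from is a hypothetical helper, not a
function in the pph sources, and 'used' stands in for line_table->used;
the lower-bound check is an extra assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical helper: shift an included_from index read from a pph
   file by ENTRIES_OFFSET, then check that the result names an entry
   that actually exists in a table with USED entries.  */
int
sketch_remap_included_from (int included_from, int entries_offset,
                            unsigned int used)
{
  int remapped = included_from + entries_offset;

  assert (remapped >= 0);           /* Extra check, sketch only.  */
  assert (remapped < (int) used);   /* Mirrors the removed assert.  */
  return remapped;
}
```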

  lm->start_location += pph_loc_offset;

Cheers,
Gab


 --
 This patch is available for review at http://codereview.appspot.com/5235061



Re: [patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-20 Thread Laurynas Biveinis
2011/10/20 Jan Kratochvil jan.kratoch...@redhat.com:
 Hi,

 with custom patched dwarf2out.c I got a crash on memory mangled by the garbage
 collector.  With patched GTY there the crash no longer happened - but I do not
 have a reproducer anymore, sorry if it is a bogus patch.

 The memory corrupted later was initially allocated and stored into
 mem_loc_result->dw_loc_oprnd1.v.val_loc.  I do not think there is any other
 reference to it than that field with no GTY.

 2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com

        * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;

This patch is a no-op, as already pointed out. If this comes up again,
I'd set a conditional breakpoint on ggc_set_mark if (arg == struct
dw_loc_list_struct with the field that gets collected) and try to find
out why the field does not get marked.

-- 
Laurynas


Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-20 Thread Kirill Yukhin
Thanks!

K

On Fri, Oct 21, 2011 at 12:37 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Oct 20, 2011 at 1:30 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:

 OK.

 Thanks,
 Uros.

 Great,
 could anybody please commit that?


 I checked it in for you.

 --
 H.J.



Re: [patch] dwarf2out crash: missing GTY? (PR 50806)

2011-10-20 Thread Jan Kratochvil
On Fri, 21 Oct 2011 05:37:09 +0200, Laurynas Biveinis wrote:
 This patch is a no-op, as already pointed out. If this comes up again,
 I'd set a conditional breakpoint on ggc_set_mark if (arg == struct
 dw_loc_list_struct with the field that gets collected) and try to find
 out why the field does not get marked.

Thanks all for the review; I see I do not know the GC.  I thought the bug was
so obvious... I did not make a snapshot of the tree in that crashing state,
so I cannot say anything useful about the crash anymore.


Thanks,
Jan

