Re: [PATCH] Fix SLP vectorization of shifts (PR tree-optimization/48616)

2011-04-17 Thread Ira Rosen


Jakub Jelinek  wrote on 17/04/2011 05:26:14 PM:
>
> On Sun, Apr 17, 2011 at 11:30:31AM +0300, Ira Rosen wrote:
> > We already have this check in vect_build_slp_tree(). It didn't work for
> > the testcase because it doesn't take into account the type of definition.
> > I agree that it's better to move it here and base the vector/scalar
> > decision on it, but please remove the now redundant check from
> > vect_build_slp_tree ().
>
> I've tried to add this to my patch, unfortunately slp-36.c test then
> fails on both x86_64-linux and i686-linux.
>
> The problem is that during the analysis vectorizable_shift is called with
> NULL slp_node, but later on it is called with non-NULL:

Right, sorry, I hadn't thought about this. The vect_build_slp_tree check
is done during the analysis, and it checks that, if the vector shift
argument is not supported, the shift arguments are all equal. The
vectorizable_shift SLP check, on the other hand, is done only during the
transformation. So, your original patch is OK.
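
(For reference, a minimal example of the kind of SLP group under
discussion -- hypothetical, not the PR testcase: two grouped statements
whose scalar shift amounts differ, so they can only be vectorized if the
target supports shifting by a vector of amounts.)

void
f (int *a)
{
  a[0] <<= 2;   /* SLP group whose two scalar shift amounts differ:  */
  a[1] <<= 3;   /* vectorizing it needs a vector-by-vector shift.    */
}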

Thanks,
Ira



Re: [patch, libgfortran] PR48602 Invalid F conversion of G descriptor for values close to powers of 10

2011-04-17 Thread Jerry DeLisle

On 04/17/2011 11:39 AM, Steve Kargl wrote:

On Sun, Apr 17, 2011 at 10:14:37AM -0700, Jerry DeLisle wrote:

Hi Folks,

This patch implements the adjustments required for the various rounding
modes as stated in the Fortran Standard F2008, paragraph 10.7.5.2.2.  In
the process of testing this patch we have found another bug, related to D
and E editing, which is now PR48651.  In the meantime, this patch
regression-tests fine on x86-64.

Test case is that provided in the PR.

OK for trunk?



The patch looks ok to me.  But, if you were in the IRC
meeting, we could discuss it. :-)



$ svn commit ChangeLog io/write_float.def
Sending        ChangeLog
Sending        io/write_float.def
Transmitting file data ..
Committed revision 172634.


Re: C++ PATCH for c++/48531 (ICE with value-initialization of array)

2011-04-17 Thread Jason Merrill

And a minor change I noticed while looking at this area.
commit bd069335f569a8c148aec0daf81b3e3a57f31ad4
Author: Jason Merrill 
Date:   Sat Apr 16 17:31:36 2011 -0700

* tree.c (get_target_expr): Handle VEC_INIT_EXPR.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index ad004bb..25f2c32 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -605,6 +605,8 @@ get_target_expr (tree init)
 {
   if (TREE_CODE (init) == AGGR_INIT_EXPR)
 return build_target_expr (AGGR_INIT_EXPR_SLOT (init), init);
+  else if (TREE_CODE (init) == VEC_INIT_EXPR)
+return build_target_expr (VEC_INIT_EXPR_SLOT (init), init);
   else
 return build_target_expr_with_type (init, TREE_TYPE (init));
 }


C++ PATCH for c++/48531 (ICE with value-initialization of array)

2011-04-17 Thread Jason Merrill
The switch to using build_value_init for value-initialization of 
non-class types caused us to incorrectly allow functional casts to array 
type, leading to an ICE.
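
A minimal sketch of the now-rejected construct (illustrative, not the PR
testcase):

  typedef int A[2];

  void f ()
  {
    A ();   // now an error: functional cast to array type "A"
  }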


Tested x86_64-pc-linux-gnu, applied to trunk.
commit 4c4d50b6852b51cd6a227a09f330a086eb3b1d63
Author: Jason Merrill 
Date:   Sat Apr 16 17:34:10 2011 -0700

PR c++/48531
* typeck2.c (build_functional_cast): Disallow array type.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index fad7f0c..3280d9b 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -369,6 +369,8 @@ build_value_init (tree type, tsubst_flags_t complain)
 tree
 build_value_init_noctor (tree type, tsubst_flags_t complain)
 {
+  /* FIXME the class and array cases should just use digest_init once it is
+ SFINAE-enabled.  */
   if (CLASS_TYPE_P (type))
 {
   gcc_assert (!TYPE_NEEDS_CONSTRUCTING (type));
@@ -450,7 +452,9 @@ build_value_init_noctor (tree type, tsubst_flags_t complain)
  if (ce->value == error_mark_node)
return error_mark_node;
 
- /* The gimplifier can't deal with a RANGE_EXPR of TARGET_EXPRs.  */
+ /* We shouldn't have gotten here for anything that would need
+non-trivial initialization, and gimplify_init_ctor_preeval
+would need to be fixed to allow it.  */
  gcc_assert (TREE_CODE (ce->value) != TARGET_EXPR
  && TREE_CODE (ce->value) != AGGR_INIT_EXPR);
}
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 20b47d5..f0b67f7 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1534,6 +1534,15 @@ build_functional_cast (tree exp, tree parms, 
tsubst_flags_t complain)
   else
 type = exp;
 
+  /* We need to check this explicitly, since value-initialization of
+ arrays is allowed in other situations.  */
+  if (TREE_CODE (type) == ARRAY_TYPE)
+{
+  if (complain & tf_error)
+   error ("functional cast to array type %qT", type);
+  return error_mark_node;
+}
+
   if (processing_template_decl)
 {
   tree t;
diff --git a/gcc/testsuite/g++.dg/cpp0x/sfinae16.C 
b/gcc/testsuite/g++.dg/cpp0x/sfinae16.C
new file mode 100644
index 0000000..6470567
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/sfinae16.C
@@ -0,0 +1,17 @@
+// PR c++/48531
+// { dg-options -std=c++0x }
+
+template
+char f(int);
+
+template
+double f(...);
+
+struct B2 {
+  B2(...);
+};
+
+#define SA(X) static_assert ((X), #X);
+SA(sizeof(f(0)) != 1);


Re: [v3] Actually run some tests in parallel-mode for check-parallel

2011-04-17 Thread Paolo Carlini
... looks like I missed std::partition. Tested x86_64-linux
parallel-mode, committed to mainline.


Paolo.

//
2011-04-17  Paolo Carlini  

* testsuite/25_algorithms/partition/moveable.cc: Actually run
it in parallel-mode for check-parallel.
Index: testsuite/25_algorithms/partition/moveable.cc
===================================================================
--- testsuite/25_algorithms/partition/moveable.cc   (revision 172618)
+++ testsuite/25_algorithms/partition/moveable.cc   (working copy)
@@ -1,6 +1,6 @@
 // { dg-options "-std=gnu++0x" }
 
-// Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010
+// Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -22,10 +22,6 @@
 
 #undef _GLIBCXX_CONCEPT_CHECKS
 
-// XXX FIXME:  parallel-mode should deal correctly with moveable-only types
-// per C++0x, at minimum smoothly fall back to serial.
-#undef _GLIBCXX_PARALLEL
-
 #include 
 #include 
 #include 


Re: Remove vtable_method field in cgraph_node

2011-04-17 Thread Jan Hubicka
> > notice that OBJ_TYPE_REF is useless since foo got devirtualized, and it
> > keeps alive the useless s._vptr.S = &_ZTV1S[2].  We want to drop the
> > OBJ_TYPE_REF at the time we fold its first argument to a constant.
> 
> At some point we dropped all OBJ_TYPE_REFs with a direct fn, but that led
> to issues.  I'll try to dig up what that was.

Hah, that sounds really scary, given that OBJ_TYPE_REF should be a
semantic no-op.  We definitely drop them in tree-ssa-ccp, so if we need to
preserve them somehow, that is probably a place where we should do so.

Honza


[patch] Fix PR lto/48492 (partially)

2011-04-17 Thread Eric Botcazou
Hi,

this (partially) fixes the LTO bootstrap failure with Ada enabled on the 
mainline.  The problem is an ICE whose origin is as follows:
  1. during the LTO stage, copy-prop eliminates a DECL_IN_CONSTANT_POOL
     variable from the GIMPLE IR, except for a reference in a GIMPLE_DEBUG
     statement (the ADDR_EXPR of the variable is the debug value),
  2. the variable is streamed out in the object file, as well as its
     associated varpool entry,
  3. during the WPA stage, both are streamed in, then reachability analysis
     is run and computes that the variable is unreachable, so it removes
     the varpool entry.  As a consequence, when the variable is streamed
     out again, its initializer gets replaced with ERROR_MARK,
  4. during the LTRANS stage, the variable is streamed in again, with
     ERROR_MARK as initializer, and this blows up when the GIMPLE_DEBUG
     statement is processed.

So the root problem is in the reachability analysis during the WPA stage but, 
as pointed out by Jakub, this is so by design for GIMPLE_DEBUG statements.
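
For illustration, a rough C analogue of the scenario (a sketch only; the
actual reproducer is Ada code, and whether the constant pool is used here
is implementation-dependent):

  double
  f (void)
  {
    /* The aggregate initializer below may be implemented by copying from
       a compiler-generated variable with DECL_IN_CONSTANT_POOL set.  */
    const double c[4] = { 1.0, 2.0, 3.0, 4.0 };
    /* If optimization removes every real use of that variable, its
       ADDR_EXPR can survive only as the value of a GIMPLE_DEBUG
       statement -- precisely the reference that reachability analysis
       does not account for.  */
    const double *p = &c[1];
    return *p;
  }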

There are already cases for which expand_debug_expr simply gives up:

case VAR_DECL:
case PARM_DECL:
case FUNCTION_DECL:
case LABEL_DECL:
case CONST_DECL:
case RESULT_DECL:
  op0 = DECL_RTL_IF_SET (exp);

  /* This decl was probably optimized away.  */
  if (!op0)
{
  if (TREE_CODE (exp) != VAR_DECL
  || DECL_EXTERNAL (exp)
  || !TREE_STATIC (exp)
  || !DECL_NAME (exp)
  || DECL_HARD_REGISTER (exp)
  || DECL_IN_CONSTANT_POOL (exp)
  || mode == VOIDmode)
return NULL;

the attached patch adds DECL_IN_CONSTANT_POOL to them.  This makes it possible 
to eliminate the ICE (but LTO bootstrap still fails because of an unrelated 
ICE in dwarf2out.c later).  OK for the mainline?


2011-04-17  Eric Botcazou  

PR lto/48492
* cfgexpand.c (expand_debug_expr) : Return NULL for a
DECL_IN_CONSTANT_POOL without RTL.


-- 
Eric Botcazou
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 172617)
+++ cfgexpand.c	(working copy)
@@ -2495,6 +2495,7 @@ expand_debug_expr (tree exp)
 	  || !TREE_STATIC (exp)
 	  || !DECL_NAME (exp)
 	  || DECL_HARD_REGISTER (exp)
+	  || DECL_IN_CONSTANT_POOL (exp)
 	  || mode == VOIDmode)
 	return NULL;
 


Re: [SPARC] Hookize REGISTER_MOVE_COST

2011-04-17 Thread Eric Botcazou
> * config/sparc/sparc.h (GENERAL_OR_I64, REGISTER_MOVE_COST):
> Remove. * config/sparc/sparc.c (TARGET_REGISTER_MOVE_COST): Define.
> (general_or_i64_p, sparc_register_move_cost): New function.

OK modulo:

> @@ -9124,6 +9128,36 @@
>  }
>  }
>
> +static inline bool
> +general_or_i64_p (reg_class_t rclass)
> +{
> +  return (rclass == GENERAL_REGS || rclass == I64_REGS);
> +}

Missing "/* Return true if CLASS is either GENERAL_REGS or I64_REGS.  */"

> +/* Implement TARGET_REGISTER_MOVE_COST.  */
> +
> +static int
> +sparc_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
> + reg_class_t from, reg_class_t to)
> +{
> +  if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
> +  || (general_or_i64_p (from) && FP_REG_CLASS_P (to))
> +  || from == FPCC_REGS
> +  || to == FPCC_REGS)
> +{
> +  if (sparc_cpu == PROCESSOR_ULTRASPARC
> + || sparc_cpu == PROCESSOR_ULTRASPARC3
> + || sparc_cpu == PROCESSOR_NIAGARA
> + || sparc_cpu == PROCESSOR_NIAGARA2)
> +   return 12;
> +  else
> +   return 6;
> +}
> +
> +  return 2;
> +}

Superfluous "else".  No need to retest.

-- 
Eric Botcazou


[Ada] Remove useless flags for gnatbind link

2011-04-17 Thread Eric Botcazou
This removes the bunch of -Wxxx flags that are currently passed when
linking gnatbind, by replacing $(ALL_CFLAGS) with $(CFLAGS) as is done for
gnat1.  This (marginally) helps during LTO bootstrap.

Tested on i586-suse-linux, applied on the mainline and 4.6 branch.


2011-04-17  Eric Botcazou  

* gcc-interface/Make-lang.in (gnatbind): Replace $(ALL_CFLAGS) with
$(CFLAGS) on the link line.


-- 
Eric Botcazou
Index: gcc-interface/Make-lang.in
===================================================================
--- gcc-interface/Make-lang.in	(revision 172617)
+++ gcc-interface/Make-lang.in	(working copy)
@@ -485,7 +485,7 @@ gnat1$(exeext): $(TARGET_ADA_SRCS) $(GNA
 	$(RM) stamp-gnatlib2-rts stamp-tools
 
 gnatbind$(exeext): ada/b_gnatb.o $(CONFIG_H) $(GNATBIND_OBJS)
-	$(GCC_LINK) -o $@ ada/b_gnatb.o $(GNATBIND_OBJS) $(ALL_CFLAGS) $(LIBS) $(SYSLIBS)
+	$(GCC_LINK) -o $@ ada/b_gnatb.o $(GNATBIND_OBJS) $(LIBS) $(SYSLIBS) $(CFLAGS)
 
 # use cross-gcc
 gnat-cross: force


Re: [Patch, Fortran] PR 18918 - implement coarray's IMAGE_INDEX for nonconstant bounds

2011-04-17 Thread Mikael Morin
On Saturday 16 April 2011 22:25:19 Tobias Burnus wrote:
> In simplify.c, currently an error is returned if the cobounds are
> exceeded. One should probably downgrade those to warnings.
OK with that change.
As far as I know, it is (weird but) valid to use the image_index result for
something completely unrelated to image indexing (thus not involving any sort
of invalid coarray access) -- for instance, merely checking whether a set of
cosubscripts is valid by testing the result against zero.  In that case we
should not prevent compilation, and should return zero as mandated by the
standard.

Thanks
Mikael



Re: Patch to align local PowerPC AltiVec arrays

2011-04-17 Thread Wellander, Pat - Code Sourcery
Actually, after reinvestigating this: "AFAIK, the vectorizer is smart
enough to automagically increase alignment on global arrays", so this is
not of concern to Freescale.  Also, changing the alignment of arrays within
structs would change the field layout, which would violate the ABI and be
incompatible with other non-AltiVec hardware.

Pat

On 3/21/2011 3:26 PM, Andrew Pinski wrote:
> On Mon, Mar 21, 2011 at 3:25 PM, Andrew Pinski  wrote:
>> On Mon, Mar 21, 2011 at 3:21 PM, Wellander, Pat - Code Sourcery
>>  wrote:
>>> The stack is aligned at 16 bytes with AltiVec (if not, this patch would not
>>> make sense).
>>> Also, all char arrays are aligned at 16 bytes.
>>> This patch causes other local stack arrays >= 16 bytes to be aligned at 16
>>> bytes in order to
>>> make more effective use of AltiVec instructions.
>>
>> Then why not change it so not just auto arrays are aligned at 16
>> bytes, just like char arrays?
> 
> Also what about structs containing just arrays?  Should you have those
> aligned too?
> 
> -- Pinski


Re: [v3] libstdc++/48635

2011-04-17 Thread Paolo Carlini

... applied this follow-up to mainline and branch; see the audit trail for details.

Tested x86_64-linux.

Paolo.

PS: Jon, looks like we are missing some constraining in the templated
constructor and assignment operator of the unique_ptr specialization?




2011-04-17  Daniel Krugler  
Paolo Carlini  

PR libstdc++/48635 (again)
* include/bits/unique_ptr.h (unique_ptr<>::unique_ptr(unique_ptr<>&&),
unique_ptr<_Tp[]>::unique_ptr(unique_ptr<>&&),
unique_ptr<>::operator=(unique_ptr<>&&),
unique_ptr<_Tp[]>::operator=(unique_ptr<>&&)): Use forward<_Ep>, not
forward<_Dp>, to forward the deleter.
* testsuite/20_util/unique_ptr/assign/48635_neg.cc: New.
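
(Roughly why the forwarded type matters -- a free-standing sketch with
assumed types, not libstdc++ code: when the source deleter type _Ep is an
lvalue reference such as D&, forwarding as _Ep keeps the argument a D&, so
a deleted B::operator=(D&) is selected and diagnosed; forwarding as the
target type _Dp = B& would first convert the reference to B& and silently
use B's implicit copy assignment.)

  #include <utility>

  struct D;
  struct B
  {
    B& operator=(D&) = delete;    // deleted cross-assignment
    // the implicit B& operator=(const B&) still exists
  };
  struct D : B { };

  template<typename Ep>
    void
    assign_deleter(B& lhs, Ep&& rhs)
    {
      // With Ep = D&, overload resolution prefers the exact-match,
      // deleted operator=(D&) -- which is what the new 48635_neg.cc
      // test expects to be diagnosed.
      lhs = std::forward<Ep>(rhs);
    }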
Index: include/bits/unique_ptr.h
===================================================================
--- include/bits/unique_ptr.h   (revision 172617)
+++ include/bits/unique_ptr.h   (working copy)
@@ -153,7 +153,7 @@
   && std::is_convertible<_Ep, _Dp>::value))>
 ::type>
unique_ptr(unique_ptr<_Up, _Ep>&& __u)
-   : _M_t(__u.release(), std::forward<_Dp>(__u.get_deleter()))
+   : _M_t(__u.release(), std::forward<_Ep>(__u.get_deleter()))
{ }
 
 #if _GLIBCXX_USE_DEPRECATED
@@ -186,7 +186,7 @@
operator=(unique_ptr<_Up, _Ep>&& __u)
{
  reset(__u.release());
- get_deleter() = std::forward<_Dp>(__u.get_deleter());
+ get_deleter() = std::forward<_Ep>(__u.get_deleter());
  return *this;
}
 
@@ -306,7 +306,7 @@
 
   template
unique_ptr(unique_ptr<_Up, _Ep>&& __u)
-   : _M_t(__u.release(), std::forward<_Dp>(__u.get_deleter()))
+   : _M_t(__u.release(), std::forward<_Ep>(__u.get_deleter()))
{ }
 
   // Destructor.
@@ -326,7 +326,7 @@
operator=(unique_ptr<_Up, _Ep>&& __u)
{
  reset(__u.release());
- get_deleter() = std::forward<_Dp>(__u.get_deleter());
+ get_deleter() = std::forward<_Ep>(__u.get_deleter());
  return *this;
}
 
Index: testsuite/20_util/unique_ptr/assign/48635_neg.cc
===================================================================
--- testsuite/20_util/unique_ptr/assign/48635_neg.cc    (revision 0)
+++ testsuite/20_util/unique_ptr/assign/48635_neg.cc    (revision 0)
@@ -0,0 +1,50 @@
+// { dg-options "-std=gnu++0x" }
+// { dg-do compile }
+
+// Copyright (C) 2011 Free Software Foundation
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License
+// along with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+
+struct D;
+
+struct B
+{
+ B& operator=(D&) = delete; // { dg-error "declared here" }
+
+ template<typename T>
+   void operator()(T*) const {}
+};
+
+struct D : B { };
+
+// libstdc++/48635
+void f()
+{
+  B b;
+  D d;
+
+  std::unique_ptr<int, B&> ub(nullptr, b);
+  std::unique_ptr<int, D&> ud(nullptr, d);
+  ub = std::move(ud);
+// { dg-error "use of deleted function" "" { target *-*-* } 189 }
+
+  std::unique_ptr<int[], B&> uba(nullptr, b);
+  std::unique_ptr<int[], D&> uda(nullptr, d);
+  uba = std::move(uda);
+// { dg-error "use of deleted function" "" { target *-*-* } 329 }
+}


Re: FDO usability patch -- correct insane profile data

2011-04-17 Thread Jan Hubicka
> Adding an assertion sounds good to me -- the only problem is that problems
> like this are hard to trigger during testing, making the assertion less
> useful than it could be.  Is it better to add the assertion only when
> checking is enabled, while still keeping the correction code?
Hi,
since we are speaking about code that has minimal-to-no testing coverage in
our automated testers (we don't really test anything multithreaded or with
mismatching profiles), I don't think gcc_checking_assert would make any
difference.

Since we don't really want the future bug to stay around unnoticed (it
would result in silent mis-updates of the profiles), we are only shooting
to make the analysis easier, so that the next time someone trips over this
scenario he won't just see a symptomatic message from the cgraph verifier.
So I would go for a normal assert and a comment about the nature of the bug.

Honza
> 
> Thanks,
> 
> David
> 
> On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka  wrote:
> > Hi,
> > hmm, looks like we both ran into the same issue and developed different fixes.
> >>
> >> Good point. The change above was added based on 4.4.3 where the sum
> >> computation has no capping like above. Yes, with the current capping,
> >> the negative value won't result. However, it is still good to guard
> >> the scale independent of the implementation of ipcp_compute_node_scale
> >> -- it may change and break silently (the comment of the function says
> >> the implementation is wrong...)
> >
> > Well, if we want to add a check in anticipation that we will introduce a
> > bug a few lines up, I think we could just add gcc_assert (scale < BASE);
> > with a comment explaining that ipcp_compute_node_scale must give results
> > in range even when the profile is not flow-consistent.
> >
> > Since we need to propagate the scales across the callgraph (at least for
> > the correct implementation of the function), masking a bug in
> > ipcp_compute_node_scale would make us propagate the nonsense further,
> > silently producing unnecessarily amplified insanity.
> >
> > I would pre-approve a patch reverting the current change and adding the
> > assert with a comment.
> >
> > Honza
> >>
> >> Thanks,
> >>
> >> David
> >>
> >>
> >> >
> >> > Honza
> >> >
> >


Re: Improve stack layout heuristic.

2011-04-17 Thread Easwaran Raman
On Sun, Apr 17, 2011 at 1:39 PM, Steven Bosscher  wrote:
> On Sun, Apr 17, 2011 at 9:39 PM, Easwaran Raman  wrote:
>> @@ -372,8 +366,9 @@
>>                 to elements will conflict.  In case of unions we have
>>                 to be careful as type based aliasing rules may say
>>                 access to the same memory does not conflict.  So play
>> -                safe and add a conflict in this case.  */
>> -             || contains_union)
>> +                safe and add a conflict in this case when -fstrict-aliasing
>> +                 is used.  */
>> +             || (contains_union && flag_strict_aliasing))
>>            add_stack_var_conflict (i, j);
>>        }
>>     }
>
> Are you sure this change is safe? See http://gcc.gnu.org/PR25654
>
> Ciao!
> Steven
>

I tried the test case in PR25654 with my patch, and at -O2
-fno-strict-aliasing it puts the two variables in the same partition
and the test passes.  My understanding of that issue is that with
-fstrict-aliasing, the short * and int * are assumed to point to
different locations and hence the variables can't share stack slots, but
with -fno-strict-aliasing that assumption is not valid.
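
A rough sketch of the PR25654-style hazard (hypothetical example, not the
PR testcase): if s and i below were given the same stack slot, TBAA would
still treat the short and int accesses as non-conflicting, so under
-fstrict-aliasing the stores could be reordered against the reads.

  int
  f (int k)
  {
    int r = 0;
    {
      short s = (short) k;
      r += s;
    }
    {
      int i = k + 1;   /* may safely share s's slot only when the
                          accesses are known to conflict, or when
                          -fno-strict-aliasing disables TBAA */
      r += i;
    }
    return r;
  }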

Thanks,
Easwaran


Re: Improve stack layout heuristic.

2011-04-17 Thread Steven Bosscher
On Sun, Apr 17, 2011 at 9:39 PM, Easwaran Raman  wrote:
> @@ -372,8 +366,9 @@
>                 to elements will conflict.  In case of unions we have
>                 to be careful as type based aliasing rules may say
>                 access to the same memory does not conflict.  So play
> -                safe and add a conflict in this case.  */
> -             || contains_union)
> +                safe and add a conflict in this case when -fstrict-aliasing
> +                 is used.  */
> +             || (contains_union && flag_strict_aliasing))
>            add_stack_var_conflict (i, j);
>        }
>     }

Are you sure this change is safe? See http://gcc.gnu.org/PR25654

Ciao!
Steven


Re: [committed] Fix handling of global registers in MIPS prologues

2011-04-17 Thread Mikael Pettersson
Richard Sandiford writes:
 > PR 45074 showed up a rather embarrassing oversight in the MIPS backend:
 > global registers were still being treated as call-saved.
 > 
 > Fixed with the attached patch.  Tested on mips64-linux-gnu and applied.
 > 
 > Richard
 > 
 > 
 > gcc/
 >  * config/mips/mips.c (mips_cfun_call_saved_reg_p): Handle global
 >  registers.

This bug is also PR target/43700, a regression since gcc-4.4.
Do you want to backport the fix, or perhaps just mark PR43700
as fixed in 4.7.0?

/Mikael


Re: Remove vtable_method field in cgraph_node

2011-04-17 Thread Richard Guenther
On Sun, 17 Apr 2011, Jan Hubicka wrote:

> Hi,
> this patch drops the vtable_method field.  I never understood what it was
> about, but reading PR20991 I am convinced that we hit the same problem
> with the work on the new devirtualization code.  I implemented what Mark
> describes as the "ideal solution" there - i.e. teaching cgraph that
> virtual functions might become reachable only after inlining.  Since we
> can still devirtualize in late compilation (that is pretty much pointless,
> but anyway), we now have can_refer_decl_in_current_unit_p, which tests
> whether the function has already been thrown away.  Perhaps we might apply
> there the observation about the vtable being output somewhere, but I do
> not think it is safe: if the vtable is COMDAT, we can pretty much also
> optimize all references to it out in all compilation units and end up not
> outputting it.  When the vtable is not COMDAT, the methods won't be
> either, and this trick will not apply.
> 
> Consequently I am dropping the flag.  This is very trivial, based on the
> observation that cp_fold_obj_type_ref, the only setter of the flag, is
> now dead.  Plus the varasm code is no longer executed at the time of the
> IPA optimizations that the original patch was fixing.
> 
> Martin,
> can you please look into why cp_fold_obj_type_ref is no longer used and,
> if possible, get rid of it?
> 
> Richi,
> compiling the original testcase:
> // PR middle-end/20991
> // { dg-options "-O2" }
> // { dg-do compile }
> 
> struct S
> {
>   virtual inline int foo () const;
>   virtual inline bool bar () const;
>   virtual int baz (int) const;
> };
> 
> inline int S::foo () const
> {
>   return 1;
> }
> 
> inline bool S::bar () const
> {
>   return foo () == 0;
> }
> 
> void A ()
> {
>   S s;
>   if (s.bar ())
> s.foo ();
> }
> 
> void B ()
> {
>   S s;
>   if (s.bar ())
> s.foo ();
> }
> 
> we end up with a bit amusing code:
> void B() ()
> { 
>   int D.2147;
>   struct S s;
> 
> :
>   s._vptr.S = &_ZTV1S[2];
>   D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);
>   return;
> 
> }
> 
> notice that OBJ_TYPE_REF is useless since foo got devirtualized, and it
> keeps alive the useless s._vptr.S = &_ZTV1S[2].  We want to drop the
> OBJ_TYPE_REF at the time we fold its first argument to a constant.

At some point we dropped all OBJ_TYPE_REFs with a direct fn, but that led
to issues.  I'll try to dig up what that was.

Richard.

> a.C.027t.fre1 dump reads:
> void B() ()
> {
>   int (*__vtbl_ptr_type) () * D.2149;
>   int (*__vtbl_ptr_type) () D.2148;
>   int D.2147;
>   bool D.2146;
>   struct S s;
> 
> :
>   s._vptr.S = &_ZTV1S[2];
>   D.2149_6 = &_ZTV1S[2];
>   D.2148_7 = foo;
>   D.2147_8 = OBJ_TYPE_REF(D.2148_7;&s->0) (&s);
>   D.2146_9 = D.2147_8 == 0;
>   D.2146_12 = D.2146_9;
>   D.2146_1 = D.2146_12;
>   D.2146_2 = D.2146_1;
>   return;
> 
> }
> 
> Already I would expect FRE to replace D.2148_7 by foo.  Later this is done
> by copyprop1.
> 
> Folding statement: D.2147_8 = OBJ_TYPE_REF(D.2148_7;&s->0) (&s);
> Folded into: D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);
> 
> void B() ()
> {
>   int (*__vtbl_ptr_type) () * D.2149;
>   int (*__vtbl_ptr_type) () D.2148;
>   int D.2147;
>   bool D.2146;
>   struct S s;
> 
> :
>   s._vptr.S = &_ZTV1S[2];
>   D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);
>   return;
> 
> }
> 
> When copyprop chooses to do constant propagation, for whatever reason, it
> probably should do the folding too, either in substitute_and_fold or via
> its own ccp_fold_stmt.  I am sure you know the best alternatives better
> than I do ;)
> 
> Honza
> 
>   * cgraph.c (cgraph_clone_node): Do not handle vtable_method
>   * cgraph.h (struct cgraph_local_info): Drop vtable_method.
>   * cgraphunit.c (cgraph_copy_node_for_versioning): Drop vtable_method.
>   * lto-cgraph.c (lto_output_node, input_overwrite_node): Drop vtable 
> method.
>   * gimple-fold.c (can_refer_decl_in_current_unit_p): Mention PR20991 in
>   gimple-fold.c
>   * varasm.c (mark_decl_referenced): Drop vtable_method handling code.
>   * cp/class.c (cp_fold_obj_type_ref): Drop vtable_method.
> Index: cgraph.c
> ===================================================================
> *** cgraph.c  (revision 172608)
> --- cgraph.c  (working copy)
> *** cgraph_clone_node (struct cgraph_node *n
> *** 2161,2167 
> new_node->local = n->local;
> new_node->local.externally_visible = false;
> new_node->local.local = true;
> -   new_node->local.vtable_method = false;
> new_node->global = n->global;
> new_node->rtl = n->rtl;
> new_node->count = count;
> --- 2161,2166 
> Index: cgraph.h
> ===================================================================
> *** cgraph.h  (revision 172608)
> --- cgraph.h  (working copy)
> *** struct GTY(()) cgraph_local_info {
> *** 95,104 
> /* True when the function has been originally extern inline, but it is
>redefined now.  */
> unsigned redefined_extern_inline : 1;
> - 
> -   /* True if the function is g

[SPARC] Hookize REGISTER_MOVE_COST

2011-04-17 Thread Anatoly Sokolov
Hello.

This patch removes the obsolete REGISTER_MOVE_COST macro from the SPARC
back end and introduces the equivalent TARGET_REGISTER_MOVE_COST target
hook.

Bootstrapped and regression-tested on sparc64-unknown-linux-gnu.

OK to install?

* config/sparc/sparc.h (GENERAL_OR_I64, REGISTER_MOVE_COST): Remove.
* config/sparc/sparc.c (TARGET_REGISTER_MOVE_COST): Define.
(general_or_i64_p, sparc_register_move_cost): New function.

Index: gcc/gcc/config/sparc/sparc.c
===================================================================
--- gcc/gcc/config/sparc/sparc.c    (revision 172245)
+++ gcc/gcc/config/sparc/sparc.c    (working copy)
@@ -422,6 +422,8 @@
 static rtx sparc_tls_got (void);
 static const char *get_some_local_dynamic_name (void);
 static int get_some_local_dynamic_name_1 (rtx *, void *);
+static int sparc_register_move_cost (enum machine_mode,
+reg_class_t, reg_class_t);
 static bool sparc_rtx_costs (rtx, int, int, int *, bool);
 static rtx sparc_function_value (const_tree, const_tree, bool);
 static rtx sparc_libcall_value (enum machine_mode, const_rtx);
@@ -560,6 +562,8 @@
 #define TARGET_RTX_COSTS sparc_rtx_costs
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST hook_int_rtx_bool_0
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST sparc_register_move_cost
 
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE sparc_promote_function_mode
@@ -9124,6 +9128,36 @@
 }
 }
 
+static inline bool
+general_or_i64_p (reg_class_t rclass)
+{
+  return (rclass == GENERAL_REGS || rclass == I64_REGS);
+}
+
+/* Implement TARGET_REGISTER_MOVE_COST.  */
+
+static int
+sparc_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
+ reg_class_t from, reg_class_t to)
+{
+  if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
+  || (general_or_i64_p (from) && FP_REG_CLASS_P (to))
+  || from == FPCC_REGS
+  || to == FPCC_REGS)  
+{
+  if (sparc_cpu == PROCESSOR_ULTRASPARC
+ || sparc_cpu == PROCESSOR_ULTRASPARC3
+ || sparc_cpu == PROCESSOR_NIAGARA
+ || sparc_cpu == PROCESSOR_NIAGARA2)
+   return 12;
+  else
+   return 6;
+}
+
+  return 2;
+}
+
+
 /* Emit the sequence of insns SEQ while preserving the registers REG and REG2.
This is achieved by means of a manual dynamic stack space allocation in
the current frame.  We make the assumption that SEQ doesn't contain any
Index: gcc/gcc/config/sparc/sparc.h
===================================================================
--- gcc/gcc/config/sparc/sparc.h    (revision 172245)
+++ gcc/gcc/config/sparc/sparc.h    (working copy)
@@ -1715,18 +1715,6 @@
 #define DITF_CONVERSION_LIBFUNCS   0
 #define SUN_INTEGER_MULTIPLY_64 0
 
-/* Compute extra cost of moving data between one register class
-   and another.  */
-#define GENERAL_OR_I64(C) ((C) == GENERAL_REGS || (C) == I64_REGS)
-#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2)   \
-  (((FP_REG_CLASS_P (CLASS1) && GENERAL_OR_I64 (CLASS2)) \
-|| (GENERAL_OR_I64 (CLASS1) && FP_REG_CLASS_P (CLASS2)) \
-|| (CLASS1) == FPCC_REGS || (CLASS2) == FPCC_REGS) \
-   ? ((sparc_cpu == PROCESSOR_ULTRASPARC \
-   || sparc_cpu == PROCESSOR_ULTRASPARC3 \
-   || sparc_cpu == PROCESSOR_NIAGARA \
-   || sparc_cpu == PROCESSOR_NIAGARA2) ? 12 : 6) : 2)
-
 /* Provide the cost of a branch.  For pre-v9 processors we use
a value of 3 to take into account the potential annulling of
the delay slot (which ends up being a bubble in the pipeline slot)


Anatoly.



Improve stack layout heuristic.

2011-04-17 Thread Easwaran Raman
Hi,
This patch improves the heuristic used in assigning stack locations to
stack variables.  Currently, if there are 3 variables A, B and C with
their sizes in increasing order and A and C have a conflict, it will
put A and B in a partition and C in a separate partition, with a total
frame size of sizeof(B) + sizeof(C).  This patch puts B and C in the
same partition and A in a separate partition, with a total size of
sizeof(A) + sizeof(C).  The other improvement is when
-fno-strict-aliasing is used: in that case, it is OK to share a stack
slot for a variable whose type transitively contains a union.  The
other change in this patch removes the field offset in struct
stack_var; right now, the field is always 0 due to the way it is set
in partition_stack_vars.

Bootstraps and no test regressions on x86_64-unknown-linux. OK for trunk?

-Easwaran


2011-04-17  Easwaran Raman  

* gcc/testsuite/gcc.dg/stack-layout-1.c: New test.
* gcc/testsuite/gcc.dg/stack-layout-2.c: New test.
* gcc/cfgexpand.c (struct stack_var): Remove OFFSET...
(add_stack_var): ...and its reference here...
(expand_stack_vars): ...and here.
(add_alias_set_conflicts): Add conflict to union variables
only when strict aliasing is used.
(stack_var_cmp): Sort by descending order of size.
(partition_stack_vars): Change heuristic.
(union_stack_vars): Fix to reflect changes in
partition_stack_vars.
(dump_stack_var_partition): Add newline after each partition.
Index: gcc/testsuite/gcc.dg/stack-layout-1.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-1.c   (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fdump-rtl-expand" } */
+union U {
+  int a;
+  float b;
+};
+struct A {
+  union U u1;
+  char a[100];
+};
+void bar (struct A *);
+void foo ()
+  {
+{
+  struct A a;
+  bar (&a);
+}
+{
+  struct A a;
+  bar (&a);
+}
+  }
+
+/* { dg-final { scan-rtl-dump-times "Partition" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/stack-layout-2.c
===================================================================
--- gcc/testsuite/gcc.dg/stack-layout-2.c   (revision 0)
+++ gcc/testsuite/gcc.dg/stack-layout-2.c   (revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+void bar( char *);
+int foo()
+{
+  int i=0;
+  {
+char a[8000];
+bar(a);
+i += a[0];
+  }
+  {
+char a[8192];
+char b[32];
+bar(a);
+i += a[0];
+bar(b);
+i += a[0];
+  }
+  return i;
+}
+/* { dg-final { scan-rtl-dump "size 8192" "expand" } } */
+/* { dg-final { scan-rtl-dump "size 32" "expand" } } */
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 171954)
+++ gcc/cfgexpand.c (working copy)
@@ -158,11 +158,6 @@
   /* The Variable.  */
   tree decl;

-  /* The offset of the variable.  During partitioning, this is the
- offset relative to the partition.  After partitioning, this
- is relative to the stack frame.  */
-  HOST_WIDE_INT offset;
-
   /* Initially, the size of the variable.  Later, the size of the partition,
  if this variable becomes it's partition's representative.  */
   HOST_WIDE_INT size;
@@ -267,7 +262,6 @@
   v = &stack_vars[stack_vars_num];

   v->decl = decl;
-  v->offset = 0;
   v->size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1);
   /* Ensure that all variables have size, so that &a != &b for any two
  variables that are simultaneously live.  */
@@ -372,8 +366,9 @@
 to elements will conflict.  In case of unions we have
 to be careful as type based aliasing rules may say
 access to the same memory does not conflict.  So play
-safe and add a conflict in this case.  */
- || contains_union)
+safe and add a conflict in this case when -fstrict-aliasing
+ is used.  */
+ || (contains_union && flag_strict_aliasing))
add_stack_var_conflict (i, j);
}
 }
@@ -403,9 +398,9 @@
 return (int)largeb - (int)largea;

   /* Secondary compare on size, decreasing  */
-  if (sizea < sizeb)
-return -1;
   if (sizea > sizeb)
+return -1;
+  if (sizea < sizeb)
 return 1;

   /* Tertiary compare on true alignment, decreasing.  */
@@ -564,28 +559,18 @@

 /* A subroutine of partition_stack_vars.  The UNION portion of a UNION/FIND
partitioning algorithm.  Partitions A and B are known to be non-conflicting.
-   Merge them into a single partition A.
+   Merge them into a single partition A.  */

-   At the same time, add OFFSET to all variables in partition B.  At the end
-   of the partitioning process we've have a nice 

Re: FDO usability patch -- correct insane profile data

2011-04-17 Thread Xinliang David Li
Adding an assertion sounds good to me -- the only problem is that problems
like this are hard to trigger during testing, making the assertion less
useful than it could be.  Is it better to add the assertion only when
checking is enabled, while still keeping the correction code?

Thanks,

David

On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka  wrote:
> Hi,
> hmm, looks like we both ran into the same issue and developed different fixes.
>>
>> Good point. The change above was added based on 4.4.3 where the sum
>> computation has no capping like above. Yes, with the current capping,
>> the negative value won't result. However, it is still good to guard
>> the scale independent of the implementation of ipcp_compute_node_scale
>> -- it may change and break silently (the comment of the function says
>> the implementation is wrong...)
>
> Well, if we want to add a check in anticipation that we will introduce a
> bug a few lines up, I think we could just add gcc_assert (scale < BASE);
> with a comment explaining that ipcp_compute_node_scale must give results
> in range even when the profile is not flow-consistent.
>
> Since we need to propagate the scales across the callgraph (at least for
> the correct implementation of the function), masking a bug in
> ipcp_compute_node_scale would make us propagate the nonsense further,
> silently producing unnecessarily amplified insanity.
>
> I would pre-approve a patch reverting the current change and adding the
> assert with a comment.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>>
>> >
>> > Honza
>> >
>


Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-17 Thread Denis Chertykov
2011/4/17 Denis Chertykov :
> 2011/4/15 Georg-Johann Lay :
>> Finally, I exposed alternative #3 of the insns to the register
>> allocator, because it is not possible to distinguish between
>> overlapping or non-overlapping regs, and #3 does not need a scratch.
>>
>> Ran C-testsuite with no regressions.
>
> Have you encountered any difference in code size?

I ask about code size because the IRA pass doesn't work with
`scratch:MODE' at all.
This leads to bad/wrong register allocation in the IRA pass.
The reload pass will correct such a wrong allocation, but reload can't
generate optimal code (reload generates correct code).
Because of that, maybe you are right and it may be better to have
(clobber (match_operand)) instead of (clobber (match_scratch ...)).

Denis.


Re: FDO usability patch -- correct insane profile data

2011-04-17 Thread Jan Hubicka
Hi,
hmm, looks like we both ran into the same issue and developed different fixes.
> 
> Good point. The change above was added based on 4.4.3 where the sum
> computation has no capping like above. Yes, with the current capping,
> the negative value won't result. However, it is still good to guard
> the scale independent of the implementation of ipcp_compute_node_scale
> -- it may change and break silently (the comment of the function says
> the implementation is wrong...)

Well, if we want to add a check in anticipation that we will introduce a
bug a few lines up, I think we could just add gcc_assert (scale < BASE);
with a comment explaining that ipcp_compute_node_scale must give results
in range even when the profile is not flow-consistent.
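
Concretely, something like this (a sketch; the exact placement and wording
are illustrative only):

  scale = ipcp_get_node_scale (orig_node);
  /* ipcp_compute_node_scale caps the sum of the caller counts, so the
     scale must stay in range even when the profile is not flow
     consistent.  */
  gcc_assert (scale < REG_BR_PROB_BASE);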

Since we need to propagate the scales across the callgraph (at least for
the correct implementation of the function), masking a bug in
ipcp_compute_node_scale would make us propagate the nonsense further,
silently producing unnecessarily amplified insanity.

I would pre-approve a patch reverting the current change and adding the
assert with a comment.

Honza
> 
> Thanks,
> 
> David
> 
> 
> >
> > Honza
> >


Re: FDO usability patch -- cfg and lineno checksum

2011-04-17 Thread Xinliang David Li
On Sun, Apr 17, 2011 at 8:36 AM, Jan Hubicka  wrote:
>> Hi,  in current FDO implementation, the source file version used in
>> profile-generate needs to strictly match the version used in
>> profile-use -- simple formating changes will invalidate the profile
>> data (use of it will lead to compiler error or ICE). There are two
>> main problems that lead to the weakness:
>>
>> 1) the function checksum is computed based on line numbers and the number
>> of basic blocks -- this means any line number change will invalidate
>> the profile data. In fact, line numbers should play a very minimal role
>> in the profile matching decision. Another problem is that the number of
>> basic blocks is also not a good indicator of whether the CFG matches.
>> 2) cgraph's pid is used as the key to find the matching profile data
>> for a function. If there is any new function added, or if the order of
>> the functions changes, the profile data is invalidated.
>>
>> The attached is a patch from google that improves this.
>> 1) function checksum is split into two parts: lineno checksum and cfg 
>> checksum
>> 2) function assembler name is used in profile hashing.
>
> Well, the overall idea was to make the profile intolerant to pretty much
> any source-level change, since you never know whether, even when the CFG
> shape matches, the actual probabilities have changed.
> I however see that during development it is not bad to make the compiler
> grok the profile as long as it seems consistent.
>

Making FDO usable in the development cycle is one use of the better
matching tolerance. There is another, bigger use case which we are
looking at -- using sample profile data from production binaries,
collected by the profile server, to guide optimization. Of course there
will be lots of other problems (e.g., compiler optimizations can
drastically change the control flow) before this can become reality.


> I did not look in detail into the implementation yet, but it seems
> sane at first glance.  However, what about issuing a warning when the line
> number information changes, telling the user that the profile might no
> longer be accurate and that he may want to remove it and train again?

Yes, this may be good to have.

Thanks,

David
>
> Honza
>


Re: [patch, libgfortran] PR48602 Invalid F conversion of G descriptor for values close to powers of 10

2011-04-17 Thread Steve Kargl
On Sun, Apr 17, 2011 at 10:14:37AM -0700, Jerry DeLisle wrote:
> Hi Folks,
> 
> This patch implements the adjustments required for the various rounding 
> modes as stated in the Fortran Standard F2008, paragraph 10.7.5.2.2.  In 
> the process of testing this patch we have found another bug related to D 
> and E editing, which is now PR48651.  In the meantime, this patch
> regression-tests fine on x86-64.
> 
> Test case is that provided in the PR.
> 
> OK for trunk?
> 

The patch looks ok to me.  But, if you were in the IRC
meeting, we could discuss it. :-)

-- 
Steve


Re: FDO usability patch -- correct insane profile data

2011-04-17 Thread Xinliang David Li
On Sun, Apr 17, 2011 at 8:29 AM, Jan Hubicka  wrote:
>> Hi please review the attached patch.
>>
>> Ok when bootstrap and test finish?
>>
>> Thanks,
>>
>> David
>>
>>
>>
>> 2011-04-07  Xinliang David Li  
>>
>>       * ipa-cp.c (ipcp_update_profiling): Correct
>>       negative scale factor due to insane profile data.
>
>> Index: ipa-cp.c
>> ===
>> --- ipa-cp.c  (revision 171917)
>> +++ ipa-cp.c  (working copy)
>> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
>>         scale = ipcp_get_node_scale (orig_node);
>>         node->count = orig_node->count * scale / REG_BR_PROB_BASE;
>>         scale_complement = REG_BR_PROB_BASE - scale;
>> +
>> +          /* Negative scale complement can result from insane profile data
>> +             in which the total incoming edge counts in this module is
>> +             larger than the callee's entry count. The insane profile data
>> +             usually gets generated due to the following reasons:
>> +
>> +             1) in multithreaded programs, when profile data is dumped
>> +             to gcda files in gcov_exit, some other threads are still 
>> running.
>> +             The profile counters are dumped in bottom up order (call 
>> graph).
>> +             The caller's BB counters may still be updated while the 
>> callee's
>> +             counter data is already saved to disk.
>> +
>> +             2) Comdat functions: comdat functions' profile data are not
>> +             allocated in comdat. When a comdat callee function gets inlined
>> +             at some callsites after instrumentation, and the remaining 
>> calls
>> +             to this function resolves to a comdat copy in another module,
>> +             the profile counters for this function are split. This can
>> +             result in sum of incoming edge counts from this module being
>> +             larger than callee instance's entry count.  */
>> +
>> +          if (scale_complement < 0 && flag_profile_correction)
>
> Please don't use flag_profile_correction in code like this.  Even with
> !flag_profile_correction the profile is sane only at the time it is read.
> The compiler invalidates it for various non-trivial reasons during code
> transformation.  Think of inlining
> int t(int a)
> {
>  if (a<0)
>    return 1;
>  else return 2;
> }
> that is called once with constant -1 and once with constant 0.  The
> profile says that the branch has probability 1/2, but in one inline copy
> it has probability 0 and in the other copy 1.  We simply inline with
> probability 1/2 and end up with an insane profile in the caller.

Yes -- this is the insanity due to profile update without context
sensitivity -- different from the scenario that motivates the change
-- which is insanity caused by profile collection.  Dropping the flag
check is fine.


>
> I would suggest, for brevity, also dropping the comment on why the profile
> can be insane.  We have many places like this; even if it might look a bit
> distracting at first, we could probably document that somewhere in the
> internal documentation.  Can't think of a better place to stick this.
>
> However, the test seems a bit symptomatic to me.  It always matches when
> scale > REG_BR_PROB_BASE, and such a scale is already nonsensical (i.e. a
> clone should not be executed more often than its original).  It is
> computed in:
>
> /* Compute the proper scale for NODE.  It is the ratio between the number of
>   direct calls (represented on the incoming cgraph_edges) and sum of all
>   invocations of NODE (represented as count in cgraph_node).
>
>   FIXME: This code is wrong.  Since the callers can be also clones and
>   the clones are not scaled yet, the sums gets unrealistically high.
>   To properly compute the counts, we would need to do propagation across
>   callgraph (as external call to A might imply call to non-cloned B
>   if A's clone calls cloned B).  */
> static void
> ipcp_compute_node_scale (struct cgraph_node *node)
> {
>  gcov_type sum;
>  struct cgraph_edge *cs;
>
>  sum = 0;
>  /* Compute sum of all counts of callers. */
>  for (cs = node->callers; cs != NULL; cs = cs->next_caller)
>    sum += cs->count;
>  /* Work around the unrealistically high sum problem.  We just don't want
>     the non-cloned body to have negative or very low frequency.  Since
>     majority of execution time will be spent in clones anyway, this should
>     give good enough profile.  */
>  if (sum > node->count * 9 / 10)
>    sum = node->count * 9 / 10;
>  if (node->count == 0)
>    ipcp_set_node_scale (node, 0);
>  else
>    ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
> }
>
> We already test that sum is at most 9/10th of node count and then the scale is
> computed as sum * REG_BR_PROB_BASE / node->count so the scale should not
> exceed REG_BR_PROB_BASE * 9 / 10.
>
> How can it possibly end up being greater than REG_BR_PROB_BASE at the time
> it is used?


Good point. The change above was adde

[committed] Fix handling of global registers in MIPS prologues

2011-04-17 Thread Richard Sandiford
PR 45074 showed up a rather embarrassing oversight in the MIPS backend:
global registers were still being treated as call-saved.

Fixed with the attached patch.  Tested on mips64-linux-gnu and applied.

Richard


gcc/
* config/mips/mips.c (mips_cfun_call_saved_reg_p): Handle global
registers.

gcc/testsuite/
* gcc.target/mips/reg-var-1.c: New test.

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c  2011-04-16 09:02:10.0 +0100
+++ gcc/config/mips/mips.c  2011-04-17 11:14:53.0 +0100
@@ -9097,6 +9097,11 @@ mips_interrupt_extra_call_saved_reg_p (u
 static bool
 mips_cfun_call_saved_reg_p (unsigned int regno)
 {
+  /* If the user makes an ordinarily-call-saved register global,
+ that register is no longer call-saved.  */
+  if (global_regs[regno])
+return false;
+
   /* Interrupt handlers need to save extra registers.  */
   if (cfun->machine->interrupt_handler_p
   && mips_interrupt_extra_call_saved_reg_p (regno))
Index: gcc/testsuite/gcc.target/mips/reg-var-1.c
===================================================================
--- /dev/null   2011-04-17 10:56:28.045573347 +0100
+++ gcc/testsuite/gcc.target/mips/reg-var-1.c   2011-04-17 11:22:39.0 +0100
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+register int g asm ("$18");
+
+void __attribute__((noinline))
+test (void)
+{
+  g = g + 1;
+}
+
+int
+main (void)
+{
+  g = 2;
+  test ();
+  return g != 3;
+}


PR target/45074: Check targets of multi-word operations

2011-04-17 Thread Richard Sandiford
PR45074 is a case where expand_absneg_bit needs to decompose a
multi-word negation operation into separate word_mode operations.
The target of the operation is a global FP register variable.

For various reasons, CANNOT_CHANGE_MODE_CLASS forbids changes
from DFmode to SImode for FP registers, so a direct subreg is
not allowed.  This in turn causes:

  rtx targ_piece = operand_subword (target, i, 1, mode);

to return null, and we segfault on:

  if (temp != targ_piece)
emit_move_insn (targ_piece, temp);

This patch adds a multiword_target_p function that tests whether
a given target is suitable for this kind of word_mode decomposition.
I grepped for uses of operand_subword (as opposed to operand_subword_force),
and several use the same:

 if (target == 0 || some_other_stuff)
   target = gen_reg_rtx (mode);

 for (i = 0; i < nwords; i++)
   {
     rtx targ_piece = operand_subword (target, i, 1, mode);
     ...
   }

idiom.  The patch handles those in the same way.
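
For reference, a plausible shape for the new predicate (a sketch under the
assumption that it only needs to verify that every word of TARGET is
addressable; the actual definition is in the optabs.c part of the patch
below):

  /* Return true if TARGET is suitable as the destination of a
     word-by-word operation, i.e. every word of it can be accessed
     with operand_subword.  */
  bool
  multiword_target_p (rtx target)
  {
    enum machine_mode mode = GET_MODE (target);
    unsigned int i, nwords = GET_MODE_SIZE (mode) / UNITS_PER_WORD;

    for (i = 0; i < nwords; i++)
      if (operand_subword (target, i, 1, mode) == 0)
        return false;
    return true;
  }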

Tested on mips64-linux-gnu.  OK to install?

Richard


gcc/
PR target/45074
* optabs.h (multiword_target_p): Declare.
* expmed.c (extract_bit_field_1): Check multiword_target_p when
doing multi-word operations.
* optabs.c (expand_binop): Likewise.
(expand_doubleword_bswap): Likewise.
(expand_absneg_bit): Likewise.
(expand_unop): Likewise.
(expand_copysign_bit): Likewise.
(multiword_target_p): New function.

gcc/testsuite/
PR target/45074
* gcc.target/mips/pr45074.c: New test.

Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h    2011-04-17 12:07:46.0 +0100
+++ gcc/optabs.h    2011-04-17 12:08:50.0 +0100
@@ -1054,6 +1054,8 @@ create_integer_operand (struct expand_op
   create_expand_operand (op, EXPAND_INTEGER, GEN_INT (intval), VOIDmode, 
false);
 }
 
+extern bool multiword_target_p (rtx);
+
 extern bool maybe_legitimize_operands (enum insn_code icode,
   unsigned int opno, unsigned int nops,
   struct expand_operand *ops);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c    2011-04-17 12:07:46.0 +0100
+++ gcc/expmed.c    2011-04-17 12:08:50.0 +0100
@@ -1342,7 +1342,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
   unsigned int nwords = (bitsize + (BITS_PER_WORD - 1)) / BITS_PER_WORD;
   unsigned int i;
 
-  if (target == 0 || !REG_P (target))
+  if (target == 0 || !REG_P (target) || !multiword_target_p (target))
target = gen_reg_rtx (mode);
 
   /* Indicate for flow that the entire target reg is being set.  */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c    2011-04-17 12:07:46.0 +0100
+++ gcc/optabs.c    2011-04-17 18:11:58.0 +0100
@@ -1605,7 +1605,10 @@ expand_binop (enum machine_mode mode, op
 
  /* If TARGET is the same as one of the operands, the REG_EQUAL note
 won't be accurate, so use a new target.  */
- if (target == 0 || target == op0 || target == op1)
+ if (target == 0
+ || target == op0
+ || target == op1
+ || !multiword_target_p (target))
target = gen_reg_rtx (mode);
 
  start_sequence ();
@@ -1659,7 +1662,11 @@ expand_binop (enum machine_mode mode, op
 opportunities, and second because if target and op0 happen to be MEMs
 designating the same location, we would risk clobbering it too early
 in the code sequence we generate below.  */
-  if (target == 0 || target == op0 || target == op1 || ! REG_P (target))
+  if (target == 0
+ || target == op0
+ || target == op1
+ || !REG_P (target)
+ || !multiword_target_p (target))
target = gen_reg_rtx (mode);
 
   start_sequence ();
@@ -2481,7 +2488,7 @@ expand_doubleword_bswap (enum machine_mo
   t0 = expand_unop (word_mode, bswap_optab,
operand_subword_force (op, 1, mode), NULL_RTX, true);
 
-  if (target == 0)
+  if (target == 0 || !multiword_target_p (target))
 target = gen_reg_rtx (mode);
   if (REG_P (target))
 emit_clobber (target);
@@ -2724,7 +2731,9 @@ expand_absneg_bit (enum rtx_code code, e
   if (code == ABS)
 mask = double_int_not (mask);
 
-  if (target == 0 || target == op0)
+  if (target == 0
+  || target == op0
+  || (nwords > 1 && !multiword_target_p (target)))
 target = gen_reg_rtx (mode);
 
   if (nwords > 1)
@@ -2915,7 +2924,7 @@ expand_unop (enum machine_mode mode, opt
   int i;
   rtx insns;
 
-  if (target == 0 || target == op0)
+  if (target == 0 || target == op0 || !multiword_target_p (target))
target = gen_reg_rtx (mode);
 
   start_sequence (

[patch, libgfortran] PR48602 Invalid F conversion of G descriptor for values close to powers of 10

2011-04-17 Thread Jerry DeLisle

Hi Folks,

This patch implements the adjustments required for the various rounding modes as
stated in the Fortran Standard F2008, paragraph 10.7.5.2.2.  In the process of
testing this patch we have found another bug, related to D and E editing, which
is now PR48651.  In the meantime, this patch regression-tests fine on x86-64.


Test case is that provided in the PR.
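
(For reference, the selection bound that the test case encodes in its
check_f_fmt helper, with r = 0.5, 1 or 0 for the C, U and D rounding modes
respectively: F editing with dec digits after the decimal point is chosen
when

   10**(d-1-dec) - r*10**(-dec-1) <= m < 10**(d-dec) - r*10**(-dec)

which is the F2008 10.7.5.2.2 adjustment that output_float_FMT_G now
applies.)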

OK for trunk?

Regards,

Jerry

2011-04-17  Jerry DeLisle  

PR libgfortran/48602
* io/write_float.def (output_float_FMT_G): Use current rounding mode
to set the rounding parameters. (output_float): Skip rounding
if value is zero.
Index: write_float.def
===================================================================
--- write_float.def	(revision 172502)
+++ write_float.def	(working copy)
@@ -221,6 +221,8 @@ output_float (st_parameter_dt *dtp, const fnode *f
   internal_error (&dtp->common, "Unexpected format token");
 }
 
+  if (zero_flag)
+goto skip;
   /* Round the value.  The value being rounded is an unsigned magnitude.
  The ROUND_COMPATIBLE is rounding away from zero when there is a tie.  */
   switch (dtp->u.p.current_unit->round_status)
@@ -810,7 +812,8 @@ CALCULATE_EXP(16)
m >= 10**d-0.5  Ew.d[Ee]
 
notes: for Gw.d ,  n' ' means 4 blanks
-  for Gw.dEe, n' ' means e+2 blanks  */
+	  for Gw.dEe, n' ' means e+2 blanks
+	  for rounding modes adjustment, r, See Fortran F2008 10.7.5.2.2  */
 
 #define OUTPUT_FLOAT_FMT_G(x) \
 static void \
@@ -822,7 +825,7 @@ output_float_FMT_G_ ## x (st_parameter_dt *dtp, co
   int d = f->u.real.d;\
   int w = f->u.real.w;\
   fnode *newf;\
-  GFC_REAL_ ## x rexp_d;\
+  GFC_REAL_ ## x rexp_d, r = 0.5;\
   int low, high, mid;\
   int ubound, lbound;\
   char *p, pad = ' ';\
@@ -832,9 +835,25 @@ output_float_FMT_G_ ## x (st_parameter_dt *dtp, co
   save_scale_factor = dtp->u.p.scale_factor;\
   newf = (fnode *) get_mem (sizeof (fnode));\
 \
+  switch (dtp->u.p.current_unit->round_status)\
+{\
+  case ROUND_ZERO:\
+	r = sign_bit ? 0.0 : 1.0;\
+	break;\
+  case ROUND_UP:\
+	r = 1.0;\
+	break;\
+  case ROUND_DOWN:\
+	r = 0.0;\
+	break;\
+  default:\
+	break;\
+}\
+\
   rexp_d = calculate_exp_ ## x (-d);\
-  if ((m > 0.0 && m < 0.1 - 0.05 * rexp_d) || (rexp_d * (m + 0.5) >= 1.0) ||\
-  ((m == 0.0) && !(compile_options.allow_std & GFC_STD_F2003)))\
+  if ((m > 0.0 && ((m < 0.1 - 0.1 * r * rexp_d) || (rexp_d * (m + r) >= 1.0)))\
+  || ((m == 0.0) && !(compile_options.allow_std\
+			  & (GFC_STD_F2003 | GFC_STD_F2008\
 { \
   newf->format = FMT_E;\
   newf->u.real.w = w;\
@@ -855,7 +874,7 @@ output_float_FMT_G_ ## x (st_parameter_dt *dtp, co
   GFC_REAL_ ## x temp;\
   mid = (low + high) / 2;\
 \
-  temp = (calculate_exp_ ## x (mid - 1) * (1 - 0.5 * rexp_d));\
+  temp = (calculate_exp_ ## x (mid - 1) * (1 - r * rexp_d));\
 \
   if (m < temp)\
 { \
! { dg-do run }
! PR48602 Invalid F conversion of G descriptor for values close to powers of 10
! Test case provided by Thomas Henlich
program test_g0fr
use iso_fortran_env
implicit none
integer, parameter :: RT = REAL64

call check_all(0.0_RT, 15, 2, 0)
call check_all(0.991_RT, 15, 2, 0)
call check_all(0.995_RT, 15, 2, 0)
call check_all(0.996_RT, 15, 2, 0)
call check_all(0.999_RT, 15, 2, 0)
contains
subroutine check_all(val, w, d, e)
real(kind=RT), intent(in) :: val
integer, intent(in) :: w
integer, intent(in) :: d
integer, intent(in) :: e

call check_f_fmt(val, 'C', w, d, e)
call check_f_fmt(val, 'U', w, d, e)
call check_f_fmt(val, 'D', w, d, e)
end subroutine check_all

subroutine check_f_fmt(val, roundmode, w, d, e)
real(kind=RT), intent(in) :: val
character, intent(in) :: roundmode
integer, intent(in) :: w
integer, intent(in) :: d
integer, intent(in) :: e
character(len=80) :: fmt_f, fmt_g
character(len=80) :: s_f, s_g
real(kind=RT) :: mag, lower, upper
real(kind=RT) :: r
integer :: n, dec

mag = abs(val)
if (e == 0) then
n = 4
else
n = e + 2
end if
select case (roundmode)
case('U')
r = 1.0_RT
case('D')
r = 0.0_RT
case('C')
r = 0.5_RT
end select

if (mag == 0) then
write(fmt_f, "('R', a, ',F', i0, '.', i0, ',', i0, 'X')") &
    roundmode, w - n, d - 1, n
else
do dec = d, 0, -1
lower = 10.0_RT ** (d - 1 - dec) - r * 10.0_RT ** (- dec - 1)
upper = 10.0_RT ** (d - dec) - r * 10.0_RT ** (- dec)
if (lower <= mag .and. mag < upper) then
write(fmt_f, "('R', a, ',F', i0, '.', i0, ',', i0, 'X')") &
    roundmode, w - n, dec, n
exit
end if
   

Remove finalized_by_frontend

2011-04-17 Thread Jan Hubicka
Comitted as obvious.

Index: ChangeLog
===================================================================
--- ChangeLog   (revision 172613)
+++ ChangeLog   (working copy)
@@ -1,5 +1,12 @@
 2011-04-17  Jan Hubicka  
 
+   * cgraph.h (struct cgraph_node): Remove finalized_by_frontend.
+   * cgraphunit.c (cgraph_finalize_function): Do not set
+   finalized_by_frontend.
+   * lto-cgraph.c (lto_output_node, input_overwrite_node): Do not stream
+   finalized_by_frontend.
+
+2011-04-17  Jan Hubicka  
+
* cgraph.c (cgraph_clone_node): Do not handle vtable_method
* cgraph.h (struct cgraph_local_info): Drop vtable_method.
* cgraphunit.c (cgraph_copy_node_for_versioning): Drop vtable_method.
Index: cgraph.h
===================================================================
--- cgraph.h    (revision 172613)
+++ cgraph.h    (working copy)
@@ -239,8 +239,6 @@ struct GTY((chain_next ("%h.next"), chai
   unsigned process : 1;
   /* Set for aliases once they got through assemble_alias.  */
   unsigned alias : 1;
-  /* Set for nodes that was constructed and finalized by frontend.  */
-  unsigned finalized_by_frontend : 1;
   /* Set for alias and thunk nodes, same_body points to the node they are alias
  of and they are linked through the next/previous pointers.  */
   unsigned same_body_alias : 1;
Index: cgraphunit.c
===
--- cgraphunit.c(revision 172613)
+++ cgraphunit.c(working copy)
@@ -352,7 +352,6 @@ cgraph_finalize_function (tree decl, boo
   notice_global_symbol (decl);
   node->local.finalized = true;
   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
-  node->finalized_by_frontend = true;
 
   if (cgraph_decide_is_function_needed (node, decl))
 cgraph_mark_needed_node (node);
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 172613)
+++ lto-cgraph.c(working copy)
@@ -502,7 +502,6 @@ lto_output_node (struct lto_simple_outpu
   bp_pack_value (&bp, node->lowered, 1);
   bp_pack_value (&bp, in_other_partition, 1);
   bp_pack_value (&bp, node->alias, 1);
-  bp_pack_value (&bp, node->finalized_by_frontend, 1);
   bp_pack_value (&bp, node->frequency, 2);
   bp_pack_value (&bp, node->only_called_at_startup, 1);
   bp_pack_value (&bp, node->only_called_at_exit, 1);
@@ -948,7 +947,6 @@ input_overwrite_node (struct lto_file_de
   TREE_STATIC (node->decl) = 0;
 }
   node->alias = bp_unpack_value (bp, 1);
-  node->finalized_by_frontend = bp_unpack_value (bp, 1);
   node->frequency = (enum node_frequency)bp_unpack_value (bp, 2);
   node->only_called_at_startup = bp_unpack_value (bp, 1);
   node->only_called_at_exit = bp_unpack_value (bp, 1);


Remove vtable_method field in cgraph_node

2011-04-17 Thread Jan Hubicka
Hi,
this patch drops the vtable_method field. I never understood what it is
about, but reading PR20991 I am convinced that we hit the same problem
with the work on the new devirtualization code.
I implemented what Mark describes as the "ideal solution" there - i.e.
teaching cgraph that virtual functions might become reachable until after
inlining.  Since we can still devirtualize in late compilation (that is
pretty much pointless, but anyway), we now have
can_refer_decl_in_current_unit_p that tests whether the function has
already been thrown away.  Perhaps we might apply there the observation
about the vtable being output somewhere, but I do not think it is safe:
if the vtable is COMDAT, we can pretty much also optimize all references
to it out in all compilation units and end up not outputting it.
When the vtable is not COMDAT, the methods won't be either, and this
trick will not apply.

Consequently I am dropping the flag. This is very trivial, based on the
observation that cp_fold_obj_type_ref, the only setter of the flag, is
now dead. Plus the varasm code is no longer executed at the time of the
IPA optimizations that the original patch was fixing.

Martin,
can you please look into why cp_fold_obj_type_ref is no longer used and if 
possible
get rid of it?

Richi,
compiling the original testcase:
// PR middle-end/20991
// { dg-options "-O2" }
// { dg-do compile }

struct S
{
  virtual inline int foo () const;
  virtual inline bool bar () const;
  virtual int baz (int) const;
};

inline int S::foo () const
{
  return 1;
}

inline bool S::bar () const
{
  return foo () == 0;
}

void A ()
{
  S s;
  if (s.bar ())
s.foo ();
}

void B ()
{
  S s;
  if (s.bar ())
s.foo ();
}

we end up with somewhat amusing code:
void B() ()
{ 
  int D.2147;
  struct S s;

:
  s._vptr.S = &_ZTV1S[2];
  D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);
  return;

}

Notice that OBJ_TYPE_REF is useless since foo got devirtualized, and it
keeps the useless s._vptr.S = &_ZTV1S[2] alive.  We want to drop the
OBJ_TYPE_REF at the time we fold the first argument to a constant.

a.C.027t.fre1 dump reads:
void B() ()
{
  int (*__vtbl_ptr_type) () * D.2149;
  int (*__vtbl_ptr_type) () D.2148;
  int D.2147;
  bool D.2146;
  struct S s;

:
  s._vptr.S = &_ZTV1S[2];
  D.2149_6 = &_ZTV1S[2];
  D.2148_7 = foo;
  D.2147_8 = OBJ_TYPE_REF(D.2148_7;&s->0) (&s);
  D.2146_9 = D.2147_8 == 0;
  D.2146_12 = D.2146_9;
  D.2146_1 = D.2146_12;
  D.2146_2 = D.2146_1;
  return;

}

Already I would expect FRE to replace D.2148_7 by foo.  Later this is
done by copyprop1:

Folding statement: D.2147_8 = OBJ_TYPE_REF(D.2148_7;&s->0) (&s);
Folded into: D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);

void B() ()
{
  int (*__vtbl_ptr_type) () * D.2149;
  int (*__vtbl_ptr_type) () D.2148;
  int D.2147;
  bool D.2146;
  struct S s;

:
  s._vptr.S = &_ZTV1S[2];
  D.2147_8 = OBJ_TYPE_REF(foo;&s->0) (&s);
  return;

}

When copyprop chooses to do constant propagation for whatever reason, it
probably should do the folding, too, either in substitute_and_fold or via
its own ccp_fold_stmt.  I am sure you know the best alternatives better
than I do ;)

Honza

* cgraph.c (cgraph_clone_node): Do not handle vtable_method
* cgraph.h (struct cgraph_local_info): Drop vtable_method.
* cgraphunit.c (cgraph_copy_node_for_versioning): Drop vtable_method.
* lto-cgraph.c (lto_output_node, input_overwrite_node): Drop
vtable_method.
* gimple-fold.c (can_refer_decl_in_current_unit_p): Mention PR20991 in
gimple-fold.c
* varasm.c (mark_decl_referenced): Drop vtable_method handling code.
* cp/class.c (cp_fold_obj_type_ref): Drop vtable_method.
Index: cgraph.c
===
*** cgraph.c(revision 172608)
--- cgraph.c(working copy)
*** cgraph_clone_node (struct cgraph_node *n
*** 2161,2167 
new_node->local = n->local;
new_node->local.externally_visible = false;
new_node->local.local = true;
-   new_node->local.vtable_method = false;
new_node->global = n->global;
new_node->rtl = n->rtl;
new_node->count = count;
--- 2161,2166 
Index: cgraph.h
===
*** cgraph.h(revision 172608)
--- cgraph.h(working copy)
*** struct GTY(()) cgraph_local_info {
*** 95,104 
/* True when the function has been originally extern inline, but it is
   redefined now.  */
unsigned redefined_extern_inline : 1;
- 
-   /* True if the function is going to be emitted in some other translation
-  unit, referenced from vtable.  */
-   unsigned vtable_method : 1;
  };
  
  /* Information about the function that needs to be computed globally
--- 95,100 
Index: cgraphunit.c
===
*** cgraphunit.c(revision 172608)
--- cgraphunit.c(working copy)
*** cgraph_copy_node_for_versioning (struct 
*** 1991,1997 
 new_version->local = old_v

Re: FDO usability patch -- cfg and lineno checksum

2011-04-17 Thread Jan Hubicka
> Hi,  in current FDO implementation, the source file version used in
> profile-generate needs to strictly match the version used in
> profile-use -- simple formating changes will invalidate the profile
> data (use of it will lead to compiler error or ICE). There are two
> main problems that lead to the weakness:
> 
> 1) the function checksum is computed based on line number and number
> of basic blocks -- this means any line number change will invalidate
> the profile data. In fact, line number should play very minimal role
> in profile matching decision. Another problem is that number of basic
> blocks is also not a good indicator whether CFG matches or not.
> 2) cgraph's pid is used as the key to find the matching profile data
> for a function. If there is any new function added, or if the order of
> the functions changes, the profile data is invalidated.
> 
> The attached is a patch from google that improves this.
> 1) function checksum is split into two parts: lineno checksum and cfg checksum
> 2) function assembler name is used in profile hashing.

Well, the overall idea was to make the profile intolerant to pretty much
any source-level change, since even when the CFG shape matches, you never
know whether the actual probabilities changed.
I however see that during development it is not bad to make the compiler
grok the profile as long as it seems consistent.

I have not looked in detail into the implementation yet, but it seems
sane at first glance.  However, what about issuing a warning when the
line number information changes, telling the user that the profile might
no longer be accurate and that he may want to remove it and train again?
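
For the record, the core idea can be sketched in a few lines (hypothetical
code, not the google patch itself): hash the CFG's edge structure instead
of line numbers, so that formatting-only edits keep the checksum stable
while real CFG changes do not:

#include <stdint.h>

struct cfg_edge { int src, dest; };

static uint32_t
crc32_word (uint32_t crc, uint32_t v)
{
  int i;
  crc ^= v;
  for (i = 0; i < 32; i++)
    crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
  return crc;
}

uint32_t
cfg_checksum (const struct cfg_edge *edges, int n_edges)
{
  uint32_t crc = 0xFFFFFFFFu;
  int i;
  for (i = 0; i < n_edges; i++)
    {
      crc = crc32_word (crc, edges[i].src);
      crc = crc32_word (crc, edges[i].dest);
    }
  return crc ^ 0xFFFFFFFFu;
}

A separate lineno checksum would then hash only the line numbers, so the
two can be checked independently when deciding whether to warn or reject.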

Honza


Re: FDO usability patch -- correct insane profile data

2011-04-17 Thread Jan Hubicka
> Hi please review the attached patch.
> 
> Ok when bootstrap and test finish?
> 
> Thanks,
> 
> David
> 
> 
> 
> 2011-04-07  Xinliang David Li  
> 
>   * ipa-cp.c (ipcp_update_profiling): Correct
>   negative scale factor due to insane profile data.

> Index: ipa-cp.c
> ===
> --- ipa-cp.c  (revision 171917)
> +++ ipa-cp.c  (working copy)
> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
> scale = ipcp_get_node_scale (orig_node);
> node->count = orig_node->count * scale / REG_BR_PROB_BASE;
> scale_complement = REG_BR_PROB_BASE - scale;
> +
> +  /* Negative scale complement can result from insane profile data
> + in which the total incoming edge counts in this module is
> + larger than the callee's entry count. The insane profile data
> + usually gets generated due to the following reasons:
> +
> + 1) in multithreaded programs, when profile data is dumped
> + to gcda files in gcov_exit, some other threads are still 
> running.
> + The profile counters are dumped in bottom up order (call graph).
> + The caller's BB counters may still be updated while the callee's
> + counter data is already saved to disk.
> +
> + 2) Comdat functions: comdat functions' profile data are not
> + allocated in comdat. When a comdat callee function gets inlined
> + at some callsites after instrumentation, and the remaining calls
> + to this function resolves to a comdat copy in another module,
> + the profile counters for this function are split. This can
> + result in sum of incoming edge counts from this module being
> + larger than callee instance's entry count.  */
> +
> +  if (scale_complement < 0 && flag_profile_correction)

Please don't use flag_profile_correction in code like this. Even with
!flag_profile_correction the profile is sane only at the time it is read.
The compiler invalidates it for various non-trivial reasons during the
code transformations.  Think of inlining
int t(int a)
{
  if (a<0)
    return 1;
  else
    return 2;
}
that is called once with constant -1 and once with constant 0.  The
profile says that the branch has probability 1/2, but in one inline copy
it has probability 0 and in the other copy 1.  We simply inline with
probability 1/2 and end up with an insane profile in the caller.

I would suggest, for brevity, to also drop the comment on why the profile
can be insane. We have many places like this; even if it might look a bit
distracting at first, we probably could document that somewhere in the
internal documentation. Can't think of a better place to stick this.

However, the test seems a bit symptomatic to me. It always matches when
scale > REG_BR_PROB_BASE, and such a scale is already nonsensical (i.e.
the clone should not be executed more often than its original).  It is
computed in:

/* Compute the proper scale for NODE.  It is the ratio between the number of
   direct calls (represented on the incoming cgraph_edges) and sum of all
   invocations of NODE (represented as count in cgraph_node).

   FIXME: This code is wrong.  Since the callers can be also clones and
   the clones are not scaled yet, the sums gets unrealistically high.
   To properly compute the counts, we would need to do propagation across
   callgraph (as external call to A might imply call to non-cloned B
   if A's clone calls cloned B).  */
static void
ipcp_compute_node_scale (struct cgraph_node *node)
{ 
  gcov_type sum;
  struct cgraph_edge *cs;

  sum = 0;
  /* Compute sum of all counts of callers. */
  for (cs = node->callers; cs != NULL; cs = cs->next_caller)
sum += cs->count;
  /* Work around the unrealistically high sum problem.  We just don't want
 the non-cloned body to have negative or very low frequency.  Since
 majority of execution time will be spent in clones anyway, this should
 give good enough profile.  */
  if (sum > node->count * 9 / 10)
sum = node->count * 9 / 10;
  if (node->count == 0)
ipcp_set_node_scale (node, 0);
  else
ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
}

We already test that sum is at most 9/10 of the node count, and then the
scale is computed as sum * REG_BR_PROB_BASE / node->count, so the scale
should not exceed REG_BR_PROB_BASE * 9 / 10.
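
For concreteness (made-up numbers, with REG_BR_PROB_BASE being 10000):
with node->count == 1000 and the caller edge counts summing to 2000, sum
gets clamped to 900 and the scale becomes 900 * 10000 / 1000 == 9000,
safely below REG_BR_PROB_BASE.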

How can it possibly end up being greater than REG_BR_PROB_BASE at the
time it is used?

Honza


Re: FDO usability trivial patch to fix div by zero error with insane profile

2011-04-17 Thread Jan Hubicka
> On Fri, Apr 15, 2011 at 15:20, Xinliang David Li  wrote:
> > This is a trivial patch to deal with bad profile data (from
> > multi-threaded programs) that leads to divide by zero error.
> >
> > Ok for trunk?
> >
> > Thanks,
> >
> > David
> >
> >
> >
> > 2011-04-15  Xinliang David Li  
> >
> >        * ipa-inline.c (cgraph_decide_recursive_inlining): Fix
> >        div by zero error with insane profile.
> 
> Looks fine to me, but you really want Honza to review this.  Any
> chance we can get a test case for this?

Yes, this is OK for release branches.
On mainline I already fixed it with the current inliner reorg.  The weird
thing is that the problem reproduces with the sqlite build when the
profile is not present and thus all counters should be zero.  Since the
division is already guarded by max_count being 0, something must be
setting a non-zero count on some cgraph edge, which is wrong.  I will
need to investigate this...

Thanks,
Honza
> 
> 
> Diego.


Fix PR lto/48538

2011-04-17 Thread Eric Botcazou
This fixes profiled LTO bootstrap with Ada enabled on the 4.6 branch.  The bug 
is a segfault in merge_profile_summaries:

Program received signal SIGSEGV, Segmentation fault.
merge_profile_summaries (file_data_vec=0x718c9000)
at /home/eric/svn/gcc-4_6-branch/gcc/lto-cgraph.c:1504
1504if (node->local.lto_file_data->profile_info.runs)

(gdb) p debug_generic_expr(node->decl)
gnat1drv__check_library_items

gdb) p node->local.lto_file_data
$4 = (struct lto_file_decl_data *) 0x0

It turns out that the other accesses to node->local.lto_file_data are guarded, 
for example in materialize_cgraph:

  for (node = cgraph_nodes; node; node = node->next)
{
  if (node->local.lto_file_data)
{
  lto_materialize_function (node);
  lto_stats.num_input_cgraph_nodes++;
}
}

The reason is given in a comment in input_cgraph:

  /* Some nodes may have been created by cgraph_node.  This
 happens when the callgraph contains nested functions.  If the
 node for the parent function was never emitted to the gimple
 file, cgraph_node will create a node for it when setting the
 context of the nested function.  */
  if (node->local.lto_file_data)
node->aux = NULL;

In this case, the nested function is gnat1drv__check_library_items__action.

Fixed by adding the missing guard, profiled-LTO-bootstrapped on x86_64/Linux, 
applied on the mainline and 4.6 branch as obvious.


2011-04-17  Eric Botcazou  

PR lto/48538
* lto-cgraph.c (merge_profile_summaries): Check that lto_file_data
is non-null before accessing it.
(input_cgraph): Remove trailing spaces.


-- 
Eric Botcazou
Index: lto-cgraph.c
===
--- lto-cgraph.c	(revision 172603)
+++ lto-cgraph.c	(working copy)
@@ -1463,7 +1463,8 @@ merge_profile_summaries (struct lto_file
  During LTRANS we already have values of count_materialization_scale
  computed, so just update them.  */
   for (node = cgraph_nodes; node; node = node->next)
-if (node->local.lto_file_data->profile_info.runs)
+if (node->local.lto_file_data
+	&& node->local.lto_file_data->profile_info.runs)
   {
 	int scale;
 
@@ -1535,8 +1536,8 @@ input_cgraph (void)
   VEC_free (cgraph_node_ptr, heap, nodes);
   VEC_free (varpool_node_ptr, heap, varpool);
 }
+
   merge_profile_summaries (file_data_vec);
-
 
   /* Clear out the aux field that was used to store enough state to
  tell which nodes should be overwritten.  */


Re: Inliner heuristics facelift

2011-04-17 Thread Jan Hubicka
> > For me, I get everything inlined except for:
> 
> Oops! looking at my mail, I realized that I did not timed fatigue, but ac.
> With the patch the only difference for fatigue is that the threshold for
> full inlining decreased from 125 to 113.

See, we make progress. 10 more patches like this and you can set the
threshold to 0 ;)

This is expected behaviour - the patch changes the interpretation of the
inline bounds from limits on function sizes to limits on estimated
function sizes after inlining. Those 12 instructions account for the
function call overhead.

Honza
> 
> Sorry for the noise.
> 
> Dominique


Re: [PATCH] Fix SLP vectorization of shifts (PR tree-optimization/48616)

2011-04-17 Thread Jakub Jelinek
On Sun, Apr 17, 2011 at 11:30:31AM +0300, Ira Rosen wrote:
> We already have this check in vect_build_slp_tree(). It didn't work for the
> testcase because it doesn't take into account the type of definition. I
> agree that it's better to move it here and base the vector/scalar  decision
> on it, but please remove the now redundant check from vect_build_slp_tree
> ().

I've tried to add this to my patch, unfortunately slp-36.c test then fails
on both x86_64-linux and i686-linux.

The problem is that during the analysis vectorizable_shift is called with
NULL slp_node, but later on it is called with non-NULL:

#0  vect_build_slp_tree (loop_vinfo=0x91dae60, bb_vinfo=0x0, node=0xcba8, 
group_size=2, inside_cost=0xcc2c, outside_cost=0xcc30, 
ncopies_for_cost=1, max_nunits=0xcc34, load_permutation=0xcc38, 
loads=0xcc3c, vectorization_factor=4)
at ../../gcc/tree-vect-slp.c:428
#1  0x086f9a1f in vect_build_slp_tree (loop_vinfo=0x91dae60, bb_vinfo=0x0, 
node=0xcc28, group_size=2, inside_cost=0xcc2c, 
outside_cost=0xcc30, ncopies_for_cost=1, max_nunits=0xcc34, 
load_permutation=0xcc38, loads=0xcc3c, vectorization_factor=4)
at ../../gcc/tree-vect-slp.c:648
#2  0x086fae4c in vect_analyze_slp_instance (loop_vinfo=0x91dae60, 
bb_vinfo=0x0, stmt=0xf7d97900) at ../../gcc/tree-vect-slp.c:1212
#3  0x086fbf17 in vect_analyze_slp (loop_vinfo=0x91dae60, bb_vinfo=0x0) at 
../../gcc/tree-vect-slp.c:1303
#4  0x086eef7b in vect_analyze_loop_2 (loop=0xf7d00b7c) at 
../../gcc/tree-vect-loop.c:1523
#5  vect_analyze_loop (loop=0xf7d00b7c) at ../../gcc/tree-vect-loop.c:1585

#0  vectorizable_shift (stmt=0xf7d88bc8, gsi=0x0, vec_stmt=0x0, slp_node=0x0) 
at ../../gcc/tree-vect-stmts.c:2053
#1  0x086e1679 in vect_analyze_stmt (stmt=0xf7d88bc8, 
need_to_vectorize=0xcd08 "\001", node=0x0) at 
../../gcc/tree-vect-stmts.c:4719
#2  0x086ef1c9 in vect_analyze_loop_operations (loop=0xf7d00b7c) at 
../../gcc/tree-vect-loop.c:1279
#3  vect_analyze_loop_2 (loop=0xf7d00b7c) at ../../gcc/tree-vect-loop.c:1536
#4  vect_analyze_loop (loop=0xf7d00b7c) at ../../gcc/tree-vect-loop.c:1585

#0  vectorizable_shift (stmt=0xf7d88bc8, gsi=0xcb80, vec_stmt=0xcb0c, 
slp_node=0x91dbc60) at ../../gcc/tree-vect-stmts.c:2053
#1  0x086e0dec in vect_transform_stmt (stmt=0xf7d88bc8, gsi=0xcb80, 
strided_store=0xcb8f "", slp_node=0x91dbc60, 
slp_node_instance=0x91dbd00) at ../../gcc/tree-vect-stmts.c:4813
#2  0x086f77bf in vect_schedule_slp_instance (node=Unhandled dwarf expression 
opcode 0xf3) at ../../gcc/tree-vect-slp.c:2494
#3  0x086f765c in vect_schedule_slp_instance (node=0x91dbc08, 
instance=0x91dbd00, vectorization_factor=2) at ../../gcc/tree-vect-slp.c:2431
#4  0x086fca9f in vect_schedule_slp (loop_vinfo=0x91dae60, bb_vinfo=0x0) at 
../../gcc/tree-vect-slp.c:2523
#5  0x086f0687 in vect_transform_loop (loop_vinfo=0x91dae60) at 
../../gcc/tree-vect-loop.c:4891
#6  0x086fd685 in vectorize_loops () at ../../gcc/tree-vectorizer.c:205

--- gcc/tree-vect-slp.c.jj  2010-12-27 10:18:14.0 +0100
+++ gcc/tree-vect-slp.c 2011-04-17 13:06:48.037388449 +0200
@@ -321,12 +321,10 @@ vect_build_slp_tree (loop_vec_info loop_
   enum tree_code first_stmt_code = ERROR_MARK, rhs_code = ERROR_MARK;
   tree first_stmt_def1_type = NULL_TREE, first_stmt_def0_type = NULL_TREE;
   tree lhs;
-  bool stop_recursion = false, need_same_oprnds = false;
-  tree vectype, scalar_type, first_op1 = NULL_TREE;
+  bool stop_recursion = false;
+  tree vectype, scalar_type;
   unsigned int ncopies;
   optab optab;
-  int icode;
-  enum machine_mode optab_op2_mode;
   enum machine_mode vec_mode;
   tree first_stmt_const_oprnd = NULL_TREE;
   struct data_reference *first_dr;
@@ -433,20 +431,6 @@ vect_build_slp_tree (loop_vec_info loop_
fprintf (vect_dump, "Build SLP failed: no optab.");
  return false;
}
- icode = (int) optab_handler (optab, vec_mode);
- if (icode == CODE_FOR_nothing)
-   {
- if (vect_print_dump_info (REPORT_SLP))
-   fprintf (vect_dump, "Build SLP failed: "
-   "op not supported by target.");
- return false;
-   }
- optab_op2_mode = insn_data[icode].operand[2].mode;
- if (!VECTOR_MODE_P (optab_op2_mode))
-   {
- need_same_oprnds = true;
- first_op1 = gimple_assign_rhs2 (stmt);
-   }
}
}
}
@@ -472,19 +456,6 @@ vect_build_slp_tree (loop_vec_info loop_
 
  return false;
}
-
- if (need_same_oprnds
- && !operand_equal_p (first_op1, gimple_assign_rhs2 (stmt), 0))
-   {
- if (vect_print_dump_info (REPORT_SLP))
-   {
- fprintf (vect_dump,
-  

Re: Inliner heuristics facelift

2011-04-17 Thread Dominique Dhumieres
> For me, I get everything inlined except for:

Oops! looking at my mail, I realized that I did not timed fatigue, but ac.
With the patch the only difference for fatigue is that the threshold for
full inlining decreased from 125 to 113.

Sorry for the noise.

Dominique


Re: [5/9] Main target-independent support for direct interleaving

2011-04-17 Thread Ira Rosen
gcc-patches-ow...@gcc.gnu.org wrote on 12/04/2011 04:59:16 PM:

>
> This patch adds vec_load_lanes and vec_store_lanes optabs for
instructions
> like NEON's vldN and vstN.  The optabs are defined this way because the
> vectors must be allocated to a block of consecutive registers.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

The vectorizer part is fine with me except for:


> @@ -685,9 +761,11 @@ vect_model_store_cost (stmt_vec_info stm
>first_dr = STMT_VINFO_DATA_REF (stmt_info);
>  }
>
> -  /* Is this an access in a group of stores, which provide strided
access?
> - If so, add in the cost of the permutes.  */
> -  if (group_size > 1)
> +  /* We assume that the cost of a single store-lanes instruction is
> + equivalent to the cost of GROUP_SIZE separate stores.  If a strided
> + access is instead being provided by a load-and-permute operation,

I think it should be 'permute-and-store' and not 'load-and-permute'.

> + include the cost of the permutes.  */
> +  if (!store_lanes_p && group_size > 1)
>  {
>/* Uses a high and low interleave operation for each needed
> permute.  */
>inside_cost = ncopies * exact_log2(group_size) * group_size


Thanks,
Ira



New German PO file for 'gcc' (version 4.6.0)

2011-04-17 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-4.6.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[Patch, fortran] PR48462 - [4.6/4.7 Regression] realloc on assignment: matmul Segmentation Fault with Allocatable Array

2011-04-17 Thread Paul Richard Thomas
This is the last of three regressions caused by my introduction of
reallocation on assignment.  The comments in the patch adequately
explain what is being done.

Bootstrapped and regtested on FC9/x86_64 - OK for trunk and 4.6?

Paul

2011-04-17  Paul Thomas  

PR fortran/48462
* trans-expr.c (fcncall_realloc_result): Renamed version of
realloc_lhs_bounds_for_intrinsic_call that does not touch the
descriptor bounds anymore but makes a temporary descriptor to
hold the result.
(gfc_trans_arrayfunc_assign): Modify the reference to above
renamed function.

2011-04-17  Paul Thomas  

PR fortran/48462
* gfortran.dg/realloc_on_assign_7.f03: New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c	(revision 172607)
--- gcc/fortran/trans-expr.c	(working copy)
*** realloc_lhs_loop_for_fcn_call (gfc_se *s
*** 5528,5582 
  }
  
  
  static void
! realloc_lhs_bounds_for_intrinsic_call (gfc_se *se, int rank)
  {
tree desc;
tree tmp;
-   tree offset;
-   int n;
  
!   /* Use the allocation done by the library.  */
desc = build_fold_indirect_ref_loc (input_location, se->expr);
tmp = gfc_conv_descriptor_data_get (desc);
tmp = gfc_call_free (fold_convert (pvoid_type_node, tmp));
!   gfc_add_expr_to_block (&se->pre, tmp);
!   gfc_conv_descriptor_data_set (&se->pre, desc, null_pointer_node);
/* Unallocated, the descriptor does not have a dtype.  */
tmp = gfc_conv_descriptor_dtype (desc);
!   gfc_add_modify (&se->pre, tmp, gfc_get_dtype (TREE_TYPE (desc)));
! 
!   offset = gfc_index_zero_node;
!   tmp = gfc_index_one_node;
!   /* Now reset the bounds from zero based to unity based.  */
!   for (n = 0 ; n < rank; n++)
! {
!   /* Accumulate the offset.  */
!   offset = fold_build2_loc (input_location, MINUS_EXPR,
! gfc_array_index_type,
! offset, tmp);
!   /* Now do the bounds.  */
!   gfc_conv_descriptor_offset_set (&se->post, desc, tmp);
!   tmp = gfc_conv_descriptor_ubound_get (desc, gfc_rank_cst[n]);
!   tmp = fold_build2_loc (input_location, PLUS_EXPR,
! 			 gfc_array_index_type,
! 			 tmp, gfc_index_one_node);
!   gfc_conv_descriptor_lbound_set (&se->post, desc,
!   gfc_rank_cst[n],
!   gfc_index_one_node);
!   gfc_conv_descriptor_ubound_set (&se->post, desc,
!   gfc_rank_cst[n], tmp);
! 
!   /* The extent for the next contribution to offset.  */
!   tmp = fold_build2_loc (input_location, MINUS_EXPR,
! 			 gfc_array_index_type,
! 			 gfc_conv_descriptor_ubound_get (desc, gfc_rank_cst[n]),
! 			 gfc_conv_descriptor_lbound_get (desc, gfc_rank_cst[n]));
!   tmp = fold_build2_loc (input_location, PLUS_EXPR,
! 			 gfc_array_index_type,
! 			 tmp, gfc_index_one_node);
! }
!   gfc_conv_descriptor_offset_set (&se->post, desc, offset);
  }
  
  
--- 5528,5565 
  }
  
  
+ /* For Assignment to a reallocatable lhs from intrinsic functions,
+replace the se.expr (ie. the result) with a temporary descriptor.
+Null the data field so that the library allocates space for the
+result. Free the data of the original descriptor after the function,
+in case it appears in an argument expression and transfer the
+result to the original descriptor.  */
+ 
  static void
! fcncall_realloc_result (gfc_se *se)
  {
tree desc;
+   tree res_desc;
tree tmp;
  
!   /* Use the allocation done by the library.  Substitute the lhs
!  descriptor with a copy, whose data field is nulled.*/
desc = build_fold_indirect_ref_loc (input_location, se->expr);
+   res_desc = gfc_evaluate_now (desc, &se->pre);
+   gfc_conv_descriptor_data_set (&se->pre, res_desc, null_pointer_node);
+   se->expr = gfc_build_addr_expr (TREE_TYPE (se->expr), res_desc);
+ 
+   /* Free the lhs after the function call and copy the result data to
+  it.  */
tmp = gfc_conv_descriptor_data_get (desc);
tmp = gfc_call_free (fold_convert (pvoid_type_node, tmp));
!   gfc_add_expr_to_block (&se->post, tmp);
!   tmp = gfc_conv_descriptor_data_get (res_desc);
!   gfc_conv_descriptor_data_set (&se->post, desc, tmp);
! 
/* Unallocated, the descriptor does not have a dtype.  */
tmp = gfc_conv_descriptor_dtype (desc);
!   gfc_add_modify (&se->post, tmp, gfc_get_dtype (TREE_TYPE (desc)));
  }
  
  
*** gfc_trans_arrayfunc_assign (gfc_expr * e
*** 5646,5652 
  	  ss->is_alloc_lhs = 1;
  	}
else
! 	realloc_lhs_bounds_for_intrinsic_call (&se, expr1->rank);
  }
  
gfc_conv_function_expr (&se, expr2);
--- 5629,5635 
  	  ss->is_alloc_lhs = 1;
  	}
else
! 	fcncall_realloc_result (&se);
  }
  
gfc_conv_function_expr (&se, expr2);
Index: gcc/testsuite/gfortran.dg/realloc_on_assign_7.f03
===
*** gcc/testsuite/gfortran.dg/realloc_on_assign_7.f

Re: Inliner heuristics facelift

2011-04-17 Thread Jan Hubicka
> Honza,
> 
> Your patch seems to make --param max-inline-insns-auto= ineffective:
> 
> gfc -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer --param 
> max-inline-insns-auto=4000 -fwhole-program -fstack-arrays ac.f90
> 8.105u 0.005s 0:08.11 99.8%   0+0k 0+5io 0pf+0w
> 
> while the timing was ~2.7s with a 125 threshold before the patch.

For me, I get everything inlined except for:
  not inlinable: main/5 -> MAIN__/4, --param large-stack-frame-growth limit
reached
and indeed in my program there are only the main and MAIN functions.  You
may try whether increasing large-stack-frame-growth makes the regression
go away, but in a way I doubt it, since main() is:
main (integer(kind=4) argc, character(kind=1) * * argv)
{ 
  static integer(kind=4) options.90[8] = {68, 511, 0, 0, 0, 1, 0, 1};

:
  _gfortran_set_args (argc_1(D), argv_2(D));
  _gfortran_set_options (8, &options.90[0]);
  MAIN__ ();
  return 0;

}
and inlining into it is not the most fruitful thing anyway.

The --param max-inline-insns-auto needed for me is also way below 125.
If I recall right, the function is estimated to 94 with 14 of benefit,
so it should be about 80.

Any idea what makes the regression for you? Can you confirm that you also
get everything inlined? I can't easily reproduce your result since I
don't have a graphite-enabled setup.

I think I should add a -Winline-all that will report the reasons for not
inlining functions that are not declared inline, too, so these things are
easier to look at.

Honza
> 
> Cheers,
> 
> Dominique


Re: More of ipa-inline housekeeping

2011-04-17 Thread Jan Hubicka
> >
> > I never really tuned the stack frame growth heuristics since it did not 
> > cause any problems
> > in the benchmarks. On fortran this is quite different because of the large 
> > i/o blocks
> > hitting it very commonly, so I will look into making it more permissive.  
> > We definitely
> > can just bump up the limits and/or we can also teach it that if call 
> > dominates the return
> > there is not really much to save of stack usage by preventing inlining 
> > since both stack
> > frames will wind up on the stack anyway.
> 
> I think Micha has a fix for the I/O block issue.

Good, how did he fix it?
In any case I sort of like the idea of tracking whether the function call
always happens and bypassing the limits in that case.  There seem to be
cases of functions called once that are not inlined for this reason, and
there is no danger in doing so.  I saw that in the combine.c dump as well
as in the ac polyhedron benchmark I just looked at.  In the case of
combine.c it is definitely because of standard non-scalar local vars.
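
A minimal illustration of that observation (hypothetical code, not from
any testcase):

extern void use (char *);

/* callee's frame coexists with caller's frame on every execution,
   so refusing to inline it does not reduce peak stack usage.  */
static void
callee (void)
{
  char buf[2048];
  use (buf);
}

void
caller (void)
{
  callee ();  /* the call dominates the return */
}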

Honza
> 
> Richard.
> 
> > This means adding new bit whether call edge dominate exit and using this 
> > info. Also simple
> > noreturn IPA discovery can be based on this and I recently noticed it might 
> > be important
> > for Mozilla. So I will give it a try soonish.
> >
> > I will also look into the estimate_size ICE reported today.
> >
> > Honza
> >


Re: [google] Handle NULL return values in setlocale calls (issue4444046)

2011-04-17 Thread Paolo Carlini

Hi,

Thanks Diego for committing this patch.

Yes, I would like to submit the patch to trunk. The reason for this patch
is that setlocale() in bionic libc always returns NULL (bionic libc does
not support setlocale()); however, libstdc++ does not handle the NULL
return value of setlocale(xxx, NULL) and thus causes a segmentation
fault.
The patch makes sense and I'm ok with it if you add a short comment 
explaining the issue, which we never encountered before. Of course, 
longer term we (you ;) may want to add to the library a 4th locale model 
beyond the existing ones, some sort of dummy locale model meant for 
targets like bionic, which essentially don't provide locales at all, if 
I understand correctly: without a working setlocale how can you switch 
from one named locale to another in portable C code?
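
For reference, the guarded pattern boils down to something like this (a
minimal sketch assuming a "C" fallback, not the committed libstdc++ code):

#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Save the current locale name, tolerating a setlocale that always
   returns NULL (as bionic's does) by falling back to "C".  */
char *
save_locale_name (void)
{
  const char *cur = setlocale (LC_ALL, NULL);
  return strdup (cur ? cur : "C");
}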


Paolo.


[PATCH, SMS] Support instructions with REG_INC_NOTE

2011-04-17 Thread Revital Eres
Hello,

The attached patch extends the current implementation to analyze
instructions with REG_INC_NOTE.
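
(For context: a REG_INC note marks an auto-increment side effect, e.g. an
insn whose pattern contains (mem:SI (post_inc:SI (reg:SI 1))) carries an
(expr_list:REG_INC (reg:SI 1)) note.  Such an insn defines both its
ordinary destination and the incremented register, which is why the patch
below tracks register-moves per definition instead of a single
first_reg_move.)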

Tested on ppc64-redhat-linux (bootstrap and regtest) SPU (only regtest)
and arm-linux-gnueabi (bootstrap c and regtest) configured with
--with-arch=armv7-a --with-mode=thumb.

OK for mainline?

Thanks,
Revital

Changelog:

* modulo-sched.c (record_inc_dec_insn_info,
free_node_sched_params): New functions.
(SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES): Remove.
(struct regmove_info): New.
(insn_regmove_info): New field in node_sched_params.
(print_node_sched_params): Print information for all the
definitions in the instructions.
(generate_reg_moves, duplicate_insns_of_cycles,
set_node_sched_params): Adjust the code to handle instructions
that have multiple definitions.
(sms_schedule): Handle loops that contain instructions with
FIND_REG_INC_NOTE and call free_node_sched_params.
=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c  2011-03-27 07:11:08 +
+++ gcc/modulo-sched.c  2011-04-17 10:29:24 +
@@ -201,32 +201,50 @@ static void duplicate_insns_of_cycles (p
   int, int, int, rtx);
 static int calculate_stage_count (partial_schedule_ptr ps);
 
+static int record_inc_dec_insn_info (rtx, rtx, rtx, rtx, rtx, void *);
+
+
 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
-#define SCHED_FIRST_REG_MOVE(x) \
-   (((node_sched_params_ptr)(x)->aux.info)->first_reg_move)
-#define SCHED_NREG_MOVES(x) \
-   (((node_sched_params_ptr)(x)->aux.info)->nreg_moves)
 #define SCHED_ROW(x) (((node_sched_params_ptr)(x)->aux.info)->row)
 #define SCHED_STAGE(x) (((node_sched_params_ptr)(x)->aux.info)->stage)
 #define SCHED_COLUMN(x) (((node_sched_params_ptr)(x)->aux.info)->column)
 
-/* The scheduling parameters held for each node.  */
-typedef struct node_sched_params
+/* Information about register-move generated for a definition.  */
+struct regmove_info
 {
-  int asap;/* A lower-bound on the absolute scheduling cycle.  */
-  int time;/* The absolute scheduling cycle (time >= asap).  */
-
+  /* The definition for which the register-move is generated for.  */
+  rtx def;
+  
   /* The following field (first_reg_move) is a pointer to the first
- register-move instruction added to handle the modulo-variable-expansion
- of the register defined by this node.  This register-move copies the
- original register defined by the node.  */
+ register-move instruction added to handle the
+ modulo-variable-expansion of the register defined by this node.
+ This register-move copies the original register defined by the node.
+  */
   rtx first_reg_move;
-
+  
   /* The number of register-move instructions added, immediately preceding
  first_reg_move.  */
   int nreg_moves;
+  
+  /* Auxiliary info used in the calculation of the register-moves.  */
+  void *aux;
+};
+
+typedef struct regmove_info *regmove_info_ptr;
+DEF_VEC_P (regmove_info_ptr);
+DEF_VEC_ALLOC_P (regmove_info_ptr, heap);
 
+/* The scheduling parameters held for each node.  */
+typedef struct node_sched_params
+{
+  int asap;/* A lower-bound on the absolute scheduling cycle.  */
+  int time;/* The absolute scheduling cycle (time >= asap).  */
+  
+  /* Information about register-moves needed for
+ definitions in the instruction.  */
+  VEC (regmove_info_ptr, heap) *insn_regmove_info;
+  
   int row;/* Holds time % ii.  */
   int stage;  /* Holds time / ii.  */
 
@@ -423,12 +441,58 @@ set_node_sched_params (ddg_ptr g)
  appropriate sched_params structure.  */
   for (i = 0; i < g->num_nodes; i++)
 {
+  rtx insn = g->nodes[i].insn;
+  rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+  rtx set = single_set (insn);
+  
   /* Watch out for aliasing problems?  */
   node_sched_params[i].asap = g->nodes[i].aux.count;
+  node_sched_params[i].insn_regmove_info = NULL;
+  
+  /* Record the definition(s) in the instruction.  These will be
+later used to calculate the register-moves needed for each
+definition. */
+  if (set && REG_P (SET_DEST (set)))
+   { 
+ regmove_info_ptr elt = 
+   (regmove_info_ptr) xcalloc (1, sizeof (struct regmove_info));
+ 
+ elt->def = SET_DEST (set);
+ VEC_safe_push (regmove_info_ptr, heap, 
+node_sched_params[i].insn_regmove_info,
+elt);
+   }
+  
+  if (note)
+   for_each_inc_dec (&insn, record_inc_dec_insn_info, 
+ &node_sched_params[i]);
+  
   g->nodes[i].aux.info = &node_sched_params[i];
 }
 }
 
+/* Free the sched_params information allocated for each node.  */
+static void
+free_node_sched_params (ddg_ptr g)
+{
+  int i;
+  regmove_info_ptr def;
+
+  for

addition to http://gcc.gnu.org/faq.html

2011-04-17 Thread Axel Freyn
Hi,

I propose to add a phrase to the "rpath" section ("Dynamic linker is
unable to find GCC libraries"), which explains how to use "-R" as a
compiler option (by using "-Wl,").

Axel

--- old/faq.html2011-04-17 13:55:36.0 +0200
+++ new/faq.html2011-04-17 14:00:28.0 +0200
@@ -244,7 +244,8 @@
 flags such as -R or -rpath, depending on
 platform and linker, to the *link or *lib

-specs.
+specs. The syntax -Wl,option of GCC can be used to ask GCC
+to transfer the flag option to the linker.

 Another alternative is to install a wrapper script around gcc, g++
 or ld that adds the appropriate directory to the environment variable
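
For example (an illustrative command, not part of the patch), the flag
can be passed through the driver like this:

gcc main.c -o main -Wl,-rpath,/opt/gcc/lib64

which asks the GNU linker to record /opt/gcc/lib64 as a run-time library
search path.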



Re: [PATCH] New flag to apply SMS when SC equals 1 (Second try)

2011-04-17 Thread Revital Eres
Hello,

>
> New params need documentation in doc/invoke.texi.  Please also change the
> maximum value to 2, not 1.  Ok with that changes.

When changing the max value to 2, any value that is greater than
2 is rejected due to the following comment in params.def:

 - The maximum acceptable value for the parameter (if greater than
 the minimum).

So I'll leave the maximum value to be 1 if that's OK.
I added the documentation in invoke.texi.

Thanks,
Revital

* params.def (sms-min-sc): New param flag.
* modulo-sched.c (sms_schedule): Use it.
* doc/invoke.texi (sms-min-sc): Document it.
=== modified file 'gcc/doc/invoke.texi'
--- gcc/doc/invoke.texi 2011-03-07 03:00:04 +
+++ gcc/doc/invoke.texi 2011-04-17 11:28:33 +
@@ -8718,6 +8718,10 @@ through which the instruction may be pip
 The maximum number of best instructions in the ready list that are considered
 for renaming in the selective scheduler.  The default value is 2.
 
+@item sms-min-sc
+The minimum value of stage count that swing modulo scheduler will
+generate.  The default value is 2.
+
 @item max-last-value-rtl
 The maximum size measured as number of RTLs that can be recorded in an 
expression
 in combiner for a pseudo register as last known value of that register.  The 
default

=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c  2011-03-27 07:11:08 +
+++ gcc/modulo-sched.c  2011-04-17 10:53:03 +
@@ -1222,9 +1222,10 @@ sms_schedule (void)
  PS_STAGE_COUNT(ps) = stage_count;
}
   
-  /* Stage count of 1 means that there is no interleaving between
- iterations, let the scheduling passes do the job.  */
-  if (stage_count <= 1
+  /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
+1 means that there is no interleaving between iterations thus
+we let the scheduling passes do the job in this case.  */
+  if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
  || (count_init && (loop_count <= stage_count))
  || (flag_branch_probabilities && (trip_count <= stage_count)))
{

=== modified file 'gcc/params.def'
--- gcc/params.def  2011-02-02 15:52:08 +
+++ gcc/params.def  2011-04-17 11:16:03 +
@@ -344,6 +344,11 @@ DEFPARAM(PARAM_SMS_MAX_II_FACTOR,
 "sms-max-ii-factor",
 "A factor for tuning the upper bound that swing modulo scheduler uses 
for scheduling a loop",
 100, 0, 0)
+/* The minimum value of stage count that swing modulo scheduler will generate. 
 */
+DEFPARAM(PARAM_SMS_MIN_SC,
+"sms-min-sc",
+"The minimum value of stage count that swing modulo scheduler will 
generate.",
+2, 1, 1)
 DEFPARAM(PARAM_SMS_DFA_HISTORY,
 "sms-dfa-history",
 "The number of cycles the swing modulo scheduler considers when 
checking conflicts using DFA",



Re: More of ipa-inline housekeeping

2011-04-17 Thread Richard Guenther
On Sun, Apr 17, 2011 at 10:35 AM, Jan Hubicka  wrote:
>> AFAICT revision 172430 fixed the original problem in pr45810:
>>
>> gfc -Ofast -fwhole-program fatigue.f90       : 6.301u 0.003s 0:06.30
>> gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26
>>
>> However if I play with --param max-inline-insns-auto=*, I get
>>
>> gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays 
>> fatigue.f90 : 4.870u 0.002s 0:04.87
>> gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays 
>> fatigue.f90 : 2.872u 0.002s 0:02.87
>>
>> and
>>
>> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 
>> -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97
>> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 
>> -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73
>>
>> while I get the same threshold=125 with/without -flto at revision 172429.
>> Note that I get the same thresholds without -fstack-arrays, the run times
>> are only larger.
>
> Thanks for notice.   This was not really expected, but seems to give some
> insight.  I just tested a new cleanup patch of mine where I fixed few minor
> bugs in side corners.  One of those bugs I noticed was introduced by this 
> patch
> (an overlook while converting the code to new accesor).
>
> In case of nested inlining, the stack usage got misaccounted and consequently
> we allowed more inlining than --param large-stack-frame-growth would allow 
> normally.
> The vortex and wupwise improvement seems to be gone, so I think they are due 
> to this
> issue.
>
> I never really tuned the stack frame growth heuristics since it did not cause 
> any problems
> in the benchmarks. On fortran this is quite different because of the large 
> i/o blocks
> hitting it very commonly, so I will look into making it more permissive.  We 
> definitely
> can just bump up the limits and/or we can also teach it that if call 
> dominates the return
> there is not really much to save of stack usage by preventing inlining since 
> both stack
> frames will wind up on the stack anyway.

I think Micha has a fix for the I/O block issue.

Richard.

> This means adding new bit whether call edge dominate exit and using this 
> info. Also simple
> noreturn IPA discovery can be based on this and I recently noticed it might 
> be important
> for Mozilla. So I will give it a try soonish.
>
> I will also look into the estimate_size ICE reported today.
>
> Honza
>


Re: Inliner heuristics facelift

2011-04-17 Thread Dominique Dhumieres
Honza,

Your patch seems to make --param max-inline-insns-auto= ineffective:

gfc -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer --param 
max-inline-insns-auto=4000 -fwhole-program -fstack-arrays ac.f90
8.105u 0.005s 0:08.11 99.8% 0+0k 0+5io 0pf+0w

while the timing was ~2.7s with a 125 threshold before the patch.

Cheers,

Dominique


Re: [PATCH] New flag to apply SMS when SC equals 1 (Second try)

2011-04-17 Thread Richard Guenther
On Sun, Apr 17, 2011 at 8:33 AM, Revital Eres  wrote:
> Hello,
>
> Following Richard's suggestion here is the patch again.
> Tested on ppc64-redhat-linux.
>
> OK for mainline?

New params need documentation in doc/invoke.texi.  Please also change the
maximum value to 2, not 1.  Ok with that changes.

Thanks,
Richard.

> Thanks,
> Revital
>
>
> Changelog
>
>        * params.def (sms-min-sc): New param flag.
>        * modulo-sched.c (sms_schedule): Use it.
>


Re: Fix PR48622 (lto ICE, lto bootstrap)

2011-04-17 Thread Jan Hubicka
> Hi,
> 
> since r172430 lto bootstrap is broken, as well as the attached testcase 
> (pr48622) and cpu2006 compilation (pr48645).  The inline summary writer 
> used a different order for size and time than the reader expected.
> 
> I've committed the below patch as obvious (r172603) after verifying that 
> lto bootstrap passes the place of the ICE (and that the testcase is 
> fixed).

Oops, thanks!  I wonder why this did not wreck the LTO benchmark results though 
;)

Honza


[Ada] Fix ICE on function call returning variant record

2011-04-17 Thread Eric Botcazou
The compiler ICEs on the call to a function returning a discriminated record 
type with variant part.  The problem is again an incorrect sharing of a tree 
node between two types.

Tested on i586-suse-linux, applied on the mainline.


2011-04-17  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Declare the
padded type built for the return type if it is unconstrained.


2011-04-17  Eric Botcazou  

* gnat.dg/discr27.ad[sb]: Move dg directive.
* gnat.dg/discr28.ad[sb]: New test.
* gnat.dg/discr28_pkg.ads: New helper.


-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 172603)
+++ gcc-interface/decl.c	(working copy)
@@ -4068,6 +4068,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 max_size (TYPE_SIZE (gnu_return_type),
 	  true),
 0, gnat_entity, false, false, false, true);
+
+		/* Declare it now since it will never be declared otherwise.
+		   This is necessary to ensure that its subtrees are properly
+		   marked.  */
+		create_type_decl (TYPE_NAME (gnu_return_type), gnu_return_type,
+  NULL, true, debug_info_p, gnat_entity);
+
 		return_by_invisi_ref_p = true;
 	  }
 
-- { dg-do compile }

package body Discr28 is

   procedure Dummy (E : Rec) is
   begin
 null;
   end;

   function F return Rec is
   begin
  return Default_Rec;
   end;

   procedure Proc1 is
   begin
  Dummy (F);
   end;

   procedure Proc2 is
   begin
  Dummy (F);
   end;

end Discr28;
with Discr28_Pkg;

package Discr28 is

   type Enum is (One, Two);

   type Rec (D : Enum := One) is record
  case D is
 when One => null;
 when Two => S : String (1 .. Discr28_Pkg.N);
  end case;
   end record;

   Default_Rec : constant Rec := (D => One);

   procedure Proc1;
   procedure Proc2;

end Discr28;
package Discr28_Pkg is

  function N return Natural;

end Discr28_Pkg;


Re: More of ipa-inline housekeeping

2011-04-17 Thread Dominique Dhumieres
Honza,

> I will also look into the estimate_size ICE reported today.

This has been fixed by revision 172603 and now the thresholds are
the same with/without -flto. For the fine tuning I have posted
some results on pr48636.

Cheers,

Dominique


Re: [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store

2011-04-17 Thread Ira Rosen


gcc-patches-ow...@gcc.gnu.org wrote on 12/04/2011 04:38:54 PM:

> vectorizable_store contains the code:
>
>   for (j = 0; j < ncopies; j++)
> {
>   for (i = 0; i < vec_num; i++)
>{
>  ...
>  if (j == 0)
>STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
>  else
>STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>  prev_stmt_info = vinfo_for_stmt (new_stmt);
>  }
> }
>
> That is, STMT_VINFO_VEC_STMT (stmt_info) and *vec_stmt contain the last
> statement emitted for the _last_ vector of the first copy.  However,
> for later copies, the last statement for _every_ vector is chained using
> STMT_VINFO_RELATED_STMT.  This seems a bit inconsistent, and isn't
> what I expected from the comments.  It also seems different from
> other vectorisation functions, where each copy has exactly one
> STMT_VINFO_RELATED_STMT.  I wasn't sure whether the difference here
> was deliberate or not.

I think it doesn't really matter because STMT_VINFO_RELATED_STMT is used
for retrieving copies of vector operands, and stores don't define any.

>
> The reason I'm changing it is that it makes the control flow for
> the new code more obvious.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

OK.

Thanks,
Ira

>
> Richard
>
>
> gcc/
>* tree-vect-stmts.c (vectorizable_store): Only chain one related
>statement per copy.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2011-04-12 11:55:08.0 +0100
> +++ gcc/tree-vect-stmts.c   2011-04-12 11:55:09.0 +0100
> @@ -3612,6 +3612,7 @@ vectorizable_store (gimple stmt, gimple_
>
>if (1)
> {
> + new_stmt = NULL;
>   if (strided_store)
> {
>   result_chain = VEC_alloc (tree, heap, group_size);
> @@ -3669,17 +3670,19 @@ vectorizable_store (gimple stmt, gimple_
>   if (slp)
>continue;
>
> - if (j == 0)
> -  STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> - else
> -  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> -
> - prev_stmt_info = vinfo_for_stmt (new_stmt);
>   next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
>   if (!next_stmt)
>break;
> }
> }
> +  if (!slp)
> +   {
> + if (j == 0)
> +   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> + else
> +   STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> + prev_stmt_info = vinfo_for_stmt (new_stmt);
> +   }
>  }
>
>VEC_free (tree, heap, dr_chain);



Re: [Patch,AVR]: FIX ICE in optabs due to bad rotate expander.

2011-04-17 Thread Denis Chertykov
2011/4/15 Georg-Johann Lay :
> Denis Chertykov schrieb:
>> 2011/4/14 Georg-Johann Lay :
>>> Denis Chertykov schrieb:
 2011/4/14 Georg-Johann Lay :
> The "rotl3" expanders (mode \in {HI,SI,DI}) violates synopsis by
> using 4 operands instead of 3. This runs in ICE in top of
> optabs.c:maybe_gen_insn.
>
> The right way to do this is to use match_dup, not match_operand. So
> the fix is obvious.
>
> Regenerated avr-gcc and run against avr testsuite:
>
> gcc.target/avr/torture/pr41885.c generates these patterns
>
> Johann
>
> 2011-04-14  Georg-Johann Lay  
>
>        * config/avr/avr.md ("rotlhi3", "rotlsi3", "rotldi3"): Fix
>        expanders operand 3 from match_operand to match_dup.
 May be better to use (clobber (match_scratch ...)) here and in
 `*rotw' and `*rotb'.
>>> These are splitters that might split before reload, producing strange,
>>> non-matching insns like
>>>  (set (scratch:QI) ...
>>> or crashing already in avr_rotate_bytes becase the split condition reads
>>> "&& (reload_completed || mode == DImode)"
>>
>> Generally I'm agree with you, change match_operand to match_dup is easy.
>
> I already submitted the easy patch to avoid the ICE.
>
>> But IMHO the right way is:
>>  - the splitters and avr_rotate_bytes are wrong and must be fixed too.
>>  - operands[3] is a scratch register and right way is to use match_scratch.
>>
>> I can approve the patch.
>>
>> Denis.
>
> Ok, here is the right-way patch.
>
> The expander generates a SCRATCH instead of a pseudo, and in case of a
> pre-reload split the SCRATCH is replaced with a pseudo because it is
> not known if or if not the scratch will actually be needed.
>
> avr_rotate_bytes now skips operations on non-REG 'scratch'.
> Furthermore, I added an assertion that ensures that 'scratch' is
> actually a REG when it is needed.
>
> Besides that, I fixed indentation. I know it is not optimal to fix
> indentation alongside with functionality... did it anyway :-)
>
> Finally, I exposed alternative #3 of the insns to the register
> allocator, because it is not possible to distinguish between
> overlapping or non-overlapping regs, and #3 does not need a scratch.
>
> Ran C-testsuite with no regressions.

Did you encounter any difference in code size?

Denis.


Re: More of ipa-inline housekeeping

2011-04-17 Thread Jan Hubicka
> AFAICT revision 172430 fixed the original problem in pr45810:
> 
> gfc -Ofast -fwhole-program fatigue.f90   : 6.301u 0.003s 0:06.30
> gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26
> 
> However if I play with --param max-inline-insns-auto=*, I get
> 
> gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays 
> fatigue.f90 : 4.870u 0.002s 0:04.87
> gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays 
> fatigue.f90 : 2.872u 0.002s 0:02.87
> 
> and
> 
> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 
> -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97
> gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 
> -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73
> 
> while I get the same threshold=125 with/without -flto at revision 172429.
> Note that I get the same thresholds without -fstack-arrays, the run times
> are only larger.

Thanks for the notice.  This was not really expected, but seems to give
some insight.  I just tested a new cleanup patch of mine where I fixed a
few minor bugs in side corners.  One of those bugs I noticed was
introduced by this patch (an oversight while converting the code to the
new accessor).

In case of nested inlining, the stack usage got misaccounted and
consequently we allowed more inlining than --param
large-stack-frame-growth would normally allow.  The vortex and wupwise
improvements seem to be gone, so I think they were due to this issue.

I never really tuned the stack frame growth heuristics since they did not
cause any problems in the benchmarks. On Fortran this is quite different
because of the large i/o blocks hitting it very commonly, so I will look
into making it more permissive.  We can definitely just bump up the
limits and/or we can also teach it that if the call dominates the return
there is not really much stack usage to save by preventing inlining,
since both stack frames will wind up on the stack anyway.

This means adding a new bit recording whether a call edge dominates the
exit and using this info.  Also, simple noreturn IPA discovery can be
based on this, and I recently noticed it might be important for Mozilla.
So I will give it a try soonish.

I will also look into the estimate_size ICE reported today.

Honza


Re: [PATCH] Fix SLP vectorization of shifts (PR tree-optimization/48616)

2011-04-17 Thread Ira Rosen


Jakub Jelinek  wrote on 16/04/2011 09:16:23 AM:
>
> Hi!
>
> As the attached testcase shows, while the current detection of what
> shifts are by scalar and what shifts are by vector shift count
> may work well for loop vectorizer (vect_internal_def being vector
> shift, vect_external_def or vect_constant_def scalar shift),
> it is incorrect for SLP, where vect_external_def and vect_constant_def
> may be either scalar or vector shift (vect_external_def is e.g.
> if def_stmt is in a different bb and either it can be the same
> SSA_NAMEs in all shifts, or different, for vect_constant_def it
> can be either the same constant in all cases, or different)
> and in theory vect_internal_def could be also scalar shift (tried
> to test that in fn3's first bb, though currently SLP doesn't attempt
> to vectorize that, missed-optimization).
>
> The following patch fixes that.  Bootstrapped/regtested on x86_64-linux
> and i686-linux and additionally tested on the testcase with additional
> -mxop.  Ok for trunk/4.6?
>
> For better test coverage, perhaps the testcase should be duplicated
> after effective target xop and xop_runtime are added and additionally
> perhaps the testcase in gcc.dg/vect with -mxop testing what is SLP
> vectorized from the dumps would be nice.
>
> 2011-04-16  Jakub Jelinek  
>
>PR tree-optimization/48616
>* tree-vect-stmts.c (vectorizable_shift): If SLP, determine
>whether the shift is by scalar or vector based on whether all SLP
>scalar stmts have the same rhs.
>
>* gcc.dg/pr48616.c: New test.
>
> --- gcc/tree-vect-stmts.c.jj   2011-03-14 14:12:15.0 +0100
> +++ gcc/tree-vect-stmts.c   2011-04-15 20:44:30.0 +0200
> @@ -2077,7 +2077,7 @@ vectorizable_shift (gimple stmt, gimple_
>VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
>tree vop0, vop1;
>unsigned int k;
> -  bool scalar_shift_arg = false;
> +  bool scalar_shift_arg = true;
>bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>int vf;
>
> @@ -2159,8 +2159,34 @@ vectorizable_shift (gimple stmt, gimple_
>/* Determine whether the shift amount is a vector, or scalar.  If the
>   shift/rotate amount is a vector, use the vector/vector shift
optabs.  */
>
> +  if (dt[1] == vect_internal_def && !slp_node)
> +scalar_shift_arg = false;
> +  else if (dt[1] == vect_constant_def
> +  || dt[1] == vect_external_def
> +  || dt[1] == vect_internal_def)
> +{
> +  /* In SLP, need to check whether the shift count is the same,
> +in loops if it is a constant or invariant, it is always
> +a scalar shift.  */
> +  if (slp_node)
> +   {
> + VEC (gimple, heap) *stmts = SLP_TREE_SCALAR_STMTS (slp_node);
> + gimple slpstmt;
> +
> + FOR_EACH_VEC_ELT (gimple, stmts, k, slpstmt)
> +   if (!operand_equal_p (gimple_assign_rhs2 (slpstmt), op1, 0))
> + scalar_shift_arg = false;

We already have this check in vect_build_slp_tree(). It didn't work for the
testcase because it doesn't take into account the type of definition. I
agree that it's better to move it here and base the vector/scalar decision
on it, but please remove the now redundant check from vect_build_slp_tree().

Otherwise, OK for trunk.

Thanks,
Ira

> +   }
> +}
> +  else
> +{
> +  if (vect_print_dump_info (REPORT_DETAILS))
> +   fprintf (vect_dump, "operand mode requires invariant argument.");
> +  return false;
> +}
> +
>/* Vector shifted by vector.  */
> -  if (dt[1] == vect_internal_def)
> +  if (!scalar_shift_arg)
>  {
>optab = optab_for_tree_code (code, vectype, optab_vector);
>if (vect_print_dump_info (REPORT_DETAILS))
> @@ -2168,13 +2194,12 @@ vectorizable_shift (gimple stmt, gimple_
>  }
>/* See if the machine has a vector shifted by scalar insn and if not
>   then see if it has a vector shifted by vector insn.  */
> -  else if (dt[1] == vect_constant_def || dt[1] == vect_external_def)
> +  else
>  {
>optab = optab_for_tree_code (code, vectype, optab_scalar);
>if (optab
>&& optab_handler (optab, TYPE_MODE (vectype)) != CODE_FOR_nothing)
>  {
> -  scalar_shift_arg = true;
>if (vect_print_dump_info (REPORT_DETAILS))
>  fprintf (vect_dump, "vector/scalar shift/rotate found.");
>  }
> @@ -2185,6 +2210,8 @@ vectorizable_shift (gimple stmt, gimple_
> && (optab_handler (optab, TYPE_MODE (vectype))
>!= CODE_FOR_nothing))
>  {
> + scalar_shift_arg = false;
> +
>if (vect_print_dump_info (REPORT_DETAILS))
>fprintf (vect_dump, "vector/vector shift/rotate found.");
>
> @@ -2197,12 +2224,6 @@ vectorizable_shift (gimple stmt, gimple_
>  }
>  }
>  }
> -  else
> -{
> -  if (vect_print_dump_info (REPORT_DETAILS))
> +fprintf (vect_dump, "operand mode requires invariant argument.");
> -  return false;
> -}
>
>/* 
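
To make the vector/scalar distinction concrete, a hypothetical SLP
example (mine, not the testcase from the patch): in f every statement
shifts by the same amount, so a vector shifted by a scalar suffices,
while in g the counts differ per element, so the vector/vector optab
is needed.

extern int a[4];

void
f (void)
{
  /* Same count in every SLP scalar stmt: scalar shift argument.  */
  a[0] <<= 3;
  a[1] <<= 3;
  a[2] <<= 3;
  a[3] <<= 3;
}

void
g (void)
{
  /* Different counts: only a vector/vector shift can handle this.  */
  a[0] <<= 1;
  a[1] <<= 2;
  a[2] <<= 3;
  a[3] <<= 4;
}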

Re: Fix gengtype-state string hashtable

2011-04-17 Thread Basile Starynkevitch
On Sun, 17 Apr 2011 02:55:33 +0200 (CEST)
"Nicola Pero"  wrote:

> While reading GCC code, I noticed that in gengtype-state.c
> the equality function in a string hashtable is set to strcmp.
> 
> But that returns 0 (ie, false for hashtable.c) when the strings
> are equal!  I can't see how that hashtable would ever work.  Do
> we have any tests for gengtype-state ?  Am I missing something ? :-)
> 
> Else, Ok to commit the following patch ?

I am not authorized to OK it, but I hope it will be OKed.

Thanks for finding that bug.
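
For reference, the usual shape of the fix is a thin wrapper that
inverts strcmp's convention, since a hashtable equality callback must
return nonzero for equal keys (a sketch assuming the libiberty htab
interface; the name string_eq is mine):

#include <string.h>

static int
string_eq (const void *a, const void *b)
{
  /* strcmp returns 0 exactly when the strings are equal, the opposite
     of what the hashtable expects, hence the comparison against 0.  */
  return strcmp ((const char *) a, (const char *) b) == 0;
}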

Cheers

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


[PATCH, SMS] Avoid considering debug_insn when calculating SCCs

2011-04-17 Thread Revital Eres
Hello,

The attached patch avoids considering DEBUG_INSNs when calculating SCCs.
With this change the presence of DEBUG_INSNs does not influence
the scheduling order or rec_MII.
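
As a hypothetical source-level illustration (my example, not from the
testsuite): compiled with optimization and -g, the loop below gets
debug insns tracking s and i; those insns become DDG nodes, and
without this patch the extra nodes could change the computed SCCs,
and with them the schedule, between -g and -g0.

extern int a[], n;

int
sum (void)
{
  int i, s = 0;
  for (i = 0; i < n; i++)
    s += a[i];
  return s;
}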

Bootstrapped and regtested on ppc64-redhat-linux, and regtested on arm-linux-gnueabi.

OK for mainline?

Thanks,
Revital

Changelog:

* ddg.c (find_nodes_on_paths): Ignore DEBUG_INSNs.
=== modified file 'gcc/ddg.c'
--- gcc/ddg.c   2011-03-27 07:11:08 +
+++ gcc/ddg.c   2011-04-11 12:04:54 +
@@ -1116,13 +1116,19 @@ find_nodes_on_paths (sbitmap result, ddg
{
  ddg_edge_ptr e;
  ddg_node_ptr u_node = &g->nodes[u];
-
+ 
+ /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+influence on the scheduling order and rec_mii.  */
+ if (DEBUG_INSN_P (u_node->insn))
+   continue;
+ 
  for (e = u_node->out; e != (ddg_edge_ptr) 0; e = e->next_out)
{
  ddg_node_ptr v_node = e->dest;
  int v = v_node->cuid;
 
- if (!TEST_BIT (reachable_from, v))
+ /* Ignore DEBUG_INSN when calculating the SCCs.  */
+ if (!TEST_BIT (reachable_from, v) && !DEBUG_INSN_P (v_node->insn))
{
  SET_BIT (reachable_from, v);
  SET_BIT (tmp, v);
@@ -1146,12 +1152,18 @@ find_nodes_on_paths (sbitmap result, ddg
  ddg_edge_ptr e;
  ddg_node_ptr u_node = &g->nodes[u];
 
+ /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+influence on the scheduling order and rec_mii.  */
+ if (DEBUG_INSN_P (u_node->insn))
+   continue;
+ 
  for (e = u_node->in; e != (ddg_edge_ptr) 0; e = e->next_in)
{
  ddg_node_ptr v_node = e->src;
  int v = v_node->cuid;
 
- if (!TEST_BIT (reach_to, v))
+ /* Ignore DEBUG_INSN when calculating the SCCs.  */
+ if (!TEST_BIT (reach_to, v) && !DEBUG_INSN_P (v_node->insn))
{
  SET_BIT (reach_to, v);
  SET_BIT (tmp, v);



[PATCH] New flag to apply SMS when SC equals 1 (Second try)

2011-04-17 Thread Revital Eres
Hello,

Following Richard's suggestion, here is the patch again.
Tested on ppc64-redhat-linux.

OK for mainline?

Thanks,
Revital


Changelog

* params.def (sms-min-sc): New param flag.
* modulo-sched.c (sms_schedule): Use it.
=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c  2011-03-27 07:11:08 +
+++ gcc/modulo-sched.c  2011-04-17 06:24:21 +
@@ -1222,9 +1222,10 @@ sms_schedule (void)
  PS_STAGE_COUNT(ps) = stage_count;
}
   
-  /* Stage count of 1 means that there is no interleaving between
- iterations, let the scheduling passes do the job.  */
-  if (stage_count <= 1
+  /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
+1 means that there is no interleaving between iterations thus
+we let the scheduling passes do the job in this case.  */
+  if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
  || (count_init && (loop_count <= stage_count))
  || (flag_branch_probabilities && (trip_count <= stage_count)))
{

=== modified file 'gcc/params.def'
--- gcc/params.def  2011-02-02 15:52:08 +
+++ gcc/params.def  2011-04-17 05:48:44 +
@@ -344,6 +344,11 @@ DEFPARAM(PARAM_SMS_MAX_II_FACTOR,
 "sms-max-ii-factor",
 "A factor for tuning the upper bound that swing modulo scheduler uses 
for scheduling a loop",
 100, 0, 0)
+/* The minimum value of stage count that swing modulo scheduler will generate.  */
+DEFPARAM(PARAM_SMS_MIN_SC,
+"sms-min-sc",
+"The minimum value of stage count that swing modulo scheduler will 
generate.",
+2, 1, 1)
 DEFPARAM(PARAM_SMS_DFA_HISTORY,
 "sms-dfa-history",
 "The number of cycles the swing modulo scheduler considers when 
checking conflicts using DFA",
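
For completeness, a usage sketch (my example, not from the patch):
with the patch applied, the previous behavior of accepting a stage
count of 1 can still be requested explicitly, e.g.

  gcc -O2 -fmodulo-sched --param sms-min-sc=1 loop.c

while the default of 2 keeps SMS out of loops where no interleaving
between iterations would happen anyway.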



Re: [google] Install additional gcov files (issue4442052)

2011-04-17 Thread Ralf Wildenhues
Hello,

* Diego Novillo wrote on Sat, Apr 16, 2011 at 10:14:32PM CEST:
> 2011-04-15  Rong Xu  
> 
>   * Makefile.in (install-leaf): Install gcov-io.h, gcov-iov.h
>   gcov-io.c and libgcov.c to $DESTDIR/$inst_libdir/gcov-src.

> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -978,6 +978,18 @@ install-leaf: $(install-shared) $(install-libunwind)
> esac; \
>   done
>  
> + if [ "$(MULTIDIR)" == "." ]; then   \

With '[' aka. test, '==' is a bashism while '=' is portable and Posix.
Please use that instead.

> +   gcov_src_dest="$(DESTDIR)$(inst_libdir)/gcov-src";\
> +   $(mkinstalldirs) $$gcov_src_dest; \
> +   cp ../../gcc/gcov-iov.h $$gcov_src_dest;  \

Is it relevant whether any of the commands fail?  If so, '&&' should be
preferred over ';' (we cannot assume that make obeys Posix 2008 rules of
invoking all shell commands with errexit set; GNU make does not do it).
Esp. a mkinstalldirs failure would lead to undesirable results.
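
Concretely, something along these lines would address both points (a
sketch only, with the remaining cp and chmod lines elided):

	if [ "$(MULTIDIR)" = "." ]; then			\
	  gcov_src_dest="$(DESTDIR)$(inst_libdir)/gcov-src" &&	\
	  $(mkinstalldirs) $$gcov_src_dest &&			\
	  cp ../../gcc/gcov-iov.h $$gcov_src_dest &&		\
	  chmod 644 $$gcov_src_dest/gcov-iov.h;			\
	fi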

I'm aware that there are prior issues in this area in the GCC build
system.

> +   cp $(srcdir)/../gcc/gcov-io.h $$gcov_src_dest;\
> +   cp $(srcdir)/../gcc/gcov-io.c $$gcov_src_dest;\
> +   cp $(srcdir)/../gcc/libgcov.c $$gcov_src_dest;\
> +   chmod 644 $$gcov_src_dest/gcov-iov.h  \
> + $$gcov_src_dest/gcov-io.h $$gcov_src_dest/gcov-io.c \
> + $$gcov_src_dest/libgcov.c;  \
> + fi
> +
>  install: install-leaf
>   @: $(MAKE) ; $(MULTIDO) $(FLAGS_TO_PASS) multi-do DO=install

Thanks,
Ralf