[committed] vms/ia64: Define SUPPORTS_ONE_ONLY

2011-12-23 Thread Tristan Gingold
Hi,

the native ia64 VMS linker doesn't fully support COMDAT sections.

Committed on trunk.

Tristan.

2011-12-23  Tristan Gingold  ging...@adacore.com

* config/ia64/vms.h (SUPPORTS_ONE_ONLY): Define.


--- a/gcc/config/ia64/vms.h
+++ b/gcc/config/ia64/vms.h
@@ -157,3 +157,7 @@ STATIC func_ptr __CTOR_LIST__[1]
 
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promo
+
+/* IA64 VMS doesn't fully support COMDAT sections.  */
+
+#define SUPPORTS_ONE_ONLY 0



[committed]: VMS: Fix a typo in vms-crtlmap.map

2011-12-23 Thread Tristan Gingold
Hi,

this patch fixes a typo in the CRTL map file.

Committed.

Tristan.

2011-12-23  Tristan Gingold  ging...@adacore.com

* config/vms/vms-crtlmap.map (log10): Fix typo.

--- a/gcc/config/vms/vms-crtlmap.map
+++ b/gcc/config/vms/vms-crtlmap.map
@@ -112,7 +112,7 @@ isupper
 kill
 localtime
 log   FLOAT
-log1  FLOAT
+log10 FLOAT
 lseek
 malloc64 MALLOC
 mbstowcs  64



Re: [PATCH] Fix PR50396

2011-12-23 Thread Richard Guenther
On Thu, 22 Dec 2011, Richard Henderson wrote:

> On 12/22/2011 07:46 AM, Richard Guenther wrote:
> > Any way to test, in the testcase, whether the vector modes
> > will have NaNs or not?
>
>  v[0] != v[0] ?

Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0,
or the HW might simply not have NaNs (SPU?) and have 0.0 as the
result.  Thus, I want to query GCC capabilities (-ffinite-math-only)
and HW capabilities (what we have in real_mode_format) from inside
the testcase.

Any idea?  Otherwise I'll add dg-skips for the targets that fail
the test.

Richard.


Re: [PATCH] Fix PR50396

2011-12-23 Thread Richard Guenther
On Fri, 23 Dec 2011, Richard Guenther wrote:

> On Thu, 22 Dec 2011, Richard Henderson wrote:
>
> > On 12/22/2011 07:46 AM, Richard Guenther wrote:
> > > Any way to test, in the testcase, whether the vector modes
> > > will have NaNs or not?
> >
> >  v[0] != v[0] ?
>
> Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0,
> or the HW might simply not have NaNs (SPU?) and have 0.0 as the
> result.  Thus, I want to query GCC capabilities (-ffinite-math-only)
> and HW capabilities (what we have in real_mode_format) from inside
> the testcase.
>
> Any idea?  Otherwise I'll add dg-skips for the targets that fail
> the test.

It seems we have __type_HAS_QUIET_NAN__.  Nice.  Thus I'll use

/* { dg-do run } */

extern void abort (void);
typedef float vf128 __attribute__((vector_size(16)));
typedef float vf64 __attribute__((vector_size(8)));
int main()
{
#if !__FINITE_MATH_ONLY__
#if __FLT_HAS_QUIET_NAN__
  vf128 v = (vf128){ 0.f, 0.f, 0.f, 0.f };
  vf64 u = (vf64){ 0.f, 0.f };
  v = v / (vf128){ 0.f, 0.f, 0.f, 0.f };
  if (v[0] == v[0])
    abort ();
  u = u / (vf64){ 0.f, 0.f };
  if (u[0] == u[0])
    abort ();
#endif
#endif
  return 0;
}



[Ada] Straighten implementation of aggregate libraries

2011-12-23 Thread Arnaud Charlet
Handle case where the same library project is imported by multiple
aggregated libraries.

Tested on x86_64-pc-linux-gnu, committed on trunk

2011-12-23  Pascal Obry  o...@adacore.com

* prj.ads (For_Every_Project_Imported): Add In_Aggregate_Lib
parameter to generic formal procedure.
* prj.adb (For_Every_Project_Imported): Update accordingly.
(Recursive_Check): Likewise. Do not parse imported project for
aggregate library. This is needed as the imported projects are
there just to handle dependencies.
(Look_For_Sources): Likewise.
(Recursive_Add): Likewise.
* prj-env.adb, prj-conf.adb, makeutl.adb, gnatcmd.adb:
Add In_Aggregate_Lib parameter to routines used with
For_Every_Project_Imported generic procedure.
* prj-nmsc.adb (Tree_Processing_Data): Add In_Aggregate_Lib
field.
(Check): Move where it is used. Fix implementation
to not check libraries that are inside aggregate libraries.
(Recursive_Check): Add In_Aggregate_Lib parameter.

Index: gnatcmd.adb
===
--- gnatcmd.adb (revision 182655)
+++ gnatcmd.adb (working copy)
@@ -264,6 +264,7 @@
procedure Set_Library_For
  (Project   : Project_Id;
   Tree  : Project_Tree_Ref;
+  In_Aggregate_Lib  : Boolean;
   Libraries_Present : in out Boolean);
--  If Project is a library project, add the correct -L and -l switches to
--  the linker invocation.
@@ -1264,9 +1265,10 @@
procedure Set_Library_For
  (Project   : Project_Id;
   Tree  : Project_Tree_Ref;
+  In_Aggregate_Lib  : Boolean;
   Libraries_Present : in out Boolean)
is
-  pragma Unreferenced (Tree);
+  pragma Unreferenced (Tree, In_Aggregate_Lib);
 
   Path_Option : constant String_Access :=
   MLib.Linker_Library_Path_Option;
Index: prj.adb
===
--- prj.adb (revision 182655)
+++ prj.adb (working copy)
@@ -528,20 +528,24 @@
   Seen : Project_Boolean_Htable.Instance := Project_Boolean_Htable.Nil;
 
   procedure Recursive_Check
-(Project : Project_Id;
- Tree: Project_Tree_Ref);
-  --  Check if a project has already been seen. If not seen, mark it as
-  --  Seen, Call Action, and check all its imported projects.
+(Project  : Project_Id;
+ Tree : Project_Tree_Ref;
+ In_Aggregate_Lib : Boolean);
+  --  Check if a project has already been seen. If not seen, mark it
+  --  as Seen, Call Action, and check all its imported and aggregated
+  --  projects.
 
   -
   -- Recursive_Check --
   -
 
   procedure Recursive_Check
-(Project : Project_Id;
- Tree: Project_Tree_Ref)
+(Project  : Project_Id;
+ Tree : Project_Tree_Ref;
+ In_Aggregate_Lib : Boolean)
   is
  List : Project_List;
+ T: Project_Tree_Ref;
 
   begin
  if not Get (Seen, Project) then
@@ -552,22 +556,28 @@
 Set (Seen, Project, True);
 
 if not Imported_First then
-   Action (Project, Tree, With_State);
+   Action (Project, Tree, In_Aggregate_Lib, With_State);
 end if;
 
 --  Visit all extended projects
 
 if Project.Extends /= No_Project then
-   Recursive_Check (Project.Extends, Tree);
+   Recursive_Check (Project.Extends, Tree, In_Aggregate_Lib);
 end if;
 
---  Visit all imported projects
+--  Visit all imported projects if needed. This is not needed
+--  for an aggregate library as imported libraries are just
+--  there for dependency support.
 
-List := Project.Imported_Projects;
-while List /= null loop
-   Recursive_Check (List.Project, Tree);
-   List := List.Next;
-end loop;
+if Project.Qualifier /= Aggregate_Library
+  or else not Include_Aggregated
+then
+   List := Project.Imported_Projects;
+   while List /= null loop
+  Recursive_Check (List.Project, Tree, In_Aggregate_Lib);
+  List := List.Next;
+   end loop;
+end if;
 
 --  Visit all aggregated projects
 
@@ -580,14 +590,25 @@
   Agg := Project.Aggregated_Projects;
   while Agg /= null loop
  pragma Assert (Agg.Project /= No_Project);
- Recursive_Check (Agg.Project, Agg.Tree);
+
+ --  For aggregated libraries, the tree must be the one
+ --  of the aggregate library.
+
+ if Project.Qualifier 

Re: [PATCH, PR 51600] IPA-CP workaround for negative size cloning estimates

2011-12-23 Thread Jan Hubicka
> Hi,
>
> On Wed, Dec 21, 2011 at 05:29:51PM +0100, Jan Hubicka wrote:
> > > Hi,
> > >
> > > given that we already have a workaround for zero size increase
> > > estimates from estimate_ipcp_clone_size_and_time, I see little reason
> > > not to extend it to negative values too, 0 is really just as bad as -2
> > > that we are getting in the testcase.  Hopefully this will allow people
> > > who hit this bug to proceed with their testing.
> > >
> > > Bootstrapped and tested on x86-64-linux with no regressions.
> > > OK for trunk?
> >
> > Hmm, so the size value is not negative because
> > estimate_ipcp_clone_size_and_time
> > would return 0 or negative value but because of
> >     size -= stats.n_calls * removable_params_cost
> > (i.e. the callee function is so small that the program will really
> > shrink because of reduced call overhead)?
>
> No, it is really estimate_ipcp_clone_size_and_time that returns size
> estimate -2.  In fact, the subtraction you described does not occur on
> that code path at all because I do it only for constants that occur in
> all contexts (from all callers) and this assert is on the path dealing
> with estimates of effects of constants that are there only in some
> contexts.
>
> The reason why I don't do it for constants that come from only a
> subset of callers is that some of these callers might themselves
> require context specific cloning to provide the value but when actual
> decisions are being made later on, they would not be cloned.  So I
> don't know the set of callers that provide the constant at this time
> and cannot do the subtraction.
>
> >
> > In that case I guess the patch is OK, but please update the comment,
>
> Well, it's not the case, so what do you think?

Hmm, it is an estimate_ipcp_clone_size_and_time bug then.  I will look into
that today.

Honza
 
 Martin
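
The workaround discussed in this thread can be illustrated with a minimal
sketch. It assumes, as the thread suggests, that a size estimate of zero or
below is replaced by the smallest positive value; the helper name
clamp_size_estimate is hypothetical and is not taken from the IPA-CP sources.

```c
/* Hypothetical helper, for illustration only: treat any non-positive
   size estimate as the smallest positive growth, so that later code
   which expects a strictly positive size keeps working.  */
static int
clamp_size_estimate (int size)
{
  return size <= 0 ? 1 : size;
}
```

With this sketch, the estimate of -2 mentioned above and an estimate of 0 are
both treated as 1, while positive estimates pass through unchanged.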


Re: RFC: An alternative -fsched-pressure implementation

2011-12-23 Thread Richard Guenther
On Fri, Dec 23, 2011 at 12:46 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 So it looks like two pieces of work related to scheduling and register
 pressure are being posted close together.  This one is an RFC for a less
 aggressive form of -fsched-pressure.  I think it should complement
 rather than compete with Bernd's IRA patch.  It seems like a good idea
 to take register pressure into account during the first scheduling pass,
 where we can still easily look at things like instruction latencies
 and pipeline utilisation.  Better rematerialisation in the register
 allocator would obviously be a good thing too though.

 This patch started when we (Linaro) saw a big drop in performance
 from vectorising an RGB to YIQ filter on ARM.  The first scheduling
 pass was overly aggressive in creating a wide schedule, and caused
 the newly-vectorised loop to contain lots of spills.  The loop grew so
 big that it even had a constant pool in the middle of it.

 -fsched-pressure did a very good job on this loop, creating far fewer
 spills and consequently avoiding the constant pool.  However, it seemed
 to make several other cases significantly worse.  The idea was therefore
 to try to come up with a form of -fsched-pressure that could be turned
 on for ARM by default.

 Current -fsched-pressure works by assigning an excess (pressure) cost change
 to each instruction; here I'll write that as ECC(X).  -fsched-pressure also
 changes the way that the main list scheduler handles stalls due to data
 dependencies.  If an instruction would stall for N cycles, the scheduler
 would normally add it to the now+N queue, then add it to the ready queue
 after N cycles.  With -fsched-pressure, it instead adds the instruction
 to the ready queue immediately, while still recording that the instruction
 would require N stalls.  I'll write the number of stalls on X as delay(X).

 This arrangement allows the scheduler to choose between increasing register
 pressure and introducing a deliberate stall.  Instructions are ranked by:

  (a) lowest ECC(X) + delay(X)
  (b) lowest delay(X)
  (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)

 Note that since delay(X) is measured in cycles, ECC(X) is effectively
 measured in cycles too.
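
The ranking above can be sketched as a comparison function. This is only an
illustration under stated assumptions: struct insn_info, rank_for_pressure and
sort_ready_list are hypothetical names rather than GCC's scheduler code, and
step (c) is reduced to a single priority value where higher is assumed to be
better.

```c
#include <stdlib.h>

/* Hypothetical per-instruction record; the field names are illustrative
   and do not match GCC's internal data structures.  */
struct insn_info
{
  int ecc;       /* ECC(X): excess pressure cost change, in cycles */
  int delay;     /* delay(X): stall cycles the instruction would need */
  int priority;  /* stand-in for INSN_PRIORITY; higher assumed better */
};

/* qsort-style comparison: negative if A should be ranked before B.  */
static int
rank_for_pressure (const void *pa, const void *pb)
{
  const struct insn_info *a = pa;
  const struct insn_info *b = pb;

  /* (a) lowest ECC(X) + delay(X) */
  int cost_a = a->ecc + a->delay;
  int cost_b = b->ecc + b->delay;
  if (cost_a != cost_b)
    return cost_a < cost_b ? -1 : 1;

  /* (b) lowest delay(X) */
  if (a->delay != b->delay)
    return a->delay < b->delay ? -1 : 1;

  /* (c) fall back to the normal ranking, reduced here to priority */
  if (a->priority != b->priority)
    return a->priority > b->priority ? -1 : 1;
  return 0;
}

/* Sort a ready list so that the best candidate comes first.  */
static void
sort_ready_list (struct insn_info *ready, size_t n)
{
  qsort (ready, n, sizeof *ready, rank_for_pressure);
}
```

Because ECC(X) and delay(X) are simply added in step (a), the sketch also
shows why ECC(X) is effectively measured in cycles.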

 Several things seemed to be causing the degradations we were seeing
 with -fsched-pressure:

  (1) The -fsched-pressure schedule is based purely on instruction latencies
      and issue rate; it doesn't take the DFA into account.  This means that
      we attempt to dual issue things like vector operations, loads and
      stores on Cortex A8 and A9.  In the examples I looked at, these sorts
      of inaccuracy seemed to accumulate, so that the value of delay(X)
      became based on slightly unrealistic cycle times.

      Note that this also affects code that doesn't have any pressure
      problems; it isn't limited to code that does.

      This may simply be historical.  It became much easier to use the
      DFA here after Bernd's introduction of prune_ready_list, but the
      original -fsched-pressure predates that.

  (2) We calculate ECC(X) by walking the unscheduled part of the block
      in its original order, then recording the pressure at each instruction.
      This seemed to make ECC(X) quite sensitive to that original order.
      I saw blocks that started out naturally narrow (not much ILP,
      e.g. from unrolled loops) and others that started naturally wide
      (a lot of ILP, such as in the libav h264 code), and walking the
      block in order meant that the two styles would be handled differently.

  (3) When calculating the pressure of the original block (as described
      in (2)), we ignore the deaths of registers that are used by more
      than one unscheduled instruction.  This tended to hurt long(ish)
      loops in things like filters, where the same value is often used
      as an input to two calculations.  The effect was that instructions
      towards the end of the block would appear to have very high pressure.
      This in turn made the algorithm very conservative; it wouldn't
      promote instructions from later in the block because those
      instructions seemed to have a prohibitively large cost.

      I asked Vlad about this, and he confirmed that it was a deliberate
      decision.  He'd tried honouring REG_DEAD notes instead, but it
      produced worse results on x86.  I'll return to this at the end.

  (4) ECC(X) is based on the pressure over and above ira_available_class_regs
      (the number of allocatable registers in a given class).  ARM has 14
      allocatable GENERAL_REGS: 16 minus the stack pointer and program
      counter.  So if 14 integer variables are live across a loop but
      not referenced within it, we end up scheduling that loop in a context
      of permanent pressure.  Pressure becomes the overriding concern,
      and we don't get much ILP.

      I suppose there are at least two ways of viewing this:


[PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation

2011-12-23 Thread Bart Van Assche
As documented in the libstdc++ manual, the shared pointer operations in
libstdc++ headers can be instrumented by defining the macros
_GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE() and
_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(), but libstdc++ has to be
rebuilt in order to instrument the remaining shared pointer operations.
Rebuilding libstdc++ is inconvenient, so let's move the thread
wrapper code from thread.cc into the <thread> header.

See also:
* http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html.
* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504.

Signed-off-by: Bart Van Assche bvanass...@acm.org

Index: libstdc++-v3/src/thread.cc
===
--- libstdc++-v3/src/thread.cc  (revision 182271)
+++ libstdc++-v3/src/thread.cc  (working copy)
@@ -59,28 +59,6 @@ static inline int get_nprocs()
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
-  namespace
-  {
-extern "C" void*
-execute_native_thread_routine(void* __p)
-{
-  thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
-  thread::__shared_base_type __local;
-  __local.swap(__t->_M_this_ptr);
-
-  __try
-   {
- __t->_M_run();
-   }
-  __catch(...)
-   {
- std::terminate();
-   }
-
-  return 0;
-}
-  }
-
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
@@ -114,12 +92,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   thread::_M_start_thread(__shared_base_type __b)
   {
+  _M_start_thread(__b, _M_entry);
+  }
+
+  void
+  thread::_M_start_thread(__shared_base_type __b, void* (*__pf)(void*))
+  {
 if (!__gthread_active_p())
   __throw_system_error(int(errc::operation_not_permitted));
 
 __b->_M_this_ptr = __b;
-int __e = __gthread_create(_M_id._M_thread,
-  execute_native_thread_routine, __b.get());
+int __e = __gthread_create(_M_id._M_thread, __pf, __b.get());
 if (__e)
 {
   __b->_M_this_ptr.reset();
Index: libstdc++-v3/include/std/thread
===
--- libstdc++-v3/include/std/thread (revision 182271)
+++ libstdc++-v3/include/std/thread (working copy)
@@ -132,7 +132,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 _M_start_thread(_M_make_routine(std::__bind_simple(
 std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+std::forward<_Args>(__args)...)),
+thread::_M_entry);
   }
 
 ~thread()
@@ -180,9 +181,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 hardware_concurrency() noexcept;
 
   private:
+static void* _M_entry(void* __p)
+{
+  thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
+  thread::__shared_base_type __local;
+  __local.swap(__t->_M_this_ptr);
+  
+  __try
+{
+  __t->_M_run();
+}
+  __catch(...)
+{
+  std::terminate();
+}
+  
+  return 0;
+}
+
 void
 _M_start_thread(__shared_base_type);
 
+void
+_M_start_thread(__shared_base_type, void* (*)(void*));
+
 template<typename _Callable>
   shared_ptr<_Impl<_Callable>>
   _M_make_routine(_Callable&& __f)
Index: libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
(revision 182271)
+++ libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
(working copy)
@@ -2145,6 +2145,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 
FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt  
(revision 182271)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt  
(working copy)
@@ -1955,6 +1955,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 
FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
(revision 182271)
+++ 

Re: RFC: IRA patch to reduce lifetimes

2011-12-23 Thread Vladimir Makarov

On 12/21/2011 09:09 AM, Bernd Schmidt wrote:

For a customer I've looked into improving code for 456.hmmer on a mips64
target. The benchmark responds to -fsched-pressure, which reduces
lifetimes of a few registers.

This patch was an experiment to see if we can get the same improvement
with modifications to IRA, making it more tolerant to over-aggressive
scheduling. The idea is that if an instruction sets a register A, and
all its inputs are live and unmodified for the lifetime of A, then
moving the instruction downwards towards its first use is going to be
beneficial from a register pressure point of view.

That alone, however, turns out to be too aggressive; performance drops,
presumably because we undo too many scheduling decisions. So, the patch
detects such situations, and splits the pseudo; a new pseudo is
introduced in the original setting instruction, and a copy is added
before the first use. If the new pseudo does not get a hard register, it
is removed again and instead the setting instruction is moved to the
point of the copy.

This gets up to 6.5% on 456.hmmer on the mips target I was working on;
an embedded benchmark suite also seems to have a (small) geomean
improvement. On x86_64, I've tested spec2k, where specint is unchanged
and specfp has a tiny performance regression. All these tests were done
with a gcc-4.6 based tree.

Thoughts? Currently the patch feels somewhat bolted on to the side of
IRA, maybe there's a nicer way to achieve this?

I think that is an excellent idea.  I used an analogous approach for
splitting pseudos in IRA on loop bounds, even if a pseudo gets a hard
register inside and outside the loop.  The copies are removed if the
live ranges were not spilled in reload.


I have no problem with this patch.  It is just a small change in IRA.



Re: RFC: An alternative -fsched-pressure implementation

2011-12-23 Thread Vladimir Makarov

On 12/23/2011 06:46 AM, Richard Sandiford wrote:

So it looks like two pieces of work related to scheduling and register
pressure are being posted close together.  This one is an RFC for a less
aggressive form of -fsched-pressure.  I think it should complement
rather than compete with Bernd's IRA patch.  It seems like a good idea
to take register pressure into account during the first scheduling pass,
where we can still easily look at things like instruction latencies
and pipeline utilisation.  Better rematerialisation in the register
allocator would obviously be a good thing too though.

This patch started when we (Linaro) saw a big drop in performance
from vectorising an RGB to YIQ filter on ARM.  The first scheduling
pass was overly aggressive in creating a wide schedule, and caused
the newly-vectorised loop to contain lots of spills.  The loop grew so
big that it even had a constant pool in the middle of it.

-fsched-pressure did a very good job on this loop, creating far fewer
spills and consequently avoiding the constant pool.  However, it seemed
to make several other cases significantly worse.  The idea was therefore
to try to come up with a form of -fsched-pressure that could be turned
on for ARM by default.

Current -fsched-pressure works by assigning an excess (pressure) cost change
to each instruction; here I'll write that as ECC(X).  -fsched-pressure also
changes the way that the main list scheduler handles stalls due to data
dependencies.  If an instruction would stall for N cycles, the scheduler
would normally add it to the now+N queue, then add it to the ready queue
after N cycles.  With -fsched-pressure, it instead adds the instruction
to the ready queue immediately, while still recording that the instruction
would require N stalls.  I'll write the number of stalls on X as delay(X).

This arrangement allows the scheduler to choose between increasing register
pressure and introducing a deliberate stall.  Instructions are ranked by:

   (a) lowest ECC(X) + delay(X)
   (b) lowest delay(X)
   (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)

Note that since delay(X) is measured in cycles, ECC(X) is effectively
measured in cycles too.

Several things seemed to be causing the degradations we were seeing
with -fsched-pressure:

   (1) The -fsched-pressure schedule is based purely on instruction latencies
   and issue rate; it doesn't take the DFA into account.  This means that
   we attempt to dual issue things like vector operations, loads and
   stores on Cortex A8 and A9.  In the examples I looked at, these sorts
   of inaccuracy seemed to accumulate, so that the value of delay(X)
   became based on slightly unrealistic cycle times.

   Note that this also affects code that doesn't have any pressure
   problems; it isn't limited to code that does.

   This may simply be historical.  It became much easier to use the
   DFA here after Bernd's introduction of prune_ready_list, but the
   original -fsched-pressure predates that.

   (2) We calculate ECC(X) by walking the unscheduled part of the block
   in its original order, then recording the pressure at each instruction.
   This seemed to make ECC(X) quite sensitive to that original order.
   I saw blocks that started out naturally narrow (not much ILP,
   e.g. from unrolled loops) and others that started naturally wide
   (a lot of ILP, such as in the libav h264 code), and walking the
   block in order meant that the two styles would be handled differently.

   (3) When calculating the pressure of the original block (as described
   in (2)), we ignore the deaths of registers that are used by more
   than one unscheduled instruction.  This tended to hurt long(ish)
   loops in things like filters, where the same value is often used
   as an input to two calculations.  The effect was that instructions
   towards the end of the block would appear to have very high pressure.
   This in turn made the algorithm very conservative; it wouldn't
   promote instructions from later in the block because those
   instructions seemed to have a prohibitively large cost.

   I asked Vlad about this, and he confirmed that it was a deliberate
   decision.  He'd tried honouring REG_DEAD notes instead, but it
   produced worse results on x86.  I'll return to this at the end.

   (4) ECC(X) is based on the pressure over and above ira_available_class_regs
   (the number of allocatable registers in a given class).  ARM has 14
   allocatable GENERAL_REGS: 16 minus the stack pointer and program
   counter.  So if 14 integer variables are live across a loop but
   not referenced within it, we end up scheduling that loop in a context
   of permanent pressure.  Pressure becomes the overriding concern,
   and we don't get much ILP.

   I suppose there are at least two ways of viewing this:

   (4a) We're giving an 

Re: [PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation

2011-12-23 Thread Paolo Carlini
Hi,

> As documented in the libstdc++ manual, the shared pointer operations in
> libstdc++ headers can be instrumented by defining the macros
> _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE()/AFTER() and libstdc++ has to be
> rebuilt in order to instrument the remaining shared pointer operations.
> However, rebuilding libstdc++ is inconvenient. So let's move the thread
> wrapper code from thread.cc into thread.

First, do you already have a copyright assignment on file? It's a
precondition for any non-trivial contribution.

That said, please leave the baselines alone. Otherwise, Jon can comment on
whether the reshuffling makes sense and would be safe from the ABI point of
view.

Paolo


#undef fopen+freopen prior to #def in system.h, for aix bootstrap

2011-12-23 Thread Olivier Hainque
bootstrap currently fails for mainline on AIX, first because of problems like

 ...trunk/libcpp/system.h:47:0: error: fopen redefined [-Werror]
 .../include-fixed/stdio.h:110:0: note: this is the location of the previous definition

Indeed, libcpp/system.h and gcc/system.h have

  /* Use the unlocked open routines from libiberty.  */
  ...
  #define fopen(PATH,MODE) fopen_unlocked(PATH,MODE)
  #define fdopen(FILDES,MODE) fdopen_unlocked(FILDES,MODE)
  #define freopen(PATH,MODE,STREAM) freopen_unlocked(PATH,MODE,STREAM)

while /usr/include/stdio.h on AIX (5.3 at least) has

  #ifdef _LARGE_FILES
  ...
  #define fopen fopen64
  #define freopen freopen64

gcc/system.h already has some provision for this sort of mishap:

  #ifdef fopen /* fopen is a #define on VMS.  */
  #undef fopen
  #endif

The attached patch is a suggestion to simplify and widen this a bit
to catch all the AIX related problems to date.
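
A minimal sketch of the #undef-before-#define pattern described above. The
names demo_fopen_unlocked, demo_fopen64, unlocked_calls and open_for_reading
are stand-ins invented for this illustration so that it builds without
libiberty or AIX headers; they are not part of the patch.

```c
#include <stdio.h>

/* Counts calls through the wrapper, only so the effect is observable.  */
static int unlocked_calls;

/* Stand-in for libiberty's fopen_unlocked.  It is defined before any of
   the macros below, so the fopen call here is the ordinary library one.  */
static FILE *
demo_fopen_unlocked (const char *path, const char *mode)
{
  unlocked_calls++;
  return fopen (path, mode);
}

/* Simulate a system header that turns fopen into an object-like macro,
   as the AIX excerpt above does under _LARGE_FILES.  demo_fopen64 is
   never expanded below, so it needs no definition.  */
#ifndef fopen
#define fopen demo_fopen64
#endif

/* The pattern proposed for system.h: drop any earlier definition
   unconditionally, then install the function-like macro.  Without the
   #undef, the second #define would be an incompatible redefinition.  */
#undef fopen
#define fopen(PATH, MODE) demo_fopen_unlocked (PATH, MODE)

/* Any later call now goes through the wrapper.  */
static FILE *
open_for_reading (const char *path)
{
  return fopen (path, "r");
}
```

Using #undef on a name that is not currently a macro is harmless, which is
why the #ifdef guard shown earlier can be dropped.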

Tested by checking that bootstrap proceeds (and ends successfully
after another change, to be posted shortly) on powerpc-ibm-aix5.3.0
with languages=all,ada. Also bootstrapped on i686-suse-linux.

OK ?

Thanks in advance,

Regards,

Olivier

--

2011-12-23  Olivier Hainque  hain...@adacore.com

* system.h: #undef fopen and freopen unconditionally.

libcpp/
* system.h: #undef fopen and freopen unconditionally.




aix-redef.dif
Description: video/dv


[v3] update cinttypes comments

2011-12-23 Thread Jonathan Wakely
The comments in cinttypes were copied from the TR1 implementation.
This updates them w.r.t. C++11, including removing the "likely a
defect" comment, because 27.9.2/4 clarifies that abs and div are only
overloaded for intmax_t if it's an extended integer type.

* include/c_global/cinttypes: Update comments that refer to TR1.

Tested x86_64-linux, committed to trunk.
Index: include/c_global/cinttypes
===
--- include/c_global/cinttypes  (revision 182658)
+++ include/c_global/cinttypes  (revision 182659)
@@ -1,6 +1,6 @@
 // cinttypes -*- C++ -*-
 
-// Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -37,7 +37,7 @@
 
 #include <cstdint>
 
-// For 8.11.1/1 (see C99, Note 184)
+// For 27.9.2/3 (see C99, Note 184)
 #if _GLIBCXX_HAVE_INTTYPES_H
 # ifndef __STDC_FORMAT_MACROS
 #  define _UNDEF__STDC_FORMAT_MACROS
@@ -59,16 +59,10 @@ namespace std
 
   // functions
   using ::imaxabs;
-
-  // May collide with _Longlong abs(_Longlong), and is not described
-  // anywhere outside the synopsis.  Likely, a defect.
-  //
-  // intmax_t abs(intmax_t)
-
   using ::imaxdiv;
 
-  // Likewise, with lldiv_t div(_Longlong, _Longlong).
-  //
+  // GCC does not support extended integer types
+  // intmax_t abs(intmax_t)
   // imaxdiv_t div(intmax_t, intmax_t)
 
   using ::strtoimax;


[v3] adjust weak_ptr testcase

2011-12-23 Thread Jonathan Wakely
This modifies the test to PASS when the expected type of exception is
caught, instead of being XFAIL due to uncaught exception.

Tested x86_64-linux, committed to trunk.

* testsuite/tr1/2_general_utilities/shared_ptr/cons/
weak_ptr_expired.cc: Modify to PASS instead of XFAIL.
Index: testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc
===
--- testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc   
(revision 182660)
+++ testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc   
(revision 182661)
@@ -1,5 +1,5 @@
-// { dg-do run { xfail *-*-* } }
-// Copyright (C) 2005, 2009 Free Software Foundation
+// { dg-do run }
+// Copyright (C) 2005, 2009, 2010, 2011 Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -29,7 +29,7 @@ struct A { };
 int
 test01()
 {
-  bool test __attribute__((unused)) = true;
+  bool test = false;
 
   std::tr1::shared_ptr<A> a1(new A);
   std::tr1::weak_ptr<A> wa(a1);
@@ -42,12 +42,9 @@ test01()
   catch (const std::tr1::bad_weak_ptr&)
   {
 // Expected.
-  __throw_exception_again;
-  }
-  catch (...)
-  {
-// Failed.
+test = true;
   }
+  VERIFY( test );
 
   return 0;
 }


Re: [PATCH v3 00/10] MIPS vectorization improvements

2011-12-23 Thread Richard Henderson
On 12/22/2011 12:44 PM, Richard Sandiford wrote:
> Woah, thanks, that's quite some work.  OK for the patches I didn't
> respond to.

Here's a combined follow-on patch that I believe addresses all of
the comments you had.

Ok?


r~
commit 824b5ca31ea21bb02cedabf79bb98e4348c34366
Author: Richard Henderson r...@redhat.com
Date:   Thu Dec 22 12:23:03 2011 -0800

mips: Feedback from rsandiford.

diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index 85861a9..187c651 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -26,15 +26,15 @@ RESET_FLOAT_FORMAT (DF, mips_double_format);
 FLOAT_MODE (TF, 16, mips_quad_format);
 
 /* Vector modes.  */
-VECTOR_MODES (INT, 8);/*   V8QI  V4HI V2SI */
-VECTOR_MODES (FLOAT, 8);  /* V4HF V2SF */
-VECTOR_MODES (INT, 4);/* V4QI V2HI */
+VECTOR_MODES (INT, 4);/* V4QI  V2HI  */
+VECTOR_MODES (INT, 8);/* V8QI  V4HI V2SI */
+VECTOR_MODES (FLOAT, 8);  /*   V4HF V2SF */
 
 /* Double-sized vector modes for vec_concat.  */
-VECTOR_MODE (INT, QI, 16);
-VECTOR_MODE (INT, HI, 8);
-VECTOR_MODE (INT, SI, 4);
-VECTOR_MODE (FLOAT, SF, 4);
+VECTOR_MODE (INT, QI, 16);/* V16QI   */
+VECTOR_MODE (INT, HI, 8); /*   V8HI  */
+VECTOR_MODE (INT, SI, 4); /*V4SI */
+VECTOR_MODE (FLOAT, SF, 4);   /*V4SF */
 
 VECTOR_MODES (FRACT, 4);   /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);  /* V4UQQ V2UHQ */
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bc76078..94d2c2f 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -4638,7 +4638,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
   /* The EABI conventions have traditionally been defined in terms
 of TYPE_MODE, regardless of the actual type.  */
   info->fpr_p = ((GET_MODE_CLASS (mode) == MODE_FLOAT
- || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+ || mode == V2SFmode)
  && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
   break;
 
@@ -4653,7 +4653,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
 || SCALAR_FLOAT_TYPE_P (type)
 || VECTOR_FLOAT_TYPE_P (type))
  && (GET_MODE_CLASS (mode) == MODE_FLOAT
-|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+|| mode == V2SFmode)
  && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
   break;
 
@@ -4666,7 +4666,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
  && (type == 0 || FLOAT_TYPE_P (type))
  && (GET_MODE_CLASS (mode) == MODE_FLOAT
 || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
-|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+|| mode == V2SFmode)
  && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_FPVALUE);
 
   /* ??? According to the ABI documentation, the real and imaginary
@@ -5103,7 +5103,7 @@ static bool
 mips_return_mode_in_fpr_p (enum machine_mode mode)
 {
   return ((GET_MODE_CLASS (mode) == MODE_FLOAT
-  || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+  || mode == V2SFmode
   || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
   && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_HWFPVALUE);
 }
@@ -10786,8 +10786,14 @@ mips_cannot_change_mode_class (enum machine_mode from,
   enum machine_mode to,
   enum reg_class rclass)
 {
-  /* There are several problems with changing the modes of values in
- floating-point registers:
+  /* Allow conversions between different Loongson integer vectors,
+ and between those vectors and DImode.  */
+  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
+      && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
+return false;
+
+  /* Otherwise, there are several problems with changing the modes of
+ values in floating-point registers:
 
  - When a multi-word value is stored in paired floating-point
registers, the first register always holds the low word.  We
@@ -10809,12 +10815,6 @@ mips_cannot_change_mode_class (enum machine_mode from,
 
  We therefore disallow all mode changes involving FPRs.  */
 
-  /* Except for Loongson and its integral vectors.  We need to be able
- to change between those modes easily.  */
-  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
-      && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
-return false;
-
   return reg_classes_intersect_p (FP_REGS, rclass);
 }
 
@@ -16352,7 +16352,8 @@ struct expand_vec_perm_d
return true if that's a valid instruction in the active ISA.  */
 
 static bool
-expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned nelt)
+mips_expand_vselect 

Re: [PATCH v3 00/10] MIPS vectorization improvements

2011-12-23 Thread Richard Sandiford
Richard Henderson r...@redhat.com writes:
 On 12/22/2011 12:44 PM, Richard Sandiford wrote:
 Woah, thanks, that's quite some work.  OK for the patches I didn't
 respond to.

 Here's a combined follow-on patch that I believe addresses all of
 the comments you had.

 Ok?

Yeah, looks good, thanks.

Richard


[BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Anatoly Sokolov
  Hi.

  This patch removes the obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST
macros from the Blackfin back end and introduces the equivalent
TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks.
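
  For readers unfamiliar with the macro-to-hook conversion, here is a rough,
self-contained sketch of the idea. It is not GCC's actual hook machinery:
`toy_target_hooks`, `toy_targetm`, the `TOY_*` register classes and the cost
values are invented for illustration, and `reg_class_t` is reduced to a plain
int. Only the general shape (static worker functions reached through a table
of function pointers, with a default for hooks the back end does not
override) is meant to mirror the patch.

```c
#include <stdbool.h>

/* Simplified stand-in for GCC's reg_class_t (assumption: plain int).  */
typedef int reg_class_t;

/* Hypothetical register classes for this sketch only.  */
enum { TOY_DREGS, TOY_CCREGS };

/* Hypothetical hook table: one function pointer per former macro.  */
struct toy_target_hooks
{
  int (*register_move_cost) (reg_class_t from, reg_class_t to);
  int (*memory_move_cost) (reg_class_t rclass, bool in);
};

/* Default used when a back end does not override the memory hook.
   The value 4 is an arbitrary choice for this sketch.  */
static int
toy_default_memory_move_cost (reg_class_t rclass, bool in)
{
  (void) rclass;
  (void) in;
  return 4;
}

/* Back-end worker.  It can be static because only the hook table
   refers to it; no prototype in a *-protos.h header is needed.  */
static int
toy_register_move_cost (reg_class_t from, reg_class_t to)
{
  /* In this toy, moves touching the condition-code class cost more.  */
  if (from == TOY_CCREGS || to == TOY_CCREGS)
    return 4;
  return 2;
}

/* The back end overrides one hook and keeps the default for the other.  */
static const struct toy_target_hooks toy_targetm =
{
  toy_register_move_cost,
  toy_default_memory_move_cost
};
```

  Callers then go through the table, for example
`toy_targetm.register_move_cost (TOY_CCREGS, TOY_DREGS)`, instead of
expanding a macro from the target header.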

  Untested.

  OK to install?

* config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
* config/bfin/bfin-protos.h (bfin_register_move_cost,
bfin_memory_move_cost): Remove. 
* config/bfin/bfin.c (bfin_register_move_cost,
bfin_memory_move_cost): Make static. Change arguments type from
enum reg_class to reg_class_t and from int to bool.
(TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.

Index: gcc/config/bfin/bfin-protos.h
===
--- gcc/config/bfin/bfin-protos.h   (revision 182658)
+++ gcc/config/bfin/bfin-protos.h   (working copy)
@@ -85,9 +85,6 @@ extern bool bfin_longcall_p (rtx, int);
 extern bool bfin_dsp_memref_p (rtx);
 extern bool bfin_expand_movmem (rtx, rtx, rtx, rtx);
 
-extern int bfin_register_move_cost (enum machine_mode, enum reg_class,
-   enum reg_class);
-extern int bfin_memory_move_cost (enum machine_mode, enum reg_class, int in);
 extern enum reg_class secondary_input_reload_class (enum reg_class,
enum machine_mode,
rtx);
Index: gcc/config/bfin/bfin.c
===
--- gcc/config/bfin/bfin.c  (revision 182658)
+++ gcc/config/bfin/bfin.c  (working copy)
@@ -2149,12 +2149,11 @@ bfin_vector_mode_supported_p (enum machi
   return mode == V2HImode;
 }
 
-/* Return the cost of moving data from a register in class CLASS1 to
-   one in class CLASS2.  A cost of 2 is the default.  */
+/* Worker function for TARGET_REGISTER_MOVE_COST.  */
 
-int
+static int
 bfin_register_move_cost (enum machine_mode mode,
-enum reg_class class1, enum reg_class class2)
+reg_class_t class1, reg_class_t class2)
 {
   /* These need secondary reloads, so they're more expensive.  */
   if ((class1 == CCREGS && !reg_class_subset_p (class2, DREGS))
@@ -2177,18 +2176,16 @@ bfin_register_move_cost (enum machine_mo
   return 2;
 }
 
-/* Return the cost of moving data of mode M between a
-   register and memory.  A value of 2 is the default; this cost is
-   relative to those in `REGISTER_MOVE_COST'.
+/* Worker function for TARGET_MEMORY_MOVE_COST.
 
??? In theory L1 memory has single-cycle latency.  We should add a switch
that tells the compiler whether we expect to use only L1 memory for the
program; it'll make the costs more accurate.  */
 
-int
+static int
 bfin_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
-  enum reg_class rclass,
-  int in ATTRIBUTE_UNUSED)
+  reg_class_t rclass,
+  bool in ATTRIBUTE_UNUSED)
 {
   /* Make memory accesses slightly more expensive than any register-register
  move.  Also, penalize non-DP registers, since they need secondary
@@ -5703,6 +5700,12 @@ bfin_conditional_register_usage (void)
 #undef  TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST bfin_address_cost
 
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST bfin_register_move_cost
+
+#undef TARGET_MEMORY_MOVE_COST
+#define TARGET_MEMORY_MOVE_COST bfin_memory_move_cost
+
 #undef  TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER bfin_assemble_integer
 
Index: gcc/config/bfin/bfin.h
===
--- gcc/config/bfin/bfin.h  (revision 182658)
+++ gcc/config/bfin/bfin.h  (working copy)
@@ -975,29 +975,6 @@ typedef struct {
 /* Do not put function addr into constant pool */
 #define NO_FUNCTION_CSE 1
 
-/* A C expression for the cost of moving data from a register in class FROM to
-   one in class TO.  The classes are expressed using the enumeration values
-   such as `GENERAL_REGS'.  A value of 2 is the default; other values are
-   interpreted relative to that.
-
-   It is not required that the cost always equal 2 when FROM is the same as TO;
-   on some machines it is expensive to move between registers if they are not
-   general registers.  */
-
-#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
-   bfin_register_move_cost ((MODE), (CLASS1), (CLASS2))
-
-/* A C expression for the cost of moving data of mode M between a
-   register and memory.  A value of 2 is the default; this cost is
-   relative to those in `REGISTER_MOVE_COST'.
-
-   If moving between registers and memory is more expensive than
-   between two registers, you should define this macro to express the
-   relative cost.  */
-
-#define MEMORY_MOVE_COST(MODE, CLASS, IN)  \
-  bfin_memory_move_cost ((MODE), (CLASS), (IN))
-
 /* Specify the machine mode that this machine uses
for 

[SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Anatoly Sokolov
  Hi.

  This patch removes the obsolete REGISTER_MOVE_COST macro from the SCORE back
end and introduces the equivalent TARGET_REGISTER_MOVE_COST target hook.
The MEMORY_MOVE_COST macro is also removed, and the default implementation of
the TARGET_MEMORY_MOVE_COST target hook is used instead.

  Untested.

  OK to install?

* config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
* config/score/score-protos.h (score_register_move_cost): Remove.   
* config/score/score.c (TARGET_REGISTER_MOVE_COST): Define.
(score_register_move_cost): Make static. Change arguments type from
enum reg_class to reg_class_t.

Index: gcc/config/score/score.h
===
--- gcc/config/score/score.h(revision 182660)
+++ gcc/config/score/score.h(working copy)
@@ -601,14 +601,6 @@ typedef struct score_args
 #define REVERSIBLE_CC_MODE(MODE)1
 
 /* Describing Relative Costs of Operations  */
-/* Compute extra cost of moving data between one register class and another.  */
-#define REGISTER_MOVE_COST(MODE, FROM, TO) \
-  score_register_move_cost (MODE, FROM, TO)
-
-/* Moves to and from memory are quite expensive */
-#define MEMORY_MOVE_COST(MODE, CLASS, TO_P) \
-  (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
-
 /* Try to generate sequences that don't involve branches.  */
 #define BRANCH_COST(speed_p, predictable_p) 2
 
Index: gcc/config/score/score-protos.h
===
--- gcc/config/score/score-protos.h (revision 182660)
+++ gcc/config/score/score-protos.h (working copy)
@@ -42,8 +42,6 @@ extern bool score_block_move (rtx* ops);
 extern int score_address_cost (rtx addr, bool speed);
 extern int score_address_p (enum machine_mode mode, rtx x, int strict);
 extern int score_reg_class (int regno);
-extern int score_register_move_cost (enum machine_mode mode, enum reg_class to,
- enum reg_class from);
 extern int score_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern int score_const_ok_for_letter_p (HOST_WIDE_INT value, char c);
 extern int score_extra_constraint (rtx op, char c);
Index: gcc/config/score/score.c
===
--- gcc/config/score/score.c(revision 182660)
+++ gcc/config/score/score.c(working copy)
@@ -187,6 +187,9 @@ struct extern_list *extern_head = 0;
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT score_trampoline_init
 
+#undef TARGET_REGISTER_MOVE_COST
+#define TARGET_REGISTER_MOVE_COST  score_register_move_cost
+
 /* Return true if SYMBOL is a SYMBOL_REF and OFFSET + SYMBOL points
to the same object as SYMBOL.  */
 static int
@@ -998,11 +1001,13 @@ score_legitimate_address_p (enum machine
   return score_classify_address (addr, mode, x, strict);
 }
 
-/* Return a number assessing the cost of moving a register in class
+/* Implement TARGET_REGISTER_MOVE_COST.
+
+   Return a number assessing the cost of moving a register in class
FROM to class TO. */
-int
+static int
 score_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
-  enum reg_class from, enum reg_class to)
+  reg_class_t from, reg_class_t to)
 {
   if (GR_REG_CLASS_P (from))
 {


Anatoly.



Re: [SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Richard Henderson
On 12/23/2011 11:08 AM, Anatoly Sokolov wrote:
 * config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
 * config/score/score-protos.h (score_register_move_cost): Remove. 
   
 * config/score/score.c (TARGET_REGISTER_MOVE_COST): Define.
 (score_register_move_cost): Make static. Change arguments type from
 enum reg_class to reg_class_t.

Ok.


r~


Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Richard Henderson
On 12/23/2011 10:55 AM, Anatoly Sokolov wrote:
 * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
 * config/bfin/bfin-protos.h (bfin_register_move_cost,
 bfin_memory_move_cost): Remove. 
 * config/bfin/bfin.c (bfin_register_move_cost,
 bfin_memory_move_cost): Make static. Change arguments type from
 enum reg_class to reg_class_t and from int to bool.
 (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.

Ok.


r~


Re: #undef fopen+freopen prior to #def in system.h, for aix bootstrap

2011-12-23 Thread Olivier Hainque
A minor update to provide a more precise ChangeLog:

   * system.h: #undef fopen and freopen unconditionally.


2011-12-23  Olivier Hainque  hain...@adacore.com

* system.h: Prior to #define, #undef fopen and freopen unconditionally.

libcpp/
* system.h: Likewise.
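
The ordering this ChangeLog entry describes can be sketched as follows. This
is only an illustration under stated assumptions: `toy_fopen` and
`toy_fopen_calls` are hypothetical names, not what system.h actually maps
fopen to. The point is that an unconditional #undef is safe, because
#undef of a name that is not currently a macro has no effect, while it
avoids a redefinition clash when the system header already provides fopen
as a macro.

```c
#include <stdio.h>

/* Number of calls routed through the wrapper; for demonstration only.  */
static int toy_fopen_calls;

/* Hypothetical replacement.  It is defined before our macro below, so
   the fopen used here is whatever <stdio.h> provides.  */
static FILE *
toy_fopen (const char *path, const char *mode)
{
  toy_fopen_calls++;
  return fopen (path, mode);
}

/* Drop any macro definition inherited from the system header, then
   install our own.  Without the #undef, a pre-existing fopen macro
   would make the #define below an incompatible redefinition.  */
#undef fopen
#define fopen(PATH, MODE) toy_fopen (PATH, MODE)
```

After this point every fopen (...) call in the translation unit expands to
the wrapper.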




[lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Vladimir Makarov
The following patch fixes a degradation of 20060102-1.c  on ARM.  Not 
updating REG notes resulted in removing an insn after LRA as it was 
wrongly considered dead.
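
The note-removal loop in the patch below relies on the usual
pointer-to-pointer idiom for unlinking nodes from a singly linked list. A
minimal standalone sketch of that idiom, assuming a simplified note type
(`struct toy_note` and the `TOY_*` kinds are stand-ins, not GCC's rtx
representation):

```c
#include <stddef.h>

/* Simplified stand-ins for the note kinds mentioned in the patch.  */
enum toy_note_kind
{
  TOY_REG_DEAD,
  TOY_REG_UNUSED,
  TOY_REG_INC,
  TOY_REG_EQUAL
};

struct toy_note
{
  enum toy_note_kind kind;
  struct toy_note *next;
};

/* Unlink every note whose kind may have become stale.  PHEAD points at
   the list head pointer, so removing the first node needs no special
   case.  Unlinked nodes are not freed here.  */
static void
toy_drop_stale_notes (struct toy_note **phead)
{
  struct toy_note **pnote = phead;

  while (*pnote != NULL)
    {
      enum toy_note_kind kind = (*pnote)->kind;

      if (kind == TOY_REG_DEAD
          || kind == TOY_REG_UNUSED
          || kind == TOY_REG_INC)
        *pnote = (*pnote)->next;   /* Unlink; stay on the same slot.  */
      else
        pnote = &(*pnote)->next;   /* Keep; advance to the next slot.  */
    }
}

/* Count the notes remaining in a list.  */
static size_t
toy_count_notes (const struct toy_note *head)
{
  size_t n = 0;

  for (; head != NULL; head = head->next)
    n++;
  return n;
}
```

In the sketch, as in the patch, the stale kinds are simply dropped; any
regeneration of notes is a separate step.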


The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 182664.

2011-12-23  Vladimir Makarov vmaka...@redhat.com

* lra.c (update_auto_inc_notes): Rename to update_reg_notes.  Make
it unconditional.  Remove REG_DEAD and REG_UNUSED too.  Make call
of add_auto_inc_notes conditional.

Index: lra.c
===
--- lra.c   (revision 182663)
+++ lra.c   (working copy)
@@ -2032,10 +2032,14 @@ add_auto_inc_notes (rtx insn, rtx x)
 }
 }
 
-/* DF infrastructure does not deal with REG_INC notes -- so update
-   them here.  */
+#endif
+
+/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
+   We change pseudos by hard registers without notification of DF and
+   that can make the notes obsolete.  DF-infrastructure does not deal
+   with REG_INC notes -- so we should regenerate them here.  */
 static void
-update_auto_inc_notes (void)
+update_reg_notes (void)
 {
   rtx *pnote;
   basic_block bb;
@@ -2048,17 +2052,19 @@ update_auto_inc_notes (void)
 pnote = &REG_NOTES (insn);
while (*pnote != 0)
  {
-   if (REG_NOTE_KIND (*pnote) == REG_INC)
+   if (REG_NOTE_KIND (*pnote) == REG_DEAD
+   || REG_NOTE_KIND (*pnote) == REG_UNUSED
+   || REG_NOTE_KIND (*pnote) == REG_INC)
  *pnote = XEXP (*pnote, 1);
else
   pnote = &XEXP (*pnote, 1);
  }
+#ifdef AUTO_INC_DEC
add_auto_inc_notes (insn, PATTERN (insn));
+#endif
   }
 }
 
-#endif
-
 /* Set to 1 while in lra.  */
 int lra_in_progress;
 
@@ -2204,9 +2210,7 @@ lra (FILE *f)
   regstat_free_n_sets_and_refs ();
   regstat_free_ri ();
   reload_completed = 1;
-#ifdef AUTO_INC_DEC
-  update_auto_inc_notes ();
-#endif
+  update_reg_notes ();
   finish_subregs_of_mode ();
 
   inserted_p = fixup_abnormal_edges ();


Re: [lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Paolo Carlini
Hi Vladimir,

 The following patch fixes a degradation of 20060102-1.c  on ARM.

unless I'm badly mistaken, I see you using quite often the form 'degradation', 
which is somewhat unusual in this mailing list. Are you using it like 
'regression' or you actually mean something slightly, subtly, different?

A bit off topic, sorry,
Paolo


Re: [lra] patch to fix an arm testsuite degradation

2011-12-23 Thread Vladimir Makarov

On 12/23/2011 04:17 PM, Paolo Carlini wrote:

Hi Vladimir,


The following patch fixes a degradation of 20060102-1.c  on ARM.

unless I'm badly mistaken, I see you using quite often the form 'degradation', 
which is somewhat unusual in this mailing list. Are you using it like 
'regression' or you actually mean something slightly, subtly, different?

Paolo, thanks for pointing this out.  You are right.  I frequently 
wrongly use this word.  I should use regression.




Re: [patch, testsuite] One more strict-volatile-bitfields test case

2011-12-23 Thread Richard Henderson
On 12/22/2011 06:28 PM, Ye Joey wrote:
   * gcc.dg/volatile-bitfields-2.c: New test.

Ok.


r~


Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST

2011-12-23 Thread Jie Zhang
Hi Anatoly,

I cannot apply your patch to a clean tree. I tried saving your email
as a text file, copying from Thunderbird, copying from Gmail, and copying
from the mailing list archive, but none of these works.

Regards,
Jie

2011/12/23 Anatoly Sokolov ae...@post.ru:
  Hi.

  This patch removes obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST
 macros from the Blackfin back end in the GCC and introduces equivalent
 TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks.

  Untested.

  OK to install?

        * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove.
        * config/bfin/bfin-protos.h (bfin_register_move_cost,
        bfin_memory_move_cost): Remove.
        * config/bfin/bfin.c (bfin_register_move_cost,
        bfin_memory_move_cost): Make static. Change arguments type from
        enum reg_class to reg_class_t and from int to bool.
        (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.

 Index: gcc/config/bfin/bfin-protos.h
 ===
 --- gcc/config/bfin/bfin-protos.h       (revision 182658)
 +++ gcc/config/bfin/bfin-protos.h       (working copy)
 @@ -85,9 +85,6 @@ extern bool bfin_longcall_p (rtx, int);
  extern bool bfin_dsp_memref_p (rtx);
  extern bool bfin_expand_movmem (rtx, rtx, rtx, rtx);

 -extern int bfin_register_move_cost (enum machine_mode, enum reg_class,
 -                                   enum reg_class);
 -extern int bfin_memory_move_cost (enum machine_mode, enum reg_class, int in);
  extern enum reg_class secondary_input_reload_class (enum reg_class,
                                                    enum machine_mode,
                                                    rtx);
 Index: gcc/config/bfin/bfin.c
 ===
 --- gcc/config/bfin/bfin.c      (revision 182658)
 +++ gcc/config/bfin/bfin.c      (working copy)
 @@ -2149,12 +2149,11 @@ bfin_vector_mode_supported_p (enum machi
   return mode == V2HImode;
  }

 -/* Return the cost of moving data from a register in class CLASS1 to
 -   one in class CLASS2.  A cost of 2 is the default.  */
 +/* Worker function for TARGET_REGISTER_MOVE_COST.  */

 -int
 +static int
  bfin_register_move_cost (enum machine_mode mode,
 -                        enum reg_class class1, enum reg_class class2)
 +                        reg_class_t class1, reg_class_t class2)
  {
   /* These need secondary reloads, so they're more expensive.  */
   if ((class1 == CCREGS && !reg_class_subset_p (class2, DREGS))
 @@ -2177,18 +2176,16 @@ bfin_register_move_cost (enum machine_mo
   return 2;
  }

 -/* Return the cost of moving data of mode M between a
 -   register and memory.  A value of 2 is the default; this cost is
 -   relative to those in `REGISTER_MOVE_COST'.
 +/* Worker function for TARGET_MEMORY_MOVE_COST.

    ??? In theory L1 memory has single-cycle latency.  We should add a switch
    that tells the compiler whether we expect to use only L1 memory for the
    program; it'll make the costs more accurate.  */

 -int
 +static int
  bfin_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
 -                      enum reg_class rclass,
 -                      int in ATTRIBUTE_UNUSED)
 +                      reg_class_t rclass,
 +                      bool in ATTRIBUTE_UNUSED)
  {
   /* Make memory accesses slightly more expensive than any register-register
      move.  Also, penalize non-DP registers, since they need secondary
 @@ -5703,6 +5700,12 @@ bfin_conditional_register_usage (void)
  #undef  TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST bfin_address_cost

 +#undef TARGET_REGISTER_MOVE_COST
 +#define TARGET_REGISTER_MOVE_COST bfin_register_move_cost
 +
 +#undef TARGET_MEMORY_MOVE_COST
 +#define TARGET_MEMORY_MOVE_COST bfin_memory_move_cost
 +
  #undef  TARGET_ASM_INTEGER
  #define TARGET_ASM_INTEGER bfin_assemble_integer

 Index: gcc/config/bfin/bfin.h
 ===
 --- gcc/config/bfin/bfin.h      (revision 182658)
 +++ gcc/config/bfin/bfin.h      (working copy)
 @@ -975,29 +975,6 @@ typedef struct {
  /* Do not put function addr into constant pool */
  #define NO_FUNCTION_CSE 1

 -/* A C expression for the cost of moving data from a register in class FROM 
 to
 -   one in class TO.  The classes are expressed using the enumeration values
 -   such as `GENERAL_REGS'.  A value of 2 is the default; other values are
 -   interpreted relative to that.
 -
 -   It is not required that the cost always equal 2 when FROM is the same as 
 TO;
 -   on some machines it is expensive to move between registers if they are not
 -   general registers.  */
 -
 -#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \
 -   bfin_register_move_cost ((MODE), (CLASS1), (CLASS2))
 -
 -/* A C expression for the cost of moving data of mode M between a
 -   register and memory.  A value of 2 is the default; this cost is
 -   relative to those in `REGISTER_MOVE_COST'.
 -
 -  

Re: [patch] libitm: Fix privatization safety during upgrades to serial mode.

2011-12-23 Thread Richard Henderson
On 12/22/2011 11:28 AM, Torvald Riegel wrote:
 libitm: Fix privatization safety during upgrades to serial mode.
 
   libitm/
   * beginend.cc (GTM::gtm_thread::restart): Add and handle
   finish_serial_upgrade parameter.
   * libitm.h (GTM::gtm_thread::restart): Adapt declaration.
   * config/linux/rwlock.cc (GTM::gtm_rwlock::write_lock_generic):
   Don't unset reader flag.
   (GTM::gtm_rwlock::write_upgrade_finish): New.
   * config/posix/rwlock.cc: Same.
   * config/linux/rwlock.h (GTM::gtm_rwlock::write_upgrade_finish):
   Declare.
   * config/posix/rwlock.h: Same.
   * method-serial.cc (GTM::gtm_thread::serialirr_mode): Unset reader
   flag after commit or after rollback when restarting.

Ok.



r~


C++ PATCH for c++/51507 (pack expansion in trailing-return-type)

2011-12-23 Thread Jason Merrill
The existing code to handle pack expansions in trailing-return-type 
assumed that such expansions would only occur inside decltype, which is 
not the case.  This patch fixes the test to check for whether or not 
we're doing the substitution in the context of a function body, and 
fixes at_function_scope_p to properly return false when we're 
substituting deduced arguments into a candidate function template.


Even with the change to at_function_scope_p it was impossible to tell 
that we weren't in function scope when instantiating a function 
declaration as part of overload resolution, so I also changed 
instantiate_template_1 to use push_to_top_level rather than just clear 
processing_template_decl.  In my testing it was enough to just clear 
current_function_decl as well, but since in fact the instantiation 
happens at top level it seems more correct to use push_to_top_level.


The second patch is a bug I noticed in dependent_name while working on 
this patch, though it isn't necessary to this patch; a BASELINK should 
not be considered a dependent name, or we end up treating calls to 
members of different classes as equivalent.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4df8b36d378adc678ed4ca9ac91088ad0772b750
Author: Jason Merrill ja...@redhat.com
Date:   Thu Dec 22 11:03:09 2011 -0500

	PR c++/51507
	* search.c (at_function_scope_p): Also check cfun.
	* pt.c (tsubst_pack_expansion): Check it instead of
	cp_unevaluated_operand.
	(instantiate_template_1): Clear current_function_decl.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 820b1ff..20f67aa 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9297,6 +9297,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   int i, len = -1;
   tree result;
   htab_t saved_local_specializations = NULL;
+  bool need_local_specializations = false;
   int levels;
 
   gcc_assert (PACK_EXPANSION_P (t));
@@ -9330,7 +9331,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
}
   if (TREE_CODE (parm_pack) == PARM_DECL)
 	{
-	  if (!cp_unevaluated_operand)
+	  if (at_function_scope_p ())
 	arg_pack = retrieve_local_specialization (parm_pack);
 	  else
 	{
@@ -9346,6 +9347,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 		arg_pack = NULL_TREE;
 	  else
 		arg_pack = make_fnparm_pack (arg_pack);
+	  need_local_specializations = true;
 	}
 	}
   else
@@ -9476,7 +9478,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   if (len < 0)
 return error_mark_node;
 
-  if (cp_unevaluated_operand)
+  if (need_local_specializations)
 {
   /* We're in a late-specified return type, so create our own local
 	 specializations table; the current table is either NULL or (in the
@@ -14524,7 +14526,6 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
   tree fndecl;
   tree gen_tmpl;
   tree spec;
-  HOST_WIDE_INT saved_processing_template_decl;
 
   if (tmpl == error_mark_node)
 return error_mark_node;
@@ -14585,18 +14586,22 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
  deferring all checks until we have the FUNCTION_DECL.  */
   push_deferring_access_checks (dk_deferred);
 
-  /* Although PROCESSING_TEMPLATE_DECL may be true at this point
- (because, for example, we have encountered a non-dependent
- function call in the body of a template function and must now
- determine which of several overloaded functions will be called),
- within the instantiation itself we are not processing a
- template.  */  
-  saved_processing_template_decl = processing_template_decl;
-  processing_template_decl = 0;
+  /* Instantiation of the function happens in the context of the function
+ template, not the context of the overload resolution we're doing.  */
+  push_to_top_level ();
+  if (DECL_CLASS_SCOPE_P (gen_tmpl))
+{
+  tree ctx = tsubst (DECL_CONTEXT (gen_tmpl), targ_ptr,
+			 complain, gen_tmpl);
+  push_nested_class (ctx);
+}
   /* Substitute template parameters to obtain the specialization.  */
   fndecl = tsubst (DECL_TEMPLATE_RESULT (gen_tmpl),
 		   targ_ptr, complain, gen_tmpl);
-  processing_template_decl = saved_processing_template_decl;
+  if (DECL_CLASS_SCOPE_P (gen_tmpl))
+pop_nested_class ();
+  pop_from_top_level ();
+
   if (fndecl == error_mark_node)
 return error_mark_node;
 
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 0ceb5bc..45fdafc 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -539,7 +539,11 @@ int
 at_function_scope_p (void)
 {
   tree cs = current_scope ();
-  return cs && TREE_CODE (cs) == FUNCTION_DECL;
+  /* Also check cfun to make sure that we're really compiling
+ this function (as opposed to having set current_function_decl
+ for access checking or some such).  */
+  return (cs && TREE_CODE (cs) == FUNCTION_DECL
+	  && cfun && cfun->decl == current_function_decl);
 }
 
 /* Returns 

[committed] Remove VEC_EXTRACT_EVEN/ODD_EXPR

2011-12-23 Thread Richard Henderson
Now that patches converting all targets to vec_perm_const have been committed,
with support for the interleave and even/odd permutations, we can remove the
VEC_INTERLEAVE_HIGH/LOW_EXPR and VEC_EXTRACT_EVEN/ODD_EXPR codes as redundant
with the primary VEC_PERM_EXPR code.
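
To make the redundancy concrete, here is a small sketch of how the even/odd
extractions can be written as ordinary two-input permutations. It models
vectors as plain int arrays and assumes the selector indexes into the
concatenation of the two inputs; `toy_even_odd_selector` and `toy_vec_perm`
are illustrative helpers, not functions from the GCC sources.

```c
#include <stddef.h>

/* Fill SEL[0..NELT-1] with the indices that pick the even (ODD == 0) or
   odd (ODD != 0) elements of the 2*NELT-element concatenation of two
   NELT-element input vectors.  */
static void
toy_even_odd_selector (unsigned char *sel, size_t nelt, int odd)
{
  size_t i;

  for (i = 0; i < nelt; i++)
    sel[i] = (unsigned char) (2 * i + (odd ? 1 : 0));
}

/* Apply a two-input permutation: a selector index below NELT picks
   from A, an index from NELT upwards picks from B.  */
static void
toy_vec_perm (int *out, const int *a, const int *b,
              const unsigned char *sel, size_t nelt)
{
  size_t i;

  for (i = 0; i < nelt; i++)
    {
      size_t idx = sel[i] % (2 * nelt);

      out[i] = idx < nelt ? a[idx] : b[idx - nelt];
    }
}
```

With four-element inputs, the even selector is {0, 2, 4, 6} and the odd
selector is {1, 3, 5, 7}, so no dedicated tree code is needed to express
either operation.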

I have committed the patch previously posted by Jakub (and approved by Richi)
that removes VEC_INTERLEAVE_HIGH/LOW_EXPR.  I have also committed the following
patch, which removes VEC_EXTRACT_EVEN/ODD_EXPR.

All re-tested on x86_64-linux.


r~
* tree.def (VEC_EXTRACT_EVEN_EXPR, VEC_EXTRACT_ODD_EXPR): Remove.
* cfgexpand.c (expand_debug_expr): Don't handle them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (fold_binary_loc): Likewise.
* gimple-pretty-print.c (dump_binary_rhs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-vect-generic.c (expand_vector_operations_1): Likewise.
* optabs.c (optab_for_tree_code): Likewise.
(can_vec_perm_for_code_p): Remove.
(expand_binop): Don't try it.
(init_optabs): Don't init vec_extract_even/odd_optab.
* genopinit.c (optabs): Likewise.
* optabs.h (OTI_vec_extract_even, OTI_vec_extract_odd): Remove.
(vec_extract_even_optab, vec_extract_odd_optab): Remove.
* tree-vect-data-refs.c (vect_strided_store_supported): Tidy code.
(vect_permute_store_chain): Use TYPE_VECTOR_SUBPARTS instead of
GET_MODE_NUNITS; check vect_gen_perm_mask return value instead of
asserting vect_strided_store_supported.
(vect_strided_load_supported): Use can_vec_perm_p.
(vect_permute_load_chain): Use VEC_PERM_EXPR.

* doc/generic.texi (VEC_EXTRACT_EVEN_EXPR): Remove.
(VEC_EXTRACT_ODD_EXPR): Remove.
* doc/md.texi (vec_extract_even, vec_extract_odd): Remove.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index dfe5442..2b2e464 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3449,8 +3449,6 @@ expand_debug_expr (tree exp)
 case REDUC_MIN_EXPR:
 case REDUC_PLUS_EXPR:
 case VEC_COND_EXPR:
-case VEC_EXTRACT_EVEN_EXPR:
-case VEC_EXTRACT_ODD_EXPR:
 case VEC_LSHIFT_EXPR:
 case VEC_PACK_FIX_TRUNC_EXPR:
 case VEC_PACK_SAT_EXPR:
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 4f26238..31e8855 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1695,8 +1695,6 @@ its sole argument yields the representation for @code{ap}.
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
-@tindex VEC_EXTRACT_EVEN_EXPR
-@tindex VEC_EXTRACT_ODD_EXPR
 
 @table @code
 @item VEC_LSHIFT_EXPR
@@ -1765,13 +1763,6 @@ of elements of a floating point type.  The result is a vector that contains
 twice as many elements of an integral type whose size is half as wide.  The
 elements of the two vectors are merged (concatenated) to form the output
 vector.
-
-@item VEC_EXTRACT_EVEN_EXPR
-@itemx VEC_EXTRACT_ODD_EXPR
-These nodes represent extracting of the even/odd elements of the two input
-vectors, respectively. Their operands and result are vectors that contain the
-same number of elements of the same type.
-
 @end table
 
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6dd6a58..93183e6 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4145,20 +4145,6 @@ operand 1 is new value of field and operand 2 specify the field index.
 Extract given field from the vector value.  Operand 1 is the vector, operand 2
 specify field index and operand 0 place to store value into.
 
-@cindex @code{vec_extract_even@var{m}} instruction pattern
-@item @samp{vec_extract_even@var{m}}
-Extract even elements from the input vectors (operand 1 and operand 2).
-The even elements of operand 2 are concatenated to the even elements of operand
-1 in their original order. The result is stored in operand 0.
-The output and input vectors should have the same modes.
-
-@cindex @code{vec_extract_odd@var{m}} instruction pattern
-@item @samp{vec_extract_odd@var{m}}
-Extract odd elements from the input vectors (operand 1 and operand 2).
-The odd elements of operand 2 are concatenated to the odd elements of operand
-1 in their original order. The result is stored in operand 0.
-The output and input vectors should have the same modes.
-
 @cindex @code{vec_init@var{m}} instruction pattern
 @item @samp{vec_init@var{m}}
 Initialize the vector to given values.  Operand 0 is the vector to initialize
diff --git a/gcc/expr.c b/gcc/expr.c
index cb28f48..c10f915 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8647,10 +8647,6 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 return temp;
   }
 
-case VEC_EXTRACT_EVEN_EXPR:
-case VEC_EXTRACT_ODD_EXPR:
-  goto binop;
-
 case VEC_LSHIFT_EXPR:
 case VEC_RSHIFT_EXPR:
   {
diff --git