[committed] vms/ia64: Define SUPPORTS_ONE_ONLY
Hi,

the native ia64 VMS linker doesn't fully support COMDAT sections.

Committed on trunk.

Tristan.

2011-12-23  Tristan Gingold  ging...@adacore.com

    * config/ia64/vms.h (SUPPORTS_ONE_ONLY): Define.

--- a/gcc/config/ia64/vms.h
+++ b/gcc/config/ia64/vms.h
@@ -157,3 +157,7 @@ STATIC func_ptr __CTOR_LIST__[1]

 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promo
+
+/* IA64 VMS doesn't fully support COMDAT sections.  */
+
+#define SUPPORTS_ONE_ONLY 0
[committed]: VMS: Fix a typo in vms-crtlmap.map
Hi,

this patch fixes a typo in the CRTL map file.

Committed.

Tristan.

2011-12-23  Tristan Gingold  ging...@adacore.com

    * config/vms/vms-crtlmap.map (log10): Fix typo.

--- a/gcc/config/vms/vms-crtlmap.map
+++ b/gcc/config/vms/vms-crtlmap.map
@@ -112,7 +112,7 @@ isupper
 kill
 localtime
 log FLOAT
-log1 FLOAT
+log10 FLOAT
 lseek
 malloc64 MALLOC
 mbstowcs 64
Re: [PATCH] Fix PR50396
On Thu, 22 Dec 2011, Richard Henderson wrote:

On 12/22/2011 07:46 AM, Richard Guenther wrote:

Any way to test, in the testcase, whether the vector modes will have NaNs or not?

v[0] != v[0] ?

Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0, or the HW might simply not have NaNs (SPU?) and have 0.0 as the result. Thus, I want to query GCC capabilities (-ffinite-math-only) and HW capabilities (what we have in real_mode_format) from inside the testcase. Any idea? Otherwise I'll add dg-skips for the targets that fail the test.

Richard.
Re: [PATCH] Fix PR50396
On Fri, 23 Dec 2011, Richard Guenther wrote:

On Thu, 22 Dec 2011, Richard Henderson wrote:

On 12/22/2011 07:46 AM, Richard Guenther wrote:

Any way to test, in the testcase, whether the vector modes will have NaNs or not?

v[0] != v[0] ?

Well, if MODE_HAS_NANS returns false we might fold 0.0/0.0 to 0.0, or the HW might simply not have NaNs (SPU?) and have 0.0 as the result. Thus, I want to query GCC capabilities (-ffinite-math-only) and HW capabilities (what we have in real_mode_format) from inside the testcase. Any idea? Otherwise I'll add dg-skips for the targets that fail the test.

It seems we have __type_HAS_QUIET_NAN__. Nice. Thus I'll use

/* { dg-do run } */

extern void abort (void);

typedef float vf128 __attribute__((vector_size(16)));
typedef float vf64 __attribute__((vector_size(8)));

int main()
{
#if !__FINITE_MATH_ONLY__
#if __FLT_HAS_QUIET_NAN__
  vf128 v = (vf128){ 0.f, 0.f, 0.f, 0.f };
  vf64 u = (vf64){ 0.f, 0.f };
  v = v / (vf128){ 0.f, 0.f, 0.f, 0.f };
  if (v[0] == v[0])
    abort ();
  u = u / (vf64){ 0.f, 0.f };
  if (u[0] == u[0])
    abort ();
#endif
#endif
  return 0;
}
[Ada] Straighten implementation of aggregate libraries
Handle case where the same library project is imported by multiple aggregated libraries. Tested on x86_64-pc-linux-gnu, committed on trunk 2011-12-23 Pascal Obry o...@adacore.com * prj.ads (For_Every_Project_Imported): Add In_Aggregate_Lib parameter to generic formal procedure. * prj.adb (For_Every_Project_Imported): Update accordingly. (Recursive_Check): Likewise. Do not parse imported project for aggregate library. This is needed as the imported projects are there just to handle dependencies. (Look_For_Sources): Likewise. (Recursive_Add): Likewise. * prj-env.adb, prj-conf.adb, makeutl.adb, gnatcmd.adb: Add In_Aggregate_Lib parameter to routines used with For_Every_Project_Imported generic procedure. * prj-nmsc.adb (Tree_Processing_Data): Add In_Aggregate_Lib field. (Check): Move where it is used. Fix implementation to not check libraries that are inside aggregate libraries. (Recursive_Check): Add In_Aggregate_Lib parameter. Index: gnatcmd.adb === --- gnatcmd.adb (revision 182655) +++ gnatcmd.adb (working copy) @@ -264,6 +264,7 @@ procedure Set_Library_For (Project : Project_Id; Tree : Project_Tree_Ref; + In_Aggregate_Lib : Boolean; Libraries_Present : in out Boolean); -- If Project is a library project, add the correct -L and -l switches to -- the linker invocation. @@ -1264,9 +1265,10 @@ procedure Set_Library_For (Project : Project_Id; Tree : Project_Tree_Ref; + In_Aggregate_Lib : Boolean; Libraries_Present : in out Boolean) is - pragma Unreferenced (Tree); + pragma Unreferenced (Tree, In_Aggregate_Lib); Path_Option : constant String_Access := MLib.Linker_Library_Path_Option; Index: prj.adb === --- prj.adb (revision 182655) +++ prj.adb (working copy) @@ -528,20 +528,24 @@ Seen : Project_Boolean_Htable.Instance := Project_Boolean_Htable.Nil; procedure Recursive_Check -(Project : Project_Id; - Tree: Project_Tree_Ref); - -- Check if a project has already been seen. If not seen, mark it as - -- Seen, Call Action, and check all its imported projects. 
+(Project : Project_Id; + Tree : Project_Tree_Ref; + In_Aggregate_Lib : Boolean); + -- Check if a project has already been seen. If not seen, mark it + -- as Seen, Call Action, and check all its imported and aggregated + -- projects. - -- Recursive_Check -- - procedure Recursive_Check -(Project : Project_Id; - Tree: Project_Tree_Ref) +(Project : Project_Id; + Tree : Project_Tree_Ref; + In_Aggregate_Lib : Boolean) is List : Project_List; + T: Project_Tree_Ref; begin if not Get (Seen, Project) then @@ -552,22 +556,28 @@ Set (Seen, Project, True); if not Imported_First then - Action (Project, Tree, With_State); + Action (Project, Tree, In_Aggregate_Lib, With_State); end if; -- Visit all extended projects if Project.Extends /= No_Project then - Recursive_Check (Project.Extends, Tree); + Recursive_Check (Project.Extends, Tree, In_Aggregate_Lib); end if; --- Visit all imported projects +-- Visit all imported projects if needed. This is not needed +-- for an aggregate library as imported libraries are just +-- there for dependency support. -List := Project.Imported_Projects; -while List /= null loop - Recursive_Check (List.Project, Tree); - List := List.Next; -end loop; +if Project.Qualifier /= Aggregate_Library + or else not Include_Aggregated +then + List := Project.Imported_Projects; + while List /= null loop + Recursive_Check (List.Project, Tree, In_Aggregate_Lib); + List := List.Next; + end loop; +end if; -- Visit all aggregated projects @@ -580,14 +590,25 @@ Agg := Project.Aggregated_Projects; while Agg /= null loop pragma Assert (Agg.Project /= No_Project); - Recursive_Check (Agg.Project, Agg.Tree); + + -- For aggregated libraries, the tree must be the one + -- of the aggregate library. + + if Project.Qualifier
Re: [PATCH, PR 51600] IPA-CP workaround for negative size cloning estimates
Hi,

On Wed, Dec 21, 2011 at 05:29:51PM +0100, Jan Hubicka wrote:

Hi, given that we already have a workaround for zero size increase estimates from estimate_ipcp_clone_size_and_time, I see little reason not to extend it to negative values too; 0 is really just as bad as the -2 that we are getting in the testcase. Hopefully this will allow people who hit this bug to proceed with their testing. Bootstrapped and tested on x86-64-linux with no regressions. OK for trunk?

Hmm, so the size value is not negative because estimate_ipcp_clone_size_and_time would return 0 or negative value but because of size -= stats.n_calls * removable_params_cost (i.e. the callee function is so small that the program will really shrink because of reduced call overhead)?

No, it is really estimate_ipcp_clone_size_and_time that returns size estimate -2. In fact, the subtraction you described does not occur on that code path at all, because I do it only for constants that occur in all contexts (from all callers), and this assert is on the path dealing with estimates of effects of constants that are there only in some contexts. The reason why I don't do it for constants that come from only a subset of callers is that some of these callers might themselves require context-specific cloning to provide the value, but when actual decisions are being made later on, they would not be cloned. So I don't know the set of callers that provide the constant at this time and cannot do the subtraction.

In that case I guess the patch is OK, but please update the comment,

Well, it's not the case, so what do you think?

Hmm, it is an estimate_ipcp_clone_size_and_time bug then. I will look into that today.

Honza

Martin
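As a rough illustration of the workaround discussed in this thread, the sketch below clamps a non-positive size estimate before it is used in further arithmetic. The function names and the replacement value of 1 are assumptions made for the example; they are not taken from the patch or from GCC's IPA-CP sources.

```c
#include <assert.h>

/* Hypothetical helper (not GCC code): treat any non-positive size
   estimate as a growth of one unit, so that zero and negative values
   are handled the same way.  The value 1 is an assumption.  */
static int
clamp_clone_size_estimate (int estimated_size)
{
  if (estimated_size <= 0)
    return 1;
  return estimated_size;
}

/* Hypothetical consumer: a benefit-per-size ratio that would divide
   by zero or change sign if the raw estimate were used directly.  */
static int
benefit_per_size (int time_benefit, int estimated_size)
{
  int size = clamp_clone_size_estimate (estimated_size);
  assert (size > 0);
  return time_benefit / size;
}
```

With this shape, an estimate of -2 and an estimate of 0 both end up as a size of 1, which is the behaviour the thread argues for; whether the estimate should ever be negative in the first place is the separate question raised at the end of the message.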
Re: RFC: An alternative -fsched-pressure implementation
On Fri, Dec 23, 2011 at 12:46 PM, Richard Sandiford richard.sandif...@linaro.org wrote:

So it looks like two pieces of work related to scheduling and register pressure are being posted close together. This one is an RFC for a less aggressive form of -fsched-pressure. I think it should complement rather than compete with Bernd's IRA patch. It seems like a good idea to take register pressure into account during the first scheduling pass, where we can still easily look at things like instruction latencies and pipeline utilisation. Better rematerialisation in the register allocator would obviously be a good thing too though.

This patch started when we (Linaro) saw a big drop in performance from vectorising an RGB to YIQ filter on ARM. The first scheduling pass was overly aggressive in creating a wide schedule, and caused the newly-vectorised loop to contain lots of spills. The loop grew so big that it even had a constant pool in the middle of it.

-fsched-pressure did a very good job on this loop, creating far fewer spills and consequently avoiding the constant pool. However, it seemed to make several other cases significantly worse. The idea was therefore to try to come up with a form of -fsched-pressure that could be turned on for ARM by default.

Current -fsched-pressure works by assigning an excess (pressure) cost change to each instruction; here I'll write that as ECC(X). -fsched-pressure also changes the way that the main list scheduler handles stalls due to data dependencies. If an instruction would stall for N cycles, the scheduler would normally add it to the now+N queue, then add it to the ready queue after N cycles. With -fsched-pressure, it instead adds the instruction to the ready queue immediately, while still recording that the instruction would require N stalls. I'll write the number of stalls on X as delay(X).

This arrangement allows the scheduler to choose between increasing register pressure and introducing a deliberate stall. Instructions are ranked by:

  (a) lowest ECC(X) + delay(X)
  (b) lowest delay(X)
  (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)

Note that since delay(X) is measured in cycles, ECC(X) is effectively measured in cycles too.

Several things seemed to be causing the degradations we were seeing with -fsched-pressure:

(1) The -fsched-pressure schedule is based purely on instruction latencies and issue rate; it doesn't take the DFA into account. This means that we attempt to dual issue things like vector operations, loads and stores on Cortex A8 and A9. In the examples I looked at, these sorts of inaccuracy seemed to accumulate, so that the value of delay(X) became based on slightly unrealistic cycle times. Note that this also affects code that doesn't have any pressure problems; it isn't limited to code that does. This may simply be historical. It became much easier to use the DFA here after Bernd's introduction of prune_ready_list, but the original -fsched-pressure predates that.

(2) We calculate ECC(X) by walking the unscheduled part of the block in its original order, then recording the pressure at each instruction. This seemed to make ECC(X) quite sensitive to that original order. I saw blocks that started out naturally narrow (not much ILP, e.g. from unrolled loops) and others that started naturally wide (a lot of ILP, such as in the libav h264 code), and walking the block in order meant that the two styles would be handled differently.

(3) When calculating the pressure of the original block (as described in (2)), we ignore the deaths of registers that are used by more than one unscheduled instruction. This tended to hurt long(ish) loops in things like filters, where the same value is often used as an input to two calculations. The effect was that instructions towards the end of the block would appear to have very high pressure. This in turn made the algorithm very conservative; it wouldn't promote instructions from later in the block because those instructions seemed to have a prohibitively large cost. I asked Vlad about this, and he confirmed that it was a deliberate decision. He'd tried honouring REG_DEAD notes instead, but it produced worse results on x86. I'll return to this at the end.

(4) ECC(X) is based on the pressure over and above ira_available_class_regs (the number of allocatable registers in a given class). ARM has 14 allocatable GENERAL_REGS: 16 minus the stack pointer and program counter. So if 14 integer variables are live across a loop but not referenced within it, we end up scheduling that loop in a context of permanent pressure. Pressure becomes the overriding concern, and we don't get much ILP. I suppose there are at least two ways of viewing this:
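The ranking rules (a) to (c) quoted above, and the notion of pressure "over and above" the allocatable registers from point (4), can be sketched as a small comparator. This is only an illustration under stated assumptions: the struct, its field names, and both function names are hypothetical, and the single `priority` integer stands in for whatever the normal list-scheduler ranking would compute.

```c
#include <assert.h>

/* Hypothetical per-instruction record; not a GCC data structure.  */
struct insn_rank
{
  int ecc;       /* excess (pressure) cost change, ECC(X) */
  int delay;     /* stall cycles still required, delay(X) */
  int priority;  /* stand-in for the normal ranking; higher is better */
};

/* Return a negative value if A should be preferred over B, a positive
   value if B should be preferred, and 0 if they tie.  */
static int
rank_compare (const struct insn_rank *a, const struct insn_rank *b)
{
  assert (a && b);

  /* (a) lowest ECC(X) + delay(X) */
  int cost_a = a->ecc + a->delay;
  int cost_b = b->ecc + b->delay;
  if (cost_a != cost_b)
    return cost_a < cost_b ? -1 : 1;

  /* (b) lowest delay(X) */
  if (a->delay != b->delay)
    return a->delay < b->delay ? -1 : 1;

  /* (c) fall back to the normal list-scheduler ranking */
  if (a->priority != b->priority)
    return a->priority > b->priority ? -1 : 1;

  return 0;
}

/* Pressure above the number of allocatable registers in a class;
   zero when everything still fits.  */
static int
pressure_excess (int live_regs, int available_regs)
{
  return live_regs > available_regs ? live_regs - available_regs : 0;
}
```

Because ECC(X) and delay(X) are simply added in step (a), the sketch makes the point from the text visible: one unit of excess pressure is traded off against one stall cycle.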
[PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation
As documented in the libstdc++ manual, the shared pointer operations in libstdc++ headers can be instrumented by defining the macros _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE()/AFTER() and libstdc++ has to be rebuilt in order to instrument the remaining shared pointer operations. However, rebuilding libstdc++ is inconvenient. So let's move the thread wrapper code from thread.cc into thread.

See also:
* http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html.
* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504.

Signed-off-by: Bart Van Assche bvanass...@acm.org

Index: libstdc++-v3/src/thread.cc
===
--- libstdc++-v3/src/thread.cc (revision 182271)
+++ libstdc++-v3/src/thread.cc (working copy)
@@ -59,28 +59,6 @@ static inline int get_nprocs()

 namespace std _GLIBCXX_VISIBILITY(default)
 {
-  namespace
-  {
-    extern "C" void*
-    execute_native_thread_routine(void* __p)
-    {
-      thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
-      thread::__shared_base_type __local;
-      __local.swap(__t->_M_this_ptr);
-
-      __try
-        {
-          __t->_M_run();
-        }
-      __catch(...)
-        {
-          std::terminate();
-        }
-
-      return 0;
-    }
-  }
-
 _GLIBCXX_BEGIN_NAMESPACE_VERSION

   void
@@ -114,12 +92,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   thread::_M_start_thread(__shared_base_type __b)
   {
+    _M_start_thread(__b, _M_entry);
+  }
+
+  void
+  thread::_M_start_thread(__shared_base_type __b, void* (*__pf)(void*))
+  {
     if (!__gthread_active_p())
       __throw_system_error(int(errc::operation_not_permitted));

     __b->_M_this_ptr = __b;
-    int __e = __gthread_create(&_M_id._M_thread,
-                               &execute_native_thread_routine, __b.get());
+    int __e = __gthread_create(&_M_id._M_thread, __pf, __b.get());
     if (__e)
     {
       __b->_M_this_ptr.reset();
Index: libstdc++-v3/include/std/thread
===
--- libstdc++-v3/include/std/thread (revision 182271)
+++ libstdc++-v3/include/std/thread (working copy)
@@ -132,7 +132,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
         _M_start_thread(_M_make_routine(std::__bind_simple(
                 std::forward<_Callable>(__f),
-                std::forward<_Args>(__args)...)));
+                std::forward<_Args>(__args)...)),
+                thread::_M_entry);
       }

     ~thread()
@@ -180,9 +181,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     hardware_concurrency() noexcept;

   private:
+    static void* _M_entry(void* __p)
+    {
+      thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
+      thread::__shared_base_type __local;
+      __local.swap(__t->_M_this_ptr);
+
+      __try
+        {
+          __t->_M_run();
+        }
+      __catch(...)
+        {
+          std::terminate();
+        }
+
+      return 0;
+    }
+
     void
     _M_start_thread(__shared_base_type);

+    void
+    _M_start_thread(__shared_base_type, void* (*)(void*));
+
     template<typename _Callable>
       shared_ptr<_Impl<_Callable>>
       _M_make_routine(_Callable&& __f)
Index: libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt (revision 182271)
+++ libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt (working copy)
@@ -2145,6 +2145,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (revision 182271)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (working copy)
@@ -1955,6 +1955,7 @@ FUNC:_ZNSt6localeD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeD2Ev@@GLIBCXX_3.4
 FUNC:_ZNSt6localeaSERKS_@@GLIBCXX_3.4
 FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE@@GLIBCXX_3.4.11
+FUNC:_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEEPFPvS3_E@@GLIBCXX_3.4.17
 FUNC:_ZNSt6thread4joinEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt6thread6detachEv@@GLIBCXX_3.4.11
 FUNC:_ZNSt7codecvtIcc11__mbstate_tEC1EP15__locale_structm@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/ia64-linux-gnu/baseline_symbols.txt (revision 182271)
+++
Re: RFC: IRA patch to reduce lifetimes
On 12/21/2011 09:09 AM, Bernd Schmidt wrote: For a customer I've looked into improving code for 456.hmmer on a mips64 target. The benchmark responds to -fsched-pressure, which reduces lifetimes of a few registers. This patch was an experiment to see if we can get the same improvement with modifications to IRA, making it more tolerant to over-aggressive scheduling. THe idea is that if an instruction sets a register A, and all its inputs are live and unmodified for the lifetime of A, then moving the instruction downwards towards its first use is going to be beneficial from a register pressure point of view. That alone, however, turns out to be too aggressive, performance drops presumably because we undo too many scheduling decisions. So, the patch detects such situations, and splits the pseudo; a new pseudo is introduced in the original setting instruction, and a copy is added before the first use. If the new pseudo does not get a hard register, it is removed again and instead the setting instruction is moved to the point of the copy. This gets up to 6.5% on 456.hmmer on the mips target I was working on; an embedded benchmark suite also seems to have a (small) geomean improvement. On x86_64, I've tested spec2k, where specint is unchanged and specfp has a tiny performance regression. All these tests were done with a gcc-4.6 based tree. Thoughts? Currently the patch feels somewhat bolted on to the side of IRA, maybe there's a nicer way to achieve this? I think that is an excellent idea. I used analogous approach for splitting pseudo in IRA on loop bounds even if it gets hard register inside and outside loops. The copies are removed if the live ranges were not spilled in reload. I have no problem with this patch. It is just a small change in IRA.
Re: RFC: An alternative -fsched-pressure implementation
On 12/23/2011 06:46 AM, Richard Sandiford wrote:

So it looks like two pieces of work related to scheduling and register pressure are being posted close together. This one is an RFC for a less aggressive form of -fsched-pressure. I think it should complement rather than compete with Bernd's IRA patch. It seems like a good idea to take register pressure into account during the first scheduling pass, where we can still easily look at things like instruction latencies and pipeline utilisation. Better rematerialisation in the register allocator would obviously be a good thing too though.

This patch started when we (Linaro) saw a big drop in performance from vectorising an RGB to YIQ filter on ARM. The first scheduling pass was overly aggressive in creating a wide schedule, and caused the newly-vectorised loop to contain lots of spills. The loop grew so big that it even had a constant pool in the middle of it.

-fsched-pressure did a very good job on this loop, creating far fewer spills and consequently avoiding the constant pool. However, it seemed to make several other cases significantly worse. The idea was therefore to try to come up with a form of -fsched-pressure that could be turned on for ARM by default.

Current -fsched-pressure works by assigning an excess (pressure) cost change to each instruction; here I'll write that as ECC(X). -fsched-pressure also changes the way that the main list scheduler handles stalls due to data dependencies. If an instruction would stall for N cycles, the scheduler would normally add it to the now+N queue, then add it to the ready queue after N cycles. With -fsched-pressure, it instead adds the instruction to the ready queue immediately, while still recording that the instruction would require N stalls. I'll write the number of stalls on X as delay(X).

This arrangement allows the scheduler to choose between increasing register pressure and introducing a deliberate stall. Instructions are ranked by:

  (a) lowest ECC(X) + delay(X)
  (b) lowest delay(X)
  (c) normal list-scheduler ranking (based on things like INSN_PRIORITY)

Note that since delay(X) is measured in cycles, ECC(X) is effectively measured in cycles too.

Several things seemed to be causing the degradations we were seeing with -fsched-pressure:

(1) The -fsched-pressure schedule is based purely on instruction latencies and issue rate; it doesn't take the DFA into account. This means that we attempt to dual issue things like vector operations, loads and stores on Cortex A8 and A9. In the examples I looked at, these sorts of inaccuracy seemed to accumulate, so that the value of delay(X) became based on slightly unrealistic cycle times. Note that this also affects code that doesn't have any pressure problems; it isn't limited to code that does. This may simply be historical. It became much easier to use the DFA here after Bernd's introduction of prune_ready_list, but the original -fsched-pressure predates that.

(2) We calculate ECC(X) by walking the unscheduled part of the block in its original order, then recording the pressure at each instruction. This seemed to make ECC(X) quite sensitive to that original order. I saw blocks that started out naturally narrow (not much ILP, e.g. from unrolled loops) and others that started naturally wide (a lot of ILP, such as in the libav h264 code), and walking the block in order meant that the two styles would be handled differently.

(3) When calculating the pressure of the original block (as described in (2)), we ignore the deaths of registers that are used by more than one unscheduled instruction. This tended to hurt long(ish) loops in things like filters, where the same value is often used as an input to two calculations. The effect was that instructions towards the end of the block would appear to have very high pressure. This in turn made the algorithm very conservative; it wouldn't promote instructions from later in the block because those instructions seemed to have a prohibitively large cost. I asked Vlad about this, and he confirmed that it was a deliberate decision. He'd tried honouring REG_DEAD notes instead, but it produced worse results on x86. I'll return to this at the end.

(4) ECC(X) is based on the pressure over and above ira_available_class_regs (the number of allocatable registers in a given class). ARM has 14 allocatable GENERAL_REGS: 16 minus the stack pointer and program counter. So if 14 integer variables are live across a loop but not referenced within it, we end up scheduling that loop in a context of permanent pressure. Pressure becomes the overriding concern, and we don't get much ILP. I suppose there are at least two ways of viewing this: (4a) We're giving an
Re: [PATCH] libstdc++: Make it possible to annotate the shared pointer operations in the std::thread implementation
Hi,

As documented in the libstdc++ manual, the shared pointer operations in libstdc++ headers can be instrumented by defining the macros _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE()/AFTER() and libstdc++ has to be rebuilt in order to instrument the remaining shared pointer operations. However, rebuilding libstdc++ is inconvenient. So let's move the thread wrapper code from thread.cc into thread.

First, do you already have a copyright assignment on file? It's a precondition for any non-trivial contribution. That said, please leave the baselines alone. Otherwise, Jon can comment on whether the reshuffling makes sense and would be safe from the ABI point of view.

Paolo
#undef fopen+freopen prior to #def in system.h, for aix bootstrap
Bootstrap currently fails for mainline on AIX, first because of problems like

  ...trunk/libcpp/system.h:47:0: error: fopen redefined [-Werror]
  .../include-fixed/stdio.h:110:0: note: this is the location of the previous definition

Indeed, libcpp/system.h and gcc/system.h have

  /* Use the unlocked open routines from libiberty.  */
  ...
  #define fopen(PATH,MODE) fopen_unlocked(PATH,MODE)
  #define fdopen(FILDES,MODE) fdopen_unlocked(FILDES,MODE)
  #define freopen(PATH,MODE,STREAM) freopen_unlocked(PATH,MODE,STREAM)

while /usr/include/stdio.h on AIX (5.3 at least) has

  #ifdef _LARGE_FILES
  ...
  #define fopen fopen64
  #define freopen freopen64

gcc/system.h already has some provision for this sort of mishap:

  #ifdef fopen /* fopen is a #define on VMS.  */
  #undef fopen
  #endif

The attached patch is a suggestion to simplify and widen this a bit to catch all the AIX-related problems to date.

Tested by checking that bootstrap proceeds (and ends successfully after another change, to be posted shortly) on powerpc-ibm-aix5.3.0 with languages=all,ada. Also bootstrapped on i686-suse-linux.

OK?

Thanks in advance,

Regards,

Olivier

--

2011-12-23  Olivier Hainque  hain...@adacore.com

    * system.h: #undef fopen and freopen unconditionally.

    libcpp/
    * system.h: #undef fopen and freopen unconditionally.

Attachment: aix-redef.dif
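The effect of an unconditional #undef before the redirecting #define can be sketched in a few lines of C. This is an illustration only: `counting_fopen` is a hypothetical stand-in for libiberty's fopen_unlocked, and the `#define fopen fopen` line merely simulates a system header that already provides fopen as a macro (as AIX does with fopen64 under _LARGE_FILES). It mirrors the technique used in system.h rather than reproducing the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how often the wrapper is reached (for demonstration only).  */
static int wrapper_calls;

/* Hypothetical stand-in for fopen_unlocked.  It is defined before the
   redirect below, so the fopen call here reaches the C library.  */
static FILE *
counting_fopen (const char *path, const char *mode)
{
  assert (path && mode);
  wrapper_calls++;
  return fopen (path, mode);
}

/* Simulate a system header that already defines fopen as a macro.  */
#ifndef fopen
# define fopen fopen
#endif

/* #undef is valid whether or not the macro exists, so no #ifdef guard
   is needed.  Without it, the function-like definition below would be
   a macro redefinition.  */
#undef fopen
#define fopen(PATH, MODE) counting_fopen (PATH, MODE)

/* Every later use of fopen now goes through the wrapper.  Returns the
   number of wrapper calls made so far.  */
static int
open_and_close (const char *path)
{
  FILE *f = fopen (path, "r");
  if (f)
    fclose (f);
  return wrapper_calls;
}
```

Calling `open_and_close` once on any path, existing or not, leaves `wrapper_calls` at 1, which shows that the redirect took effect despite the earlier macro definition.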
[v3] update cinttypes comments
The comments in cinttypes were copied from the TR1 implementation. This updates them w.r.t. C++11, including removing the "Likely, a defect" comment, because 27.9.2/4 clarifies that abs and div are only overloaded for intmax_t if it's an extended integer type.

    * include/c_global/cinttypes: Update comments that refer to TR1.

Tested x86_64-linux, committed to trunk.

Index: include/c_global/cinttypes
===
--- include/c_global/cinttypes (revision 182658)
+++ include/c_global/cinttypes (revision 182659)
@@ -1,6 +1,6 @@
 // <cinttypes> -*- C++ -*-

-// Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -37,7 +37,7 @@

 #include <cstdint>

-// For 8.11.1/1 (see C99, Note 184)
+// For 27.9.2/3 (see C99, Note 184)
 #if _GLIBCXX_HAVE_INTTYPES_H
 # ifndef __STDC_FORMAT_MACROS
 # define _UNDEF__STDC_FORMAT_MACROS
@@ -59,16 +59,10 @@ namespace std
   // functions
   using ::imaxabs;
-
-  // May collide with _Longlong abs(_Longlong), and is not described
-  // anywhere outside the synopsis.  Likely, a defect.
-  //
-  // intmax_t abs(intmax_t)
-
   using ::imaxdiv;

-  // Likewise, with lldiv_t div(_Longlong, _Longlong).
-  //
+  // GCC does not support extended integer types
+  // intmax_t abs(intmax_t)
   // imaxdiv_t div(intmax_t, intmax_t)

   using ::strtoimax;
[v3] adjust weak_ptr testcase
This modifies the test to PASS when the expected type of exception is caught, instead of being XFAIL due to uncaught exception.

Tested x86_64-linux, committed to trunk.

    * testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc: Modify to PASS instead of XFAIL.

Index: testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc
===
--- testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc (revision 182660)
+++ testsuite/tr1/2_general_utilities/shared_ptr/cons/weak_ptr_expired.cc (revision 182661)
@@ -1,5 +1,5 @@
-// { dg-do run { xfail *-*-* } }
-// Copyright (C) 2005, 2009 Free Software Foundation
+// { dg-do run }
+// Copyright (C) 2005, 2009, 2010, 2011 Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -29,7 +29,7 @@ struct A { };
 int
 test01()
 {
-  bool test __attribute__((unused)) = true;
+  bool test = false;
   std::tr1::shared_ptr<A> a1(new A);
   std::tr1::weak_ptr<A> wa(a1);

@@ -42,12 +42,9 @@ test01()
   catch (const std::tr1::bad_weak_ptr&)
     {
       // Expected.
-      __throw_exception_again;
-    }
-  catch (...)
-    {
-      // Failed.
+      test = true;
     }
+  VERIFY( test );

   return 0;
 }
Re: [PATCH v3 00/10] MIPS vectorization improvements
On 12/22/2011 12:44 PM, Richard Sandiford wrote:

Woah, thanks, that's quite some work. OK for the patches I didn't respond to.

Here's a combined follow-on patch that I believe addresses all of the comments you had. Ok?

r~

commit 824b5ca31ea21bb02cedabf79bb98e4348c34366
Author: Richard Henderson r...@redhat.com
Date: Thu Dec 22 12:23:03 2011 -0800

    mips: Feedback from rsandiford.

diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index 85861a9..187c651 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -26,15 +26,15 @@ RESET_FLOAT_FORMAT (DF, mips_double_format);
 FLOAT_MODE (TF, 16, mips_quad_format);

 /* Vector modes.  */
-VECTOR_MODES (INT, 8);        /* V8QI V4HI V2SI */
-VECTOR_MODES (FLOAT, 8);      /* V4HF V2SF */
-VECTOR_MODES (INT, 4);        /* V4QI V2HI */
+VECTOR_MODES (INT, 4);        /* V4QI V2HI */
+VECTOR_MODES (INT, 8);        /* V8QI V4HI V2SI */
+VECTOR_MODES (FLOAT, 8);      /* V4HF V2SF */

 /* Double-sized vector modes for vec_concat.  */
-VECTOR_MODE (INT, QI, 16);
-VECTOR_MODE (INT, HI, 8);
-VECTOR_MODE (INT, SI, 4);
-VECTOR_MODE (FLOAT, SF, 4);
+VECTOR_MODE (INT, QI, 16);    /* V16QI */
+VECTOR_MODE (INT, HI, 8);     /* V8HI */
+VECTOR_MODE (INT, SI, 4);     /* V4SI */
+VECTOR_MODE (FLOAT, SF, 4);   /* V4SF */

 VECTOR_MODES (FRACT, 4);      /* V4QQ V2HQ */
 VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bc76078..94d2c2f 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -4638,7 +4638,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
       /* The EABI conventions have traditionally been defined in terms
          of TYPE_MODE, regardless of the actual type.  */
       info->fpr_p = ((GET_MODE_CLASS (mode) == MODE_FLOAT
-                      || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+                      || mode == V2SFmode)
                      && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
       break;

@@ -4653,7 +4653,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
                          || SCALAR_FLOAT_TYPE_P (type)
                          || VECTOR_FLOAT_TYPE_P (type))
                      && (GET_MODE_CLASS (mode) == MODE_FLOAT
-                         || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+                         || mode == V2SFmode)
                      && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
       break;

@@ -4666,7 +4666,7 @@ mips_get_arg_info (struct mips_arg_info *info, const CUMULATIVE_ARGS *cum,
                      && (type == 0 || FLOAT_TYPE_P (type))
                      && (GET_MODE_CLASS (mode) == MODE_FLOAT
                          || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
-                         || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+                         || mode == V2SFmode)
                      && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_FPVALUE);

       /* ??? According to the ABI documentation, the real and imaginary
@@ -5103,7 +5103,7 @@ static bool
 mips_return_mode_in_fpr_p (enum machine_mode mode)
 {
   return ((GET_MODE_CLASS (mode) == MODE_FLOAT
-           || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+           || mode == V2SFmode
            || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
           && GET_MODE_UNIT_SIZE (mode) <= UNITS_PER_HWFPVALUE);
 }
@@ -10786,8 +10786,14 @@ mips_cannot_change_mode_class (enum machine_mode from,
                                enum machine_mode to,
                                enum reg_class rclass)
 {
-  /* There are several problems with changing the modes of values in
-     floating-point registers:
+  /* Allow conversions between different Loongson integer vectors,
+     and between those vectors and DImode.  */
+  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
+      && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
+    return false;
+
+  /* Otherwise, there are several problems with changing the modes of
+     values in floating-point registers:

      - When a multi-word value is stored in paired floating-point
        registers, the first register always holds the low word.  We
@@ -10809,12 +10815,6 @@ mips_cannot_change_mode_class (enum machine_mode from,

      We therefore disallow all mode changes involving FPRs.  */

-  /* Except for Loongson and its integral vectors.  We need to be able
-     to change between those modes easily.  */
-  if (GET_MODE_SIZE (from) == 8 && GET_MODE_SIZE (to) == 8
-      && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
-    return false;
-
   return reg_classes_intersect_p (FP_REGS, rclass);
 }

@@ -16352,7 +16352,8 @@ struct expand_vec_perm_d
    return true if that's a valid instruction in the active ISA.  */

 static bool
-expand_vselect (rtx target, rtx op0, const unsigned char *perm, unsigned nelt)
+mips_expand_vselect
Re: [PATCH v3 00/10] MIPS vectorization improvements
Richard Henderson r...@redhat.com writes: On 12/22/2011 12:44 PM, Richard Sandiford wrote: Woah, thanks, that's quite some work. OK for the patches I didn't respond to. Here's a combined follow-on patch that I believe addresses all of the comments you had. Ok? Yeah, looks good, thanks. Richard
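For readers following the mips_cannot_change_mode_class hunk, the reordered logic can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: struct mode_info, struct rclass_info and cannot_change_mode_class are hypothetical stand-ins for GCC's machine modes, INTEGRAL_MODE_P and reg_classes_intersect_p (FP_REGS, rclass); none of them is the real interface.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode: only the two properties
   the check looks at.  */
struct mode_info
{
  unsigned size;   /* size in bytes, in the spirit of GET_MODE_SIZE */
  bool integral;   /* true for integer and integer-vector modes */
};

/* Hypothetical stand-in for a register class.  */
struct rclass_info
{
  bool has_fprs;   /* models reg_classes_intersect_p (FP_REGS, rclass) */
};

/* Return true if changing a value from mode FROM to mode TO must be
   rejected for registers in RCLASS.  */
static bool
cannot_change_mode_class (struct mode_info from, struct mode_info to,
                          struct rclass_info rclass)
{
  /* Early exit first: 8-byte integral modes convert freely.  */
  if (from.size == 8 && to.size == 8 && from.integral && to.integral)
    return false;

  /* Otherwise every mode change that involves an FPR is disallowed.  */
  return rclass.has_fprs;
}
```

Putting the permissive case first lets the long comment about FPR problems sit directly above the final return, which appears to be the point of the reordering.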
[BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST
Hi. This patch removes the obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST macros from the Blackfin back end in GCC and introduces the equivalent TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks. Untested. OK to install? * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove. * config/bfin/bfin-protos.h (bfin_register_move_cost, bfin_memory_move_cost): Remove. * config/bfin/bfin.c (bfin_register_move_cost, bfin_memory_move_cost): Make static. Change arguments type from enum reg_class to reg_class_t and from int to bool. (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define. Index: gcc/config/bfin/bfin-protos.h === --- gcc/config/bfin/bfin-protos.h (revision 182658) +++ gcc/config/bfin/bfin-protos.h (working copy) @@ -85,9 +85,6 @@ extern bool bfin_longcall_p (rtx, int); extern bool bfin_dsp_memref_p (rtx); extern bool bfin_expand_movmem (rtx, rtx, rtx, rtx); -extern int bfin_register_move_cost (enum machine_mode, enum reg_class, - enum reg_class); -extern int bfin_memory_move_cost (enum machine_mode, enum reg_class, int in); extern enum reg_class secondary_input_reload_class (enum reg_class, enum machine_mode, rtx); Index: gcc/config/bfin/bfin.c === --- gcc/config/bfin/bfin.c (revision 182658) +++ gcc/config/bfin/bfin.c (working copy) @@ -2149,12 +2149,11 @@ bfin_vector_mode_supported_p (enum machi return mode == V2HImode; } -/* Return the cost of moving data from a register in class CLASS1 to - one in class CLASS2. A cost of 2 is the default. */ +/* Worker function for TARGET_REGISTER_MOVE_COST. */ -int +static int bfin_register_move_cost (enum machine_mode mode, -enum reg_class class1, enum reg_class class2) +reg_class_t class1, reg_class_t class2) { /* These need secondary reloads, so they're more expensive. */ if ((class1 == CCREGS && !reg_class_subset_p (class2, DREGS)) @@ -2177,18 +2176,16 @@ bfin_register_move_cost (enum machine_mo return 2; } -/* Return the cost of moving data of mode M between a - register and memory. 
A value of 2 is the default; this cost is - relative to those in `REGISTER_MOVE_COST'. +/* Worker function for TARGET_MEMORY_MOVE_COST. ??? In theory L1 memory has single-cycle latency. We should add a switch that tells the compiler whether we expect to use only L1 memory for the program; it'll make the costs more accurate. */ -int +static int bfin_memory_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED, - enum reg_class rclass, - int in ATTRIBUTE_UNUSED) + reg_class_t rclass, + bool in ATTRIBUTE_UNUSED) { /* Make memory accesses slightly more expensive than any register-register move. Also, penalize non-DP registers, since they need secondary @@ -5703,6 +5700,12 @@ bfin_conditional_register_usage (void) #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST bfin_address_cost +#undef TARGET_REGISTER_MOVE_COST +#define TARGET_REGISTER_MOVE_COST bfin_register_move_cost + +#undef TARGET_MEMORY_MOVE_COST +#define TARGET_MEMORY_MOVE_COST bfin_memory_move_cost + #undef TARGET_ASM_INTEGER #define TARGET_ASM_INTEGER bfin_assemble_integer Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 182658) +++ gcc/config/bfin/bfin.h (working copy) @@ -975,29 +975,6 @@ typedef struct { /* Do not put function addr into constant pool */ #define NO_FUNCTION_CSE 1 -/* A C expression for the cost of moving data from a register in class FROM to - one in class TO. The classes are expressed using the enumeration values - such as `GENERAL_REGS'. A value of 2 is the default; other values are - interpreted relative to that. - - It is not required that the cost always equal 2 when FROM is the same as TO; - on some machines it is expensive to move between registers if they are not - general registers. */ - -#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2) \ - bfin_register_move_cost ((MODE), (CLASS1), (CLASS2)) - -/* A C expression for the cost of moving data of mode M between a - register and memory. 
A value of 2 is the default; this cost is - relative to those in `REGISTER_MOVE_COST'. - - If moving between registers and memory is more expensive than - between two registers, you should define this macro to express the - relative cost. */ - -#define MEMORY_MOVE_COST(MODE, CLASS, IN) \ - bfin_memory_move_cost ((MODE), (CLASS), (IN)) - /* Specify the machine mode that this machine uses for
[SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST
Hi. This patch removes the obsolete REGISTER_MOVE_COST macro from the SCORE back end in GCC and introduces the equivalent TARGET_REGISTER_MOVE_COST target hook. The MEMORY_MOVE_COST macro is removed as well, and the default implementation of the TARGET_MEMORY_MOVE_COST target hook is used instead. Untested. OK to install? * config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove. * config/score/score-protos.h (score_register_move_cost): Remove. * config/score/score.c (TARGET_REGISTER_MOVE_COST): Define. (score_register_move_cost): Make static. Change arguments type from enum reg_class to reg_class_t. Index: gcc/config/score/score.h === --- gcc/config/score/score.h (revision 182660) +++ gcc/config/score/score.h (working copy) @@ -601,14 +601,6 @@ typedef struct score_args #define REVERSIBLE_CC_MODE(MODE) 1 /* Describing Relative Costs of Operations */ -/* Compute extra cost of moving data between one register class and another. */ -#define REGISTER_MOVE_COST(MODE, FROM, TO) \ - score_register_move_cost (MODE, FROM, TO) - -/* Moves to and from memory are quite expensive */ -#define MEMORY_MOVE_COST(MODE, CLASS, TO_P) \ - (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P))) - /* Try to generate sequences that don't involve branches. 
*/ #define BRANCH_COST(speed_p, predictable_p) 2 Index: gcc/config/score/score-protos.h === --- gcc/config/score/score-protos.h (revision 182660) +++ gcc/config/score/score-protos.h (working copy) @@ -42,8 +42,6 @@ extern bool score_block_move (rtx* ops); extern int score_address_cost (rtx addr, bool speed); extern int score_address_p (enum machine_mode mode, rtx x, int strict); extern int score_reg_class (int regno); -extern int score_register_move_cost (enum machine_mode mode, enum reg_class to, - enum reg_class from); extern int score_hard_regno_mode_ok (unsigned int, enum machine_mode); extern int score_const_ok_for_letter_p (HOST_WIDE_INT value, char c); extern int score_extra_constraint (rtx op, char c); Index: gcc/config/score/score.c === --- gcc/config/score/score.c(revision 182660) +++ gcc/config/score/score.c(working copy) @@ -187,6 +187,9 @@ struct extern_list *extern_head = 0; #undef TARGET_TRAMPOLINE_INIT #define TARGET_TRAMPOLINE_INIT score_trampoline_init +#undef TARGET_REGISTER_MOVE_COST +#define TARGET_REGISTER_MOVE_COST score_register_move_cost + /* Return true if SYMBOL is a SYMBOL_REF and OFFSET + SYMBOL points to the same object as SYMBOL. */ static int @@ -998,11 +1001,13 @@ score_legitimate_address_p (enum machine return score_classify_address (addr, mode, x, strict); } -/* Return a number assessing the cost of moving a register in class +/* Implement TARGET_REGISTER_MOVE_COST. + + Return a number assessing the cost of moving a register in class FROM to class TO. */ -int +static int score_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED, - enum reg_class from, enum reg_class to) + reg_class_t from, reg_class_t to) { if (GR_REG_CLASS_P (from)) { Anatoly.
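The shape of this kind of hookization can be sketched outside GCC as a table of function pointers in which a port overrides one entry and inherits the other. Everything below is a simplified model, not GCC code: struct target_hooks, the enum values, the cost numbers 2 and 6, and the *_model names are made up for illustration. The default memory cost of 4 merely echoes the base value in the removed SCORE macro and leaves out the secondary-reload term.

```c
#include <stdbool.h>

/* Hypothetical replacement for GCC's reg_class_t.  */
typedef int reg_class_t;

/* Hypothetical register classes for the model.  */
enum { MODEL_GENERAL_REGS, MODEL_SPECIAL_REGS };

/* Simplified model of the target hook vector.  */
struct target_hooks
{
  int (*register_move_cost) (reg_class_t from, reg_class_t to);
  int (*memory_move_cost) (reg_class_t rclass, bool in);
};

/* Default hook: used when a port does not supply its own.  */
static int
default_memory_move_cost_model (reg_class_t rclass, bool in)
{
  (void) rclass;
  (void) in;
  return 4;
}

/* Port-specific worker.  It is static because only the hook table
   refers to it, so no prototype in a *-protos.h header is needed.  */
static int
port_register_move_cost_model (reg_class_t from, reg_class_t to)
{
  if (from == MODEL_GENERAL_REGS && to == MODEL_GENERAL_REGS)
    return 2;
  return 6;   /* made-up penalty for moves involving special registers */
}

/* The port overrides one hook and keeps the default for the other.  */
static const struct target_hooks targetm_model =
{
  port_register_move_cost_model,
  default_memory_move_cost_model
};
```

Callers go through targetm_model rather than a macro, which is why the worker can lose external linkage and its header declaration.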
Re: [SCORE] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST
On 12/23/2011 11:08 AM, Anatoly Sokolov wrote: * config/score/score.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove. * config/score/score-protos.h (score_register_move_cost): Remove. * config/score/score.c (TARGET_REGISTER_MOVE_COST): Define. (score_register_move_cost): Make static. Change arguments type from enum reg_class to reg_class_t. Ok. r~
Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST
On 12/23/2011 10:55 AM, Anatoly Sokolov wrote: * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove. * config/bfin/bfin-protos.h (bfin_register_move_cost, bfin_memory_move_cost): Remove. * config/bfin/bfin.c (bfin_register_move_cost, bfin_memory_move_cost): Make static. Change arguments type from enum reg_class to reg_class_t and from int to bool. (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define. Ok. r~
Re: #undef fopen+freopen prior to #def in system.h, for aix bootstrap
A minor update to provide a more precise ChangeLog: * system.h: #undef fopen and freopen unconditionally. 2011-12-23 Olivier Hainque hain...@adacore.com * system.h: Prior to #define, #undef fopen and freopen unconditionally. libcpp/ * system.h: Likewise.
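A minimal sketch of the pattern the ChangeLog describes, assuming a hypothetical wrapper counting_fopen and counter wrapped_fopen_calls (neither is taken from system.h): if a system header has already defined fopen as a macro, redefining it without an #undef first would at best produce a redefinition diagnostic, so the #undef is issued unconditionally. Redefining a standard library name this way is formally outside what ISO C guarantees; the sketch only mirrors the technique.

```c
#include <stdio.h>

/* Hypothetical counter, only here to make the redirection visible.  */
static int wrapped_fopen_calls;

/* Hypothetical wrapper.  It is defined before the macro below, so the
   fopen call inside it still reaches the library's own fopen.  */
static FILE *
counting_fopen (const char *path, const char *mode)
{
  ++wrapped_fopen_calls;
  return fopen (path, mode);
}

/* A system header may already have made fopen a macro.  #undef is
   harmless when no macro exists, so it needs no #ifdef guard.  */
#undef fopen
#define fopen(PATH, MODE) counting_fopen (PATH, MODE)
```

After these lines, every later fopen (path, mode) in the translation unit expands to the wrapper.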
[lra] patch to fix an arm testsuite degradation
The following patch fixes a degradation of 20060102-1.c on ARM. Not updating REG notes resulted in removing an insn after LRA as it was wrongly considered dead. The patch was successfully bootstrapped on x86/x86-64. Committed as rev. 182664. 2011-12-23 Vladimir Makarov vmaka...@redhat.com * lra.c (update_auto_inc_notes): Rename to update_reg_notes. Make it unconditional. Remove REG_DEAD and REG_UNUSED too. Make call of add_auto_inc_notes conditional. Index: lra.c === --- lra.c (revision 182663) +++ lra.c (working copy) @@ -2032,10 +2032,14 @@ add_auto_inc_notes (rtx insn, rtx x) } } -/* DF infrastructure does not deal with REG_INC notes -- so update - them here. */ +#endif + +/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC. + We change pseudos by hard registers without notification of DF and + that can make the notes obsolete. DF-infrastructure does not deal + with REG_INC notes -- so we should regenerate them here. */ static void -update_auto_inc_notes (void) +update_reg_notes (void) { rtx *pnote; basic_block bb; @@ -2048,17 +2052,19 @@ update_auto_inc_notes (void) pnote = &REG_NOTES (insn); while (*pnote != 0) { - if (REG_NOTE_KIND (*pnote) == REG_INC) + if (REG_NOTE_KIND (*pnote) == REG_DEAD + || REG_NOTE_KIND (*pnote) == REG_UNUSED + || REG_NOTE_KIND (*pnote) == REG_INC) *pnote = XEXP (*pnote, 1); else pnote = &XEXP (*pnote, 1); } +#ifdef AUTO_INC_DEC add_auto_inc_notes (insn, PATTERN (insn)); +#endif } } -#endif - /* Set to 1 while in lra. */ int lra_in_progress; @@ -2204,9 +2210,7 @@ lra (FILE *f) regstat_free_n_sets_and_refs (); regstat_free_ri (); reload_completed = 1; -#ifdef AUTO_INC_DEC - update_auto_inc_notes (); -#endif + update_reg_notes (); finish_subregs_of_mode (); inserted_p = fixup_abnormal_edges ();
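The note-stripping loop in update_reg_notes is an instance of the pointer-to-pointer idiom for unlinking nodes from a singly linked list. Here is a standalone sketch of that idiom, assuming a hypothetical struct note and enum note_kind in place of GCC's rtx notes, REG_NOTE_KIND and XEXP:

```c
#include <stddef.h>

/* Hypothetical note kinds modelled on REG_DEAD, REG_UNUSED, REG_INC,
   plus one kind that must survive.  */
enum note_kind { NOTE_DEAD, NOTE_UNUSED, NOTE_INC, NOTE_EQUAL };

/* Hypothetical list node; NEXT plays the role of XEXP (note, 1).  */
struct note
{
  enum note_kind kind;
  struct note *next;
};

/* Unlink every DEAD, UNUSED and INC note from the list whose head
   pointer is stored at *PNOTE.  Nodes are unlinked, not freed.  */
static void
remove_stale_notes (struct note **pnote)
{
  while (*pnote != NULL)
    {
      enum note_kind kind = (*pnote)->kind;

      if (kind == NOTE_DEAD || kind == NOTE_UNUSED || kind == NOTE_INC)
        /* Overwrite the link that points at this node; do not advance,
           because *pnote now names the following node.  */
        *pnote = (*pnote)->next;
      else
        /* Keep the node and step to its link field.  */
        pnote = &(*pnote)->next;
    }
}
```

Because pnote always addresses the link being examined, the head of the list needs no special case.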
Re: [lra] patch to fix an arm testsuite degradation
Hi Vladimir, The following patch fixes a degradation of 20060102-1.c on ARM. Unless I'm badly mistaken, you quite often use the word 'degradation', which is somewhat unusual on this mailing list. Are you using it to mean 'regression', or do you actually mean something subtly different? A bit off topic, sorry, Paolo
Re: [lra] patch to fix an arm testsuite degradation
On 12/23/2011 04:17 PM, Paolo Carlini wrote: Hi Vladimir, The following patch fixes a degradation of 20060102-1.c on ARM. unless I'm badly mistaken, I see you using quite often the form 'degradation', which is somewhat unusual in this mailing list. Are you using it like 'regression' or you actually mean something slightly, subtly, different? Paolo, thanks for pointing this out. You are right: I frequently misuse this word. I should say 'regression'.
Re: [patch, testsuite] One more strict-volatile-bitfields test case
On 12/22/2011 06:28 PM, Ye Joey wrote: * gcc.dg/volatile-bitfields-2.c: New test. Ok. r~
Re: [BFIN] Hookize REGISTER_MOVE_COST and MEMORY_MOVE_COST
Hi Anatoly, I cannot apply your patch to a clean tree. I tried saving your email as a text file, and copying the patch from Thunderbird, from Gmail, and from the mailing list archive, but none of these works. Regards, Jie 2011/12/23 Anatoly Sokolov ae...@post.ru: Hi. This patch removes obsolete REGISTER_MOVE_COST and MEMORY_MOVE_COST macros from the Blackfin back end in the GCC and introduces equivalent TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST target hooks. Untested. OK to install? * config/bfin/bfin.h (REGISTER_MOVE_COST, MEMORY_MOVE_COST): Remove. * config/bfin/bfin-protos.h (bfin_register_move_cost, bfin_memory_move_cost): Remove. * config/bfin/bfin.c (bfin_register_move_cost, bfin_memory_move_cost): Make static. Change arguments type from enum reg_class to reg_class_t and from int to bool. (TARGET_REGISTER_MOVE_COST, TARGET_MEMORY_MOVE_COST): Define.
Re: [patch] libitm: Fix privatization safety during upgrades to serial mode.
On 12/22/2011 11:28 AM, Torvald Riegel wrote: libitm: Fix privatization safety during upgrades to serial mode. libitm/ * beginend.cc (GTM::gtm_thread::restart): Add and handle finish_serial_upgrade parameter. * libitm.h (GTM::gtm_thread::restart): Adapt declaration. * config/linux/rwlock.cc (GTM::gtm_rwlock::write_lock_generic): Don't unset reader flag. (GTM::gtm_rwlock::write_upgrade_finish): New. * config/posix/rwlock.cc: Same. * config/linux/rwlock.h (GTM::gtm_rwlock::write_upgrade_finish): Declare. * config/posix/rwlock.h: Same. * method-serial.cc (GTM::gtm_thread::serialirr_mode): Unset reader flag after commit or after rollback when restarting. Ok. r~
C++ PATCH for c++/51507 (pack expansion in trailing-return-type)
The existing code to handle pack expansions in trailing-return-type assumed that such expansions would only occur inside decltype, which is not the case. This patch fixes the test to check for whether or not we're doing the substitution in the context of a function body, and fixes at_function_scope_p to properly return false when we're substituting deduced arguments into a candidate function template. Even with the change to at_function_scope_p it was impossible to tell that we weren't in function scope when instantiating a function declaration as part of overload resolution, so I also changed instantiate_template_1 to use push_to_top_level rather than just clear processing_template_decl. In my testing it was enough to just clear current_function_decl as well, but since in fact the instantiation happens at top level it seems more correct to use push_to_top_level. The second patch is a bug I noticed in dependent_name while working on this patch, though it isn't necessary to this patch; a BASELINK should not be considered a dependent name, or we end up treating calls to members of different classes as equivalent. Tested x86_64-pc-linux-gnu, applying to trunk. commit 4df8b36d378adc678ed4ca9ac91088ad0772b750 Author: Jason Merrill ja...@redhat.com Date: Thu Dec 22 11:03:09 2011 -0500 PR c++/51507 * search.c (at_function_scope_p): Also check cfun. * pt.c (tsubst_pack_expansion): Check it instead of cp_unevaluated_operand. (instantiate_template_1): Clear current_function_decl. 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 820b1ff..20f67aa 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -9297,6 +9297,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, int i, len = -1; tree result; htab_t saved_local_specializations = NULL; + bool need_local_specializations = false; int levels; gcc_assert (PACK_EXPANSION_P (t)); @@ -9330,7 +9331,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, } if (TREE_CODE (parm_pack) == PARM_DECL) { - if (!cp_unevaluated_operand) + if (at_function_scope_p ()) arg_pack = retrieve_local_specialization (parm_pack); else { @@ -9346,6 +9347,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, arg_pack = NULL_TREE; else arg_pack = make_fnparm_pack (arg_pack); + need_local_specializations = true; } } else @@ -9476,7 +9478,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain, if (len < 0) return error_mark_node; - if (cp_unevaluated_operand) + if (need_local_specializations) { /* We're in a late-specified return type, so create our own local specializations table; the current table is either NULL or (in the @@ -14524,7 +14526,6 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain) tree fndecl; tree gen_tmpl; tree spec; - HOST_WIDE_INT saved_processing_template_decl; if (tmpl == error_mark_node) return error_mark_node; @@ -14585,18 +14586,22 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain) deferring all checks until we have the FUNCTION_DECL. */ push_deferring_access_checks (dk_deferred); - /* Although PROCESSING_TEMPLATE_DECL may be true at this point - (because, for example, we have encountered a non-dependent - function call in the body of a template function and must now - determine which of several overloaded functions will be called), - within the instantiation itself we are not processing a - template. 
*/ - saved_processing_template_decl = processing_template_decl; - processing_template_decl = 0; + /* Instantiation of the function happens in the context of the function + template, not the context of the overload resolution we're doing. */ + push_to_top_level (); + if (DECL_CLASS_SCOPE_P (gen_tmpl)) +{ + tree ctx = tsubst (DECL_CONTEXT (gen_tmpl), targ_ptr, + complain, gen_tmpl); + push_nested_class (ctx); +} /* Substitute template parameters to obtain the specialization. */ fndecl = tsubst (DECL_TEMPLATE_RESULT (gen_tmpl), targ_ptr, complain, gen_tmpl); - processing_template_decl = saved_processing_template_decl; + if (DECL_CLASS_SCOPE_P (gen_tmpl)) +pop_nested_class (); + pop_from_top_level (); + if (fndecl == error_mark_node) return error_mark_node; diff --git a/gcc/cp/search.c b/gcc/cp/search.c index 0ceb5bc..45fdafc 100644 --- a/gcc/cp/search.c +++ b/gcc/cp/search.c @@ -539,7 +539,11 @@ int at_function_scope_p (void) { tree cs = current_scope (); - return cs && TREE_CODE (cs) == FUNCTION_DECL; + /* Also check cfun to make sure that we're really compiling + this function (as opposed to having set current_function_decl + for access checking or some such). */ + return (cs && TREE_CODE (cs) == FUNCTION_DECL + && cfun && cfun->decl == current_function_decl); } /* Returns
[committed] Remove VEC_EXTRACT_EVEN/ODD_EXPR
Having now committed patches to convert all targets to vec_perm_const, supporting the interleave and even/odd permutations, we can now remove the VEC_INTERLEAVE_HIGH/LOW_EXPR and VEC_EXTRACT_EVEN/ODD_EXPR codes as redundant with the primary VEC_PERM_EXPR code. I have committed the patch previously posted by Jakub (and approved by Richi) that removes VEC_INTERLEAVE_HIGH/LOW_EXPR. I have also committed the following patch which removes VEC_EXTRACT_EVEN/ODD_EXPR. All re-tested on x86_64-linux. r~ * tree.def (VEC_EXTRACT_EVEN_EXPR, VEC_EXTRACT_ODD_EXPR): Remove. * cfgexpand.c (expand_debug_expr): Don't handle them. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (fold_binary_loc): Likewise. * gimple-pretty-print.c (dump_binary_rhs): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. * tree-vect-generic.c (expand_vector_operations_1): Likewise. * optabs.c (optab_for_tree_code): Likewise. (can_vec_perm_for_code_p): Remove. (expand_binop): Don't try it. (init_optabs): Don't init vec_extract_even/odd_optab. * genopinit.c (optabs): Likewise. * optabs.h (OTI_vec_extract_even, OTI_vec_extract_odd): Remove. (vec_extract_even_optab, vec_extract_odd_optab): Remove. * tree-vect-data-refs.c (vect_strided_store_supported): Tidy code. (vect_permute_store_chain): Use TYPE_VECTOR_SUBPARTS instead of GET_MODE_NUNITS; check vect_gen_perm_mask return value instead of asserting vect_strided_store_supported. (vect_strided_load_supported): Use can_vec_perm_p. (vect_permute_load_chain): Use VEC_PERM_EXPR. * doc/generic.texi (VEC_EXTRACT_EVEN_EXPR): Remove. (VEC_EXTRACT_ODD_EXPR): Remove. * doc/md.texi (vec_extract_even, vec_extract_odd): Remove. 
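To see why the even/odd codes are redundant, it may help to model a two-input permutation on plain arrays: extract-even and extract-odd are just the selectors {0, 2, 4, ...} and {1, 3, 5, ...} applied to the concatenation of the two inputs. The sketch below is a scalar model only. The functions vec_perm_model, extract_even_model and extract_odd_model and the MAX_NELT limit are hypothetical names for illustration and have nothing to do with how the vectorizer builds its masks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NELT 16   /* arbitrary limit for the model's selector buffer */

/* OUT[i] = element SEL[i] of the concatenation of V0 and V1, each of
   which has NELT elements.  */
static void
vec_perm_model (const int *v0, const int *v1, const unsigned *sel,
                size_t nelt, int *out)
{
  for (size_t i = 0; i < nelt; i++)
    {
      unsigned idx = sel[i];
      assert (idx < 2 * nelt);
      out[i] = idx < nelt ? v0[idx] : v1[idx - nelt];
    }
}

/* Extract-even expressed as a permutation: selector 0, 2, 4, ...  */
static void
extract_even_model (const int *v0, const int *v1, size_t nelt, int *out)
{
  unsigned sel[MAX_NELT];
  assert (nelt <= MAX_NELT);
  for (size_t i = 0; i < nelt; i++)
    sel[i] = (unsigned) (2 * i);
  vec_perm_model (v0, v1, sel, nelt, out);
}

/* Extract-odd expressed as a permutation: selector 1, 3, 5, ...  */
static void
extract_odd_model (const int *v0, const int *v1, size_t nelt, int *out)
{
  unsigned sel[MAX_NELT];
  assert (nelt <= MAX_NELT);
  for (size_t i = 0; i < nelt; i++)
    sel[i] = (unsigned) (2 * i + 1);
  vec_perm_model (v0, v1, sel, nelt, out);
}
```

With inputs {0, 1, 2, 3} and {4, 5, 6, 7}, the even selector picks 0, 2, 4, 6 and the odd selector picks 1, 3, 5, 7.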
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index dfe5442..2b2e464 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -3449,8 +3449,6 @@ expand_debug_expr (tree exp) case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: case VEC_COND_EXPR: -case VEC_EXTRACT_EVEN_EXPR: -case VEC_EXTRACT_ODD_EXPR: case VEC_LSHIFT_EXPR: case VEC_PACK_FIX_TRUNC_EXPR: case VEC_PACK_SAT_EXPR: diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 4f26238..31e8855 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1695,8 +1695,6 @@ its sole argument yields the representation for @code{ap}. @tindex VEC_PACK_TRUNC_EXPR @tindex VEC_PACK_SAT_EXPR @tindex VEC_PACK_FIX_TRUNC_EXPR -@tindex VEC_EXTRACT_EVEN_EXPR -@tindex VEC_EXTRACT_ODD_EXPR @table @code @item VEC_LSHIFT_EXPR @@ -1765,13 +1763,6 @@ of elements of a floating point type. The result is a vector that contains twice as many elements of an integral type whose size is half as wide. The elements of the two vectors are merged (concatenated) to form the output vector. - -@item VEC_EXTRACT_EVEN_EXPR -@itemx VEC_EXTRACT_ODD_EXPR -These nodes represent extracting of the even/odd elements of the two input -vectors, respectively. Their operands and result are vectors that contain the -same number of elements of the same type. - @end table diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 6dd6a58..93183e6 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4145,20 +4145,6 @@ operand 1 is new value of field and operand 2 specify the field index. Extract given field from the vector value. Operand 1 is the vector, operand 2 specify field index and operand 0 place to store value into. -@cindex @code{vec_extract_even@var{m}} instruction pattern -@item @samp{vec_extract_even@var{m}} -Extract even elements from the input vectors (operand 1 and operand 2). -The even elements of operand 2 are concatenated to the even elements of operand -1 in their original order. The result is stored in operand 0. 
-The output and input vectors should have the same modes. - -@cindex @code{vec_extract_odd@var{m}} instruction pattern -@item @samp{vec_extract_odd@var{m}} -Extract odd elements from the input vectors (operand 1 and operand 2). -The odd elements of operand 2 are concatenated to the odd elements of operand -1 in their original order. The result is stored in operand 0. -The output and input vectors should have the same modes. - @cindex @code{vec_init@var{m}} instruction pattern @item @samp{vec_init@var{m}} Initialize the vector to given values. Operand 0 is the vector to initialize diff --git a/gcc/expr.c b/gcc/expr.c index cb28f48..c10f915 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8647,10 +8647,6 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, return temp; } -case VEC_EXTRACT_EVEN_EXPR: -case VEC_EXTRACT_ODD_EXPR: - goto binop; - case VEC_LSHIFT_EXPR: case VEC_RSHIFT_EXPR: { diff --git