Re: Source Code for Profile Guided Code Positioning
On Fri, Jan 15, 2016 at 9:51 AM, Yury Gribov wrote:
> On 01/15/2016 08:44 PM, vivek pandya wrote:
>>
>> Thanks Yury for
>> https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html this link.
>> It implements procedure reordering as linker plugin.
>> I have some questions :
>> 1 ) Can you point me to some documentation for "how to write plugin
>> for linkers " I am I have not seen doc for structs with 'ld_' prefix
>> (i.e defined in plugin-api.h )
>> 2 ) There is one more algorithm for Basic Block ordering with
>> execution frequency count in PH paper . Is there any implementation
>> available for it ?
>
>
> Quite frankly - I don't know (I've only learned about Google implementation
> recently).
>
> I've added Sriram to maybe comment.

Sorry for the late response.

The google/gcc_4_9 branch has the source of the function reordering linker plugin. It is available in the function_reordering_plugin directory under the top-level gcc directory. The function reordering plugin constructs a callgraph and uses profile information to do a Pettis-Hansen style function reordering. This plugin does not do basic block reordering.

There is no documentation that I am aware of on how to write a linker plugin. Here is a very brief overview. The linker calls the plugin's "onload" function when registering the plugin, and the plugin in turn can register two callbacks with the linker: "claim_file_hook" and "all_symbols_read_hook". "claim_file_hook" is called for each object file that the linker processes, and "all_symbols_read_hook" is called after all the symbols have been read by the linker. These are just two interesting points in the course of a link. The plugin can also get handles to linker functions like "get_input_section_name", which it can use to process sections given their handle.

You can also check the gold linker tests for simpler plugin examples.

HTH,
Thanks
Sri

>
> -Y
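The onload / callback handshake described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: it does not include plugin-api.h, and every demo_* name is hypothetical. The real header passes a vector of struct ld_plugin_tv entries tagged with LDPT_* values; the stand-in types below only mirror that shape so that the control flow (linker calls onload, plugin registers its two hooks, linker later calls them) can be followed end to end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the declarations in plugin-api.h.  */
enum demo_tag {
  DEMO_TAG_NULL = 0,                        /* terminates the vector */
  DEMO_TAG_REGISTER_CLAIM_FILE_HOOK,
  DEMO_TAG_REGISTER_ALL_SYMBOLS_READ_HOOK
};

typedef int (*demo_claim_file_fn) (const char *object_name);
typedef int (*demo_all_symbols_read_fn) (void);

struct demo_tv {
  enum demo_tag tag;
  union {
    void (*register_claim_file) (demo_claim_file_fn);
    void (*register_all_symbols_read) (demo_all_symbols_read_fn);
  } u;
};

/* ---- plugin side ---- */
static int demo_objects_seen;
static int demo_link_finished;

/* Called once per object file the (toy) linker processes.  */
static int demo_claim_file_hook (const char *object_name)
{
  (void) object_name;
  demo_objects_seen++;
  return 0;
}

/* Called after all symbols have been read.  */
static int demo_all_symbols_read_hook (void)
{
  demo_link_finished = 1;
  return 0;
}

/* Entry point: walk the transfer vector and register both hooks.  */
int demo_onload (struct demo_tv *tv)
{
  for (; tv->tag != DEMO_TAG_NULL; tv++)
    switch (tv->tag) {
    case DEMO_TAG_REGISTER_CLAIM_FILE_HOOK:
      tv->u.register_claim_file (demo_claim_file_hook);
      break;
    case DEMO_TAG_REGISTER_ALL_SYMBOLS_READ_HOOK:
      tv->u.register_all_symbols_read (demo_all_symbols_read_hook);
      break;
    default:
      break;
    }
  return 0;
}

/* ---- toy linker side ---- */
static demo_claim_file_fn demo_registered_claim;
static demo_all_symbols_read_fn demo_registered_all_read;

static void demo_reg_claim (demo_claim_file_fn f) { demo_registered_claim = f; }
static void demo_reg_all (demo_all_symbols_read_fn f) { demo_registered_all_read = f; }

/* Loads the plugin, feeds it N objects, then signals end of symbol
   reading.  Returns how many objects the plugin saw.  */
int demo_run_link (const char **objects, size_t n)
{
  struct demo_tv tv[3];
  size_t i;

  tv[0].tag = DEMO_TAG_REGISTER_CLAIM_FILE_HOOK;
  tv[0].u.register_claim_file = demo_reg_claim;
  tv[1].tag = DEMO_TAG_REGISTER_ALL_SYMBOLS_READ_HOOK;
  tv[1].u.register_all_symbols_read = demo_reg_all;
  tv[2].tag = DEMO_TAG_NULL;
  tv[2].u.register_claim_file = NULL;

  demo_objects_seen = 0;
  demo_link_finished = 0;
  demo_onload (tv);

  for (i = 0; i < n; i++)
    demo_registered_claim (objects[i]);
  demo_registered_all_read ();
  return demo_objects_seen;
}
```

Calling demo_run_link with three object names drives the claim-file hook three times and the all-symbols-read hook once. A real plugin would do its work (for example, collecting section names for reordering) inside those two hooks.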
Re: multiversioning page update?
On Wed, Sep 4, 2013 at 6:48 AM, Kenny Simpson wrote: > http://gcc.gnu.org/wiki/FunctionMultiVersioning says "This support has been > checked in to trunk and should be available when GCC 4.8 is released." > > > Since 4.8 has been released, and lists multiversioning support in the release > notes, should the page be updated to reflect this? Updated. Sri > > -Kenny >
Re: What's up with g++.dg/ext/mv*.C?
On Thu, Jun 13, 2013 at 3:38 AM, Paolo Carlini wrote: > On 06/13/2013 12:35 PM, Paolo Carlini wrote: >> >> On 06/13/2013 12:28 PM, Paolo Carlini wrote: >>> >>> Hi, >>> >>> these FAILs are much more recent but frankly I'm also puzzled: is a fix >>> actively in the making? Do we have any sort of time for that? >> >> This is PR57548 and a patch is approved but unapplied: >> >> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00426.html >> >> I'll do that myself ASAP if nobody beats me. > > In fact the patch is already in but the regressions are still there, eg: > > http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01257.html > > Is it actually a different issue or what? This is probably http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56673 I will take a look at it. I am not sure what revision introduced the bug. Sri > > Paolo.
Re: [wwwdocs] Release note entry for Function Multiversioning
On Tue, Nov 20, 2012 at 7:23 AM, Gerald Pfeifer wrote: > Hi Sri, > > On Tue, 13 Nov 2012, Sriraman Tallam wrote: >> I have added a release note for Function Multiversioning which is >> checked into trunk. Please review. > > Index: changes.html > === > + Function Multiversioning Support with G++: > > Is this really specific to the C++ front end? The example is C, not > C++? This support is only available with C++ front end for now. I intend to add support to C front end soon. > > +It is now possible to create multiple function versions each targeting a > +specific processor and/or ISA. Function versions have the same signature > +but different target attributes. For example, here is a program with > +function versions: > > I believe embedding this in and could be better. > > This is okay, thanks. Thanks, made the change and submitted. -Sri. > > Gerald
Re: Using -ffunction-sections and -p
On Sun, Nov 4, 2012 at 9:03 PM, Ian Lance Taylor wrote: > On Sun, Nov 4, 2012 at 8:04 PM, Sriraman Tallam wrote: >> >>Currently, using -ffunction-sections and -p together results in a >> warning. I ran into this problem when compiling the kernel. This is >> discussed in this thread: >> >> http://gcc.gnu.org/ml/gcc-help/2008-11/msg00128.html >> >> Ian's reply suggests this warning is no longer necessary and can be >> removed. Is this patch alright to commit with all tests passing: >> >> * toplev.c (process_options): Do not warn when -ffunction-sections >> and -fprofile are used together. > > In that thread Jeff suggested that this be tested on Solaris and PA. > I don't know how to test on PA, but could somebody test the patch on > Solaris? Is it reasonable to gate this using a target hook and start allowing this on selected targets? For instance, i386 does not seem to have a problem with this. Thanks, -Sri. > > Ian
[wwwdocs] Release note entry for Function Multiversioning
Hi,

I have added a release note for Function Multiversioning which is checked into trunk. Please review.

Thanks,
-Sri.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.56
diff -u -u -p -r1.56 changes.html
--- changes.html 12 Nov 2012 15:19:33 - 1.56
+++ changes.html 14 Nov 2012 01:15:15 -
@@ -297,6 +297,34 @@ B b(42); // OK
 }
 
+ Function Multiversioning Support with G++:
+It is now possible to create multiple function versions each targeting a
+specific processor and/or ISA. Function versions have the same signature
+but different target attributes. For example, here is a program with
+function versions:
+
+int foo(void)
+{
+  return 1;
+}
+
+__attribute__ ((target ("sse4.2")))
+int foo(void)
+{
+  return 2;
+}
+
+int main (void)
+{
+  int (*p)(void) = &foo;
+  assert ((*p)() == foo());
+  return 0;
+}
+
+Please refer to this
+http://gcc.gnu.org/wiki/FunctionMultiVersioning wiki for more
+information.
+
 FRV
Using -ffunction-sections and -p
Hi,

Currently, using -ffunction-sections and -p together results in a warning. I ran into this problem when compiling the kernel. This is discussed in this thread:

http://gcc.gnu.org/ml/gcc-help/2008-11/msg00128.html

Ian's reply suggests this warning is no longer necessary and can be removed. Is this patch alright to commit with all tests passing:

* toplev.c (process_options): Do not warn when -ffunction-sections
and -fprofile are used together.

--- toplev.c (revision 193117)
+++ toplev.c (working copy)
@@ -1467,12 +1467,6 @@ process_options (void)
 	}
     }
 
-  if (flag_function_sections && profile_flag)
-    {
-      warning (0, "-ffunction-sections disabled; it makes profiling impossible");
-      flag_function_sections = 0;
-    }
-
 #ifndef HAVE_prefetch
   if (flag_prefetch_loop_arrays > 0)
     {

Thanks,
-Sri.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Hi Jakub,

My function multiversioning patch is being reviewed and I hope to get this in by Nov. 5.

Thanks,
-Sri.

On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek wrote:
> Status
> ======
>
> I'd like to close the stage 1 phase of GCC 4.8 development
> on Monday, November 5th. If you have still patches for new features you'd
> like to see in GCC 4.8, please post them for review soon. Patches
> posted before the freeze, but reviewed shortly after the freeze, may
> still go in, further changes should be just bugfixes and documentation
> fixes.
>
>
> Quality Data
> ============
>
> Priority          #   Change from Last Report
> --------        ---   -----------------------
> P1               23   + 23
> P2               77   + 8
> P3               85   + 84
> --------        ---   -----------------------
> Total           185   +115
>
>
> Previous Report
> ===============
>
> http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html
>
> The next report will be sent by me again, announcing end of stage 1.
Re: How much time left till phase 3?
On Tue, Oct 2, 2012 at 2:45 AM, Richard Guenther wrote: > On Tue, Oct 2, 2012 at 11:34 AM, Richard Guenther > wrote: >> On Tue, Oct 2, 2012 at 10:44 AM, Joern Rennecke >> wrote: >>> I'll have to prepare a few more patches to (supposedly) generic >>> code to support the ARCompact port, which we (Synopsys and Embecosm) >>> would like contribute in time for gcc 4.8. >>> >>> How much time is left till we switch from phase 1 to phase 3? >> >> I expect stage1 to close mid to end of October (after which it lasted >> for more than 7 months). > > Btw, I realize that the aarch64 port probably also wants to merge even if > I didn't see a merge proposal or know whether they have patches to > generic code. > > Anybody else with things they want to merge during stage1? My multiversioning patch is under review and I am hoping for merge during stage 1. The target part has already been reviewed by HJ. I am waiting for reviews on the front-end and the cgraph part. Patch here: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01717.html Thanks, -Sri. > > Thanks, > Richard. > >> Richard.
Re: Reserving a bit in ELF segment flags for huge page mappings
On Tue, Jul 24, 2012 at 1:40 PM, Cary Coutant wrote: >> To do this, I would like to reserve a bit in the segment flags to >> indicate that this segment is to be mapped to huge pages if possible. >> Can I reserve something like a PF_LARGE_PAGE bit? > > HP-UX has a PF_HP_PAGE_SIZE (0x0010) bit that says "Segment should > be mapped with page size specified in p_align field". Ok to define PF_LINUX_PAGE_SIZE similarly, same bit (0x0010) ? I want this to be a hint to the loader. Thanks, -Sri. > > -cary
Reserving a bit in ELF segment flags for huge page mappings
Hi, I am working on a patch to allow subsets of text sections to be mapped to different ELF segments : http://sourceware.org/ml/binutils/2012-07/msg00153.html using linker plugins. This will allow splitting hot and cold functions into separate segments so that only the hot segment can be then mapped to text huge pages. It has been found for some Google applications that mapping functions to huge pages, along with function layout, provides a significant performance benefit and this feature can take this further by only mapping certain functions to huge pages. To do this, I would like to reserve a bit in the segment flags to indicate that this segment is to be mapped to huge pages if possible. Can I reserve something like a PF_LARGE_PAGE bit? Thanks, -Sri.
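How a loader might consume such a hint can be sketched as a plain bit test on the segment's p_flags. The permission bits and the OS-specific mask below are the standard ELF values; the hint bit itself is hypothetical. Its name and its value (0x00100000, chosen here only because it lies inside the OS-specific range) are assumptions for illustration, not a reserved or agreed-upon flag.

```c
#include <assert.h>
#include <stdint.h>

/* Standard ELF segment permission bits and the OS-specific mask.  */
#define DEMO_PF_X       0x1u
#define DEMO_PF_W       0x2u
#define DEMO_PF_R       0x4u
#define DEMO_PF_MASKOS  0x0ff00000u

/* Hypothetical hint bit: "map this segment to huge pages if possible".
   Name and value are assumptions made for this sketch only.  */
#define DEMO_PF_HUGE_PAGE_HINT 0x00100000u

/* Loader side: does this segment ask for huge pages?  Being a hint,
   a loader that ignores it still maps the segment correctly.  */
int demo_wants_huge_pages (uint32_t p_flags)
{
  return (p_flags & DEMO_PF_HUGE_PAGE_HINT) != 0;
}

/* Linker side: tag the segment holding the hot functions without
   disturbing its permission bits.  */
uint32_t demo_mark_hot_text (uint32_t p_flags)
{
  return p_flags | DEMO_PF_HUGE_PAGE_HINT;
}
```

With hot and cold functions split into separate segments, only the hot text segment would be passed through demo_mark_hot_text; the cold segment keeps plain R+X flags and is mapped with normal pages.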
User directed Function Multiversioning (MV) via Function Overloading
Hi,

User directed Function Multiversioning (MV) via Function Overloading
===

I have created a set of patches to add support for user directed function MV via function overloading. This was discussed in this thread previously:

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html

Two patches have been created now to support this:

* The patch with the front-end changes to support versioned functions is:
  http://codereview.appspot.com/5752064/

* The patch to add runtime CPU type detection support is here:
  http://codereview.appspot.com/5754058/

With this support, here is an example of writing a program with function versions:

int foo ();  /* Default version */
int foo () __attribute__ ((targetv("arch=corei7")));  /* Specialized for corei7 */
int foo () __attribute__ ((targetv("arch=amdfam10")));  /* Specialized for amdfam10 */

int main ()
{
  int (*p)() = &foo;
  return foo () + (*p)();
}

int foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=corei7")))
foo ()
{
  ...
  return 0;
}

int __attribute__ ((targetv("arch=amdfam10")))
foo ()
{
  ...
  return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are different versions of the same function. The calls to foo in main, directly and via a pointer, are calls to the multi-versioned function foo, which is dispatched to the right foo at run-time.

Function versions must have the same signature but must differ in the specifier string provided to a new attribute called "targetv", which is nothing but the target attribute with an extra specification to indicate a version. Any number of versions can be created using the targetv attribute, but it is mandatory to have one function without the attribute, which is treated as the default version.

The front-end support is available in this patch:

http://codereview.appspot.com/5752064/

The front-end treats multiple definitions of foo with the same signature but with different targetv attributes as legitimate candidates for overloading. Also, all the function versions of one function are grouped together. Then, calls to foo and pointer accesses of foo will be replaced by an IFUNC function (foo.ifunc) which will call the dispatcher code at run-time to figure out the right version to execute.

For the above example, the following functions will be created :

* _Z3foov.ifunc : ifunc dispatcher for multi-versioned function foo and aliases to _Z3foov.resolver. All calls and pointer accesses to foo are replaced by a call or pointer access to this function.
* _Z3foov.resolver : The code to determine which version to execute at run-time.
* _Z3foov : The default version of foo.
* _Z3foov.arch_corei7 : The corei7 version of foo.
* _Z3foov.arch_amdfam10 : The amdfam10 version of foo.

Note that using IFUNC blocks inlining of versioned functions. I had implemented an optimization earlier to do hot path cloning to allow versioned functions to be inlined. Please see :

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html

In the next iteration, I plan to merge these two. With that, hot code paths with versioned functions will be cloned so that versioned functions can be inlined.

The version dispatch itself happens in a newly created pass added to be one of the initial lowering passes. The pass communicates with the target to determine the appropriate predicates to use to figure out which version to dispatch at run-time. The predicates are target builtins which determine the platform type at run-time and are added in this patch :

http://codereview.appspot.com/5754058/

The following features are being developed for the next iteration:

1) Support for hot path cloning to inline versioned functions.

2) Specifying multiple versions in a single function definition. This will be done using the following syntax:

int foo () __attribute__ ((targetv (("arch=corei7"), ("arch=amdfam10"), ("arch=core2"))));

which means the same body of foo must be cloned for corei7, amdfam10, and core2.

3) Specifying ISA types in the attribute. Only "arch=" is supported now. For example,

int foo () __attribute__ ((targetv ("popcnt,ssse3")));

means the version is only to be executed when popcount and ssse3 instructions are available.

4) Other dispatching mechanisms. IFUNC is used for dispatch now, but when the target does not support it, dispatching by directly calling the appropriate function version after checking the platform type will be supported.

5) Virtual function versioning.

Thoughts?

Thanks,
-Sri.
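The resolver-based dispatch described in this message can be approximated by hand in portable C, which may help when reading the list of generated symbols. This is a sketch, not what the patch emits: real IFUNC resolution is done by the dynamic loader, the demo_* names are hypothetical, CPU detection is replaced by a settable variable, and the versions return distinct values (unlike the example in the message) purely so the selection is observable.

```c
#include <assert.h>
#include <stddef.h>

enum demo_cpu { DEMO_CPU_GENERIC, DEMO_CPU_COREI7, DEMO_CPU_AMDFAM10 };

/* Stand-in for run-time CPU detection (hypothetical).  */
static enum demo_cpu demo_detected_cpu = DEMO_CPU_GENERIC;

/* The three versions; roughly _Z3foov, _Z3foov.arch_corei7 and
   _Z3foov.arch_amdfam10 in the naming used above.  */
static int foo_default (void)       { return 0; }
static int foo_arch_corei7 (void)   { return 1; }
static int foo_arch_amdfam10 (void) { return 2; }

typedef int (*foo_fn) (void);

/* Plays the role of the resolver: pick a version for this platform,
   falling back to the default version.  */
static foo_fn foo_resolver (void)
{
  switch (demo_detected_cpu) {
  case DEMO_CPU_COREI7:   return foo_arch_corei7;
  case DEMO_CPU_AMDFAM10: return foo_arch_amdfam10;
  default:                return foo_default;
  }
}

/* Cached result of the resolver, filled in on first use.  */
static foo_fn foo_resolved;

/* Plays the role of the dispatcher: every call and every pointer to
   foo goes through here.  */
int foo (void)
{
  if (foo_resolved == NULL)
    foo_resolved = foo_resolver ();
  return foo_resolved ();
}

/* Test helper: pretend we are on another CPU and force re-resolution.  */
void demo_set_cpu (enum demo_cpu cpu)
{
  demo_detected_cpu = cpu;
  foo_resolved = NULL;
}
```

Because callers only ever see foo, taking its address with &foo and calling through the pointer reaches the same selected version as a direct call. The extra indirection is also why the message notes that this scheme blocks inlining of the versions.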
Function Multiversioning Usability.
Hi,

I am working on supporting function multi-versioning in GCC and here is a write-up on its usability.

Multiversioning Usability

For a simple motivating example,

int find_popcount(unsigned int i)
{
  return __builtin_popcount(i);
}

Currently, compiling this with -mpopcnt will result in the "popcnt" instruction being used; otherwise a built-in generic implementation is called. It is desirable to have two versions of this function so that it can be run both on targets that support the popcnt insn and those that do not.

* Case I - User Guided Versioning where only one function body is provided by the user.

This case addresses a use where the user wants multi-versioning but provides only one function body. I want to add a new attribute called "mversion" which will be used like this:

int __attribute__((mversion("popcnt")))
find_popcount(unsigned int i)
{
  return __builtin_popcount(i);
}

With the attribute, the compiler should understand that it should generate two versions for this function. The user makes a call to this function like a regular call, but the code generated would call the appropriate function at run-time based on a check to determine if that instruction is supported or not.

The attribute can be scaled to support many versions by allowing a comma separated list of values for the mversion attribute. For instance, __attribute__((mversion("sse3", "sse4", ...))) will provide a version for each. For N attributes, N clones plus one clone for the default case will have to be generated by the compiler. The arguments to the "mversion" attribute will be similar to the arguments supported by the "target" attribute.

This attribute is useful if the same source is going to be used to generate the different versions. If this has to be done manually, the user has to duplicate the body of the function and specify a target attribute of "popcnt" on one clone. Then, the user has to use something like IFUNC support or manually write code to call the appropriate version. All of this will be done automatically by the compiler with this new attribute.

* Case II - User Guided Versioning where the function bodies for each version differ and are provided by the user.

This case pertains to multi-versioning when the source bodies of the two or more versions are different and are provided by the user. Here too, I want to use a new attribute, "version". Now, the user can specify versioning intent like this:

int __attribute__((version("popcnt")))
find_popcnt(unsigned int i)
{
  // inline assembly of the popcnt instruction, specialized version.
  asm("popcnt ....");
}

int find_popcnt(unsigned int i)
{
  // generic code for doing this
  ...
}

This uses function overloading to specify versions. The compiler will understand that versioning is requested, since the functions have different attributes with "version", and will generate the code to execute the right function at run-time. The compiler should check for the existence of one body without the attribute, which will be the default version.

* Case III - Versioning is done automatically by the compiler.

I want to add a new compiler flag "-mversion" along the lines of "-m". If the user specifies "-mversion=popcnt" then the compiler will automatically create two versions of any function that is impacted by the new instruction. The difference between "-m" and "-mversion" will be that while "-m" generates only the specialized version, "-mversion" will generate both the specialized and the generic versions. There is no need to explicitly mark any function for versioning, and no source changes are needed.

The compiler will decide if it is beneficial to multi-version a function based on heuristics using hotness information, code size growth, etc.

Runtime support
===

In order for the compiler to generate multi-versioned code, it needs to call functions that would test if a particular feature exists or not at run-time. For example, IsPopcntSupported() would be one such function. I have prepared a patch to do this which adds the runtime support in libgcc and supports new builtins to test the various features. I will send the patch separately to keep the discussions focused.

Thoughts?

Thanks,
-Sri.
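What Case I asks the compiler to generate can be written out by hand as follows. This is a sketch under assumptions: the feature test is a hypothetical stub (a settable flag standing in for something like the IsPopcntSupported() mentioned above), and the "specialized" version simply uses __builtin_popcount, a GCC/Clang builtin, rather than being compiled with -mpopcnt.

```c
#include <assert.h>

/* Generic version: clears the lowest set bit until none are left.  */
static int find_popcount_generic (unsigned int i)
{
  int count = 0;
  while (i != 0) {
    i &= i - 1;
    count++;
  }
  return count;
}

/* Stand-in for the specialized clone.  In the proposal this body
   would be compiled so that the popcnt instruction is used.  */
static int find_popcount_specialized (unsigned int i)
{
  return __builtin_popcount (i);
}

/* Hypothetical run-time feature test; here just a settable flag.  */
static int demo_popcnt_supported = 0;

static int demo_is_popcnt_supported (void)
{
  return demo_popcnt_supported;
}

/* What the user calls.  The check picks a version at run-time; both
   versions must compute the same result.  */
int find_popcount (unsigned int i)
{
  if (demo_is_popcnt_supported ())
    return find_popcount_specialized (i);
  return find_popcount_generic (i);
}
```

The point of the proposed attribute is that the two bodies and the dispatching wrapper would all come from the single user-written definition, so the duplication shown here would not have to be maintained by hand.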
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
Hi Jan, Can you take a look at this patch when you find the time ? This is being blocked needing an approval from a x86 backend maintainer and you are the only one listed in the MAINTAINERS file. Thanks, -Sriraman. On Tue, Oct 6, 2009 at 2:56 PM, Paolo Bonzini wrote: > On 10/01/2009 11:37 PM, Sriraman Tallam wrote: >> >> Hi, >> >> I moved implicit-zee.c to config/i386. Can you please take another >> look ? > > I think this patch is best reviewed by an x86 backend maintainer now. > > Thanks for doing the adjustments, BTW. > > Paolo >
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
Hi Richard, I was wondering if you got a chance to see if this new patch is alright ?. Thanks, -Sriraman. On Thu, Oct 1, 2009 at 2:37 PM, Sriraman Tallam wrote: > Hi, > > I moved implicit-zee.c to config/i386. Can you please take another look ? > > * tree-pass.h (pass_implicit_zee): New pass. > * testsuite/gcc.target/i386/zee.c: New test. > * timevar.def (TV_ZEE): New. > * common.opt (fzee): New flag. > * config.gcc: Add implicit-zee.o for x86_64 target. > * implicit-zee.c: New file, zero extension elimination pass. > * config/i386/t-i386: Add rule for implicit-zee.o. > * i386.c (optimization_options): Enable zee pass for x86_64 target. > > Thanks, > -Sriraman. > > > On Thu, Sep 24, 2009 at 9:34 AM, Sriraman Tallam wrote: >> On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther >> wrote: >>> On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini wrote: >>>> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote: >>>>> >>>>> We already have the hooks, they have just been stuck in plugin.c when >>>>> they should really be in the generic backend. See register_pass. >>>>> >>>>> (Sigh, every time I looked at this I said "the pass control has to be >>>>> generic" but it still wound up in plugin.c.) >>>> >>>> Then I'll rephrase and say only that the pass should be in config/i386/. >>> >>> It should also be on by default on -O[23s] I think (didn't check if it >>> already >>> is). Otherwise it shortly will go the see lala-land. >> >> It is already on by default in O2 and higher. >> >>> >>> Richard. >>> >>>> Paolo >>>> >>> >> >
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
Hi, I moved implicit-zee.c to config/i386. Can you please take another look ? * tree-pass.h (pass_implicit_zee): New pass. * testsuite/gcc.target/i386/zee.c: New test. * timevar.def (TV_ZEE): New. * common.opt (fzee): New flag. * config.gcc: Add implicit-zee.o for x86_64 target. * implicit-zee.c: New file, zero extension elimination pass. * config/i386/t-i386: Add rule for implicit-zee.o. * i386.c (optimization_options): Enable zee pass for x86_64 target. Thanks, -Sriraman. On Thu, Sep 24, 2009 at 9:34 AM, Sriraman Tallam wrote: > On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther > wrote: >> On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini wrote: >>> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote: >>>> >>>> We already have the hooks, they have just been stuck in plugin.c when >>>> they should really be in the generic backend. See register_pass. >>>> >>>> (Sigh, every time I looked at this I said "the pass control has to be >>>> generic" but it still wound up in plugin.c.) >>> >>> Then I'll rephrase and say only that the pass should be in config/i386/. >> >> It should also be on by default on -O[23s] I think (didn't check if it >> already >> is). Otherwise it shortly will go the see lala-land. > > It is already on by default in O2 and higher. > >> >> Richard. 
>>
>>> Paolo
>>>
>>
>

Index: tree-pass.h
===
--- tree-pass.h (revision 152385)
+++ tree-pass.h (working copy)
@@ -500,6 +500,7 @@ extern struct rtl_opt_pass pass_stack_ptr_mod;
 extern struct rtl_opt_pass pass_initialize_regs;
 extern struct rtl_opt_pass pass_combine;
 extern struct rtl_opt_pass pass_if_after_combine;
+extern struct rtl_opt_pass pass_implicit_zee;
 extern struct rtl_opt_pass pass_partition_blocks;
 extern struct rtl_opt_pass pass_match_asm_constraints;
 extern struct rtl_opt_pass pass_regmove;
Index: testsuite/gcc.target/i386/zee.c
===
--- testsuite/gcc.target/i386/zee.c (revision 0)
+++ testsuite/gcc.target/i386/zee.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -fzee -S" } */
+/* { dg-final { scan-assembler-not "mov\[\\t \]+\(%\[\^,\]+\),\[\\t \]*\\1" } } */
+int mask[100];
+int foo(unsigned x)
+{
+  if (x < 10)
+    x = x * 45;
+  else
+    x = x * 78;
+  return mask[x];
+}
Index: timevar.def
===
--- timevar.def (revision 152385)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_RELOAD, "reload")
 DEFTIMEVAR (TV_RELOAD_CSE_REGS , "reload CSE regs")
 DEFTIMEVAR (TV_SEQABSTR , "sequence abstraction")
 DEFTIMEVAR (TV_GCSE_AFTER_RELOAD , "load CSE after reload")
+DEFTIMEVAR (TV_ZEE , "zee")
 DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
 DEFTIMEVAR (TV_IFCVT2 , "if-conversion 2")
 DEFTIMEVAR (TV_COMBINE_STACK_ADJUST , "combine stack adjustments")
Index: common.opt
===
--- common.opt (revision 152385)
+++ common.opt (working copy)
@@ -1099,6 +1099,10 @@ fsee
 Common
 Does nothing. Preserved for backward compatibility.
 
+fzee
+Common Report Var(flag_zee) Init(0)
+Eliminate redundant zero extensions on targets that support implicit extensions.
+
 fshow-column
 Common C ObjC C++ ObjC++ Report Var(flag_show_column) Init(1)
 Show column numbers in diagnostics, when available. Default on
Index: config.gcc
===
--- config.gcc (revision 152385)
+++ config.gcc (working copy)
@@ -2569,6 +2569,12 @@ powerpc*-*-* | rs6000-*-*)
 tm_file="${tm_file} rs6000/option-defaults.h"
 esac
 
+case ${target} in
+x86_64-*-*)
+  extra_objs="${extra_objs} implicit-zee.o"
+  ;;
+esac
+
 # Support for --with-cpu and related options (and a few unrelated options,
 # too).
 case ${with_cpu} in
Index: config/i386/implicit-zee.c
=======
--- config/i386/implicit-zee.c (revision 0)
+++ config/i386/implicit-zee.c (revision 0)
@@ -0,0 +1,1029 @@
+/* Redundant Zero-extension elimination for targets that implicitly
+ zero-extend writes to the lower 32-bit portion of 64-bit registers.
+ Copyright (C) 2009 Free Software Foundation, Inc.
+ Contributed by Sriraman Tallam (tmsri...@google.com) and
+ Silvius Rus (r...@google.com)
+
+This file is
Re: GCC 4.5 Status Report (2009-09-19)
Hi,

I have a zero-extension elimination patch that has been reviewed and needs one minor fix before it is ready for submission. I can get this in by Thursday, October 1st. Would it be alright to submit this patch then ?

Thanks,
-Sriraman.

On Sat, Sep 19, 2009 at 1:57 PM, Richard Guenther wrote:
>
> Status
> ======
>
> The trunk is in Stage 1. Stage 1 will end on Sep 30th. After Stage 1
> Stage 3 follows with only bugfixes and no new features allowed.
> Stage 3 will end Nov 30th.
>
> Since the last status report we have merged the VTA branch and pieces
> of the LTO branch. The named address-spaces changes are still pending
> review but I expect it to be merged before the end of Stage 1.
> The rest of the LTO branch will be merged last, which practically
> means after Stage 1 is over. Thus, starting Oct 1st the trunk will
> be frozen for the LTO merge and I'll announce Stage 3 once the merge
> is completed.
>
> There are still new ports pending review and approval. As usual
> new ports can be accepted also during Stage 3.
>
> We've been accumulating quite a number of P1 bugs. Entering Stage 3
> should allow to improve considerably here in a short time.
>
> Quality Data
> ============
>
> Priority          #   Change from Last Report
> --------        ---   -----------------------
> P1               22   + 6
> P2              111   + 7
> P3                6   + 6
> --------        ---   -----------------------
> Total           139   +19
>
> Previous Report
> ===============
>
> http://gcc.gnu.org/ml/gcc/2009-08/msg00427.html
>
> The next report will be sent by me announcing Stage 3 begin.
>
> --
> Richard Guenther
> Novell / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus
> Rex
>
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther wrote: > On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini wrote: >> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote: >>> >>> We already have the hooks, they have just been stuck in plugin.c when >>> they should really be in the generic backend. See register_pass. >>> >>> (Sigh, every time I looked at this I said "the pass control has to be >>> generic" but it still wound up in plugin.c.) >> >> Then I'll rephrase and say only that the pass should be in config/i386/. > > It should also be on by default on -O[23s] I think (didn't check if it already > is). Otherwise it shortly will go the see lala-land. It is already on by default in O2 and higher. > > Richard. > >> Paolo >> >
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
On Wed, Sep 23, 2009 at 3:57 PM, H.J. Lu wrote: > On Sat, Aug 8, 2009 at 2:59 PM, Sriraman Tallam wrote: >> Hi, >> >> Here is a patch to eliminate redundant zero-extension instructions >> on x86_64. >> >> Tested: Ran the gcc regresssion testsuite on x86_64-linux and verified >> that the results are the same with/without this patch. >> >> >> Problem Description : >> - >> >> This pass is intended to be applicable only to targets that implicitly >> zero-extend 64-bit registers after writing to their lower 32-bit half. >> For instance, x86_64 zero-extends the upper bits of a register >> implicitly whenever an instruction writes to its lower 32-bit half. >> For example, the instruction *add edi,eax* also zero-extends the upper >> 32-bits of rax after doing the addition. These zero extensions come >> for free and GCC does not always exploit this well. That is, it has >> been observed that there are plenty of cases where GCC explicitly >> zero-extends registers for x86_64 that are actually useless because >> these registers were already implicitly zero-extended in a prior >> instruction. This pass tries to eliminate such useless zero extension >> instructions. >> > > Does this fix: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17387 Yes, this patch fixes this problem. All the mov %eax, %eax are removed. > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34653 No, this patch does not fix this problem. > > -- > H.J. >
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
Sorry, it is the other way around.

Total number of zero-extension instructions before : 5814
Total number of zero-extension instructions after  : 1456

Thanks for pointing it out.

On Wed, Sep 23, 2009 at 4:10 PM, Ramana Radhakrishnan wrote:
>>
>> GCC bootstrap :
>>
>> Total number of zero-extension instructions before : 1456
>> Total number of zero-extension instructions after : 5814
>> No impact on boot-strap time.
>
>
> You sure you have these numbers the right way around ? Shouldn't the
> number of zero-extension instructions after the patch be less than the
> number of zero-extension instructions before or is this a regression
> ?
>
> Thanks,
> Ramana
>
>>
>>
>> I have attached the latest patch :
>>
>>
>> On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther
>> wrote:
>>> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam wrote:
>>>> Hi,
>>>>
>>>> Here is a patch to eliminate redundant zero-extension instructions
>>>> on x86_64.
>>>>
>>>> Tested: Ran the gcc regresssion testsuite on x86_64-linux and verified
>>>> that the results are the same with/without this patch.
>>>
>>> The patch misses testcases. Why does zee run after register allocation?
>>> Your examples suggest that it will free hard registers so doing it before
>>> regalloc looks odd.
>>>
>>> What is the compile-time impact of your patch on say, gcc bootstrap?
>>> How many percent of instructions are removed as useless zero-extensions
>>> during gcc bootstrap? How much do CSiBE numbers improve?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Problem Description :
>>>> -
>>>>
>>>> This pass is intended to be applicable only to targets that implicitly
>>>> zero-extend 64-bit registers after writing to their lower 32-bit half.
>>>> For instance, x86_64 zero-extends the upper bits of a register
>>>> implicitly whenever an instruction writes to its lower 32-bit half.
>>>> For example, the instruction *add edi,eax* also zero-extends the upper
>>>> 32-bits of rax after doing the addition. These zero extensions come
>>>> for free and GCC does not always exploit this well. That is, it has
>>>> been observed that there are plenty of cases where GCC explicitly
>>>> zero-extends registers for x86_64 that are actually useless because
>>>> these registers were already implicitly zero-extended in a prior
>>>> instruction. This pass tries to eliminate such useless zero extension
>>>> instructions.
>>>>
>>>> Motivating Example I :
>>>> --
>>>> For this program :
>>>> **
>>>> bad_code.c
>>>>
>>>> int mask[1000];
>>>>
>>>> int foo(unsigned x)
>>>> {
>>>>   if (x < 10)
>>>>     x = x * 45;
>>>>   else
>>>>     x = x * 78;
>>>>   return mask[x];
>>>> }
>>>> **
>>>>
>>>> $ gcc -O2 bad_code.c
>>>>
>>>> 400315: b8 4e 00 00 00          mov $0x4e,%eax
>>>> 40031a: 0f af f8                imul %eax,%edi
>>>> 40031d: 89 ff                   mov %edi,%edi   ---> Useless zero extend.
>>>> 40031f: 8b 04 bd 60 19 40 00    mov 0x401960(,%rdi,4),%eax
>>>> 400326: c3                      retq
>>>> ..
>>>> 400330: ba 2d 00 00 00          mov $0x2d,%edx
>>>> 400335: 0f af fa                imul %edx,%edi
>>>> 400338: 89 ff                   mov %edi,%edi   ---> Useless zero extend.
>>>> 40033a: 8b 04 bd 60 19 40 00    mov 0x401960(,%rdi,4),%eax
>>>> 400341: c3                      retq
>>>>
>>>> $ gcc -O2 -fzee bad_code.c
>>>> ..
>>>> 400315: 6b ff 4e                imul $0x4e,%edi,%edi
>>>> 400318: 8b 04 bd 40 19 40 00    mov 0x401940(,%rdi,4),%eax
>>>> 40031f: c3                      retq
>>>> 400320: 6b ff 2d                imul $0x2d,%edi,%edi
>>>> 400323: 8b 04 bd 40 19 40 00    mov 0x401940(,%rdi,4),%eax
>>>> 40032a: c3                      retq
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Sriraman M Tallam.
>>>> Google, Inc.
>>>> tmsri...@google.com
>>>>
>>>
>>
>
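The idea in the problem description can be illustrated with a toy straight-line peephole. This is a deliberately simplified sketch and not the pass from the patch (the patch description does not spell out its algorithm here, and a real pass has to reason about definitions reaching across basic blocks). The instruction model, the register count, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds: only what matters for the analysis.  */
enum demo_op {
  DEMO_WRITE32,   /* writes the low 32 bits; upper half becomes zero */
  DEMO_WRITE64,   /* writes all 64 bits; upper half unknown */
  DEMO_ZEXT32,    /* explicit zero-extend, e.g. mov %edi,%edi */
  DEMO_OTHER      /* does not write a register we track */
};

struct demo_insn {
  enum demo_op op;
  int dest;       /* register number 0..15; ignored for DEMO_OTHER */
};

/* Removes, in place, every explicit zero-extend of a register whose
   upper half is already known to be zero within this straight-line
   block.  Returns the new instruction count.  */
size_t demo_eliminate_zext (struct demo_insn *insns, size_t n)
{
  enum { DEMO_NREGS = 16 };
  int upper_is_zero[DEMO_NREGS] = { 0 };
  size_t out = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    struct demo_insn in = insns[i];
    int keep = 1;

    if (in.op != DEMO_OTHER && in.dest >= 0 && in.dest < DEMO_NREGS) {
      switch (in.op) {
      case DEMO_WRITE32:
        upper_is_zero[in.dest] = 1;
        break;
      case DEMO_WRITE64:
        upper_is_zero[in.dest] = 0;
        break;
      case DEMO_ZEXT32:
        if (upper_is_zero[in.dest])
          keep = 0;                    /* redundant: drop it */
        else
          upper_is_zero[in.dest] = 1;  /* needed: keep, and remember */
        break;
      default:
        break;
      }
    }
    if (keep)
      insns[out++] = in;
  }
  return out;
}
```

Modelling the first listing as a 32-bit write to edi (the imul), an explicit zero-extend of edi, and a 32-bit load into eax, the sketch drops the middle instruction, which corresponds to the disappearance of mov %edi,%edi in the -fzee output.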
Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)
Hi Richard,

I finally got around to getting the data you wanted. Thanks for the
response. Please find my comments below.

On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther wrote:
> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam wrote:
>> Hi,
>>
>> Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regression testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>
> The patch misses testcases.

Added.

> Why does zee run after register allocation?
> Your examples suggest that it will free hard registers so doing it before
> regalloc looks odd.

Originally, I had written this patch to have ZEE run before IRA.
However, I noticed that IRA generates poorer code when my patch is
turned on. Here is an example of how badly RA can hurt. I show a piece
of code around a zero-extend that got eliminated. The second listing
(-fzee) is after eliminating zero-extends. The code is pretty much the
same except for the extra move, which is marked with a comment. IRA is
not able to coalesce %esi and %r15d.

Base line :

48b760: imul   $0x9e406cb5,%r15d,%esi
48b767: mov    %rax,%rcx
48b76a: shr    $0x12,%esi
48b76d: and    %r12d,%esi
48b770: mov    %edi,%eax
48b772: add    $0x1,%edi
48b775: shr    $0x5,%eax
48b778: mov    %eax,%eax              # redundant zero extend
48b77a: lea    (%rcx,%rax,1),%rax
48b77e: cmp    %rax,%r9

-fzee :

48b7d0: imul   $0x9e406cb5,%r15d,%r15d  # The destination should have just been esi.
48b7d7: mov    %rax,%rcx
48b7da: shr    $0x12,%r15d
48b7de: mov    %r15d,%esi             # This move is useless if r15d and esi can be coalesced into esi.
48b7e1: and    %r12d,%esi
48b7e4: mov    %edi,%eax
48b7e6: add    $0x1,%edi
48b7e9: shr    $0x5,%eax              # Ok, zero-extend eliminated.
48b7ec: lea    (%rcx,%rax,1),%rax
48b7f0: cmp    %rax,%r9

Going after IRA preserves code quality and the useless extension gets
removed.

>
> What is the compile-time impact of your patch on say, gcc bootstrap?
> How many percent of instructions are removed as useless zero-extensions
> during gcc bootstrap? How much do CSiBE numbers improve?

CSiBE numbers :

Total number of zero-extension instructions before : 667.
Total number of zero-extension instructions after : 122.
Performance : no measurable impact.

GCC bootstrap :

Total number of zero-extension instructions before : 1456
Total number of zero-extension instructions after : 5814
No impact on boot-strap time.

I have attached the latest patch :

On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther wrote:
> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam wrote:
>> Hi,
>>
>> Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regression testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>
> The patch misses testcases. Why does zee run after register allocation?
> Your examples suggest that it will free hard registers so doing it before
> regalloc looks odd.
>
> What is the compile-time impact of your patch on say, gcc bootstrap?
> How many percent of instructions are removed as useless zero-extensions
> during gcc bootstrap? How much do CSiBE numbers improve?
>
> Thanks,
> Richard.
>
>>
>> Problem Description :
>> -
>>
>> This pass is intended to be applicable only to targets that implicitly
>> zero-extend 64-bit registers after writing to their lower 32-bit half.
>> For instance, x86_64 zero-extends the upper bits of a register
>> implicitly whenever an instruction writes to its lower 32-bit half.
>> For example, the instruction *add edi,eax* also zero-extends the upper
>> 32-bits of rax after doing the addition. These zero extensions come
>> for free and GCC does not always exploit this well. That is, it has
>> been observed that there are plenty of cases where GCC explicitly
>> zero-extends registers for x86_64 that are actually useless because
>> these registers were already implicitly zero-extended in a prior
>> instruction. This pass tries to eliminate such useless zero extension
>> instructions.
>>
>> Motivating Example I :
>> --
>> For this program :
>> **
>> bad_code.c
>>
>> int mask[1000];
>>
>> int foo(unsigned x)
>> {
>>   if (x < 10)
>>     x = x * 45;
>>   else
>>     x = x * 78;
>>   return mask[x];
>> }
>> **
>>
>> $ gcc -O2 bad_code.c
>>
>> 400315: b8 4e 00 00 00          mov    $0x4e,%eax
>> 40031a: 0f af f8
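For readers who want the counts in this message as percentages, here is a small worked calculation in C. It is a sketch only: percent_removed is a hypothetical helper, not part of the patch, and it gives the share of zero-extension instructions eliminated, not the share of all instructions. For GCC bootstrap it should be fed the corrected order from the follow-up message (5814 before, 1456 after).

```c
/* Hypothetical helper: percentage of zero-extension instructions
   removed, given the counts before and after the pass.
   Returns 0.0 when before is 0 to avoid dividing by zero.  */
double percent_removed(unsigned before, unsigned after)
{
    if (before == 0)
        return 0.0;
    return 100.0 * ((double)before - (double)after) / (double)before;
}
```

With the CSiBE counts, percent_removed(667, 122) is roughly 81.7. With the corrected bootstrap counts, percent_removed(5814, 1456) is roughly 75.0.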
Request for code review - (ZEE patch : Redundant Zero extension elimination)
Hi,

Here is a patch to eliminate redundant zero-extension instructions
on x86_64.

Tested: Ran the gcc regression testsuite on x86_64-linux and verified
that the results are the same with/without this patch.

Problem Description :
-

This pass is intended to be applicable only to targets that implicitly
zero-extend 64-bit registers after writing to their lower 32-bit half.
For instance, x86_64 zero-extends the upper bits of a register
implicitly whenever an instruction writes to its lower 32-bit half.
For example, the instruction *add edi,eax* also zero-extends the upper
32-bits of rax after doing the addition. These zero extensions come
for free and GCC does not always exploit this well. That is, it has
been observed that there are plenty of cases where GCC explicitly
zero-extends registers for x86_64 that are actually useless because
these registers were already implicitly zero-extended in a prior
instruction. This pass tries to eliminate such useless zero extension
instructions.

Motivating Example I :
--
For this program :
**
bad_code.c

int mask[1000];

int foo(unsigned x)
{
  if (x < 10)
    x = x * 45;
  else
    x = x * 78;
  return mask[x];
}
**

$ gcc -O2 bad_code.c

400315: b8 4e 00 00 00          mov    $0x4e,%eax
40031a: 0f af f8                imul   %eax,%edi
40031d: 89 ff                   mov    %edi,%edi   ---> Useless zero extend.
40031f: 8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
400326: c3                      retq
..
400330: ba 2d 00 00 00          mov    $0x2d,%edx
400335: 0f af fa                imul   %edx,%edi
400338: 89 ff                   mov    %edi,%edi   ---> Useless zero extend.
40033a: 8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
400341: c3                      retq

$ gcc -O2 -fzee bad_code.c
..
400315: 6b ff 4e                imul   $0x4e,%edi,%edi
400318: 8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
40031f: c3                      retq
400320: 6b ff 2d                imul   $0x2d,%edi,%edi
400323: 8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
40032a: c3                      retq

Thanks,

Sriraman M Tallam.
Google, Inc.
tmsri...@google.com

Index: tree-pass.h
===
--- tree-pass.h (revision 150581)
+++ tree-pass.h (working copy)
@@ -475,6 +475,7 @@
 extern struct rtl_opt_pass pass_initialize_regs;
 extern struct rtl_opt_pass pass_combine;
 extern struct rtl_opt_pass pass_if_after_combine;
+extern struct rtl_opt_pass pass_implicit_zee;
 extern struct rtl_opt_pass pass_partition_blocks;
 extern struct rtl_opt_pass pass_match_asm_constraints;
 extern struct rtl_opt_pass pass_regmove;
Index: timevar.def
===
--- timevar.def (revision 150581)
+++ timevar.def (working copy)
@@ -178,6 +178,7 @@
 DEFTIMEVAR (TV_RELOAD_CSE_REGS , "reload CSE regs")
 DEFTIMEVAR (TV_SEQABSTR , "sequence abstraction")
 DEFTIMEVAR (TV_GCSE_AFTER_RELOAD , "load CSE after reload")
+DEFTIMEVAR (TV_ZEE , "zee")
 DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
 DEFTIMEVAR (TV_IFCVT2 , "if-conversion 2")
 DEFTIMEVAR (TV_COMBINE_STACK_ADJUST , "combine stack adjustments")
Index: implicit-zee.c
===
--- implicit-zee.c (revision 0)
+++ implicit-zee.c (revision 0)
@@ -0,0 +1,1029 @@
+/* Redundant Zero-extension elimination for targets that implicitly
+   zero-extend writes to the lower 32-bit portion of 64-bit registers.
+   Copyright (C) 2009 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsri...@google.com) and
+   Silvius Rus (r...@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+
+/* Problem Description :
+   -
+
+This pass is intended to be applicable only to targets that implicitly
+zero-extend 64-bit r