Re: Source Code for Profile Guided Code Positioning

2016-01-19 Thread Sriraman Tallam
On Fri, Jan 15, 2016 at 9:51 AM, Yury Gribov  wrote:
> On 01/15/2016 08:44 PM, vivek pandya wrote:
>>
>> Thanks Yury for
>> https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html this link.
>> It implements procedure reordering as linker plugin.
>> I have some questions :
>> 1 ) Can you point me to some documentation on how to write a linker
>> plugin? I have not seen any documentation for the structs with the
>> 'ld_' prefix (i.e. those defined in plugin-api.h).
>>   2 ) There is one more algorithm for Basic Block ordering with
>> execution frequency count in PH paper . Is there any implementation
>> available for it ?
>
>
> Quite frankly - I don't know (I've only learned about Google implementation
> recently).
>
> I've added Sriram to maybe comment.

Sorry for the late response.  The google/gcc_4_9 branch has the source
of function reordering linker Plugin.  It is available in the
function_reordering_plugin directory under the top level gcc
directory.

The function reordering plugin constructs a callgraph and uses profile
information to do a Pettis Hansen style function reordering.   This
plugin does not do basic block re-ordering.

There is no documentation as such that I am aware of on writing a linker
plugin.  Here is a very brief overview.   The linker calls the
plugin's "onload" function when registering the plugin and the plugin
in turn can register two callbacks with the linker, "claim_file_hook"
and "all_symbols_read_hook".  "claim_file_hook" is called for
each object file that the linker processes and
"all_symbols_read_hook" is called after all the symbols have been read
by the linker.  These are just two different interesting points in the
course of a link.

The plugin can also get handles to linker functions like
"get_input_section_name" which it can use to process sections given
their handle. You can also check the gold linker tests for simpler
plugin examples.
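
As a rough illustration of the callback sequence described above, here is a
toy model in C. It does not use the real plugin-api.h; every type and
function name in it (toy_linker, toy_onload, toy_link, and so on) is a
hypothetical stand-in chosen only to show the order of events: onload
registers the hooks, the claim-file hook runs once per object file, and the
all-symbols-read hook runs once at the end.

```c
#include <stddef.h>

/* Toy model of the linker/plugin handshake.  Every name here is a
   hypothetical stand-in; the real interface lives in plugin-api.h. */
typedef void (*claim_file_fn)(const char *object_name);
typedef void (*all_symbols_read_fn)(void);

struct toy_linker {
  claim_file_fn claim_file_hook;
  all_symbols_read_fn all_symbols_read_hook;
};

int files_seen;             /* objects passed to the claim-file hook */
int symbols_read;           /* set once the all-symbols-read hook runs */
int files_at_symbols_read;  /* files_seen at the time that hook ran */

static void my_claim_file(const char *object_name) {
  (void)object_name;        /* a real plugin would inspect the file here */
  files_seen++;
}

static void my_all_symbols_read(void) {
  symbols_read = 1;
  files_at_symbols_read = files_seen;
}

/* Plugin entry point: register both callbacks with the linker. */
void toy_onload(struct toy_linker *ld) {
  ld->claim_file_hook = my_claim_file;
  ld->all_symbols_read_hook = my_all_symbols_read;
}

/* What the linker side does, in order. */
void toy_link(const char *const *objects, size_t n) {
  struct toy_linker ld = {NULL, NULL};
  toy_onload(&ld);
  for (size_t i = 0; i < n; i++)
    if (ld.claim_file_hook)
      ld.claim_file_hook(objects[i]);
  if (ld.all_symbols_read_hook)
    ld.all_symbols_read_hook();
}
```

A function reordering plugin would presumably collect per-section
information in the first hook and compute the final layout in the second.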

HTH,
Thanks
Sri

>
> -Y


Re: multiversioning page update?

2013-09-17 Thread Sriraman Tallam
On Wed, Sep 4, 2013 at 6:48 AM, Kenny Simpson  wrote:
> http://gcc.gnu.org/wiki/FunctionMultiVersioning  says "This support has been 
> checked in to trunk and should be available when GCC 4.8 is released."
>
>
> Since 4.8 has been released, and lists multiversioning support in the release 
> notes, should the page be updated to reflect this?

Updated.

Sri

>
> -Kenny
>


Re: What's up with g++.dg/ext/mv*.C?

2013-06-13 Thread Sriraman Tallam
On Thu, Jun 13, 2013 at 3:38 AM, Paolo Carlini  wrote:
> On 06/13/2013 12:35 PM, Paolo Carlini wrote:
>>
>> On 06/13/2013 12:28 PM, Paolo Carlini wrote:
>>>
>>> Hi,
>>>
>>> these FAILs are much more recent but frankly I'm also puzzled: is a fix
>>> actively in the making? Do we have any sort of time for that?
>>
>> This is PR57548 and a patch is approved but unapplied:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00426.html
>>
>> I'll do that myself ASAP if nobody beats me.
>
> In fact the patch is already in but the regressions are still there, eg:
>
> http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01257.html
>
> Is it actually a different issue or what?

This is probably http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56673

I will take a look at it. I am not sure what revision introduced the bug.

Sri

>
> Paolo.


Re: [wwwdocs] Release note entry for Function Multiversioning

2012-11-20 Thread Sriraman Tallam
On Tue, Nov 20, 2012 at 7:23 AM, Gerald Pfeifer  wrote:
> Hi Sri,
>
> On Tue, 13 Nov 2012, Sriraman Tallam wrote:
>> I have added a release note for Function Multiversioning which is
>> checked into trunk. Please review.
>
> Index: changes.html
> ===
> + Function Multiversioning Support with G++:
>
> Is this really specific to the C++ front end?  The example is C, not
> C++?

This support is only available with C++ front end for now. I intend to
add support to C front end soon.

>
> +It is now possible to create multiple function versions each targeting a
> +specific processor and/or ISA.  Function versions have the same signature
> +but different target attributes.  For example, here is a program with
> +function versions:
>
> I believe embedding this in  and  could be better.
>
> This is okay, thanks.

Thanks, made the change and submitted.

-Sri.

>
> Gerald


Re: Using -ffunction-sections and -p

2012-11-14 Thread Sriraman Tallam
On Sun, Nov 4, 2012 at 9:03 PM, Ian Lance Taylor  wrote:
> On Sun, Nov 4, 2012 at 8:04 PM, Sriraman Tallam  wrote:
>>
>>Currently, using -ffunction-sections and -p together results in a
>> warning. I ran into this problem when compiling the kernel. This is
>> discussed in this thread:
>>
>> http://gcc.gnu.org/ml/gcc-help/2008-11/msg00128.html
>>
>> Ian's reply suggests this warning is no longer necessary and can be
>> removed. Is this patch alright to commit with all tests passing:
>>
>> * toplev.c (process_options): Do not warn when -ffunction-sections
>>   and -fprofile are used together.
>
> In that thread Jeff suggested that this be tested on Solaris and PA.
> I don't know how to test on PA, but could somebody test the patch on
> Solaris?

Is it reasonable to gate this using a target hook and start allowing
this on selected targets? For instance, i386 does not seem to have a
problem with this.

Thanks,
-Sri.

>
> Ian


[wwwdocs] Release note entry for Function Multiversioning

2012-11-13 Thread Sriraman Tallam
Hi,

  I have added a release note for Function Multiversioning which is
checked into trunk. Please review.

Thanks,
-Sri.
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.56
diff -u -u -p -r1.56 changes.html
--- changes.html	12 Nov 2012 15:19:33 -0000	1.56
+++ changes.html	14 Nov 2012 01:15:15 -0000
@@ -297,6 +297,34 @@ B b(42); // OK
 }
 
 
+ Function Multiversioning Support with G++:
+It is now possible to create multiple function versions each targeting a
+specific processor and/or ISA.  Function versions have the same signature
+but different target attributes.  For example, here is a program with
+function versions:
+
+int foo(void)
+{
+  return 1;
+}
+
+__attribute__ ((target ("sse4.2")))
+int foo(void)
+{
+  return 2;
+}
+
+int main (void)
+{
+  int (*p)(void) = &foo;
+  assert ((*p)() == foo());
+  return 0;
+}
+
+Please refer to this
+wiki (http://gcc.gnu.org/wiki/FunctionMultiVersioning) for more
+information.
+
   
 
 FRV


Using -ffunction-sections and -p

2012-11-04 Thread Sriraman Tallam
Hi,

   Currently, using -ffunction-sections and -p together results in a
warning. I ran into this problem when compiling the kernel. This is
discussed in this thread:

http://gcc.gnu.org/ml/gcc-help/2008-11/msg00128.html

Ian's reply suggests this warning is no longer necessary and can be
removed. Is this patch alright to commit with all tests passing:

* toplev.c (process_options): Do not warn when -ffunction-sections
  and -fprofile are used together.

--- toplev.c (revision 193117)
+++ toplev.c (working copy)
@@ -1467,12 +1467,6 @@ process_options (void)
  }
 }

-  if (flag_function_sections && profile_flag)
-{
-  warning (0, "-ffunction-sections disabled; it makes profiling impossible");
-  flag_function_sections = 0;
-}
-
 #ifndef HAVE_prefetch
   if (flag_prefetch_loop_arrays > 0)
 {



Thanks,
-Sri.


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Sriraman Tallam
Hi Jakub,

   My function multiversioning patch is being reviewed and  I hope to
get this in by Nov. 5.

Thanks,
-Sri.

On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek  wrote:
> Status
> ==
>
> I'd like to close the stage 1 phase of GCC 4.8 development
> on Monday, November 5th.  If you have still patches for new features you'd
> like to see in GCC 4.8, please post them for review soon.  Patches
> posted before the freeze, but reviewed shortly after the freeze, may
> still go in, further changes should be just bugfixes and documentation
> fixes.
>
>
> Quality Data
> 
>
> Priority  #   Change from Last Report
> ---   ---
> P1   23   + 23
> P2   77   +  8
> P3   85   + 84
> ---   ---
> Total   185   +115
>
>
> Previous Report
> ===
>
> http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html
>
> The next report will be sent by me again, announcing end of stage 1.


Re: How much time left till phase 3?

2012-10-03 Thread Sriraman Tallam
On Tue, Oct 2, 2012 at 2:45 AM, Richard Guenther
 wrote:
> On Tue, Oct 2, 2012 at 11:34 AM, Richard Guenther
>  wrote:
>> On Tue, Oct 2, 2012 at 10:44 AM, Joern Rennecke
>>  wrote:
>>> I'll have to prepare a few more patches to (supposedly) generic
>>> code to support the ARCompact port, which we (Synopsys and Embecosm)
>>> would like contribute in time for gcc 4.8.
>>>
>>> How much time is left till we switch from phase 1 to phase 3?
>>
>> I expect stage1 to close mid to end of October (after which it lasted
>> for more than 7 months).
>
> Btw, I realize that the aarch64 port probably also wants to merge even if
> I didn't see a merge proposal or know whether they have patches to
> generic code.
>
> Anybody else with things they want to merge during stage1?

My multiversioning patch is under review and I am hoping for merge
during stage 1. The target part has already been reviewed by HJ. I am
waiting for reviews on the front-end and the cgraph part.

Patch here: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01717.html

Thanks,
-Sri.

>
> Thanks,
> Richard.
>
>> Richard.


Re: Reserving a bit in ELF segment flags for huge page mappings

2012-07-25 Thread Sriraman Tallam
On Tue, Jul 24, 2012 at 1:40 PM, Cary Coutant  wrote:
>>   To do this, I would like to reserve a bit in the segment flags to
>> indicate that this segment is to be mapped to huge pages if possible.
>> Can I reserve something like a PF_LARGE_PAGE bit?
>
> HP-UX has a PF_HP_PAGE_SIZE (0x00100000) bit that says "Segment should
> be mapped with page size specified in p_align field".

Ok to define PF_LINUX_PAGE_SIZE similarly, same bit (0x00100000)? I
want this to be a hint to the loader.
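
A minimal sketch of how a loader might test such a hint bit follows.
PF_LINUX_PAGE_SIZE is only the name proposed in this thread, and its value
here (0x00100000, a bit inside the OS-specific PF_MASKOS range) is an
assumption for illustration. The function name segment_wants_huge_pages is
hypothetical as well.

```c
#include <stdint.h>

/* Standard ELF segment permission bits and the OS-specific mask. */
#define PF_X      0x1u
#define PF_W      0x2u
#define PF_R      0x4u
#define PF_MASKOS 0x0ff00000u

/* Hypothetical: the name proposed in this thread, with an assumed value. */
#define PF_LINUX_PAGE_SIZE 0x00100000u

/* Returns 1 if the segment flags carry the huge-page hint, else 0. */
int segment_wants_huge_pages(uint32_t p_flags) {
  return (p_flags & PF_LINUX_PAGE_SIZE) != 0;
}
```

Since the bit is only a hint, a loader that does not know about it, or that
cannot get huge pages, can ignore it and map the segment normally.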

Thanks,
-Sri.

>
> -cary


Reserving a bit in ELF segment flags for huge page mappings

2012-07-24 Thread Sriraman Tallam
Hi,

I am working on a patch to allow subsets of text sections to be
mapped to different ELF segments :
http://sourceware.org/ml/binutils/2012-07/msg00153.html using linker
plugins.

This will allow splitting hot and cold functions into separate
segments so that only the hot segment can be then mapped to text huge
pages. It has been found for some Google applications that mapping
functions to huge pages, along with function layout, provides a
significant performance benefit and this feature can take this further
by only mapping certain functions to huge pages.

  To do this, I would like to reserve a bit in the segment flags to
indicate that this segment is to be mapped to huge pages if possible.
Can I reserve something like a PF_LARGE_PAGE bit?

Thanks,
-Sri.


User directed Function Multiversioning (MV) via Function Overloading

2012-03-06 Thread Sriraman Tallam
Hi,

User directed Function Multiversioning (MV) via Function Overloading
===

I have created a set of patches to add support for user directed
function MV via function overloading. This was discussed in this
thread previously:
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html

Two patches have been created now to support this:
* The patch with the front-end changes to support versioned functions is:
 http://codereview.appspot.com/5752064/

* The patch to add runtime CPU type detection support is here:
 http://codereview.appspot.com/5754058/

With this support, here is an example of writing a program with
function versions:

int foo ();  /* Default version */
int foo () __attribute__ ((targetv("arch=corei7"))); /*Specialized for corei7 */
int foo () __attribute__ ((targetv("arch=amdfam10"))); /*Specialized for amdfam10 */

int main ()
{
  int (*p)() = &foo;
  return foo () + (*p)();
}

int foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=corei7")))
foo ()
{
  ...
  return 0;
}

int __attribute__ ((targetv("arch=amdfam10")))
foo ()
{
  ...
  return 0;
}

The above example has foo defined 3 times, but all 3 definitions of
foo are different versions of the same function. The call to foo in
main, directly and via a pointer, are calls to the multi-versioned
function foo which is dispatched to the right foo at run-time.

Function versions must have the same signature but must differ in the
specifier string provided to a new attribute called "targetv", which
is nothing but the target attribute with an extra specification to
indicate a version. Any number of versions can be created using the
targetv attribute but it is mandatory to have one function without the
attribute, which is treated as the default version. The front-end
support is available in this patch:
 http://codereview.appspot.com/5752064/

The front-end treats multiple definitions of foo with the same
signature but with different targetv attributes as legitimate
candidates for overloading. Also, all the function versions of one
function are grouped together. Then, calls to foo and pointer access
of foo will be replaced by an IFUNC function (foo.ifunc) which will
call the dispatcher code at run-time to figure out the right version
to execute. For the above example, the following functions will be
created :

* _Z3foov.ifunc : ifunc dispatcher for multi-versioned function foo and
  aliases to _Z3foov.resolver. All calls and pointer accesses to foo are
  replaced by a call or pointer access to this function.
* _Z3foov.resolver : The code to determine which version to execute at
  run-time.
* _Z3foov : The default version of foo.
* _Z3foov.arch_corei7 : The corei7 version of foo.
* _Z3foov.arch_amdfam10 : The amdfam10 version of foo.
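
The role of the resolver can be sketched in plain C as a function that maps
a detected CPU to one of the versions. This is only an illustration, not the
code the patch generates: the names foo_default, foo_corei7, foo_amdfam10,
cpu_kind and foo_resolver are hypothetical, and the versions return distinct
values (unlike the example above) so that the selection is observable.

```c
/* Hypothetical stand-ins for the three versions of foo. */
typedef int (*foo_fn)(void);

int foo_default(void)  { return 0; }
int foo_corei7(void)   { return 1; }
int foo_amdfam10(void) { return 2; }

/* Hypothetical result of run-time CPU detection. */
enum cpu_kind { CPU_OTHER, CPU_COREI7, CPU_AMDFAM10 };

/* Plays the role of _Z3foov.resolver: pick the version to run. */
foo_fn foo_resolver(enum cpu_kind cpu) {
  switch (cpu) {
    case CPU_COREI7:   return foo_corei7;
    case CPU_AMDFAM10: return foo_amdfam10;
    default:           return foo_default;
  }
}
```

With IFUNC the dynamic loader calls the resolver once and binds the symbol to
the returned address, so later calls do not repeat the CPU check.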

Note that using IFUNC  blocks inlining of versioned functions. I had
implemented an optimization earlier to do hot path cloning to allow
versioned functions to be inlined. Please see :
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
In the next iteration, I plan to merge these two. With that, hot code
paths with versioned functions will be cloned so that versioned
functions can be inlined.

The version dispatch itself happens in a newly created pass added to
be one of the initial lowering passes. The pass communicates with the
target to determine the appropriate predicates to use to figure out
which version to dispatch at run-time. The predicates are target
builtins which determine the platform type at run-time and are added
in this patch :
http://codereview.appspot.com/5754058/

The following features are being developed for the next iteration:

1) Support for hot path cloning to inline versioned functions.
2) Specifying multiple versions in a single function definition.

This will be done using the following syntax:
int foo ()
__attribute__ ((targetv (("arch=corei7"),("arch=amdfam10"), ("arch=core2"))));

which means the same body of foo must be cloned for corei7, amdfam10, and core2.

3) Specifying ISA types in the attribute. Only "arch=" is supported now.

For example,
int foo ()
__attribute__ ((targetv ("popcnt,ssse3")));

means the version is only to be executed when popcount and ssse3
instructions are available.

4) Other dispatching mechanisms.

IFUNC is used for dispatch now. When the target does not support IFUNC,
dispatching by directly calling the appropriate function version after
checking the platform type will be supported.

5) Virtual function versioning.

Thoughts?

Thanks,
-Sri.


Function Multiversioning Usability.

2011-08-16 Thread Sriraman Tallam
Hi,

  I am working on supporting function multi-versioning in GCC and here
is a write-up on its usability.

Multiversioning Usability


For a simple motivating example,

int
find_popcount(unsigned int i)
{
  return __builtin_popcount(i);
}

Currently, compiling this with -mpopcnt results in the “popcnt”
instruction being used; without the flag, a call to a generic built-in
implementation is emitted. It is desirable to have two versions of this
function so that it can be run both on targets that support the popcnt
insn and those that do not.


* Case I - User Guided Versioning where only one function body is
provided by the user.

This case addresses a use where the user wants multi-versioning but
provides only one function body.  I want to add a new attribute called
“mversion” which will be used like this:

int __attribute__((mversion("popcnt")))
find_popcount(unsigned int i)
{
  return __builtin_popcount(i);
}

With the attribute, the compiler should understand that it should
generate two versions for this function. The user makes a call to this
function like a regular call but the code generated would call the
appropriate function at run-time based on a check to determine if that
instruction is supported or not.

The attribute can be scaled to support many versions by allowing a
comma separated list of values for the mversion attribute. For
instance, __attribute__((mversion("sse3", "sse4", ...))) will provide a
version for each. For N attributes, N clones plus one clone for the
default case will have to be generated by the compiler. The arguments
to the "mversion" attribute will be similar to the arguments supported
by the "target" attribute.

This attribute is useful if the same source is going to be used to
generate the different versions. If this has to be done manually, the
user has to duplicate the body of the function and specify a target
attribute of “popcnt” on one clone. Then, the user has to use
something like IFUNC support or manually write code to call the
appropriate version. All of this will be done automatically by the
compiler with this new attribute.
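
For comparison, the manual approach described in the previous paragraph
might look roughly like the sketch below. It assumes GCC or a compatible
compiler on x86 for the target attribute and __builtin_cpu_supports, and
falls back to the generic body elsewhere. The names find_popcount_generic
and find_popcount_popcnt and the HAVE_X86_DISPATCH macro are hypothetical.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Generic version: portable bit counting, one iteration per set bit. */
int find_popcount_generic(unsigned int i) {
  int n = 0;
  while (i) {
    i &= i - 1;   /* clear the lowest set bit */
    n++;
  }
  return n;
}

#if HAVE_X86_DISPATCH
/* Specialized clone: the target attribute lets the compiler use popcnt. */
__attribute__((target("popcnt")))
int find_popcount_popcnt(unsigned int i) {
  return __builtin_popcount(i);
}
#endif

/* Hand-written dispatcher: check the CPU, then call the right version. */
int find_popcount(unsigned int i) {
#if HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("popcnt"))
    return find_popcount_popcnt(i);
#endif
  return find_popcount_generic(i);
}
```

The duplicated body and the dispatcher are exactly the parts the proposed
attribute is meant to generate automatically.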

* Case II - User Guided Versioning where the function bodies for each
version differ and are provided by the user.

This case pertains to multi-versioning when the source bodies of the
two or more versions are different and are provided by the user. Here
too, I want to use a new attribute, “version”. Now, the user can
specify versioning intent like this:

int __attribute__((version("popcnt")))
find_popcnt(unsigned int i)
{
  // inline assembly of the popcnt instruction, specialized version.
  asm("popcnt ….");
}

int
find_popcnt(unsigned int i)
{
  //generic code for doing this
  ...
}

This uses function overloading to specify versions.  The compiler will
understand that versioning is requested, since the functions have
different attributes with "version", and will generate the code to
execute the right function at run-time.  The compiler should check for
the existence of one body without the attribute which will be the
default version.

* Case III - Versioning is done automatically by the compiler.

I want to add a new compiler flag “-mversion” along the lines of “-m”.
If the user specifies “-mversion=popcnt” then the compiler will
automatically create two versions of any function that is impacted by
the new instruction. The difference between “-m” and “-mversion” will
be that while “-m” generates only the specialized version, “-mversion”
will generate both the specialized and the generic versions.  There is
no need to explicitly mark any function for versioning, and no source
changes are required.

The compiler will decide if it is beneficial to multi-version a
function based on heuristics using hotness information, code size
growth, etc.


Runtime support
===

In order for the compiler to generate multi-versioned code, it needs
to call functions that would test if a particular feature exists or
not at run-time. For example, IsPopcntSupported() would be one such
function. I have prepared a patch to do this which adds the runtime
support in libgcc and supports new builtins to test the various
features. I will send the patch separately to keep the discussions
focused.
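
To give an idea of what such a run-time test involves, here is a minimal
sketch of a popcnt check built on the CPUID instruction. It is not the
libgcc patch mentioned above. It assumes GCC or a compatible compiler on
x86, where <cpuid.h> provides __get_cpuid, and reports "not supported" on
other targets. The function name is_popcnt_supported and the HAVE_CPUID
macro are hypothetical.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Returns 1 if the CPU reports POPCNT, 0 otherwise (or on non-x86). */
int is_popcnt_supported(void) {
#if HAVE_CPUID
  unsigned int eax, ebx, ecx, edx;
  /* Leaf 1 returns feature flags; ECX bit 23 is the POPCNT flag. */
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return (ecx >> 23) & 1;
#endif
  return 0;
}
```

A real implementation would probably run CPUID once and cache the result
rather than re-executing it on every query.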


Thoughts?

Thanks,
-Sri.


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2010-01-13 Thread Sriraman Tallam
Hi Jan,

Can you take a look at this patch when you find the time? It is
blocked pending approval from an x86 backend maintainer, and
you are the only one listed in the MAINTAINERS file.

Thanks,
-Sriraman.

On Tue, Oct 6, 2009 at 2:56 PM, Paolo Bonzini  wrote:
> On 10/01/2009 11:37 PM, Sriraman Tallam wrote:
>>
>> Hi,
>>
>>       I moved implicit-zee.c to config/i386. Can you please take another
>> look ?
>
> I think this patch is best reviewed by an x86 backend maintainer now.
>
> Thanks for doing the adjustments, BTW.
>
> Paolo
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-10-06 Thread Sriraman Tallam
Hi Richard,

I was wondering if you got a chance to see if this new patch is alright.

Thanks,
-Sriraman.

On Thu, Oct 1, 2009 at 2:37 PM, Sriraman Tallam  wrote:
> Hi,
>
>      I moved implicit-zee.c to config/i386. Can you please take another look ?
>
>        * tree-pass.h (pass_implicit_zee): New pass.
>        * testsuite/gcc.target/i386/zee.c: New test.
>        * timevar.def (TV_ZEE): New.
>        * common.opt (fzee): New flag.
>        * config.gcc: Add implicit-zee.o for x86_64 target.
>        * implicit-zee.c: New file, zero extension elimination pass.
>        * config/i386/t-i386: Add rule for implicit-zee.o.
>        * i386.c (optimization_options): Enable zee pass for x86_64 target.
>
> Thanks,
> -Sriraman.
>
>
> On Thu, Sep 24, 2009 at 9:34 AM, Sriraman Tallam  wrote:
>> On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther
>>  wrote:
>>> On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini  wrote:
>>>> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote:
>>>>>
>>>>> We already have the hooks, they have just been stuck in plugin.c when
>>>>> they should really be in the generic backend.  See register_pass.
>>>>>
>>>>> (Sigh, every time I looked at this I said "the pass control has to be
>>>>> generic" but it still wound up in plugin.c.)
>>>>
>>>> Then I'll rephrase and say only that the pass should be in config/i386/.
>>>
>>> It should also be on by default on -O[23s] I think (didn't check if it
>>> already is).  Otherwise it shortly will go the see lala-land.
>>
>> It is already on by default in O2 and higher.
>>
>>>
>>> Richard.
>>>
>>>> Paolo
>>>>
>>>
>>
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-10-01 Thread Sriraman Tallam
Hi,

  I moved implicit-zee.c to config/i386. Can you please take another look ?

* tree-pass.h (pass_implicit_zee): New pass.
* testsuite/gcc.target/i386/zee.c: New test.
* timevar.def (TV_ZEE): New.
* common.opt (fzee): New flag.
* config.gcc: Add implicit-zee.o for x86_64 target.
* implicit-zee.c: New file, zero extension elimination pass.
* config/i386/t-i386: Add rule for implicit-zee.o.
* i386.c (optimization_options): Enable zee pass for x86_64 target.

Thanks,
-Sriraman.


On Thu, Sep 24, 2009 at 9:34 AM, Sriraman Tallam  wrote:
> On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther
>  wrote:
>> On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini  wrote:
>>> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote:
>>>>
>>>> We already have the hooks, they have just been stuck in plugin.c when
>>>> they should really be in the generic backend.  See register_pass.
>>>>
>>>> (Sigh, every time I looked at this I said "the pass control has to be
>>>> generic" but it still wound up in plugin.c.)
>>>
>>> Then I'll rephrase and say only that the pass should be in config/i386/.
>>
>> It should also be on by default on -O[23s] I think (didn't check if it
>> already is).  Otherwise it shortly will go the see lala-land.
>
> It is already on by default in O2 and higher.
>
>>
>> Richard.
>>
>>> Paolo
>>>
>>
>
Index: tree-pass.h
===
--- tree-pass.h (revision 152385)
+++ tree-pass.h (working copy)
@@ -500,6 +500,7 @@ extern struct rtl_opt_pass pass_stack_ptr_mod;
 extern struct rtl_opt_pass pass_initialize_regs;
 extern struct rtl_opt_pass pass_combine;
 extern struct rtl_opt_pass pass_if_after_combine;
+extern struct rtl_opt_pass pass_implicit_zee;
 extern struct rtl_opt_pass pass_partition_blocks;
 extern struct rtl_opt_pass pass_match_asm_constraints;
 extern struct rtl_opt_pass pass_regmove;
Index: testsuite/gcc.target/i386/zee.c
===
--- testsuite/gcc.target/i386/zee.c (revision 0)
+++ testsuite/gcc.target/i386/zee.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -fzee -S" } */
+/* { dg-final { scan-assembler-not "mov\[\\t \]+\(%\[\^,\]+\),\[\\t \]*\\1" } } */
+int mask[100];
+int foo(unsigned x)
+{
+  if (x < 10)
+    x = x * 45;
+  else
+    x = x * 78;
+  return mask[x];
+}
Index: timevar.def
===
--- timevar.def (revision 152385)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_RELOAD, "reload")
 DEFTIMEVAR (TV_RELOAD_CSE_REGS   , "reload CSE regs")
 DEFTIMEVAR (TV_SEQABSTR  , "sequence abstraction")
 DEFTIMEVAR (TV_GCSE_AFTER_RELOAD  , "load CSE after reload")
+DEFTIMEVAR (TV_ZEE  , "zee")
 DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
 DEFTIMEVAR (TV_IFCVT2   , "if-conversion 2")
 DEFTIMEVAR (TV_COMBINE_STACK_ADJUST  , "combine stack adjustments")
Index: common.opt
===
--- common.opt  (revision 152385)
+++ common.opt  (working copy)
@@ -1099,6 +1099,10 @@ fsee
 Common
 Does nothing.  Preserved for backward compatibility.
 
+fzee
+Common Report Var(flag_zee) Init(0)
+Eliminate redundant zero extensions on targets that support implicit extensions.
+
 fshow-column
 Common C ObjC C++ ObjC++ Report Var(flag_show_column) Init(1)
 Show column numbers in diagnostics, when available.  Default on
Index: config.gcc
===
--- config.gcc  (revision 152385)
+++ config.gcc  (working copy)
@@ -2569,6 +2569,12 @@ powerpc*-*-* | rs6000-*-*)
tm_file="${tm_file} rs6000/option-defaults.h"
 esac
 
+case ${target} in
+x86_64-*-*)
+   extra_objs="${extra_objs} implicit-zee.o"
+   ;;
+esac
+
 # Support for --with-cpu and related options (and a few unrelated options,
 # too).
 case ${with_cpu} in
Index: config/i386/implicit-zee.c
=======
--- config/i386/implicit-zee.c  (revision 0)
+++ config/i386/implicit-zee.c  (revision 0)
@@ -0,0 +1,1029 @@
+/* Redundant Zero-extension elimination for targets that implicitly
+   zero-extend writes to the lower 32-bit portion of 64-bit registers.
+   Copyright (C) 2009 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsri...@google.com) and
+  Silvius Rus (r...@google.com)
+
+This file is 

Re: GCC 4.5 Status Report (2009-09-19)

2009-09-29 Thread Sriraman Tallam
Hi,

  I have a zero-extension elimination patch that has been reviewed and needs
one minor fix before it is ready for submission. I can get this in by Thursday,
October 1st. Would it be alright to submit this patch then ?

Thanks,
-Sriraman.


On Sat, Sep 19, 2009 at 1:57 PM, Richard Guenther  wrote:
>
> Status
> ==
>
> The trunk is in Stage 1.  Stage 1 will end on Sep 30th.  After Stage 1
> Stage 3 follows with only bugfixes and no new features allowed.
> Stage 3 will end Nov 30th.
>
> Since the last status report we have merged the VTA branch and pieces
> of the LTO branch.  The named address-spaces changes are still pending
> review but I expect it to be merged before the end of Stage 1.
> The rest of the LTO branch will be merged last, which practically
> means after Stage 1 is over.  Thus, starting Oct 1st the trunk will
> be frozen for the LTO merge and I'll announce Stage 3 once the merge
> is completed.
>
> There are still new ports pending review and approval.  As usual
> new ports can be accepted also during Stage 3.
>
> We've been accumulating quite a number of P1 bugs.  Entering Stage 3
> should allow to improve considerably here in a short time.
>
> Quality Data
> 
>
> Priority          #     Change from Last Report
>         ---     ---
> P1               22     + 6
> P2              111     + 7
> P3                6     + 6
>         ---     ---
> Total           139     +19
>
> Previous Report
> ===
>
> http://gcc.gnu.org/ml/gcc/2009-08/msg00427.html
>
> The next report will be sent by me announcing Stage 3 begin.
>
> --
> Richard Guenther 
> Novell / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-09-24 Thread Sriraman Tallam
On Thu, Sep 24, 2009 at 1:36 AM, Richard Guenther
 wrote:
> On Thu, Sep 24, 2009 at 8:25 AM, Paolo Bonzini  wrote:
>> On 09/24/2009 08:24 AM, Ian Lance Taylor wrote:
>>>
>>> We already have the hooks, they have just been stuck in plugin.c when
>>> they should really be in the generic backend.  See register_pass.
>>>
>>> (Sigh, every time I looked at this I said "the pass control has to be
>>> generic" but it still wound up in plugin.c.)
>>
>> Then I'll rephrase and say only that the pass should be in config/i386/.
>
> It should also be on by default on -O[23s] I think (didn't check if it already
> is).  Otherwise it shortly will go the see lala-land.

It is already on by default in O2 and higher.

>
> Richard.
>
>> Paolo
>>
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-09-23 Thread Sriraman Tallam
On Wed, Sep 23, 2009 at 3:57 PM, H.J. Lu  wrote:
> On Sat, Aug 8, 2009 at 2:59 PM, Sriraman Tallam  wrote:
>> Hi,
>>
>>    Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regression testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>>
>>
>> Problem Description :
>> -
>>
>> This pass is intended to be applicable only to targets that implicitly
>> zero-extend 64-bit registers after writing to their lower 32-bit half.
>> For instance, x86_64 zero-extends the upper bits of a register
>> implicitly whenever an instruction writes to its lower 32-bit half.
>> For example, the instruction *add edi,eax* also zero-extends the upper
>> 32-bits of rax after doing the addition.  These zero extensions come
>> for free and GCC does not always exploit this well.  That is, it has
>> been observed that there are plenty of cases where GCC explicitly
>> zero-extends registers for x86_64 that are actually useless because
>> these registers were already implicitly zero-extended in a prior
>> instruction.  This pass tries to eliminate such useless zero extension
>> instructions.
>>
>
> Does this fix:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17387

Yes, this patch fixes this problem. All the mov %eax, %eax are removed.
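
To illustrate why such a mov is redundant, here is a toy C model of the
register semantics the pass relies on. It is not code from the patch, and
the helper names write_low32, explicit_zero_extend and imul32 are
hypothetical. It only models the rule that a write to the lower 32-bit half
of a 64-bit register clears the upper half.

```c
#include <stdint.h>

/* Toy model of an x86_64 register: a 32-bit write clears bits 63..32. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
  (void)reg;                /* the old upper half is discarded */
  return (uint64_t)value;
}

/* Models "mov %edi,%edi": re-write the low half with itself. */
uint64_t explicit_zero_extend(uint64_t reg) {
  return write_low32(reg, (uint32_t)reg);
}

/* Models "imul %eax,%edi": a 32-bit multiply that writes the low half. */
uint64_t imul32(uint64_t reg, uint32_t factor) {
  return write_low32(reg, (uint32_t)reg * factor);
}
```

In this model, applying explicit_zero_extend to the result of imul32 never
changes the value, which is the situation the pass looks for.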


> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34653

No, this patch does not fix this problem.

>
> --
> H.J.
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-09-23 Thread Sriraman Tallam
Sorry, it is the other way around.

Total number of zero-extension instructions before  :  5814
Total number of zero-extension instructions after   :  1456

Thanks for pointing it out.

On Wed, Sep 23, 2009 at 4:10 PM, Ramana Radhakrishnan
 wrote:
>>
>> GCC bootstrap :
>>
>> Total number of zero-extension instructions before  : 1456
>> Total number of zero-extension instructions after    :  5814
>> No impact on boot-strap time.
>
>
> You sure you have these numbers the right way around ? Shouldn't the
> number of zero-extension instructions after the patch be less than the
> number of zero-extension instructions before or is this a regression
> ?
>
> Thanks,
> Ramana
>
>>
>>
>> I have attached the latest patch :
>>
>>
>> On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther
>>  wrote:
>>> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam wrote:
>>>> Hi,
>>>>
>>>>    Here is a patch to eliminate redundant zero-extension instructions
>>>> on x86_64.
>>>>
>>>> Tested: Ran the gcc regression testsuite on x86_64-linux and verified
>>>> that the results are the same with/without this patch.
>>>
>>> The patch misses testcases.  Why does zee run after register allocation?
>>> Your examples suggest that it will free hard registers so doing it before
>>> regalloc looks odd.
>>>
>>> What is the compile-time impact of your patch on say, gcc bootstrap?
>>> How many percent of instructions are removed as useless zero-extensions
>>> during gcc bootstrap?  How much do CSiBE numbers improve?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Problem Description :
>>>> ---------------------
>>>>
>>>> This pass is intended to be applicable only to targets that implicitly
>>>> zero-extend 64-bit registers after writing to their lower 32-bit half.
>>>> For instance, x86_64 zero-extends the upper bits of a register
>>>> implicitly whenever an instruction writes to its lower 32-bit half.
>>>> For example, the instruction *add edi,eax* also zero-extends the upper
>>>> 32-bits of rax after doing the addition.  These zero extensions come
>>>> for free and GCC does not always exploit this well.  That is, it has
>>>> been observed that there are plenty of cases where GCC explicitly
>>>> zero-extends registers for x86_64 that are actually useless because
>>>> these registers were already implicitly zero-extended in a prior
>>>> instruction.  This pass tries to eliminate such useless zero extension
>>>> instructions.
>>>>
>>>> Motivating Example I :
>>>> ----------------------
>>>> For this program :
>>>> **********
>>>> bad_code.c
>>>>
>>>> int mask[1000];
>>>>
>>>> int foo(unsigned x)
>>>> {
>>>>  if (x < 10)
>>>>    x = x * 45;
>>>>  else
>>>>    x = x * 78;
>>>>  return mask[x];
>>>> }
>>>> **********
>>>>
>>>> $ gcc -O2 bad_code.c
>>>>  
>>>>  400315:       b8 4e 00 00 00          mov    $0x4e,%eax
>>>>  40031a:       0f af f8                imul   %eax,%edi
>>>>  40031d:       89 ff                   mov    %edi,%edi  ---> Useless zero extend.
>>>>  40031f:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
>>>>  400326:       c3                      retq
>>>>  ..
>>>>  400330:       ba 2d 00 00 00          mov    $0x2d,%edx
>>>>  400335:       0f af fa                imul   %edx,%edi
>>>>  400338:       89 ff                   mov    %edi,%edi  ---> Useless zero extend.
>>>>  40033a:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
>>>>  400341:       c3                      retq
>>>>
>>>> $ gcc -O2 -fzee bad_code.c
>>>>  ..
>>>>  400315:       6b ff 4e                imul   $0x4e,%edi,%edi
>>>>  400318:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
>>>>  40031f:       c3                      retq
>>>>  400320:       6b ff 2d                imul   $0x2d,%edi,%edi
>>>>  400323:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
>>>>  40032a:       c3                      retq
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Sriraman M Tallam.
>>>> Google, Inc.
>>>> tmsri...@google.com
>>>>
>>>
>>
>


Re: Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-09-23 Thread Sriraman Tallam
Hi Richard,

I finally got around to getting the data you wanted.  Thanks for the
response.  Please find my comments below.


On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther
 wrote:
> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam wrote:
>> Hi,
>>
>>Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regression testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>
> The patch misses testcases.

Added.

> Why does zee run after register allocation?
> Your examples suggest that it will free hard registers so doing it before
> regalloc looks odd.

Originally, I had written this patch to have ZEE run before IRA.
However, I noticed that IRA generates poorer code when my patch is
turned on.

Here is an example of how badly RA can hurt.  I show a piece of code
around a zero-extend that got eliminated.  The second listing is the
code after eliminating zero-extends.  The two are pretty much the same
except for the extra move marked with a comment: IRA is not able to
coalesce %esi and %r15d.

Base line :

48b760: imul   $0x9e406cb5,%r15d,%esi
48b767: mov    %rax,%rcx
48b76a: shr    $0x12,%esi
48b76d: and    %r12d,%esi
48b770: mov    %edi,%eax
48b772: add    $0x1,%edi
48b775: shr    $0x5,%eax
48b778: mov    %eax,%eax    # redundant zero extend
48b77a: lea    (%rcx,%rax,1),%rax
48b77e: cmp    %rax,%r9


-fzee :

48b7d0: imul   $0x9e406cb5,%r15d,%r15d
        # The destination should have just been esi.
48b7d7: mov    %rax,%rcx
48b7da: shr    $0x12,%r15d
48b7de: mov    %r15d,%esi
        # This move is useless if r15d and esi can be coalesced into esi.
48b7e1: and    %r12d,%esi
48b7e4: mov    %edi,%eax
48b7e6: add    $0x1,%edi
48b7e9: shr    $0x5,%eax
        # Ok, zero-extend eliminated.
48b7ec: lea    (%rcx,%rax,1),%rax
48b7f0: cmp    %rax,%r9

Running ZEE after IRA preserves code quality, and the useless extension
still gets removed.

>
> What is the compile-time impact of your patch on say, gcc bootstrap?
> How many percent of instructions are removed as useless zero-extensions
> during gcc bootstrap?  How much do CSiBE numbers improve?

CSiBE numbers :

Total number of zero-extension instructions before : 667.
Total number of zero-extension instructions after   : 122.
Performance : no measurable impact.

GCC bootstrap :

Total number of zero-extension instructions before  : 1456
Total number of zero-extension instructions after   : 5814
No impact on boot-strap time.


I have attached the latest patch :



Request for code review - (ZEE patch : Redundant Zero extension elimination)

2009-08-08 Thread Sriraman Tallam
Hi,

Here is a patch to eliminate redundant zero-extension instructions
on x86_64.

Tested: Ran the gcc regression testsuite on x86_64-linux and verified
that the results are the same with/without this patch.


Problem Description :
---------------------

This pass is intended to be applicable only to targets that implicitly
zero-extend 64-bit registers after writing to their lower 32-bit half.
For instance, x86_64 zero-extends the upper bits of a register
implicitly whenever an instruction writes to its lower 32-bit half.
For example, the instruction *add edi,eax* also zero-extends the upper
32-bits of rax after doing the addition.  These zero extensions come
for free and GCC does not always exploit this well.  That is, it has
been observed that there are plenty of cases where GCC explicitly
zero-extends registers for x86_64 that are actually useless because
these registers were already implicitly zero-extended in a prior
instruction.  This pass tries to eliminate such useless zero extension
instructions.

Motivating Example I :
----------------------
For this program :
**********
bad_code.c

int mask[1000];

int foo(unsigned x)
{
  if (x < 10)
    x = x * 45;
  else
    x = x * 78;
  return mask[x];
}
**********

$ gcc -O2 bad_code.c
  
  400315:   b8 4e 00 00 00          mov    $0x4e,%eax
  40031a:   0f af f8                imul   %eax,%edi
  40031d:   89 ff                   mov    %edi,%edi  ---> Useless zero extend.
  40031f:   8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
  400326:   c3                      retq
  ..
  400330:   ba 2d 00 00 00          mov    $0x2d,%edx
  400335:   0f af fa                imul   %edx,%edi
  400338:   89 ff                   mov    %edi,%edi  ---> Useless zero extend.
  40033a:   8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
  400341:   c3                      retq

$ gcc -O2 -fzee bad_code.c
  ..
  400315:   6b ff 4e                imul   $0x4e,%edi,%edi
  400318:   8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
  40031f:   c3                      retq
  400320:   6b ff 2d                imul   $0x2d,%edi,%edi
  400323:   8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
  40032a:   c3                      retq



Thanks,

Sriraman M Tallam.
Google, Inc.
tmsri...@google.com
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 150581)
+++ tree-pass.h (working copy)
@@ -475,6 +475,7 @@
 extern struct rtl_opt_pass pass_initialize_regs;
 extern struct rtl_opt_pass pass_combine;
 extern struct rtl_opt_pass pass_if_after_combine;
+extern struct rtl_opt_pass pass_implicit_zee;
 extern struct rtl_opt_pass pass_partition_blocks;
 extern struct rtl_opt_pass pass_match_asm_constraints;
 extern struct rtl_opt_pass pass_regmove;
Index: timevar.def
===================================================================
--- timevar.def (revision 150581)
+++ timevar.def (working copy)
@@ -178,6 +178,7 @@
 DEFTIMEVAR (TV_RELOAD_CSE_REGS   , "reload CSE regs")
 DEFTIMEVAR (TV_SEQABSTR  , "sequence abstraction")
 DEFTIMEVAR (TV_GCSE_AFTER_RELOAD  , "load CSE after reload")
+DEFTIMEVAR (TV_ZEE  , "zee")
 DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
 DEFTIMEVAR (TV_IFCVT2   , "if-conversion 2")
 DEFTIMEVAR (TV_COMBINE_STACK_ADJUST  , "combine stack adjustments")
Index: implicit-zee.c
===================================================================
--- implicit-zee.c  (revision 0)
+++ implicit-zee.c  (revision 0)
@@ -0,0 +1,1029 @@
+/* Redundant Zero-extension elimination for targets that implicitly
+   zero-extend writes to the lower 32-bit portion of 64-bit registers.
+   Copyright (C) 2009 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsri...@google.com) and
+  Silvius Rus (r...@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+
+/* Problem Description :
+  -
+
+This pass is intended to be applicable only to targets that implicitly
+zero-extend 64-bit r