Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)

2011-06-08 Thread Sriraman Tallam
+davidxl

On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam tmsri...@google.com wrote:
 Patch Description:
 =

 I am working on a project to do global function layout in the linker, where 
 the linker reads the callgraph edge profile information generated by FDO and 
 uses it to find an ordering of functions that places functions that call each 
 other frequently close together, like the Pettis-Hansen code ordering 
 algorithm described in the paper "Profile Guided Code Positioning" (PLDI 
 1990).
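For readers unfamiliar with Pettis-Hansen, its core is a greedy merge: take call-graph edges from heaviest to lightest and join the chains of functions at each end, so hot caller/callee pairs end up adjacent. The C sketch below is a simplified illustration only, not the planned linker plugin. The names `order_functions`, `struct cg_edge` and `MAX_FUNCS` are hypothetical, edges are assumed to be undirected with pre-summed weights, and the original algorithm's step of choosing which chain ends to join (to minimize the distance between the two functions) is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FUNCS 256 /* Arbitrary cap for this sketch.  */

/* Hypothetical undirected, pre-weighted call-graph edge.  */
struct cg_edge
{
  int a, b;          /* Function indices in [0, n).  */
  long long weight;  /* Total calls between a and b, in either direction.  */
};

/* qsort comparator: heaviest edge first.  */
static int
by_weight_desc (const void *x, const void *y)
{
  const struct cg_edge *ex = x;
  const struct cg_edge *ey = y;
  return (ey->weight > ex->weight) - (ey->weight < ex->weight);
}

/* Simplified Pettis-Hansen-style ordering.  For each edge, heaviest first,
   append the chain containing B to the chain containing A if they differ.
   Writes a permutation of 0..N-1 to ORDER.  Sorts EDGES in place.  */
static void
order_functions (int n, struct cg_edge *edges, size_t n_edges, int *order)
{
  int head[MAX_FUNCS];  /* head[f]: first function of f's chain.  */
  int next[MAX_FUNCS];  /* next[f]: successor of f in its chain, or -1.  */
  int tail[MAX_FUNCS];  /* tail[h]: last function of the chain headed by h.  */
  int pos = 0;

  assert (n > 0 && n <= MAX_FUNCS);
  for (int i = 0; i < n; i++)
    {
      head[i] = i;      /* Every function starts as a one-element chain.  */
      next[i] = -1;
      tail[i] = i;
    }

  qsort (edges, n_edges, sizeof *edges, by_weight_desc);

  for (size_t k = 0; k < n_edges; k++)
    {
      assert (edges[k].a >= 0 && edges[k].a < n);
      assert (edges[k].b >= 0 && edges[k].b < n);
      int ha = head[edges[k].a];
      int hb = head[edges[k].b];

      if (ha == hb)
        continue;                   /* Already in the same chain.  */
      next[tail[ha]] = hb;          /* Splice chain B after chain A.  */
      tail[ha] = tail[hb];
      for (int f = hb; f != -1; f = next[f])
        head[f] = ha;               /* Re-point B's members at A's head.  */
    }

  /* Emit the chains in order of their head's index.  */
  for (int i = 0; i < n; i++)
    if (head[i] == i)
      for (int f = i; f != -1; f = next[f])
        order[pos++] = f;

  assert (pos == n);
}
```

For example, with four functions and edges {0,2} of weight 100 and {3,0} of weight 50, the never-called function 1 is emitted first and the hot chain follows as 3, 0, 2, so the three functions that call each other are contiguous.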

 This patch adds a flag that allows the callgraph edge profile information to 
 be stored in .note sections named .note.callgraph.text (one section per 
 function, e.g. .note.callgraph.text._Z3foov). The new compiler flag 
 -fcallgraph-profiles-sections generates these sections and must be used along 
 with -fprofile-use. I have added a PARAM to output only callgraph edges whose 
 counts are greater than a specified threshold. Once this is available, the 
 linker can read these sections and generate a global callgraph, which can be 
 used to determine a global function ordering.
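The threshold semantics can be checked in isolation. The sketch below mirrors the filtering in dump_cgraph_profiles (an edge is written only if its count is strictly greater than the threshold) but runs outside GCC. `struct edge` and `emit_callgraph_note` are hypothetical, and emitting the `.section` header and the "Function" string from the same routine is an assumption made for the illustration; in the patch, section creation happens in rest_of_handle_final.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a profiled call-graph edge; GCC's real
   struct cgraph_edge carries much more state.  */
struct edge
{
  const char *callee_asm_name;  /* Mangled (assembler) name of the callee.  */
  long long count;              /* Profiled call count.  */
};

/* Write the note section for CALLER to OUT, keeping only edges whose count
   is strictly greater than THRESHOLD.  Returns the number of edges written.  */
static int
emit_callgraph_note (FILE *out, const char *caller,
                     const struct edge *edges, size_t n_edges,
                     long long threshold)
{
  int emitted = 0;

  /* CALLER must be a non-empty assembler name.  */
  assert (out != NULL && caller != NULL && strlen (caller) > 0);
  fprintf (out, ".section\t.note.callgraph.text.%s,\"\",@progbits\n", caller);
  fprintf (out, "\t.string \"Function %s\"\n", caller);
  for (size_t i = 0; i < n_edges; i++)
    {
      if (edges[i].count <= threshold)
        continue;               /* At or below the threshold: skip.  */
      fprintf (out, "\t.string \"%s\"\n", edges[i].callee_asm_name);
      fprintf (out, "\t.string \"%lld\"\n", edges[i].count);
      emitted++;
    }
  return emitted;
}
```

With a threshold of 10, an edge with exactly 10 calls is dropped, because the test is `count <= threshold`, matching "greater than" in the description.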

 I am adding plugin support in the gold linker to allow linker plugins to read 
 the contents of sections, and also adding plugin hooks to specify a desired 
 ordering of functions to the linker. The linker patch is available here: 
 http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is 
 available, linker plugins can be used to determine the function layout of the 
 final binary, for example with the Pettis-Hansen algorithm.

 Example: The new .note.callgraph.text section looks like this for a function 
 foo that calls bar 100 times and zap 50 times:
 
 .section        .note.callgraph.text._Z3foov,"",@progbits
        .string "Function _Z3foov"
        .string "_Z3barv"
        .string "100"
        .string "_Z3zapv"
        .string "50"
 ***
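On the consumer side, the assembler's `.string` directive emits each operand followed by a zero byte, so the section above holds the strings "Function _Z3foov", "_Z3barv", "100", "_Z3zapv" and "50", each NUL-terminated. A linker plugin that has obtained the raw section bytes could decode them roughly as follows. This is a minimal sketch assuming exactly that layout (a leading "Function <name>" string followed by callee/count pairs, with no padding); `parse_callgraph_note`, `struct caller_profile` and the `MAX_CALLEES` cap are hypothetical and not part of the gold plugin interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CALLEES 64 /* Arbitrary cap for this sketch.  */

struct callee_count
{
  const char *name;          /* Callee assembler name (points into DATA).  */
  unsigned long long count;  /* Profiled call count.  */
};

struct caller_profile
{
  const char *caller;        /* Caller assembler name (points into DATA).  */
  size_t n_callees;
  struct callee_count callees[MAX_CALLEES];
};

/* Return the NUL-terminated string at DATA + *POS and advance *POS past its
   terminator, or return NULL if no complete string remains.  */
static const char *
next_string (const char *data, size_t len, size_t *pos)
{
  const char *s;
  const char *nul;

  if (*pos >= len)
    return NULL;
  s = data + *pos;
  nul = memchr (s, '\0', len - *pos);
  if (nul == NULL)
    return NULL;                /* Truncated section.  */
  *pos = (size_t) (nul - data) + 1;
  return s;
}

/* Parse one .note.callgraph.text.* section of LEN bytes.  Returns 0 on
   success, -1 if the bytes do not follow the assumed layout.  */
static int
parse_callgraph_note (const char *data, size_t len, struct caller_profile *out)
{
  static const char prefix[] = "Function ";
  size_t pos = 0;
  const char *s;

  assert (data != NULL && out != NULL);
  s = next_string (data, len, &pos);
  if (s == NULL || strncmp (s, prefix, sizeof prefix - 1) != 0)
    return -1;
  out->caller = s + (sizeof prefix - 1);
  out->n_callees = 0;

  while ((s = next_string (data, len, &pos)) != NULL)
    {
      const char *num = next_string (data, len, &pos);
      char *end;
      unsigned long long c;

      if (num == NULL || out->n_callees == MAX_CALLEES)
        return -1;              /* Unpaired callee, or too many edges.  */
      c = strtoull (num, &end, 10);
      if (end == num || *end != '\0')
        return -1;              /* Count is not a decimal number.  */
      out->callees[out->n_callees].name = s;
      out->callees[out->n_callees].count = c;
      out->n_callees++;
    }
  return 0;
}
```

The parsed names alias the section buffer rather than being copied, so the buffer must outlive the `caller_profile`; a plugin that releases section contents early would need to duplicate the strings.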

 For now, this is for google/main. I will re-submit for review to trunk along 
 with data layout.

 Google ref 41940

 2011-06-07  Sriraman Tallam  tmsri...@google.com

        * doc/invoke.texi: Document option -fcallgraph-profiles-sections.
        * final.c (dump_cgraph_profiles): New function.
        (rest_of_handle_final): Create new section '.note.callgraph.text'
        with compiler flag -fcallgraph-profiles-sections.
        * common.opt: New option -fcallgraph-profiles-sections.
        * params.def (DEFPARAM): New param
        PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

 Index: doc/invoke.texi
 ===
 --- doc/invoke.texi     (revision 174789)
 +++ doc/invoke.texi     (working copy)
 @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
  -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
  -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
  -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
 --fcheck-data-deps -fclone-hot-version-paths @gol
 +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
  -fcombine-stack-adjustments -fconserve-stack @gol
  -fcompare-elim -fcprop-registers -fcrossjumping @gol
  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
 @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
  @opindex fripa-verbose
  Enable printing of verbose information about dynamic inter-procedural optimizations.
  This is used in conjunction with the @option{-fripa}.
 +
 +@item -fcallgraph-profiles-sections
 +@opindex fcallgraph-profiles-sections
 +Emit call graph edge profile counts in .note.callgraph.text sections. This is
 +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
 +section is created for each function. This section lists every callee and the
 +number of times it is called. The parameter
 +note-cgraph-section-edge-threshold can be used to list only edges above a
 +certain threshold.
  @end table

  The following options control compiler behavior regarding floating
 Index: final.c
 ===
 --- final.c     (revision 174789)
 +++ final.c     (working copy)
 @@ -4321,13 +4321,37 @@ debug_free_queue (void)
       symbol_queue_size = 0;
     }
  }
 -
 +
 +/* List the call graph profiled edges whose value is greater than
 +   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
 +   .note.callgraph.text section. */
 +static void
 +dump_cgraph_profiles (void)
 +{
 +  struct cgraph_node *node = cgraph_node (current_function_decl);
 +  struct cgraph_edge *e;
 +  struct cgraph_node *callee;
 +
 +  for (e = node->callees; e != NULL; e = e->next_callee)
 +    {
 +      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
 +        continue;
 +      callee = e->callee;
 +      fprintf (asm_out_file, "\t.string \"%s\"\n",
 +               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
 +      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
 +               e->count);
 +    }
 +}
 +
  /* 

Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)

2011-06-08 Thread Sriraman Tallam
On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 ok for google/main.

Thanks, the patch is now committed.


 David


[google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)

2011-06-07 Thread Sriraman Tallam
Patch Description:
=

I am working on a project to do global function layout in the linker, where the 
linker reads the callgraph edge profile information generated by FDO and uses 
it to find an ordering of functions that places functions that call each other 
frequently close together, like the Pettis-Hansen code ordering algorithm 
described in the paper "Profile Guided Code Positioning" (PLDI 1990).

This patch adds a flag that allows the callgraph edge profile information to be 
stored in .note sections named .note.callgraph.text (one section per function, 
e.g. .note.callgraph.text._Z3foov). The new compiler flag 
-fcallgraph-profiles-sections generates these sections and must be used along 
with -fprofile-use. I have added a PARAM to output only callgraph edges whose 
counts are greater than a specified threshold. Once this is available, the 
linker can read these sections and generate a global callgraph, which can be 
used to determine a global function ordering.

I am adding plugin support in the gold linker to allow linker plugins to read 
the contents of sections, and also adding plugin hooks to specify a desired 
ordering of functions to the linker. The linker patch is available here: 
http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is 
available, linker plugins can be used to determine the function layout of the 
final binary, for example with the Pettis-Hansen algorithm.

Example: The new .note.callgraph.text section looks like this for a function 
foo that calls bar 100 times and zap 50 times:

.section        .note.callgraph.text._Z3foov,"",@progbits
        .string "Function _Z3foov"
        .string "_Z3barv"
        .string "100"
        .string "_Z3zapv"
        .string "50"
***

For now, this is for google/main. I will re-submit for review to trunk along 
with data layout.

Google ref 41940

2011-06-07  Sriraman Tallam  tmsri...@google.com

	* doc/invoke.texi: Document option -fcallgraph-profiles-sections.
	* final.c (dump_cgraph_profiles): New function.
	(rest_of_handle_final): Create new section '.note.callgraph.text'
	with compiler flag -fcallgraph-profiles-sections.
	* common.opt: New option -fcallgraph-profiles-sections.
	* params.def (DEFPARAM): New param
	PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 174789)
+++ doc/invoke.texi (working copy)
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
 -fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
 @opindex fripa-verbose
 Enable printing of verbose information about dynamic inter-procedural optimizations.
 This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in .note.callgraph.text sections. This is
+used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
+section is created for each function. This section lists every callee and the
+number of times it is called. The parameter
+note-cgraph-section-edge-threshold can be used to list only edges above a
+certain threshold.
 @end table
 
 The following options control compiler behavior regarding floating
Index: final.c
===
--- final.c (revision 174789)
+++ final.c (working copy)
@@ -4321,13 +4321,37 @@ debug_free_queue (void)
       symbol_queue_size = 0;
     }
 }
-
+
+/* List the call graph profiled edges whose value is greater than
+   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+   .note.callgraph.text section. */
+static void
+dump_cgraph_profiles (void)
+{
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+  struct cgraph_edge *e;
+  struct cgraph_node *callee;
+
+  for (e = node->callees; e != NULL; e = e->next_callee)
+    {
+      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+        continue;
+      callee = e->callee;
+      fprintf (asm_out_file, "\t.string \"%s\"\n",
+               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+               e->count);
+    }
+}
+
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
+  char *profile_fnname;
+  unsigned int flags;
 
   /* Get