Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
+davidxl

On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam tmsri...@google.com wrote:
> Patch Description:
> [...]
Re: [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
> ok for google/main.

Thanks, the patch is now committed.

> David
>
> On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam tmsri...@google.com wrote:
>> +davidxl
>>
>> On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam tmsri...@google.com wrote:
>>> Patch Description:
>>> [...]
[google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)
Patch Description:
==================

I am working on a project to do global function layout in the linker. The
linker reads the callgraph edge profile information generated by FDO and uses
it to find an ordering of functions that places functions which call each
other frequently closer together, like the Pettis-Hansen code ordering
algorithm described in the paper "Profile Guided Code Positioning" (PLDI
1990).

This patch adds a flag that allows the callgraph edge profile information to
be stored in .note sections called .note.callgraph.text. The new compiler flag
-fcallgraph-profiles-sections generates these sections and must be used along
with -fprofile-use. I have added a PARAM to output only callgraph edges
greater than a specified threshold. Once this is available, the linker can
read these sections and generate a global callgraph, which can be used to
determine a global function ordering.

I am adding plugin support in the gold linker to allow linker plugins to read
the contents of sections, and also adding plugin hooks to specify a desired
ordering of functions to the linker. The linker patch is available here:
http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is
available, linker plugins can be used to determine the function layout of the
final binary, for example with the Pettis-Hansen algorithm.

Example: A new .note.callgraph.text section looks like this for a function foo
that calls bar 100 times and zap 50 times:

        .section        .note.callgraph.text._Z3foov,"",@progbits
        .string "Function _Z3foov"
        .string "_Z3barv"
        .string "100"
        .string "_Z3zapv"
        .string "50"

*** For now, this is for google/main. I will re-submit for review to trunk
along with data layout.

Google ref 41940

2011-06-07  Sriraman Tallam  tmsri...@google.com

        * doc/invoke.texi: document option -fcallgraph-profiles-sections.
        * final.c (dump_cgraph_profiles): New function.
        (rest_of_handle_final): Create new section '.note.callgraph.text'
        with compiler flag -fcallgraph-profiles-sections
        * common.opt: New option -fcallgraph-profiles-sections.
        * params.def (DEFPARAM): New param
        PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 174789)
+++ doc/invoke.texi (working copy)
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
 -fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
 @opindex fripa-verbose
 Enable printing of verbose information about dynamic inter-procedural optimizations.
 This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in .note.callgraph.text sections. This is
+used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
+section is created for each function. This section lists every callee and the
+number of times it is called. The params variable
+note-cgraph-section-edge-threshold can be used to only list edges above a
+certain threshold.
 @end table
 
 The following options control compiler behavior regarding floating
Index: final.c
===================================================================
--- final.c (revision 174789)
+++ final.c (working copy)
@@ -4321,13 +4321,37 @@ debug_free_queue (void)
       symbol_queue_size = 0;
     }
 }
-
+
+/* List the call graph profiled edges whose value is greater than
+   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+   .note.callgraph.text section.  */
+static void
+dump_cgraph_profiles (void)
+{
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+  struct cgraph_edge *e;
+  struct cgraph_node *callee;
+
+  for (e = node->callees; e != NULL; e = e->next_callee)
+    {
+      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+        continue;
+      callee = e->callee;
+      fprintf (asm_out_file, "\t.string \"%s\"\n",
+               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+               e->count);
+    }
+}
+
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
+  char *profile_fnname;
+  unsigned int flags;
 
   /* Get
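
For readers who want to consume these sections, here is a minimal sketch of how a tool (for example a linker plugin) might decode the payload of one .note.callgraph.text section. It assumes the payload is exactly the sequence of NUL-terminated strings shown in the example above: "Function <caller>" first, then alternating callee name and decimal count. The names parse_callgraph_note and struct cg_edge are hypothetical; they are not part of this patch or of the gold plugin interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One caller->callee edge recovered from a section payload.
   Hypothetical type, used only for this illustration.  */
struct cg_edge
{
  const char *callee;   /* Points into the payload buffer.  */
  long long count;      /* Profiled call count.  */
};

/* Decode one section payload of LEN bytes starting at BUF.
   Assumed layout: "Function <caller>\0" followed by repeated
   "<callee>\0<count>\0" pairs; trailing NUL padding is tolerated.
   Stores the caller name in *CALLER and up to MAX_EDGES edges in EDGES.
   Returns the number of edges stored, or -1 if the payload is malformed.  */
int
parse_callgraph_note (const char *buf, size_t len, const char **caller,
                      struct cg_edge *edges, int max_edges)
{
  static const char prefix[] = "Function ";
  const char *p, *end;
  int n = 0;

  assert (buf != NULL && caller != NULL && edges != NULL);

  /* Every string must be NUL-terminated, so the last byte must be NUL.  */
  if (len == 0 || buf[len - 1] != '\0')
    return -1;

  p = buf;
  end = buf + len;

  /* First string names the caller.  */
  if (strncmp (p, prefix, sizeof prefix - 1) != 0)
    return -1;
  *caller = p + (sizeof prefix - 1);
  p += strlen (p) + 1;

  while (p < end && n < max_edges)
    {
      const char *callee = p;
      char *stop;

      if (*p == '\0')
        break;                  /* Only padding is left.  */

      p += strlen (p) + 1;
      if (p >= end)
        return -1;              /* Callee name without a count.  */

      edges[n].callee = callee;
      edges[n].count = strtoll (p, &stop, 10);
      if (stop == p || *stop != '\0')
        return -1;              /* Count is not a plain decimal number.  */

      p += strlen (p) + 1;
      n++;
    }

  return n;
}
```

Called on the bytes of the foo example, this should yield the caller _Z3foov with two edges, _Z3barv (100) and _Z3zapv (50). A layout pass could then merge the edges from all sections into one global callgraph before choosing a function order.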