Re: new patches using -fopt-info (issue5294043)
Well, you seem to keep not reading what I write. I am not opposed to adding -fopt-info/report nor to funnel messages to stdout/err. What I am opposed is the way you want to introduce them. I want you to fix what we dump into dump files, so that both -fopt-report and -fopt-info can be implemented by outputting selected pieces of the dump file to stdout/stderr. We already have -fdump-*-stats which supposedly could match -fopt-report, and the default -fdump-* should be what goes to -fopt-info (minus the function bodies, of course). That sounds good. What you propose seems like -fdump-pass-[ir_only|transformation|debug]-stderr and -fopt-info is a short cut for -fdump-tree-all-transformations-stderr -fdump-ipa-all-tranformations-stderr -fdump-rtl-all-transformations-stderr thanks, David Yes, dump files are a mess. So - why not clean them up, and at the same time annotate dump file pieces so _automatic_ filtering and redirecting to stdout with something like -fopt-report would do something sensible? I don't see why dump files have to stay messy while you at the same time would need to add _new_ code to dump to stdout for -fopt-report. In my mind, I would like to separate all dumps into three categories. 1) IR dumps, and support dump before and after (this reminds me my patches are still pending :) ) -fdump-tree-pre-[before|after]- Dump into .after, .before files 2) debug tracing etc: -fdump-tree-pre-debug-... Dump into .debug files. 3) opt report : -fdump-opt or -fopt-report Changes for 1) and 2) are mechanic but requires lots of work. You can do that, but I want the passes to use a single mechanism to feed all three separated dumps. Can you elaborate on single mechanism here? A set of well defined dumping APIs (instead of free form of if (dump_file) fprintf (dump_file, ...) ) ? Well, design one that will work. But yes, a set of well-defined dumping APIs, like print_start_{loop,location,region,...} (...); print_end_{loop...} (...); or so. debug_print (message, dump_flags, message_verbose_level, ...) Rather instead of verbosity levels use TDF_* flags (with maybe reorganizing them a bit) internally, a verbosity level can be implemented ontop of that by -fopt-{info,report} if needed. trace_enter (trace_header_note) trace_exit (trace_header_not) opt_info_print (location, message_template, insertion) Or how dump files are organized? I am all for clean up of dumping, but I don't see how -fopt-info get in the way of that. In the way? It is a prerequesite to both -fopt-info and -fopt-report. Otherwise you will end up adding _additional_ dumping to passes. Which is what I very very much object to. You can transition to the common dump API incrementally and only handle the passes you care for initially. But anything else from a common mechanism isn't going to be maintainable. So, no, please do it the right way that benefits both compiler developers and your power users. And yes, the right way is not to start adding that -fopt-report switch. The right way is to make dump-files consumable by mere mortals first. I agree we need to do the right way which needs to be discussed first. I would argue that mere mortals will really appreciate opt-info (separate from dump file and opt-report). Well, still what you print with opt-info should be better also be present with opt-report and in dump files. Thus it all boils down to be able to filter what passes put in their dump files. opt-report is different (needs to buffer information and dumping at the end of compilation). Why at the end of compilation? Passes already collect info for -stats dumping. What would -fopt-report print? Something like note: I have reduced size of your binary by 90% note: You should improve your programming skills ? Let's put -fopt-report aside for now as I don't have the slightest idea what it should be. Dump files and fopt-info can share the same dumping format -- whatever gets emitted by opt-info should also be emitted in the dump file (or replace the less well formated transformation messages that are already available in dump files), however simply filering the dump info does not solve the scalabilty issue I mentioned. What scalability issue? I see a maintainance issue and a code readability issue. Richard. thanks, David Richard. thanks, David Thanks, Richard. Thanks, David So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: new patches using -fopt-info (issue5294043)
On Sun, Oct 23, 2011 at 3:18 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Oct 21, 2011 at 6:48 PM, Xinliang David Li davi...@google.com wrote: There are two proposals here. One is -fopt-info which prints out informational notes to stderr, and the other is -fopt-report which is more elaborate form of dump files. Are you object to both or just the opt-report one? What? I'm objected to adding _two_ variants. Didn't even realize you proposed that. They are different -- -fopt-info is on the fly -- the notes are emitted as the transformations are done while -fopt-report is for more structured report so it requires more compiler changes. Bringing in -fopt-report is a little distraction as the main discussion is on -fopt-info. The former is no different from any other informational notes we already have -- the only difference is that they are suppressed by default. We do not have many informational notes, so it is different. Why different? opt information notes are not even emitted by default. .. ... I very well understand the intent. But I disagree with where you start to implement this. Dump files are _not_ only for developers - after all we don't have anything else. -fopt-report can get as big and unmanagable to read as dump files - in fact I argue it will be worse than dump files if you go beyond very very coarse reporting. The problem of using dump files for optimization report is that all optimization decisions are 'distributed' in phase specific dumps file. For a whole program report, the number of files that are created is not manageable (think about a program with 4000 sources each dumping 200 files). If we create a dummy pass and suck in all optimization decisions in that pass's dump file -- it will be no different from opt-report. Well, -fopt-whatever will just funnel selected pieces also to stderr. I object to duplicate dumping when we just need a way to filter what goes to dump files. that is the main point -- using dump files are not scalable. If you are just against using stderr and propose dumping the selected information into a single shared dump file per build, I don't see the difference with using stderr -- they are not emitted by default and won't contaminate the build log. Yes, dump files are a mess. So - why not clean them up, and at the same time annotate dump file pieces so _automatic_ filtering and redirecting to stdout with something like -fopt-report would do something sensible? I don't see why dump files have to stay messy while you at the same time would need to add _new_ code to dump to stdout for -fopt-report. In my mind, I would like to separate all dumps into three categories. 1) IR dumps, and support dump before and after (this reminds me my patches are still pending :) ) -fdump-tree-pre-[before|after]- Dump into .after, .before files 2) debug tracing etc: -fdump-tree-pre-debug-... Dump into .debug files. 3) opt report : -fdump-opt or -fopt-report Changes for 1) and 2) are mechanic but requires lots of work. You can do that, but I want the passes to use a single mechanism to feed all three separated dumps. Can you elaborate on single mechanism here? A set of well defined dumping APIs (instead of free form of if (dump_file) fprintf (dump_file, ...) ) ? debug_print (message, dump_flags, message_verbose_level, ...) trace_enter (trace_header_note) trace_exit (trace_header_not) opt_info_print (location, message_template, insertion) Or how dump files are organized? I am all for clean up of dumping, but I don't see how -fopt-info get in the way of that. So, no, please do it the right way that benefits both compiler developers and your power users. And yes, the right way is not to start adding that -fopt-report switch. The right way is to make dump-files consumable by mere mortals first. I agree we need to do the right way which needs to be discussed first. I would argue that mere mortals will really appreciate opt-info (separate from dump file and opt-report). Well, still what you print with opt-info should be better also be present with opt-report and in dump files. Thus it all boils down to be able to filter what passes put in their dump files. opt-report is different (needs to buffer information and dumping at the end of compilation). Dump files and fopt-info can share the same dumping format -- whatever gets emitted by opt-info should also be emitted in the dump file (or replace the less well formated transformation messages that are already available in dump files), however simply filering the dump info does not solve the scalabilty issue I mentioned. thanks, David Richard. thanks, David Thanks, Richard. Thanks, David So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities
Re: new patches using -fopt-info (issue5294043)
There are two proposals here. One is -fopt-info which prints out informational notes to stderr, and the other is -fopt-report which is more elaborate form of dump files. Are you object to both or just the opt-report one? The former is no different from any other informational notes we already have -- the only difference is that they are suppressed by default. .. ... I very well understand the intent. But I disagree with where you start to implement this. Dump files are _not_ only for developers - after all we don't have anything else. -fopt-report can get as big and unmanagable to read as dump files - in fact I argue it will be worse than dump files if you go beyond very very coarse reporting. The problem of using dump files for optimization report is that all optimization decisions are 'distributed' in phase specific dumps file. For a whole program report, the number of files that are created is not manageable (think about a program with 4000 sources each dumping 200 files). If we create a dummy pass and suck in all optimization decisions in that pass's dump file -- it will be no different from opt-report. Yes, dump files are a mess. So - why not clean them up, and at the same time annotate dump file pieces so _automatic_ filtering and redirecting to stdout with something like -fopt-report would do something sensible? I don't see why dump files have to stay messy while you at the same time would need to add _new_ code to dump to stdout for -fopt-report. In my mind, I would like to separate all dumps into three categories. 1) IR dumps, and support dump before and after (this reminds me my patches are still pending :) )-fdump-tree-pre-[before|after]- Dump into .after, .before files 2) debug tracing etc:-fdump-tree-pre-debug-... Dump into .debug files. 3) opt report : -fdump-opt or -fopt-report Changes for 1) and 2) are mechanic but requires lots of work. So, no, please do it the right way that benefits both compiler developers and your power users. And yes, the right way is not to start adding that -fopt-report switch. The right way is to make dump-files consumable by mere mortals first. I agree we need to do the right way which needs to be discussed first. I would argue that mere mortals will really appreciate opt-info (separate from dump file and opt-report). thanks, David Thanks, Richard. Thanks, David So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: new patches using -fopt-info (issue5294043)
On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote: x...@google.com (Rong Xu) writes: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. This doesn't look scalable if you consider that each pass would print as much of a mess like -fvectorizer-verbose=5. What is not scalable? For level 1 dump, only the summary of vectorization will be printed just like other loop transformations. I think =2 and =3 should be omitted - we do have dump-files for a reason. Dump files are not easy to use -- it is big, and slow especially for people with large distributed build systems. Having both level 2 and 3 is debatable, but it will be useful to have a least one level above level 1. Dump files are mainly for compiler developers, while -fopt-info are for compiler developers *and* power users who know performance tuning. Also the coverage/profile cases you changed do not at all match ... with large performance impact. In fact the impact is completely unknown (as it would be the case usually). Impact of any transformations is just 'potential', coverage problems are no different from that. I'd rather have a way to make dump-files more structured (so, following some standard reporting scheme) than introducing yet another way of output. [after making dump-files more consistent it will be easy to revisit patches like this, there would be a natural general central way to implement it] Yes, I remember we have discussed about this before -- currently dump files are a big mess -- debug tracing, IR are all mixed up, but as I said above, this is a different matter -- it is for compiler developers. For more structured optimization report, we should use option -fopt-report which dump optimization information based on category -- the info data base can also be shared across modules: Example: [Loop Interchange] File a, line x, yyy File b, line xx, yyy File c, line z, It is beneficial to interchange the loop, but not done because of possible carried dependency (caused by false aliasing ...) [Loop Vectorization] [Loop Unroll] ... [SRA] [Alias summary] [Global Vars] a: addr exposed b: add not exposed .. [Global Pointers] .. ... Thanks, David So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: new patches using -fopt-info (issue5294043)
While discussion for trunk version is still going, it is ok for google branches. thanks, David On Wed, Oct 19, 2011 at 4:28 PM, Rong Xu x...@google.com wrote: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. 2011-10-19 Rong Xu x...@google.com * gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto. * gcc/opts.c (common_handle_option): Handle OPT_fopt_info_. * gcc/flag-types.h (opt_info_verbosity_levels): New enum. * gcc/value-prof.c (check_ic_counter): guard warnings/notes by flag_opt_info. (find_func_by_funcdef_no): Ditto. (check_ic_target): Ditto. (check_counter): Ditto. (check_ic_counter): Ditto. * gcc/mcf.c (find_minimum_cost_flow): Ditto. * gcc/profile.c (read_profile_edge_counts): Ditto. (compute_branch_probabilities): Ditto. * gcc/coverage.c (read_counts_file): Ditto. (get_coverage_counts): Ditto. * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. (maybe_issue_profile_use_note): Ditto. (optimize_reusedist): Ditto. * gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info. * gcc/testsuite/gcc.dg/pr40209.c: Ditto. * gcc/testsuite/gcc.dg/pr26570.c: Ditto. * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. Index: gcc/value-prof.c === --- gcc/value-prof.c (revision 180106) +++ gcc/value-prof.c (working copy) @@ -472,9 +472,10 @@ : DECL_SOURCE_LOCATION (current_function_decl); if (flag_profile_correction) { - inform (locus, correcting inconsistent value profile: - %s profiler overall count (%d) does not match BB count - (%d), name, (int)*all, (int)bb_count); + if (flag_opt_info = OPT_INFO_MAX) + inform (locus, correcting inconsistent value profile: %s + profiler overall count (%d) does not match BB count + (%d), name, (int)*all, (int)bb_count); *all = bb_count; if (*count *all) *count = *all; @@ -510,33 +511,42 @@ location_t locus; if (*count1 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler top target count (%ld) exceeds - BB count (%ld), (long)*count1, (long)all); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value profile: + ic (topn) profiler top target count (%ld) exceeds + BB count (%ld), (long)*count1, (long)all); + } *count1 = all; } if (*count2 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler second target count (%ld) exceeds - BB count (%ld), (long)*count2, (long)all); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value profile: + ic (topn) profiler second target count (%ld) exceeds + BB count (%ld), (long)*count2, (long)all); + } *count2 = all; } if (*count2 *count1) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Corrupted topn ic value profile: - first target count (%ld) is less than the second - target count (%ld), (long)*count1, (long)*count2); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Corrupted topn ic value profile: + first target count (%ld) is less than the second +
Re: new patches using -fopt-info (issue5294043)
On Thu, Oct 20, 2011 at 12:53 PM, Andi Kleen a...@firstfloor.org wrote: This warnings/notes are caused by inconsistent profile due to data race (which is currently common in multi-thread programs), I never quite understood why the gcov counters are not simply marked __thread. This would make the profiled programs faster too because they wouldn't bounce cache lines that much. Especially on larger systems (2S) frequent cache line bouncing can lead to extreme slow downs, and even on smaller systems it's very expensive. This would also eliminate data races, except for signals and somesuch. It uses stack space and for programs with hundreds and thousands of threads, it can be a big problem. David -Andi
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Wed, Oct 19, 2011 at 12:02 PM, konstantin.s.serebry...@gmail.com wrote: Minimized the crash to this: struct Foo { unsigned bf1:1; unsigned bf2:1; unsigned bf3:1; }; void foo (struct Foo *ob) { ob-bf2 = 1; } D.2731_4 = ob_1(D)-bf2; __asan_base_addr.2 = (long unsigned int) D.2731_4; D.2732_5 = __asan_base_addr.2 3; D.2733_6 = 1 44; D.2734_7 = D.2732_5 + D.2733_6; D.2735_8 = VIEW_CONVERT_EXPRchar *(D.2734_7); # VUSE .MEM __asan_shadow.3 = *D.2735_8; D.2737_9 = __asan_shadow.3 != 0; D.2738_10 = __asan_base_addr.2 7; D.2739_11 = (char) D.2738_10; D.2740_12 = D.2739_11 + 0; D.2741_13 = D.2740_12 = __asan_shadow.3; __asan_crash_cond.4 = D.2737_9 D.2741_13; if (__asan_crash_cond.4 != 0) ./expand_expr_addr_expr_1_err.c: In function ‘foo’: ./expand_expr_addr_expr_1_err.c:8:11: internal compiler error: in expand_expr_addr_expr_1, at expr.c:7381 How do I avoid instrumenting bitfields? Use get_inner_reference to compute the bitpos, and check if it is multiple of bits_per_unit. David http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
what kind of failures? David On Wed, Oct 19, 2011 at 2:04 PM, konstantin.s.serebry...@gmail.com wrote: On 2011/10/19 20:38:35, kcc wrote: Added code to avoid bitfields. Is there anything I should know about splitting basic blocks in the presence of EH? Currently, asan fails on 483.xalancbmk, 453.povray and 450.soplex, which are known to have EH. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Oct 18, 2011 at 3:56 PM, konstantin.s.serebry...@gmail.com wrote: On 2011/10/18 22:52:33, davidxl wrote: http://codereview.appspot.com/5272048/diff/18001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 tree-asan.c:325: base = build_addr (t, current_function_decl); There are issues with creating address expressions from TARGET_MEM_REF in gcc. Since you want compiler to do optimization on instrumented code as much as possible, asan instrumentation is better done as early as possible after ipa Why? I would actually say that I want the instrumentation to happen as late as possible because this will instrument fewer memory accesses. For example, asan certainly needs to happen after loop invariants are moved out and common subexpressions (including loads) are eliminated. No? yes -- so a good choice would be after PRE and PDE (pre, sink_code) which should handle most of the loop invariant memory loads. David inlining -- and this will also solve this problem. I tried moving asan pass before loop opt, and it worked fine. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
It will be weird to put the instrumentation pass inside loop opt, besides memory loads which are loop invariants and redundant stores in loop should be handled by pre/pde. +cc Richard Guenther You may want to ask the middle-end maintainer to review your code at this point if you want it to be in trunk. thanks, David On Tue, Oct 18, 2011 at 4:12 PM, konstantin.s.serebry...@gmail.com wrote: yes -- so a good choice would be after PRE and PDE (pre, sink_code) which should handle most of the loop invariant memory loads. how about pass_lim? I think asan should be after lim. http://codereview.appspot.com/5272048/
Re: [google] Suppress FDO-use related notes/warnings (issue5294043)
On Tue, Oct 18, 2011 at 3:48 PM, Rong Xu x...@google.com wrote: Suppress verbose notes/warnings printed in FDO-use compilation. (1) Add option -fprofile-use-verbose. Gcc currently does not emit informational messages on high level transformations such as inlining, value profiling transformations (ic promotion info messages are emitted in lipo mode in google/main), loop peeling and unrolling, and other loop opts. I can see those are very useful for triaging performance regressions etc, but turning those on would be too verbose for non-power users and can be annoying. They (when introduced in the future) should be guarded. I can see they should be guarded using the same option and possibly with a verbose level. Something like: -fopt-info=1,2,3 For now, a nonparameterized option should be good enough. (similarly, -fopt-report can be used to dump optimization report which is more structured -- e.g, grouped via optimizations, not functions). There are some unrelated changes (e.g, histogram free etc) which should be committed separately. David When this option is on, FDO-use compilation prints out all the notes as that of today. When this option is off (the default), all notes are suppressed. (2) Make several unconditional warnings under OPT_Wcoverage_mismatch. So that they can be turned off via -Wno-coverage-mismatch. (3) Make several unconditional warnings for LIPO under flag_ripa_verbose. (can be turned on via -fripa-verbose). This patch is for google-main only. Tested with bootstrap and regression tests. 2011-10-18 Rong Xu x...@google.com * gcc/common.opt (fprofile-use-verbose): New flag. * gcc/value-prof.c (check_ic_counter): guard notes by flag_profile_use_verbose. (find_func_by_funcdef_no): Ditto. (check_ic_target): Ditto. (check_counter): Ditto. (check_ic_counter): Ditto. * gcc/mcf.c (find_minimum_cost_flow): Ditto. * gcc/profile.c (read_profile_edge_counts): (compute_branch_probabilities): * gcc/coverage.c (read_counts_file): guard LIPO warnings by flag_ripa_verbose. (get_coverage_counts): guard notes by flag_profile_use_verbose; make warning under OPT_Wcoverage_mismatch. * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. (maybe_issue_profile_use_note): Ditto. (optimize_reusedist): Ditto. * gcc/testsuite/gcc.dg/pr32773.c: add -fprofile-use-verbose. * gcc/testsuite/gcc.dg/pr40209.c: Ditto. * gcc/testsuite/gcc.dg/pr26570.c: Ditto. * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. Index: gcc/value-prof.c === --- gcc/value-prof.c (revision 180106) +++ gcc/value-prof.c (working copy) @@ -472,9 +472,10 @@ : DECL_SOURCE_LOCATION (current_function_decl); if (flag_profile_correction) { - inform (locus, correcting inconsistent value profile: - %s profiler overall count (%d) does not match BB count - (%d), name, (int)*all, (int)bb_count); + if (flag_profile_use_verbose) + inform (locus, correcting inconsistent value profile: %s + profiler overall count (%d) does not match BB count + (%d), name, (int)*all, (int)bb_count); *all = bb_count; if (*count *all) *count = *all; @@ -510,33 +511,42 @@ location_t locus; if (*count1 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler top target count (%ld) exceeds - BB count (%ld), (long)*count1, (long)all); + if (flag_profile_use_verbose) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value profile: + ic (topn) profiler top target count (%ld) exceeds + BB count (%ld), (long)*count1, (long)all); + } *count1 = all; } if (*count2 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler second target count (%ld) exceeds - BB count (%ld), (long)*count2, (long)all); + if (flag_profile_use_verbose) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value
Re: [google] support for building Linux kernel with FDO (issue4523061)
This patch is for google/main which is 4.7 based, but the validated version is in google_46 branch (which is based on 4.6). By the way (given that you are from intel), do you know if linux kernel can be built with icc with PGO turned on? Our intern Xiaotian has tried to use icc (12.0) to built kernel, and had some problems. The bootable kernel built with icc + gcc (for those failed with icc) does not perform quite well. Thanks, David On Thu, Oct 13, 2011 at 7:02 PM, vulcansh steven.t.hamp...@intel.com wrote: Rong Xu wrote: That will be good. But you never know, we internally have fixed some bugs that filed to us because people use kernel's old gcov code (many versions guarded by ifdef) for their tests. -Rong Has there been any progress one this patch? What version of gcc is this patch for? I am interested in something that works with gcc 4.7. -Steve -- View this message in context: http://old.nabble.com/-google---support-for-building-Linux-kernel-with-FDO-%28issue4523061%29-tp31607746p32649731.html Sent from the gcc - patches mailing list archive at Nabble.com.
Re: [google] record compiler options to .note sections
ok. David On Tue, Oct 11, 2011 at 9:51 PM, Dehao Chen de...@google.com wrote: Attached is the new patch. Bootstrapped on x86_64, no regressions. gcc/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com Add a flag (-frecord-gcc-switches-in-elf) to record compiler command line options to .gnu.switches.text sections of the object file. * coverage.c (write_opts_to_asm): Write the options to .gnu.switches.text sections. * common.opt: Ditto. * opts.h: Ditto. gcc/c-family/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * c-opts.c (c_common_parse_file): Write the options to .gnu.switches.text sections. gcc/testsuite/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * gcc.dg/record-gcc-switches-in-elf-1.c: New test. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 179836) +++ gcc/doc/invoke.texi (working copy) @@ -391,6 +391,7 @@ -fpmu-profile-generate=@var{pmuoption} @gol -fpmu-profile-use=@var{pmuoption} @gol -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol +-frecord-gcc-switches-in-elf@gol -freorder-blocks-and-partition -freorder-functions @gol -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol -fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol @@ -8170,6 +8171,11 @@ number of times it is called. The params variable note-cgraph-section-edge-threshold can be used to only list edges above a certain threshold. + +@item -frecord-gcc-switches-in-elf +@opindex frecord-gcc-switches-in-elf +Record the command line options in the .gnu.switches.text elf section for sample +based LIPO to do module grouping. @end table The following options control compiler behavior regarding floating Index: gcc/c-family/c-opts.c === --- gcc/c-family/c-opts.c (revision 179836) +++ gcc/c-family/c-opts.c (working copy) @@ -1109,6 +1109,8 @@ for (;;) { c_finish_options (); + if (flag_record_gcc_switches_in_elf i == 0) + write_opts_to_asm (); pch_init (); set_lipo_c_parsing_context (parse_in, i, verbose); push_file_scope (); Index: gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c === --- gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) +++ gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile} */ +/* { dg-options -frecord-gcc-switches-in-elf -Dtest -dA } */ + +void foobar(int); + +void +foo (void) +{ + int i; + for (i = 0; i 100; i++) + { + foobar(i); + } +} + +/* { dg-final { scan-assembler-times Dtest 1 } } */ Index: gcc/opts.h === --- gcc/opts.h (revision 179836) +++ gcc/opts.h (working copy) @@ -381,4 +381,5 @@ extern void set_struct_debug_option (struct gcc_options *opts, location_t loc, const char *value); +extern void write_opts_to_asm (void); #endif Index: gcc/coverage.c === --- gcc/coverage.c (revision 179836) +++ gcc/coverage.c (working copy) @@ -55,6 +55,7 @@ #include diagnostic-core.h #include intl.h #include l-ipo.h +#include dwarf2asm.h #include gcov-io.h #include gcov-io.c @@ -2146,4 +2147,69 @@ return 0; } +/* Write command line options to the .note section. */ + +void +write_opts_to_asm (void) +{ + size_t i; + cpp_dir *quote_paths, *bracket_paths, *pdir; + struct str_list *pdef, *pinc; + int num_quote_paths = 0; + int num_bracket_paths = 0; + + get_include_chains (quote_paths, bracket_paths); + + /* Write quote_paths to ASM section. */ + switch_to_section (get_section (.gnu.switches.text.quote_paths, + SECTION_DEBUG, NULL)); + for (pdir = quote_paths; pdir; pdir = pdir-next) + { + if (pdir == bracket_paths) + break; + num_quote_paths++; + } + dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL); + dw2_asm_output_data_uleb128 (num_quote_paths, NULL); + for (pdir = quote_paths; pdir; pdir = pdir-next) + { + if (pdir == bracket_paths) + break; + dw2_asm_output_nstring (pdir-name, (size_t)-1, NULL); + } + + /* Write bracket_paths to ASM section. */ + switch_to_section (get_section (.gnu.switches.text.bracket_paths, + SECTION_DEBUG, NULL)); + for (pdir = bracket_paths; pdir; pdir = pdir-next) + num_bracket_paths++; + dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL); + dw2_asm_output_data_uleb128 (num_bracket_paths, NULL); + for (pdir =
Re: [google] record compiler options to .note sections
Ok for google branches. 1) document the difference of this option with -grecord-gcc-switches (this one only record codegen related options, and recorded in debug section), and with -frecord-gcc-switches? 2) may be better to use option name: -frecord-gcc-switches-in-object thanks, David On Sun, Oct 9, 2011 at 6:16 PM, Dehao Chen de...@google.com wrote: On Sun, Oct 9, 2011 at 5:28 PM, Jakub Jelinek ja...@redhat.com wrote: On Sun, Oct 09, 2011 at 09:18:25AM +0800, Dehao Chen wrote: Unfortunately -frecord-gcc-switches cannot serve our purpose because the recorded switches are mergable, i.e. the linker will merge all options to a set of strings. However, object files may have distinct compile options. We want to preserve every object file's compile options when doing LIPO build. And -grecord-gcc-switches? That one, although it is mergeable, still preserves every object files's compile options. I tried -grecord-gcc-switches, but looks like it's not recording options that I want. e.g. the following two commands output the same assembly code, while the former should record one more options. gcc -g3 -grecord-gcc-switches a.c -Dabcdefgh -Dxy -I/usr/ -S gcc -g3 -grecord-gcc-switches a.c -Dabcdefgh -Dxy -S Thanks, Dehao Jakub
Re: patch ping: Dump the degree of overlap to compare static profile with instrumentation profile
Yes, this is useful for static prediction tuning. David On Fri, Oct 7, 2011 at 6:40 PM, Dehao Chen de...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01293.html
Re: Disable early inlining while compiling for coverage (issue5173042)
On Sat, Oct 1, 2011 at 5:17 AM, Jan Hubicka hubi...@ucw.cz wrote: Yes, this will improve test coverage option's usability, but please provide the example to explain the issues. David On Fri, Sep 30, 2011 at 6:12 PM, Sharad Singhai sing...@google.com wrote: This patch disables early inlining when --coverage option is specified. This improves coverage data in presence of other optimizations, specially with -O2 where early inlining changes the control flow graph sufficiently enough to generate seemingly very odd source coverage. BTW early inlining was introduced to make coverage possible on some C++ sources, like tramp3d ;) Early inline can be important for FDO performance reasons -- as inline instances get 'context' sensitive profile data. However the problem here is that since we moved coverage to SSA, we do it too late. My longer term plan is to separate coverage and profile feedback passes (so they can't be done both together) and instrument early for coverage ensure that everything before coverage is done is save WRT line info. Yes, coverage and FDO are two different animals having different requirements -- they happen to share the same instrumentation mechanism. Inlining alone does not mess up the line info, but most optimizations we have in early optimization queue do. Inlining can do some damage too but to a less extent. For instance, the exit block of the callee instance merged with caller's bb causing confusion. We discussed it back when Richi implemented SSA profiling but we didn't do that basically due to lack of testcases. Would be possible to take one you have and fill in some PRs? Those are regressions WRT pre-SSA profiling releases (I think 4.5?) Yes. Sharad, I did not see the test case attached? Please file a bug about this. In the meantime, you can checkin the workaround to google banches. thanks, David Honza Bootstrapped okay and regression tests passed. Okay for google/gcc-4_6? 2011-09-30 Sharad Singhai sing...@google.com * gcc.c (cc1_options): Added -fno-early-inlining for coverage. Index: gcc.c === --- gcc.c (revision 179402) +++ gcc.c (working copy) @@ -776,7 +776,7 @@ %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\ %{fsyntax-only:-o %j} %{-param*}\ %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\ - %{coverage:-fprofile-arcs -ftest-coverage}; + %{coverage:-fprofile-arcs -ftest-coverage -fno-early-inlining}; /* If an assembler wrapper is used to invoke post-assembly tools like MAO, --save-temps need to be passed to save the output of -- This patch is available for review at http://codereview.appspot.com/5173042
Re: [google] Fix bugs in sampled profile collection
ok. David On Fri, Sep 30, 2011 at 6:54 PM, Easwaran Raman era...@google.com wrote: This fixes two issues with sampled profile collection. It delays cleanup of instrumentation_to_be_sampled after all callgraph nodes have been instrumented and prevents gcov_sample_counter_decl and gcov_sampling_rate_decl from being garbage collected. Ok for google/gcc-4_6 and google/main branches? -Easwaran 2011-09-30 Easwaran Raman era...@google.com * tree-profile.c (gcov_sample_counter_decl): Add GTY marker. (gcov_sampling_rate_decl): Likewise. (add_sampling_to_edge_counters): Do not free instrumentation_to_be_sampled. (cleanup_instrumentation_sampling): New function. (tree_profiling): Call cleanup_instrumentation_sampling at the end. testsuite/ChangeLog.google-4_6: 2011-09-30 Easwaran Raman era...@google.com * gcc.dg/sample-profile-generate-1.c: New test. Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c === --- gcc/testsuite/gcc.dg/sample-profile-generate-1.c (revision 0) +++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c (revision 0) @@ -0,0 +1,26 @@ +/* { dg-do compile} */ +/* { dg-options -O2 -fprofile-generate -fprofile-generate-sampling } */ + +void foobar(int); + +void +foo (void) +{ + int i; + for (i = 0; i 100; i++) + { + foobar(i); + } +} + +void +bar (void) +{ + int i; + for (i = 0; i 100; i++) + { + foobar(i); + } +} + +/* { dg-final { cleanup-coverage-files } } */ Index: tree-profile.c === --- tree-profile.c (revision 178897) +++ tree-profile.c (working copy) @@ -163,10 +163,10 @@ init_ic_make_global_vars (void) static struct pointer_set_t *instrumentation_to_be_sampled = NULL; /* extern __thread gcov_unsigned_t __gcov_sample_counter */ -static tree gcov_sample_counter_decl = NULL_TREE; +static GTY(()) tree gcov_sample_counter_decl = NULL_TREE; /* extern gcov_unsigned_t __gcov_sampling_rate */ -static tree gcov_sampling_rate_decl = NULL_TREE; +static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE; /* forward declaration. */ void gimple_init_instrumentation_sampling (void); @@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void) break; } } +} +static void +cleanup_instrumentation_sampling (void) +{ /* Free the bitmap. */ - if (instrumentation_to_be_sampled) + if (flag_profile_generate_sampling instrumentation_to_be_sampled) { pointer_set_destroy (instrumentation_to_be_sampled); instrumentation_to_be_sampled = NULL; @@ -1452,6 +1456,7 @@ tree_profiling (void) } del_node_map(); + cleanup_instrumentation_sampling(); return 0; }
Re: Disable early inlining while compiling for coverage (issue5173042)
Yes, this will improve test coverage option's usability, but please provide the example to explain the issues. David On Fri, Sep 30, 2011 at 6:12 PM, Sharad Singhai sing...@google.com wrote: This patch disables early inlining when --coverage option is specified. This improves coverage data in presence of other optimizations, specially with -O2 where early inlining changes the control flow graph sufficiently enough to generate seemingly very odd source coverage. Bootstrapped okay and regression tests passed. Okay for google/gcc-4_6? 2011-09-30 Sharad Singhai sing...@google.com * gcc.c (cc1_options): Added -fno-early-inlining for coverage. Index: gcc.c === --- gcc.c (revision 179402) +++ gcc.c (working copy) @@ -776,7 +776,7 @@ %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\ %{fsyntax-only:-o %j} %{-param*}\ %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\ - %{coverage:-fprofile-arcs -ftest-coverage}; + %{coverage:-fprofile-arcs -ftest-coverage -fno-early-inlining}; /* If an assembler wrapper is used to invoke post-assembly tools like MAO, --save-temps need to be passed to save the output of -- This patch is available for review at http://codereview.appspot.com/5173042
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
Sri, please add a new api to do cpu_indicator initialization on demand to be used in IFUNC context. Perhaps also add some debug check to make sure no conflicting cpu model is set. Ok for google branches for now while the discussion continues. thanks, David On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. 2) Add symbol versions. 3) Move all builtins to the i386 port. 4) Add check for atom processor. 5) No separate passes to fold the builtins. Please let me know what you think. Thanks, -Sri. * config/i386/i386.c (build_struct_with_one_bit_fields): New function. (make_var_decl): New function. (get_field_from_struct): New function. (fold_builtin_target): New function. (ix86_fold_builtin): New function. (ix86_expand_builtin): Expand new builtins by folding them. (TARGET_FOLD_BUILTIN): New macro. (IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value. (IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value. (IX86_BUILTIN_CPU_IS_AMD): New enum value. (IX86_BUILTIN_CPU_IS_INTEL): New enum value. (IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value. (IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value. (IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value. (IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value. (IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value. (IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value. (IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value. (IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value. * config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol. (__cpu_model): Export symbol. (__cpu_features): Export symbol. * config/i386/i386-builtin-types.def: New function type. * config/i386/i386-cpuinfo.c: New file. * config/i386/t-cpuinfo: New file. * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc * gcc.dg/builtin_target.c: New test. On Tue, Aug 23, 2011 at 4:35 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.? Order of constructors. A constructor may call functions which use __cpu_indicator. That's why I wrote also: The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. It obviously has to be guarded against multiple calls. The ctor in libgcc would be mere convenience because then non-ctor code can rely on the data being initialized, and only (potential) ctor code has to check and call the init function on demand. Ciao, Michael.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
Is there a standard way to force this init function to be called before all ctors? Adding a ctor in one crtx.o ? David On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Do I have to do that in every constructor, including C++ global constructors? It is ridiculous. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
IFUNC selector will need to call get_cpu_indicator (as proposed by HJ or something similar), while in other contexts, the implementation should find a way to make sure the indicator is already initialized such that the builtins accessing the features can be directly used (See also Michael and Richard's previous comments). The runtime penalty is much smaller. david On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Thanks, -Sri. -- H.J.
Re: [google] Increase hotness count fraction
Good for google branches. Need to measure the performance and size impact on SPEC2k/06 with this patch and the previous hotcaller patch for trunk. thanks, David On Wed, Aug 24, 2011 at 2:43 PM, Mark Heffernan meh...@google.com wrote: This patch bumps up the parameter 'hot-bb-count-fraction' from 1 to 4. This results in about a 0.5% geomean performance improvement across internal benchmarks for x86-64 LIPO. The parameter change effectively increases the number of functions/callsites which are considered hot. The performance improvement is likely due to increased inlining (more callsites are considered hot and available for inlining). Bootstrapped and reg-tested on x86-64. OK for google/gcc-4_6? Mark 2011-08-24 Mark Heffernan meh...@google.com * params.def (hot-bb-count-fraction): Change default value. Index: params.def === --- params.def (revision 177964) +++ params.def (working copy) @@ -382,7 +382,7 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_TH DEFPARAM(HOT_BB_COUNT_FRACTION, hot-bb-count-fraction, Select fraction of the maximal count of repetitions of basic block in program given basic block needs to have to be considered hot, - 1, 0, 0) + 4, 0, 0) DEFPARAM(HOT_BB_FREQUENCY_FRACTION, hot-bb-frequency-fraction, Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot,
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Thu, Aug 18, 2011 at 6:10 AM, Michael Matz m...@suse.de wrote: Hi, On Thu, 18 Aug 2011, Richard Guenther wrote: CPUID to get target features and set global vars corresponding to the features. So, the builtin should be folded by into the appropriate variable in libgcc. Hm, but then the variable should reside in libgcc and you'd only need an extern variant in the varpool. I'm not sure separate constructors (possibly in each module ...) would be better than a single one in libgcc that would get run unconditionally. Would be my preference too. determining target features and setting the appropriate globals. If the new builtins are called, gcc will call __cpu_indicator_init in a constructor so that it is called exactly once. Then, gcc will fold the builtin to the appropriate global variable. I see, but this sounds like premature optimization to me, no? Considering you'd do this in each module and our inability to merge those constructors at link time. If we put __cpu_indicator, the constructor and the assorted support into a separate module inside libgcc.a could we arrange it in a way that if __cpu_indicator is not referenced from the program that piece isn't linked in? (not sure if that is possible with constructors) If you make an .o file only exporting __cpu_indicator, then it won't be included in a link where no object file refers to that symbol. If you put the ctor for that variable in the same .o file you win. I also take issue with the large number of builtins, I'd have expected one single builtin returning the CPU type, and an enum that can be tested. That potentially requires an installed gcc private header, but I think enabling access to this cpu detection facility in libgcc to our users is worthwhile. The CPU type builtins can probably be combined, not the feature testing ones. David Ciao, Michael.
primary vtable not initialized properly
Compile the following program using 4.6 or trunk compiler and run it, the program will seg fault. The problem is that the D3_Spec's primary vtable has a null entry for virtual function id which it overrides. The root cause is that the base class's virtual function list for the derived class is copied from the base class -- so that lost_primary bit may be 'inherited'. However this bit is not properly reset to 0 after the derive class's vtable is analyzed. The attached trivial patch fixed the problem. Bootstrap and tested x86_64/linux. Ok for trunk and 46 branch? Thanks, David p Description: Binary data
Re: primary vtable not initialized properly
A new patch with lightly modified the test case. David On Thu, Aug 11, 2011 at 5:02 PM, Xinliang David Li davi...@google.com wrote: Compile the following program using 4.6 or trunk compiler and run it, the program will seg fault. The problem is that the D3_Spec's primary vtable has a null entry for virtual function id which it overrides. The root cause is that the base class's virtual function list for the derived class is copied from the base class -- so that lost_primary bit may be 'inherited'. However this bit is not properly reset to 0 after the derive class's vtable is analyzed. The attached trivial patch fixed the problem. Bootstrap and tested x86_64/linux. Ok for trunk and 46 branch? Thanks, David p Description: Binary data
cfg fixup bug fix
The attached patch fixed a minor bug in cfg fixup -- the outgoing edge profile count is not scaled after inlining leading to warnings printed in IR dump -- 'Invalid sum of ...'. Bootstrap and tested on x86-64/linux, ok for trunk? Thanks, David p2 Description: Binary data
-freorder-function is broken
The attached patch fixed the problem. The root cause of the problem is due to the ordering change of profile_estimation and tree_profile pass. In trunk, the function/node frequency is not computed after profile annotation is done leading to missing information. Another side effect of this breakage is optimize_function_for_size query is also broken leading to larger than necessary binary with FDO. Bootstrapped on x86_64/linux. All FDO testing passed. Ok after regression test? David p Description: Binary data
Re: [google] fix global vars incorrectly marked as read-only in LIPO mode (issue4798045)
On Thu, Jul 21, 2011 at 11:32 AM, Rong Xu x...@google.com wrote: On Thu, Jul 21, 2011 at 11:09 AM, davi...@google.com wrote: http://codereview.appspot.com/4798045/diff/1/ipa.c File ipa.c (right): http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1034 ipa.c:1034: { Has varpool node linking happened at this point? If not, the new code here is not excersised. This functions is called in multiple places: local pass and whole_program pass. varpool_node_link is b/w these two passes. the varpool node attribute is set before the use in ipa_discover_readonly_nonaddressable_vars() So the code relies on the second visibility pass to get things right -- if that pass is disabled things can go wrong (if the default visibility value when vnode is created is true, LIPO mode will get pessemitic result; if the default is false, LIPO mode will still get wrong result if only local visiblity pass is run). A better fix might be simply do not check 'needed' bit for comdat varnode in LIPO mode and always run the visibiity checker: if ((vnode-needed || L_IPO_COMP_MODE) varpool_node_externally_visible_p (... ) David http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1041 ipa.c:1041: vnode-externally_visible = false; Better to add a simple loop after varpool_node_linking to merge the attribute in l-ipo.c assuming the linking happens after local visibility pass. We can merge in the varpool_node_link, but we still need to change here as the attribute will be overwritten here in the whole_program pass. http://codereview.appspot.com/4798045/
Re: [google] fix global vars incorrectly marked as read-only in LIPO mode (issue4798045)
Ok. David On Thu, Jul 21, 2011 at 1:45 PM, Rong Xu x...@google.com wrote: Please review the new patch attached to this email. thanks, -Rong On Thu, Jul 21, 2011 at 12:44 PM, Rong Xu x...@google.com wrote: wait a second. this still won't work if we disable the whole-program pass (like my original change, the visibility analysis change won't kick in). we also need to change varpool_node_link() to merge the local attribute. -Rong On Thu, Jul 21, 2011 at 12:35 PM, Xinliang David Li davi...@google.com wrote: On Thu, Jul 21, 2011 at 12:27 PM, Rong Xu x...@google.com wrote: this is a good point. ipa_discover_readonly_nonaddressable_vars() is called in two passes. whole-program (whole program visibility analysis) and static-var. The one in whole-program is ok here as it is bundled together with the analysis. the invocation in static-var can go wrong. should we add a check for COMDAT flag also? like Probably not -- only non referenced comdat vars will be marked as not needed. David if ((vnode-needed || (L_IPO_COMP_MODE DECL_COMDAT (node-decl))) varpool_node_externally_visible_p) ( Thanks, -Rong On Thu, Jul 21, 2011 at 11:59 AM, Xinliang David Li davi...@google.com wrote: On Thu, Jul 21, 2011 at 11:32 AM, Rong Xu x...@google.com wrote: On Thu, Jul 21, 2011 at 11:09 AM, davi...@google.com wrote: http://codereview.appspot.com/4798045/diff/1/ipa.c File ipa.c (right): http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1034 ipa.c:1034: { Has varpool node linking happened at this point? If not, the new code here is not excersised. This functions is called in multiple places: local pass and whole_program pass. varpool_node_link is b/w these two passes. the varpool node attribute is set before the use in ipa_discover_readonly_nonaddressable_vars() So the code relies on the second visibility pass to get things right -- if that pass is disabled things can go wrong (if the default visibility value when vnode is created is true, LIPO mode will get pessemitic result; if the default is false, LIPO mode will still get wrong result if only local visiblity pass is run). A better fix might be simply do not check 'needed' bit for comdat varnode in LIPO mode and always run the visibiity checker: if ((vnode-needed || L_IPO_COMP_MODE) varpool_node_externally_visible_p (... ) David http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1041 ipa.c:1041: vnode-externally_visible = false; Better to add a simple loop after varpool_node_linking to merge the attribute in l-ipo.c assuming the linking happens after local visibility pass. We can merge in the varpool_node_link, but we still need to change here as the attribute will be overwritten here in the whole_program pass. http://codereview.appspot.com/4798045/
Re: [testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.
Your fix works ok for me (on x86-64/linux) too. Thanks, David On Tue, Jun 28, 2011 at 10:09 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Three new testcases seem to XPASS everywhere, at least on all of my targets: XPASS: gcc.dg/inline_1.c (test for excess errors) XPASS: gcc.dg/inline_2.c (test for excess errors) XPASS: gcc.dg/unroll_1.c (test for excess errors) The following patch fixes this to remove the noise. Tested with the appropriate runtest invocation on i386-pc-solaris2.10. Ok for mainline? Rainer 2011-06-28 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.dg/inline_1.c: Remove dg-excess-errors. * gcc.dg/inline_2.c: Likewise. * gcc.dg/unroll_1.c: Likewise. Index: gcc/testsuite/gcc.dg/inline_2.c === --- gcc/testsuite/gcc.dg/inline_2.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_2.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times bar 5 optimized } } */ /* { dg-final { cleanup-tree-dump optimized } } */ -/* { dg-excess-errors extra notes } */ Index: gcc/testsuite/gcc.dg/inline_1.c === --- gcc/testsuite/gcc.dg/inline_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_1.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times bar 5 optimized } } */ /* { dg-final { cleanup-tree-dump optimized } } */ -/* { dg-excess-errors extra notes } */ Index: gcc/testsuite/gcc.dg/unroll_1.c === --- gcc/testsuite/gcc.dg/unroll_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/unroll_1.c (working copy) @@ -30,4 +30,3 @@ /* { dg-final { scan-rtl-dump-times Decided to peel loop completely 2 loop2_unroll } } */ /* { dg-final { cleanup-rtl-dump loop2_unroll } } */ -/* { dg-excess-errors extra notes } */ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: SRA generates uninitialized var use
On Tue, Jun 21, 2011 at 1:42 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Jun 21, 2011 at 1:28 AM, Xinliang David Li davi...@google.com wrote: Good point -- but why does SRA have to be so complicated? If it just do structure expansion and let subsequent phases to clean it up, would it be simpler? Anyway this is off the topic. Well, it's certainly non-optimal to insert new memory backed variables to get rid of memory backed variables ... Yes, in the current way it is not optimal. Before that problem is resolved, is the simple patch ok for trunk? The non-optimal code issue can be tracked with a different bug. Thanks, David Richard. Thanks, David On Mon, Jun 20, 2011 at 1:47 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 20, 2011 at 6:15 PM, Xinliang David Li davi...@google.com wrote: It is used to indicate the fact the var decl needs to have a memory home (addressable) -- is there another way to do this? this is to avoid the following situation: 1) after SRA before update SSA, the IR looks like: MEM[ SR_123] = ... other_var = SR_123; (x) In this case, SR_123 is not of aggregate type, and it is not addressable, update_ssa won't assign a VUSE for (x), leading to The point is, SRA should never have created the above MEM[ SR_123] = ... Martin, why would it even create new _memory_ backed decls? Richard. 2) final IR after SRA: MEM[..., SR_123] = .. other_var = SR_123_yyy(D); David On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com wrote: Compiling the test case in the patch with -O2 -m32 without the fix, the program will abort. The problem is a var decl whose address is taken is not marked as addressable leading to bad SSA update (missing VUSE). (the triaging used the the .after and .after_cleanup dump diff and found the problem). the test is on going. Ok after testing? That doesn't make sense. SRA shouldn't generate anything that has its address taken. So, where do we take its address? Richard. Thanks, David
Re: SRA generates uninitialized var use
It is used to indicate the fact the var decl needs to have a memory home (addressable) -- is there another way to do this? this is to avoid the following situation: 1) after SRA before update SSA, the IR looks like: MEM[ SR_123] = ... other_var = SR_123; (x) In this case, SR_123 is not of aggregate type, and it is not addressable, update_ssa won't assign a VUSE for (x), leading to 2) final IR after SRA: MEM[..., SR_123] = .. other_var = SR_123_yyy(D); David On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com wrote: Compiling the test case in the patch with -O2 -m32 without the fix, the program will abort. The problem is a var decl whose address is taken is not marked as addressable leading to bad SSA update (missing VUSE). (the triaging used the the .after and .after_cleanup dump diff and found the problem). the test is on going. Ok after testing? That doesn't make sense. SRA shouldn't generate anything that has its address taken. So, where do we take its address? Richard. Thanks, David
Re: SRA generates uninitialized var use
Good point -- but why does SRA have to be so complicated? If it just do structure expansion and let subsequent phases to clean it up, would it be simpler? Anyway this is off the topic. Thanks, David On Mon, Jun 20, 2011 at 1:47 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 20, 2011 at 6:15 PM, Xinliang David Li davi...@google.com wrote: It is used to indicate the fact the var decl needs to have a memory home (addressable) -- is there another way to do this? this is to avoid the following situation: 1) after SRA before update SSA, the IR looks like: MEM[ SR_123] = ... other_var = SR_123; (x) In this case, SR_123 is not of aggregate type, and it is not addressable, update_ssa won't assign a VUSE for (x), leading to The point is, SRA should never have created the above MEM[ SR_123] = ... Martin, why would it even create new _memory_ backed decls? Richard. 2) final IR after SRA: MEM[..., SR_123] = .. other_var = SR_123_yyy(D); David On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com wrote: Compiling the test case in the patch with -O2 -m32 without the fix, the program will abort. The problem is a var decl whose address is taken is not marked as addressable leading to bad SSA update (missing VUSE). (the triaging used the the .after and .after_cleanup dump diff and found the problem). the test is on going. Ok after testing? That doesn't make sense. SRA shouldn't generate anything that has its address taken. So, where do we take its address? Richard. Thanks, David
Re: SRA generates uninitialized var use
Bootstrap and tested on linux/x86_64. Ok for trunk? David On Sat, Jun 18, 2011 at 1:56 AM, Xinliang David Li davi...@google.com wrote: Compiling the test case in the patch with -O2 -m32 without the fix, the program will abort. The problem is a var decl whose address is taken is not marked as addressable leading to bad SSA update (missing VUSE). (the triaging used the the .after and .after_cleanup dump diff and found the problem). the test is on going. Ok after testing? Thanks, David
SRA generates uninitialized var use
Compiling the test case in the patch with -O2 -m32 without the fix, the program will abort. The problem is a var decl whose address is taken is not marked as addressable leading to bad SSA update (missing VUSE). (the triaging used the the .after and .after_cleanup dump diff and found the problem). the test is on going. Ok after testing? Thanks, David sra_bug.p Description: Binary data
Re: [google] Add intermediate text format for gcov (issue4595053)
Ok for google/main. David On Tue, Jun 14, 2011 at 1:50 PM, Sharad Singhai (शरद सिंघई) sing...@google.com wrote: Sorry, Rietveld didn't send out the updated patch along with my mail. Here it is. Sharad 2011-06-14 Sharad Singhai sing...@google.com Google Ref 3 * doc/gcov.texi: Document gcov intermediate format. * gcov.c (get_gcov_file_intermediate_name): New function. (output_intermediate_file): New function. * testsuite/lib/gcov.exp: Handle intermediate format. * testsuite/g++.dg/gcov/gcov-7.C: New test. Index: doc/gcov.texi === --- doc/gcov.texi (revision 174926) +++ doc/gcov.texi (working copy) @@ -130,6 +130,7 @@ [@option{-f}|@option{--function-summaries}] [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles} [@option{-u}|@option{--unconditional-branches}] + [@option{-i}|@option{--intermediate-format}] [@option{-d}|@option{--display-progress}] @c man end @c man begin SEEALSO @@ -216,6 +217,32 @@ @itemx --display-progress Display the progress on the standard output. +@item -i +@itemx --intermediate-format +Output gcov file in an intermediate text format that can be used by +@command{lcov} or other applications. It will output a single *.gcov file per +*.gcda file. No source code is required. + +The format of the intermediate @file{.gcov} file is plain text with +one entry per line + +@smallexample +SF:@var{source_file_name} +FN:@var{line_number},@var{function_name} +FNDA:@var{execution_count},@var{function_name} +BA:@var{line_num},@var{branch_coverage_type} +DA:@var{line number},@var{execution_count} + +Where the @var{branch_coverage_type} is + 0 (Branch not executed) + 1 (Branch executed, but not taken) + 2 (Branch executed and taken) + +There can be multiple SF entries in an intermediate gcov file. All +entries following SF pertain to that source file until the next SF +entry. +@end smallexample + @end table @command{gcov} should be run with the current directory the same as that Index: gcov.c === --- gcov.c (revision 174926) +++ gcov.c (working copy) @@ -38,6 +38,7 @@ #include tm.h #include intl.h #include version.h +#include demangle.h #include getopt.h @@ -310,6 +311,9 @@ static int flag_display_progress = 0; +/* Output *.gcov file in intermediate format used by 'lcov'. */ +static int flag_intermediate_format = 0; + /* For included files, make the gcov output file name include the name of the input source file. For example, if x.h is included in a.c, then the output file name is a.c##x.h.gcov instead of x.h.gcov. */ @@ -436,6 +440,11 @@ fnotice (file, -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n); fnotice (file, -p, --preserve-paths Preserve all pathname components\n); fnotice (file, -u, --unconditional-branches Show unconditional branch counts too\n); + fnotice (file, -i, --intermediate-format Output .gcov file in an intermediate text\n\ + format that can be used by 'lcov' or other\n\ + applications. It will output a single\n\ + .gcov file per .gcda file. No source file\n\ + is required.\n); fnotice (file, -d, --display-progress Display progress information\n); fnotice (file, \nFor bug reporting instructions, please see:\n%s.\n, bug_report_url); @@ -472,6 +481,7 @@ { object-file, required_argument, NULL, 'o' }, { unconditional-branches, no_argument, NULL, 'u' }, { display-progress, no_argument, NULL, 'd' }, + { intermediate-format, no_argument, NULL, 'i' }, { 0, 0, 0, 0 } }; @@ -482,7 +492,8 @@ { int opt; - while ((opt = getopt_long (argc, argv, abcdfhlno:puv, options, NULL)) != -1) + while ((opt = getopt_long (argc, argv, abcdfhilno:puv, options, NULL)) != + -1) { switch (opt) { @@ -516,6 +527,10 @@ case 'u': flag_unconditional = 1; break; + case 'i': + flag_intermediate_format = 1; + flag_gcov_file = 1; + break; case 'd': flag_display_progress = 1; break; @@ -531,6 +546,109 @@ return optind; } +/* Get the name of the gcov file. The return value must be free'd. + + It appends the '.gcov' extension to the *basename* of the file. + The resulting file name will be in PWD. + + e.g., + input: foo.da, output: foo.da.gcov + input: a/b/foo.cc, output: foo.cc.gcov */ + +static char * +get_gcov_file_intermediate_name (const char *file_name) +{ + const char *gcov = .gcov; +
[google] backport r174930 to google/main
Backported r174930 to google/main. David
Re: Dump before flag
Committed after Bootstrapping and regression testing on x86-64/linux. The follow up patch will come soon. Thanks, David On Tue, Jun 14, 2011 at 8:57 AM, Xinliang David Li davi...@google.com wrote: On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li davi...@google.com wrote: This is the revised patch as suggested. How does it look? } +static void +execute_function_dump (void *data ATTRIBUTE_UNUSED) function needs a comment. Ok with that change. Please always specify how you tested the patch - the past fallouts suggest you didn't do the required testing carefully. I think I did -- the fallout was probably due to different '--enable-checking' setting. I have now turned it to 'yes' Thanks, David A changelog is missing as well. Thanks, Richard. Thanks, David On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li davi...@google.com wrote: On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com wrote: See attached. Hmm. I don't like how you still wire dumping in the TODO routines. Doesn't it work to just dump the body from pass_fini_dump_file ()? Or if that doesn't sound clean from (a subset of) places where it is called? (we might want to exclude the ipa read/write/summary stages) That may require another round of function traversal -- but probably not a big deal -- it sounds cleaner. David Richard. Thanks, David On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li davi...@google.com wrote: this is the patch that just removes the TODO_dump flag and forces it to dump. The original code cfun-last_verified = flags TODO_verify_all looks weird -- depending on TODO_dump is set or not, the behavior of the update is different (when no other todo flags is set). Ok for trunk? -ENOPATCH. Richard. David On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish Can we bikeshed a bit more about these names? These names may be less confusing: before_preparation before after after_cleanup David start and before have no semantical difference to me ... as the dump before TODO_start of a pass and the dump after TODO_finish of the previous pass are identical (hopefully ;)), maybe merge those into a -between flag? If you'd specify it for a single pass then you'd get both -start and -finish (using your naming scheme). Splitting that dump(s) to different files then might make sense (not sure about the name to use). Note that I find it extremely useful to have dumping done in chronological order - splitting some of it to different files destroys this, especially a dump after TODO_start or before TODO_finish should appear in the same file (or we could also start splitting individual TODO_ output into sub-dump-files). I guess what would be nice instread would be a fancy dump-file viewer that could show diffs, hide things like SCEV output, etc. I suppose a patch that removes the dump TODO and unconditionally dumps at the current point would be a good preparation for this enhancing patch. Richard. The default is 'finish'. Does it look ok? Thanks, David On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after
Re: Dump only functions with name matching patterns
Attached the patch. David On Tue, Jun 14, 2011 at 4:21 PM, Xinliang David Li davi...@google.com wrote: This is the second (hopefully the last in the series of dumper changes) follow-up patch. It adds a control so that verbosity of the dump can be greatly reduced (and hopefully simplify gcc developer's life a little). For instance: -fdump-tree-dce=foo[0-9]$ to dump IR (and debug trace) only for function with assembler names matching the specified pattern. Notes: 1) the pattern is specified using regular expression -- unlike disable/enable option where exact name patterns are required, for dumping, fuzziness is preferred 2) patterns specified in different -fdump-.. options will be 'ORed' together. The patch has not been fully tested, but comments are welcome. Thanks, David On Tue, Jun 14, 2011 at 4:13 PM, Xinliang David Li davi...@google.com wrote: Here is one the of follow up patches: support of -before_preparation, -before, -after, -after_cleanup dump flags. The default dumping behavior does not change at all, but if any one of the above flags is specified, the function IR will be dumped into a file with the corresponding suffix. The enhancement is to simplify IR diffing. Bootstrapped and regression tested on x86-64/linux. Ok for trunk? Thanks, David On Tue, Jun 14, 2011 at 12:40 PM, Xinliang David Li davi...@google.com wrote: Committed after Bootstrapping and regression testing on x86-64/linux. The follow up patch will come soon. Thanks, David On Tue, Jun 14, 2011 at 8:57 AM, Xinliang David Li davi...@google.com wrote: On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li davi...@google.com wrote: This is the revised patch as suggested. How does it look? } +static void +execute_function_dump (void *data ATTRIBUTE_UNUSED) function needs a comment. Ok with that change. Please always specify how you tested the patch - the past fallouts suggest you didn't do the required testing carefully. I think I did -- the fallout was probably due to different '--enable-checking' setting. I have now turned it to 'yes' Thanks, David A changelog is missing as well. Thanks, Richard. Thanks, David On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li davi...@google.com wrote: On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com wrote: See attached. Hmm. I don't like how you still wire dumping in the TODO routines. Doesn't it work to just dump the body from pass_fini_dump_file ()? Or if that doesn't sound clean from (a subset of) places where it is called? (we might want to exclude the ipa read/write/summary stages) That may require another round of function traversal -- but probably not a big deal -- it sounds cleaner. David Richard. Thanks, David On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li davi...@google.com wrote: this is the patch that just removes the TODO_dump flag and forces it to dump. The original code cfun-last_verified = flags TODO_verify_all looks weird -- depending on TODO_dump is set or not, the behavior of the update is different (when no other todo flags is set). Ok for trunk? -ENOPATCH. Richard. David On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish Can we bikeshed a bit more about these names? These names may be less confusing: before_preparation before after after_cleanup David start and before have no semantical difference to me ... as the dump before TODO_start of a pass and the dump after TODO_finish of the previous pass are identical (hopefully ;)), maybe merge those into a -between flag? If you'd specify it for a single pass then you'd get both -start and -finish (using your naming scheme). Splitting that dump(s) to different files then might make sense (not sure about the name to use). Note that I find it extremely useful to have dumping done in chronological order - splitting some of it to different files destroys this, especially a dump after TODO_start or before TODO_finish should appear in the same file (or we could also start splitting individual TODO_ output
Re: -fdump-passes -fenable-xxx=func_name_list
On Fri, Jun 10, 2011 at 2:03 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Jun 10, 2011 at 12:51 AM, Xinliang David Li davi...@google.com wrote: Patch is temporally rolled back. Richard, looks like deeper pass manager cleanup is needed -- I would like to delay that. For now, this leaves us two choices -- 1) do cfunc push/pop, or 2) do pass dump while executing. None of them is ideal, but safe enough. Well. It seems we should take a step back and look at the whole picture and try to figure out how it should look like in the end (maybe do that in London). For now I prefer 1) over 2). That sounds good. For now I will check in 1) (as it has no impact on default behavior) and do pass clean up later. I won't be in London for discussion, but you let me know how the discussion goes. Thanks, David Thanks, Richard. Thanks, David On Thu, Jun 9, 2011 at 3:32 PM, Xinliang David Li davi...@google.com wrote: Though I can not reproduce it, it might be related to what Richard mentioned in the review -- The TODO's are executed though the pass is not. This opened up a can of worm -- I will revert the patches for now. David On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote: Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350 -- H.J.
Re: Dump before flag
On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com wrote: See attached. Hmm. I don't like how you still wire dumping in the TODO routines. Doesn't it work to just dump the body from pass_fini_dump_file ()? Or if that doesn't sound clean from (a subset of) places where it is called? (we might want to exclude the ipa read/write/summary stages) That may require another round of function traversal -- but probably not a big deal -- it sounds cleaner. David Richard. Thanks, David On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li davi...@google.com wrote: this is the patch that just removes the TODO_dump flag and forces it to dump. The original code cfun-last_verified = flags TODO_verify_all looks weird -- depending on TODO_dump is set or not, the behavior of the update is different (when no other todo flags is set). Ok for trunk? -ENOPATCH. Richard. David On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish Can we bikeshed a bit more about these names? These names may be less confusing: before_preparation before after after_cleanup David start and before have no semantical difference to me ... as the dump before TODO_start of a pass and the dump after TODO_finish of the previous pass are identical (hopefully ;)), maybe merge those into a -between flag? If you'd specify it for a single pass then you'd get both -start and -finish (using your naming scheme). Splitting that dump(s) to different files then might make sense (not sure about the name to use). Note that I find it extremely useful to have dumping done in chronological order - splitting some of it to different files destroys this, especially a dump after TODO_start or before TODO_finish should appear in the same file (or we could also start splitting individual TODO_ output into sub-dump-files). I guess what would be nice instread would be a fancy dump-file viewer that could show diffs, hide things like SCEV output, etc. I suppose a patch that removes the dump TODO and unconditionally dumps at the current point would be a good preparation for this enhancing patch. Richard. The default is 'finish'. Does it look ok? Thanks, David On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Currently some passes do not dump function bodies because they presumably do no IL modification. But this is certainly the minority (and some passes do not dump bodies even though they are modifying the IL ...). So I'd say we should by default dump function bodies. Note that there are three useful dumping positions (maybe four), before todo-start, after todo-start, before todo-finish and after todo-finish. By default we'd want after todo-finish. When we no longer dump
[google] backport 174848, 174849 to google/main
Backported -fdump-passes option impl. David
Re: -fdump-passes -fenable-xxx=func_name_list
Can you send me a trace? I can not reproduce the problem. David On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote: Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350 -- H.J.
Re: -fdump-passes -fenable-xxx=func_name_list
Though I can not reproduce it, it might be related to what Richard mentioned in the review -- The TODO's are executed though the pass is not. This opened up a can of worm -- I will revert the patches for now. David On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote: Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350 -- H.J.
Re: -fdump-passes -fenable-xxx=func_name_list
Patch is temporally rolled back. Richard, looks like deeper pass manager cleanup is needed -- I would like to delay that. For now, this leaves us two choices -- 1) do cfunc push/pop, or 2) do pass dump while executing. None of them is ideal, but safe enough. Thanks, David On Thu, Jun 9, 2011 at 3:32 PM, Xinliang David Li davi...@google.com wrote: Though I can not reproduce it, it might be related to what Richard mentioned in the review -- The TODO's are executed though the pass is not. This opened up a can of worm -- I will revert the patches for now. David On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote: Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350 -- H.J.
Re: Dump before flag
On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish Can we bikeshed a bit more about these names? These names may be less confusing: before_preparation before after after_cleanup David start and before have no semantical difference to me ... as the dump before TODO_start of a pass and the dump after TODO_finish of the previous pass are identical (hopefully ;)), maybe merge those into a -between flag? If you'd specify it for a single pass then you'd get both -start and -finish (using your naming scheme). Splitting that dump(s) to different files then might make sense (not sure about the name to use). Note that I find it extremely useful to have dumping done in chronological order - splitting some of it to different files destroys this, especially a dump after TODO_start or before TODO_finish should appear in the same file (or we could also start splitting individual TODO_ output into sub-dump-files). I guess what would be nice instread would be a fancy dump-file viewer that could show diffs, hide things like SCEV output, etc. I suppose a patch that removes the dump TODO and unconditionally dumps at the current point would be a good preparation for this enhancing patch. Richard. The default is 'finish'. Does it look ok? Thanks, David On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Currently some passes do not dump function bodies because they presumably do no IL modification. But this is certainly the minority (and some passes do not dump bodies even though they are modifying the IL ...). So I'd say we should by default dump function bodies. Note that there are three useful dumping positions (maybe four), before todo-start, after todo-start, before todo-finish and after todo-finish. By default we'd want after todo-finish. When we no longer dump via a TODO then we could indeed use dump-flags to control this (maybe -original for the body before todo-start). What to others think? Richard. Thanks, David Richard.
Re: Dump before flag
this is the patch that just removes the TODO_dump flag and forces it to dump. The original code cfun-last_verified = flags TODO_verify_all looks weird -- depending on TODO_dump is set or not, the behavior of the update is different (when no other todo flags is set). Ok for trunk? David On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish Can we bikeshed a bit more about these names? These names may be less confusing: before_preparation before after after_cleanup David start and before have no semantical difference to me ... as the dump before TODO_start of a pass and the dump after TODO_finish of the previous pass are identical (hopefully ;)), maybe merge those into a -between flag? If you'd specify it for a single pass then you'd get both -start and -finish (using your naming scheme). Splitting that dump(s) to different files then might make sense (not sure about the name to use). Note that I find it extremely useful to have dumping done in chronological order - splitting some of it to different files destroys this, especially a dump after TODO_start or before TODO_finish should appear in the same file (or we could also start splitting individual TODO_ output into sub-dump-files). I guess what would be nice instread would be a fancy dump-file viewer that could show diffs, hide things like SCEV output, etc. I suppose a patch that removes the dump TODO and unconditionally dumps at the current point would be a good preparation for this enhancing patch. Richard. The default is 'finish'. Does it look ok? Thanks, David On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Currently some passes do not dump function bodies because they presumably do no IL modification. But this is certainly the minority (and some passes do not dump bodies even though they are modifying the IL ...). So I'd say we should by default dump function bodies. Note that there are three useful dumping positions (maybe four), before todo-start, after todo-start, before todo-finish and after todo-finish. By default we'd want after todo-finish. When we no longer dump via a TODO then we could indeed use dump-flags to control this (maybe -original for the body before todo-start). What to others think? Richard. Thanks, David Richard.
Re: Dump before flag
On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Currently some passes do not dump function bodies because they presumably do no IL modification. But this is certainly the minority (and some passes do not dump bodies even though they are modifying the IL ...). So I'd say we should by default dump function bodies. Note that there are three useful dumping positions (maybe four), before todo-start, after todo-start, before todo-finish and after todo-finish. By default we'd want after todo-finish. When we no longer dump via a TODO then we could indeed use dump-flags to control this (maybe -original for the body before todo-start). What to others think? I think that is a very good thing to have. David Richard. Thanks, David Richard.
Re: -fdump-passes -fenable-xxx=func_name_list
Ok -- that sounds good. David On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com wrote: On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? +/* Dump all optimization passes. */ + +void +dump_passes (void) +{ + struct cgraph_node *n, *node = NULL; + tree save_fndecl = current_function_decl; + + fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid); this isn't accurate info - cloning can cause more cgraph nodes to appear (it also looks completely unrelated to dump_passes ...). Please drop it. Ok. + create_pass_tab(); + gcc_assert (pass_tab); you have quite many asserts of this kind - we don't want them when the previous stmt as in this case indicates everything is ok. ok. + push_cfun (DECL_STRUCT_FUNCTION (node-decl)); this has side-effects, I'm not sure we want this here. Why do you need it? Probably because of + is_really_on = override_gate_status (pass, current_function_decl, is_on); ? But that is dependent on the function given which should have no effect (unless it is overridden globally in which case override_gate_status and friends should deal with a NULL cfun). As we discussed, currently some pass gate functions depend on per node information -- those checks need to be pushed into execute functions. I would like to clean those up later -- at which time, the push/pop can be removed. I'd like to do it the other way around, first clean up the gate functions then drop in this patch without the cfun push/pop. The revised patch looks ok to me with the cfun push/pop removed. Thanks, Richard. I don't understand why you need another table mapping pass to name when pass-name is available and the info is trivially re-constructible. This is needed as the pass-name is not the canonicalized name (i.e., not with number suffix etc), so the extra mapping from id to normalized name is needed. Thanks, David Thanks, Richard. David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems. David Richard. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have
Re: Dump before flag
Any suggestions on the dump position specification string, before and after is not enough. How about start, before, after, and finish? I.e. -fdump-tree-pre-start -- dump IR before TODO_start of PRE pass -fdump-tree-pre-before -- dump IR just before PRE after its TODO start finishes -fdump-tree-pre-after -- dump IR just after PRE -fdump-tree-pre-finish -- dump it after TODO_finish of PRE? Thanks, David On Tue, Jun 7, 2011 at 9:43 AM, Diego Novillo dnovi...@google.com wrote: On Tue, Jun 7, 2011 at 02:36, Richard Guenther richard.guent...@gmail.com wrote: For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Agreed. Diego.
Re: Dump before flag
On Tue, Jun 7, 2011 at 10:01 AM, Diego Novillo dnovi...@google.com wrote: On Tue, Jun 7, 2011 at 09:51, Xinliang David Li davi...@google.com wrote: Any suggestions on the dump position specification string, before and after is not enough. How about start, before, after, and finish? I.e. -fdump-tree-pre-start -- dump IR before TODO_start of PRE pass -fdump-tree-pre-before -- dump IR just before PRE after its TODO start finishes What would be the difference between these two? The TODO_start actions don't affect the IL, in general. But by design, it can right? David -fdump-tree-pre-after -- dump IR just after PRE -fdump-tree-pre-finish -- dump it after TODO_finish of PRE? This would be to catch changes like cfg cleanup and SSA update actions, right? Sure. I don't mind too much about the naming. Anything that makes sense is good with me. Diego.
Re: -fdump-passes -fenable-xxx=func_name_list
Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? Thanks, David On Tue, Jun 7, 2011 at 9:24 AM, Xinliang David Li davi...@google.com wrote: Ok -- that sounds good. David On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com wrote: On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? +/* Dump all optimization passes. */ + +void +dump_passes (void) +{ + struct cgraph_node *n, *node = NULL; + tree save_fndecl = current_function_decl; + + fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid); this isn't accurate info - cloning can cause more cgraph nodes to appear (it also looks completely unrelated to dump_passes ...). Please drop it. Ok. + create_pass_tab(); + gcc_assert (pass_tab); you have quite many asserts of this kind - we don't want them when the previous stmt as in this case indicates everything is ok. ok. + push_cfun (DECL_STRUCT_FUNCTION (node-decl)); this has side-effects, I'm not sure we want this here. Why do you need it? Probably because of + is_really_on = override_gate_status (pass, current_function_decl, is_on); ? But that is dependent on the function given which should have no effect (unless it is overridden globally in which case override_gate_status and friends should deal with a NULL cfun). As we discussed, currently some pass gate functions depend on per node information -- those checks need to be pushed into execute functions. I would like to clean those up later -- at which time, the push/pop can be removed. I'd like to do it the other way around, first clean up the gate functions then drop in this patch without the cfun push/pop. The revised patch looks ok to me with the cfun push/pop removed. Thanks, Richard. I don't understand why you need another table mapping pass to name when pass-name is available and the info is trivially re-constructible. This is needed as the pass-name is not the canonicalized name (i.e., not with number suffix etc), so the extra mapping from id to normalized name is needed. Thanks, David Thanks, Richard. David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate
Re: -fdump-passes -fenable-xxx=func_name_list
The dump-pass patch with test case. David On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote: Please review the attached two patches. In the first patch, gate functions are cleaned up. All the per function legality checks are moved into the executor and the optimization heuristic checks (optimize for size) remain in the gators. These allow the the following overriding order: common flags (O2, -ftree-vrp, -fgcse etc) --- compiler heuristic (optimize for size/speed) --- -fdisable/enable forcing pass options --- legality check Testing under going. Ok for trunk? Thanks, David On Tue, Jun 7, 2011 at 9:24 AM, Xinliang David Li davi...@google.com wrote: Ok -- that sounds good. David On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com wrote: On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? +/* Dump all optimization passes. */ + +void +dump_passes (void) +{ + struct cgraph_node *n, *node = NULL; + tree save_fndecl = current_function_decl; + + fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid); this isn't accurate info - cloning can cause more cgraph nodes to appear (it also looks completely unrelated to dump_passes ...). Please drop it. Ok. + create_pass_tab(); + gcc_assert (pass_tab); you have quite many asserts of this kind - we don't want them when the previous stmt as in this case indicates everything is ok. ok. + push_cfun (DECL_STRUCT_FUNCTION (node-decl)); this has side-effects, I'm not sure we want this here. Why do you need it? Probably because of + is_really_on = override_gate_status (pass, current_function_decl, is_on); ? But that is dependent on the function given which should have no effect (unless it is overridden globally in which case override_gate_status and friends should deal with a NULL cfun). As we discussed, currently some pass gate functions depend on per node information -- those checks need to be pushed into execute functions. I would like to clean those up later -- at which time, the push/pop can be removed. I'd like to do it the other way around, first clean up the gate functions then drop in this patch without the cfun push/pop. The revised patch looks ok to me with the cfun push/pop removed. Thanks, Richard. I don't understand why you need another table mapping pass to name when pass-name is available and the info is trivially re-constructible. This is needed as the pass-name is not the canonicalized name (i.e., not with number suffix etc), so the extra mapping from id to normalized name is needed. Thanks, David Thanks, Richard. David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output
Re: Dump before flag
It might be also useful to implement the dumping behavior like this: if any of the start/before/after/finish option is explicitly specified, IR (and only IR) will be dumped into files suffixed with .start/.before/.after/.finish. The debug dump will be dumped as usual into the non suffixed file name. By default, the IR dump and debug dump will be dumped into the same file which is the current behavior. David On Tue, Jun 7, 2011 at 4:08 PM, Xinliang David Li davi...@google.com wrote: The following is the patch that does the job. Most of the changes are just removing TODO_dump_func. The major change is in passes.c and tree-pass.h. -fdump-xxx-yyy-start -- dump before TODO_start -fdump-xxx-yyy-before -- dump before main pass after TODO_pass -fdump-xxx-yyy-after -- dump after main pass before TODO_finish -fdump-xxx-yyy-finish -- dump after TODO_finish The default is 'finish'. Does it look ok? Thanks, David On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote: Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Yeah, I think this would go in the right direction. Currently some passes do not dump function bodies because they presumably do no IL modification. But this is certainly the minority (and some passes do not dump bodies even though they are modifying the IL ...). So I'd say we should by default dump function bodies. Note that there are three useful dumping positions (maybe four), before todo-start, after todo-start, before todo-finish and after todo-finish. By default we'd want after todo-finish. When we no longer dump via a TODO then we could indeed use dump-flags to control this (maybe -original for the body before todo-start). What to others think? Richard. Thanks, David Richard.
Re: [google] pessimize stack accounting during inlining
Ok for google/main. A good candidate patch for trunk too. Thanks, David On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan meh...@google.com wrote: This patch pessimizes stack accounting during inlining. This enables setting a firm stack size limit (via parameters large-stack-frame and large-stack-frame-growth). Without this patch the inliner is overly optimistic about potential stack reuse resulting in actual stack frames much larger than the parameterized limits. Internal benchmarks show minor performance differences with non-fdo and lipo, but overall neutral. Tested/bootstrapped on x86-64. Ok for google-main? Mark 2011-06-07 Mark Heffernan meh...@google.com * cgraph.h (cgraph_global_info): Remove field. * ipa-inline.c (cgraph_clone_inlined_nodes): Change stack frame computation. (cgraph_check_inline_limits): Ditto. (compute_inline_parameters): Remove dead initialization. Index: gcc/cgraph.h === --- gcc/cgraph.h (revision 174512) +++ gcc/cgraph.h (working copy) @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { struct GTY(()) cgraph_global_info { /* Estimated stack frame consumption by the function. */ HOST_WIDE_INT estimated_stack_size; - /* Expected offset of the stack frame of inlined function. */ - HOST_WIDE_INT stack_frame_offset; /* For inline clones this points to the function they will be inlined into. */ Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 174512) +++ gcc/ipa-inline.c (working copy) @@ -229,8 +229,6 @@ void cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, bool update_original) { - HOST_WIDE_INT peak; - if (duplicate) { /* We may eliminate the need for out-of-line copy to be output. @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap e-callee-global.inlined_to = e-caller-global.inlined_to; else e-callee-global.inlined_to = e-caller; - e-callee-global.stack_frame_offset - = e-caller-global.stack_frame_offset - + inline_summary (e-caller)-estimated_self_stack_size; - peak = e-callee-global.stack_frame_offset - + inline_summary (e-callee)-estimated_self_stack_size; - if (e-callee-global.inlined_to-global.estimated_stack_size peak) - e-callee-global.inlined_to-global.estimated_stack_size = peak; + + /* Pessimistically assume no sharing of stack space. That is, the + frame size of a function is estimated as the original frame size + plus the sum of the frame sizes of all inlined callees. */ + e-callee-global.inlined_to-global.estimated_stack_size += + inline_summary (e-callee)-estimated_self_stack_size; + cgraph_propagate_frequency (e-callee); /* Recursively clone all bodies. */ @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap stack_size_limit += stack_size_limit * PARAM_VALUE (PARAM_STACK_FRAME_GROWTH) / 100; - inlined_stack = (to-global.stack_frame_offset - + inline_summary (to)-estimated_self_stack_size + inlined_stack = (to-global.estimated_stack_size + what-global.estimated_stack_size); if (inlined_stack stack_size_limit inlined_stack PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph self_stack_size = optimize ? estimated_stack_frame_size (node) : 0; inline_summary (node)-estimated_self_stack_size = self_stack_size; node-global.estimated_stack_size = self_stack_size; - node-global.stack_frame_offset = 0; /* Can this function be inlined at all? */ node-local.inlinable = tree_inlinable_function_p (node-decl);
Re: -fdump-passes -fenable-xxx=func_name_list
On Mon, Jun 6, 2011 at 4:22 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 7:24 PM, Xinliang David Li davi...@google.com wrote: The attached is the split #1 patch that enhances -fenable/disable. Ok after testing? I expect the testcases will be quite fragile, so while I appreciate test coverage for new options I think we should go without those that involve any kind of UID. Those which use assembler names also will fail randomly dependent on how targets mangle their functions - so I think we have to drop all testcases. Ok -- how about keeping tests with large uid range, and assembler name for x86? A feature without testing is just to easy to break without being noticed. Also +/* A helper function to determine if an identifier is valid to + be an assembler name (better to use target specific hook). */ + +static bool +is_valid_assembler_name (const char *str) +{ + const char *p = str; + char c; + + c = *p; + if (!((c = 'a' c = 'z') + || (c = 'A' c = 'Z') + || *p == '_')) + return false; + + p++; + while ((c = *p)) + { + if (!((c = 'a' c = 'z') + || (c = 'A' c = 'Z') + || (c = '0' c = '9') + || *p == '_')) + return false; + p++; + } + + return true; +} why all that complicated checks? Why not just check for p[0] in [^0-9] and re-structure the range parsing to switch between UIDs and assembler-names that way? Ok. David Thanks, Richard. Thanks, David On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David
Re: -fdump-passes -fenable-xxx=func_name_list
On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? +/* Dump all optimization passes. */ + +void +dump_passes (void) +{ + struct cgraph_node *n, *node = NULL; + tree save_fndecl = current_function_decl; + + fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid); this isn't accurate info - cloning can cause more cgraph nodes to appear (it also looks completely unrelated to dump_passes ...). Please drop it. Ok. + create_pass_tab(); + gcc_assert (pass_tab); you have quite many asserts of this kind - we don't want them when the previous stmt as in this case indicates everything is ok. ok. + push_cfun (DECL_STRUCT_FUNCTION (node-decl)); this has side-effects, I'm not sure we want this here. Why do you need it? Probably because of + is_really_on = override_gate_status (pass, current_function_decl, is_on); ? But that is dependent on the function given which should have no effect (unless it is overridden globally in which case override_gate_status and friends should deal with a NULL cfun). As we discussed, currently some pass gate functions depend on per node information -- those checks need to be pushed into execute functions. I would like to clean those up later -- at which time, the push/pop can be removed. I don't understand why you need another table mapping pass to name when pass-name is available and the info is trivially re-constructible. This is needed as the pass-name is not the canonicalized name (i.e., not with number suffix etc), so the extra mapping from id to normalized name is needed. Thanks, David Thanks, Richard. David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems. David Richard. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David
Re: Dump before flag
Your patch doesn't really improve this but adds to the confusion. + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) + { + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; + } and certainly writing to pass is not ok. And the TDF_BEFORE flag looks misplaced as it controls TODOs, not dumping behavior. Yes, it's a mess right now but the above looks like a hack ontop of that mess (maybe because of it, but well ...). How about removing dumping TODO completely -- this can be done easily -- I don't understand why pass wants extra control on the dumping if user already asked for dumping -- it is annoying to see empty IR dump for a pass when I want to see it. At least I would have expected to also get the dump after the pass, not only the one before it with this dump flag. Now, why can't you look at the previous pass output for the before-dump (as I do usually)? For one thing, you need to either remember what is the previous pass, or dump all passes which for large files can take very long time. Even with all the dumps, you will need to eyeballing to find the previous pass which may or may not have the IR dumped. How about removing dump TODO? Thanks, David Richard.
Re: -fdump-passes -fenable-xxx=func_name_list
Please take a look at the revised one. Thanks, David On Mon, Jun 6, 2011 at 4:22 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 7:24 PM, Xinliang David Li davi...@google.com wrote: The attached is the split #1 patch that enhances -fenable/disable. Ok after testing? I expect the testcases will be quite fragile, so while I appreciate test coverage for new options I think we should go without those that involve any kind of UID. Those which use assembler names also will fail randomly dependent on how targets mangle their functions - so I think we have to drop all testcases. Also +/* A helper function to determine if an identifier is valid to + be an assembler name (better to use target specific hook). */ + +static bool +is_valid_assembler_name (const char *str) +{ + const char *p = str; + char c; + + c = *p; + if (!((c = 'a' c = 'z') + || (c = 'A' c = 'Z') + || *p == '_')) + return false; + + p++; + while ((c = *p)) + { + if (!((c = 'a' c = 'z') + || (c = 'A' c = 'Z') + || (c = '0' c = '9') + || *p == '_')) + return false; + p++; + } + + return true; +} why all that complicated checks? Why not just check for p[0] in [^0-9] and re-structure the range parsing to switch between UIDs and assembler-names that way? Thanks, Richard. Thanks, David On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David Index: doc/invoke.texi === --- doc/invoke.texi (revision 174549) +++ doc/invoke.texi (working copy) @@ -5056,11 +5056,12 @@ appended with a sequential number starti Disable rtl pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. @var{range-list} is a comma -seperated list of function ranges. Each range is a number pair seperated by a colon. -The range is inclusive in both ends. If the range is trivial, the number pair can be -simplified a a single number. If the function's cgraph node's @var{uid} is falling -within one of the specified ranges, the @var{pass} is disabled for that function. -The @var{uid} is shown in the function header of a dump file. +seperated list of function ranges or assembler names. Each range is a number +pair seperated by a colon. The range is inclusive in both ends. If the range +is trivial, the number pair can be simplified as a single number. If the +function's cgraph node's @var{uid} is falling within one of the specified ranges, +the @var{pass} is disabled for that function. The @var{uid} is shown in the +function header of a dump file. @item -fdisable-tree-@var{pass} @item -fdisable-tree-@var{pass}=@var{range-list} @@ -5090,7 +5091,8 @@ of option arguments. -fenable-tree-cunroll=1 # disable gcse2 for functions at the following ranges [1,1], # [300,400], and [400,1000] - -fdisable-rtl-gcse2=1:100,300,400:1000 +# disable gcse2 for functions foo and foo2
Re: -fdump-passes -fenable-xxx=func_name_list
This is the patch with max id removed. Thanks, David On Mon, Jun 6, 2011 at 9:00 AM, Xinliang David Li davi...@google.com wrote: On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? +/* Dump all optimization passes. */ + +void +dump_passes (void) +{ + struct cgraph_node *n, *node = NULL; + tree save_fndecl = current_function_decl; + + fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid); this isn't accurate info - cloning can cause more cgraph nodes to appear (it also looks completely unrelated to dump_passes ...). Please drop it. Ok. + create_pass_tab(); + gcc_assert (pass_tab); you have quite many asserts of this kind - we don't want them when the previous stmt as in this case indicates everything is ok. ok. + push_cfun (DECL_STRUCT_FUNCTION (node-decl)); this has side-effects, I'm not sure we want this here. Why do you need it? Probably because of + is_really_on = override_gate_status (pass, current_function_decl, is_on); ? But that is dependent on the function given which should have no effect (unless it is overridden globally in which case override_gate_status and friends should deal with a NULL cfun). As we discussed, currently some pass gate functions depend on per node information -- those checks need to be pushed into execute functions. I would like to clean those up later -- at which time, the push/pop can be removed. I don't understand why you need another table mapping pass to name when pass-name is available and the info is trivially re-constructible. This is needed as the pass-name is not the canonicalized name (i.e., not with number suffix etc), so the extra mapping from id to normalized name is needed. Thanks, David Thanks, Richard. David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems. David Richard. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David Index: doc/invoke.texi === --- doc
Re: -fdump-passes -fenable-xxx=func_name_list
Is this patch ok? Thanks, David On Wed, Jun 1, 2011 at 10:24 AM, Xinliang David Li davi...@google.com wrote: The attached is the split #1 patch that enhances -fenable/disable. Ok after testing? Thanks, David On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David
Re: -fdump-passes -fenable-xxx=func_name_list
Is this one ok? Thanks, David On Thu, Jun 2, 2011 at 12:12 AM, Xinliang David Li davi...@google.com wrote: This is the version of the patch that walks through pass lists. Ok with this one? David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems. David Richard. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David
Re: don't dump decl_uid with TDF_NOUID
Looks good to me, but you need an OK from a maintainer. Thanks, David On Fri, Jun 3, 2011 at 7:17 AM, Alexandre Oliva aol...@redhat.com wrote: A recent change introduced decl_uid in the “;; Function ” header in dump files. This breaks -fcompare-debug (and bootstrap-debug-lean), because decl uids aren't kept in sync between -g and non-g compilations. This patch rearranges the header so that decl_uid is omitted when dumping with NOUID, and it prints the header for the -fcompare-debug final dump with NOUID. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok? -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: -fdump-passes -fenable-xxx=func_name_list
This is the version of the patch that walks through pass lists. Ok with this one? David On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable? This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems. David Richard. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David dump-pass3.p Description: Binary data out Description: Binary data
Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)
ok for google/main. David On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote: This patch from Neil Vachharajani and Dehao Chen improves mcf by using minimum cost circulation instead of minimum cost flow to smooth profiles. It also introduces a parameter for controlling running time of the algorithm. This was what was originally presented in the academic work and handles certain cases where the function entry and exit have incorrect profile weights. For now, this is for google/main. Once I have collected performance results from SPEC I will propose this patch for trunk as well. Bootstraps and no test regressions. Ok for google/main? 2011-06-02 Neil Vachharajani nvach...@gmail.com, Dehao Chen daniel...@gmail.com * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document. * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS. (edge_type): Add SINK_SOURCE_EDGE. (dump_fixup_edge): Handle SINK_SOURCE_EDGE. (create_fixup_graph): Make problem miminum cost circulation. (cancel_negative_cycle): Update handling of infinite capacity. (compute_residual_flow): Update handling of infinite capacity. (find_max_flow): Update handling of infinite capacity. (modify_sink_source_capacity): New function. (find_minimum_cost_flow): Make problem miminum cost circulation. Use param PARAM_MIN_MCF_CANCEL_ITERS. * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 174456) +++ gcc/doc/invoke.texi (working copy) @@ -8341,6 +8341,12 @@ whether the result of a complex multipli The default is @option{-fno-cx-fortran-rules}. +@item min-mcf-cancel-iters +The minimum number of iterations of negative cycle cancellation during +MCF profile correction before early termination. This parameter is +only useful when using @option{-fprofile-correction}. + + @end table The following options control optimizations that may improve Index: gcc/mcf.c === --- gcc/mcf.c (revision 174456) +++ gcc/mcf.c (working copy) @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3. #include langhooks.h #include tree.h #include gcov-io.h +#include params.h +#include diagnostic-core.h #include profile.h @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3. #define COST(k, w) ((k) / mcf_ln ((w) + 2)) /* Limit the number of iterations for cancel_negative_cycles() to ensure reasonable compile time. */ -#define MAX_ITER(n, e) 10 + (100 / ((n) * (e))) +#define MAX_ITER(n, e) (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \ + (100 / ((n) * (e + typedef enum { - INVALID_EDGE, + INVALID_EDGE = 0, VERTEX_SPLIT_EDGE, /* Edge to represent vertex with w(e) = w(v). */ REDIRECT_EDGE, /* Edge after vertex transformation. */ REVERSE_EDGE, SOURCE_CONNECT_EDGE, /* Single edge connecting to single source. */ SINK_CONNECT_EDGE, /* Single edge connecting to single sink. */ + SINK_SOURCE_EDGE, /* Single edge connecting sink to source. */ BALANCE_EDGE, /* Edge connecting with source/sink: cp(e) = 0. */ REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge. */ REVERSE_NORMALIZED_EDGE /* Normalized edge for a reverse edge. */ @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph fputs ( @SINK_CONNECT_EDGE, file); break; + case SINK_SOURCE_EDGE: + fputs ( @SINK_SOURCE_EDGE, file); + break; + case REVERSE_EDGE: fputs ( @REVERSE_EDGE, file); break; @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi double k_neg = 0; /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v). */ gcov_type *diff_out_in = NULL; - gcov_type supply_value = 1, demand_value = 0; + gcov_type supply_value = 0, demand_value = 0; gcov_type fcost = 0; int new_entry_index = 0, new_exit_index = 0; int i = 0, j = 0; @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi fnum_vertices_after_transform + n_edges + n_basic_blocks + 2; /* In create_fixup_graph: Each basic block and edge can be split into 3 - edges. Number of balance edges = n_basic_blocks. So after - create_fixup_graph: - max_edges = 4 * n_basic_blocks + 3 * n_edges + edges. Number of balance edges = n_basic_blocks - 1. And there is 1 edge + connecting new_entry and new_exit, and 2 edges connecting new_entry to + entry, and exit to new_exit. So after create_fixup_graph: + max_edges = 4 * n_basic_blocks + 3 * n_edges + 2 Accounting for residual flow edges - max_edges = 2 * (4 * n_basic_blocks + 3 * n_edges) - = 8 * n_basic_blocks + 6 *
Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)
Counter overflow? David On Thu, Jun 2, 2011 at 11:12 AM, Martin Thuresson mart...@google.com wrote: On Thu, Jun 2, 2011 at 11:05 AM, Xinliang David Li davi...@google.com wrote: Smoothing works for sample FDO and profile data from multi-threaded programs. You won't see any difference in SPEC. Dehao reported some performance improvements from the algorithmic improvements he added in terms of extra fixup edges and handling of infinite capacity. Martin David On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote: This patch from Neil Vachharajani and Dehao Chen improves mcf by using minimum cost circulation instead of minimum cost flow to smooth profiles. It also introduces a parameter for controlling running time of the algorithm. This was what was originally presented in the academic work and handles certain cases where the function entry and exit have incorrect profile weights. For now, this is for google/main. Once I have collected performance results from SPEC I will propose this patch for trunk as well. Bootstraps and no test regressions. Ok for google/main? 2011-06-02 Neil Vachharajani nvach...@gmail.com, Dehao Chen daniel...@gmail.com * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document. * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS. (edge_type): Add SINK_SOURCE_EDGE. (dump_fixup_edge): Handle SINK_SOURCE_EDGE. (create_fixup_graph): Make problem miminum cost circulation. (cancel_negative_cycle): Update handling of infinite capacity. (compute_residual_flow): Update handling of infinite capacity. (find_max_flow): Update handling of infinite capacity. (modify_sink_source_capacity): New function. (find_minimum_cost_flow): Make problem miminum cost circulation. Use param PARAM_MIN_MCF_CANCEL_ITERS. * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 174456) +++ gcc/doc/invoke.texi (working copy) @@ -8341,6 +8341,12 @@ whether the result of a complex multipli The default is @option{-fno-cx-fortran-rules}. +@item min-mcf-cancel-iters +The minimum number of iterations of negative cycle cancellation during +MCF profile correction before early termination. This parameter is +only useful when using @option{-fprofile-correction}. + + @end table The following options control optimizations that may improve Index: gcc/mcf.c === --- gcc/mcf.c (revision 174456) +++ gcc/mcf.c (working copy) @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3. #include langhooks.h #include tree.h #include gcov-io.h +#include params.h +#include diagnostic-core.h #include profile.h @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3. #define COST(k, w) ((k) / mcf_ln ((w) + 2)) /* Limit the number of iterations for cancel_negative_cycles() to ensure reasonable compile time. */ -#define MAX_ITER(n, e) 10 + (100 / ((n) * (e))) +#define MAX_ITER(n, e) (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \ + (100 / ((n) * (e + typedef enum { - INVALID_EDGE, + INVALID_EDGE = 0, VERTEX_SPLIT_EDGE, /* Edge to represent vertex with w(e) = w(v). */ REDIRECT_EDGE, /* Edge after vertex transformation. */ REVERSE_EDGE, SOURCE_CONNECT_EDGE, /* Single edge connecting to single source. */ SINK_CONNECT_EDGE, /* Single edge connecting to single sink. */ + SINK_SOURCE_EDGE, /* Single edge connecting sink to source. */ BALANCE_EDGE, /* Edge connecting with source/sink: cp(e) = 0. */ REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge. */ REVERSE_NORMALIZED_EDGE /* Normalized edge for a reverse edge. */ @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph fputs ( @SINK_CONNECT_EDGE, file); break; + case SINK_SOURCE_EDGE: + fputs ( @SINK_SOURCE_EDGE, file); + break; + case REVERSE_EDGE: fputs ( @REVERSE_EDGE, file); break; @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi double k_neg = 0; /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v). */ gcov_type *diff_out_in = NULL; - gcov_type supply_value = 1, demand_value = 0; + gcov_type supply_value = 0, demand_value = 0; gcov_type fcost = 0; int new_entry_index = 0, new_exit_index = 0; int i = 0, j = 0; @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi fnum_vertices_after_transform + n_edges + n_basic_blocks + 2; /* In create_fixup_graph: Each basic block and edge can be split
Re: -fdump-passes -fenable-xxx=func_name_list
On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David
Re: -fdump-passes -fenable-xxx=func_name_list
The attached is the split #1 patch that enhances -fenable/disable. Ok after testing? Thanks, David On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David Index: doc/invoke.texi === --- doc/invoke.texi (revision 174424) +++ doc/invoke.texi (working copy) @@ -5056,11 +5056,12 @@ appended with a sequential number starti Disable rtl pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. @var{range-list} is a comma -seperated list of function ranges. Each range is a number pair seperated by a colon. -The range is inclusive in both ends. If the range is trivial, the number pair can be -simplified a a single number. If the function's cgraph node's @var{uid} is falling -within one of the specified ranges, the @var{pass} is disabled for that function. -The @var{uid} is shown in the function header of a dump file. +seperated list of function ranges or assembler names. Each range is a number +pair seperated by a colon. The range is inclusive in both ends. If the range +is trivial, the number pair can be simplified as a single number. If the +function's cgraph node's @var{uid} is falling within one of the specified ranges, +the @var{pass} is disabled for that function. The @var{uid} is shown in the +function header of a dump file. @item -fdisable-tree-@var{pass} @item -fdisable-tree-@var{pass}=@var{range-list} @@ -5090,7 +5091,8 @@ of option arguments. -fenable-tree-cunroll=1 # disable gcse2 for functions at the following ranges [1,1], # [300,400], and [400,1000] - -fdisable-rtl-gcse2=1:100,300,400:1000 +# disable gcse2 for functions foo and foo2 + -fdisable-rtl-gcse2=foo,foo2 # disable early inlining -fdisable-tree-einline # disable ipa inlining Index: testsuite/gcc.dg/inline_2.c === --- testsuite/gcc.dg/inline_2.c (revision 0) +++ testsuite/gcc.dg/inline_2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=0:3 -fdisable-ipa-inline } */ +int g; +__attribute__((always_inline)) void bar (void) +{ + g++; +} + +int foo (void) +{ + bar (); + return g; +} + +int foo2 (void) +{ + bar(); + return g + 1; +} + +/* { dg-final { scan-tree-dump-times bar 5 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ +/* { dg-excess-errors extra notes } */ Index: testsuite/gcc.dg/inline_6.c === --- testsuite/gcc.dg/inline_6.c (revision 0) +++ testsuite/gcc.dg/inline_6.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=foo2 -fdisable-ipa-inline } */ +int g; +__attribute__((always_inline)) void bar (void) +{ + g++; +} + +int foo (void) +{ + bar (); + return g; +} + +int foo2 (void) +{ + bar(); + return g + 1; +} + +/* { dg-final { scan-tree-dump-times bar 4 optimized
Re: -fdump-passes -fenable-xxx=func_name_list
The attached is patch-2 (-fdump-passes) and a sample output: Ok for trunk? David On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote: On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote: The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross. Yes, that was the original plan -- but it has problems 1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintanance burden; 2) the centralized dumper needs to be done after option processing 3) not sure if gate functions have any side effects or have dependencies on cfun The proposed solutions IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation. The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else). Ok. I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled. Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled. Thanks, David Richard. Thanks, David Index: doc/invoke.texi === --- doc/invoke.texi (revision 174535) +++ doc/invoke.texi (working copy) @@ -291,6 +291,7 @@ Objective-C and Objective-C++ Dialects}. -fdump-translation-unit@r{[}-@var{n}@r{]} @gol -fdump-class-hierarchy@r{[}-@var{n}@r{]} @gol -fdump-ipa-all -fdump-ipa-cgraph -fdump-ipa-inline @gol +-fdump-passes @gol -fdump-statistics @gol -fdump-tree-all @gol -fdump-tree-original@r{[}-@var{n}@r{]} @gol @@ -5060,7 +5061,8 @@ seperated list of function ranges. Each The range is inclusive in both ends. If the range is trivial, the number pair can be simplified a a single number. If the function's cgraph node's @var{uid} is falling within one of the specified ranges, the @var{pass} is disabled for that function. -The @var{uid} is shown in the function header of a dump file. +The @var{uid} is shown in the function header of a dump file, and pass names can be +dumped by using option @option{-fdump-passes}. @item -fdisable-tree-@var{pass} @item -fdisable-tree-@var{pass}=@var{range-list} @@ -5483,6 +5485,11 @@ Dump after function inlining. @end table +@item -fdump-passes +@opindex fdump-passes +Dump the list of optimization passes that are turned on and off by +the current command line options. + @item -fdump-statistics-@var{option} @opindex fdump-statistics Enable and control dumping of pass statistics in a separate file. The Index: common.opt === --- common.opt (revision 174535) +++ common.opt (working copy) @@ -1012,6 +1012,10 @@ fdump-noaddr Common Report Var(flag_dump_noaddr) Suppress output of addresses in debugging dumps +fdump-passes +Common Var(flag_dump_passes) Init(0) +Dump optimization passes + fdump-unnumbered Common Report Var(flag_dump_unnumbered) Suppress output of instruction numbers, line number notes and addresses in debugging dumps Index: passes.c === --- passes.c (revision 174536) +++ passes.c (working copy) @@ -478,7 +478,7 @@ passr_eq (const void *p1, const void *p2 return !strcmp (s1-unique_name, s2-unique_name); } -static htab_t pass_name_tab = NULL; +static htab_t name_to_pass_map = NULL; /* Register PASS with NAME. */ @@ -488,11 +488,11 @@ register_pass_name (struct opt_pass *pas struct pass_registry **slot; struct pass_registry pr; - if (!pass_name_tab) -pass_name_tab = htab_create (256, passr_hash, passr_eq, NULL); + if (!name_to_pass_map) +name_to_pass_map = htab_create (256, passr_hash, passr_eq, NULL); pr.unique_name = name; - slot = (struct pass_registry **) htab_find_slot (pass_name_tab, pr, INSERT); + slot = (struct pass_registry **) htab_find_slot (name_to_pass_map, pr, INSERT); if (!*slot) { struct pass_registry *new_pr; @@ -506,6 +506,117 @@ register_pass_name (struct opt_pass *pas return; /* Ignore plugin passes
Dump before flag
Hi, this is a simple patch that support dump_before flag. E.g, -fdump-tree-pre-before This is useful for diffing the the IR before and after a pass. Gcc dumping needs more cleanups -- such as allowing IR only dump, allowing IR dumping for a particular function etc. The exposure of 'dumpfile' (instead of a dumping_level () function) makes those change a little messy, but can be done. Ok for trunk? Thanks, David
Re: Dump before flag
Sorry about it. Here it is. David On Wed, Jun 1, 2011 at 1:36 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 10:26 PM, Xinliang David Li davi...@google.com wrote: Hi, this is a simple patch that support dump_before flag. E.g, -fdump-tree-pre-before This is useful for diffing the the IR before and after a pass. Gcc dumping needs more cleanups -- such as allowing IR only dump, allowing IR dumping for a particular function etc. The exposure of 'dumpfile' (instead of a dumping_level () function) makes those change a little messy, but can be done. Ok for trunk? -ENOPATCH Thanks, David 2011-06-01 David Li davi...@google.com * tree-dump.c: New dump flags. * tree-pass.h: New dump flags. * passes.c (execute_one_pass): Handle dump_before flag. Index: tree-dump.c === --- tree-dump.c (revision 174446) +++ tree-dump.c (working copy) @@ -808,6 +808,7 @@ struct dump_option_value_info in tree.h */ static const struct dump_option_value_info dump_options[] = { + {before, TDF_BEFORE}, {address, TDF_ADDRESS}, {asmname, TDF_ASMNAME}, {slim, TDF_SLIM}, Index: tree-pass.h === --- tree-pass.h (revision 174446) +++ tree-pass.h (working copy) @@ -83,6 +83,7 @@ enum tree_dump_index #define TDF_ALIAS (1 21) /* display alias information */ #define TDF_ENUMERATE_LOCALS (1 22) /* Enumerate locals by uid. */ #define TDF_CSELIB (1 23) /* Dump cselib details. */ +#define TDF_BEFORE (1 24) /* Dump IR before pass. */ /* In tree-dump.c */ Index: passes.c === --- passes.c (revision 174446) +++ passes.c (working copy) @@ -1563,6 +1563,13 @@ execute_one_pass (struct opt_pass *pass) initializing_dump = pass_init_dump_file (pass); + /* Override dump TODOs. */ + if (dump_file (pass-todo_flags_finish TODO_dump_func) + (dump_flags TDF_BEFORE)) +{ + pass-todo_flags_finish = ~TODO_dump_func; + pass-todo_flags_start |= TODO_dump_func; +} /* Run pre-pass verification. */ execute_todo (pass-todo_flags_start);
Re: Dump before flag
On Wed, Jun 1, 2011 at 2:12 PM, Basile Starynkevitch bas...@starynkevitch.net wrote: On Wed, 1 Jun 2011 13:26:24 -0700 Xinliang David Li davi...@google.com wrote: Hi, this is a simple patch that support dump_before flag. E.g, -fdump-tree-pre-before This is useful for diffing the the IR before and after a pass. Perhaps you forgot to actually attach the patch? Right -- attached in a follow up email. Gcc dumping needs more cleanups -- such as allowing IR only dump, allowing IR dumping for a particular function etc. The exposure of 'dumpfile' (instead of a dumping_level () function) makes those change a little messy, but can be done. I don't understand what you mean by a dumping_level () function. What should that hypothetical function do? (I'm wrongly guessing it would return an integer, but IIRC dumpfile is a FILE*) THere are two sources of dump: 1) IR dump performed by pass manager 2) pass specific debugging dump (the verbosity is controlled by -details flag). 2) is the part that is messy and needs cleanup. Every pass just checks if dump_file is null or not and decide to dump the debugging info -- there is no easy way to turn it on and off. Ideally, individual pass should call int debug_dump_level () -- dumps when it returns 0. With that in place, the dump flag -fdump-xxx-yyy-ir_only can be easily implemented -- it only turns on pass manager dump, but lowers the debug dump level to 0. David Regards -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
Re: New options to disable/enable any pass for any functions (issue4550056)
On Tue, May 31, 2011 at 2:05 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com wrote: The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? + +void +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function *fun) This function needs documentation, the ChangeLog entry misses the tree-pretty-print.h change. +struct function; instead of this please include coretypes.h from tree-pretty-print.h and add the struct function forward declaration there if it isn't already present. You change the output of the header, so please make sure you have bootstrapped and tested with _all_ languages included (and also watch for bugreports for target specific bugs). Ok. + fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d), + dname, aname, fun-funcdef_no, node-uid); I see no point in dumping funcdef_no - it wasn't dumped before in any place. Instead I miss dumping of the DECL_UID and thus a more specific 'uid', like 'cgraph-uid'. Ok will add decl_uid. Funcdef_no is very useful for debugging FDO coverage mismatch related problems as it is the id used in profile hashing. + aname = (IDENTIFIER_POINTER + (DECL_ASSEMBLER_NAME (fdecl))); using DECL_ASSEMBLER_NAME is bad - it might trigger computation of DECL_ASSEMBLER_NAME which certainly shouldn't be done only for dumping purposes. Instead do sth like if (DECL_ASSEMBLER_NAME_SET_P (fdecl)) aname = DECL_ASSEMBLER_NAME (fdecl); else aname = 'unset-asm-name'; Ok. and please also watch for cgraph_get_node returning NULL. Also please call the function dump_function_header instead of pass_dump_function_header. Ok. Thanks, David Please re-post with appropriate changes. Thanks, Richard. Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. isn't true as passes that are invoked only a single time lack the number suffix (yes, I'd
-fdump-passes -fenable-xxx=func_name_list
The following patch implements the a new option that dumps gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are note registered thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list of function assembler names to be specified. Ok for trunk? Thanks, David out Description: Binary data 2011-05-31 David Li davi...@google.com * passes.c (passr_eq): (register_pass_name): Change pass name table name. (get_pass_by_name): Ditto. (enable_disable_pass): Handle function assembler name. (is_pass_explicitly_enabled_or_disabled): Ditto. (execute_one_pass): Add pass dumping. (execute_pass_list): Track pass identation. (execute_ipa_pass_list): Ditto. (dump_one_pass): New function. (pass_traverse): Ditto. (enter_pass_list): Ditto. (exit_pass_list): Ditto. (is_valid_assembler_name): Ditto. 2011-05-31 David Li davi...@google.com * testsuite/gcc.dg/inline_2.c (revision 0): * testsuite/gcc.dg/inline_6.c (revision 0): * testsuite/gcc.dg/unroll_2.c (revision 0): * testsuite/gcc.dg/inline_3.c (revision 0): * testsuite/gcc.dg/unroll_3.c (revision 0): * testsuite/gcc.dg/inline_4.c (revision 0): * testsuite/gcc.dg/unroll_4.c (revision 0): * testsuite/gcc.dg/inline_1.c (revision 0): * testsuite/gcc.dg/inline_5.c (revision 0): * testsuite/gcc.dg/unroll_1.c (revision 0): Index: doc/invoke.texi === --- doc/invoke.texi (revision 174424) +++ doc/invoke.texi (working copy) @@ -291,6 +291,7 @@ Objective-C and Objective-C++ Dialects}. -fdump-translation-unit@r{[}-@var{n}@r{]} @gol -fdump-class-hierarchy@r{[}-@var{n}@r{]} @gol -fdump-ipa-all -fdump-ipa-cgraph -fdump-ipa-inline @gol +-fdump-passes @gol -fdump-statistics @gol -fdump-tree-all @gol -fdump-tree-original@r{[}-@var{n}@r{]} @gol @@ -5056,11 +5057,12 @@ appended with a sequential number starti Disable rtl pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. @var{range-list} is a comma -seperated list of function ranges. Each range is a number pair seperated by a colon. -The range is inclusive in both ends. If the range is trivial, the number pair can be -simplified a a single number. If the function's cgraph node's @var{uid} is falling -within one of the specified ranges, the @var{pass} is disabled for that function. -The @var{uid} is shown in the function header of a dump file. +seperated list of function ranges or assembler names. Each range is a number +pair seperated by a colon. The range is inclusive in both ends. If the range +is trivial, the number pair can be simplified as a single number. If the +function's cgraph node's @var{uid} is falling within one of the specified ranges, +the @var{pass} is disabled for that function. The @var{uid} is shown in the +function header of a dump file. @item -fdisable-tree-@var{pass} @item -fdisable-tree-@var{pass}=@var{range-list} @@ -5090,7 +5092,8 @@ of option arguments. -fenable-tree-cunroll=1 # disable gcse2 for functions at the following ranges [1,1], # [300,400], and [400,1000] - -fdisable-rtl-gcse2=1:100,300,400:1000 +# disable gcse2 for functions foo and foo2 + -fdisable-rtl-gcse2=foo,foo2 # disable early inlining -fdisable-tree-einline # disable ipa inlining @@ -5483,6 +5486,11 @@ Dump after function inlining. @end table +@item -fdump-passes +@optindex fdump-passes +Dump the list of optimization passes that are turned on and off by +the current command line options. + @item -fdump-statistics-@var{option} @opindex fdump-statistics Enable and control dumping of pass statistics in a separate file. The Index: testsuite/gcc.dg/inline_2.c === --- testsuite/gcc.dg/inline_2.c (revision 0) +++ testsuite/gcc.dg/inline_2.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=0:3 -fdisable-ipa-inline } */ +int g; +__attribute__((always_inline)) void bar (void) +{ + g++; +} + +int foo (void) +{ + bar (); + return g; +} + +int foo2 (void) +{ + bar(); + return g + 1; +} + +/* { dg-final { scan-tree-dump-times bar 5 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ +/* { dg-excess-errors extra notes } */ Index: testsuite/gcc.dg/inline_6.c === --- testsuite/gcc.dg/inline_6.c (revision 0) +++ testsuite/gcc.dg/inline_6.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=foo2 -fdisable-ipa-inline } */ +int g; +__attribute__((always_inline)) void bar (void) +{ + g++; +} + +int foo (void) +{ + bar
Re: New options to disable/enable any pass for any functions (issue4550056)
The new patch is attached. The test (c,c++,fortran, java, ada) is on going. Thanks, David On Tue, May 31, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote: On Tue, May 31, 2011 at 2:05 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com wrote: The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? + +void +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function *fun) This function needs documentation, the ChangeLog entry misses the tree-pretty-print.h change. +struct function; instead of this please include coretypes.h from tree-pretty-print.h and add the struct function forward declaration there if it isn't already present. You change the output of the header, so please make sure you have bootstrapped and tested with _all_ languages included (and also watch for bugreports for target specific bugs). Ok. + fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d), + dname, aname, fun-funcdef_no, node-uid); I see no point in dumping funcdef_no - it wasn't dumped before in any place. Instead I miss dumping of the DECL_UID and thus a more specific 'uid', like 'cgraph-uid'. Ok will add decl_uid. Funcdef_no is very useful for debugging FDO coverage mismatch related problems as it is the id used in profile hashing. + aname = (IDENTIFIER_POINTER + (DECL_ASSEMBLER_NAME (fdecl))); using DECL_ASSEMBLER_NAME is bad - it might trigger computation of DECL_ASSEMBLER_NAME which certainly shouldn't be done only for dumping purposes. Instead do sth like if (DECL_ASSEMBLER_NAME_SET_P (fdecl)) aname = DECL_ASSEMBLER_NAME (fdecl); else aname = 'unset-asm-name'; Ok. and please also watch for cgraph_get_node returning NULL. Also please call the function dump_function_header instead of pass_dump_function_header. Ok. Thanks, David Please re-post with appropriate changes. Thanks, Richard. Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler
Re: New options to disable/enable any pass for any functions (issue4550056)
Honza, are you ok with the pass name change? David On Tue, May 31, 2011 at 2:07 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, May 30, 2011 at 11:44 PM, Xinliang David Li davi...@google.com wrote: This is the complete patch for pass name fixes (with test case changes). This is ok if Honza thinks the profile pass names make more sense this way. Thanks, Richard. David On Mon, May 30, 2011 at 1:16 PM, Xinliang David Li davi...@google.com wrote: The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. isn't true as passes that are invoked only a single time lack the number suffix (yes, I'd really like that to be changed ...) Yes, pass with single static instance does not need number suffix. Please break likes also in .texi files and stick to 80 columns. Done. Please document that these options are debugging options and regular options for enabling/disabling passes should be used. I would suggest to group documentation differently and document -fenable-* and -fdisable-*, thus, + -fdisable-@var{kind}-@var{pass} + -fenable-@var{kind}-@var{pass} Even in .texi files please two spaces after a full-stop. Done +extern void enable_disable_pass (const char *, bool); I'd rather have both enable_pass and disable_pass ;) Ok. +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); that's odd and probably should be split out, the function should maybe reside in tree-pretty-print.c. Ok. Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 173837) +++ tree-ssa-loop-ivopts.c (working copy) @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d well - doesn't belong here ;) Sorry -- many things in the same client -- not needed with your latest change anyway. +static hashval_t +passr_hash
Re: New options to disable/enable any pass for any functions (issue4550056)
Please discard the previous one. This is the right one: David On Tue, May 31, 2011 at 5:01 PM, Xinliang David Li davi...@google.com wrote: The new patch is attached. The test (c,c++,fortran, java, ada) is on going. Thanks, David On Tue, May 31, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote: On Tue, May 31, 2011 at 2:05 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com wrote: The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? + +void +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function *fun) This function needs documentation, the ChangeLog entry misses the tree-pretty-print.h change. +struct function; instead of this please include coretypes.h from tree-pretty-print.h and add the struct function forward declaration there if it isn't already present. You change the output of the header, so please make sure you have bootstrapped and tested with _all_ languages included (and also watch for bugreports for target specific bugs). Ok. + fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d), + dname, aname, fun-funcdef_no, node-uid); I see no point in dumping funcdef_no - it wasn't dumped before in any place. Instead I miss dumping of the DECL_UID and thus a more specific 'uid', like 'cgraph-uid'. Ok will add decl_uid. Funcdef_no is very useful for debugging FDO coverage mismatch related problems as it is the id used in profile hashing. + aname = (IDENTIFIER_POINTER + (DECL_ASSEMBLER_NAME (fdecl))); using DECL_ASSEMBLER_NAME is bad - it might trigger computation of DECL_ASSEMBLER_NAME which certainly shouldn't be done only for dumping purposes. Instead do sth like if (DECL_ASSEMBLER_NAME_SET_P (fdecl)) aname = DECL_ASSEMBLER_NAME (fdecl); else aname = 'unset-asm-name'; Ok. and please also watch for cgraph_get_node returning NULL. Also please call the function dump_function_header instead of pass_dump_function_header. Ok. Thanks, David Please re-post with appropriate changes. Thanks, Richard. Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name
Re: New options to disable/enable any pass for any functions (issue4550056)
The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. isn't true as passes that are invoked only a single time lack the number suffix (yes, I'd really like that to be changed ...) Yes, pass with single static instance does not need number suffix. Please break likes also in .texi files and stick to 80 columns. Done. Please document that these options are debugging options and regular options for enabling/disabling passes should be used. I would suggest to group documentation differently and document -fenable-* and -fdisable-*, thus, + -fdisable-@var{kind}-@var{pass} + -fenable-@var{kind}-@var{pass} Even in .texi files please two spaces after a full-stop. Done +extern void enable_disable_pass (const char *, bool); I'd rather have both enable_pass and disable_pass ;) Ok. +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); that's odd and probably should be split out, the function should maybe reside in tree-pretty-print.c. Ok. Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 173837) +++ tree-ssa-loop-ivopts.c (working copy) @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d well - doesn't belong here ;) Sorry -- many things in the same client -- not needed with your latest change anyway. +static hashval_t +passr_hash (const void *p) +{ + const struct pass_registry *const s = (const struct pass_registry *const) p; + return htab_hash_string (s-unique_name); +} + +/* Hash equal function */ + +static int +passr_eq (const void *p1, const void *p2) +{ + const struct pass_registry *const s1 = (const struct pass_registry *const) p1; + const struct pass_registry *const s2 = (const struct pass_registry *const) p2; + + return !strcmp (s1-unique_name, s2-unique_name); +} you can use
[google] backport 174423, 174424 to google/main
Committed. David
Re: New options to disable/enable any pass for any functions (issue4550056)
This is the complete patch for pass name fixes (with test case changes). David On Mon, May 30, 2011 at 1:16 PM, Xinliang David Li davi...@google.com wrote: The attached are two simple follow up patches 1) the first patch does some refactorization on function header dumping (with more information printed) 2) the second patch cleans up some pass names. Part of the cleanup results from a previous discussion with Honza -- a) rename 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and 'profile' into 'profile_estimate'. The rest of cleanups are needed to make sure pass names are unique. Ok for trunk? Thanks, David On Fri, May 27, 2011 at 2:58 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com wrote: The latest version that only exports 2 functions from passes.c. Ok with ... @@ -637,4 +637,8 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void disable_pass (const char *); +extern void enable_pass (const char *); +struct function; + struct function forward decl removed. + explicitly_enabled = is_pass_explicitly_enabled (pass, func); + explicitly_disabled = is_pass_explicitly_disabled (pass, func); both functions inlined here and removed. +#define MAX_PASS_ID 512 this removed and instead a VEC_safe_grow_cleared () or VEC_length () before the accesses. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol +-fenable-tree-@var{pass}=@var{range-list} @gol -fenable-@var{kind}-@var{pass}, etc. +@item -fdisable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass} +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list} +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list} likewise. Thanks, Richard. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. isn't true as passes that are invoked only a single time lack the number suffix (yes, I'd really like that to be changed ...) Yes, pass with single static instance does not need number suffix. Please break likes also in .texi files and stick to 80 columns. Done. Please document that these options are debugging options and regular options for enabling/disabling passes should be used. I would suggest to group documentation differently and document -fenable-* and -fdisable-*, thus, + -fdisable-@var{kind}-@var{pass} + -fenable-@var{kind}-@var{pass} Even in .texi files please two spaces after a full-stop. Done +extern void enable_disable_pass (const char *, bool); I'd rather have both enable_pass and disable_pass ;) Ok. +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); that's odd and probably should be split out, the function should maybe reside in tree-pretty-print.c. Ok. Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 173837) +++ tree-ssa-loop-ivopts.c (working copy) @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d well - doesn't belong here ;) Sorry -- many things in the same client -- not needed with your latest change anyway. +static hashval_t +passr_hash (const void *p) +{ + const struct pass_registry *const s = (const struct pass_registry *const) p; + return htab_hash_string (s-unique_name); +} + +/* Hash equal function */ + +static int +passr_eq (const void *p1, const void *p2) +{ + const struct pass_registry *const s1 = (const struct pass_registry *const) p1
Re: New options to disable/enable any pass for any functions (issue4550056)
The latest version that only exports 2 functions from passes.c. David On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote: On Thu, May 26, 2011 at 2:04 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 25 May 2011, Xinliang David Li wrote: Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html I don't consider this an option handling patch. Patches adding whole new features involving new options should be reviewed by maintainers for the part of the compiler relevant to those features (since there isn't a pass manager maintainer, I guess that means middle-end). Hmm, I suppose then you reviewed the option handling parts and they are ok? Those globbing options always cause headache to me. +-fenable-ipa-@var{pass} @gol +-fenable-rtl-@var{pass} @gol +-fenable-rtl-@var{pass}=@var{range-list} @gol +-fenable-tree-@var{pass} @gol so, no -fenable-tree-@var{pass}=@var{range-list}? Missed. Will add. Does the pass name match 1:1 with the dump file name? In which case Yes. +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same pass is statically invoked in the compiler multiple times, the pass name should be appended with a sequential number starting from 1. isn't true as passes that are invoked only a single time lack the number suffix (yes, I'd really like that to be changed ...) Yes, pass with single static instance does not need number suffix. Please break likes also in .texi files and stick to 80 columns. Done. Please document that these options are debugging options and regular options for enabling/disabling passes should be used. I would suggest to group documentation differently and document -fenable-* and -fdisable-*, thus, + -fdisable-@var{kind}-@var{pass} + -fenable-@var{kind}-@var{pass} Even in .texi files please two spaces after a full-stop. Done +extern void enable_disable_pass (const char *, bool); I'd rather have both enable_pass and disable_pass ;) Ok. +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); that's odd and probably should be split out, the function should maybe reside in tree-pretty-print.c. Ok. Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 173837) +++ tree-ssa-loop-ivopts.c (working copy) @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d well - doesn't belong here ;) Sorry -- many things in the same client -- not needed with your latest change anyway. +static hashval_t +passr_hash (const void *p) +{ + const struct pass_registry *const s = (const struct pass_registry *const) p; + return htab_hash_string (s-unique_name); +} + +/* Hash equal function */ + +static int +passr_eq (const void *p1, const void *p2) +{ + const struct pass_registry *const s1 = (const struct pass_registry *const) p1; + const struct pass_registry *const s2 = (const struct pass_registry *const) p2; + + return !strcmp (s1-unique_name, s2-unique_name); +} you can use htab_hash_string and strcmp directly, no need for these wrappers. The hashtable entry is not string in this case. It is pass_registry, thus the wrapper. +void +register_pass_name (struct opt_pass *pass, const char *name) doesn't seem to need exporting, so don't and make it static. Done. + if (!pass_name_tab) + pass_name_tab = htab_create (10, passr_hash, passr_eq, NULL); see above, the initial size should be larger - we have 223 passes at the moment, so use 256. Done. + else + return; /* Ignore plugin passes. */ ? You mean they might clash? Yes, name clash. +struct opt_pass * +get_pass_by_name (const char *name) doesn't need exporting either. Done. + if (is_enable) + error (unrecognized option -fenable); + else + error (unrecognized option -fdisable); I think that should be fatal_error - Joseph? + if (is_enable) + error (unknown pass %s specified in -fenable, phase_name); + else + error (unknown pass %s specified in -fdisble, phase_name); likewise. + if (!enabled_pass_uid_range_tab) + enabled_pass_uid_range_tab = htab_create (10, pass_hash, pass_eq, NULL); instead of using a hashtable here please use a VEC indexed by the static_pass_number which shoud speed up Ok. The reason I did not use it is because in most of the cases, the htab will be very small -- it is determined by the number of passes specified in the command line, while the VEC requires allocating const size array. Not an issue as long as by default the array is not allocated. +static bool +is_pass_explicitly_enabled_or_disabled (struct opt_pass *pass, + tree func, htab_t tab
Re: New options to disable/enable any pass for any functions (issue4550056)
Ping. The link to the message: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html Thanks, David On Sun, May 22, 2011 at 4:17 PM, Xinliang David Li davi...@google.com wrote: Ping. David On Fri, May 20, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote: Ok to check in this one? Thanks, David On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 18 May 2011, David Li wrote: + error (Unrecognized option %s, is_enable ? -fenable : -fdisable); + error (Unknown pass %s specified in %s, + phase_name, + is_enable ? -fenable : -fdisable); Follow GNU Coding Standards for diagnostics (start with lowercase letter). + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Use separate calls to inform for the enable and disable cases, so that full sentences can be extracted for translation. + error (Invalid range %s in option %s, + one_range, + is_enable ? -fenable : -fdisable); GNU Coding Standards. + error (Invalid range %s in option %s, Likewise. + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Again needs GCS and i18n fixes. -- Joseph S. Myers jos...@codesourcery.com
Fix for PR 48988
Hi, the following trial patch fixed PR 48988 which is a regression. Bootstrap and tested on x86/linux. Verified the reported failure is fixed. Ok for trunk? David 2011-05-22 David Li davi...@google.com PR tree-optimization/48988 * tree-ssa-uninit.c (convert_control_dep_chain_into_preds): Initialize has_valid_pred for each pred chain. u.p Description: Binary data
Re: New options to disable/enable any pass for any functions (issue4550056)
Ping. David On Fri, May 20, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote: Ok to check in this one? Thanks, David On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 18 May 2011, David Li wrote: + error (Unrecognized option %s, is_enable ? -fenable : -fdisable); + error (Unknown pass %s specified in %s, + phase_name, + is_enable ? -fenable : -fdisable); Follow GNU Coding Standards for diagnostics (start with lowercase letter). + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Use separate calls to inform for the enable and disable cases, so that full sentences can be extracted for translation. + error (Invalid range %s in option %s, + one_range, + is_enable ? -fenable : -fdisable); GNU Coding Standards. + error (Invalid range %s in option %s, Likewise. + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Again needs GCS and i18n fixes. -- Joseph S. Myers jos...@codesourcery.com
Re: [google] Increase inlining limits with FDO/LIPO
On Fri, May 20, 2011 at 2:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, May 19, 2011 at 8:26 PM, Xinliang David Li davi...@google.com wrote: I have done some SPEC testing evaluating the performance impact of your patch. They look very positive. LIPO got helped even more than FDO (I only did SPEC2k LIPO testing). Did you also check impact on compile-time and code-size? Not yet for SPEC -- will pick some benchmarks and take a look. David Thanks, David 1. SPEC06 (C/C++) with FDO before after Improvement - 400.perlbench 27.4 28.2 2.89% --- 401.bzip2 18.1 18.2 0.28% 403.gcc 25.5 26.3 3.26% --- 429.mcf 26.0 26.0 0.08% 445.gobmk 22.6 23.2 2.30% 456.hmmer 20.1 19.8 -1.25% 458.sjeng 23.6 23.6 -0.42% 462.libquantum 57.1 56.9 -0.40% 464.h264ref 34.4 34.1 -0.70% 471.omnetpp 18.8 18.9 0.53% 473.astar 16.6 17.0 2.53% --- 483.xalancbmk 27.4 28.5 3.79% --- 999.specrand 94.9 98.4 3.71% --- 450.soplex 34.5 33.8 -2.00% 447.dealII 32.0 31.9 -0.34% 453.povray 25.9 28.3 9.02% --- 482.sphinx3 32.6 31.4 -3.50% 2. SPEC2k FDO before after Improvement -- 164.gzip 1308 1372 4.95% 175.vpr 1723 1805 4.76% 176.gcc 2407 2504 4.01% 181.mcf 1724 1748 1.38% 186.crafty 2292 2349 2.47% 197.parser 1457 1601 9.88% 252.eon 2557 2588 1.22% 253.perlbmk 2479 2574 3.83% 254.gap 1996 2013 0.84% 255.vortex 2683 2798 4.31% 256.bzip2 1833 1829 -0.26% 300.twolf 2321 2359 1.63% 188.ammp 771 766 -0.72% 183.equake 1071 1071 0.05% 179.art 2954 2979 0.85% 3. SPEC2k LIPO: before after Improvement - 164.gzip 1311 1405 7.18% 175.vpr 1732 1772 2.35% 176.gcc 2462 2559 3.96% 181.mcf 1723 1731 0.50% 186.crafty 2552 2662 4.33% 197.parser 1468 1671 13.78% 252.eon 2690 3000 11.49% 253.perlbmk 2545 2611 2.60% 254.gap 2097 2152 2.60% 255.vortex 2949 3719 26.11% 256.bzip2 1864 1935 3.78% 300.twolf 2371 2471 4.22% 188.ammp 771 774 0.41% 183.equake 1081 1081 -0.04% 179.art 2878 2884 0.24% 4. SPEC2k LIPO vs FDO before the change: FDO(before) LIPO(before) Improvement - 164.gzip 1308 1311 0.22% 175.vpr 1723 1732 0.53% 176.gcc 2407 2462 2.27% 181.mcf 1724 1723 -0.07% 186.crafty 2292 2552 11.32% 197.parser 1457
Re: New options to disable/enable any pass for any functions (issue4550056)
Ok to check in this one? Thanks, David On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 18 May 2011, David Li wrote: + error (Unrecognized option %s, is_enable ? -fenable : -fdisable); + error (Unknown pass %s specified in %s, + phase_name, + is_enable ? -fenable : -fdisable); Follow GNU Coding Standards for diagnostics (start with lowercase letter). + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Use separate calls to inform for the enable and disable cases, so that full sentences can be extracted for translation. + error (Invalid range %s in option %s, + one_range, + is_enable ? -fenable : -fdisable); GNU Coding Standards. + error (Invalid range %s in option %s, Likewise. + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Again needs GCS and i18n fixes. -- Joseph S. Myers jos...@codesourcery.com
Re: [google] Increase inlining limits with FDO/LIPO
Some code size and timing number (profile use compile) are collected. Summary: Compile time for profile-use compilation increase for all cases -- this is probably not a big issue as this is for peak performance. It is more interesting to look at the size numbers. C++ program size actually decrease in many cases (which match the results from our internal benchmarks). For some benchmark such as povray, it reduce quite a bit (either due to more DFE, or better cleanups -- have not looked at details). For C programs, especially smaller ones, the size increase (some significantly). David 1. xalancbmk: Size (total size of object files, not text size which is correlated) Before: 17,801,488 After : 17,252,032 Change: -3% Time (make all with -j1) Before: 248 seconds After : 286 seconds Change: 15% 2. povray: Size Before: 2,855,690 After: 2,113,754 Change: -25% Time: Before: 36.5s After:43.2 s Change: +18% 3. eon Size Before: 3,988,460 After :4,099,108 Change: +2.8% Time Before: 53.1s After:56.8s Change: +6.9% 4. gcc Size Before: 14,162,908 After:15,089,106 Change: +6.5% Time Before: 57.3s After:60.7s Change: +5.9% 5. Parser: Size Before: 1,175,509 After: 1,568,135 Change: +33.4% Time Before: 5.3s After: 12.9s Change: +143% On Fri, May 20, 2011 at 8:53 AM, Xinliang David Li davi...@google.com wrote: On Fri, May 20, 2011 at 2:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, May 19, 2011 at 8:26 PM, Xinliang David Li davi...@google.com wrote: I have done some SPEC testing evaluating the performance impact of your patch. They look very positive. LIPO got helped even more than FDO (I only did SPEC2k LIPO testing). Did you also check impact on compile-time and code-size? Not yet for SPEC -- will pick some benchmarks and take a look. David Thanks, David 1. SPEC06 (C/C++) with FDO before after Improvement - 400.perlbench 27.4 28.2 2.89% --- 401.bzip2 18.1 18.2 0.28% 403.gcc 25.5 26.3 3.26% --- 429.mcf 26.0 26.0 0.08% 445.gobmk 22.6 23.2 2.30% 456.hmmer 20.1 19.8 -1.25% 458.sjeng 23.6 23.6 -0.42% 462.libquantum 57.1 56.9 -0.40% 464.h264ref 34.4 34.1 -0.70% 471.omnetpp 18.8 18.9 0.53% 473.astar 16.6 17.0 2.53% --- 483.xalancbmk 27.4 28.5 3.79% --- 999.specrand 94.9 98.4 3.71% --- 450.soplex 34.5 33.8 -2.00% 447.dealII 32.0 31.9 -0.34% 453.povray 25.9 28.3 9.02% --- 482.sphinx3 32.6 31.4 -3.50% 2. SPEC2k FDO before after Improvement -- 164.gzip 1308 1372 4.95% 175.vpr 1723 1805 4.76% 176.gcc 2407 2504 4.01% 181.mcf 1724 1748 1.38% 186.crafty 2292 2349 2.47% 197.parser 1457 1601 9.88% 252.eon 2557 2588 1.22% 253.perlbmk 2479 2574 3.83% 254.gap 1996 2013 0.84% 255.vortex 2683 2798 4.31% 256.bzip2 1833 1829 -0.26% 300.twolf 2321 2359 1.63% 188.ammp 771 766 -0.72% 183.equake 1071 1071 0.05% 179.art 2954 2979 0.85% 3. SPEC2k LIPO: before after Improvement - 164.gzip 1311 1405 7.18% 175.vpr 1732 1772 2.35% 176.gcc 2462 2559 3.96% 181.mcf 1723 1731 0.50% 186.crafty
Re: New options to disable/enable any pass for any functions (issue4550056)
On Thu, May 19, 2011 at 11:04 AM, Andi Kleen a...@firstfloor.org wrote: davi...@google.com (David Li) writes: -fdisable-tree-ccp1 --- disable ccp1 for all functions -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraphnode uid is 1 -fdisable-rtl-gcse2=1:100,300,400:1000 -- disable gcse2 for functions at the following ranges [1,1], [300,400], and [400,1000] How are the ranges defined? I doubt numbers are a good interface here to specify functions. The numbers are cgraph uids for the functions. The form that takes the range is mainly for developers who use scripts to auto search (bug triaging and optimization space traverse). This would be better done with #pragmas? This is not good for automation. However, I do plan (later after this patch is in) to add another form of the option which takes a comma separated list of assembler names --- this form will be quite useful to workaround compiler bugs temporarily in the make file. David -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [google] Increase inlining limits with FDO/LIPO
9.91% 256.bzip218331864 1.67% 300.twolf23212371 2.15% 188.ammp 771 771 -0.04% 183.equake10711081 1.02% 179.art29542878 -2.59% 5. SPEC2k LIPO vs FDO after the change: FDO(after)LIPO(after) Improvement 164.gzip13721405 2.36% 175.vpr18051772 -1.79% 176.gcc25042559 2.21% 181.mcf17481731 -0.94% 186.crafty23492662 13.35% 197.parser16011671 4.38% 252.eon25883000 15.88% 253.perlbmk25742611 1.44% 254.gap20132152 6.87% 255.vortex27983719 32.88% 256.bzip218291935 5.79% 300.twolf23592471 4.75% 188.ammp 766 774 1.10% 183.equake10711081 0.92% 179.art29792884 -3.18% On Wed, May 18, 2011 at 12:53 PM, Mark Heffernan meh...@google.com wrote: Verified identical binaries created and submitted. Mark On Wed, May 18, 2011 at 11:37 AM, Xinliang David Li davi...@google.com wrote: Ok with that change to google/main with some retesting. David On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote: On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com wrote: The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. Ah, yes. I see your point now. Reverted to the original change with condition profile_arc_flag and flag_branch_probabilities. Mark David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts-x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts-x_param_values, opts_set-x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts-x_param_values, opts_set-x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts-x_flag_unswitch_loops = value; if (!opts_set-x_flag_gcse_after_reload) opts-x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set-x_flag_ipa_reference in_lto_p) opts-x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
Re: New options to disable/enable any pass for any functions (issue4550056)
On Thu, May 19, 2011 at 11:17 AM, Andi Kleen a...@firstfloor.org wrote: On Thu, May 19, 2011 at 11:10:24AM -0700, Xinliang David Li wrote: On Thu, May 19, 2011 at 11:04 AM, Andi Kleen a...@firstfloor.org wrote: davi...@google.com (David Li) writes: -fdisable-tree-ccp1 --- disable ccp1 for all functions -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraphnode uid is 1 -fdisable-rtl-gcse2=1:100,300,400:1000 -- disable gcse2 for functions at the following ranges [1,1], [300,400], and [400,1000] How are the ranges defined? I doubt numbers are a good interface here to specify functions. The numbers are cgraph uids for the functions. The form that takes the range is mainly for developers who use scripts to auto search (bug triaging and optimization space traverse). How about function names? Not so easy for things like binary search. This would be better done with #pragmas? This is not good for automation. However, I do plan (later after this Why not? It should be easy enough to write a script to add such pragmas given a dwarf2 symbol-file:line dump This may require changes many many sources files as compared to a simple command line tweak -- for most of the cases, you don't even need to know the id ranges, and can be blindly set to a large initial value. I think pragmas would be a lot more useful for non compiler developers. As compared with the -fdisable-tree-xxx=func1,func2? The semantics of the pragma is also in question -- does it apply to all inline instances of the function with pragma or just the emitted instance? (The pragma scheme can be implemented independently if it is considered useful). Thanks, David -Andi
Re: [google] Increase inlining limits with FDO/LIPO
To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). David On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan meh...@google.com wrote: This small patch greatly expands the function size limits for inlining with FDO/LIPO. With profile information, the inliner is much more selective and precise and so the limits can be increased with less worry that functions and total code size will blow up. This speeds up x86-64 internal benchmarks by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to 1.5% with FDO. Size increase is negligible (0.1% mean). Bootstrapped and regression tested on x86-64. Trunk testing to follow. Ok for google/main? Mark 2011-05-17 Mark Heffernan meh...@google.com * opts.c (finish_options): Increase inlining limits with profile generate and use. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts opts-x_flag_split_stack = 0; } } + + if (opts-x_flag_profile_use + || opts-x_profile_arc_flag + || opts-x_flag_profile_values) + { + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. Changes must be added to both + the generate and use builds to avoid profile mismatches. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts-x_param_values, opts_set-x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts-x_param_values, opts_set-x_param_values); + } }
Re: [google] Increase inlining limits with FDO/LIPO
Though not common, people can do this: 1. for profile gen: gcc -fprofile-arcs ... 2. for profile use gcc -fbranch-probabilities ... The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts-x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts-x_param_values, opts_set-x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts-x_param_values, opts_set-x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts-x_flag_unswitch_loops = value; if (!opts_set-x_flag_gcse_after_reload) opts-x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set-x_flag_ipa_reference in_lto_p) opts-x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
Re: [google] Increase inlining limits with FDO/LIPO
Ok with that change to google/main with some retesting. David On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote: On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com wrote: The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. Ah, yes. I see your point now. Reverted to the original change with condition profile_arc_flag and flag_branch_probabilities. Mark David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts-x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts-x_param_values, opts_set-x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts-x_param_values, opts_set-x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts-x_flag_unswitch_loops = value; if (!opts_set-x_flag_gcse_after_reload) opts-x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set-x_flag_ipa_reference in_lto_p) opts-x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
Re: New options to disable/enable any pass for any functions (issue4550056)
Thanks for the comment. Will fix those. David On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 18 May 2011, David Li wrote: + error (Unrecognized option %s, is_enable ? -fenable : -fdisable); + error (Unknown pass %s specified in %s, + phase_name, + is_enable ? -fenable : -fdisable); Follow GNU Coding Standards for diagnostics (start with lowercase letter). + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Use separate calls to inform for the enable and disable cases, so that full sentences can be extracted for translation. + error (Invalid range %s in option %s, + one_range, + is_enable ? -fenable : -fdisable); GNU Coding Standards. + error (Invalid range %s in option %s, Likewise. + inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of [%u, %u]\n, + is_enable? Enable:Disable, phase_name, new_range-start, new_range-last); Again needs GCS and i18n fixes. -- Joseph S. Myers jos...@codesourcery.com
Re: New options to disable/enable any pass for any functions (issue4550056)
Will fix the Changelog, and add documentation. Thanks, David On Wed, May 18, 2011 at 1:26 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote: In gcc, not all passes have user level control to turn it on/off, and there is no way to flip on/off the pass for a subset of functions. I implemented a generic option handling scheme in gcc to allow disabling/enabling any gcc pass for any specified function(s). The new options will be very useful for things like performance experiments and bug triaging (gcc has dbgcnt mechanism, but not all passes have the counter). The option syntax is very similar to -fdump- options. The following are some examples: -fdisable-tree-ccp1 --- disable ccp1 for all functions -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraphnode uid is 1 -fdisable-rtl-gcse2=1:100,300,400:1000 -- disable gcse2 for functions at the following ranges [1,1], [300,400], and [400,1000] -fdisable-tree-einline -- disable early inlining for all callers -fdisable-ipa-inline -- disable ipa inlininig In the gcc dumps, the uid numbers are displayed in the function header. The options are intended to be used internally by gcc developers. Ok for trunk ? (There is a little LIPO specific change that can be removed). David 2011-05-18 David Li davi...@google.com * final.c (rest_of_clean_state): Call function header dumper. * opts-global.c (handle_common_deferred_options): Handle new options. * tree-cfg.c (gimple_dump_cfg): Call function header dumper. * passes.c (register_one_dump_file): Call register_pass_name. (pass_init_dump_file): Call function header dumper. (execute_one_pass): Check explicit enable/disable flag. (passr_hash): New function. (passr_eq): (register_pass_name): (get_pass_by_name): (pass_hash): (pass_eq): (enable_disable_pass): (is_pass_explicitly_enabled_or_disabled): (is_pass_explicitly_enabled): (is_pass_explicitly_disabled): Bogus changelog entry. New options need documenting in doc/invoke.texi. Richard. Index: tree-pass.h === --- tree-pass.h (revision 173635) +++ tree-pass.h (working copy) @@ -644,4 +644,12 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void enable_disable_pass (const char *, bool); +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree); +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree); +extern void register_pass_name (struct opt_pass *, const char *); +extern struct opt_pass *get_pass_by_name (const char *); +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); + #endif /* GCC_TREE_PASS_H */ Index: final.c === --- final.c (revision 173635) +++ final.c (working copy) @@ -4456,19 +4456,7 @@ rest_of_clean_state (void) } else { - const char *aname; - struct cgraph_node *node = cgraph_node (current_function_decl); - - aname = (IDENTIFIER_POINTER - (DECL_ASSEMBLER_NAME (current_function_decl))); - fprintf (final_output, \n;; Function (%s) %s\n\n, aname, - node-frequency == NODE_FREQUENCY_HOT - ? (hot) - : node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED - ? (unlikely executed) - : node-frequency == NODE_FREQUENCY_EXECUTED_ONCE - ? (executed once) - : ); + pass_dump_function_header (final_output, current_function_decl, cfun); flag_dump_noaddr = flag_dump_unnumbered = 1; if (flag_compare_debug_opt || flag_compare_debug) Index: common.opt === --- common.opt (revision 173635) +++ common.opt (working copy) @@ -1018,6 +1018,14 @@ fdiagnostics-show-option Common Var(flag_diagnostics_show_option) Init(1) Amend appropriate diagnostic messages with the command line option that controls them +fdisable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass + +fenable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass + fdump- Common Joined RejectNegative Var(common_deferred_options) Defer -fdump-type Dump various compiler internals to a file Index: opts-global.c === ---
Re: [google] Parameterize function overhead estimate for inlining
You will have a followup patch to override arm defaults, right? Ok for google/main. Thanks, David On Tue, May 17, 2011 at 9:29 PM, Mark Heffernan meh...@google.com wrote: This tiny change improves the size estimation for inlining and results in an average 1% size reduction and a small (maybe 0.25% geomean) performance increase on internal benchmarks on x86-64. I parameterized the value rather than changing it directly because previous exploration with x86 and ARM arches indicated that it varies significantly with architecture. Default value is tuned for x86-64. Bootstrapped and tested on x86-64. Will explore relevance and effectiveness for trunk and SPEC later. Ok for google/main? Mark 2011-05-17 Mark Heffernan meh...@google.com * ipa-inline.c (estimate_function_body_sizes): Parameterize static function static overhead. * params.def (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE): New parameter. Index: ipa-inline.c === --- ipa-inline.c (revision 173845) +++ ipa-inline.c (working copy) @@ -1979,10 +1979,11 @@ estimate_function_body_sizes (struct cgr gcov_type time = 0; gcov_type time_inlining_benefit = 0; /* Estimate static overhead for function prologue/epilogue and alignment. */ - int size = 2; + int size = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); /* Benefits are scaled by probability of elimination that is in range 0,2. */ - int size_inlining_benefit = 2 * 2; + int size_inlining_benefit = + PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE) * 2; basic_block bb; gimple_stmt_iterator bsi; struct function *my_function = DECL_STRUCT_FUNCTION (node-decl); Index: params.def === --- params.def (revision 173845) +++ params.def (working copy) @@ -110,6 +110,11 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO Inline recursively only when the probability of call being executed exceeds the parameter, 10, 0, 0) +DEFPARAM (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE, + inline-function-overhead-size, + Size estimate of function overhead (prologue and epilogue) for inlining purposes, + 7, 0, 0) + /* Limit of iterations of early inliner. This basically bounds number of nested indirect calls early inliner can resolve. Deeper chains are still handled by late inlining. */
Re: [google] support for building Linux kernel with FDO (issue4523061)
On Fri, May 13, 2011 at 5:54 AM, Paolo Bonzini bonz...@gnu.org wrote: On 05/13/2011 03:03 AM, Rong Xu wrote: * gcc/coverage.c (revision 173717): set a flag if building for Linux kernel. * gcc/tree-profile.c (revision 173717): don't emit TLS declarations for Linux kernel builds. I think this should be done without touching at all the profiling machinery in GCC. 1) add a new TLS model -ftls-model=none and make the kernel uses it. The model would simply force targetm.have_tls to false. This is a good idea. 2) as Richi mentioned, gcov-io and libgcov changes then can move to the kernel, and GCC needs no change at all here. In fact -- reuse gcc code profiling machinery for FDO is the KEY objective in this effort --- the effort tries to minimize gcc changes by refactoring gcc code and isolating/abstracting away part of gcc implementation that is user space program specific without using runtime hooks. Aside from the kernel FDO change -- the refactoring itself actually makes the libgcov code more readable -- the existing implementation has many huge functions etc. Kernel source has their implementation of coverage testing -- but it makes lots of data structure assumptions and hard coded -- it can easily out of sync with gcc and is considered unmaintainable. Rong will have more explanation on the design. Thanks, David BTW, these parts of LIPO: + if (!is_kernel_build) + DECL_TLS_MODEL (dc_gcov_type_ptr_var) = + decl_default_tls_model (dc_gcov_type_ptr_var); dc_void_ptr_var = build_decl (UNKNOWN_LOCATION, VAR_DECL, @@ -1488,8 +1493,9 @@ ptr_void); DECL_ARTIFICIAL (dc_void_ptr_var) = 1; DECL_EXTERNAL (dc_void_ptr_var) = 1; - DECL_TLS_MODEL (dc_void_ptr_var) = - decl_default_tls_model (dc_void_ptr_var); + if (!is_kernel_build) + DECL_TLS_MODEL (dc_void_ptr_var) = + decl_default_tls_model (dc_void_ptr_var); Probably are missing a !targetm.have_tls. Paolo
[google] temporary fix to lipo build problems using check enabled compiler
The following patch temporarily disable some of the checking which is not fully 'lipo' aware. It will be checked into google/main and further cleanups will follow. David 2011-05-10 David Li davi...@google.com * cgraphunit.c (revision 173635) (verify_cgraph_node): (cgraph_mark_functions_to_output): * tree-cfg.c (revision 173635) (verify_stmt): Index: cgraphunit.c === --- cgraphunit.c (revision 173635) +++ cgraphunit.c (working copy) @@ -480,6 +480,10 @@ verify_cgraph_node (struct cgraph_node * if (seen_error ()) return; + /* Disable checking for LIPO for now. */ + if (L_IPO_COMP_MODE) +return; + timevar_push (TV_CGRAPH_VERIFY); for (e = node-callees; e; e = e-next_callee) if (e-aux) @@ -1315,7 +1319,8 @@ cgraph_mark_functions_to_output (void) are inside partition, we can end up not removing the body since we no longer have analyzed node pointing to it. */ !node-in_other_partition - !(DECL_EXTERNAL (decl) || cgraph_is_aux_decl_external (node))) + !(DECL_EXTERNAL (decl) || cgraph_is_aux_decl_external (node)) + !L_IPO_COMP_MODE) { dump_cgraph_node (stderr, node); internal_error (failed to reclaim unneeded function); @@ -1331,7 +1336,7 @@ cgraph_mark_functions_to_output (void) } #ifdef ENABLE_CHECKING - if (check_same_comdat_groups) + if (check_same_comdat_groups !L_IPO_COMP_MODE) for (node = cgraph_nodes; node; node = node-next) if (node-same_comdat_group !node-process) { Index: tree-cfg.c === --- tree-cfg.c (revision 173635) +++ tree-cfg.c (working copy) @@ -4159,6 +4159,10 @@ verify_stmt (gimple_stmt_iterator *gsi) gimple stmt = gsi_stmt (*gsi); int lp_nr; + /* TODO: Disable for now. */ + if (L_IPO_COMP_MODE) +return false; + if (is_gimple_omp (stmt)) { /* OpenMP directives are validated by the FE and never operated
Re: [google]Implement an optimization based on reuse distance profiling (issue4517049)
Ok. The instrumentation and optimization runtime has not been open sourced yet -- will need to be done later at some point. (For reference, see Silvius's CGO2011 paper). Thanks, David On Mon, May 9, 2011 at 2:49 PM, Easwaran Raman era...@google.com wrote: This patch by Silvius Rus replaces calls to certain functions with a specialized version that uses non-temporal stores based on memory reuse distance profiling. Bootstraps, no test regressions and the profiling works for a small test case. Ok for google/main.? -Easwaran 2011-05-09 Silvius Rus silvius@gmail.com * value-prof.c (interesting_stringop_to_profile_p): Disbale stringop profiling if reuse distance profiling is turned on. * value-prof.h (gimple_gen_reusedist, optimize_reusedist): Declare. * gcov-io.h: (GCOV_COUNTER_REUSE_DIST): New counter. (GCOV_COUNTERS): Update. (GCOV_COUNTER_NAMES): Add reuse_distance. (GCOV_MERGE_FUNCTIONS): Add __gcov_merge_reusedist. (__gcov_merge_reusedist): New declaration. * profile.c (branch_prob): Enable reuse distance profiling and optimization. * common.opt (fprofile-reusedist, foptimize-locality): New options. * tree-profile.c: Include params.h. (stringop_subst, reusedist_t): New structures. (stringop_subst_t, reusedist_t): New typedefs. (stringop_decl): New global array. (RD_NUM_COUNTERS): New constant. (get_stringop_subst, reusedist_is_interesting_call) (reusedist_instr_func_name, reusedist_get_instr_decl) (reusedist_make_instr_call, reusedist_from_counters) (gimple_gen_reusedist, nt_op_name, reusedist_get_size_threshold) (reusedist_get_distance_large_threshold) (reusedist_get_distance_small_threshold) (reusedist_get_count_threshold, reusedist_nt_is_worth_it) (reusedist_get_nt_decl, maybe_issue_profile_use_note) (reusedist_maybe_replace_with_nt_version, optimize_reusedist): New functions. (gate_tree_profile_ipa): Return true if reuse distance instrumentation or use is turned on. * Makefile.in (LIBGCOV): Add _gcov_merge_reusedist. * libgcov.c (__gcov_weighted_mean2, __gcov_merge_reusedist): New functions. * params.def (PARAM_REUSEDIST_MEAN_DIST_LARGE_THRESH) (PARAM_REUSEDIST_MEAN_DIST_SMALL_THRESH) (PARAM_REUSEDIST_CALL_COUNT_THRESH, PARAM_REUSEDIST_MEMCPY_SIZE_THRESH) (PARAM_REUSEDIST_MEMSET_SIZE_THRESH): New params. Index: gcc/value-prof.c === --- gcc/value-prof.c (revision 173496) +++ gcc/value-prof.c (working copy) @@ -1708,6 +1708,14 @@ interesting_stringop_to_profile_p (tree fndecl, gi { enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl); + /* Disable stringop collection with reuse distance instrumentation + or optimization. Otherwise we end up with hard to correct profile + mismatches for functions where reuse distance-based transformation are + made. We see a number of memcpy at instrumentation time and a different + number of memcpy at profile use time. */ + if (flag_profile_reusedist || flag_optimize_locality) + return false; + if (fcode != BUILT_IN_MEMCPY fcode != BUILT_IN_MEMPCPY fcode != BUILT_IN_MEMSET fcode != BUILT_IN_BZERO) return false; Index: gcc/value-prof.h === --- gcc/value-prof.h (revision 173496) +++ gcc/value-prof.h (working copy) @@ -103,6 +103,8 @@ extern void gimple_gen_const_delta_profiler (histo unsigned, unsigned); extern void gimple_gen_average_profiler (histogram_value, unsigned, unsigned); extern void gimple_gen_ior_profiler (histogram_value, unsigned, unsigned); +extern void gimple_gen_reusedist (void); +extern void optimize_reusedist (void); /* In profile.c. */ extern void init_branch_prob (void); Index: gcc/gcov-io.h === --- gcc/gcov-io.h (revision 173496) +++ gcc/gcov-io.h (working copy) @@ -374,7 +374,8 @@ typedef HOST_WIDEST_INT gcov_type; #define GCOV_LAST_VALUE_COUNTER 8 /* The last of counters used for value profiling. */ #define GCOV_COUNTER_DIRECT_CALL 9 /* Direct call counts. */ -#define GCOV_COUNTERS 10 +#define GCOV_COUNTER_REUSE_DIST 10 /* Reuse distance measure. */ +#define GCOV_COUNTERS 11 /* Number of counters used for value profiling. */ #define GCOV_N_VALUE_COUNTERS \ @@ -383,7 +384,8 @@ typedef HOST_WIDEST_INT gcov_type; /* A list of human readable names of the counters */ #define GCOV_COUNTER_NAMES {arcs, interval, pow2, single, \ delta,indirect_call, average, ior, \ -
Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)
On Sat, May 7, 2011 at 5:46 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, May 6, 2011 at 7:57 PM, Xinliang David Li davi...@google.com wrote: I want propose a more general solution. 1) Generic Annotation Support for gcc IR -- it is used attach to application/optimization specific annotation to gimple statements and annotations can be passed around across passes. In gcc, I only see HISTOGRAM annotation for value profiling, which is not general enough 2) Support of CallInfo for each callsite. This is an annotation, but more standardized. The callinfo can be used to record information such as call attributes, call side effects, mod-ref information etc --- current gimple_call_flags can be folded into this Info structure. I don't like generic annotation facilities. What should passes to with annotated stmts that are a) transformed, b) removed? See RTL notes and all the interesting issues they cause. Then how do you store information that needs to be passed across optimization passes -- you can not possibly dump all of them into the core IR. In fact, anything that is derived from (via analysis) but not part of the core IR need to worry about update and maintenance. In current GIMPLE, we can find many such instances -- DU chains, Memory SSA, control flow information, as well as flags like visited, no_warning, PLF (?), etc. Have a unified way of representing them is a good thing so that 1) make the IR lean and mean; 2) avoid too many different side data structures. The important thing is to have a good verifier to catch insanity and inconsistency of the annotation after each pass. Well, as soon as you have a verifier it's no longer generic but _is_ part of the core IL. Similar to how EH info is part of the core IL, or alias info, or loop information. The difference is that no-core IL can be thrown away anytime and be recomputed if needed. David Richard.