Re: new patches using -fopt-info (issue5294043)

2011-10-24 Thread Xinliang David Li
 Well, you seem to keep not reading what I write.  I am not opposed
 to adding -fopt-info/report nor to funnel messages to stdout/err.  What
 I am opposed is the way you want to introduce them.  I want you to
 fix what we dump into dump files, so that both -fopt-report and -fopt-info
 can be implemented by outputting selected pieces of the dump file
 to stdout/stderr.  We already have -fdump-*-stats which supposedly
 could match -fopt-report, and the default -fdump-* should be what
 goes to -fopt-info (minus the function bodies, of course).

That sounds good. What you propose seems like

-fdump-pass-[ir_only|transformation|debug]-stderr

and -fopt-info is a short cut for
-fdump-tree-all-transformations-stderr
-fdump-ipa-all-tranformations-stderr
-fdump-rtl-all-transformations-stderr


thanks,

David






 Yes, dump files are a mess.  So - why not clean them up, and at the
 same time annotate dump file pieces so _automatic_ filtering and
 redirecting to stdout with something like -fopt-report would do something
 sensible?  I don't see why dump files have to stay messy while you at
 the same time would need to add _new_ code to dump to stdout for
 -fopt-report.

 In my mind, I would like to separate all dumps into three categories.

 1) IR dumps, and support dump before and after (this reminds me my
 patches are still pending :) )    -fdump-tree-pre-[before|after]-
  Dump into .after, .before files
 2) debug tracing etc:        -fdump-tree-pre-debug-...          Dump
 into .debug files.
 3) opt report : -fdump-opt or -fopt-report

 Changes for 1) and 2) are mechanic but requires lots of work.

 You can do that, but I want the passes to use a single mechanism to
 feed all three separated dumps.


 Can you elaborate on single mechanism here? A set of well defined
 dumping APIs (instead of free form of  if (dump_file) fprintf
 (dump_file, ...) ) ?

 Well, design one that will work.  But yes, a set of well-defined
 dumping APIs, like

  print_start_{loop,location,region,...} (...);
  print_end_{loop...} (...);

 or so.

   debug_print (message, dump_flags, message_verbose_level, ...)

 Rather instead of verbosity levels use TDF_* flags (with maybe
 reorganizing them a bit) internally, a verbosity level can be
 implemented ontop of that by -fopt-{info,report} if needed.

   trace_enter (trace_header_note)
   trace_exit (trace_header_not)
   opt_info_print (location, message_template, insertion)

 Or  how dump files are organized?

 I am all for clean up of dumping, but I don't see how -fopt-info get
 in the way of that.

 In the way?  It is a prerequesite to both -fopt-info and -fopt-report.
 Otherwise you will end up adding _additional_ dumping to passes.
 Which is what I very very much object to.  You can transition
 to the common dump API incrementally and only handle the passes
 you care for initially.

 But anything else from a common mechanism isn't going to be
 maintainable.


 So, no, please do it the right way that benefits both compiler developers
 and your power users.

 And yes, the right way is not to start adding that -fopt-report switch.
 The right way is to make dump-files consumable by mere mortals first.

 I agree we need to do the right way which needs to be discussed first.
 I would argue that mere mortals will really appreciate opt-info
 (separate from dump file and opt-report).

 Well, still what you print with opt-info should be better also be present
 with opt-report and in dump files.  Thus it all boils down to be able
 to filter what passes put in their dump files.

 opt-report is different (needs to buffer information and dumping at
 the end of compilation).

 Why at the end of compilation?  Passes already collect info for
 -stats dumping.  What would -fopt-report print?  Something like

 note: I have reduced size of your binary by 90%
 note: You should improve your programming skills

 ?  Let's put -fopt-report aside for now as I don't have the slightest
 idea what it should be.

   Dump files and fopt-info can share the same
 dumping format -- whatever gets emitted by opt-info should also be
 emitted in the dump file (or replace the less well formated
 transformation messages that are already available in dump files),
 however simply filering the dump info does not solve the scalabilty
 issue I mentioned.

 What scalability issue?  I see a maintainance issue and a code
 readability issue.

 Richard.

 thanks,

 David


 Richard.

 thanks,

 David


 Thanks,
 Richard.


 Thanks,

 David


 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can 
 give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only










Re: new patches using -fopt-info (issue5294043)

2011-10-23 Thread Xinliang David Li
On Sun, Oct 23, 2011 at 3:18 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Fri, Oct 21, 2011 at 6:48 PM, Xinliang David Li davi...@google.com wrote:
 There are two proposals here. One is -fopt-info which prints out
 informational notes to stderr, and the other is -fopt-report which is
 more elaborate form of dump files. Are you object to both or just the
 opt-report one?

 What?  I'm objected to adding _two_ variants.  Didn't even realize
 you proposed that.

They are different -- -fopt-info is on the fly -- the notes are
emitted as the transformations are done while -fopt-report is for more
structured report so it requires more compiler changes.  Bringing in
-fopt-report is a little distraction as the main discussion is on
-fopt-info.


  The former is no different from any other
 informational notes we already have -- the only difference is that
 they are suppressed by default.

 We do not have many informational notes, so it is different.

Why different? opt information notes are not even emitted by default.


    ..
  ...

 I very well understand the intent.  But I disagree with where you start
 to implement this.  Dump files are _not_ only for developers - after
 all we don't have anything else.  -fopt-report can get as big and 
 unmanagable
 to read as dump files - in fact I argue it will be worse than dump files if
 you go beyond very very coarse reporting.

 The problem of using dump files for optimization report is that all
 optimization decisions are 'distributed' in phase specific dumps file.
 For a whole program report, the number of files that are created is
 not manageable (think about a program with 4000 sources each dumping
 200 files).  If we create a dummy pass and suck in all optimization
 decisions in that pass's dump file -- it will be no different from
 opt-report.

 Well, -fopt-whatever will just funnel selected pieces also to stderr.
 I object to duplicate dumping when we just need a way to filter
 what goes to dump files.


that is the main point -- using dump files are not scalable. If you
are just against using stderr and propose dumping the selected
information into a single shared dump file per build, I don't see the
difference with using stderr -- they are not emitted by default and
won't contaminate the build log.



 Yes, dump files are a mess.  So - why not clean them up, and at the
 same time annotate dump file pieces so _automatic_ filtering and
 redirecting to stdout with something like -fopt-report would do something
 sensible?  I don't see why dump files have to stay messy while you at
 the same time would need to add _new_ code to dump to stdout for
 -fopt-report.

 In my mind, I would like to separate all dumps into three categories.

 1) IR dumps, and support dump before and after (this reminds me my
 patches are still pending :) )    -fdump-tree-pre-[before|after]-
  Dump into .after, .before files
 2) debug tracing etc:        -fdump-tree-pre-debug-...          Dump
 into .debug files.
 3) opt report : -fdump-opt or -fopt-report

 Changes for 1) and 2) are mechanic but requires lots of work.

 You can do that, but I want the passes to use a single mechanism to
 feed all three separated dumps.


Can you elaborate on single mechanism here? A set of well defined
dumping APIs (instead of free form of  if (dump_file) fprintf
(dump_file, ...) ) ?

   debug_print (message, dump_flags, message_verbose_level, ...)
   trace_enter (trace_header_note)
   trace_exit (trace_header_not)
   opt_info_print (location, message_template, insertion)

Or  how dump files are organized?

I am all for clean up of dumping, but I don't see how -fopt-info get
in the way of that.


 So, no, please do it the right way that benefits both compiler developers
 and your power users.

 And yes, the right way is not to start adding that -fopt-report switch.
 The right way is to make dump-files consumable by mere mortals first.

 I agree we need to do the right way which needs to be discussed first.
 I would argue that mere mortals will really appreciate opt-info
 (separate from dump file and opt-report).

 Well, still what you print with opt-info should be better also be present
 with opt-report and in dump files.  Thus it all boils down to be able
 to filter what passes put in their dump files.

opt-report is different (needs to buffer information and dumping at
the end of compilation).   Dump files and fopt-info can share the same
dumping format -- whatever gets emitted by opt-info should also be
emitted in the dump file (or replace the less well formated
transformation messages that are already available in dump files),
however simply filering the dump info does not solve the scalabilty
issue I mentioned.

thanks,

David


 Richard.

 thanks,

 David


 Thanks,
 Richard.


 Thanks,

 David


 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities

Re: new patches using -fopt-info (issue5294043)

2011-10-21 Thread Xinliang David Li
There are two proposals here. One is -fopt-info which prints out
informational notes to stderr, and the other is -fopt-report which is
more elaborate form of dump files. Are you object to both or just the
opt-report one?  The former is no different from any other
informational notes we already have -- the only difference is that
they are suppressed by default.

    ..
  ...

 I very well understand the intent.  But I disagree with where you start
 to implement this.  Dump files are _not_ only for developers - after
 all we don't have anything else.  -fopt-report can get as big and unmanagable
 to read as dump files - in fact I argue it will be worse than dump files if
 you go beyond very very coarse reporting.

The problem of using dump files for optimization report is that all
optimization decisions are 'distributed' in phase specific dumps file.
For a whole program report, the number of files that are created is
not manageable (think about a program with 4000 sources each dumping
200 files).  If we create a dummy pass and suck in all optimization
decisions in that pass's dump file -- it will be no different from
opt-report.


 Yes, dump files are a mess.  So - why not clean them up, and at the
 same time annotate dump file pieces so _automatic_ filtering and
 redirecting to stdout with something like -fopt-report would do something
 sensible?  I don't see why dump files have to stay messy while you at
 the same time would need to add _new_ code to dump to stdout for
 -fopt-report.

In my mind, I would like to separate all dumps into three categories.

1) IR dumps, and support dump before and after (this reminds me my
patches are still pending :) )-fdump-tree-pre-[before|after]-
 Dump into .after, .before files
2) debug tracing etc:-fdump-tree-pre-debug-...  Dump
into .debug files.
3) opt report : -fdump-opt or -fopt-report

Changes for 1) and 2) are mechanic but requires lots of work.


 So, no, please do it the right way that benefits both compiler developers
 and your power users.

 And yes, the right way is not to start adding that -fopt-report switch.
 The right way is to make dump-files consumable by mere mortals first.

I agree we need to do the right way which needs to be discussed first.
I would argue that mere mortals will really appreciate opt-info
(separate from dump file and opt-report).

thanks,

David


 Thanks,
 Richard.


 Thanks,

 David


 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only






Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote:
 x...@google.com (Rong Xu) writes:

 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
    messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

 This doesn't look scalable if you consider that each pass would print
 as much of a mess like -fvectorizer-verbose=5.

What is not scalable? For level 1 dump, only the summary of
vectorization will be printed just like other loop transformations.


 I think =2 and =3 should be omitted - we do have dump-files for a reason.

Dump files are not easy to use -- it is big, and slow especially for
people with large distributed build systems.  Having both level 2 and
3 is debatable, but it will be useful to have a least one level above
level 1. Dump files are mainly for compiler developers, while
-fopt-info are for compiler developers *and* power users who know
performance tuning.

 Also the coverage/profile cases you changed do not at all match
 ... with large performance impact.  In fact the impact is completely
 unknown (as it would be the case usually).

Impact of any transformations is just 'potential', coverage problems
are no different from that.


 I'd rather have a way to make dump-files more structured (so, following
 some standard reporting scheme) than introducing yet another way
 of output.  [after making dump-files more consistent it will be easy
 to revisit patches like this, there would be a natural general central
 way to implement it]

Yes, I remember we have discussed about this before -- currently dump
files are a big mess -- debug tracing, IR are all mixed up, but as I
said above, this is a different matter -- it is for compiler
developers.

For more structured optimization report, we should use option
-fopt-report which dump optimization information based on category --
the info data base can also be shared across modules:

Example:

[Loop Interchange]
File a, line x,   yyy
File b, line xx, yyy

File c, line z,   It is beneficial to interchange the loop, but not
done because of possible carried dependency (caused by false aliasing
...)

[Loop Vectorization]


[Loop Unroll]
...

[SRA]

[Alias summary]
  [Global Vars]
   a: addr exposed
   b: add not exposed
   ..
  [Global Pointers]
..
  ...


Thanks,

David


 So, please fix dump-files instead.  And for coverage/profiling, fill
 in stuff in a dump-file!

 Richard.

 It would be interested to have some warnings about missing SRA
 opportunities in =1 or =2. I found that sometimes fixing those can give a
 large speedup.

 Right now a common case that prevents SRA on structure field
 is simply a memset or memcpy.

 -Andi


 --
 a...@linux.intel.com -- Speaking for myself only




Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
While discussion for trunk version is still going, it is ok for google branches.

thanks,

David

On Wed, Oct 19, 2011 at 4:28 PM, Rong Xu x...@google.com wrote:
 After some off-line discussion, we decided to use a more general approach
 to control the printing of optimization messages/warnings. We will
 introduce a new option -fopt-info:
  * fopt-info=0 or fno-opt-info: no message will be emitted.
  * fopt-info or fopt-info=1: emit important warnings and optimization
   messages with large performance impact.
  * fopt-info=2: warnings and optimization messages targeting power users.
  * fopt-info=3: informational messages for compiler developers.

 2011-10-19   Rong Xu  x...@google.com

        * gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto.
        * gcc/opts.c (common_handle_option): Handle OPT_fopt_info_.
        * gcc/flag-types.h (opt_info_verbosity_levels): New enum.
        * gcc/value-prof.c (check_ic_counter): guard warnings/notes by
          flag_opt_info.
          (find_func_by_funcdef_no): Ditto.
          (check_ic_target): Ditto.
          (check_counter): Ditto.
          (check_ic_counter): Ditto.
        * gcc/mcf.c (find_minimum_cost_flow): Ditto.
        * gcc/profile.c (read_profile_edge_counts): Ditto.
          (compute_branch_probabilities): Ditto.
        * gcc/coverage.c (read_counts_file): Ditto.
          (get_coverage_counts): Ditto.
        * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto.
          (maybe_issue_profile_use_note): Ditto.
          (optimize_reusedist): Ditto.
        * gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info.
        * gcc/testsuite/gcc.dg/pr40209.c: Ditto.
        * gcc/testsuite/gcc.dg/pr26570.c: Ditto.
        * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.


 Index: gcc/value-prof.c
 ===
 --- gcc/value-prof.c    (revision 180106)
 +++ gcc/value-prof.c    (working copy)
 @@ -472,9 +472,10 @@
               : DECL_SOURCE_LOCATION (current_function_decl);
       if (flag_profile_correction)
         {
 -         inform (locus, correcting inconsistent value profile: 
 -                 %s profiler overall count (%d) does not match BB count 
 -                  (%d), name, (int)*all, (int)bb_count);
 +          if (flag_opt_info = OPT_INFO_MAX)
 +            inform (locus, correcting inconsistent value profile: %s 
 +                   profiler overall count (%d) does not match BB count 
 +                    (%d), name, (int)*all, (int)bb_count);
          *all = bb_count;
          if (*count  *all)
             *count = *all;
 @@ -510,33 +511,42 @@
   location_t locus;
   if (*count1  all  flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, Correcting inconsistent value profile: 
 -              ic (topn) profiler top target count (%ld) exceeds 
 -             BB count (%ld), (long)*count1, (long)all);
 +      if (flag_opt_info = OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, Correcting inconsistent value profile: 
 +                  ic (topn) profiler top target count (%ld) exceeds 
 +                  BB count (%ld), (long)*count1, (long)all);
 +        }
       *count1 = all;
     }
   if (*count2  all  flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, Correcting inconsistent value profile: 
 -              ic (topn) profiler second target count (%ld) exceeds 
 -             BB count (%ld), (long)*count2, (long)all);
 +      if (flag_opt_info = OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, Correcting inconsistent value profile: 
 +                  ic (topn) profiler second target count (%ld) exceeds 
 +                 BB count (%ld), (long)*count2, (long)all);
 +        }
       *count2 = all;
     }

   if (*count2  *count1)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, Corrupted topn ic value profile: 
 -             first target count (%ld) is less than the second 
 -             target count (%ld), (long)*count1, (long)*count2);
 +      if (flag_opt_info = OPT_INFO_MAX)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, Corrupted topn ic value profile: 
 +                 first target count (%ld) is less than the second 
 +    

Re: new patches using -fopt-info (issue5294043)

2011-10-20 Thread Xinliang David Li
On Thu, Oct 20, 2011 at 12:53 PM, Andi Kleen a...@firstfloor.org wrote:
 This warnings/notes are caused
 by inconsistent profile due to data race (which is currently common in
 multi-thread programs),

 I never quite understood why the gcov counters are not simply marked
 __thread. This would make the profiled programs faster too because
 they wouldn't bounce cache lines  that much. Especially on larger
 systems (2S) frequent cache line bouncing can lead to extreme slow downs,
 and even on smaller systems it's very expensive.
 This would also eliminate data races, except for signals and somesuch.

It uses stack space and for programs with hundreds and thousands of
threads, it can be a big problem.

David


 -Andi




Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-19 Thread Xinliang David Li
On Wed, Oct 19, 2011 at 12:02 PM,  konstantin.s.serebry...@gmail.com wrote:
 Minimized the crash to this:

 struct Foo {
  unsigned bf1:1;
  unsigned bf2:1;
  unsigned bf3:1;
 };

 void foo (struct Foo *ob) {
  ob-bf2 = 1;
 }



 D.2731_4 = ob_1(D)-bf2;
 __asan_base_addr.2 = (long unsigned int) D.2731_4;
 D.2732_5 = __asan_base_addr.2  3;
 D.2733_6 = 1  44;
 D.2734_7 = D.2732_5 + D.2733_6;
 D.2735_8 = VIEW_CONVERT_EXPRchar *(D.2734_7);
 # VUSE .MEM
 __asan_shadow.3 = *D.2735_8;
 D.2737_9 = __asan_shadow.3 != 0;
 D.2738_10 = __asan_base_addr.2  7;
 D.2739_11 = (char) D.2738_10;
 D.2740_12 = D.2739_11 + 0;
 D.2741_13 = D.2740_12 = __asan_shadow.3;
 __asan_crash_cond.4 = D.2737_9  D.2741_13;
 if (__asan_crash_cond.4 != 0)
 ./expand_expr_addr_expr_1_err.c: In function ‘foo’:
 ./expand_expr_addr_expr_1_err.c:8:11: internal compiler error: in
 expand_expr_addr_expr_1, at expr.c:7381



 How do I avoid instrumenting bitfields?

Use get_inner_reference to compute the bitpos, and check if it is
multiple of bits_per_unit.

David


 http://codereview.appspot.com/5272048/



Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-19 Thread Xinliang David Li
what kind of failures?

David

On Wed, Oct 19, 2011 at 2:04 PM,  konstantin.s.serebry...@gmail.com wrote:
 On 2011/10/19 20:38:35, kcc wrote:

 Added code to avoid bitfields.

 Is there anything I should know about splitting basic blocks in the
 presence of EH?
 Currently, asan fails on 483.xalancbmk, 453.povray and 450.soplex, which
 are known to have EH.

 http://codereview.appspot.com/5272048/



Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-18 Thread Xinliang David Li
On Tue, Oct 18, 2011 at 3:56 PM,  konstantin.s.serebry...@gmail.com wrote:
 On 2011/10/18 22:52:33, davidxl wrote:

 http://codereview.appspot.com/5272048/diff/18001/tree-asan.c
 File tree-asan.c (right):


 http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325

 tree-asan.c:325: base = build_addr (t, current_function_decl);
 There are issues with creating address expressions from TARGET_MEM_REF

 in gcc.

 Since you want compiler to do optimization on instrumented code as

 much as

 possible, asan instrumentation is better done as early as possible

 after ipa

 Why?
 I would actually say that I want the instrumentation to happen as late
 as possible because this will instrument fewer memory accesses.
 For example, asan certainly needs to happen after loop invariants are
 moved out and common subexpressions (including loads) are eliminated.
 No?

yes -- so a good choice would be after PRE and PDE (pre, sink_code)
which should handle most of the loop invariant memory loads.

David


 inlining -- and this will also solve this problem. I tried moving asan

 pass

 before loop opt, and it worked fine.



 http://codereview.appspot.com/5272048/



Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-18 Thread Xinliang David Li
It will be weird to put the instrumentation pass inside loop opt,
besides memory loads which are loop invariants and redundant stores in
loop should be handled by pre/pde.

+cc Richard Guenther

You may want to ask the middle-end maintainer to review your code at
this point if you want it to be in trunk.

thanks,

David

On Tue, Oct 18, 2011 at 4:12 PM,  konstantin.s.serebry...@gmail.com wrote:
 yes -- so a good choice would be after PRE and PDE (pre, sink_code)
 which should handle most of the loop invariant memory loads.

 how about pass_lim? I think asan should be after lim.

 http://codereview.appspot.com/5272048/



Re: [google] Suppress FDO-use related notes/warnings (issue5294043)

2011-10-18 Thread Xinliang David Li
On Tue, Oct 18, 2011 at 3:48 PM, Rong Xu x...@google.com wrote:
 Suppress verbose notes/warnings printed in FDO-use compilation.
 (1) Add option -fprofile-use-verbose.

Gcc currently does not emit informational messages on high level
transformations such as inlining, value profiling transformations (ic
promotion info messages are emitted in lipo mode in google/main), loop
peeling and unrolling, and other loop opts. I can see those are very
useful for triaging performance regressions etc, but turning those on
would be too verbose for non-power users and can be annoying. They
(when introduced in the future) should be guarded. I can see they
should be guarded using the same option and possibly with a verbose
level. Something like:

-fopt-info=1,2,3

For now, a nonparameterized option should be good enough.

(similarly, -fopt-report can be used to dump optimization report which
is more structured -- e.g, grouped via optimizations, not functions).

There are some unrelated changes (e.g, histogram free etc) which
should be committed separately.


David

 When this option is on, FDO-use
 compilation prints out all the notes as that of today. When this
 option is off (the default), all notes are suppressed.
 (2) Make several unconditional warnings under OPT_Wcoverage_mismatch.
 So that they can be turned off via -Wno-coverage-mismatch.
 (3) Make several unconditional warnings for LIPO under flag_ripa_verbose.
 (can be turned on via -fripa-verbose).




 This patch is for google-main only.

 Tested with bootstrap and regression tests.

 2011-10-18   Rong Xu  x...@google.com

        * gcc/common.opt (fprofile-use-verbose): New flag.
        * gcc/value-prof.c (check_ic_counter): guard notes by
          flag_profile_use_verbose.
          (find_func_by_funcdef_no): Ditto.
          (check_ic_target): Ditto.
          (check_counter): Ditto.
          (check_ic_counter): Ditto.
        * gcc/mcf.c (find_minimum_cost_flow): Ditto.
        * gcc/profile.c (read_profile_edge_counts):
          (compute_branch_probabilities):
        * gcc/coverage.c (read_counts_file): guard LIPO
          warnings by flag_ripa_verbose.
          (get_coverage_counts): guard notes
          by flag_profile_use_verbose; make warning
          under OPT_Wcoverage_mismatch.
        * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto.
          (maybe_issue_profile_use_note): Ditto.
          (optimize_reusedist): Ditto.
        * gcc/testsuite/gcc.dg/pr32773.c: add -fprofile-use-verbose.
        * gcc/testsuite/gcc.dg/pr40209.c: Ditto.
        * gcc/testsuite/gcc.dg/pr26570.c: Ditto.
        * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.

 Index: gcc/value-prof.c
 ===
 --- gcc/value-prof.c    (revision 180106)
 +++ gcc/value-prof.c    (working copy)
 @@ -472,9 +472,10 @@
               : DECL_SOURCE_LOCATION (current_function_decl);
       if (flag_profile_correction)
         {
 -         inform (locus, correcting inconsistent value profile: 
 -                 %s profiler overall count (%d) does not match BB count 
 -                  (%d), name, (int)*all, (int)bb_count);
 +          if (flag_profile_use_verbose)
 +            inform (locus, correcting inconsistent value profile: %s 
 +                   profiler overall count (%d) does not match BB count 
 +                    (%d), name, (int)*all, (int)bb_count);
          *all = bb_count;
          if (*count  *all)
             *count = *all;
 @@ -510,33 +511,42 @@
   location_t locus;
   if (*count1  all  flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, Correcting inconsistent value profile: 
 -              ic (topn) profiler top target count (%ld) exceeds 
 -             BB count (%ld), (long)*count1, (long)all);
 +      if (flag_profile_use_verbose)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, Correcting inconsistent value profile: 
 +                  ic (topn) profiler top target count (%ld) exceeds 
 +                  BB count (%ld), (long)*count1, (long)all);
 +        }
       *count1 = all;
     }
   if (*count2  all  flag_profile_correction)
     {
 -      locus = (stmt != NULL)
 -              ? gimple_location (stmt)
 -              : DECL_SOURCE_LOCATION (current_function_decl);
 -      inform (locus, Correcting inconsistent value profile: 
 -              ic (topn) profiler second target count (%ld) exceeds 
 -             BB count (%ld), (long)*count2, (long)all);
 +      if (flag_profile_use_verbose)
 +        {
 +          locus = (stmt != NULL)
 +                  ? gimple_location (stmt)
 +                  : DECL_SOURCE_LOCATION (current_function_decl);
 +          inform (locus, Correcting inconsistent value 

Re: [google] support for building Linux kernel with FDO (issue4523061)

2011-10-13 Thread Xinliang David Li
This patch is for google/main which is 4.7 based, but the validated
version is in google_46 branch (which is based on 4.6).

By the way (given that you are from intel),  do you know if linux
kernel can be built with icc with PGO turned on? Our intern Xiaotian
has tried to use icc (12.0) to built kernel, and had some problems.
The bootable kernel built with icc + gcc (for those failed with icc)
does not perform quite well.

Thanks,

David


On Thu, Oct 13, 2011 at 7:02 PM, vulcansh steven.t.hamp...@intel.com wrote:


 Rong Xu wrote:

 That will be good.
 But you never know, we internally have fixed some bugs that filed to
 us because people use kernel's old gcov code (many versions guarded by
 ifdef) for their tests.

 -Rong


 Has there been any progress one this patch?  What version of gcc is this
 patch for?  I am interested in something that works with gcc 4.7.

 -Steve

 --
 View this message in context: 
 http://old.nabble.com/-google---support-for-building-Linux-kernel-with-FDO-%28issue4523061%29-tp31607746p32649731.html
 Sent from the gcc - patches mailing list archive at Nabble.com.




Re: [google] record compiler options to .note sections

2011-10-11 Thread Xinliang David Li
ok.

David

On Tue, Oct 11, 2011 at 9:51 PM, Dehao Chen de...@google.com wrote:
 Attached is the new patch. Bootstrapped on x86_64, no regressions.

 gcc/ChangeLog.google-4_6:
 2011-10-08  Dehao Chen  de...@google.com

       Add a flag (-frecord-gcc-switches-in-elf) to record compiler
       command line options to .gnu.switches.text sections of
       the object file.
       * coverage.c (write_opts_to_asm): Write the options to
       .gnu.switches.text sections.
       * common.opt: Ditto.
       * opts.h: Ditto.

 gcc/c-family/ChangeLog.google-4_6:
 2011-10-08  Dehao Chen  de...@google.com
       * c-opts.c (c_common_parse_file): Write the options to
       .gnu.switches.text sections.

 gcc/testsuite/ChangeLog.google-4_6:
 2011-10-08  Dehao Chen  de...@google.com

       * gcc.dg/record-gcc-switches-in-elf-1.c: New test.

 Index: gcc/doc/invoke.texi
 ===
 --- gcc/doc/invoke.texi (revision 179836)
 +++ gcc/doc/invoke.texi (working copy)
 @@ -391,6 +391,7 @@
  -fpmu-profile-generate=@var{pmuoption} @gol
  -fpmu-profile-use=@var{pmuoption} @gol
  -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
 +-frecord-gcc-switches-in-elf@gol
  -freorder-blocks-and-partition -freorder-functions @gol
  -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
  -fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol
 @@ -8170,6 +8171,11 @@
  number of times it is called. The params variable
  note-cgraph-section-edge-threshold can be used to only list edges above a
  certain threshold.
 +
 +@item -frecord-gcc-switches-in-elf
 +@opindex frecord-gcc-switches-in-elf
 +Record the command line options in the .gnu.switches.text elf section
 for sample
 +based LIPO to do module grouping.
  @end table

  The following options control compiler behavior regarding floating
 Index: gcc/c-family/c-opts.c
 ===
 --- gcc/c-family/c-opts.c       (revision 179836)
 +++ gcc/c-family/c-opts.c       (working copy)
 @@ -1109,6 +1109,8 @@
   for (;;)
     {
       c_finish_options ();
 +      if (flag_record_gcc_switches_in_elf  i == 0)
 +       write_opts_to_asm ();
       pch_init ();
       set_lipo_c_parsing_context (parse_in, i, verbose);
       push_file_scope ();
 Index: gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c
 ===
 --- gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0)
 +++ gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0)
 @@ -0,0 +1,16 @@
 +/* { dg-do compile} */
 +/* { dg-options -frecord-gcc-switches-in-elf -Dtest -dA } */
 +
 +void foobar(int);
 +
 +void
 +foo (void)
 +{
 +  int i;
 +  for (i = 0; i  100; i++)
 +    {
 +      foobar(i);
 +    }
 +}
 +
 +/* { dg-final { scan-assembler-times Dtest 1 } } */
 Index: gcc/opts.h
 ===
 --- gcc/opts.h  (revision 179836)
 +++ gcc/opts.h  (working copy)
 @@ -381,4 +381,5 @@
  extern void set_struct_debug_option (struct gcc_options *opts,
                                     location_t loc,
                                     const char *value);
 +extern void write_opts_to_asm (void);
  #endif
 Index: gcc/coverage.c
 ===
 --- gcc/coverage.c      (revision 179836)
 +++ gcc/coverage.c      (working copy)
 @@ -55,6 +55,7 @@
  #include diagnostic-core.h
  #include intl.h
  #include l-ipo.h
 +#include dwarf2asm.h

  #include gcov-io.h
  #include gcov-io.c
 @@ -2146,4 +2147,69 @@
   return 0;
  }

 +/* Write command line options to the .note section.  */
 +
 +void
 +write_opts_to_asm (void)
 +{
 +  size_t i;
 +  cpp_dir *quote_paths, *bracket_paths, *pdir;
 +  struct str_list *pdef, *pinc;
 +  int num_quote_paths = 0;
 +  int num_bracket_paths = 0;
 +
 +  get_include_chains (quote_paths, bracket_paths);
 +
 +  /* Write quote_paths to ASM section.  */
 +  switch_to_section (get_section (.gnu.switches.text.quote_paths,
 +                                 SECTION_DEBUG, NULL));
 +  for (pdir = quote_paths; pdir; pdir = pdir-next)
 +    {
 +      if (pdir == bracket_paths)
 +       break;
 +      num_quote_paths++;
 +    }
 +  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
 +  dw2_asm_output_data_uleb128 (num_quote_paths, NULL);
 +  for (pdir = quote_paths; pdir; pdir = pdir-next)
 +    {
 +      if (pdir == bracket_paths)
 +       break;
 +      dw2_asm_output_nstring (pdir-name, (size_t)-1, NULL);
 +    }
 +
 +  /* Write bracket_paths to ASM section.  */
 +  switch_to_section (get_section (.gnu.switches.text.bracket_paths,
 +                                 SECTION_DEBUG, NULL));
 +  for (pdir = bracket_paths; pdir; pdir = pdir-next)
 +    num_bracket_paths++;
 +  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
 +  dw2_asm_output_data_uleb128 (num_bracket_paths, NULL);
 +  for (pdir = 

Re: [google] record compiler options to .note sections

2011-10-10 Thread Xinliang David Li
Ok for google branches.

1) document the difference of this option with -grecord-gcc-switches
(this one only record codegen related options, and recorded in debug
section), and with -frecord-gcc-switches?
2) may be better to use option name: -frecord-gcc-switches-in-object

thanks,

David

On Sun, Oct 9, 2011 at 6:16 PM, Dehao Chen de...@google.com wrote:
 On Sun, Oct 9, 2011 at 5:28 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Sun, Oct 09, 2011 at 09:18:25AM +0800, Dehao Chen wrote:
 Unfortunately -frecord-gcc-switches cannot serve our purpose because
 the recorded switches are mergable, i.e. the linker will merge all
 options to a set of strings. However, object files may have distinct
 compile options. We want to preserve every object file's compile
 options when doing LIPO build.

 And -grecord-gcc-switches?  That one, although it is mergeable, still
 preserves every object files's compile options.

 I tried -grecord-gcc-switches, but looks like it's not recording
 options that I want.

 e.g. the following two commands output the same assembly code, while
 the former should record one more options.

 gcc -g3 -grecord-gcc-switches a.c -Dabcdefgh -Dxy -I/usr/ -S
 gcc -g3 -grecord-gcc-switches a.c -Dabcdefgh -Dxy -S

 Thanks,
 Dehao


        Jakub




Re: patch ping: Dump the degree of overlap to compare static profile with instrumentation profile

2011-10-07 Thread Xinliang David Li
Yes, this is useful for static prediction tuning.

David

On Fri, Oct 7, 2011 at 6:40 PM, Dehao Chen de...@google.com wrote:

 http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01293.html


Re: Disable early inlining while compiling for coverage (issue5173042)

2011-10-01 Thread Xinliang David Li
On Sat, Oct 1, 2011 at 5:17 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Yes, this will improve test coverage option's usability, but please
 provide the example to explain the issues.

 David

 On Fri, Sep 30, 2011 at 6:12 PM, Sharad Singhai sing...@google.com wrote:
  This patch disables early inlining when --coverage option is
  specified. This improves coverage data in presence of other
  optimizations, specially with -O2 where early inlining changes the
  control flow graph sufficiently enough to generate seemingly very odd
  source coverage.

 BTW early inlining was introduced to make coverage possible on some C++
 sources, like tramp3d ;)

Early inline can be important for FDO performance reasons -- as inline
instances get 'context' sensitive profile data.

However the problem here is that since we moved
 coverage to SSA, we do it too late.  My longer term plan is to separate
 coverage and profile feedback passes (so they can't be done both together) and
 instrument early for coverage  ensure that everything before coverage is done
 is save WRT line info.

Yes, coverage and FDO are two different animals having different
requirements -- they happen to share the same instrumentation
mechanism.

  Inlining alone does not mess up the line info, but most
 optimizations we have in early optimization queue do.

Inlining can do some damage too but to a less extent. For instance,
the exit block of the callee instance merged with caller's bb causing
confusion.


 We discussed it back when Richi implemented SSA profiling but we didn't do 
 that
 basically due to lack of testcases.  Would be possible to take one you have
 and fill in some PRs? Those are regressions WRT pre-SSA profiling releases (I 
 think 4.5?)

Yes.

Sharad, I did not see the test case attached? Please file a bug about
this. In the meantime, you can checkin the workaround to google
banches.

thanks,

David


 Honza
 
  Bootstrapped okay and regression tests passed.
 
  Okay for google/gcc-4_6?
 
  2011-09-30   Sharad Singhai  sing...@google.com
 
         * gcc.c (cc1_options): Added -fno-early-inlining for coverage.
 
  Index: gcc.c
  ===
  --- gcc.c       (revision 179402)
  +++ gcc.c       (working copy)
  @@ -776,7 +776,7 @@
   %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\
   %{fsyntax-only:-o %j} %{-param*}\
   %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\
  - %{coverage:-fprofile-arcs -ftest-coverage};
  + %{coverage:-fprofile-arcs -ftest-coverage -fno-early-inlining};
 
   /* If an assembler wrapper is used to invoke post-assembly tools
     like MAO, --save-temps need to be passed to save the output of
 
  --
  This patch is available for review at http://codereview.appspot.com/5173042
 



Re: [google] Fix bugs in sampled profile collection

2011-09-30 Thread Xinliang David Li
ok.

David

On Fri, Sep 30, 2011 at 6:54 PM, Easwaran Raman era...@google.com wrote:
 This fixes two issues with sampled profile collection. It delays
 cleanup of instrumentation_to_be_sampled after all callgraph nodes
 have been instrumented and prevents  gcov_sample_counter_decl and
 gcov_sampling_rate_decl from being garbage collected.

  Ok for google/gcc-4_6 and google/main branches?

 -Easwaran

 2011-09-30  Easwaran Raman  era...@google.com

        * tree-profile.c (gcov_sample_counter_decl): Add GTY marker.
        (gcov_sampling_rate_decl): Likewise.
        (add_sampling_to_edge_counters): Do not free
        instrumentation_to_be_sampled.
        (cleanup_instrumentation_sampling): New function.
        (tree_profiling): Call cleanup_instrumentation_sampling at the end.

 testsuite/ChangeLog.google-4_6:

 2011-09-30  Easwaran Raman  era...@google.com

        * gcc.dg/sample-profile-generate-1.c: New test.

 Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c
 ===
 --- gcc/testsuite/gcc.dg/sample-profile-generate-1.c    (revision 0)
 +++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c    (revision 0)
 @@ -0,0 +1,26 @@
 +/* { dg-do compile} */
 +/* { dg-options -O2 -fprofile-generate -fprofile-generate-sampling } */
 +
 +void foobar(int);
 +
 +void
 +foo (void)
 +{
 +  int i;
 +  for (i = 0; i  100; i++)
 +    {
 +      foobar(i);
 +    }
 +}
 +
 +void
 +bar (void)
 +{
 +  int i;
 +  for (i = 0; i  100; i++)
 +    {
 +      foobar(i);
 +    }
 +}
 +
 +/* { dg-final { cleanup-coverage-files } } */

 Index: tree-profile.c
 ===
 --- tree-profile.c      (revision 178897)
 +++ tree-profile.c      (working copy)
 @@ -163,10 +163,10 @@ init_ic_make_global_vars (void)
  static struct pointer_set_t *instrumentation_to_be_sampled = NULL;

  /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
 -static tree gcov_sample_counter_decl = NULL_TREE;
 +static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

  /* extern gcov_unsigned_t __gcov_sampling_rate  */
 -static tree gcov_sampling_rate_decl = NULL_TREE;
 +static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE;

  /* forward declaration.  */
  void gimple_init_instrumentation_sampling (void);
 @@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void)
             break;
           }
       }
 +}

 +static void
 +cleanup_instrumentation_sampling (void)
 +{
   /* Free the bitmap.  */
 -  if (instrumentation_to_be_sampled)
 +  if (flag_profile_generate_sampling  instrumentation_to_be_sampled)
     {
       pointer_set_destroy (instrumentation_to_be_sampled);
       instrumentation_to_be_sampled = NULL;
 @@ -1452,6 +1456,7 @@ tree_profiling (void)
     }

   del_node_map();
 +  cleanup_instrumentation_sampling();
   return 0;
  }



Re: Disable early inlining while compiling for coverage (issue5173042)

2011-09-30 Thread Xinliang David Li
Yes, this will improve test coverage option's usability, but please
provide the example to explain the issues.

David

On Fri, Sep 30, 2011 at 6:12 PM, Sharad Singhai sing...@google.com wrote:
 This patch disables early inlining when --coverage option is
 specified. This improves coverage data in presence of other
 optimizations, specially with -O2 where early inlining changes the
 control flow graph sufficiently enough to generate seemingly very odd
 source coverage.

 Bootstrapped okay and regression tests passed.

 Okay for google/gcc-4_6?

 2011-09-30   Sharad Singhai  sing...@google.com

        * gcc.c (cc1_options): Added -fno-early-inlining for coverage.

 Index: gcc.c
 ===
 --- gcc.c       (revision 179402)
 +++ gcc.c       (working copy)
 @@ -776,7 +776,7 @@
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
  %{fmudflap|fmudflapth:-fno-builtin -fno-merge-constants}\
 - %{coverage:-fprofile-arcs -ftest-coverage};
 + %{coverage:-fprofile-arcs -ftest-coverage -fno-early-inlining};

  /* If an assembler wrapper is used to invoke post-assembly tools
    like MAO, --save-temps need to be passed to save the output of

 --
 This patch is available for review at http://codereview.appspot.com/5173042



Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-28 Thread Xinliang David Li
Sri, please add a new api to do cpu_indicator initialization on demand
to be used in IFUNC context. Perhaps also add some debug check to make
sure no conflicting cpu model is set.

Ok for google branches for now while the discussion continues.

thanks,

David

On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.
 2) Add symbol versions.
 3) Move all builtins to the i386 port.
 4) Add check for atom processor.
 5) No separate passes to fold the builtins.

 Please let me know what you think.
 Thanks,
 -Sri.

        * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
        (make_var_decl): New function.
        (get_field_from_struct): New function.
        (fold_builtin_target): New function.
        (ix86_fold_builtin): New function.
        (ix86_expand_builtin): Expand new builtins by folding them.
        (TARGET_FOLD_BUILTIN): New macro.
        (IX86_BUILTIN_CPU_SUPPORTS_CMOV): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_MMX): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_POPCOUNT): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSE): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSE2): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSE3): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSSE3): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_1): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS_SSE4_2): New enum value.
        (IX86_BUILTIN_CPU_IS_AMD): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL_ATOM): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL_CORE2): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_NEHALEM): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_WESTMERE): New enum value.
        (IX86_BUILTIN_CPU_IS_INTEL_COREI7_SANDYBRIDGE): New enum value.
        (IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA): New enum value.
        (IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI): New enum value.
        (IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL): New enum value.
        * config/i386/libgcc-glibc.ver (__cpu_indicator_init): Export symbol.
        (__cpu_model): Export symbol.
        (__cpu_features): Export symbol.
        * config/i386/i386-builtin-types.def: New function type.

        * config/i386/i386-cpuinfo.c: New file.
        * config/i386/t-cpuinfo: New file.
        * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc

        * gcc.dg/builtin_target.c: New test.




 On Tue, Aug 23, 2011 at 4:35 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 22 Aug 2011, H.J. Lu wrote:

  void __attribute__((constructor)) bla(void)
  {
   __cpu_indicator_init ();
  }
 
  I don't see any complication.?
 

 Order of constructors.  A constructor may call functions
 which use __cpu_indicator.

 That's why I wrote also:

 The initializer function has to be callable from pre-.init contexts, e.g.
 ifunc dispatchers.

 It obviously has to be guarded against multiple calls.  The ctor in libgcc
 would be mere convenience because then non-ctor code can rely on the data
 being initialized, and only (potential) ctor code has to check and call
 the init function on demand.


 Ciao,
 Michael.



Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Xinliang David Li
Is there a standard way to force this init function to be called
before all ctors?  Adding a ctor in one crtx.o ?

David

On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?



 Do I have to do that in every constructor, including
 C++ global constructors?  It is ridiculous.

 --
 H.J.



Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Xinliang David Li
IFUNC selector will need to call get_cpu_indicator (as proposed by HJ
or something similar), while in other contexts, the implementation
should find a way to make sure the indicator is already initialized
such that the builtins accessing the features can be directly used
(See also Michael and Richard's previous comments).  The runtime
penalty is much smaller.

david

On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?

 Thanks,
 -Sri.



 --
 H.J.




Re: [google] Increase hotness count fraction

2011-08-24 Thread Xinliang David Li
Good for google branches.

Need to measure the performance and size impact on SPEC2k/06 with this
patch and the previous hotcaller patch for trunk.

thanks,

David

On Wed, Aug 24, 2011 at 2:43 PM, Mark Heffernan meh...@google.com wrote:
 This patch bumps up the parameter 'hot-bb-count-fraction' from 1
 to 4.  This results in about a 0.5% geomean performance
 improvement across internal benchmarks for x86-64 LIPO.  The parameter
 change effectively increases the number of functions/callsites which
 are considered hot.  The performance improvement is likely due to
 increased inlining (more callsites are considered hot and available
 for inlining).

 Bootstrapped and reg-tested on x86-64.  OK for google/gcc-4_6?

 Mark


 2011-08-24  Mark Heffernan  meh...@google.com

        * params.def (hot-bb-count-fraction): Change default value.



 Index: params.def
 ===
 --- params.def  (revision 177964)
 +++ params.def  (working copy)
 @@ -382,7 +382,7 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_TH
  DEFPARAM(HOT_BB_COUNT_FRACTION,
         hot-bb-count-fraction,
         Select fraction of the maximal count of repetitions of basic
 block in program given basic block needs to have to be considered
 hot,
 -        1, 0, 0)
 +        4, 0, 0)
  DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
         hot-bb-frequency-fraction,
         Select fraction of the maximal frequency of executions of
 basic block in function given basic block needs to have to be
 considered hot,



Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-18 Thread Xinliang David Li
On Thu, Aug 18, 2011 at 6:10 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 18 Aug 2011, Richard Guenther wrote:

  CPUID to get target features and set global vars corresponding to the
  features. So, the builtin should be folded by into the appropriate
  variable in libgcc.

 Hm, but then the variable should reside in libgcc and you'd only need an
 extern variant in the varpool.  I'm not sure separate constructors
 (possibly in each module ...) would be better than a single one in
 libgcc that would get run unconditionally.

 Would be my preference too.

  determining target features and setting the appropriate globals. If
  the new builtins are called, gcc will call __cpu_indicator_init in a
  constructor so that it is called exactly once. Then, gcc will fold the
  builtin to the appropriate global variable.

 I see, but this sounds like premature optimization to me, no?  Considering
 you'd do this in each module and our inability to merge those constructors
 at link time.  If we put __cpu_indicator, the constructor and the assorted
 support into a separate module inside libgcc.a could we arrange it in a way
 that if __cpu_indicator is not referenced from the program that piece isn't
 linked in?  (not sure if that is possible with constructors)

 If you make an .o file only exporting __cpu_indicator, then it won't be
 included in a link where no object file refers to that symbol.  If you put
 the ctor for that variable in the same .o file you win.

 I also take issue with the large number of builtins, I'd have expected one
 single builtin returning the CPU type, and an enum that can be tested.
 That potentially requires an installed gcc private header, but I think
 enabling access to this cpu detection facility in libgcc to our users is
 worthwhile.

The CPU type builtins can probably be combined, not the feature testing ones.

David



 Ciao,
 Michael.



primary vtable not initialized properly

2011-08-11 Thread Xinliang David Li
Compile the following program using 4.6 or trunk compiler and run it,
the program will seg fault. The problem is that the D3_Spec's primary
vtable has a null entry for virtual function id which it overrides.

The root cause is that the base class's virtual function list for the
derived class is copied from the base class -- so that lost_primary
bit may be 'inherited'. However this bit is not properly reset to 0
after the derive class's vtable is analyzed.

The attached trivial patch fixed the problem.

Bootstrap and tested x86_64/linux.

Ok for trunk and 46 branch?

Thanks,

David


p
Description: Binary data


Re: primary vtable not initialized properly

2011-08-11 Thread Xinliang David Li
A new patch with lightly modified the test case.

David

On Thu, Aug 11, 2011 at 5:02 PM, Xinliang David Li davi...@google.com wrote:
 Compile the following program using 4.6 or trunk compiler and run it,
 the program will seg fault. The problem is that the D3_Spec's primary
 vtable has a null entry for virtual function id which it overrides.

 The root cause is that the base class's virtual function list for the
 derived class is copied from the base class -- so that lost_primary
 bit may be 'inherited'. However this bit is not properly reset to 0
 after the derive class's vtable is analyzed.

 The attached trivial patch fixed the problem.

 Bootstrap and tested x86_64/linux.

 Ok for trunk and 46 branch?

 Thanks,

 David



p
Description: Binary data


cfg fixup bug fix

2011-08-01 Thread Xinliang David Li
The attached patch fixed a minor bug in cfg fixup -- the outgoing edge
profile count is not scaled after inlining leading to warnings printed
in IR dump -- 'Invalid sum of ...'.

Bootstrap and tested on x86-64/linux, ok for trunk?

Thanks,

David


p2
Description: Binary data


-freorder-function is broken

2011-07-29 Thread Xinliang David Li
The attached patch fixed the problem. The root cause of the problem is
due to the ordering change of profile_estimation and tree_profile
pass. In trunk, the function/node frequency is not computed after
profile annotation is done leading to missing information. Another
side effect of this breakage is optimize_function_for_size query is
also broken leading to larger than necessary binary with FDO.

Bootstrapped on x86_64/linux. All FDO testing passed.

Ok after regression test?

David


p
Description: Binary data


Re: [google] fix global vars incorrectly marked as read-only in LIPO mode (issue4798045)

2011-07-21 Thread Xinliang David Li
On Thu, Jul 21, 2011 at 11:32 AM, Rong Xu x...@google.com wrote:
 On Thu, Jul 21, 2011 at 11:09 AM,  davi...@google.com wrote:

 http://codereview.appspot.com/4798045/diff/1/ipa.c
 File ipa.c (right):

 http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1034
 ipa.c:1034: {
 Has varpool node linking happened at this point? If not, the new code
 here is not excersised.

 This functions is called in multiple places: local pass and
 whole_program pass. varpool_node_link is b/w these two passes. the
 varpool node attribute is set before the use in
 ipa_discover_readonly_nonaddressable_vars()

So the code relies on the second visibility pass to get things right
-- if that pass is disabled things can go wrong (if the default
visibility value when vnode is created is true, LIPO mode will get
pessemitic result; if the default is false, LIPO mode will still get
wrong result if only local visiblity pass is run).

A better fix might be simply do not check 'needed' bit for comdat
varnode in LIPO mode and always run the visibiity checker:

if ((vnode-needed || L_IPO_COMP_MODE)
varpool_node_externally_visible_p (...
  )


David




 http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1041
 ipa.c:1041: vnode-externally_visible = false;
 Better to add a simple loop after varpool_node_linking to merge the
 attribute in l-ipo.c assuming the linking happens after local visibility
 pass.

 We can merge in the varpool_node_link, but we still need to change
 here as the attribute will be overwritten here in the whole_program
 pass.


 http://codereview.appspot.com/4798045/




Re: [google] fix global vars incorrectly marked as read-only in LIPO mode (issue4798045)

2011-07-21 Thread Xinliang David Li
Ok.

David

On Thu, Jul 21, 2011 at 1:45 PM, Rong Xu x...@google.com wrote:
 Please review the new patch attached to this email.

 thanks,

 -Rong

 On Thu, Jul 21, 2011 at 12:44 PM, Rong Xu x...@google.com wrote:
 wait a second. this still won't work if we disable the whole-program
 pass (like my original change, the visibility analysis change won't
 kick in). we also need to change varpool_node_link() to merge the
 local attribute.

 -Rong

 On Thu, Jul 21, 2011 at 12:35 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, Jul 21, 2011 at 12:27 PM, Rong Xu x...@google.com wrote:
 this is a good point. ipa_discover_readonly_nonaddressable_vars() is
 called in two passes. whole-program (whole program visibility
 analysis) and static-var. The one in whole-program is ok here as it is
  bundled together with the analysis. the invocation in static-var can
 go wrong.

 should we add a check for COMDAT flag also? like

 Probably not -- only non referenced comdat vars will be marked as not 
 needed.

 David

 if ((vnode-needed || (L_IPO_COMP_MODE  DECL_COMDAT (node-decl)))
     varpool_node_externally_visible_p) (

 Thanks,

 -Rong
 On Thu, Jul 21, 2011 at 11:59 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, Jul 21, 2011 at 11:32 AM, Rong Xu x...@google.com wrote:
 On Thu, Jul 21, 2011 at 11:09 AM,  davi...@google.com wrote:

 http://codereview.appspot.com/4798045/diff/1/ipa.c
 File ipa.c (right):

 http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1034
 ipa.c:1034: {
 Has varpool node linking happened at this point? If not, the new code
 here is not excersised.

 This functions is called in multiple places: local pass and
 whole_program pass. varpool_node_link is b/w these two passes. the
 varpool node attribute is set before the use in
 ipa_discover_readonly_nonaddressable_vars()

 So the code relies on the second visibility pass to get things right
 -- if that pass is disabled things can go wrong (if the default
 visibility value when vnode is created is true, LIPO mode will get
 pessemitic result; if the default is false, LIPO mode will still get
 wrong result if only local visiblity pass is run).

 A better fix might be simply do not check 'needed' bit for comdat
 varnode in LIPO mode and always run the visibiity checker:

 if ((vnode-needed || L_IPO_COMP_MODE)
    varpool_node_externally_visible_p (...
  )


 David




 http://codereview.appspot.com/4798045/diff/1/ipa.c#newcode1041
 ipa.c:1041: vnode-externally_visible = false;
 Better to add a simple loop after varpool_node_linking to merge the
 attribute in l-ipo.c assuming the linking happens after local visibility
 pass.

 We can merge in the varpool_node_link, but we still need to change
 here as the attribute will be overwritten here in the whole_program
 pass.


 http://codereview.appspot.com/4798045/









Re: [testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.

2011-06-28 Thread Xinliang David Li
Your fix works ok for me (on x86-64/linux) too.

Thanks,

David

On Tue, Jun 28, 2011 at 10:09 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Three new testcases seem to XPASS everywhere, at least on all of my
 targets:

 XPASS: gcc.dg/inline_1.c (test for excess errors)
 XPASS: gcc.dg/inline_2.c (test for excess errors)
 XPASS: gcc.dg/unroll_1.c (test for excess errors)

 The following patch fixes this to remove the noise.  Tested with the
 appropriate runtest invocation on i386-pc-solaris2.10.

 Ok for mainline?

        Rainer


 2011-06-28  Rainer Orth  r...@cebitec.uni-bielefeld.de

        * gcc.dg/inline_1.c: Remove dg-excess-errors.
        * gcc.dg/inline_2.c: Likewise.
        * gcc.dg/unroll_1.c: Likewise.

 Index: gcc/testsuite/gcc.dg/inline_2.c
 ===
 --- gcc/testsuite/gcc.dg/inline_2.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/inline_2.c     (working copy)
 @@ -20,4 +20,3 @@

  /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
  /* { dg-final { cleanup-tree-dump optimized } } */
 -/* { dg-excess-errors extra notes } */
 Index: gcc/testsuite/gcc.dg/inline_1.c
 ===
 --- gcc/testsuite/gcc.dg/inline_1.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/inline_1.c     (working copy)
 @@ -20,4 +20,3 @@

  /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
  /* { dg-final { cleanup-tree-dump optimized } } */
 -/* { dg-excess-errors extra notes } */
 Index: gcc/testsuite/gcc.dg/unroll_1.c
 ===
 --- gcc/testsuite/gcc.dg/unroll_1.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/unroll_1.c     (working copy)
 @@ -30,4 +30,3 @@

  /* { dg-final { scan-rtl-dump-times Decided to peel loop completely 2 
 loop2_unroll } } */
  /* { dg-final { cleanup-rtl-dump loop2_unroll } } */
 -/* { dg-excess-errors extra notes } */

 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University



Re: SRA generates uninitialized var use

2011-06-21 Thread Xinliang David Li
On Tue, Jun 21, 2011 at 1:42 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Tue, Jun 21, 2011 at 1:28 AM, Xinliang David Li davi...@google.com wrote:
 Good point -- but why does SRA have to be so complicated? If it just
 do structure expansion and let subsequent phases to clean it up, would
 it be simpler? Anyway this is off the topic.

 Well, it's certainly non-optimal to insert new memory backed variables
 to get rid of memory backed variables ...


Yes, in the current way it is not optimal.

Before that problem is resolved, is the simple patch ok for trunk? The
non-optimal code issue can be tracked with a different bug.

Thanks,

David

 Richard.

 Thanks,

 David


 On Mon, Jun 20, 2011 at 1:47 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 20, 2011 at 6:15 PM, Xinliang David Li davi...@google.com 
 wrote:
 It is used to indicate the fact the var decl needs to have a memory
 home (addressable) -- is there another way to do this? this is to
 avoid the following situation:

 1) after SRA before update SSA, the IR looks like:

   MEM[ SR_123] = ...

   other_var = SR_123;    (x)


 In this case, SR_123 is not of aggregate type, and it is not
 addressable, update_ssa won't assign a VUSE for (x), leading to

 The point is, SRA should never have created the above

  MEM[ SR_123] = ...

 Martin, why would it even create new _memory_ backed decls?

 Richard.

 2) final IR after SRA:

   MEM[..., SR_123] = ..
   other_var = SR_123_yyy(D);


 David

 On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com 
 wrote:
 Compiling the test case in the patch with -O2 -m32 without the fix,
 the program will abort. The problem is a var decl whose address is
 taken is not marked as addressable leading to bad SSA update (missing
 VUSE).  (the triaging used the the .after and .after_cleanup dump diff
 and found the problem).

 the test is on going. Ok after testing?

 That doesn't make sense.  SRA shouldn't generate anything that has
 its address taken.  So, where do we take its address?

 Richard.

 Thanks,

 David








Re: SRA generates uninitialized var use

2011-06-20 Thread Xinliang David Li
It is used to indicate the fact the var decl needs to have a memory
home (addressable) -- is there another way to do this? this is to
avoid the following situation:

1) after SRA before update SSA, the IR looks like:

   MEM[ SR_123] = ...

   other_var = SR_123;    (x)


In this case, SR_123 is not of aggregate type, and it is not
addressable, update_ssa won't assign a VUSE for (x), leading to

2) final IR after SRA:

   MEM[..., SR_123] = ..
   other_var = SR_123_yyy(D);


David

On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com 
 wrote:
 Compiling the test case in the patch with -O2 -m32 without the fix,
 the program will abort. The problem is a var decl whose address is
 taken is not marked as addressable leading to bad SSA update (missing
 VUSE).  (the triaging used the the .after and .after_cleanup dump diff
 and found the problem).

 the test is on going. Ok after testing?

 That doesn't make sense.  SRA shouldn't generate anything that has
 its address taken.  So, where do we take its address?

 Richard.

 Thanks,

 David




Re: SRA generates uninitialized var use

2011-06-20 Thread Xinliang David Li
Good point -- but why does SRA have to be so complicated? If it just
do structure expansion and let subsequent phases to clean it up, would
it be simpler? Anyway this is off the topic.

Thanks,

David


On Mon, Jun 20, 2011 at 1:47 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Jun 20, 2011 at 6:15 PM, Xinliang David Li davi...@google.com wrote:
 It is used to indicate the fact the var decl needs to have a memory
 home (addressable) -- is there another way to do this? this is to
 avoid the following situation:

 1) after SRA before update SSA, the IR looks like:

   MEM[ SR_123] = ...

   other_var = SR_123;    (x)


 In this case, SR_123 is not of aggregate type, and it is not
 addressable, update_ssa won't assign a VUSE for (x), leading to

 The point is, SRA should never have created the above

  MEM[ SR_123] = ...

 Martin, why would it even create new _memory_ backed decls?

 Richard.

 2) final IR after SRA:

   MEM[..., SR_123] = ..
   other_var = SR_123_yyy(D);


 David

 On Mon, Jun 20, 2011 at 4:13 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Sat, Jun 18, 2011 at 10:56 AM, Xinliang David Li davi...@google.com 
 wrote:
 Compiling the test case in the patch with -O2 -m32 without the fix,
 the program will abort. The problem is a var decl whose address is
 taken is not marked as addressable leading to bad SSA update (missing
 VUSE).  (the triaging used the the .after and .after_cleanup dump diff
 and found the problem).

 the test is on going. Ok after testing?

 That doesn't make sense.  SRA shouldn't generate anything that has
 its address taken.  So, where do we take its address?

 Richard.

 Thanks,

 David






Re: SRA generates uninitialized var use

2011-06-19 Thread Xinliang David Li
Bootstrap and tested on linux/x86_64.

Ok for trunk?

David


On Sat, Jun 18, 2011 at 1:56 AM, Xinliang David Li davi...@google.com wrote:
 Compiling the test case in the patch with -O2 -m32 without the fix,
 the program will abort. The problem is a var decl whose address is
 taken is not marked as addressable leading to bad SSA update (missing
 VUSE).  (the triaging used the the .after and .after_cleanup dump diff
 and found the problem).

 the test is on going. Ok after testing?

 Thanks,

 David



SRA generates uninitialized var use

2011-06-18 Thread Xinliang David Li
Compiling the test case in the patch with -O2 -m32 without the fix,
the program will abort. The problem is a var decl whose address is
taken is not marked as addressable leading to bad SSA update (missing
VUSE).  (the triaging used the the .after and .after_cleanup dump diff
and found the problem).

the test is on going. Ok after testing?

Thanks,

David


sra_bug.p
Description: Binary data


Re: [google] Add intermediate text format for gcov (issue4595053)

2011-06-15 Thread Xinliang David Li
Ok for google/main.

David

On Tue, Jun 14, 2011 at 1:50 PM, Sharad Singhai (शरद सिंघई)
sing...@google.com wrote:
 Sorry, Rietveld didn't send out the updated patch along with my mail.
 Here it is.

 Sharad

 2011-06-14   Sharad Singhai  sing...@google.com

        Google Ref 3

        * doc/gcov.texi: Document gcov intermediate format.
        * gcov.c (get_gcov_file_intermediate_name): New function.
        (output_intermediate_file): New function.
        * testsuite/lib/gcov.exp: Handle intermediate format.
        * testsuite/g++.dg/gcov/gcov-7.C: New test.

 Index: doc/gcov.texi
 ===
 --- doc/gcov.texi       (revision 174926)
 +++ doc/gcov.texi       (working copy)
 @@ -130,6 +130,7 @@
      [@option{-f}|@option{--function-summaries}]
      [@option{-o}|@option{--object-directory} @var{directory|file}]
 @var{sourcefiles}
      [@option{-u}|@option{--unconditional-branches}]
 +     [@option{-i}|@option{--intermediate-format}]
      [@option{-d}|@option{--display-progress}]
  @c man end
  @c man begin SEEALSO
 @@ -216,6 +217,32 @@
  @itemx --display-progress
  Display the progress on the standard output.

 +@item -i
 +@itemx --intermediate-format
 +Output gcov file in an intermediate text format that can be used by
 +@command{lcov} or other applications. It will output a single *.gcov file per
 +*.gcda file. No source code is required.
 +
 +The format of the intermediate @file{.gcov} file is plain text with
 +one entry per line
 +
 +@smallexample
 +SF:@var{source_file_name}
 +FN:@var{line_number},@var{function_name}
 +FNDA:@var{execution_count},@var{function_name}
 +BA:@var{line_num},@var{branch_coverage_type}
 +DA:@var{line number},@var{execution_count}
 +
 +Where the @var{branch_coverage_type} is
 +   0 (Branch not executed)
 +   1 (Branch executed, but not taken)
 +   2 (Branch executed and taken)
 +
 +There can be multiple SF entries in an intermediate gcov file. All
 +entries following SF pertain to that source file until the next SF
 +entry.
 +@end smallexample
 +
  @end table

  @command{gcov} should be run with the current directory the same as that
 Index: gcov.c
 ===
 --- gcov.c      (revision 174926)
 +++ gcov.c      (working copy)
 @@ -38,6 +38,7 @@
  #include tm.h
  #include intl.h
  #include version.h
 +#include demangle.h

  #include getopt.h

 @@ -310,6 +311,9 @@

  static int flag_display_progress = 0;

 +/* Output *.gcov file in intermediate format used by 'lcov'.  */
 +static int flag_intermediate_format = 0;
 +
  /* For included files, make the gcov output file name include the name
    of the input source file.  For example, if x.h is included in a.c,
    then the output file name is a.c##x.h.gcov instead of x.h.gcov.  */
 @@ -436,6 +440,11 @@
   fnotice (file,   -o, --object-directory DIR|FILE Search for object
 files in DIR or called FILE\n);
   fnotice (file,   -p, --preserve-paths            Preserve all
 pathname components\n);
   fnotice (file,   -u, --unconditional-branches    Show
 unconditional branch counts too\n);
 +  fnotice (file,   -i, --intermediate-format       Output .gcov file
 in an intermediate text\n\
 +                                    format that can be used by 'lcov'
 or other\n\
 +                                    applications.  It will output a single\n\
 +                                    .gcov file per .gcda file.  No
 source file\n\
 +                                    is required.\n);
   fnotice (file,   -d, --display-progress          Display progress
 information\n);
   fnotice (file, \nFor bug reporting instructions, please see:\n%s.\n,
           bug_report_url);
 @@ -472,6 +481,7 @@
   { object-file,          required_argument, NULL, 'o' },
   { unconditional-branches, no_argument,     NULL, 'u' },
   { display-progress,     no_argument,       NULL, 'd' },
 +  { intermediate-format,  no_argument,       NULL, 'i' },
   { 0, 0, 0, 0 }
  };

 @@ -482,7 +492,8 @@
  {
   int opt;

 -  while ((opt = getopt_long (argc, argv, abcdfhlno:puv, options,
 NULL)) != -1)
 +  while ((opt = getopt_long (argc, argv, abcdfhilno:puv, options, NULL)) !=
 +         -1)
     {
       switch (opt)
        {
 @@ -516,6 +527,10 @@
        case 'u':
          flag_unconditional = 1;
          break;
 +       case 'i':
 +          flag_intermediate_format = 1;
 +          flag_gcov_file = 1;
 +          break;
         case 'd':
           flag_display_progress = 1;
           break;
 @@ -531,6 +546,109 @@
   return optind;
  }

 +/* Get the name of the gcov file.  The return value must be free'd.
 +
 +   It appends the '.gcov' extension to the *basename* of the file.
 +   The resulting file name will be in PWD.
 +
 +   e.g.,
 +   input: foo.da,       output: foo.da.gcov
 +   input: a/b/foo.cc,   output: foo.cc.gcov  */
 +
 +static char *
 +get_gcov_file_intermediate_name (const char *file_name)
 +{
 +  const char *gcov = .gcov;
 +  

[google] backport r174930 to google/main

2011-06-14 Thread Xinliang David Li
Backported r174930 to google/main.

David


Re: Dump before flag

2011-06-14 Thread Xinliang David Li
Committed after Bootstrapping and regression testing on x86-64/linux.
The follow up patch will come soon.

Thanks,

David

On Tue, Jun 14, 2011 at 8:57 AM, Xinliang David Li davi...@google.com wrote:
 On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li davi...@google.com 
 wrote:
 This is the revised patch as suggested.

 How does it look?

  }

 +static void
 +execute_function_dump (void *data ATTRIBUTE_UNUSED)

 function needs a comment.

 Ok with that change.

 Please always specify how you tested the patch - the past fallouts
 suggest you didn't do the required testing carefully.

 I think I did -- the fallout was probably due to different
 '--enable-checking' setting. I have now turned it to 'yes'

 Thanks,

 David


 A changelog is missing as well.

 Thanks,
 Richard.

 Thanks,

 David

 On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com 
 wrote:
 See attached.

 Hmm.  I don't like how you still wire dumping in the TODO routines.
 Doesn't it work to just dump the body from pass_fini_dump_file ()?
 Or if that doesn't sound clean from (a subset of) places where it
 is called? (we might want to exclude the ipa read/write/summary
 stages)

 That may require another round of function traversal -- but probably
 not a big deal -- it sounds cleaner.

 David


 Richard.

 Thanks,

 David

 On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li davi...@google.com 
 wrote:
 this is the patch that just removes the TODO_dump flag and forces it
 to dump. The original code cfun-last_verified = flags 
 TODO_verify_all looks weird -- depending on TODO_dump is set or not,
 the behavior of the update is different (when no other todo flags is
 set).

 Ok for trunk?

 -ENOPATCH.

 Richard.

 David

 On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li 
 davi...@google.com wrote:
 The following is the patch that does the job. Most of the changes 
 are
 just  removing TODO_dump_func. The major change is in passes.c and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before 
 TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 Can we bikeshed a bit more about these names?

 These names may be less confusing:

 before_preparation
 before
 after
 after_cleanup

 David

 start and before
 have no semantical difference to me ... as the dump before TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start and 
 -finish
 (using your naming scheme).  Splitting that dump(s) to different 
 files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files 
 destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output into sub-dump-files).  I guess what would
 be nice instread would be a fancy dump-file viewer that could
 show diffs, hide things like SCEV output, etc.

 I suppose a patch that removes the dump TODO and unconditionally
 dumps at the current point would be a good preparation for this
 enhancing patch.

 Richard.

 The default is 'finish'.

 Does it look ok?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li 
 davi...@google.com wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done 
 easily
 -- I don't understand why pass wants extra control on the dumping 
 if
 user already asked for dumping -- it is annoying to see empty IR 
 dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after

Re: Dump only functions with name matching patterns

2011-06-14 Thread Xinliang David Li
Attached the patch.

David

On Tue, Jun 14, 2011 at 4:21 PM, Xinliang David Li davi...@google.com wrote:
 This is the second (hopefully the last in the series of dumper
 changes) follow-up patch.

 It adds a control so that verbosity of the dump can be greatly reduced
 (and hopefully simplify gcc developer's life a little).  For instance:

 -fdump-tree-dce=foo[0-9]$

 to dump IR (and debug trace) only for function with assembler names
 matching the specified pattern.

 Notes:

 1) the pattern is specified using regular expression -- unlike
 disable/enable option where exact name patterns are required, for
 dumping, fuzziness is preferred
 2) patterns specified in different -fdump-.. options will be 'ORed' together.

 The patch has not been fully tested, but comments are welcome.

 Thanks,

 David



 On Tue, Jun 14, 2011 at 4:13 PM, Xinliang David Li davi...@google.com wrote:
 Here is one the of follow up patches:  support of -before_preparation,
 -before, -after, -after_cleanup dump flags.

 The default dumping behavior does not change at all, but if any one of
 the above flags is specified, the function IR will be dumped into a
 file with the corresponding suffix.  The enhancement is to simplify IR
 diffing.

 Bootstrapped and regression tested on x86-64/linux.

 Ok for trunk?

 Thanks,

 David

 On Tue, Jun 14, 2011 at 12:40 PM, Xinliang David Li davi...@google.com 
 wrote:
 Committed after Bootstrapping and regression testing on x86-64/linux.
 The follow up patch will come soon.

 Thanks,

 David

 On Tue, Jun 14, 2011 at 8:57 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li davi...@google.com 
 wrote:
 This is the revised patch as suggested.

 How does it look?

  }

 +static void
 +execute_function_dump (void *data ATTRIBUTE_UNUSED)

 function needs a comment.

 Ok with that change.

 Please always specify how you tested the patch - the past fallouts
 suggest you didn't do the required testing carefully.

 I think I did -- the fallout was probably due to different
 '--enable-checking' setting. I have now turned it to 'yes'

 Thanks,

 David


 A changelog is missing as well.

 Thanks,
 Richard.

 Thanks,

 David

 On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com 
 wrote:
 See attached.

 Hmm.  I don't like how you still wire dumping in the TODO routines.
 Doesn't it work to just dump the body from pass_fini_dump_file ()?
 Or if that doesn't sound clean from (a subset of) places where it
 is called? (we might want to exclude the ipa read/write/summary
 stages)

 That may require another round of function traversal -- but probably
 not a big deal -- it sounds cleaner.

 David


 Richard.

 Thanks,

 David

 On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li 
 davi...@google.com wrote:
 this is the patch that just removes the TODO_dump flag and forces it
 to dump. The original code cfun-last_verified = flags 
 TODO_verify_all looks weird -- depending on TODO_dump is set or not,
 the behavior of the update is different (when no other todo flags is
 set).

 Ok for trunk?

 -ENOPATCH.

 Richard.

 David

 On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li 
 davi...@google.com wrote:
 On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li 
 davi...@google.com wrote:
 The following is the patch that does the job. Most of the 
 changes are
 just  removing TODO_dump_func. The major change is in passes.c 
 and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after 
 TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before 
 TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 Can we bikeshed a bit more about these names?

 These names may be less confusing:

 before_preparation
 before
 after
 after_cleanup

 David

 start and before
 have no semantical difference to me ... as the dump before 
 TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start 
 and -finish
 (using your naming scheme).  Splitting that dump(s) to different 
 files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files 
 destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-10 Thread Xinliang David Li
On Fri, Jun 10, 2011 at 2:03 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Fri, Jun 10, 2011 at 12:51 AM, Xinliang David Li davi...@google.com 
 wrote:
 Patch is temporally rolled back.

 Richard, looks like deeper pass manager cleanup is needed -- I would
 like to delay that. For now, this leaves us two choices -- 1) do cfunc
 push/pop, or 2) do pass dump while executing. None of them is ideal,
 but safe enough.

 Well.  It seems we should take a step back and look at the whole picture
 and try to figure out how it should look like in the end (maybe do that
 in London).

 For now I prefer 1) over 2).

That sounds good. For now I will check in 1) (as it has no impact on
default behavior) and do pass clean up later. I won't be in London for
discussion, but you let me know how the discussion goes.

Thanks,

David


 Thanks,
 Richard.

 Thanks,

 David

 On Thu, Jun 9, 2011 at 3:32 PM, Xinliang David Li davi...@google.com wrote:
 Though I can not reproduce it, it might be related to what Richard
 mentioned in the review -- The TODO's are executed though the pass is
 not. This opened up a can of worm -- I will revert the patches for
 now.

 David


 On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com 
 wrote:
 Please review the attached two patches.

 In the first patch, gate functions are cleaned up. All the per
 function legality checks are moved into the executor and the
 optimization heuristic checks (optimize for size) remain in the
 gators. These allow the the following overriding order:

    common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
 heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
 options  --- legality check

 Testing under going. Ok for trunk?


 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350

 --
 H.J.






Re: Dump before flag

2011-06-10 Thread Xinliang David Li
On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li davi...@google.com wrote:
 See attached.

 Hmm.  I don't like how you still wire dumping in the TODO routines.
 Doesn't it work to just dump the body from pass_fini_dump_file ()?
 Or if that doesn't sound clean from (a subset of) places where it
 is called? (we might want to exclude the ipa read/write/summary
 stages)

That may require another round of function traversal -- but probably
not a big deal -- it sounds cleaner.

David


 Richard.

 Thanks,

 David

 On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li davi...@google.com 
 wrote:
 this is the patch that just removes the TODO_dump flag and forces it
 to dump. The original code cfun-last_verified = flags 
 TODO_verify_all looks weird -- depending on TODO_dump is set or not,
 the behavior of the update is different (when no other todo flags is
 set).

 Ok for trunk?

 -ENOPATCH.

 Richard.

 David

 On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following is the patch that does the job. Most of the changes are
 just  removing TODO_dump_func. The major change is in passes.c and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 Can we bikeshed a bit more about these names?

 These names may be less confusing:

 before_preparation
 before
 after
 after_cleanup

 David

 start and before
 have no semantical difference to me ... as the dump before TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start and 
 -finish
 (using your naming scheme).  Splitting that dump(s) to different files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output into sub-dump-files).  I guess what would
 be nice instread would be a fancy dump-file viewer that could
 show diffs, hide things like SCEV output, etc.

 I suppose a patch that removes the dump TODO and unconditionally
 dumps at the current point would be a good preparation for this
 enhancing patch.

 Richard.

 The default is 'finish'.

 Does it look ok?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com 
 wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done easily
 -- I don't understand why pass wants extra control on the dumping if
 user already asked for dumping -- it is annoying to see empty IR dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.  Currently some 
 passes
 do not dump function bodies because they presumably do no IL
 modification.  But this is certainly the minority (and some passes do 
 not
 dump bodies even though they are modifying the IL ...).

 So I'd say we should by default dump function bodies.

 Note that there are three useful dumping positions (maybe four),
 before todo-start, after todo-start, before todo-finish and after 
 todo-finish.
 By default we'd want after todo-finish.  When we no longer dump

[google] backport 174848, 174849 to google/main

2011-06-09 Thread Xinliang David Li
Backported -fdump-passes option impl.

David


Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-09 Thread Xinliang David Li
Can you send me a trace? I can not reproduce the problem.

David


On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote:
 Please review the attached two patches.

 In the first patch, gate functions are cleaned up. All the per
 function legality checks are moved into the executor and the
 optimization heuristic checks (optimize for size) remain in the
 gators. These allow the the following overriding order:

    common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
 heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
 options  --- legality check

 Testing under going. Ok for trunk?


 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350

 --
 H.J.



Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-09 Thread Xinliang David Li
Though I can not reproduce it, it might be related to what Richard
mentioned in the review -- The TODO's are executed though the pass is
not. This opened up a can of worm -- I will revert the patches for
now.

David


On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote:
 Please review the attached two patches.

 In the first patch, gate functions are cleaned up. All the per
 function legality checks are moved into the executor and the
 optimization heuristic checks (optimize for size) remain in the
 gators. These allow the the following overriding order:

    common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
 heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
 options  --- legality check

 Testing under going. Ok for trunk?


 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350

 --
 H.J.



Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-09 Thread Xinliang David Li
Patch is temporally rolled back.

Richard, looks like deeper pass manager cleanup is needed -- I would
like to delay that. For now, this leaves us two choices -- 1) do cfunc
push/pop, or 2) do pass dump while executing. None of them is ideal,
but safe enough.

Thanks,

David

On Thu, Jun 9, 2011 at 3:32 PM, Xinliang David Li davi...@google.com wrote:
 Though I can not reproduce it, it might be related to what Richard
 mentioned in the review -- The TODO's are executed though the pass is
 not. This opened up a can of worm -- I will revert the patches for
 now.

 David


 On Thu, Jun 9, 2011 at 3:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com 
 wrote:
 Please review the attached two patches.

 In the first patch, gate functions are cleaned up. All the per
 function legality checks are moved into the executor and the
 optimization heuristic checks (optimize for size) remain in the
 gators. These allow the the following overriding order:

    common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
 heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
 options  --- legality check

 Testing under going. Ok for trunk?


 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49350

 --
 H.J.




Re: Dump before flag

2011-06-08 Thread Xinliang David Li
On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote:
 The following is the patch that does the job. Most of the changes are
 just  removing TODO_dump_func. The major change is in passes.c and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 Can we bikeshed a bit more about these names?

These names may be less confusing:

before_preparation
before
after
after_cleanup

David

 start and before
 have no semantical difference to me ... as the dump before TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start and -finish
 (using your naming scheme).  Splitting that dump(s) to different files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output into sub-dump-files).  I guess what would
 be nice instread would be a fancy dump-file viewer that could
 show diffs, hide things like SCEV output, etc.

 I suppose a patch that removes the dump TODO and unconditionally
 dumps at the current point would be a good preparation for this
 enhancing patch.

 Richard.

 The default is 'finish'.

 Does it look ok?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com 
 wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done easily
 -- I don't understand why pass wants extra control on the dumping if
 user already asked for dumping -- it is annoying to see empty IR dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.  Currently some passes
 do not dump function bodies because they presumably do no IL
 modification.  But this is certainly the minority (and some passes do not
 dump bodies even though they are modifying the IL ...).

 So I'd say we should by default dump function bodies.

 Note that there are three useful dumping positions (maybe four),
 before todo-start, after todo-start, before todo-finish and after 
 todo-finish.
 By default we'd want after todo-finish.  When we no longer dump via
 a TODO then we could indeed use dump-flags to control this
 (maybe -original for the body before todo-start).

 What to others think?

 Richard.

 Thanks,

 David



 Richard.







Re: Dump before flag

2011-06-08 Thread Xinliang David Li
this is the patch that just removes the TODO_dump flag and forces it
to dump. The original code cfun-last_verified = flags 
TODO_verify_all looks weird -- depending on TODO_dump is set or not,
the behavior of the update is different (when no other todo flags is
set).

Ok for trunk?

David

On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li davi...@google.com wrote:
 The following is the patch that does the job. Most of the changes are
 just  removing TODO_dump_func. The major change is in passes.c and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 Can we bikeshed a bit more about these names?

 These names may be less confusing:

 before_preparation
 before
 after
 after_cleanup

 David

 start and before
 have no semantical difference to me ... as the dump before TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start and -finish
 (using your naming scheme).  Splitting that dump(s) to different files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output into sub-dump-files).  I guess what would
 be nice instread would be a fancy dump-file viewer that could
 show diffs, hide things like SCEV output, etc.

 I suppose a patch that removes the dump TODO and unconditionally
 dumps at the current point would be a good preparation for this
 enhancing patch.

 Richard.

 The default is 'finish'.

 Does it look ok?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com 
 wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done easily
 -- I don't understand why pass wants extra control on the dumping if
 user already asked for dumping -- it is annoying to see empty IR dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.  Currently some passes
 do not dump function bodies because they presumably do no IL
 modification.  But this is certainly the minority (and some passes do not
 dump bodies even though they are modifying the IL ...).

 So I'd say we should by default dump function bodies.

 Note that there are three useful dumping positions (maybe four),
 before todo-start, after todo-start, before todo-finish and after 
 todo-finish.
 By default we'd want after todo-finish.  When we no longer dump via
 a TODO then we could indeed use dump-flags to control this
 (maybe -original for the body before todo-start).

 What to others think?

 Richard.

 Thanks,

 David



 Richard.








Re: Dump before flag

2011-06-07 Thread Xinliang David Li
On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done easily
 -- I don't understand why pass wants extra control on the dumping if
 user already asked for dumping -- it is annoying to see empty IR dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.  Currently some passes
 do not dump function bodies because they presumably do no IL
 modification.  But this is certainly the minority (and some passes do not
 dump bodies even though they are modifying the IL ...).

 So I'd say we should by default dump function bodies.

 Note that there are three useful dumping positions (maybe four),
 before todo-start, after todo-start, before todo-finish and after todo-finish.
 By default we'd want after todo-finish.  When we no longer dump via
 a TODO then we could indeed use dump-flags to control this
 (maybe -original for the body before todo-start).

 What to others think?

I think  that is a very good thing to have.

David



 Richard.

 Thanks,

 David



 Richard.





Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-07 Thread Xinliang David Li
Ok -- that sounds good.

David

On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com wrote:
 On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com 
 wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 +/* Dump all optimization passes.  */
 +
 +void
 +dump_passes (void)
 +{
 +  struct cgraph_node *n, *node = NULL;
 +  tree save_fndecl = current_function_decl;
 +
 +  fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid);

 this isn't accurate info - cloning can cause more cgraph nodes to
 appear (it also looks completely unrelated to dump_passes ...).
 Please drop it.

 Ok.



 +  create_pass_tab();
 +  gcc_assert (pass_tab);

 you have quite many asserts of this kind - we don't want them when
 the previous stmt as in this case indicates everything is ok.

 ok.


 +  push_cfun (DECL_STRUCT_FUNCTION (node-decl));

 this has side-effects, I'm not sure we want this here.  Why do you
 need it?  Probably because of

 +  is_really_on = override_gate_status (pass, current_function_decl, is_on);

 ?  But that is dependent on the function given which should have no
 effect (unless it is overridden globally in which case override_gate_status
 and friends should deal with a NULL cfun).

 As we discussed, currently some pass gate functions depend on per node
 information -- those checks need to be pushed into execute functions.
 I would like to clean those up later -- at which time, the push/pop
 can be removed.

 I'd like to do it the other way around, first clean up the gate functions then
 drop in this patch without the cfun push/pop.  The revised patch looks ok
 to me with the cfun push/pop removed.

 Thanks,
 Richard.


 I don't understand why you need another table mapping pass to name
 when pass-name is available and the info is trivially re-constructible.

 This is needed as the pass-name is not the canonicalized name (i.e.,
 not with number suffix etc), so the extra mapping from id to
 normalized name is needed.

 Thanks,

 David


 Thanks,
 Richard.

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have 
 dependencies on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some 
 point
 you will not get a complete pass list.  I suppose optimize attributes 
 might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just 
 call the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on 
 optimize_for_speed ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate function should
 behave the same for all functions and per-function decisions need to
 be pushed down to the executor body.  I will try to rework the patch
 as you suggested to see if there are problems.

 David



 Richard.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have

Re: Dump before flag

2011-06-07 Thread Xinliang David Li
Any suggestions on the dump position specification string, before and
after is not enough. How about

start, before, after, and finish?

I.e.

-fdump-tree-pre-start -- dump IR before TODO_start of PRE pass
-fdump-tree-pre-before -- dump IR just before PRE after its TODO start finishes
-fdump-tree-pre-after -- dump IR just after PRE
-fdump-tree-pre-finish -- dump it after TODO_finish of PRE?

Thanks,

David


On Tue, Jun 7, 2011 at 9:43 AM, Diego Novillo dnovi...@google.com wrote:
 On Tue, Jun 7, 2011 at 02:36, Richard Guenther
 richard.guent...@gmail.com wrote:

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.

 Agreed.


 Diego.



Re: Dump before flag

2011-06-07 Thread Xinliang David Li
On Tue, Jun 7, 2011 at 10:01 AM, Diego Novillo dnovi...@google.com wrote:
 On Tue, Jun 7, 2011 at 09:51, Xinliang David Li davi...@google.com wrote:
 Any suggestions on the dump position specification string, before and
 after is not enough. How about

 start, before, after, and finish?

 I.e.

 -fdump-tree-pre-start -- dump IR before TODO_start of PRE pass
 -fdump-tree-pre-before -- dump IR just before PRE after its TODO start 
 finishes

 What would be the difference between these two?  The TODO_start
 actions don't affect the IL, in general.

But by design, it can right?

David


 -fdump-tree-pre-after -- dump IR just after PRE
 -fdump-tree-pre-finish -- dump it after TODO_finish of PRE?

 This would be to catch changes like cfg cleanup and SSA update
 actions, right?  Sure.

 I don't mind too much about the naming.  Anything that makes sense is
 good with me.


 Diego.



Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-07 Thread Xinliang David Li
Please review the attached two patches.

In the first patch, gate functions are cleaned up. All the per
function legality checks are moved into the executor and the
optimization heuristic checks (optimize for size) remain in the
gators. These allow the the following overriding order:

common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
options  --- legality check

Testing under going. Ok for trunk?

Thanks,

David

On Tue, Jun 7, 2011 at 9:24 AM, Xinliang David Li davi...@google.com wrote:
 Ok -- that sounds good.

 David

 On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com wrote:
 On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com 
 wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 +/* Dump all optimization passes.  */
 +
 +void
 +dump_passes (void)
 +{
 +  struct cgraph_node *n, *node = NULL;
 +  tree save_fndecl = current_function_decl;
 +
 +  fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid);

 this isn't accurate info - cloning can cause more cgraph nodes to
 appear (it also looks completely unrelated to dump_passes ...).
 Please drop it.

 Ok.



 +  create_pass_tab();
 +  gcc_assert (pass_tab);

 you have quite many asserts of this kind - we don't want them when
 the previous stmt as in this case indicates everything is ok.

 ok.


 +  push_cfun (DECL_STRUCT_FUNCTION (node-decl));

 this has side-effects, I'm not sure we want this here.  Why do you
 need it?  Probably because of

 +  is_really_on = override_gate_status (pass, current_function_decl, 
 is_on);

 ?  But that is dependent on the function given which should have no
 effect (unless it is overridden globally in which case override_gate_status
 and friends should deal with a NULL cfun).

 As we discussed, currently some pass gate functions depend on per node
 information -- those checks need to be pushed into execute functions.
 I would like to clean those up later -- at which time, the push/pop
 can be removed.

 I'd like to do it the other way around, first clean up the gate functions 
 then
 drop in this patch without the cfun push/pop.  The revised patch looks ok
 to me with the cfun push/pop removed.

 Thanks,
 Richard.


 I don't understand why you need another table mapping pass to name
 when pass-name is available and the info is trivially re-constructible.

 This is needed as the pass-name is not the canonicalized name (i.e.,
 not with number suffix etc), so the extra mapping from id to
 normalized name is needed.

 Thanks,

 David


 Thanks,
 Richard.

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li 
 davi...@google.com wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not 
 simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of 
 it
 at pass execution time when it's not already dumped - that really 
 looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have 
 dependencies on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some 
 point
 you will not get a complete pass list.  I suppose optimize attributes 
 might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just 
 call the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on 
 optimize_for_speed ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-07 Thread Xinliang David Li
The dump-pass patch with test case.

David

On Tue, Jun 7, 2011 at 11:54 AM, Xinliang David Li davi...@google.com wrote:
 Please review the attached two patches.

 In the first patch, gate functions are cleaned up. All the per
 function legality checks are moved into the executor and the
 optimization heuristic checks (optimize for size) remain in the
 gators. These allow the the following overriding order:

    common flags (O2, -ftree-vrp, -fgcse etc)   ---  compiler
 heuristic (optimize for size/speed) --- -fdisable/enable forcing pass
 options  --- legality check

 Testing under going. Ok for trunk?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 9:24 AM, Xinliang David Li davi...@google.com wrote:
 Ok -- that sounds good.

 David

 On Tue, Jun 7, 2011 at 3:10 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:00 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com 
 wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 +/* Dump all optimization passes.  */
 +
 +void
 +dump_passes (void)
 +{
 +  struct cgraph_node *n, *node = NULL;
 +  tree save_fndecl = current_function_decl;
 +
 +  fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid);

 this isn't accurate info - cloning can cause more cgraph nodes to
 appear (it also looks completely unrelated to dump_passes ...).
 Please drop it.

 Ok.



 +  create_pass_tab();
 +  gcc_assert (pass_tab);

 you have quite many asserts of this kind - we don't want them when
 the previous stmt as in this case indicates everything is ok.

 ok.


 +  push_cfun (DECL_STRUCT_FUNCTION (node-decl));

 this has side-effects, I'm not sure we want this here.  Why do you
 need it?  Probably because of

 +  is_really_on = override_gate_status (pass, current_function_decl, 
 is_on);

 ?  But that is dependent on the function given which should have no
 effect (unless it is overridden globally in which case 
 override_gate_status
 and friends should deal with a NULL cfun).

 As we discussed, currently some pass gate functions depend on per node
 information -- those checks need to be pushed into execute functions.
 I would like to clean those up later -- at which time, the push/pop
 can be removed.

 I'd like to do it the other way around, first clean up the gate functions 
 then
 drop in this patch without the cfun push/pop.  The revised patch looks ok
 to me with the cfun push/pop removed.

 Thanks,
 Richard.


 I don't understand why you need another table mapping pass to name
 when pass-name is available and the info is trivially re-constructible.

 This is needed as the pass-name is not the canonicalized name (i.e.,
 not with number suffix etc), so the extra mapping from id to
 normalized name is needed.

 Thanks,

 David


 Thanks,
 Richard.

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li 
 davi...@google.com wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a 
 list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not 
 simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of 
 it
 at pass execution time when it's not already dumped - that really 
 looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have 
 dependencies on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some 
 point
 you will not get a complete pass list.  I suppose optimize attributes 
 might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just 
 call the
 dumping from toplev_main before calling do_compile (), 3) gate 
 functions
 shouldn't have side-effects, but as they could gate on 
 optimize_for_speed ()
 your option summary output

Re: Dump before flag

2011-06-07 Thread Xinliang David Li
It might be also useful to implement the dumping behavior like this:
if any of the start/before/after/finish option is explicitly
specified, IR (and only IR) will be dumped into files suffixed with
.start/.before/.after/.finish. The debug dump will be dumped as usual
into the non suffixed file name. By default, the IR dump and debug
dump will be dumped into the same file which is the current behavior.

David


On Tue, Jun 7, 2011 at 4:08 PM, Xinliang David Li davi...@google.com wrote:
 The following is the patch that does the job. Most of the changes are
 just  removing TODO_dump_func. The major change is in passes.c and
 tree-pass.h.

 -fdump-xxx-yyy-start       -- dump before TODO_start
 -fdump-xxx-yyy-before    -- dump before main pass after TODO_pass
 -fdump-xxx-yyy-after       -- dump after main pass before TODO_finish
 -fdump-xxx-yyy-finish      -- dump after TODO_finish

 The default is 'finish'.

 Does it look ok?

 Thanks,

 David

 On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li davi...@google.com wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


 How about removing dumping TODO completely -- this can be done easily
 -- I don't understand why pass wants extra control on the dumping if
 user already asked for dumping -- it is annoying to see empty IR dump
 for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

 For one thing, you need to either remember what is the previous pass,
 or dump all passes which for large files can take very long time. Even
 with all the dumps, you will need to eyeballing to find the previous
 pass which may or may not have the IR dumped.

 How about removing dump TODO?

 Yeah, I think this would go in the right direction.  Currently some passes
 do not dump function bodies because they presumably do no IL
 modification.  But this is certainly the minority (and some passes do not
 dump bodies even though they are modifying the IL ...).

 So I'd say we should by default dump function bodies.

 Note that there are three useful dumping positions (maybe four),
 before todo-start, after todo-start, before todo-finish and after 
 todo-finish.
 By default we'd want after todo-finish.  When we no longer dump via
 a TODO then we could indeed use dump-flags to control this
 (maybe -original for the body before todo-start).

 What to others think?

 Richard.

 Thanks,

 David



 Richard.






Re: [google] pessimize stack accounting during inlining

2011-06-07 Thread Xinliang David Li
Ok for google/main.   A good candidate patch for trunk too.

Thanks,

David

On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan meh...@google.com wrote:
 This patch pessimizes stack accounting during inlining.  This enables
 setting a firm stack size limit (via parameters large-stack-frame and
 large-stack-frame-growth).  Without this patch the inliner is overly
 optimistic about potential stack reuse resulting in actual stack frames much
 larger than the parameterized limits.
 Internal benchmarks show minor performance differences with non-fdo and
 lipo, but overall neutral.  Tested/bootstrapped on x86-64.
 Ok for google-main?
 Mark

 2011-06-07  Mark Heffernan  meh...@google.com
         * cgraph.h (cgraph_global_info): Remove field.
         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
         stack frame computation.
         (cgraph_check_inline_limits): Ditto.
         (compute_inline_parameters): Remove dead initialization.

 Index: gcc/cgraph.h
 ===
 --- gcc/cgraph.h        (revision 174512)
 +++ gcc/cgraph.h        (working copy)
 @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
  struct GTY(()) cgraph_global_info {
    /* Estimated stack frame consumption by the function.  */
    HOST_WIDE_INT estimated_stack_size;
 -  /* Expected offset of the stack frame of inlined function.  */
 -  HOST_WIDE_INT stack_frame_offset;

    /* For inline clones this points to the function they will be
       inlined into.  */
 Index: gcc/ipa-inline.c
 ===
 --- gcc/ipa-inline.c    (revision 174512)
 +++ gcc/ipa-inline.c    (working copy)
 @@ -229,8 +229,6 @@ void
  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
                             bool update_original)
  {
 -  HOST_WIDE_INT peak;
 -
    if (duplicate)
      {
        /* We may eliminate the need for out-of-line copy to be output.
 @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
      e-callee-global.inlined_to = e-caller-global.inlined_to;
    else
      e-callee-global.inlined_to = e-caller;
 -  e-callee-global.stack_frame_offset
 -    = e-caller-global.stack_frame_offset
 -      + inline_summary (e-caller)-estimated_self_stack_size;
 -  peak = e-callee-global.stack_frame_offset
 -      + inline_summary (e-callee)-estimated_self_stack_size;
 -  if (e-callee-global.inlined_to-global.estimated_stack_size  peak)
 -    e-callee-global.inlined_to-global.estimated_stack_size = peak;
 +
 +  /* Pessimistically assume no sharing of stack space.  That is, the
 +     frame size of a function is estimated as the original frame size
 +     plus the sum of the frame sizes of all inlined callees.  */
 +  e-callee-global.inlined_to-global.estimated_stack_size +=
 +    inline_summary (e-callee)-estimated_self_stack_size;
 +
    cgraph_propagate_frequency (e-callee);

    /* Recursively clone all bodies.  */
 @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap

    stack_size_limit += stack_size_limit * PARAM_VALUE
 (PARAM_STACK_FRAME_GROWTH) / 100;

 -  inlined_stack = (to-global.stack_frame_offset
 -                  + inline_summary (to)-estimated_self_stack_size
 +  inlined_stack = (to-global.estimated_stack_size
                    + what-global.estimated_stack_size);
    if (inlined_stack   stack_size_limit
         inlined_stack  PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
 @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
    inline_summary (node)-estimated_self_stack_size = self_stack_size;
    node-global.estimated_stack_size = self_stack_size;
 -  node-global.stack_frame_offset = 0;

    /* Can this function be inlined at all?  */
    node-local.inlinable = tree_inlinable_function_p (node-decl);



Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-06 Thread Xinliang David Li
On Mon, Jun 6, 2011 at 4:22 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 7:24 PM, Xinliang David Li davi...@google.com wrote:
 The attached is the split #1 patch that enhances -fenable/disable.

 Ok after testing?

 I expect the testcases will be quite fragile, so while I appreciate
 test coverage for new options I think we should go without those
 that involve any kind of UID.  Those which use assembler names
 also will fail randomly dependent on how targets mangle their
 functions - so I think we have to drop all testcases.

Ok -- how about keeping tests with large uid range, and assembler name
for x86? A feature without testing is just to easy to break without
being noticed.


 Also

 +/* A helper function to determine if an identifier is valid to
 +   be an assembler name (better to use target specific hook).  */
 +
 +static bool
 +is_valid_assembler_name (const char *str)
 +{
 +  const char *p = str;
 +  char c;
 +
 +  c = *p;
 +  if (!((c = 'a'  c = 'z')
 +        || (c = 'A'  c = 'Z')
 +        || *p == '_'))
 +    return false;
 +
 +  p++;
 +  while ((c = *p))
 +   {
 +     if (!((c = 'a'  c = 'z')
 +           || (c = 'A'  c = 'Z')
 +           || (c = '0'  c = '9')
 +           || *p == '_'))
 +       return false;
 +     p++;
 +   }
 +
 +  return true;
 +}

 why all that complicated checks?  Why not just check for p[0]
 in [^0-9] and re-structure the range parsing to switch between
 UIDs and assembler-names that way?

Ok.

David


 Thanks,
 Richard.

 Thanks,
 David

 On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David







Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-06 Thread Xinliang David Li
On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 +/* Dump all optimization passes.  */
 +
 +void
 +dump_passes (void)
 +{
 +  struct cgraph_node *n, *node = NULL;
 +  tree save_fndecl = current_function_decl;
 +
 +  fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid);

 this isn't accurate info - cloning can cause more cgraph nodes to
 appear (it also looks completely unrelated to dump_passes ...).
 Please drop it.

Ok.



 +  create_pass_tab();
 +  gcc_assert (pass_tab);

 you have quite many asserts of this kind - we don't want them when
 the previous stmt as in this case indicates everything is ok.

ok.


 +  push_cfun (DECL_STRUCT_FUNCTION (node-decl));

 this has side-effects, I'm not sure we want this here.  Why do you
 need it?  Probably because of

 +  is_really_on = override_gate_status (pass, current_function_decl, is_on);

 ?  But that is dependent on the function given which should have no
 effect (unless it is overridden globally in which case override_gate_status
 and friends should deal with a NULL cfun).

As we discussed, currently some pass gate functions depend on per node
information -- those checks need to be pushed into execute functions.
I would like to clean those up later -- at which time, the push/pop
can be removed.


 I don't understand why you need another table mapping pass to name
 when pass-name is available and the info is trivially re-constructible.

This is needed as the pass-name is not the canonicalized name (i.e.,
not with number suffix etc), so the extra mapping from id to
normalized name is needed.

Thanks,

David


 Thanks,
 Richard.

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies 
 on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some point
 you will not get a complete pass list.  I suppose optimize attributes might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just 
 call the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on optimize_for_speed 
 ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate function should
 behave the same for all functions and per-function decisions need to
 be pushed down to the executor body.  I will try to rework the patch
 as you suggested to see if there are problems.

 David



 Richard.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David









Re: Dump before flag

2011-06-06 Thread Xinliang David Li

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
 +       (dump_flags  TDF_BEFORE))
 +    {
 +      pass-todo_flags_finish = ~TODO_dump_func;
 +      pass-todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack ontop
 of that mess (maybe because of it, but well ...).


How about removing dumping TODO completely -- this can be done easily
-- I don't understand why pass wants extra control on the dumping if
user already asked for dumping -- it is annoying to see empty IR dump
for a pass when I want to see it.

 At least I would have expected to also get the dump after the
 pass, not only the one before it with this dump flag.

 Now, why can't you look at the previous pass output for the
 before-dump (as I do usually)?

For one thing, you need to either remember what is the previous pass,
or dump all passes which for large files can take very long time. Even
with all the dumps, you will need to eyeballing to find the previous
pass which may or may not have the IR dumped.

How about removing dump TODO?

Thanks,

David



 Richard.



Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-06 Thread Xinliang David Li
Please take a look at the revised one.

Thanks,

David

On Mon, Jun 6, 2011 at 4:22 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 7:24 PM, Xinliang David Li davi...@google.com wrote:
 The attached is the split #1 patch that enhances -fenable/disable.

 Ok after testing?

 I expect the testcases will be quite fragile, so while I appreciate
 test coverage for new options I think we should go without those
 that involve any kind of UID.  Those which use assembler names
 also will fail randomly dependent on how targets mangle their
 functions - so I think we have to drop all testcases.

 Also

 +/* A helper function to determine if an identifier is valid to
 +   be an assembler name (better to use target specific hook).  */
 +
 +static bool
 +is_valid_assembler_name (const char *str)
 +{
 +  const char *p = str;
 +  char c;
 +
 +  c = *p;
 +  if (!((c = 'a'  c = 'z')
 +        || (c = 'A'  c = 'Z')
 +        || *p == '_'))
 +    return false;
 +
 +  p++;
 +  while ((c = *p))
 +   {
 +     if (!((c = 'a'  c = 'z')
 +           || (c = 'A'  c = 'Z')
 +           || (c = '0'  c = '9')
 +           || *p == '_'))
 +       return false;
 +     p++;
 +   }
 +
 +  return true;
 +}

 why all that complicated checks?  Why not just check for p[0]
 in [^0-9] and re-structure the range parsing to switch between
 UIDs and assembler-names that way?

 Thanks,
 Richard.

 Thanks,
 David

 On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David





Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 174549)
+++ doc/invoke.texi	(working copy)
@@ -5056,11 +5056,12 @@ appended with a sequential number starti
 Disable rtl pass @var{pass}.  @var{pass} is the pass name.  If the same pass is
 statically invoked in the compiler multiple times, the pass name should be
 appended with a sequential number starting from 1.  @var{range-list} is a comma
-seperated list of function ranges.  Each range is a number pair seperated by a colon.
-The range is inclusive in both ends.  If the range is trivial, the number pair can be
-simplified a a single number.  If the function's cgraph node's @var{uid} is falling
-within one of the specified ranges, the @var{pass} is disabled for that function.
-The @var{uid} is shown in the function header of a dump file.
+seperated list of function ranges or assembler names.  Each range is a number
+pair seperated by a colon.  The range is inclusive in both ends.  If the range
+is trivial, the number pair can be simplified as a single number.  If the
+function's cgraph node's @var{uid} is falling within one of the specified ranges,
+the @var{pass} is disabled for that function.  The @var{uid} is shown in the
+function header of a dump file.
 
 @item -fdisable-tree-@var{pass}
 @item -fdisable-tree-@var{pass}=@var{range-list}
@@ -5090,7 +5091,8 @@ of option arguments.
-fenable-tree-cunroll=1
 # disable gcse2 for functions at the following ranges [1,1],
 # [300,400], and [400,1000]
-   -fdisable-rtl-gcse2=1:100,300,400:1000
+# disable gcse2 for functions foo and foo2

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-06 Thread Xinliang David Li
This is the patch with max id removed.

Thanks,

David

On Mon, Jun 6, 2011 at 9:00 AM, Xinliang David Li davi...@google.com wrote:
 On Mon, Jun 6, 2011 at 4:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jun 2, 2011 at 9:12 AM, Xinliang David Li davi...@google.com wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 +/* Dump all optimization passes.  */
 +
 +void
 +dump_passes (void)
 +{
 +  struct cgraph_node *n, *node = NULL;
 +  tree save_fndecl = current_function_decl;
 +
 +  fprintf (stderr, MAX_UID = %d\n, cgraph_max_uid);

 this isn't accurate info - cloning can cause more cgraph nodes to
 appear (it also looks completely unrelated to dump_passes ...).
 Please drop it.

 Ok.



 +  create_pass_tab();
 +  gcc_assert (pass_tab);

 you have quite many asserts of this kind - we don't want them when
 the previous stmt as in this case indicates everything is ok.

 ok.


 +  push_cfun (DECL_STRUCT_FUNCTION (node-decl));

 this has side-effects, I'm not sure we want this here.  Why do you
 need it?  Probably because of

 +  is_really_on = override_gate_status (pass, current_function_decl, is_on);

 ?  But that is dependent on the function given which should have no
 effect (unless it is overridden globally in which case override_gate_status
 and friends should deal with a NULL cfun).

 As we discussed, currently some pass gate functions depend on per node
 information -- those checks need to be pushed into execute functions.
 I would like to clean those up later -- at which time, the push/pop
 can be removed.


 I don't understand why you need another table mapping pass to name
 when pass-name is available and the info is trivially re-constructible.

 This is needed as the pass-name is not the canonicalized name (i.e.,
 not with number suffix etc), so the extra mapping from id to
 normalized name is needed.

 Thanks,

 David


 Thanks,
 Richard.

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies 
 on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some point
 you will not get a complete pass list.  I suppose optimize attributes 
 might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just 
 call the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on optimize_for_speed 
 ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate function should
 behave the same for all functions and per-function decisions need to
 be pushed down to the executor body.  I will try to rework the patch
 as you suggested to see if there are problems.

 David



 Richard.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David








Index: doc/invoke.texi
===
--- doc

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-05 Thread Xinliang David Li
Is this patch ok?

Thanks,

David

On Wed, Jun 1, 2011 at 10:24 AM, Xinliang David Li davi...@google.com wrote:
 The attached is the split #1 patch that enhances -fenable/disable.

 Ok after testing?

 Thanks,
 David

 On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David






Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-05 Thread Xinliang David Li
Is this one ok?

Thanks,

David

On Thu, Jun 2, 2011 at 12:12 AM, Xinliang David Li davi...@google.com wrote:
 This is the version of the patch that walks through pass lists.

 Ok with this one?

 David

 On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies 
 on cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some point
 you will not get a complete pass list.  I suppose optimize attributes might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just call 
 the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on optimize_for_speed ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate function should
 behave the same for all functions and per-function decisions need to
 be pushed down to the executor body.  I will try to rework the patch
 as you suggested to see if there are problems.

 David



 Richard.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David








Re: don't dump decl_uid with TDF_NOUID

2011-06-03 Thread Xinliang David Li
Looks good to me, but you need an OK from a maintainer.

Thanks,

David

On Fri, Jun 3, 2011 at 7:17 AM, Alexandre Oliva aol...@redhat.com wrote:
 A recent change introduced decl_uid in the “;; Function ” header in dump
 files.  This breaks -fcompare-debug (and bootstrap-debug-lean), because
 decl uids aren't kept in sync between -g and non-g compilations.

 This patch rearranges the header so that decl_uid is omitted when
 dumping with NOUID, and it prints the header for the -fcompare-debug
 final dump with NOUID.

 Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok?




 --
 Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
 You must be the change you wish to see in the world. -- Gandhi
 Be Free! -- http://FSFLA.org/   FSF Latin America board member
 Free Software Evangelist      Red Hat Brazil Compiler Engineer




Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-02 Thread Xinliang David Li
This is the version of the patch that walks through pass lists.

Ok with this one?

David

On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com 
 wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.

 Well, if you have a CU that is empty or optimized to nothing at some point
 you will not get a complete pass list.  I suppose optimize attributes might
 also confuse output.  Your solution might not be that intrusive
 but it is still ugly.  I don't see 1) as an issue, for 2) you can just call 
 the
 dumping from toplev_main before calling do_compile (), 3) gate functions
 shouldn't have side-effects, but as they could gate on optimize_for_speed ()
 your option summary output will be bogus anyway.

 So - what is the output intended for if it isn't reliable?

 This needs to be cleaned up at some point -- the gate function should
 behave the same for all functions and per-function decisions need to
 be pushed down to the executor body.  I will try to rework the patch
 as you suggested to see if there are problems.

 David



 Richard.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David







dump-pass3.p
Description: Binary data


out
Description: Binary data


Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)

2011-06-02 Thread Xinliang David Li
ok for google/main.

David

On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote:
 This patch from Neil Vachharajani and Dehao Chen improves mcf by using
 minimum cost circulation instead of minimum cost flow to smooth profiles.
 It also introduces a parameter for controlling running time of the algorithm.
 This was what was originally presented in the academic work and handles
 certain cases where the function entry and exit have incorrect profile
 weights.

 For now, this is for google/main. Once I have collected performance results
 from SPEC I will propose this patch for trunk as well.

 Bootstraps and no test regressions. Ok for google/main?

 2011-06-02  Neil Vachharajani  nvach...@gmail.com, Dehao Chen
        daniel...@gmail.com

        * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document.
        * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS.
        (edge_type): Add SINK_SOURCE_EDGE.
        (dump_fixup_edge): Handle SINK_SOURCE_EDGE.
        (create_fixup_graph): Make problem miminum cost circulation.
        (cancel_negative_cycle): Update handling of infinite capacity.
        (compute_residual_flow): Update handling of infinite capacity.
        (find_max_flow): Update handling of infinite capacity.
        (modify_sink_source_capacity): New function.
        (find_minimum_cost_flow): Make problem miminum cost circulation.
        Use param PARAM_MIN_MCF_CANCEL_ITERS.
        * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define.

 Index: gcc/doc/invoke.texi
 ===
 --- gcc/doc/invoke.texi (revision 174456)
 +++ gcc/doc/invoke.texi (working copy)
 @@ -8341,6 +8341,12 @@ whether the result of a complex multipli

  The default is @option{-fno-cx-fortran-rules}.

 +@item min-mcf-cancel-iters
 +The minimum number of iterations of negative cycle cancellation during
 +MCF profile correction before early termination.  This parameter is
 +only useful when using @option{-fprofile-correction}.
 +
 +
  @end table

  The following options control optimizations that may improve
 Index: gcc/mcf.c
 ===
 --- gcc/mcf.c   (revision 174456)
 +++ gcc/mcf.c   (working copy)
 @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3.
  #include langhooks.h
  #include tree.h
  #include gcov-io.h
 +#include params.h
 +#include diagnostic-core.h

  #include profile.h

 @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3.
  #define COST(k, w)      ((k) / mcf_ln ((w) + 2))
  /* Limit the number of iterations for cancel_negative_cycles() to ensure
    reasonable compile time.  */
 -#define MAX_ITER(n, e)  10 + (100 / ((n) * (e)))
 +#define MAX_ITER(n, e)  (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \
 +                        (100 / ((n) * (e
 +
  typedef enum
  {
 -  INVALID_EDGE,
 +  INVALID_EDGE = 0,
   VERTEX_SPLIT_EDGE,       /* Edge to represent vertex with w(e) = w(v).  */
   REDIRECT_EDGE,           /* Edge after vertex transformation.  */
   REVERSE_EDGE,
   SOURCE_CONNECT_EDGE,     /* Single edge connecting to single source.  */
   SINK_CONNECT_EDGE,       /* Single edge connecting to single sink.  */
 +  SINK_SOURCE_EDGE,        /* Single edge connecting sink to source.  */
   BALANCE_EDGE,                    /* Edge connecting with source/sink: cp(e) 
 = 0.  */
   REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge.  */
   REVERSE_NORMALIZED_EDGE   /* Normalized edge for a reverse edge.  */
 @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph
          fputs ( @SINK_CONNECT_EDGE, file);
          break;

 +       case SINK_SOURCE_EDGE:
 +         fputs ( @SINK_SOURCE_EDGE, file);
 +         break;
 +
        case REVERSE_EDGE:
          fputs ( @REVERSE_EDGE, file);
          break;
 @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi
   double k_neg = 0;
   /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v).  */
   gcov_type *diff_out_in = NULL;
 -  gcov_type supply_value = 1, demand_value = 0;
 +  gcov_type supply_value = 0, demand_value = 0;
   gcov_type fcost = 0;
   int new_entry_index = 0, new_exit_index = 0;
   int i = 0, j = 0;
 @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi
     fnum_vertices_after_transform + n_edges + n_basic_blocks + 2;

   /* In create_fixup_graph: Each basic block and edge can be split into 3
 -     edges. Number of balance edges = n_basic_blocks. So after
 -     create_fixup_graph:
 -     max_edges = 4 * n_basic_blocks + 3 * n_edges
 +     edges. Number of balance edges = n_basic_blocks - 1. And there is 1 edge
 +     connecting new_entry and new_exit, and 2 edges connecting new_entry to
 +     entry, and exit to new_exit. So after create_fixup_graph:
 +     max_edges = 4 * n_basic_blocks + 3 * n_edges + 2
      Accounting for residual flow edges
 -     max_edges = 2 * (4 * n_basic_blocks + 3 * n_edges)
 -     = 8 * n_basic_blocks + 6 * 

Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)

2011-06-02 Thread Xinliang David Li
Counter overflow?

David

On Thu, Jun 2, 2011 at 11:12 AM, Martin Thuresson mart...@google.com wrote:
 On Thu, Jun 2, 2011 at 11:05 AM, Xinliang David Li davi...@google.com wrote:

 Smoothing works for sample FDO and profile data from multi-threaded
 programs. You won't see any difference in SPEC.

 Dehao reported some performance improvements from the algorithmic improvements
 he added in terms of extra fixup edges and handling of infinite capacity.

 Martin





 David

 On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote:
  This patch from Neil Vachharajani and Dehao Chen improves mcf by using
  minimum cost circulation instead of minimum cost flow to smooth profiles.
  It also introduces a parameter for controlling running time of the 
  algorithm.
  This was what was originally presented in the academic work and handles
  certain cases where the function entry and exit have incorrect profile
  weights.
 
  For now, this is for google/main. Once I have collected performance results
  from SPEC I will propose this patch for trunk as well.
 
  Bootstraps and no test regressions. Ok for google/main?
 
  2011-06-02  Neil Vachharajani  nvach...@gmail.com, Dehao Chen
         daniel...@gmail.com
 
         * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document.
         * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS.
         (edge_type): Add SINK_SOURCE_EDGE.
         (dump_fixup_edge): Handle SINK_SOURCE_EDGE.
         (create_fixup_graph): Make problem miminum cost circulation.
         (cancel_negative_cycle): Update handling of infinite capacity.
         (compute_residual_flow): Update handling of infinite capacity.
         (find_max_flow): Update handling of infinite capacity.
         (modify_sink_source_capacity): New function.
         (find_minimum_cost_flow): Make problem miminum cost circulation.
         Use param PARAM_MIN_MCF_CANCEL_ITERS.
         * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define.
 
  Index: gcc/doc/invoke.texi
  ===
  --- gcc/doc/invoke.texi (revision 174456)
  +++ gcc/doc/invoke.texi (working copy)
  @@ -8341,6 +8341,12 @@ whether the result of a complex multipli
 
   The default is @option{-fno-cx-fortran-rules}.
 
  +@item min-mcf-cancel-iters
  +The minimum number of iterations of negative cycle cancellation during
  +MCF profile correction before early termination.  This parameter is
  +only useful when using @option{-fprofile-correction}.
  +
  +
   @end table
 
   The following options control optimizations that may improve
  Index: gcc/mcf.c
  ===
  --- gcc/mcf.c   (revision 174456)
  +++ gcc/mcf.c   (working copy)
  @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3.
   #include langhooks.h
   #include tree.h
   #include gcov-io.h
  +#include params.h
  +#include diagnostic-core.h
 
   #include profile.h
 
  @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3.
   #define COST(k, w)      ((k) / mcf_ln ((w) + 2))
   /* Limit the number of iterations for cancel_negative_cycles() to ensure
     reasonable compile time.  */
  -#define MAX_ITER(n, e)  10 + (100 / ((n) * (e)))
  +#define MAX_ITER(n, e)  (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \
  +                        (100 / ((n) * (e
  +
   typedef enum
   {
  -  INVALID_EDGE,
  +  INVALID_EDGE = 0,
    VERTEX_SPLIT_EDGE,       /* Edge to represent vertex with w(e) = w(v).  
  */
    REDIRECT_EDGE,           /* Edge after vertex transformation.  */
    REVERSE_EDGE,
    SOURCE_CONNECT_EDGE,     /* Single edge connecting to single source.  */
    SINK_CONNECT_EDGE,       /* Single edge connecting to single sink.  */
  +  SINK_SOURCE_EDGE,        /* Single edge connecting sink to source.  */
    BALANCE_EDGE,                    /* Edge connecting with source/sink: 
  cp(e) = 0.  */
    REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge.  */
    REVERSE_NORMALIZED_EDGE   /* Normalized edge for a reverse edge.  */
  @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph
           fputs ( @SINK_CONNECT_EDGE, file);
           break;
 
  +       case SINK_SOURCE_EDGE:
  +         fputs ( @SINK_SOURCE_EDGE, file);
  +         break;
  +
         case REVERSE_EDGE:
           fputs ( @REVERSE_EDGE, file);
           break;
  @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi
    double k_neg = 0;
    /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v).  */
    gcov_type *diff_out_in = NULL;
  -  gcov_type supply_value = 1, demand_value = 0;
  +  gcov_type supply_value = 0, demand_value = 0;
    gcov_type fcost = 0;
    int new_entry_index = 0, new_exit_index = 0;
    int i = 0, j = 0;
  @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi
      fnum_vertices_after_transform + n_edges + n_basic_blocks + 2;
 
    /* In create_fixup_graph: Each basic block and edge can be split

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-01 Thread Xinliang David Li
On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

Yes, that was the original plan -- but it has problems
1) the dumper needs to know the root pass lists -- which can change
frequently -- it can be a long term maintanance burden;
2) the centralized dumper needs to be done after option processing
3) not sure if gate functions have any side effects or have dependencies on cfun

The proposed solutions IMHO is not that intrusive -- just three hooks
to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

Ok -- for ipa passes or tree/rtl passes where all functions are
explicitly disabled.

Thanks,

David


 Richard.

 Thanks,

 David




Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-01 Thread Xinliang David Li
The attached is the split #1 patch that enhances -fenable/disable.

Ok after testing?

Thanks,
David

On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David



Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 174424)
+++ doc/invoke.texi	(working copy)
@@ -5056,11 +5056,12 @@ appended with a sequential number starti
 Disable rtl pass @var{pass}.  @var{pass} is the pass name.  If the same pass is
 statically invoked in the compiler multiple times, the pass name should be
 appended with a sequential number starting from 1.  @var{range-list} is a comma
-seperated list of function ranges.  Each range is a number pair seperated by a colon.
-The range is inclusive in both ends.  If the range is trivial, the number pair can be
-simplified a a single number.  If the function's cgraph node's @var{uid} is falling
-within one of the specified ranges, the @var{pass} is disabled for that function.
-The @var{uid} is shown in the function header of a dump file.
+seperated list of function ranges or assembler names.  Each range is a number
+pair seperated by a colon.  The range is inclusive in both ends.  If the range
+is trivial, the number pair can be simplified as a single number.  If the
+function's cgraph node's @var{uid} is falling within one of the specified ranges,
+the @var{pass} is disabled for that function.  The @var{uid} is shown in the
+function header of a dump file.
 
 @item -fdisable-tree-@var{pass}
 @item -fdisable-tree-@var{pass}=@var{range-list}
@@ -5090,7 +5091,8 @@ of option arguments.
-fenable-tree-cunroll=1
 # disable gcse2 for functions at the following ranges [1,1],
 # [300,400], and [400,1000]
-   -fdisable-rtl-gcse2=1:100,300,400:1000
+# disable gcse2 for functions foo and foo2
+   -fdisable-rtl-gcse2=foo,foo2
 # disable early inlining
-fdisable-tree-einline
 # disable ipa inlining
Index: testsuite/gcc.dg/inline_2.c
===
--- testsuite/gcc.dg/inline_2.c	(revision 0)
+++ testsuite/gcc.dg/inline_2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=0:3 -fdisable-ipa-inline } */
+int g;
+__attribute__((always_inline)) void bar (void)
+{
+  g++;
+}
+
+int foo (void)
+{
+  bar ();
+  return g;
+}
+
+int foo2 (void)
+{
+  bar();
+  return g + 1;
+}
+
+/* { dg-final { scan-tree-dump-times bar 5 optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
+/* { dg-excess-errors extra notes } */
Index: testsuite/gcc.dg/inline_6.c
===
--- testsuite/gcc.dg/inline_6.c	(revision 0)
+++ testsuite/gcc.dg/inline_6.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=foo2 -fdisable-ipa-inline } */
+int g;
+__attribute__((always_inline)) void bar (void)
+{
+  g++;
+}
+
+int foo (void)
+{
+  bar ();
+  return g;
+}
+
+int foo2 (void)
+{
+  bar();
+  return g + 1;
+}
+
+/* { dg-final { scan-tree-dump-times bar 4 optimized

Re: -fdump-passes -fenable-xxx=func_name_list

2011-06-01 Thread Xinliang David Li
The attached is patch-2 (-fdump-passes) and a sample output:

Ok for trunk?

David

On Wed, Jun 1, 2011 at 9:16 AM, Xinliang David Li davi...@google.com wrote:
 On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote:
 The following patch implements the a new option that dumps gcc PASS
 configuration. The sample output is attached.  There is one
 limitation: some placeholder passes that are named with '*xxx' are
 note registered thus they are not listed. They are not important as
 they can not be turned on/off anyway.

 The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
 of function assembler names to be specified.

 Ok for trunk?

 Please split the patch.

 I'm not too happy how you dump the pass configuration.  Why not simply,
 at a _single_ place, walk the pass tree?  Instead of doing pieces of it
 at pass execution time when it's not already dumped - that really looks
 gross.

 Yes, that was the original plan -- but it has problems
 1) the dumper needs to know the root pass lists -- which can change
 frequently -- it can be a long term maintanance burden;
 2) the centralized dumper needs to be done after option processing
 3) not sure if gate functions have any side effects or have dependencies on 
 cfun

 The proposed solutions IMHO is not that intrusive -- just three hooks
 to do the dumping and tracking indentation.


 The documentation should also link this option to the -fenable/disable
 options as obviously the pass names in that dump are those to be
 used for those flags (and not readily available anywhere else).

 Ok.


 I also think that it would be way more useful to note in the individual
 dump files the functions (at the place they would usually appear) that
 have the pass explicitly enabled/disabled.

 Ok -- for ipa passes or tree/rtl passes where all functions are
 explicitly disabled.

 Thanks,

 David


 Richard.

 Thanks,

 David



Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 174535)
+++ doc/invoke.texi	(working copy)
@@ -291,6 +291,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-translation-unit@r{[}-@var{n}@r{]} @gol
 -fdump-class-hierarchy@r{[}-@var{n}@r{]} @gol
 -fdump-ipa-all -fdump-ipa-cgraph -fdump-ipa-inline @gol
+-fdump-passes @gol
 -fdump-statistics @gol
 -fdump-tree-all @gol
 -fdump-tree-original@r{[}-@var{n}@r{]}  @gol
@@ -5060,7 +5061,8 @@ seperated list of function ranges.  Each
 The range is inclusive in both ends.  If the range is trivial, the number pair can be
 simplified a a single number.  If the function's cgraph node's @var{uid} is falling
 within one of the specified ranges, the @var{pass} is disabled for that function.
-The @var{uid} is shown in the function header of a dump file.
+The @var{uid} is shown in the function header of a dump file, and pass names can be
+dumped by using option @option{-fdump-passes}.
 
 @item -fdisable-tree-@var{pass}
 @item -fdisable-tree-@var{pass}=@var{range-list}
@@ -5483,6 +5485,11 @@ Dump after function inlining.
 
 @end table
 
+@item -fdump-passes
+@opindex fdump-passes
+Dump the list of optimization passes that are turned on and off by
+the current command line options.
+
 @item -fdump-statistics-@var{option}
 @opindex fdump-statistics
 Enable and control dumping of pass statistics in a separate file.  The
Index: common.opt
===
--- common.opt	(revision 174535)
+++ common.opt	(working copy)
@@ -1012,6 +1012,10 @@ fdump-noaddr
 Common Report Var(flag_dump_noaddr)
 Suppress output of addresses in debugging dumps
 
+fdump-passes
+Common Var(flag_dump_passes) Init(0)
+Dump optimization passes
+
 fdump-unnumbered
 Common Report Var(flag_dump_unnumbered)
 Suppress output of instruction numbers, line number notes and addresses in debugging dumps
Index: passes.c
===
--- passes.c	(revision 174536)
+++ passes.c	(working copy)
@@ -478,7 +478,7 @@ passr_eq (const void *p1, const void *p2
   return !strcmp (s1-unique_name, s2-unique_name);
 }
 
-static htab_t pass_name_tab = NULL;
+static htab_t name_to_pass_map = NULL;
 
 /* Register PASS with NAME.  */
 
@@ -488,11 +488,11 @@ register_pass_name (struct opt_pass *pas
   struct pass_registry **slot;
   struct pass_registry pr;
 
-  if (!pass_name_tab)
-pass_name_tab = htab_create (256, passr_hash, passr_eq, NULL);
+  if (!name_to_pass_map)
+name_to_pass_map = htab_create (256, passr_hash, passr_eq, NULL);
 
   pr.unique_name = name;
-  slot = (struct pass_registry **) htab_find_slot (pass_name_tab, pr, INSERT);
+  slot = (struct pass_registry **) htab_find_slot (name_to_pass_map, pr, INSERT);
   if (!*slot)
 {
   struct pass_registry *new_pr;
@@ -506,6 +506,117 @@ register_pass_name (struct opt_pass *pas
 return; /* Ignore plugin passes

Dump before flag

2011-06-01 Thread Xinliang David Li
Hi, this is a simple patch that support dump_before flag. E.g,

-fdump-tree-pre-before

This is useful for diffing the the IR before and after a pass.

Gcc dumping needs more cleanups -- such as allowing IR only dump,
allowing IR dumping for a particular function etc. The exposure of
'dumpfile' (instead of a dumping_level () function) makes those change
a little messy, but can be done.

Ok for trunk?

Thanks,

David


Re: Dump before flag

2011-06-01 Thread Xinliang David Li
Sorry about it. Here it is.

David


On Wed, Jun 1, 2011 at 1:36 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Jun 1, 2011 at 10:26 PM, Xinliang David Li davi...@google.com wrote:
 Hi, this is a simple patch that support dump_before flag. E.g,

 -fdump-tree-pre-before

 This is useful for diffing the the IR before and after a pass.

 Gcc dumping needs more cleanups -- such as allowing IR only dump,
 allowing IR dumping for a particular function etc. The exposure of
 'dumpfile' (instead of a dumping_level () function) makes those change
 a little messy, but can be done.

 Ok for trunk?

 -ENOPATCH

 Thanks,

 David


2011-06-01  David Li  davi...@google.com

	* tree-dump.c: New dump flags.
	* tree-pass.h: New dump flags.
	* passes.c (execute_one_pass): Handle dump_before flag.

Index: tree-dump.c
===
--- tree-dump.c	(revision 174446)
+++ tree-dump.c	(working copy)
@@ -808,6 +808,7 @@ struct dump_option_value_info
in tree.h */
 static const struct dump_option_value_info dump_options[] =
 {
+  {before,  TDF_BEFORE},
   {address, TDF_ADDRESS},
   {asmname, TDF_ASMNAME},
   {slim, TDF_SLIM},
Index: tree-pass.h
===
--- tree-pass.h	(revision 174446)
+++ tree-pass.h	(working copy)
@@ -83,6 +83,7 @@ enum tree_dump_index
 #define TDF_ALIAS	(1  21)	/* display alias information  */
 #define TDF_ENUMERATE_LOCALS (1  22)	/* Enumerate locals by uid.  */
 #define TDF_CSELIB	(1  23)	/* Dump cselib details.  */
+#define TDF_BEFORE	(1  24)	/* Dump IR before pass.  */
 
 
 /* In tree-dump.c */
Index: passes.c
===
--- passes.c	(revision 174446)
+++ passes.c	(working copy)
@@ -1563,6 +1563,13 @@ execute_one_pass (struct opt_pass *pass)
 
   initializing_dump = pass_init_dump_file (pass);
 
+  /* Override dump TODOs.  */
+  if (dump_file  (pass-todo_flags_finish  TODO_dump_func)
+   (dump_flags  TDF_BEFORE))
+{
+  pass-todo_flags_finish = ~TODO_dump_func;
+  pass-todo_flags_start |= TODO_dump_func;
+}
   /* Run pre-pass verification.  */
   execute_todo (pass-todo_flags_start);
 


Re: Dump before flag

2011-06-01 Thread Xinliang David Li
On Wed, Jun 1, 2011 at 2:12 PM, Basile Starynkevitch
bas...@starynkevitch.net wrote:
 On Wed, 1 Jun 2011 13:26:24 -0700
 Xinliang David Li davi...@google.com wrote:

 Hi, this is a simple patch that support dump_before flag. E.g,

 -fdump-tree-pre-before

 This is useful for diffing the the IR before and after a pass.

 Perhaps you forgot to actually attach the patch?

Right -- attached in a follow up email.

 Gcc dumping needs more cleanups -- such as allowing IR only dump,
 allowing IR dumping for a particular function etc. The exposure of
 'dumpfile' (instead of a dumping_level () function) makes those change
 a little messy, but can be done.

 I don't understand what you mean by a dumping_level () function. What
 should that hypothetical function do? (I'm wrongly guessing it would
 return an integer, but IIRC dumpfile is a FILE*)

THere are two sources of dump:

1) IR dump performed by pass manager
2) pass specific debugging dump (the verbosity is controlled by -details flag).

2) is the part that is messy and needs cleanup. Every pass just checks
if dump_file is null or not and decide to dump the debugging info --
there is no easy way to turn it on and off. Ideally, individual pass
should call

  int debug_dump_level () -- dumps when it returns  0.

With that in place, the dump flag -fdump-xxx-yyy-ir_only can be easily
implemented -- it only turns on pass manager dump, but lowers the
debug dump level to 0.

David



 Regards

 --
 Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
 email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
 8, rue de la Faiencerie, 92340 Bourg La Reine, France
 *** opinions {are only mine, sont seulement les miennes} ***



Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-31 Thread Xinliang David Li
On Tue, May 31, 2011 at 2:05 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 The attached are two simple follow up patches

 1) the first patch does some refactorization on function header
 dumping (with more information printed)

 2) the second patch cleans up some pass names. Part of the cleanup
 results from a previous discussion with Honza -- a) rename
 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
 'profile' into 'profile_estimate'. The rest of cleanups are needed to
 make sure pass names are unique.

 Ok for trunk?

 +
 +void
 +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function *fun)

 This function needs documentation, the ChangeLog entry misses
 the tree-pretty-print.h change.

 +struct function;

 instead of this please include coretypes.h from tree-pretty-print.h
 and add the struct function forward declaration there if it isn't already
 present.

 You change the output of the header, so please make sure you
 have bootstrapped and tested with _all_ languages included
 (and also watch for bugreports for target specific bugs).


Ok.

 +  fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d),
 +           dname, aname, fun-funcdef_no, node-uid);

 I see no point in dumping funcdef_no - it wasn't dumped before in
 any place.  Instead I miss dumping of the DECL_UID and thus
 a more specific 'uid', like 'cgraph-uid'.

Ok will add decl_uid. Funcdef_no is very useful for debugging FDO
coverage mismatch related problems as it is the id used in profile
hashing.


 +  aname = (IDENTIFIER_POINTER
 +          (DECL_ASSEMBLER_NAME (fdecl)));

 using DECL_ASSEMBLER_NAME is bad - it might trigger computation
 of DECL_ASSEMBLER_NAME which certainly shouldn't be done
 only for dumping purposes.  Instead do sth like

   if (DECL_ASSEMBLER_NAME_SET_P (fdecl))
     aname = DECL_ASSEMBLER_NAME (fdecl);
   else
     aname = 'unset-asm-name';

Ok.


 and please also watch for cgraph_get_node returning NULL.

 Also please call the function dump_function_header instead of
 pass_dump_function_header.


Ok.

Thanks,

David
 Please re-post with appropriate changes.

 Thanks,
 Richard.

 Thanks,

 David

 On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole 
 new
 features involving new options should be reviewed by maintainers for the
 part of the compiler relevant to those features (since there isn't a 
 pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler multiple times, the pass
 name should be appended with a sequential number starting from 1.

 isn't true as passes that are invoked only a single time lack the number
 suffix (yes, I'd

-fdump-passes -fenable-xxx=func_name_list

2011-05-31 Thread Xinliang David Li
The following patch implements the a new option that dumps gcc PASS
configuration. The sample output is attached.  There is one
limitation: some placeholder passes that are named with '*xxx' are
note registered thus they are not listed. They are not important as
they can not be turned on/off anyway.

The patch also enhanced -fenable-xxx and -fdisable-xx to allow a list
of function assembler names to be specified.

Ok for trunk?

Thanks,

David


out
Description: Binary data
2011-05-31  David Li  davi...@google.com

	* passes.c (passr_eq):
	(register_pass_name): Change pass name table name.
	(get_pass_by_name): Ditto.
	(enable_disable_pass): Handle function assembler name.
	(is_pass_explicitly_enabled_or_disabled): Ditto.
	(execute_one_pass): Add pass dumping.
	(execute_pass_list): Track pass identation.
	(execute_ipa_pass_list): Ditto.
	(dump_one_pass): New function.
	(pass_traverse): Ditto.
	(enter_pass_list): Ditto.
	(exit_pass_list): Ditto.
	(is_valid_assembler_name): Ditto.

2011-05-31  David Li  davi...@google.com

	* testsuite/gcc.dg/inline_2.c	(revision 0):
	* testsuite/gcc.dg/inline_6.c	(revision 0):
	* testsuite/gcc.dg/unroll_2.c	(revision 0):
	* testsuite/gcc.dg/inline_3.c	(revision 0):
	* testsuite/gcc.dg/unroll_3.c	(revision 0):
	* testsuite/gcc.dg/inline_4.c	(revision 0):
	* testsuite/gcc.dg/unroll_4.c	(revision 0):
	* testsuite/gcc.dg/inline_1.c	(revision 0):
	* testsuite/gcc.dg/inline_5.c	(revision 0):
	* testsuite/gcc.dg/unroll_1.c	(revision 0):

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 174424)
+++ doc/invoke.texi	(working copy)
@@ -291,6 +291,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-translation-unit@r{[}-@var{n}@r{]} @gol
 -fdump-class-hierarchy@r{[}-@var{n}@r{]} @gol
 -fdump-ipa-all -fdump-ipa-cgraph -fdump-ipa-inline @gol
+-fdump-passes @gol
 -fdump-statistics @gol
 -fdump-tree-all @gol
 -fdump-tree-original@r{[}-@var{n}@r{]}  @gol
@@ -5056,11 +5057,12 @@ appended with a sequential number starti
 Disable rtl pass @var{pass}.  @var{pass} is the pass name.  If the same pass is
 statically invoked in the compiler multiple times, the pass name should be
 appended with a sequential number starting from 1.  @var{range-list} is a comma
-seperated list of function ranges.  Each range is a number pair seperated by a colon.
-The range is inclusive in both ends.  If the range is trivial, the number pair can be
-simplified a a single number.  If the function's cgraph node's @var{uid} is falling
-within one of the specified ranges, the @var{pass} is disabled for that function.
-The @var{uid} is shown in the function header of a dump file.
+seperated list of function ranges or assembler names.  Each range is a number
+pair seperated by a colon.  The range is inclusive in both ends.  If the range
+is trivial, the number pair can be simplified as a single number.  If the
+function's cgraph node's @var{uid} is falling within one of the specified ranges,
+the @var{pass} is disabled for that function.  The @var{uid} is shown in the
+function header of a dump file.
 
 @item -fdisable-tree-@var{pass}
 @item -fdisable-tree-@var{pass}=@var{range-list}
@@ -5090,7 +5092,8 @@ of option arguments.
-fenable-tree-cunroll=1
 # disable gcse2 for functions at the following ranges [1,1],
 # [300,400], and [400,1000]
-   -fdisable-rtl-gcse2=1:100,300,400:1000
+# disable gcse2 for functions foo and foo2
+   -fdisable-rtl-gcse2=foo,foo2
 # disable early inlining
-fdisable-tree-einline
 # disable ipa inlining
@@ -5483,6 +5486,11 @@ Dump after function inlining.
 
 @end table
 
+@item -fdump-passes
+@optindex fdump-passes
+Dump the list of optimization passes that are turned on and off by
+the current command line options.
+
 @item -fdump-statistics-@var{option}
 @opindex fdump-statistics
 Enable and control dumping of pass statistics in a separate file.  The
Index: testsuite/gcc.dg/inline_2.c
===
--- testsuite/gcc.dg/inline_2.c	(revision 0)
+++ testsuite/gcc.dg/inline_2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=0:3 -fdisable-ipa-inline } */
+int g;
+__attribute__((always_inline)) void bar (void)
+{
+  g++;
+}
+
+int foo (void)
+{
+  bar ();
+  return g;
+}
+
+int foo2 (void)
+{
+  bar();
+  return g + 1;
+}
+
+/* { dg-final { scan-tree-dump-times bar 5 optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
+/* { dg-excess-errors extra notes } */
Index: testsuite/gcc.dg/inline_6.c
===
--- testsuite/gcc.dg/inline_6.c	(revision 0)
+++ testsuite/gcc.dg/inline_6.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized -fdisable-tree-einline=foo2 -fdisable-ipa-inline } */
+int g;
+__attribute__((always_inline)) void bar (void)
+{
+  g++;
+}
+
+int foo (void)
+{
+  bar 

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-31 Thread Xinliang David Li
The new patch is attached. The test (c,c++,fortran, java, ada) is on going.

Thanks,

David

On Tue, May 31, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote:
 On Tue, May 31, 2011 at 2:05 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 The attached are two simple follow up patches

 1) the first patch does some refactorization on function header
 dumping (with more information printed)

 2) the second patch cleans up some pass names. Part of the cleanup
 results from a previous discussion with Honza -- a) rename
 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
 'profile' into 'profile_estimate'. The rest of cleanups are needed to
 make sure pass names are unique.

 Ok for trunk?

 +
 +void
 +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function 
 *fun)

 This function needs documentation, the ChangeLog entry misses
 the tree-pretty-print.h change.

 +struct function;

 instead of this please include coretypes.h from tree-pretty-print.h
 and add the struct function forward declaration there if it isn't already
 present.

 You change the output of the header, so please make sure you
 have bootstrapped and tested with _all_ languages included
 (and also watch for bugreports for target specific bugs).


 Ok.

 +  fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d),
 +           dname, aname, fun-funcdef_no, node-uid);

 I see no point in dumping funcdef_no - it wasn't dumped before in
 any place.  Instead I miss dumping of the DECL_UID and thus
 a more specific 'uid', like 'cgraph-uid'.

 Ok will add decl_uid. Funcdef_no is very useful for debugging FDO
 coverage mismatch related problems as it is the id used in profile
 hashing.


 +  aname = (IDENTIFIER_POINTER
 +          (DECL_ASSEMBLER_NAME (fdecl)));

 using DECL_ASSEMBLER_NAME is bad - it might trigger computation
 of DECL_ASSEMBLER_NAME which certainly shouldn't be done
 only for dumping purposes.  Instead do sth like

   if (DECL_ASSEMBLER_NAME_SET_P (fdecl))
     aname = DECL_ASSEMBLER_NAME (fdecl);
   else
     aname = 'unset-asm-name';

 Ok.


 and please also watch for cgraph_get_node returning NULL.

 Also please call the function dump_function_header instead of
 pass_dump_function_header.


 Ok.

 Thanks,

 David
 Please re-post with appropriate changes.

 Thanks,
 Richard.

 Thanks,

 David

 On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole 
 new
 features involving new options should be reviewed by maintainers for 
 the
 part of the compiler relevant to those features (since there isn't a 
 pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-31 Thread Xinliang David Li
Honza, are you ok with the pass name change?

David

On Tue, May 31, 2011 at 2:07 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, May 30, 2011 at 11:44 PM, Xinliang David Li davi...@google.com 
 wrote:
 This is the complete patch for pass name fixes (with test case changes).

 This is ok if Honza thinks the profile pass names make more sense this
 way.

 Thanks,
 Richard.

 David


 On Mon, May 30, 2011 at 1:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 The attached are two simple follow up patches

 1) the first patch does some refactorization on function header
 dumping (with more information printed)

 2) the second patch cleans up some pass names. Part of the cleanup
 results from a previous discussion with Honza -- a) rename
 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
 'profile' into 'profile_estimate'. The rest of cleanups are needed to
 make sure pass names are unique.

 Ok for trunk?

 Thanks,

 David

 On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole 
 new
 features involving new options should be reviewed by maintainers for 
 the
 part of the compiler relevant to those features (since there isn't a 
 pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler multiple times, the pass
 name should be appended with a sequential number starting from 1.

 isn't true as passes that are invoked only a single time lack the number
 suffix (yes, I'd really like that to be changed ...)

 Yes, pass with single static instance does not need number suffix.


 Please break likes also in .texi files and stick to 80 columns.

 Done.

  Please
 document that these options are debugging options and regular
 options for enabling/disabling passes should be used.  I would suggest
 to group documentation differently and document -fenable-* and
 -fdisable-*, thus,

 + -fdisable-@var{kind}-@var{pass}
 + -fenable-@var{kind}-@var{pass}

 Even in .texi files please two spaces after a full-stop.

 Done


 +extern void enable_disable_pass (const char *, bool);

 I'd rather have both enable_pass and disable_pass ;)

 Ok.


 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function 
 *);

 that's odd and probably should be split out, the function should
 maybe reside in tree-pretty-print.c.

 Ok.


 Index: tree-ssa-loop-ivopts.c
 ===
 --- tree-ssa-loop-ivopts.c      (revision 173837)
 +++ tree-ssa-loop-ivopts.c      (working copy)
 @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d

 well - doesn't belong here ;)

 Sorry -- many things in the same client -- not needed with your latest
 change anyway.


 +static hashval_t
 +passr_hash

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-31 Thread Xinliang David Li
Please discard the previous one. This is the right one:

David

On Tue, May 31, 2011 at 5:01 PM, Xinliang David Li davi...@google.com wrote:
 The new patch is attached. The test (c,c++,fortran, java, ada) is on going.

 Thanks,

 David

 On Tue, May 31, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote:
 On Tue, May 31, 2011 at 2:05 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, May 30, 2011 at 10:16 PM, Xinliang David Li davi...@google.com 
 wrote:
 The attached are two simple follow up patches

 1) the first patch does some refactorization on function header
 dumping (with more information printed)

 2) the second patch cleans up some pass names. Part of the cleanup
 results from a previous discussion with Honza -- a) rename
 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
 'profile' into 'profile_estimate'. The rest of cleanups are needed to
 make sure pass names are unique.

 Ok for trunk?

 +
 +void
 +pass_dump_function_header (FILE *dump_file, tree fdecl, struct function 
 *fun)

 This function needs documentation, the ChangeLog entry misses
 the tree-pretty-print.h change.

 +struct function;

 instead of this please include coretypes.h from tree-pretty-print.h
 and add the struct function forward declaration there if it isn't already
 present.

 You change the output of the header, so please make sure you
 have bootstrapped and tested with _all_ languages included
 (and also watch for bugreports for target specific bugs).


 Ok.

 +  fprintf (dump_file, \n;; Function %s (%s, funcdef_no=%d, uid=%d),
 +           dname, aname, fun-funcdef_no, node-uid);

 I see no point in dumping funcdef_no - it wasn't dumped before in
 any place.  Instead I miss dumping of the DECL_UID and thus
 a more specific 'uid', like 'cgraph-uid'.

 Ok will add decl_uid. Funcdef_no is very useful for debugging FDO
 coverage mismatch related problems as it is the id used in profile
 hashing.


 +  aname = (IDENTIFIER_POINTER
 +          (DECL_ASSEMBLER_NAME (fdecl)));

 using DECL_ASSEMBLER_NAME is bad - it might trigger computation
 of DECL_ASSEMBLER_NAME which certainly shouldn't be done
 only for dumping purposes.  Instead do sth like

   if (DECL_ASSEMBLER_NAME_SET_P (fdecl))
     aname = DECL_ASSEMBLER_NAME (fdecl);
   else
     aname = 'unset-asm-name';

 Ok.


 and please also watch for cgraph_get_node returning NULL.

 Also please call the function dump_function_header instead of
 pass_dump_function_header.


 Ok.

 Thanks,

 David
 Please re-post with appropriate changes.

 Thanks,
 Richard.

 Thanks,

 David

 On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole 
 new
 features involving new options should be reviewed by maintainers for 
 the
 part of the compiler relevant to those features (since there isn't a 
 pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-30 Thread Xinliang David Li
The attached are two simple follow up patches

1) the first patch does some refactorization on function header
dumping (with more information printed)

2) the second patch cleans up some pass names. Part of the cleanup
results from a previous discussion with Honza -- a) rename
'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
'profile' into 'profile_estimate'. The rest of cleanups are needed to
make sure pass names are unique.

Ok for trunk?

Thanks,

David

On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole new
 features involving new options should be reviewed by maintainers for the
 part of the compiler relevant to those features (since there isn't a pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler multiple times, the pass
 name should be appended with a sequential number starting from 1.

 isn't true as passes that are invoked only a single time lack the number
 suffix (yes, I'd really like that to be changed ...)

 Yes, pass with single static instance does not need number suffix.


 Please break likes also in .texi files and stick to 80 columns.

 Done.

  Please
 document that these options are debugging options and regular
 options for enabling/disabling passes should be used.  I would suggest
 to group documentation differently and document -fenable-* and
 -fdisable-*, thus,

 + -fdisable-@var{kind}-@var{pass}
 + -fenable-@var{kind}-@var{pass}

 Even in .texi files please two spaces after a full-stop.

 Done


 +extern void enable_disable_pass (const char *, bool);

 I'd rather have both enable_pass and disable_pass ;)

 Ok.


 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);

 that's odd and probably should be split out, the function should
 maybe reside in tree-pretty-print.c.

 Ok.


 Index: tree-ssa-loop-ivopts.c
 ===
 --- tree-ssa-loop-ivopts.c      (revision 173837)
 +++ tree-ssa-loop-ivopts.c      (working copy)
 @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d

 well - doesn't belong here ;)

 Sorry -- many things in the same client -- not needed with your latest
 change anyway.


 +static hashval_t
 +passr_hash (const void *p)
 +{
 +  const struct pass_registry *const s = (const struct pass_registry 
 *const) p;
 +  return htab_hash_string (s-unique_name);
 +}
 +
 +/* Hash equal function  */
 +
 +static int
 +passr_eq (const void *p1, const void *p2)
 +{
 +  const struct pass_registry *const s1 = (const struct pass_registry
 *const) p1;
 +  const struct pass_registry *const s2 = (const struct pass_registry
 *const) p2;
 +
 +  return !strcmp (s1-unique_name, s2-unique_name);
 +}

 you can use

[google] backport 174423, 174424 to google/main

2011-05-30 Thread Xinliang David Li
Committed.

David


Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-30 Thread Xinliang David Li
This is the complete patch for pass name fixes (with test case changes).

David


On Mon, May 30, 2011 at 1:16 PM, Xinliang David Li davi...@google.com wrote:
 The attached are two simple follow up patches

 1) the first patch does some refactorization on function header
 dumping (with more information printed)

 2) the second patch cleans up some pass names. Part of the cleanup
 results from a previous discussion with Honza -- a) rename
 'tree_profile_ipa' into 'profile', and make 'ipa_profile' and
 'profile' into 'profile_estimate'. The rest of cleanups are needed to
 make sure pass names are unique.

 Ok for trunk?

 Thanks,

 David

 On Fri, May 27, 2011 at 2:58 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, May 27, 2011 at 12:02 AM, Xinliang David Li davi...@google.com 
 wrote:
 The latest version that only exports 2 functions from passes.c.

 Ok with ...

 @@ -637,4 +637,8 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void disable_pass (const char *);
 +extern void enable_pass (const char *);
 +struct function;
 +

 struct function forward decl removed.

 +  explicitly_enabled = is_pass_explicitly_enabled (pass, func);
 +  explicitly_disabled = is_pass_explicitly_disabled (pass, func);

 both functions inlined here and removed.

 +#define MAX_PASS_ID 512

 this removed and instead a VEC_safe_grow_cleared () or VEC_length ()
 before the accesses.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol
 +-fenable-tree-@var{pass}=@var{range-list} @gol

 -fenable-@var{kind}-@var{pass}, etc.

 +@item -fdisable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fenable-@var{ipa|tree|rtl}-@var{pass}
 +@itemx -fdisable-@var{tree|rtl}-@var{pass}=@var{range-list}
 +@itemx -fenable-@var{tree|rtl}-@var{pass}=@var{range-list}

 likewise.

 Thanks,
 Richard.

 David

 On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com 
 wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole new
 features involving new options should be reviewed by maintainers for the
 part of the compiler relevant to those features (since there isn't a pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler multiple times, the pass
 name should be appended with a sequential number starting from 1.

 isn't true as passes that are invoked only a single time lack the number
 suffix (yes, I'd really like that to be changed ...)

 Yes, pass with single static instance does not need number suffix.


 Please break likes also in .texi files and stick to 80 columns.

 Done.

  Please
 document that these options are debugging options and regular
 options for enabling/disabling passes should be used.  I would suggest
 to group documentation differently and document -fenable-* and
 -fdisable-*, thus,

 + -fdisable-@var{kind}-@var{pass}
 + -fenable-@var{kind}-@var{pass}

 Even in .texi files please two spaces after a full-stop.

 Done


 +extern void enable_disable_pass (const char *, bool);

 I'd rather have both enable_pass and disable_pass ;)

 Ok.


 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);

 that's odd and probably should be split out, the function should
 maybe reside in tree-pretty-print.c.

 Ok.


 Index: tree-ssa-loop-ivopts.c
 ===
 --- tree-ssa-loop-ivopts.c      (revision 173837)
 +++ tree-ssa-loop-ivopts.c      (working copy)
 @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d

 well - doesn't belong here ;)

 Sorry -- many things in the same client -- not needed with your latest
 change anyway.


 +static hashval_t
 +passr_hash (const void *p)
 +{
 +  const struct pass_registry *const s = (const struct pass_registry 
 *const) p;
 +  return htab_hash_string (s-unique_name);
 +}
 +
 +/* Hash equal function  */
 +
 +static int
 +passr_eq (const void *p1, const void *p2)
 +{
 +  const struct pass_registry *const s1 = (const struct pass_registry
 *const) p1

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-26 Thread Xinliang David Li
The latest version that only exports 2 functions from passes.c.

David

On Thu, May 26, 2011 at 2:04 PM, Xinliang David Li davi...@google.com wrote:
 On Thu, May 26, 2011 at 2:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, May 25, 2011 at 6:53 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 25 May 2011, Xinliang David Li wrote:

 Ping. The link to the message:

 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

 I don't consider this an option handling patch.  Patches adding whole new
 features involving new options should be reviewed by maintainers for the
 part of the compiler relevant to those features (since there isn't a pass
 manager maintainer, I guess that means middle-end).

 Hmm, I suppose then you reviewed the option handling parts and they
 are ok?  Those globbing options always cause headache to me.

 +-fenable-ipa-@var{pass} @gol
 +-fenable-rtl-@var{pass} @gol
 +-fenable-rtl-@var{pass}=@var{range-list} @gol
 +-fenable-tree-@var{pass} @gol

 so, no -fenable-tree-@var{pass}=@var{range-list}?


 Missed. Will add.


 Does the pass name match 1:1 with the dump file name?  In which
 case

 Yes.


 +Disable ipa pass @var{pass}. @var{pass} is the pass name. If the same
 pass is statically invoked in the compiler multiple times, the pass
 name should be appended with a sequential number starting from 1.

 isn't true as passes that are invoked only a single time lack the number
 suffix (yes, I'd really like that to be changed ...)

 Yes, pass with single static instance does not need number suffix.


 Please break likes also in .texi files and stick to 80 columns.

 Done.

  Please
 document that these options are debugging options and regular
 options for enabling/disabling passes should be used.  I would suggest
 to group documentation differently and document -fenable-* and
 -fdisable-*, thus,

 + -fdisable-@var{kind}-@var{pass}
 + -fenable-@var{kind}-@var{pass}

 Even in .texi files please two spaces after a full-stop.

 Done


 +extern void enable_disable_pass (const char *, bool);

 I'd rather have both enable_pass and disable_pass ;)

 Ok.


 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);

 that's odd and probably should be split out, the function should
 maybe reside in tree-pretty-print.c.

 Ok.


 Index: tree-ssa-loop-ivopts.c
 ===
 --- tree-ssa-loop-ivopts.c      (revision 173837)
 +++ tree-ssa-loop-ivopts.c      (working copy)
 @@ -3968,7 +3968,7 @@ get_computation_cost_at (struct ivopts_d

 well - doesn't belong here ;)

 Sorry -- many things in the same client -- not needed with your latest
 change anyway.


 +static hashval_t
 +passr_hash (const void *p)
 +{
 +  const struct pass_registry *const s = (const struct pass_registry *const) 
 p;
 +  return htab_hash_string (s-unique_name);
 +}
 +
 +/* Hash equal function  */
 +
 +static int
 +passr_eq (const void *p1, const void *p2)
 +{
 +  const struct pass_registry *const s1 = (const struct pass_registry
 *const) p1;
 +  const struct pass_registry *const s2 = (const struct pass_registry
 *const) p2;
 +
 +  return !strcmp (s1-unique_name, s2-unique_name);
 +}

 you can use htab_hash_string and strcmp directly, no need for these
 wrappers.

 The hashtable entry is not string in this case. It is pass_registry,
 thus the wrapper.


 +void
 +register_pass_name (struct opt_pass *pass, const char *name)

 doesn't seem to need exporting, so don't and make it static.

 Done.


 +  if (!pass_name_tab)
 +    pass_name_tab = htab_create (10, passr_hash, passr_eq, NULL);

 see above, the initial size should be larger - we have 223 passes at the
 moment, so use 256.

 Done.


 +  else
 +    return; /* Ignore plugin passes.  */

 ?  You mean they might clash?

 Yes, name clash.


 +struct opt_pass *
 +get_pass_by_name (const char *name)

 doesn't need exporting either.

 Done.


 +      if (is_enable)
 +        error (unrecognized option -fenable);
 +      else
 +        error (unrecognized option -fdisable);

 I think that should be fatal_error - Joseph?

 +      if (is_enable)
 +        error (unknown pass %s specified in -fenable, phase_name);
 +      else
 +        error (unknown pass %s specified in -fdisble, phase_name);

 likewise.

 +      if (!enabled_pass_uid_range_tab)
 +       enabled_pass_uid_range_tab = htab_create (10, pass_hash, pass_eq, 
 NULL);

 instead of using a hashtable here please use a VEC indexed by
 the static_pass_number which shoud speed up

 Ok.  The reason I did not use it is because in most of the cases, the
 htab will be very small -- it is determined by the number of passes
 specified in the command line, while the VEC requires allocating const
 size array. Not an issue as long as by default the array is not
 allocated.


 +static bool
 +is_pass_explicitly_enabled_or_disabled (struct opt_pass *pass,
 +                                       tree func, htab_t tab

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-25 Thread Xinliang David Li
Ping. The link to the message:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01303.html

Thanks,

David

On Sun, May 22, 2011 at 4:17 PM, Xinliang David Li davi...@google.com wrote:
 Ping.

 David


 On Fri, May 20, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote:
 Ok to check in this one?

 Thanks,

 David

 On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 18 May 2011, David Li wrote:

 +      error (Unrecognized option %s, is_enable ? -fenable : 
 -fdisable);

 +      error (Unknown pass %s specified in %s,
 +          phase_name,
 +          is_enable ? -fenable : -fdisable);

 Follow GNU Coding Standards for diagnostics (start with lowercase letter).

 +      inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of 
 [%u, %u]\n,
 +              is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Use separate calls to inform for the enable and disable cases, so that
 full sentences can be extracted for translation.

 +           error (Invalid range %s in option %s,
 +                  one_range,
 +                  is_enable ? -fenable : -fdisable);

 GNU Coding Standards.

 +               error (Invalid range %s in option %s,

 Likewise.

 +          inform (UNKNOWN_LOCATION, %s pass %s for functions in the 
 range of [%u, %u]\n,
 +                  is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Again needs GCS and i18n fixes.

 --
 Joseph S. Myers
 jos...@codesourcery.com





Fix for PR 48988

2011-05-22 Thread Xinliang David Li
Hi, the following trial patch fixed PR 48988 which is a regression.

Bootstrap and tested on x86/linux. Verified the reported failure is fixed.

Ok for trunk?

David


2011-05-22  David Li  davi...@google.com

PR tree-optimization/48988
* tree-ssa-uninit.c (convert_control_dep_chain_into_preds):
Initialize has_valid_pred for each pred chain.


u.p
Description: Binary data


Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-22 Thread Xinliang David Li
Ping.

David


On Fri, May 20, 2011 at 9:06 AM, Xinliang David Li davi...@google.com wrote:
 Ok to check in this one?

 Thanks,

 David

 On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Wed, 18 May 2011, David Li wrote:

 +      error (Unrecognized option %s, is_enable ? -fenable : 
 -fdisable);

 +      error (Unknown pass %s specified in %s,
 +          phase_name,
 +          is_enable ? -fenable : -fdisable);

 Follow GNU Coding Standards for diagnostics (start with lowercase letter).

 +      inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of 
 [%u, %u]\n,
 +              is_enable? Enable:Disable, phase_name, new_range-start, 
 new_range-last);

 Use separate calls to inform for the enable and disable cases, so that
 full sentences can be extracted for translation.

 +           error (Invalid range %s in option %s,
 +                  one_range,
 +                  is_enable ? -fenable : -fdisable);

 GNU Coding Standards.

 +               error (Invalid range %s in option %s,

 Likewise.

 +          inform (UNKNOWN_LOCATION, %s pass %s for functions in the range 
 of [%u, %u]\n,
 +                  is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Again needs GCS and i18n fixes.

 --
 Joseph S. Myers
 jos...@codesourcery.com




Re: [google] Increase inlining limits with FDO/LIPO

2011-05-20 Thread Xinliang David Li
On Fri, May 20, 2011 at 2:12 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, May 19, 2011 at 8:26 PM, Xinliang David Li davi...@google.com wrote:
 I have done some SPEC testing evaluating the performance impact of
 your patch.  They look very positive.  LIPO got helped even more than
 FDO (I only did SPEC2k LIPO testing).

 Did you also check impact on compile-time and code-size?

Not yet for SPEC -- will pick some benchmarks and take a look.

David


 Thanks,

 David

 1. SPEC06 (C/C++)  with FDO

                                   before          after         Improvement
 -
       400.perlbench           27.4           28.2      2.89%         ---
           401.bzip2           18.1           18.2      0.28%
             403.gcc           25.5           26.3      3.26%           ---
             429.mcf           26.0           26.0      0.08%
           445.gobmk           22.6           23.2      2.30%         
           456.hmmer           20.1           19.8     -1.25%
           458.sjeng           23.6           23.6     -0.42%
      462.libquantum           57.1           56.9     -0.40%
         464.h264ref           34.4           34.1     -0.70%
         471.omnetpp           18.8           18.9      0.53%
           473.astar           16.6           17.0      2.53%         ---
       483.xalancbmk           27.4           28.5      3.79%    ---
        999.specrand           94.9           98.4      3.71%      ---
          450.soplex           34.5           33.8     -2.00%
          447.dealII           32.0           31.9     -0.34%
          453.povray           25.9           28.3      9.02%       ---
         482.sphinx3           32.6           31.4     -3.50%


 2. SPEC2k FDO

                                        before               after
  Improvement
 --
            164.gzip                1308                1372      4.95%
             175.vpr                1723                1805      4.76%
             176.gcc                2407                2504      4.01%
             181.mcf                1724                1748      1.38%
          186.crafty                2292                2349      2.47%
          197.parser                1457                1601      9.88%
             252.eon                2557                2588      1.22%
         253.perlbmk                2479                2574      3.83%
             254.gap                1996                2013      0.84%
          255.vortex                2683                2798      4.31%
           256.bzip2                1833                1829     -0.26%
           300.twolf                2321                2359      1.63%
            188.ammp                 771                 766     -0.72%
          183.equake                1071                1071      0.05%
             179.art                2954                2979      0.85%


 3.  SPEC2k LIPO:

                                       before              after
  Improvement
 -
            164.gzip                1311                1405      7.18%
             175.vpr                1732                1772      2.35%
             176.gcc                2462                2559      3.96%
             181.mcf                1723                1731      0.50%
          186.crafty                2552                2662      4.33%
          197.parser                1468                1671     13.78%
             252.eon                2690                3000     11.49%
         253.perlbmk                2545                2611      2.60%
             254.gap                2097                2152      2.60%
          255.vortex                2949                3719     26.11%
           256.bzip2                1864                1935      3.78%
           300.twolf                2371                2471      4.22%
            188.ammp                 771                 774      0.41%
          183.equake                1081                1081     -0.04%
             179.art                2878                2884      0.24%


 4. SPEC2k LIPO vs FDO before the change:

                                       FDO(before)    LIPO(before) Improvement
 -
           164.gzip                1308                1311      0.22%
             175.vpr                1723                1732      0.53%
             176.gcc                2407                2462      2.27%
             181.mcf                1724                1723     -0.07%
          186.crafty                2292                2552     11.32%
          197.parser                1457

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-20 Thread Xinliang David Li
Ok to check in this one?

Thanks,

David

On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Wed, 18 May 2011, David Li wrote:

 +      error (Unrecognized option %s, is_enable ? -fenable : 
 -fdisable);

 +      error (Unknown pass %s specified in %s,
 +          phase_name,
 +          is_enable ? -fenable : -fdisable);

 Follow GNU Coding Standards for diagnostics (start with lowercase letter).

 +      inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of 
 [%u, %u]\n,
 +              is_enable? Enable:Disable, phase_name, new_range-start, 
 new_range-last);

 Use separate calls to inform for the enable and disable cases, so that
 full sentences can be extracted for translation.

 +           error (Invalid range %s in option %s,
 +                  one_range,
 +                  is_enable ? -fenable : -fdisable);

 GNU Coding Standards.

 +               error (Invalid range %s in option %s,

 Likewise.

 +          inform (UNKNOWN_LOCATION, %s pass %s for functions in the range 
 of [%u, %u]\n,
 +                  is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Again needs GCS and i18n fixes.

 --
 Joseph S. Myers
 jos...@codesourcery.com



Re: [google] Increase inlining limits with FDO/LIPO

2011-05-20 Thread Xinliang David Li
Some code size and timing number (profile use compile) are collected. Summary:

Compile time for profile-use compilation increase for all cases --
this is probably not a big issue as this is for peak performance.

It is more interesting to look at the size numbers. C++ program size
actually decrease in many cases (which match the results from our
internal benchmarks). For some benchmark such as povray, it reduce
quite a bit (either due to more DFE, or better cleanups -- have not
looked at details).   For C programs, especially smaller ones, the
size increase (some significantly).

David

1. xalancbmk:

Size (total size of object files, not text size which is correlated)

Before:  17,801,488
After :   17,252,032
Change:  -3%

Time (make all with -j1)

Before: 248 seconds
After  : 286 seconds
Change: 15%

2.  povray:

Size

Before: 2,855,690
After:   2,113,754
Change: -25%

Time:

Before:  36.5s
After:43.2 s
Change: +18%

3. eon

Size

Before:  3,988,460
After :4,099,108
Change: +2.8%


Time

Before: 53.1s
After:56.8s
Change: +6.9%


4. gcc


Size

Before:  14,162,908
After:15,089,106
Change: +6.5%

Time

Before: 57.3s
After:60.7s
Change:  +5.9%


5. Parser:

Size

Before: 1,175,509
After:   1,568,135
Change: +33.4%

Time

Before: 5.3s
After:  12.9s
Change: +143%



On Fri, May 20, 2011 at 8:53 AM, Xinliang David Li davi...@google.com wrote:
 On Fri, May 20, 2011 at 2:12 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, May 19, 2011 at 8:26 PM, Xinliang David Li davi...@google.com 
 wrote:
 I have done some SPEC testing evaluating the performance impact of
 your patch.  They look very positive.  LIPO got helped even more than
 FDO (I only did SPEC2k LIPO testing).

 Did you also check impact on compile-time and code-size?

 Not yet for SPEC -- will pick some benchmarks and take a look.

 David


 Thanks,

 David

 1. SPEC06 (C/C++)  with FDO

                                   before          after         Improvement
 -
       400.perlbench           27.4           28.2      2.89%         ---
           401.bzip2           18.1           18.2      0.28%
             403.gcc           25.5           26.3      3.26%           ---
             429.mcf           26.0           26.0      0.08%
           445.gobmk           22.6           23.2      2.30%         
           456.hmmer           20.1           19.8     -1.25%
           458.sjeng           23.6           23.6     -0.42%
      462.libquantum           57.1           56.9     -0.40%
         464.h264ref           34.4           34.1     -0.70%
         471.omnetpp           18.8           18.9      0.53%
           473.astar           16.6           17.0      2.53%         ---
       483.xalancbmk           27.4           28.5      3.79%    ---
        999.specrand           94.9           98.4      3.71%      ---
          450.soplex           34.5           33.8     -2.00%
          447.dealII           32.0           31.9     -0.34%
          453.povray           25.9           28.3      9.02%       ---
         482.sphinx3           32.6           31.4     -3.50%


 2. SPEC2k FDO

                                        before               after
  Improvement
 --
            164.gzip                1308                1372      4.95%
             175.vpr                1723                1805      4.76%
             176.gcc                2407                2504      4.01%
             181.mcf                1724                1748      1.38%
          186.crafty                2292                2349      2.47%
          197.parser                1457                1601      9.88%
             252.eon                2557                2588      1.22%
         253.perlbmk                2479                2574      3.83%
             254.gap                1996                2013      0.84%
          255.vortex                2683                2798      4.31%
           256.bzip2                1833                1829     -0.26%
           300.twolf                2321                2359      1.63%
            188.ammp                 771                 766     -0.72%
          183.equake                1071                1071      0.05%
             179.art                2954                2979      0.85%


 3.  SPEC2k LIPO:

                                       before              after
  Improvement
 -
            164.gzip                1311                1405      7.18%
             175.vpr                1732                1772      2.35%
             176.gcc                2462                2559      3.96%
             181.mcf                1723                1731      0.50%
          186.crafty

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-19 Thread Xinliang David Li
On Thu, May 19, 2011 at 11:04 AM, Andi Kleen a...@firstfloor.org wrote:
 davi...@google.com (David Li) writes:

 -fdisable-tree-ccp1    --- disable ccp1 for all functions
 -fenable-tree-cunroll=1   --- enable complete unroll for the function
                            whose cgraphnode uid is 1
 -fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
                                            functions at the following
                                             ranges [1,1], [300,400],
 and [400,1000]

 How are the ranges defined? I doubt numbers are a good interface here
 to specify functions.

The numbers are cgraph uids for the functions. The form that takes the
range is mainly for developers who use scripts to auto search (bug
triaging and optimization space traverse).


 This would be better done with #pragmas?

This is not good for automation. However, I do plan (later after this
patch is in) to add another form of the option which takes a comma
separated list of assembler names --- this form will be quite useful
to workaround compiler bugs temporarily in the make file.

David


 -Andi



 --
 a...@linux.intel.com -- Speaking for myself only



Re: [google] Increase inlining limits with FDO/LIPO

2011-05-19 Thread Xinliang David Li
  9.91%
   256.bzip218331864  1.67%
   300.twolf23212371  2.15%
188.ammp 771 771 -0.04%
  183.equake10711081  1.02%
 179.art29542878 -2.59%



5. SPEC2k LIPO vs FDO  after the change:
 FDO(after)LIPO(after) Improvement

164.gzip13721405  2.36%
 175.vpr18051772 -1.79%
 176.gcc25042559  2.21%
 181.mcf17481731 -0.94%
  186.crafty23492662 13.35%
  197.parser16011671  4.38%
 252.eon25883000 15.88%
 253.perlbmk25742611  1.44%
 254.gap20132152  6.87%
  255.vortex27983719 32.88%
   256.bzip218291935  5.79%
   300.twolf23592471  4.75%
188.ammp 766 774  1.10%
  183.equake10711081  0.92%
 179.art29792884 -3.18%



On Wed, May 18, 2011 at 12:53 PM, Mark Heffernan meh...@google.com wrote:
 Verified identical binaries created and submitted.

 Mark

 On Wed, May 18, 2011 at 11:37 AM, Xinliang David Li davi...@google.com 
 wrote:
 Ok with that change to google/main with some retesting.

 David

 On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote:
 On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com 
 wrote:
 The new change won't help those. Your original place will be ok if you
 test profile_arcs and branch_probability flags.

 Ah, yes.  I see your point now. Reverted to the original change with
 condition profile_arc_flag and flag_branch_probabilities.

 Mark


 David


 On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is 
 indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts-x_flag_unswitch_loops = value;
        if (!opts_set-x_flag_gcse_after_reload)
         opts-x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set-x_flag_ipa_reference  in_lto_p)
          opts-x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:







Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-19 Thread Xinliang David Li
On Thu, May 19, 2011 at 11:17 AM, Andi Kleen a...@firstfloor.org wrote:
 On Thu, May 19, 2011 at 11:10:24AM -0700, Xinliang David Li wrote:
 On Thu, May 19, 2011 at 11:04 AM, Andi Kleen a...@firstfloor.org wrote:
  davi...@google.com (David Li) writes:
 
  -fdisable-tree-ccp1    --- disable ccp1 for all functions
  -fenable-tree-cunroll=1   --- enable complete unroll for the function
                             whose cgraphnode uid is 1
  -fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
                                             functions at the following
                                              ranges [1,1], [300,400],
  and [400,1000]
 
  How are the ranges defined? I doubt numbers are a good interface here
  to specify functions.

 The numbers are cgraph uids for the functions. The form that takes the
 range is mainly for developers who use scripts to auto search (bug
 triaging and optimization space traverse).

 How about function names?

Not so easy for things like binary search.



  This would be better done with #pragmas?

 This is not good for automation. However, I do plan (later after this

 Why not? It should be easy enough to write a script to add such
 pragmas given a dwarf2 symbol-file:line dump

This may require changes many many sources files as compared to a
simple command line tweak -- for most of the cases, you don't even
need to know the id ranges, and can be blindly set to a large initial
value.


 I think pragmas would be a lot more useful for non compiler developers.


As compared with the -fdisable-tree-xxx=func1,func2?

The semantics of the pragma is also in question -- does it apply to
all inline instances of the function with pragma or just the emitted
instance?

(The pragma scheme can be implemented independently if it is considered useful).

Thanks,

David


 -Andi



Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
To make consistent inline decisions between profile-gen and
profile-use, probably better to check these two:

flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
enables profile-arcs, and value profiling is enabled only when
edge/branch profiling is enabled (so no need to be checked).

David


On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan meh...@google.com wrote:
 This small patch greatly expands the function size limits for inlining with
 FDO/LIPO.  With profile information, the inliner is much more selective and
 precise and so the limits can be increased with less worry that functions
 and total code size will blow up.  This speeds up x86-64 internal benchmarks
 by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to
 1.5% with FDO.  Size increase is negligible (0.1% mean).
 Bootstrapped and regression tested on x86-64.
 Trunk testing to follow.
 Ok for google/main?
 Mark

 2011-05-17  Mark Heffernan  meh...@google.com
        * opts.c (finish_options): Increase inlining limits with profile
        generate and use.

 Index: opts.c
 ===
 --- opts.c (revision 173666)
 +++ opts.c (working copy)
 @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
    opts-x_flag_split_stack = 0;
   }
      }
 +
 +  if (opts-x_flag_profile_use
 +      || opts-x_profile_arc_flag
 +      || opts-x_flag_profile_values)
 +    {
 +      /* With accurate profile information, inlining is much more
 + selective and makes better decisions, so increase the
 + inlining function size limits.  Changes must be added to both
 + the generate and use builds to avoid profile mismatches.  */
 +      maybe_set_param_value
 + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 + opts-x_param_values, opts_set-x_param_values);
 +      maybe_set_param_value
 + (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 + opts-x_param_values, opts_set-x_param_values);
 +    }
  }



Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
Though not common, people can do this:

1. for profile gen:
gcc -fprofile-arcs ...

2. for profile use
gcc -fbranch-probabilities ...

The new change won't help those. Your original place will be ok if you
test profile_arcs and branch_probability flags.

David


On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts-x_flag_unswitch_loops = value;
        if (!opts_set-x_flag_gcse_after_reload)
         opts-x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set-x_flag_ipa_reference  in_lto_p)
          opts-x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:



Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
Ok with that change to google/main with some retesting.

David

On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote:
 On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com 
 wrote:
 The new change won't help those. Your original place will be ok if you
 test profile_arcs and branch_probability flags.

 Ah, yes.  I see your point now. Reverted to the original change with
 condition profile_arc_flag and flag_branch_probabilities.

 Mark


 David


 On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts-x_flag_unswitch_loops = value;
        if (!opts_set-x_flag_gcse_after_reload)
         opts-x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set-x_flag_ipa_reference  in_lto_p)
          opts-x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:





Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread Xinliang David Li
Thanks for the comment. Will fix those.

David

On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Wed, 18 May 2011, David Li wrote:

 +      error (Unrecognized option %s, is_enable ? -fenable : 
 -fdisable);

 +      error (Unknown pass %s specified in %s,
 +          phase_name,
 +          is_enable ? -fenable : -fdisable);

 Follow GNU Coding Standards for diagnostics (start with lowercase letter).

 +      inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of 
 [%u, %u]\n,
 +              is_enable? Enable:Disable, phase_name, new_range-start, 
 new_range-last);

 Use separate calls to inform for the enable and disable cases, so that
 full sentences can be extracted for translation.

 +           error (Invalid range %s in option %s,
 +                  one_range,
 +                  is_enable ? -fenable : -fdisable);

 GNU Coding Standards.

 +               error (Invalid range %s in option %s,

 Likewise.

 +          inform (UNKNOWN_LOCATION, %s pass %s for functions in the range 
 of [%u, %u]\n,
 +                  is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Again needs GCS and i18n fixes.

 --
 Joseph S. Myers
 jos...@codesourcery.com



Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread Xinliang David Li
Will fix the Changelog, and add documentation.

Thanks,

David



On Wed, May 18, 2011 at 1:26 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote:

 In gcc, not all passes have user level control to turn it on/off, and
 there is no way to flip on/off the pass for a subset of functions. I
 implemented a generic option handling scheme in gcc to allow
 disabling/enabling any gcc pass for any specified function(s).  The
 new options will be very useful for things like performance
 experiments and bug triaging (gcc has dbgcnt mechanism, but not all
 passes have the counter).

 The option syntax is very similar to -fdump- options. The following
 are some examples:

 -fdisable-tree-ccp1    --- disable ccp1 for all functions
 -fenable-tree-cunroll=1   --- enable complete unroll for the function
                           whose cgraphnode uid is 1
 -fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
                                           functions at the following
                                            ranges [1,1], [300,400], and 
 [400,1000]
 -fdisable-tree-einline -- disable early inlining for all callers
 -fdisable-ipa-inline -- disable ipa inlininig

 In the gcc dumps, the uid numbers are displayed in the function header.

 The options are intended to be used internally by gcc developers.

 Ok for trunk ? (There is a little LIPO specific change that can be removed).

 David

 2011-05-18  David Li  davi...@google.com

        * final.c (rest_of_clean_state): Call function header dumper.
        * opts-global.c (handle_common_deferred_options): Handle new options.
        * tree-cfg.c (gimple_dump_cfg): Call function header dumper.
        * passes.c (register_one_dump_file): Call register_pass_name.
        (pass_init_dump_file): Call function header dumper.
        (execute_one_pass): Check explicit enable/disable flag.
        (passr_hash): New function.
        (passr_eq):
        (register_pass_name):
        (get_pass_by_name):
        (pass_hash):
        (pass_eq):
        (enable_disable_pass):
        (is_pass_explicitly_enabled_or_disabled):
        (is_pass_explicitly_enabled):
        (is_pass_explicitly_disabled):

 Bogus changelog entry.

 New options need documenting in doc/invoke.texi.

 Richard.


 Index: tree-pass.h
 ===
 --- tree-pass.h (revision 173635)
 +++ tree-pass.h (working copy)
 @@ -644,4 +644,12 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void enable_disable_pass (const char *, bool);
 +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree);
 +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree);
 +extern void register_pass_name (struct opt_pass *, const char *);
 +extern struct opt_pass *get_pass_by_name (const char *);
 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);
 +
  #endif /* GCC_TREE_PASS_H */
 Index: final.c
 ===
 --- final.c     (revision 173635)
 +++ final.c     (working copy)
 @@ -4456,19 +4456,7 @@ rest_of_clean_state (void)
        }
       else
        {
 -         const char *aname;
 -         struct cgraph_node *node = cgraph_node (current_function_decl);
 -
 -         aname = (IDENTIFIER_POINTER
 -                  (DECL_ASSEMBLER_NAME (current_function_decl)));
 -         fprintf (final_output, \n;; Function (%s) %s\n\n, aname,
 -            node-frequency == NODE_FREQUENCY_HOT
 -            ?  (hot)
 -            : node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
 -            ?  (unlikely executed)
 -            : node-frequency == NODE_FREQUENCY_EXECUTED_ONCE
 -            ?  (executed once)
 -            : );
 +         pass_dump_function_header (final_output, current_function_decl, 
 cfun);

          flag_dump_noaddr = flag_dump_unnumbered = 1;
          if (flag_compare_debug_opt || flag_compare_debug)
 Index: common.opt
 ===
 --- common.opt  (revision 173635)
 +++ common.opt  (working copy)
 @@ -1018,6 +1018,14 @@ fdiagnostics-show-option
  Common Var(flag_diagnostics_show_option) Init(1)
  Amend appropriate diagnostic messages with the command line option that 
 controls them

 +fdisable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass
 +
 +fenable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass
 +
  fdump-
  Common Joined RejectNegative Var(common_deferred_options) Defer
  -fdump-type  Dump various compiler internals to a file
 Index: opts-global.c
 ===
 --- 

Re: [google] Parameterize function overhead estimate for inlining

2011-05-17 Thread Xinliang David Li
You will have a followup patch to override arm defaults, right? Ok for
google/main.

Thanks,

David

On Tue, May 17, 2011 at 9:29 PM, Mark Heffernan meh...@google.com wrote:
 This tiny change improves the size estimation for inlining and results in an
 average 1% size reduction and a small (maybe 0.25% geomean) performance
 increase on internal benchmarks on x86-64.  I parameterized the value rather
 than changing it directly because previous exploration with x86 and ARM
 arches indicated that it varies significantly with architecture.  Default
 value is tuned for x86-64.
 Bootstrapped and tested on x86-64.  Will explore relevance and effectiveness
 for trunk and SPEC later.
 Ok for google/main?
 Mark
 2011-05-17  Mark Heffernan  meh...@google.com





        * ipa-inline.c (estimate_function_body_sizes): Parameterize static


        function static overhead.


        * params.def (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE): New parameter.
 Index: ipa-inline.c
 ===
 --- ipa-inline.c (revision 173845)
 +++ ipa-inline.c (working copy)
 @@ -1979,10 +1979,11 @@ estimate_function_body_sizes (struct cgr
    gcov_type time = 0;
    gcov_type time_inlining_benefit = 0;
    /* Estimate static overhead for function prologue/epilogue and alignment.
 */
 -  int size = 2;
 +  int size = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
    /* Benefits are scaled by probability of elimination that is in range
       0,2.  */
 -  int size_inlining_benefit = 2 * 2;
 +  int size_inlining_benefit =
 +    PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE) * 2;
    basic_block bb;
    gimple_stmt_iterator bsi;
    struct function *my_function = DECL_STRUCT_FUNCTION (node-decl);
 Index: params.def
 ===
 --- params.def (revision 173845)
 +++ params.def (working copy)
 @@ -110,6 +110,11 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO
    Inline recursively only when the probability of call being executed
 exceeds the parameter,
    10, 0, 0)

 +DEFPARAM (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE,
 +  inline-function-overhead-size,
 +  Size estimate of function overhead (prologue and epilogue) for inlining
 purposes,
 +  7, 0, 0)
 +
  /* Limit of iterations of early inliner.  This basically bounds number of
     nested indirect calls early inliner can resolve.  Deeper chains are
 still
     handled by late inlining.  */



Re: [google] support for building Linux kernel with FDO (issue4523061)

2011-05-13 Thread Xinliang David Li
On Fri, May 13, 2011 at 5:54 AM, Paolo Bonzini bonz...@gnu.org wrote:
 On 05/13/2011 03:03 AM, Rong Xu wrote:

        * gcc/coverage.c        (revision 173717): set a flag if building
 for Linux kernel.
        * gcc/tree-profile.c    (revision 173717): don't emit TLS
 declarations for Linux kernel builds.

 I think this should be done without touching at all the profiling machinery
 in GCC.

 1) add a new TLS model -ftls-model=none and make the kernel uses it. The
 model would simply force targetm.have_tls to false.


This is a good idea.


 2) as Richi mentioned, gcov-io and libgcov changes then can move to the
 kernel, and GCC needs no change at all here.


In fact -- reuse gcc code profiling machinery for FDO is the KEY
objective in this effort --- the effort tries to minimize gcc changes
by refactoring gcc code and isolating/abstracting away part of gcc
implementation that is user space program specific without using
runtime hooks.  Aside from the kernel FDO change -- the refactoring
itself actually makes the libgcov code more readable -- the existing
implementation has many huge functions etc.

Kernel source has their implementation of coverage testing -- but it
makes lots of data structure assumptions and hard coded -- it can
easily out of sync with gcc and is considered  unmaintainable.

Rong will have more explanation on the design.

Thanks,

David


 BTW, these parts of LIPO:

 +      if (!is_kernel_build)
 +        DECL_TLS_MODEL (dc_gcov_type_ptr_var) =
 +         decl_default_tls_model (dc_gcov_type_ptr_var);

       dc_void_ptr_var =
        build_decl (UNKNOWN_LOCATION, VAR_DECL,
 @@ -1488,8 +1493,9 @@
                    ptr_void);
       DECL_ARTIFICIAL (dc_void_ptr_var) = 1;
       DECL_EXTERNAL (dc_void_ptr_var) = 1;
 -      DECL_TLS_MODEL (dc_void_ptr_var) =
 -       decl_default_tls_model (dc_void_ptr_var);
 +      if (!is_kernel_build)
 +        DECL_TLS_MODEL (dc_void_ptr_var) =
 +         decl_default_tls_model (dc_void_ptr_var);

 Probably are missing a !targetm.have_tls.

 Paolo



[google] temporary fix to lipo build problems using check enabled compiler

2011-05-10 Thread Xinliang David Li
The following patch temporarily disable some of the checking which is
not fully 'lipo' aware. It will be checked into google/main and
further cleanups will follow.

David
2011-05-10  David Li  davi...@google.com

	* cgraphunit.c	(revision 173635) (verify_cgraph_node):
	(cgraph_mark_functions_to_output):
	* tree-cfg.c	(revision 173635) (verify_stmt):

Index: cgraphunit.c
===
--- cgraphunit.c	(revision 173635)
+++ cgraphunit.c	(working copy)
@@ -480,6 +480,10 @@ verify_cgraph_node (struct cgraph_node *
   if (seen_error ())
 return;
 
+  /* Disable checking for LIPO for now.  */
+  if (L_IPO_COMP_MODE)
+return;
+
   timevar_push (TV_CGRAPH_VERIFY);
   for (e = node-callees; e; e = e-next_callee)
 if (e-aux)
@@ -1315,7 +1319,8 @@ cgraph_mark_functions_to_output (void)
 		 are inside partition, we can end up not removing the body since we no longer
 		 have analyzed node pointing to it.  */
 	   !node-in_other_partition
-	   !(DECL_EXTERNAL (decl) || cgraph_is_aux_decl_external (node)))
+	   !(DECL_EXTERNAL (decl) || cgraph_is_aux_decl_external (node))
+	   !L_IPO_COMP_MODE)
 	{
 	  dump_cgraph_node (stderr, node);
 	  internal_error (failed to reclaim unneeded function);
@@ -1331,7 +1336,7 @@ cgraph_mark_functions_to_output (void)
 
 }
 #ifdef ENABLE_CHECKING
-  if (check_same_comdat_groups)
+  if (check_same_comdat_groups  !L_IPO_COMP_MODE)
 for (node = cgraph_nodes; node; node = node-next)
   if (node-same_comdat_group  !node-process)
 	{
Index: tree-cfg.c
===
--- tree-cfg.c	(revision 173635)
+++ tree-cfg.c	(working copy)
@@ -4159,6 +4159,10 @@ verify_stmt (gimple_stmt_iterator *gsi)
   gimple stmt = gsi_stmt (*gsi);
   int lp_nr;
 
+  /* TODO: Disable for now.  */
+  if (L_IPO_COMP_MODE)
+return false;
+
   if (is_gimple_omp (stmt))
 {
   /* OpenMP directives are validated by the FE and never operated


Re: [google]Implement an optimization based on reuse distance profiling (issue4517049)

2011-05-09 Thread Xinliang David Li
Ok.

The instrumentation and optimization runtime has not been open sourced
yet -- will need to be done later at some point.

(For reference, see Silvius's CGO2011 paper).

Thanks,

David

On Mon, May 9, 2011 at 2:49 PM, Easwaran Raman era...@google.com wrote:
 This patch by Silvius Rus replaces calls to  certain functions with a 
 specialized version that uses non-temporal stores based on memory reuse 
 distance profiling. Bootstraps, no test regressions and the profiling works 
 for a small test case. Ok for google/main.?

 -Easwaran

 2011-05-09  Silvius Rus silvius@gmail.com

        * value-prof.c (interesting_stringop_to_profile_p): Disbale
        stringop profiling if reuse distance profiling is turned on.
        * value-prof.h (gimple_gen_reusedist, optimize_reusedist): Declare.
        * gcov-io.h: (GCOV_COUNTER_REUSE_DIST): New counter.
        (GCOV_COUNTERS): Update.
        (GCOV_COUNTER_NAMES): Add reuse_distance.
        (GCOV_MERGE_FUNCTIONS): Add __gcov_merge_reusedist.
        (__gcov_merge_reusedist): New declaration.
        * profile.c (branch_prob): Enable reuse distance profiling and
        optimization.
        * common.opt (fprofile-reusedist, foptimize-locality): New options.
        * tree-profile.c: Include params.h.
        (stringop_subst, reusedist_t): New structures.
        (stringop_subst_t, reusedist_t): New typedefs.
        (stringop_decl): New global array.
        (RD_NUM_COUNTERS): New constant.
        (get_stringop_subst, reusedist_is_interesting_call)
        (reusedist_instr_func_name, reusedist_get_instr_decl)
        (reusedist_make_instr_call, reusedist_from_counters)
        (gimple_gen_reusedist, nt_op_name, reusedist_get_size_threshold)
        (reusedist_get_distance_large_threshold)
        (reusedist_get_distance_small_threshold)
        (reusedist_get_count_threshold, reusedist_nt_is_worth_it)
        (reusedist_get_nt_decl, maybe_issue_profile_use_note)
        (reusedist_maybe_replace_with_nt_version, optimize_reusedist): New
        functions.
        (gate_tree_profile_ipa): Return true if reuse distance instrumentation
        or use is turned on.
        * Makefile.in (LIBGCOV): Add _gcov_merge_reusedist.
        * libgcov.c (__gcov_weighted_mean2, __gcov_merge_reusedist): New
        functions.
        * params.def (PARAM_REUSEDIST_MEAN_DIST_LARGE_THRESH)
        (PARAM_REUSEDIST_MEAN_DIST_SMALL_THRESH)
        (PARAM_REUSEDIST_CALL_COUNT_THRESH, PARAM_REUSEDIST_MEMCPY_SIZE_THRESH)
        (PARAM_REUSEDIST_MEMSET_SIZE_THRESH): New params.



 Index: gcc/value-prof.c
 ===
 --- gcc/value-prof.c    (revision 173496)
 +++ gcc/value-prof.c    (working copy)
 @@ -1708,6 +1708,14 @@ interesting_stringop_to_profile_p (tree fndecl, gi
  {
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);

 +  /* Disable stringop collection with reuse distance instrumentation
 +     or optimization.  Otherwise we end up with hard to correct profile
 +     mismatches for functions where reuse distance-based transformation are
 +     made.  We see a number of memcpy at instrumentation time and a 
 different
 +     number of memcpy at profile use time.  */
 +  if (flag_profile_reusedist || flag_optimize_locality)
 +    return false;
 +
   if (fcode != BUILT_IN_MEMCPY  fcode != BUILT_IN_MEMPCPY
        fcode != BUILT_IN_MEMSET  fcode != BUILT_IN_BZERO)
     return false;
 Index: gcc/value-prof.h
 ===
 --- gcc/value-prof.h    (revision 173496)
 +++ gcc/value-prof.h    (working copy)
 @@ -103,6 +103,8 @@ extern void gimple_gen_const_delta_profiler (histo
                                             unsigned, unsigned);
  extern void gimple_gen_average_profiler (histogram_value, unsigned, 
 unsigned);
  extern void gimple_gen_ior_profiler (histogram_value, unsigned, unsigned);
 +extern void gimple_gen_reusedist (void);
 +extern void optimize_reusedist (void);

  /* In profile.c.  */
  extern void init_branch_prob (void);
 Index: gcc/gcov-io.h
 ===
 --- gcc/gcov-io.h       (revision 173496)
 +++ gcc/gcov-io.h       (working copy)
 @@ -374,7 +374,8 @@ typedef HOST_WIDEST_INT gcov_type;
  #define GCOV_LAST_VALUE_COUNTER 8  /* The last of counters used for value
                                      profiling.  */
  #define GCOV_COUNTER_DIRECT_CALL 9 /* Direct call counts.  */
 -#define GCOV_COUNTERS          10
 +#define GCOV_COUNTER_REUSE_DIST 10 /* Reuse distance measure.  */
 +#define GCOV_COUNTERS          11

  /* Number of counters used for value profiling.  */
  #define GCOV_N_VALUE_COUNTERS \
 @@ -383,7 +384,8 @@ typedef HOST_WIDEST_INT gcov_type;
   /* A list of human readable names of the counters */
  #define GCOV_COUNTER_NAMES     {arcs, interval, pow2, single, \
                                 delta,indirect_call, average, ior, \
 -                             

Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)

2011-05-07 Thread Xinliang David Li
On Sat, May 7, 2011 at 5:46 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Fri, May 6, 2011 at 7:57 PM, Xinliang David Li davi...@google.com wrote:
 I want propose a more general solution.

 1) Generic Annotation Support for gcc IR -- it is used attach to
 application/optimization specific annotation to gimple statements and
 annotations can be passed around across passes. In gcc, I only see
 HISTOGRAM annotation for value profiling, which is not general enough
 2) Support of CallInfo for each callsite. This is an annotation, but
 more standardized. The callinfo can be used to record information such
 as call attributes, call side effects, mod-ref information etc ---
 current gimple_call_flags can be folded into this Info structure.

 I don't like generic annotation facilities.  What should passes to with
 annotated stmts that are a) transformed, b) removed?  See RTL notes
 and all the interesting issues they cause.


 Then how do you store information that needs to be passed across
 optimization passes -- you can not possibly dump all of them into the
 core IR. In fact, anything that is derived from (via analysis) but not
 part of the core IR need to worry about update and maintenance. In
 current GIMPLE, we can find many such instances -- DU chains, Memory
 SSA, control flow information, as well as flags like visited,
 no_warning, PLF (?), etc. Have a unified way of representing them is a
 good thing so that 1) make the IR lean and mean; 2) avoid too many
 different side data structures.  The important thing is to have a good
 verifier to catch insanity and inconsistency of the annotation after
 each pass.

 Well, as soon as you have a verifier it's no longer generic but _is_
 part of the core IL.  Similar to how EH info is part of the core IL,
 or alias info, or loop information.

The difference is that no-core IL can be thrown away anytime and be
recomputed if needed.

David



 Richard.



<    4   5   6   7   8   9   10   >