Re: GCC Optimisation, Part 0: Introduction

2011-05-04 Thread Dimitrios Apostolou

On Fri, 29 Apr 2011, Laurynas Biveinis wrote:

>> Thanks Diego, please send me the form. I'll sign it as soon as my
>> contributions require it.
>
> Don't wait; sign it right away - it might take a while to process it.

Hey all, I should have mentioned this earlier: I have already sent the
email, so I'm now waiting for the papers. Thank you for your interest :-)



Dimitris



Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Paolo Bonzini

On 04/27/2011 03:28 PM, Yuan Pengfei wrote:

>> Any other advice will be appreciated.
>
> I think you can look into llvm-clang. It compiles faster and uses
> much less memory than gcc.


It is also a completely different compiler.  It doesn't make sense to 
compare the two, unless Dimitrios wants to rewrite GCC in 3 months.


Joe's suggestions (take a look at the speedup areas page, investigate
which bullets can still be a priority, and talk to Nicola to help him
with benchmarks and get up to speed with the code) are the best in this
thread.


Regarding bitmaps, I have patches to use vector instructions.  The 
compile-time difference was basically a wash, but I can give them to you 
if you are interested.  Alternatively, the ebitmap.[ch] files in GCC are 
totally unused, and better in theory.  The main hurdle is that ebitmaps 
are not growable.  Still, they should be usable in many cases (or they 
can be made growable! :).


A nice exercise to get up to speed with the code is to make GCC record a 
set of bitmap operations from real compilation, and then have a driver 
that uses that set of operations for benchmarking.  This is the only 
sane way to extract the real performance of the data structure.  It 
doesn't avoid the need for validation in the compiler, of course, where 
other effects can come into play (caching etc.), but it is a useful start.
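
To make the exercise concrete, here is a minimal sketch of such a replay
driver.  The trace format (one "op index bit" triple per line) and the bm_*
wrappers are invented for illustration; a real harness would record the
trace from GCC and route the wrappers to bitmap.c, ebitmap.c or sbitmap.c,
whichever implementation is being measured.

/* Sketch only: a stand-alone driver that replays a recorded trace of
   bitmap operations.  The bm_* wrappers and the trace format are
   hypothetical stand-ins for the implementation under test.  */
#include <stdio.h>
#include <string.h>

#define NBITMAPS 256
#define NWORDS   2048

static unsigned long bm[NBITMAPS][NWORDS]; /* toy flat bitmaps standing in
                                              for the structure under test */

static void bm_set (int i, int b)   { bm[i][b / 64] |=  1UL << (b % 64); }
static void bm_clear (int i, int b) { bm[i][b / 64] &= ~(1UL << (b % 64)); }
static int  bm_test (int i, int b)  { return (bm[i][b / 64] >> (b % 64)) & 1; }

int
main (int argc, char **argv)
{
  FILE *trace = fopen (argc > 1 ? argv[1] : "bitmap.trace", "r");
  char op[16];
  int idx, bit;
  long hits = 0;

  if (!trace)
    return 1;

  /* Replay every recorded operation against the implementation under
     test; wrap this loop in a timer to get the benchmark number.  */
  while (fscanf (trace, "%15s %d %d", op, &idx, &bit) == 3)
    {
      idx %= NBITMAPS;
      bit %= NWORDS * 64;
      if (!strcmp (op, "set"))
        bm_set (idx, bit);
      else if (!strcmp (op, "clear"))
        bm_clear (idx, bit);
      else if (!strcmp (op, "test"))
        hits += bm_test (idx, bit);
    }

  printf ("replayed trace, %ld test hits\n", hits);
  fclose (trace);
  return 0;
}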


From the list on the web page, the following items should be easier than
the others:


* Split simulate_stmt into simulate_stmt and simulate_phi because the 
caller of simulate_stmt knows whether it's got a statement or a PHI node.


* Get rid of EXPR_LIST and INSN_LIST

* cxx_binding should be 16 bytes, not 20.

* Eliminate binfos for POD structs and other no-inheritance classes (not 
so easy)


* Put the string at the end of the IDENTIFIER_NODE using the trailing 
array hack (or possibly in the ht_identifier, see 
libcpp/include/symtab.h and libcpp/symtab.c)
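
For reference, the trailing array idiom mentioned in the last item looks
roughly like this.  This is a sketch only; the struct and field names are
illustrative and do not reproduce the actual IDENTIFIER_NODE or
ht_identifier layout.

/* Sketch of the "trailing array" (flexible array member) idiom: the
   characters live right after the node header, so one allocation and no
   extra pointer dereference are needed to reach the string.  */
#include <stdlib.h>
#include <string.h>

struct ident_node
{
  unsigned int hash;
  unsigned int len;
  char str[];                  /* string stored inline at the end */
};

static struct ident_node *
make_ident (const char *s)
{
  size_t len = strlen (s);
  struct ident_node *id = malloc (sizeof *id + len + 1);
  id->hash = 0;                /* would hold the precomputed hash */
  id->len = len;
  memcpy (id->str, s, len + 1);
  return id;
}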


Paolo


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Richard Guenther
On Fri, Apr 29, 2011 at 9:18 AM, Paolo Bonzini bonz...@gnu.org wrote:
 On 04/27/2011 03:28 PM, Yuan Pengfei wrote:

 Any other advice will be appreciated.

 I think you can look into llvm-clang. It compiles faster and uses
 much less memory than gcc.

 It is also a completely different compiler.  It doesn't make sense to
 compare the two, unless Dimitrios wants to rewrite GCC in 3 months.

 Joe's suggestions (take a look at the speedup areas page, investigate which
 bullets can still be a priority, and talk to Nicola to help him with
 benchmarks and get up to speed with the code) are the best in this thread.

 Regarding bitmaps, I have patches to use vector instructions.  The
 compile-time difference was basically a wash, but I can give them to you if
 you are interested.  Alternatively, the ebitmap.[ch] files in GCC are
 totally unused, and better in theory.  The main hurdle is that ebitmaps are
 not growable.  Still, they should be usable in many cases (or they can be
 made growable! :).

If they are not growable then they are a candidate to replace sbitmaps
(or bitmaps where sbitmaps are not used because the bitmap will be only
sparsely populated).

As usual, improving the container implementation is most of the time not
the best kind of optimization.  Exchanging the container type or the
algorithms that use it will usually make more of a difference.

 A nice exercise to get up to speed with the code is to make GCC record a set
 of bitmap operations from real compilation, and then have a driver that uses
 that set of operations for benchmarking.  This is the only sane way to
 extract the real performance of the data structure.  It doesn't avoid the
 need for validation in the compiler, of course, where other effects can come
 into play (caching etc.), but it is a useful start.

 From the list on the web page, the following items should be easier than the
 others:

 * Split simulate_stmt into simulate_stmt and simulate_phi because the caller
 of simulate_stmt knows whether it's got a statement or a PHI node.

 * Get rid of EXPR_LIST and INSN_LIST

 * cxx_binding should be 16 bytes, not 20.

 * Eliminate binfos for POD structs and other no-inheritance classes (not so
 easy)

 * Put the string at the end of the IDENTIFIER_NODE using the trailing array
 hack (or possibly in the ht_identifier, see libcpp/include/symtab.h and
 libcpp/symtab.c)

Not so easy because C++ uses TREE_TYPE of it and thus you'd lose
the nice sharing of identifiers.

* Embed gimple_seq_node in gimple itself.  This should simplify some
  iterators, avoid pointer indirection and eventually be more cache friendly.
  No idea why it wasn't done this way from the start.

Richard.

 Paolo



Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Dimitrios Apostolou
Thank you all for your ideas, they are much appreciated. I will certainly
look into the areas you mentioned, so do keep the feedback coming, and I
will comment on them once I have a better understanding. I'd also like to
get in sync with existing work, so that duplicate effort is avoided (for
example, I'm already researching an alternative hash function to replace
htab_hash_string(); I don't know whether Nicola has already done that).
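
To give an idea of the kind of candidate I mean (purely an illustration,
not a decision about what will be used), a well-known alternative such as
32-bit FNV-1a is only a few lines:

/* Sketch only: FNV-1a, one of the textbook string hashes that could be
   benchmarked against htab_hash_string().  */
static unsigned int
fnv1a_hash_string (const char *s)
{
  unsigned int h = 2166136261u;        /* FNV offset basis */
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;         /* fold in the next byte ... */
      h *= 16777619u;                  /* ... then multiply by the FNV prime */
    }
  return h;
}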


But for now I will be reading some tutorial slides I found from HiPEAC and 
CGO conferences, so that I understand GCC's architecture. Slides are good 
but are missing many things, so if you have video/audio transcripts of 
tutorials they would come in handy.


It is also a priority to standardise the way of measuring gcc's speed 
(which files to compile with what options), and dedicate some old 
machinery to that purpose. Perhaps this has already been done, and I can 
use existing setups? I found the following very useful pages:


http://gcc.opensuse.org/
http://gcc.gnu.org/wiki/PerformanceTesting

On another note, I can see that a similar project was accepted for the 2007
GSoC, titled "Speeding up GCC for fun and profit". Where can I find
more info?


Finally, I'd appreciate any guidelines regarding submitting patches for
review, or development in general. For example, is diff -u the proper
format to use? Should I include the ChangeLog diff in all patches? Should
they all be sent to the gcc-patches list even if a patch is an initial
version posted just for review?



Thanks,
Dimitris




Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Diego Novillo
On Fri, Apr 29, 2011 at 09:24, Dimitrios Apostolou ji...@gmx.net wrote:
 Thank you all for your ideas, they are much appreciated. I will certainly
 look into the areas you mentioned, so do keep the feedback coming, and I will
 comment on them once I have a better understanding. I'd also like to get in
 sync with existing work, so that duplicate effort is avoided (for example,
 I'm already researching an alternative hash function to replace
 htab_hash_string(); I don't know whether Nicola has already done that).

 But for now I will be reading some tutorial slides I found from HiPEAC and
 CGO conferences, so that I understand GCC's architecture. Slides are good
 but are missing many things, so if you have video/audio transcripts of
 tutorials they would come in handy.

I don't think there are any such transcripts, sorry.  I remember one
year they had video going for the GCC summit, but I don't think they
were ever released.

 It is also a priority to standardise the way of measuring gcc's speed (which
 files to compile with what options), and dedicate some old machinery to that
 purpose. Perhaps this has already been done, and I can use existing setups?
 I found the following very useful pages:

 http://gcc.opensuse.org/
 http://gcc.gnu.org/wiki/PerformanceTesting

Both are useful.  My recommendation is to decide on a set of programs
that you will use as your build timing benchmark and operate on those.
 A good source of code may be a big project like KDE or OpenOffice.

 On another note, I can see that a similar project was accepted for the 2007
 GSoC, titled "Speeding up GCC for fun and profit". Where can I find more
 info?

I don't know what came of that, actually.  Eric?

 Finally, I'd appreciate any guidelines regarding submitting patches for
 review, or development in general. For example, is diff -u the proper format
 to use? Should I include the ChangeLog diff in all patches? Should they all
 be sent to the gcc-patches list even if a patch is an initial version posted
 just for review?

You should follow the guidelines in http://gcc.gnu.org/contribute.html.

The very first one is a copyright assignment.  Have you started that
process?  If not, I will send you the form.


Diego.


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Tom Tromey
>>>>> "Paolo" == Paolo Bonzini bonz...@gnu.org writes:

Paolo> * Put the string at the end of the IDENTIFIER_NODE using the trailing
Paolo> array hack (or possibly in the ht_identifier, see
Paolo> libcpp/include/symtab.h and libcpp/symtab.c)

I implemented this once:

http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01293.html

It did not go in because a different approach (the one in the code now)
was deemed clearer.

Tom


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Dimitrios Apostolou

On Fri, 29 Apr 2011, Diego Novillo wrote:

> The very first one is a copyright assignment.  Have you started that
> process?  If not, I will send you the form.

Thanks Diego, please send me the form. I'll sign it as soon as my
contributions require it.



Dimitris



Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Nathan Froyd
On Fri, Apr 29, 2011 at 09:18:56AM +0200, Paolo Bonzini wrote:
 * Get rid of EXPR_LIST and INSN_LIST

This is reasonably difficult, though particular subprojects may be easy
enough.  Notable uses of EXPR_LIST:

- loop-iv.c

- the interface to TARGET_FUNCTION_VALUE

- the scheduler

- REG_NOTES

- var-tracking.c

- reload

Notable uses of INSN_LIST:

- the scheduler

- reload

- gcse.c

The biggest uses of each in the scheduler ought to be easy to deal with,
but the scheduler manipulates the lists in peculiar ways.

 * cxx_binding should be 16 bytes, not 20.

Not your fault, but comments like this on SpeedupAreas are so opaque as
to be useless.  *Why* should cxx_binding be 16 bytes?  Should we take
the next member out and have a VEC someplace instead of chaining?  Are
we duplicating information in the members themselves?  Etc.

-Nathan


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Paolo Bonzini

On 04/29/2011 04:15 PM, Nathan Froyd wrote:

>>  * cxx_binding should be 16 bytes, not 20.
>
> Not your fault, but comments like this on SpeedupAreas are so opaque as
> to be useless. *Why* should cxx_binding be 16 bytes?  Should we take
> the next member out and have a VEC someplace instead of chaining?  Are
> we duplicating information in the members themselves?  Etc.

Sorry, you're right.  It's about cache lines I guess, and moving the
bitfields into one of the pointers.
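
Concretely, the trick alluded to here is pointer tagging: when the
pointed-to objects are at least 4-byte aligned, the low bits of a pointer
are always zero and can carry the flag bits, so the struct loses one word.
A rough sketch, with invented field names rather than the real cxx_binding
layout:

/* Sketch only: fold small flag bitfields into the low bits of an aligned
   pointer.  */
#include <stdint.h>
#include <assert.h>

#define BINDING_FLAG_MASK ((uintptr_t) 0x3)   /* two bits of alignment slack */

struct binding
{
  uintptr_t value_and_flags;     /* pointer with two flag bits in bits 0-1 */
  struct binding *previous;
};

static void
binding_set (struct binding *b, void *value, unsigned int flags)
{
  /* Relies on VALUE being at least 4-byte aligned.  */
  assert (((uintptr_t) value & BINDING_FLAG_MASK) == 0);
  b->value_and_flags = (uintptr_t) value | (flags & BINDING_FLAG_MASK);
}

static void *
binding_value (const struct binding *b)
{
  return (void *) (b->value_and_flags & ~BINDING_FLAG_MASK);
}

static unsigned int
binding_flags (const struct binding *b)
{
  return (unsigned int) (b->value_and_flags & BINDING_FLAG_MASK);
}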


Paolo


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Nathan Froyd
On Fri, Apr 29, 2011 at 04:20:15PM +0200, Paolo Bonzini wrote:
 On 04/29/2011 04:15 PM, Nathan Froyd wrote:
   * cxx_binding should be 16 bytes, not 20.

 Not your fault, but comments like this on SpeedupAreas are so opaque as
 to be useless. *Why* should cxx_binding be 16 bytes?  Should we take
 the next member out and have a VEC someplace instead of chaining?  Are
 we duplicating information in the members themselves?  Etc.

 Sorry, you're right.  It's about cache lines I guess, and moving the  
 bitfields into one of the pointers.

Gross. :)

-Nathan


Re: GCC Optimisation, Part 0: Introduction

2011-04-29 Thread Laurynas Biveinis
 Thanks Diego, please send me the form. I'll sign it as soon as my
 contributions require it.

Don't wait; sign it right away - it might take a while to process it.

-- 
Laurynas


Re: GCC Optimisation, Part 0: Introduction

2011-04-28 Thread Chiheng Xu
On Wed, Apr 27, 2011 at 10:29 PM, Basile Starynkevitch
bas...@starynkevitch.net wrote:

 Here are some areas I'll look closer to, as shown by some early profiling
 I performed:
  * hash tables (both htab and symtab)

 There is probably a lot of tuning possible around GCC hash tables. However,
 I would imagine that the tuned performance might depend on the particular
 hardware running the modified GCC. And C++ is becoming an acceptable
 programming language inside GCC, so replacing the current C hash tables with
 some fancier C++ collections might be considered (I have no idea whether
 this would be a performance win, perhaps not; it could be a readability win).

 Perhaps some hash tables are used to associate data with important
 structures (like gimple) which are difficult to change. If you have strong
 measurements showing that adding the information inside gimple is better
 than keeping a hash table for it, you might be able to remove some of them,
 but this is difficult, since GCC's essential data types have been heavily
 hand-optimized.


It seems to me that the hash tables in GCC are used more like mappings
(dictionaries, associative arrays, functions from key to value) than
containers.

I think the main problem with a hash table is that the collision rate is
unpredictable, so the lookup time is unpredictable.  In the worst case,
you run the equality function on all the elements to find a slot.

Perhaps using a B+ tree to implement the mapping could be an alternative.
With a hash table you have to implement hash and equality functions; with
a B+ tree you only need a comparison function.  With a B+ tree the lookup
complexity is O(log(n)), which is a much better worst case.

I don't think the C++ STL map is necessarily better than a hash table or a
B+ tree.  For every (Key, Value) type pair it will instantiate a new
template class, making the resulting code size unnecessarily big.



 * ggc_internal_alloc_stat() or maybe implementing proper memory
 management instead of garbage collection, for hottest callers

 My opinion is that a garbage collector is (or can become) a proper memory
 manager.
 If you really want to switch back to manual memory management, please be
 sure to make changes which will be easy to revert.
 I believe that the current GCC garbage collector could be improved (but that
 is very hard work!)

Agree.



-- 
Chiheng Xu
Wuhan,China


Re: GCC Optimisation, Part 0: Introduction

2011-04-28 Thread Richard Guenther
On Thu, Apr 28, 2011 at 2:52 PM, Chiheng Xu chiheng...@gmail.com wrote:
 On Wed, Apr 27, 2011 at 10:29 PM, Basile Starynkevitch
 bas...@starynkevitch.net wrote:

 Here are some areas I'll look closer to, as shown by some early profiling
 I performed:
  * hash tables (both htab and symtab)

 There is probably a lot of tuning possible around GCC hash tables. However,
 I would imagine that the tuned performance might depend on the particular
 hardware running the modified GCC. And C++ is becoming an acceptable
 programming language inside GCC, so replacing the current C hash tables with
 some fancier C++ collections might be considered (I have no idea whether
 this would be a performance win, perhaps not; it could be a readability win).

 Perhaps some hash tables are used to associate data with important
 structures (like gimple) which are difficult to change. If you have strong
 measurements showing that adding the information inside gimple is better
 than keeping a hash table for it, you might be able to remove some of them,
 but this is difficult, since GCC's essential data types have been heavily
 hand-optimized.


 It seems to me that the hash tables in GCC are used more like mappings
 (dictionaries, associative arrays, functions from key to value) than
 containers.

 I think the main problem with a hash table is that the collision rate is
 unpredictable, so the lookup time is unpredictable.  In the worst case,
 you run the equality function on all the elements to find a slot.

 Perhaps using a B+ tree to implement the mapping could be an alternative.
 With a hash table you have to implement hash and equality functions; with
 a B+ tree you only need a comparison function.  With a B+ tree the lookup
 complexity is O(log(n)), which is a much better worst case.

Well, with a good hash-function your average lookup complexity is O(1)
which is much better.

Richard.


Re: GCC Optimisation, Part 0: Introduction

2011-04-28 Thread Robert Dewar

On 4/28/2011 8:55 AM, Richard Guenther wrote:

>> It seems to me that the hash tables in GCC are used more like mappings
>> (dictionaries, associative arrays, functions from key to value) than
>> containers.
>>
>> I think the main problem with a hash table is that the collision rate is
>> unpredictable, so the lookup time is unpredictable.  In the worst case,
>> you run the equality function on all the elements to find a slot.
>>
>> Perhaps using a B+ tree to implement the mapping could be an alternative.
>> With a hash table you have to implement hash and equality functions; with
>> a B+ tree you only need a comparison function.  With a B+ tree the lookup
>> complexity is O(log(n)), which is a much better worst case.
>
> Well, with a good hash-function your average lookup complexity is O(1)
> which is much better.

I think the hash table is a much better choice than the B+ tree. You
really are much more interested in average case performance in a
compiler than worst case, especially when the worst case will not
happen in practice.

> Richard.




Re: GCC Optimisation, Part 0: Introduction

2011-04-28 Thread Chiheng Xu
On Thu, Apr 28, 2011 at 9:13 PM, Robert Dewar de...@adacore.com wrote:
 I think the hash table is a much better choice than the B+ tree. You
 really are much more interested in average case performance in a compiler
 than worst case, especially when the worst case will not
 happen in practice.

Basically, I agree with you. A hash table is better in most cases, but
not always, especially with an ultra-large key set.

O(1) is better than O(log(n)), but not by much.  If you have 1 million
elements, you need at most about 20 comparisons.  A comparison may be much
cheaper than computing a hash, so 20 comparisons may be faster than one
hash computation.


-- 
Chiheng Xu
Wuhan,China


Re: GCC Optimisation, Part 0: Introduction

2011-04-28 Thread Chiheng Xu
On Fri, Apr 29, 2011 at 8:07 AM, Yuan Pengfei cool...@qq.com wrote:
 B+ tree is more commonly used in file systems. In memory, I think RB-tree is 
 better.
 RB-tree vs. hash table is just like map vs. unordered_map.

Any balanced tree that has O(log(n)) lookup complexity would do, including a splay tree.


-- 
Chiheng Xu
Wuhan,China


Re: GCC Optimisation, Part 0: Introduction

2011-04-27 Thread Laurynas Biveinis
2011/4/27 Dimitrios Apostolou ji...@gmx.net:
 * ggc_internal_alloc_stat() or maybe implementing proper memory management
 instead of garbage collection, for hottest caller

This one can easily take much more time than three months. I've been
working in this area; right now I'm working on allocating RTL outside
the GC.

There are some things one can do in three months that will not
necessarily result in speedups themselves, but may enable future
speed-ups:
- Add an object type tag to GC-allocated objects;
- Introduce an explicit marking stack instead of the current implicit,
execution-based stack (see the sketch below);

With these two in place it should be possible to replace the GC
implementation with something more advanced, for example an incremental GC.
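
A rough sketch of the marking-stack idea follows; the object layout and
names are invented for illustration and do not match the real ggc
internals.

/* Sketch only: an explicit mark stack instead of marking by recursion.
   Error checking of malloc/realloc is omitted.  */
#include <stdlib.h>

struct gc_obj
{
  int marked;
  int nchildren;
  struct gc_obj **children;
};

static void
mark_from_root (struct gc_obj *root)
{
  size_t cap = 64, top = 0;
  struct gc_obj **stack = malloc (cap * sizeof *stack);
  int i;

  stack[top++] = root;
  while (top > 0)
    {
      struct gc_obj *obj = stack[--top];
      if (obj == NULL || obj->marked)
        continue;
      obj->marked = 1;

      /* Push children instead of recursing: no risk of blowing the C
         stack, and this loop is a natural point to pause and resume for
         an incremental collector.  */
      for (i = 0; i < obj->nchildren; i++)
        {
          if (top == cap)
            stack = realloc (stack, (cap *= 2) * sizeof *stack);
          stack[top++] = obj->children[i];
        }
    }
  free (stack);
}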

-- 
Laurynas