Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Duncan Sands wrote:

Hi Steven,

I think Jack wasn't suggesting that dragonegg should be changed to not be
a plugin any more.  I think he was suggesting that it should live in the gcc
repository rather than the LLVM repository.


So, no offense, but the suggestion here is to make this subversive
(for FSF GCC) plugin part of FSF GCC? What is the benefit of this for
GCC? I don't see any. I just see a plugin trying to piggy-back on the
hard work of GCC front-end developers and negating the efforts of
those working on the middle ends and back ends.


I'm sorry you see the dragonegg project so negatively.  I think it is useful
for gcc (though not hugely useful), since it makes it easy to compare the gcc
and LLVM optimizers and code generators, not to mention the gcc and LLVM
approaches to LTO.  If LLVM manages to produce better code than gcc for some
testcase, then it is a convenient tool for the gcc devs to find out why, and
improve gcc.  If gcc is consistently better than LLVM then there's nothing to
worry about!  Of course, right now it is LLVM that is mostly playing catchup
with gcc, so for the moment it is principally the LLVM devs that get to learn
from gcc, but as LLVM improves the other direction is likely to occur more
often.
I tried to compare gcc 4.5 and dragonegg a week ago on SPEC2000 on a Core i7.

Here are some results.

 Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.

So here is SPECInt2000 for x86_64 comparison:

dragonegg: -O3 (with LLVM release build)
gcc4.5: -O3 -flto (--enable-checking=release)

 Compilation Time  SPECINT2000
Dragonegg 122.85user 2572
gcc-4.5   283.49user 2841

 On integer benchmarks, dragonegg generates about 11% slower code.
One interesting thing is that dragonegg is a really fast compiler.  It
is 2.3 times faster than gcc.
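
For reference, the two configurations correspond roughly to invocations of
the following form (the dragonegg.so path is a placeholder; the actual runs
were of course driven by the SPEC harness):

  # dragonegg: gcc-4.5 front end, LLVM optimizers and code generator
  time gcc-4.5 -fplugin=/path/to/dragonegg.so -O3 -c foo.c -o foo.o

  # plain gcc-4.5 with link-time optimization
  time gcc-4.5 -O3 -flto -c foo.c -o foo.o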

Dragonegg generates smaller text sections but bigger data sections.
Unfortunately, my scripts measure and compare only text sections.  Therefore
I am not posting the text-only code size comparison, because it is not
meaningful.  But looking at small benchmarks, I got the impression that gcc
generates smaller code (text + data) in general than dragonegg.
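
To compare text and data together, the binutils size utility can be run on
the produced binaries (the names below are just examples):

  size bench.dragonegg bench.gcc45
  # the 'text' and 'data' columns give the per-section sizes;
  # 'dec' is their sum (text + data + bss)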



Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Vladimir Makarov wrote:

Duncan Sands wrote:

Hi Steven,

I think Jack wasn't suggesting that dragonegg should be changed to not be
a plugin any more.  I think he was suggesting that it should live in the gcc
repository rather than the LLVM repository.

So, no offense, but the suggestion here is to make this subversive
(for FSF GCC) plugin part of FSF GCC? What is the benefit of this for
GCC? I don't see any. I just see a plugin trying to piggy-back on the
hard work of GCC front-end developers and negating the efforts of
those working on the middle ends and back ends.
I'm sorry you see the dragonegg project so negatively.  I think it is useful
for gcc (though not hugely useful), since it makes it easy to compare the gcc
and LLVM optimizers and code generators, not to mention the gcc and LLVM
approaches to LTO.  If LLVM manages to produce better code than gcc for some
testcase, then it is a convenient tool for the gcc devs to find out why, and
improve gcc.  If gcc is consistently better than LLVM then there's nothing to
worry about!  Of course, right now it is LLVM that is mostly playing catchup
with gcc, so for the moment it is principally the LLVM devs that get to learn
from gcc, but as LLVM improves the other direction is likely to occur more
often.
I tried to compare gcc 4.5 and dragonegg a week ago on SPEC2000 on a Core i7.

Here are some results.

 Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.

So here is SPECInt2000 for x86_64 comparison:

dragonegg: -O3 (with LLVM release build)
gcc4.5: -O3 -flto (--enable-checking=release)

  Compilation Time  SPECINT2000
Dragonegg 122.85user 2572
gcc-4.5   283.49user 2841

  On integer benchmarks, dragonegg generates about 11% slower code.
One interesting thing is that dragonegg is a really fast compiler.  It
is 2.3 times faster than gcc.


Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core). GCC has taken a decision to favor
performance of the code absolutely over compiler performance.
That's not such a bad bet given how fast machines are getting.
So I think this compile time advantage is not that interesting.


Dragonegg generates smaller text sections but bigger data sections.
Unfortunately, my scripts measure and compare only text sections.  Therefore
I am not posting the text-only code size comparison, because it is not
meaningful.  But looking at small benchmarks, I got the impression that gcc
generates smaller code (text + data) in general than dragonegg.


Usually you will find that to a first order approximation, speed and
size are linearly related.



Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Steven Bosscher
On Wed, Apr 21, 2010 at 6:53 PM, Vladimir Makarov  wrote:
> One interesting thing is that dragonegg is a really fast compiler.  It
> is 2.3 times faster than gcc.

Yes, well, this is one thing "the crowd out there" complains about all
the time. It just appears to be almost impossible for GCC (the
project) to turn this around.

Do you also have per-benchmark compile time measurements, by any chance?

Ciao!
Steven


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Steven Bosscher
On Wed, Apr 21, 2010 at 6:56 PM, Robert Dewar  wrote:
> Actually for my taste, you have to get a MUCH bigger factor in compile
> time before you can call yourself a fast compiler (Realia COBOL by
> comparison compiles millions of lines a minute of code on current
> PC's, using just one core).

Heh, you always bring up the Realia compiler when there's a compile
time discussion. Must have been a really impressive piece of work,
that it was so fast :-)

> GCC has taken a decision to favor
> performance of the code absolutely over compiler performance.
> That's not such a bad bet given how fast machines are getting.
> So I think this compile time advantage is not that interesting.

I disagree (how unexpected is that? :-).

I think you are right that it is not per se a bad decision to favor
performance of the code over performance of the compiler itself. But a
quick investigation of, say, compile times for a linux kernel from GCC
3.1 to GCC 4.5 shows that GCC slows down faster than what Moore's law
compensates for. And people do complain about this. I'll admit it is
hard to say if the complainers are a significant number of GCC users
or just a loud minority...

Ciao!
Steven


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Steven Bosscher wrote:

On Wed, Apr 21, 2010 at 6:53 PM, Vladimir Makarov  wrote:
  

One interesting thing is that dragonegg is a really fast compiler.  It
is 2.3 times faster than gcc.



Yes, well, this is one thing "the crowd out there" complains about all
the time. It just appears to be almost impossible for GCC (the
project) to turn this around.

  
I don't think we should be too worried about it.  GCC looks good in
comparison with other industrial compilers from a compile-time (and code-size)
point of view (e.g. the SunStudio compiler is about 2 times slower and
generates worse code on x86/x86_64 according to my benchmarking 2 years ago;
Intel is also slower but generates much better code than gcc).

Do you also have per-benchmark compile time measurements, by any chance?

  

No, I measured only the overall compile time for SPECInt2000 and SPECFP2000.



Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Robert Dewar wrote:

Vladimir Makarov wrote:

Duncan Sands wrote:

Hi Steven,

I think Jack wasn't suggesting that dragonegg should be changed to not be
a plugin any more.  I think he was suggesting that it should live in the gcc
repository rather than the LLVM repository.

So, no offense, but the suggestion here is to make this subversive
(for FSF GCC) plugin part of FSF GCC? What is the benefit of this for
GCC? I don't see any. I just see a plugin trying to piggy-back on the
hard work of GCC front-end developers and negating the efforts of
those working on the middle ends and back ends.
I'm sorry you see the dragonegg project so negatively.  I think it is useful
for gcc (though not hugely useful), since it makes it easy to compare the gcc
and LLVM optimizers and code generators, not to mention the gcc and LLVM
approaches to LTO.  If LLVM manages to produce better code than gcc for some
testcase, then it is a convenient tool for the gcc devs to find out why, and
improve gcc.  If gcc is consistently better than LLVM then there's nothing to
worry about!  Of course, right now it is LLVM that is mostly playing catchup
with gcc, so for the moment it is principally the LLVM devs that get to learn
from gcc, but as LLVM improves the other direction is likely to occur more
often.
I tried to compare gcc 4.5 and dragonegg a week ago on SPEC2000 on a Core i7.

Here are some results.

 Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.

So here is SPECInt2000 for x86_64 comparison:

dragonegg: -O3 (with LLVM release build)
gcc4.5: -O3 -flto (--enable-checking=release)

  Compilation Time  SPECINT2000
Dragonegg 122.85user 2572
gcc-4.5   283.49user 2841

  On integer benchmarks, dragonegg generates about 11% slower code.
One interesting thing is that dragonegg is a really fast compiler.  It
is 2.3 times faster than gcc.


Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core). GCC has taken a decision to favor
performance of the code absolutely over compiler performance.
That's not such a bad bet given how fast machines are getting.
So I think this compile time advantage is not that interesting.
For me, definitely.  Also, as I wrote, I would not be too worried about it.
GCC looks good in comparison with other industrial compilers from a
compile-time (and code-size) point of view.  Here I mean the Intel, SunStudio,
PathScale, and Open64 ones.




Dragonegg generates smaller text sections but bigger data sections.
Unfortunately, my scripts measure and compare only text sections.  Therefore
I am not posting the text-only code size comparison, because it is not
meaningful.  But looking at small benchmarks, I got the impression that gcc
generates smaller code (text + data) in general than dragonegg.


Usually you will find that to a first order approximation, speed and
size are linearly related.

I agree with this for moderately optimizing compilers.  But for highly
optimizing compilers it might not be true.  Intel generates much better and
bigger code than gcc, although that might be mostly because of code versioning
(including versioning for different subtargets).


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Chris Lattner

On Apr 21, 2010, at 9:53 AM, Vladimir Makarov wrote:

> Only SPECInt2000 for x86_64 was compiled fully successfully by
> dragonegg.  There were a few compiler crashes, including some in LLVM
> itself, for SPECFP2000 and for SPECINT2000 for x86.
> 
> So here is SPECInt2000 for x86_64 comparison:
> 
> dragonegg: -O3 (with LLVM release build)
> gcc4.5: -O3 -flto (--enable-checking=release)
> 
> Compilation Time  SPECINT2000
> Dragonegg 122.85user 2572
> gcc-4.5   283.49user 2841
> 
> On integer benchmarks, dragonegg generates about 11% slower code.
> One interesting thing is that dragonegg is a really fast compiler.  It
> is 2.3 times faster than gcc.

This is definitely interesting, but you're also comparing apples and oranges 
here (for both compile time and performance).  Can you get numbers showing GCC 
-O3 and dragonegg with LTO to get a better comparison?

-Chris


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Manuel López-Ibáñez
On 21 April 2010 19:11, Vladimir Makarov  wrote:
> I don't think we should be too worried about it.  GCC looks good in
> comparison with other industrial compilers from a compile-time (and code-size)
> point of view (e.g. the SunStudio compiler is about 2 times slower and
> generates worse code on x86/x86_64 according to my benchmarking 2 years ago;
> Intel is also slower but generates much better code than gcc).

There is the perception that GCC is too slow and that every release gets
much slower for no significant gain.  At some point one has to start asking
whether something can be done to alleviate this, either by showing that in
fact there is a significant gain, or by improving compilation speed.  But we
should be worried.

We have to wait until clang can compile as much C++ code as GCC and
implement a similar feature set, but the differences are going to be much
larger when LLVM is used with Clang as the front end. [*] This is a major
selling point of Clang/LLVM against GCC.  You can choose to ignore it, but it
is out there unchallenged and GCC users are listening to it.  And it will
probably show that reimplementing the GCC FEs using the LLVM infrastructure
is an expensive but rewarding project.  In fact, given that LLVM/Clang
already has many features that GCC does not, it is not clear what the
overhead of implementing those features in GCC would be.

So do you think that the differences in compilation speed can be
explained mostly by lack of optimization features in LLVM?

Cheers,

Manuel.

[*] I would also be very interested in knowing the impact of the integrated
assembler approach on compile time:
http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Paolo Bonzini

On 04/21/2010 07:04 PM, Steven Bosscher wrote:

On Wed, Apr 21, 2010 at 6:56 PM, Robert Dewar  wrote:

Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).


Heh, you always bring up the Realia compiler when there's a compile
time discussion. Must have been a really impressive piece of work,
that it was so fast :-)


It was fast, I used it for my first summer job, and spent some time 
looking at its output too.  :-)


And actually I'm impressed especially because it wasn't (as far as I 
remember) an optimizing compiler, yet it was written in itself _and_ so 
fast.


Paolo


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Duncan Sands

Hi Vladimir, thank you for doing this benchmarking.


Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.


Sorry about that.  Can you please send me preprocessed code for the
spec tests that crashed the plugin (unless you are not allowed to).
By the way, if you target something (eg: i386) that doesn't have SSE
support then I've noticed that the plugin tends to crash on code that
does vector operations.  If you have assertions turned on in LLVM then
you get something like:

Assertion `TLI.isTypeLegal(Op.getValueType()) && "Intrinsic uses a non-legal 
type?"' failed.

Stack dump:
0.  Running pass 'X86 DAG->DAG Instruction Selection' on function 
'@_ada_sse_nolib'

So if the compile failures are of that kind, no need to send testcases, I
already have several.

Best wishes,

Duncan.


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Steven Bosscher wrote:

On Wed, Apr 21, 2010 at 6:56 PM, Robert Dewar  wrote:

Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).


Heh, you always bring up the Realia compiler when there's a compile
time discussion. Must have been a really impressive piece of work,
that it was so fast :-)


Well, if you look at the parameters ... that compiler was written
in the early 80's (the first machine on which we ran it was a
PC-1 with diskettes, and it achieved about 10,000 lines/minute
in that environment).  Why was compilation time important?  Because
COBOL programmers did and still do routinely write very large
programs (a COBOL program = a C function, roughly).  A COBOL
run-unit (= a C program, roughly) is composed of one or more
programs, and very often it was the style for there to be
only a few really large programs, so even back in 1980, COBOL
programmers routinely compiled files consisting of tens of
thousands of lines of code.  So compile-time speed was a factor.

And indeed Realia COBOL was much faster than the major
competitor Microfocus.

But the interesting thing is that over time, that advantage
evaporated.  It was very interesting to compile 10,000 lines
a minute when the competition did only 1,000 lpm, but it
is no longer so exciting to compile 1,000,000 lpm with the
competition managing 100,000 lpm, since both are fast
enough in practice.



GCC has taken a decision to favor
performance of the code absolutely over compiler performance.
That's not such a bad bet given how fast machines are getting.
So I think this compile time advantage is not that interesting.


I disagree (how unexpected is that? :-).

I think you are right that it is not per se a bad decision to favor
performance of the code over performance of the compiler itself. But a
quick investigation of, say, compile times for a linux kernel from GCC
3.1 to GCC 4.5 shows that GCC slows down faster than what Moore's law
compensates for.


You are ignoring the multi-core effect!

It's interesting to look at the time it takes to run our internal
gnat test suite (tens of millions of lines of code, hundreds
of thousands of files, 12000 distinct test cases).

This used to take about an hour for many, many years; the test
suite seemed to grow fast enough to compensate for improved
processor performance.

But with the advent of multi-core machines, the picture has
changed entirely: although the test suite has continued to
grow rapidly in size, the time to run it is now down to
15 minutes when we run on a multi-core machine, and we just
got a machine with 2x6 cores, each hyperthreaded, which will
probably cut this in half again.

Given that it is so easy to take advantage of multiple cores when
compiling large projects, the overall effect is definitely
that GCC 4.5 is much faster than GCC 3.1 for handling large
projects on the latest hardware.
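
As a rough illustration of the effect (the -j12 value matches the 2x6-core
machine mentioned above; any large project will do):

  time make -j1     # serial build
  time make -j12    # one job per core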

I do realize that some people are running gcc on very old
machines; that particularly happens, say, in developing
countries or with students or hobbyists using old cast-off
machines.  And for those, compile time is a problem,
but for our Ada users, many of whom have absolutely giant
programs of millions of lines, compile-time speed has not
been an issue (we would know if it was; people would
tell us, they tell us of ANY problems they have).

The one exception is that native compilation on VMS
is slow, but that's a consequence of the VMS file
system, where opening lots of small files is slow.
We are planning to encourage people using VMS to
switch to using PC-based cross-compilation.


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Chris Lattner wrote:

On Apr 21, 2010, at 9:53 AM, Vladimir Makarov wrote:

  

Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.

So here is SPECInt2000 for x86_64 comparison:

dragonegg: -O3 (with LLVM release build)
gcc4.5: -O3 -flto (--enable-checking=release)

Compilation Time  SPECINT2000
Dragonegg 122.85user 2572
gcc-4.5   283.49user 2841

On integer benchmarks, dragonegg generates about 11% slower code.
One interesting thing is that dragonegg is a really fast compiler.  It
is 2.3 times faster than gcc.



This is definitely interesting, but you're also comparing apples and oranges 
here (for both compile time and performance).  Can you get numbers showing GCC 
-O3 and dragonegg with LTO to get a better comparison?

  
Dragonegg does not work with -flto.  It generates assembler output on
which gas complains (a lot of non-assembler content, like the target
data layout, that is not in comments).


So I'll do gcc -O3 without -flto.  I don't think it will change the average
SPECINT2000 rate significantly (although it can change individual benchmarks
significantly), but it will make the gcc compilation much faster (maybe 2
times, because I did not use -fwhole-program).  I'll post the results in
an hour.





Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Duncan Sands wrote:

Hi Vladimir, thank you for doing this benchmarking.


Only SPECInt2000 for x86_64 was compiled fully successfully by
dragonegg.  There were a few compiler crashes, including some in LLVM
itself, for SPECFP2000 and for SPECINT2000 for x86.


Sorry about that. Can you please send me preprocessed code for the
spec tests that crashed the plugin (unless you are not allowed to).
By the way, if you target something (eg: i386) that doesn't have SSE
support then I've noticed that the plugin tends to crash on code that
does vector operations. If you have assertions turned on in LLVM then
you get something like:

Assertion `TLI.isTypeLegal(Op.getValueType()) && "Intrinsic uses a 
non-legal type?"' failed.

Stack dump:
0. Running pass 'X86 DAG->DAG Instruction Selection' on function 
'@_ada_sse_nolib'


So if the compile failures are of that kind, no need to send testcases, I
already have several.


I have one different crash on galgel:

/home/vmakarov/build/dragonegg/64/bin/gfortran -c -o bifg21.o -ffixed-form -mpc64 -O3 -m32 -mpc64 -fplugin=/home/cygnus/vmakarov/build/dragonegg/dragonegg/dragonegg.so bifg21.f90
specmake: *** Warning: File `Makefile.spec' has modification time in the future (1271359622 > 1271358843)
f951: /to/scratch/vmakarov/build/dragonegg/llvm/lib/VMCore/Instructions.cpp:320: void llvm::CallInst::init(llvm::Value*, llvm::Value* const*, unsigned int): Assertion `(NumParams == FTy->getNumParams() || (FTy->isVarArg() && NumParams > FTy->getNumParams())) && "Calling a function with bad signature!"' failed.
*** WARNING *** there are active plugins, do not report this as a bug unless you can reproduce it without enabling any plugins.
Event| Plugins
PLUGIN_FINISH_UNIT   | dragonegg
PLUGIN_FINISH| dragonegg
PLUGIN_START_UNIT| dragonegg
bifg21.f90: In function ‘bifg21_’:
bifg21.f90:21:0: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
specmake: *** [bifg21.o] Error 1

As far as I know, it is impossible to send a preprocessed file for Fortran 90;
it needs the other program files to compile too.  But maybe this diagnostic is
still useful for you.





Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar


I agree with this for moderately optimizing compilers.  But for highly
optimizing compilers it might not be true.  Intel generates much better and
bigger code than gcc, although that might be mostly because of code versioning
(including versioning for different subtargets).


I don't think this is true if you select the appropriate option in
ICC to generate code for just one target, but of course if you let
ICC generate code for multiple targets (e.g. GenuineIntel with SSE
vs AuthenticAMD without SSE), then of course you get larger objects,
since you have a run time test and then essentially two separate
compilations of the same code in the same object.




Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Chris Lattner

On Apr 21, 2010, at 11:11 AM, Vladimir Makarov wrote:

>> 
>> This is definitely interesting, but you're also comparing apples and oranges 
>> here (for both compile time and performance). Can you get numbers showing 
>> GCC -O3 and dragonegg with LTO to get a better comparison?
>> 
>>  
> Dragonegg does not work with -flto.  It generates assembler output on which
> gas complains (a lot of non-assembler content, like the target data layout,
> that is not in comments).

Ok, I didn't know that didn't get wired up.  I'm not familiar with dragonegg, 
it might require gold with the llvm lto gold plugin or something.

> So I'll do gcc -O3 without -flto.  I don't think it will change the average
> SPECINT2000 rate significantly (although it can change individual benchmarks
> significantly), but it will make the gcc compilation much faster (maybe 2 times,
> because I did not use -fwhole-program).  I'll post the results in an hour.

Sounds good, thanks!  I suspect the gcc build times will improve.

-Chris


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Paolo Bonzini wrote:

On 04/21/2010 07:04 PM, Steven Bosscher wrote:

On Wed, Apr 21, 2010 at 6:56 PM, Robert Dewar  wrote:

Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).

Heh, you always bring up the Realia compiler when there's a compile
time discussion. Must have been a really impressive piece of work,
that it was so fast :-)


It was fast, I used it for my first summer job, and spent some time 
looking at its output too.  :-)


And actually I'm impressed especially because it wasn't (as far as I 
remember) an optimizing compiler, yet it was written in itself _and_ so 
fast.


It did not do what we would call global optimization, but it had very
good local optimization, and very good handling of jumps and effective
inlining of PERFORMS which made a big difference. For example

   HANDLE-BALANCE.
 IF BALANCE IS NEGATIVE THEN
PERFORM SEND-BILL
 ELSE
PERFORM RECORD-CREDIT
 END-IF.

   SEND-BILL.
 <>

   RECORD-CREDIT.
 <>

(see you can read COBOL even if you never saw it before :-) :-))

was compiled as though the two performs were inlined. This is
valuable in COBOL (and not done by the IBM mainframe compiler
at the time), since it is the style in COBOL to do these small
named local refinements, COBOL programmers consider nesting
constructs such as IF's to be bad style, preferring instead
to name the refined blocks as shown above (see what a very
nice compact syntax COBOL has for this kind of refinement,
much better than the algol or fortran style languages :-)
still shouldn't start a COBOL flame war here, sorry!)






Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Manuel López-Ibáñez wrote:

On 21 April 2010 19:11, Vladimir Makarov  wrote:

I don't think we should be too worried about it.  GCC looks good in
comparison with other industrial compilers from a compile-time (and code-size)
point of view (e.g. the SunStudio compiler is about 2 times slower and
generates worse code on x86/x86_64 according to my benchmarking 2 years ago;
Intel is also slower but generates much better code than gcc).


There is the perception that GCC is too slow and that every release gets
much slower for no significant gain.  At some point one has to start asking
whether something can be done to alleviate this, either by showing that in
fact there is a significant gain, or by improving compilation speed.  But we
should be worried.


We (here we = the commercial company AdaCore) would be worried if
ANY of our customers were worried, but they are not: they see a
continuous effective improvement in compile speed using the latest
available hardware, and it's not a factor for them.


So do you think that the differences in compilation speed can be
explained mostly by lack of optimization features in LLVM?


Nobody said that; the explanation is of course FAR more complex,
and to some extent it may be a matter of orientation and
enthusiasm.  There is more enthusiasm in the gcc community for
implementing new optimizations to improve performance than for
speeding up the existing compiler, which is quite
understandable.


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Basile Starynkevitch

Steven Bosscher wrote:

On Wed, Apr 21, 2010 at 6:56 PM, Robert Dewar  wrote:

Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).


Heh, you always bring up the Realia compiler when there's a compile
time discussion. Must have been a really impressive piece of work,
that it was so fast :-)


Another example of a compiler which compiles quickly but produces slow 
code is tinycc	http://www.tinycc.org/  http://repo.or.cz/w/tinycc.git

(the program is called tcc)

In my very small & rusty experience, it did happen that tcc generated
incorrect machine code: at least some old version of tcc compiled some
old version of MELT-generated code incorrectly on x86-64 [the
tcc-generated *.so crashed, while the *.so generated by GCC from the
same source ran correctly].


Now, it is indeed true that TCC has probably evolved since (& MELT also),
and I don't know where and how to get the newest TCC source (is the
"git clone git://repo.or.cz/tinycc.git" command enough?  The version
number seems to have been 0.9.25 for more than a year...).


A useless measure of compile time (within the MELT branch, in the gcc
subdirectory of the build directory; warmelt-first.1.c is a generated C
file of 96 KLOC):


% time gcc-4.5 -g -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.1.pic.o warmelt-first.1.c
gcc-4.5 -g -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -  10.29s user 0.41s system 100% cpu 10.695 total

% time tcc -g -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.1.pic.o warmelt-first.1.c
tcc -g -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o  0.63s user 0.03s system 99% cpu 0.660 total


The current tcc is not really usable for me: I am not able to do a MELT
bootstrap (that is, to compile warmelt-*.0.c into MELT modules
warmelt*0.so, use them to generate warmelt*1.c, compile those to
warmelt*1.so, and use them to generate warmelt*2.c).  This MELT bootstrap
is routinely done with GCC 4.4 & GCC 4.5 (with tcc, the warmelt*1.c is
generated but does not work OK).


Regards.

PS. About GCC MELT see http://gcc.gnu.org/wiki/MiddleEndLispTranslator
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

From the early days, WATFOR was an impressively fast compiler,
and then there is always Borland Pascal.

I once gave a talk at the SIGPLAN compiler conference whose
theme was the great successes we were having in managing
to dramatically slow down compilers :-)


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Chris Lattner wrote:

On Apr 21, 2010, at 11:11 AM, Vladimir Makarov wrote:

  

This is definitely interesting, but you're also comparing apples and oranges 
here (for both compile time and performance). Can you get numbers showing GCC 
-O3 and dragonegg with LTO to get a better comparison?

 
  

Dragonegg does not work with -flto.  It generates assembler output on which
gas complains (a lot of non-assembler content, like the target data layout,
that is not in comments).



Ok, I didn't know that didn't get wired up.  I'm not familiar with dragonegg, 
it might require gold with the llvm lto gold plugin or something.

  

So I'll do gcc -O3 without -flto.  I don't think it will change the average
SPECINT2000 rate significantly (although it can change individual benchmarks
significantly), but it will make the gcc compilation much faster (maybe 2 times,
because I did not use -fwhole-program).  I'll post the results in an hour.



Sounds good, thanks!  I suspect the gcc build times will improve.
  

Here are the results of SPECINT2000 on x86_64 for dragonegg -O3 vs gcc-4.5 -O3.

dragonegg: -O3 (release build)
gcc4.5: -O3 (--enable-checking=release)

 Compilation Time  SPECINT2000
Dragonegg 122.85user 2572
gcc-4.5   142.76user 2784

 Dragonegg generates about 9% slower code (vs 11% when gcc uses -flto).
Without -flto, gcc-4.5's compilation is only 16% slower than dragonegg's.




Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Eric Botcazou
> We (here we = the commercial company AdaCore) would be worried if
> ANY of our customers were worried, but they are not, they see a
> continuous effective improvement in compile speed using the latest
> available hardware, and it's not a factor for them.

The Ada compiler is a little special here because our internal measures show
that GCC 4.x based Ada compilers are faster than GCC 3.x based ones, all other 
things being equal, at least on x86/Linux.

GCC 4.5 hasn't been evaluated yet though.

-- 
Eric Botcazou


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Duncan Sands

Hi Vladimir,


Dragonegg does not work with -flto.  It generates assembler output on which
gas complains (a lot of non-assembler content, like the target data layout,
that is not in comments).


actually it does work with -flto, in an awkward way.  When you use -flto
it spits out LLVM IR.  You need to use -S, otherwise the system assembler
tries (and fails) to compile this.  You need to then use llvm-as to turn
this into LLVM bitcode.  You can then link and optimize the bitcode either
by hand (using llvm-ld) or using the gold plugin, as described in
  http://llvm.org/docs/GoldPlugin.html

It is annoying that gcc insists on running the system assembler when passed
-c.  Not running the assembler isn't only good for avoiding the -S + llvm-as
rigmarole mentioned above.  LLVM is now capable of writing out object files
directly, i.e. without having to pass via an assembler at all.  It would be
neat if I could have the plugin immediately write out the final object file
if -c is passed.  I didn't work out how to do this yet.  It probably requires
some gcc modifications, so maybe something can be done for gcc-4.6.

For transparent LTO another possibility is to encode LLVM bitcode in the
assembler in the same way as gcc does for gimple when passed -flto.  I didn't
investigate this yet.
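
Spelled out as commands, the awkward route currently looks roughly like this
(file names are purely illustrative):

  # with -flto the plugin emits LLVM IR; -S stops gcc from running gas on it
  gcc-4.5 -fplugin=/path/to/dragonegg.so -O3 -flto -S foo.c -o foo.ll
  # turn the textual IR into LLVM bitcode
  llvm-as foo.ll -o foo.bc
  # link and optimize the bitcode by hand
  llvm-ld foo.bc bar.bc -o prog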

Ciao,

Duncan.


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Vladimir Makarov

Robert Dewar wrote:


I agree with this for moderately optimizing compilers.  But for highly
optimizing compilers it might not be true.  Intel generates much better and
bigger code than gcc, although that might be mostly because of code versioning
(including versioning for different subtargets).


I don't think this is true if you select the appropriate option in
ICC to generate code for just one target, but of course if you let
ICC generate code for multiple targets (e.g. GenuineIntel with SSE
vs AuthenticAMD without SSE), then of course you get larger objects,
since you have a run time test and then essentially two separate
compilations of the same code in the same object.


It is hard to find appropriate options even if we set multiple-target code
generation aside.  For example, if you use -fast for ICC, it implies using
-static libraries, which makes the code much bigger.


Although that is not the right argument against your point, an example about
vectorization would be.  ICC vectorizes many more loops than gcc does, and
vectorized loops are much bigger than their non-vectorized variants.  So
faster code does not mean smaller code in general.  There are a lot of
optimizations which make code bigger and faster: function versioning (based
on argument values), aggressive inlining, modulo scheduling, vectorization,
loop unrolling, loop versioning, loop tiling, etc.  So even if both compilers
do the same optimizations, if one compiler is more successful at them, the
generated code will be bigger and faster.
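
This is easy to see by building the same file with and without the
code-expanding optimizations and comparing section sizes (the file name is
arbitrary; -ftree-vectorize and -funroll-loops are spelled out here only for
clarity, since -O3 already enables vectorization):

  gcc -O2 -c loops.c -o loops-O2.o
  gcc -O3 -ftree-vectorize -funroll-loops -c loops.c -o loops-O3.o
  size loops-O2.o loops-O3.o    # the -O3 object is usually bigger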




Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Toon Moene

Robert Dewar wrote:


Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).


Obviously, apart from comparing a sufficiently large set of compilers on 
this, "speed of compilation" is mostly in the eye of the beholder.


Subjectively, as of gcc/gfortran 4.4, our (roughly 1 million lines of 
Fortran + 30,000 lines of C) code gets compiled (optimized and 
vectorized at -O3) in about 5 minutes on a quad core machine (using make 
-j8).


As an absolute number, this tells you nothing.  But as a measure of 
usefulness, it means that from 4.4 onwards, it is possible to recompile 
our complete weather forecasting suite at *every* new run, 4 times a day.


You bet that's sometimes useful ...

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Vladimir Makarov wrote:

Although that is not the right argument against your point, an example about
vectorization would be.  ICC vectorizes many more loops than gcc does, and
vectorized loops are much bigger than their non-vectorized variants.  So
faster code does not mean smaller code in general.  There are a lot of
optimizations which make code bigger and faster: function versioning (based
on argument values), aggressive inlining, modulo scheduling, vectorization,
loop unrolling, loop versioning, loop tiling, etc.  So even if both compilers
do the same optimizations, if one compiler is more successful at them, the
generated code will be bigger and faster.


Sure, we can all find such examples, but if you take a large program,
(say hundreds of thousands of lines), you will find that the speed
vs size relation holds pretty well.



Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-21 Thread Robert Dewar

Toon Moene wrote:

Robert Dewar wrote:


Actually for my taste, you have to get a MUCH bigger factor in compile
time before you can call yourself a fast compiler (Realia COBOL by
comparison compiles millions of lines a minute of code on current
PC's, using just one core).


Obviously, apart from comparing a sufficiently large set of compilers on 
this, "speed of compilation" is mostly in the eye of the beholder.


Subjectively, as of gcc/gfortran 4.4, our (roughly 1 million lines of 
Fortran + 30,000 lines of C) code gets compiled (optimized and 
vectorized at -O3) in about 5 minutes on a quad core machine (using make 
-j8).


As an absolute number, this tells you nothing.  But as a measure of 
usefulness, it means that from 4.4 onwards, it is possible to recompile 
our complete weather forecasting suite at *every* new run, 4 times a day.


You bet that's sometimes useful ...



Right, and I think once you get down to 5 minutes you are good enough;
it is in the minor convenience category to get down to 2 minutes.

The Realia COBOL compiler, written entirely in COBOL, could compile
itself in less than a minute on a 25-megahertz 386, but 5 minutes
would have been fine (of course the compiler was a small program
compared to many of the customer programs, less than a hundred thousand
lines of COBOL).



Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-22 Thread Laurent GUERBY
On Wed, 2010-04-21 at 14:03 -0400, Robert Dewar wrote:
> I do realize that some people are running gcc on very old
> machines, that particularly happens say in developing
> countries or with students or hobbyists using old cast
> off machines. 

For those developing free software, the compile farm
has some nice iron:

http://gcc.gnu.org/wiki/CompileFarm

> And for those compile time is a problem,
> but for out Ada users, many of whom have absolutely giant
> programs of millions of lines, compile time speed has not
> been an issue (we would know if it was, people would
> tell us, they tell us of ANY problems they have).
> 
> The one exception is that native compilation on VMS
> is slow, but that's a consequence of the VMS file
> system, where opening lots of small files is slow.
> We are planning to encourage people using VMS to
> switch to using PC-based cross-compilation.

In my (limited) experience with daily development, link times on the
Windows platform for big Ada applications are an issue too, not compile
times.

Laurent





Re: Some benchmark comparison of gcc4.5 and dragonegg (was dragonegg in FSF gcc?)

2010-04-22 Thread Vladimir Makarov

Robert Dewar wrote:

Vladimir Makarov wrote:

Although that is not the right argument against your point, an example about
vectorization would be.  ICC vectorizes many more loops than gcc does, and
vectorized loops are much bigger than their non-vectorized variants.  So
faster code does not mean smaller code in general.  There are a lot of
optimizations which make code bigger and faster: function versioning (based
on argument values), aggressive inlining, modulo scheduling, vectorization,
loop unrolling, loop versioning, loop tiling, etc.  So even if both compilers
do the same optimizations, if one compiler is more successful at them, the
generated code will be bigger and faster.


Sure, we can all find such examples, but if you take a large program,
(say hundreds of thousands of lines), you will find that the speed
vs size relation holds pretty well.

 Definitely not for the Intel compiler and not for modern x86_64 processors
(although it is most probably true for some other processors like ARM).
ICC really generates much bigger code than GCC even if we set subtarget
versioning aside.  The closest analog on x86_64 to gcc -O3 would be -O3
-xT for icc; -xT means generating code for only one subtarget, which is Core2.
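
The measurement itself is simple; it boils down to something of this form
(file names are placeholders, and the SPEC2006 numbers below come from the
full benchmark builds rather than single files):

  gcc -O3 -c foo.c -o foo-gcc.o
  icc -O3 -xT -c foo.c -o foo-icc.o
  size foo-gcc.o foo-icc.o    # compare the text columns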


 I tried to compile the biggest single-file program I have (about 500K
lines).  ICC crashed on it because there was not enough memory (8GB),
whereas gcc does fine with 2GB of memory.  So I had to check SPEC2006
instead.  On average, the code size increase over all of SPEC2006 for ICC
was 34%.  But because you mentioned programs with hundreds of thousands
of lines, I am also giving numbers for some programs from SPEC2006.


            lines    Intel code size increase
 gromacs    400K     23%
 tonto      125K     29%
 wrf        115K     44%
 gobmk      197K     -2%

Long ago I got the impression that ICC is good mostly for FP programs
(for integer benchmarks gcc frequently generates better code), but if icc
stays on its course, gcc will always be a better system compiler.