Sorry for the double posting. (Netscape mailer sucks!)
Also, my previous post was rejected because it exceeded the maximum post size,
so I have had to leave out the rpm attachment.
This post includes the comparison results only.

Alan Watson wrote:
> 
> I've had two disconcerting experiences lately. Last month,
> we upgraded a Sun from Solaris 2.6 to Solaris 2.7; it got
> faster. Last week, I upgraded my dual 450 MHz PII machine
> from RedHat 5.2 (2.0.36) to RedHat 6.0 (2.2.5-15smp); it got
> slower.
> 
> I'm working on a two-stage image compression method. The
> image is quantized in one process and fed into gzip or bzip2
> using popen(). In sh terms, this is "process1 | process2".
> This used to parallelize quite nicely, with one process
> running on each processor, but now about the best I can get
> is 120% CPU usage, often less. Overall, the pipe is slower.
> 

Please find attached the following:

1) An rpm of nbench which contains several nbench binaries compiled with
different compilers and optimization flags, for
comparison. -- not included.
2) An HTML document summarizing the results.
3) A text version of the same document.

The short version is:
With the egcs compiler, you need to compile with the -mpentiumpro switch on
Pentium II machines. Binaries compiled with the -m386 or -m486 switch can
run slower than binaries compiled with gcc on memory and integer intensive code!

I have tried to post it on slashdot twice, but was not able to get
through.

Please check it and spread the word; I think this is important.

And no, the issue is not kernel related.


-- 
----------------------------------------
Constantine Gavrilov
Unix System Administrator and Programmer
Orbotech
Yavne 81102, Israel
Phone: (972-8)-942-3064
Fax:   (972-8)-942-3800
----------------------------------------
I did some work on the nbench benchmark suite. As a result, I created an nbench-byte rpm package. It includes nbench binaries compiled with different compilers and different optimization flags:

nbench (gcc version)
nbench.egcs (egcs 1.03 version -m486)
nbench.586 (egcs 1.03 version -mpentium)
nbench.ppro (egcs 1.03 version -mpentiumpro)
nbench.egcs.1.1.2 (egcs 1.1.2 version -m486)
nbench.586.1.1.2 (egcs 1.1.2 version -mpentium)
nbench.ppro.1.1.2 (egcs 1.1.2 version -mpentiumpro)

I benchmarked my home computer with these binaries. The results, showing the influence of the different compilers and of the -mpentium and -mpentiumpro flags, are summarized below. They are rather interesting. I will post the binary and source RPMs to the net, so everybody will be able to verify the results.

Results
====================:===================::===================::===================::===================::===================::===================::===================:
TEST                :    gcc 2.7.2.3    ::     egcs 1.03     :: egcs 1.03 Pentium ::  egcs 1.03 Ppro   ::    egcs 1.1.2     ::egcs 1.1.2 Pentium ::  egcs 1.1.2 Ppro  :
                    :----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------:
                    : Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :: Iter/sec | Index  :
--------------------:----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------::----------|--------:
NUMERIC SORT        :    159.16| 1.34   ::    157.07| 1.32   ::    158.04| 1.33   ::    165.44| 1.39   ::    168.76| 1.42   ::    165.71| 1.40   ::    168.52| 1.42   :
STRING SORT         :    16.161| 1.12   ::     17.78| 1.23   ::    17.551| 1.21   ::    18.012| 1.25   ::    16.786| 1.16   ::    17.516| 1.21   ::    17.372| 1.20   :
BITFIELD            :5.2937e+07| 1.90   ::4.9756e+07| 1.78   ::4.9627e+07| 1.78   ::4.9685e+07| 1.78   ::4.8988e+07| 1.76   ::4.8866e+07| 1.75   ::4.9015e+07| 1.76   :
FP EMULATION        :    9.8535| 1.09   ::     12.04| 1.33   ::    12.089| 1.34   ::    16.627| 1.84   ::    10.689| 1.18   ::    10.638| 1.18   ::      15.8| 1.75   :
FOURIER             :    3428.8| 2.19   ::    4170.7| 2.66   ::    4172.3| 2.67   ::    4154.1| 2.65   ::    4013.3| 2.56   ::    4127.6| 2.64   ::    4121.1| 2.63   :
ASSIGNMENT          :    2.3051| 2.28   ::     1.907| 1.88   ::    1.9534| 1.93   ::     2.186| 2.16   ::    1.9593| 1.93   ::    1.9881| 1.96   ::    2.1577| 2.13   :
IDEA                :    356.72| 1.62   ::     336.1| 1.53   ::    342.78| 1.56   ::    410.64| 1.86   ::    331.51| 1.51   ::     333.2| 1.51   ::    404.47| 1.84   :
HUFFMAN             :    169.81| 1.50   ::    148.42| 1.31   ::    150.04| 1.33   ::    163.16| 1.44   ::     149.2| 1.32   ::    154.62| 1.37   ::    167.16| 1.48   :
NEURAL NET          :    2.6266| 1.77   ::    3.4364| 2.32   ::    3.3386| 2.26   ::     3.483| 2.35   ::    4.5617| 3.08   ::    4.5581| 3.08   ::    4.8244| 3.26   :
LU DECOMPOSITION    :    125.12| 4.68   ::    137.48| 5.14   ::    135.12| 5.05   ::    153.84| 5.75   ::    145.16| 5.43   ::    146.08| 5.46   ::    151.32| 5.66   :
====================:===================::===================::===================::===================::===================::===================::===================:
MEMORY INDEX        :       1.690       ::       1.604       ::       1.608       ::       1.685       ::       1.579       ::       1.609       ::       1.650       :
====================:===================::===================::===================::===================::===================::===================::===================:
INTEGER INDEX       :       1.374       ::       1.371       ::       1.385       ::       1.621       ::       1.352       ::       1.358       ::       1.612       :
====================:===================::===================::===================::===================::===================::===================::===================:
FLOATING INDEX      :       2.630       ::       3.169       ::       3.121       ::       3.300       ::       3.501       ::       3.540       ::       3.649       :
====================:===================::===================::===================::===================::===================::===================::===================:

* The benchmarked CPU is GenuineIntel Pentium II (Deschutes) 375 MHz 512 KB cache.
* SuperMicro P6DGU motherboard, 256 MB RAM, 75 MHz bus clock.
* gcc and egcs compilation flags:
          -s -static -O3 -fomit-frame-pointer -Wall -m486 -fforce-addr \
          -fforce-mem -malign-loops=2 -malign-functions=2 -malign-jumps=2 \
          -funroll-loops
* egcs Pentium compilation flags:
          -s -static -O3 -fomit-frame-pointer -Wall \
          -fforce-addr -fforce-mem -malign-loops=2 -malign-functions=2 \
          -malign-jumps=2 -funroll-loops -mpentium
* egcs Ppro optimized compilation flags:
          -s -static -O3 -fomit-frame-pointer -Wall \
          -fforce-addr -fforce-mem -malign-loops=2 -malign-functions=2 \
          -malign-jumps=2 -funroll-loops -mpentiumpro
* The same standard egcs compiler binaries were used to generate the 486, Pentium and Ppro optimized
  nbench binaries; only the compilation flags were different.
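
As a quick consistency check on the table, the three composite indexes look like geometric means of the per-test Index values. The sketch below recomputes them for two of the columns. The grouping of tests into memory, integer and floating point sets is an assumption (it is the grouping the Linux nbench port is understood to use), and the helper names are made up for this example.

```python
from math import prod

# Assumed grouping of the ten tests into the three composite indexes.
GROUPS = {
    "MEMORY": ["STRING SORT", "BITFIELD", "ASSIGNMENT"],
    "INTEGER": ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"],
    "FLOATING": ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"],
}

# Per-test Index values copied from two columns of the results table.
INDEX = {
    "gcc 2.7.2.3": {
        "NUMERIC SORT": 1.34, "STRING SORT": 1.12, "BITFIELD": 1.90,
        "FP EMULATION": 1.09, "FOURIER": 2.19, "ASSIGNMENT": 2.28,
        "IDEA": 1.62, "HUFFMAN": 1.50, "NEURAL NET": 1.77,
        "LU DECOMPOSITION": 4.68,
    },
    "egcs 1.1.2 Ppro": {
        "NUMERIC SORT": 1.42, "STRING SORT": 1.20, "BITFIELD": 1.76,
        "FP EMULATION": 1.75, "FOURIER": 2.63, "ASSIGNMENT": 2.13,
        "IDEA": 1.84, "HUFFMAN": 1.48, "NEURAL NET": 3.26,
        "LU DECOMPOSITION": 5.66,
    },
}

# Composite indexes as reported in the table, for comparison.
REPORTED = {
    "gcc 2.7.2.3": {"MEMORY": 1.690, "INTEGER": 1.374, "FLOATING": 2.630},
    "egcs 1.1.2 Ppro": {"MEMORY": 1.650, "INTEGER": 1.612, "FLOATING": 3.649},
}


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def composite(column, group):
    """Recompute one composite index from the per-test Index values."""
    return geometric_mean(INDEX[column][test] for test in GROUPS[group])


for column in INDEX:
    for group in GROUPS:
        print(f"{column:16s} {group:8s} "
              f"recomputed={composite(column, group):.3f} "
              f"reported={REPORTED[column][group]:.3f}")
```

The recomputed values agree with the reported ones to within 0.01, which is about what rounding the per-test indexes to two decimals allows.
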
Observations:

  1. The latest versions of the egcs compiler provide significant speed-ups on Pentium and Pentium Pro processors, especially for floating point intensive code.
  2. Binaries generated by the egcs compiler, optimized for a CPU of type "A" and run on a CPU of type "B", can show significant degradation of performance in memory and integer intensive operations. This is true specifically for the currently most common CPU, the Pentium II (Pentium Pro architecture). For the latest versions of the egcs compiler this effect seems to be even stronger.
  3. Binaries generated by the egcs compiler and not optimized for the Pentium Pro CPU will not show optimal performance in floating point intensive applications when run on Pentium II / Ppro machines. Significant speed-ups for such applications can be achieved when the binaries are optimized for the Pentium Pro CPU. This effect seems to be stronger for the latest versions of the egcs compiler.
  4. The latest versions of the egcs compiler seem to generate binaries which run slower in memory and integer intensive operations. This is at least true for the Pentium II / Pentium Pro CPUs. This may be corrected by different optimization flags. A word of advice, anyone?
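
To put rough numbers on these observations, the following sketch computes the percentage change of each composite index relative to a chosen baseline column. The index values are copied from the results table; the function and variable names are only for illustration.

```python
# Composite indexes copied from the results table (higher is better).
COMPOSITE = {
    "gcc 2.7.2.3":        {"MEMORY": 1.690, "INTEGER": 1.374, "FLOATING": 2.630},
    "egcs 1.03":          {"MEMORY": 1.604, "INTEGER": 1.371, "FLOATING": 3.169},
    "egcs 1.03 Pentium":  {"MEMORY": 1.608, "INTEGER": 1.385, "FLOATING": 3.121},
    "egcs 1.03 Ppro":     {"MEMORY": 1.685, "INTEGER": 1.621, "FLOATING": 3.300},
    "egcs 1.1.2":         {"MEMORY": 1.579, "INTEGER": 1.352, "FLOATING": 3.501},
    "egcs 1.1.2 Pentium": {"MEMORY": 1.609, "INTEGER": 1.358, "FLOATING": 3.540},
    "egcs 1.1.2 Ppro":    {"MEMORY": 1.650, "INTEGER": 1.612, "FLOATING": 3.649},
}


def percent_change(column, baseline, index):
    """Percentage change of one composite index versus a baseline column."""
    new = COMPOSITE[column][index]
    old = COMPOSITE[baseline][index]
    return 100.0 * (new - old) / old


# Print every column relative to the gcc 2.7.2.3 build.
for column in COMPOSITE:
    changes = ", ".join(
        f"{index} {percent_change(column, 'gcc 2.7.2.3', index):+.1f}%"
        for index in ("MEMORY", "INTEGER", "FLOATING")
    )
    print(f"{column:20s} {changes}")
```

For example, `percent_change("egcs 1.1.2 Ppro", "egcs 1.1.2", "INTEGER")` comes out at about 19%, which is the gain from switching -m486 to -mpentiumpro with the same compiler, while `percent_change("egcs 1.1.2", "gcc 2.7.2.3", "MEMORY")` is negative (about -6.6%).
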

Conclusions:

Most distribution vendors (including RedHat) have switched to the egcs compiler. However, they provide binaries generated for 486 CPUs. On today's most common CPU, the Pentium II, these binaries will ordinarily show degraded performance; in the best case, they will simply not show optimal performance. Thus, distribution vendors must be pressed to compile packages optimized for Pentium Pro CPUs by default. Ever wondered why Linux seems not as fast as it could be on modern machines? The secret is hidden within the binaries compiled with the -m486 flag!



