As I understand the gcc docs, using both -march and -mcpu is redundant, since -march=X already implies -mcpu=X. You should probably be running these tests with just -march or just -mcpu. There may be other issues as well, but this is as good a place to start as any.
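
For illustration, here is roughly what the athlon-xp flags from the quoted benchmark would look like with -march alone. This is only a sketch based on my reading of the gcc 3.x docs; I have not rerun the benchmark with it.

```shell
# Same flags as the quoted athlon-xp build, minus the -mcpu option.
# Assumption: -march=athlon-xp also selects the matching tuning (gcc 3.x docs).
export CFLAGS="-march=athlon-xp -O3 -finline-limit=10000 -ffast-math -fomit-frame-pointer"
export CXXFLAGS="$CFLAGS"
echo "CFLAGS=$CFLAGS"
```

Going the other way (just -mcpu=athlon-xp) would only tune scheduling and keep the default instruction set, so the two single-flag variants are not equivalent.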

On Wednesday 19 March 2003 03:12 pm, Austin wrote:

Here's a simple benchmark from Narfi.

Athlon XP 2100.
Asus A7N8X motherboard (NForce2)
512 MB memory, PC2700 2-2-2
Mandrake 9.0

# Compiling for athlon-xp
export CFLAGS="-march=athlon-xp -mcpu=athlon-xp -O3 -finline-limit=10000 -ffast-math -fomit-frame-pointer"
export CXXFLAGS="$CFLAGS"
make clean
./configure --disable-assert --enable-cmov-extension --enable-simd-accel
make

# Compiling for i586
export CFLAGS="-march=i586 -mcpu=i586 -O3"
export CXXFLAGS="$CFLAGS"
make clean
./configure --disable-assert --enable-cmov-extension --enable-simd-accel
make

# Compiling for i586, no MMX used
export CFLAGS="-march=i586 -mcpu=i586 -O3"
export CXXFLAGS="$CFLAGS"
make clean
./configure --disable-assert --disable-cmov-extension --disable-simd-accel
make

He ran these 3 versions of mpeg2enc one after the other, 3 times each,
and I picked the best time for each version. The file encoded was
only 127MB, so it fit into RAM.
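
The "run it 3 times, keep the best" procedure could be scripted along these lines. This is a hypothetical helper, not what was actually used for the numbers in this mail: best_of is a made-up name, it relies on GNU date's %N, and "sleep 0.1" is only a stand-in for the real mpeg2enc command line, which is not given here.

```shell
#!/bin/sh
# best_of N CMD...: run CMD N times and print the smallest wall-clock time
# in seconds. Hypothetical helper; needs GNU date for the %N (nanoseconds) field.
best_of() {
    runs=$1; shift
    best=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        # awk does the floating-point subtraction that plain sh cannot
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f", e - s }')
        # keep this run if it is the first one or faster than the best so far
        if [ -z "$best" ] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Placeholder command; substitute the real encoder invocation.
best_of 3 sleep 0.1
```

Taking the best of several runs, rather than the average, filters out runs that were slowed down by disk cache misses or other activity on the machine.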

The best times for each were:
athlon-xp:    12.16
i586:         13.24
i586, no mmx: 57.65

[ start quote ]

In short: There are >>some<< packages which >>need to<< have mmx enabled.
These numbers should show that beyond any doubt!

How we do it is a different question.

My opinion is that going all the way and allowing the user to recompile all
packages is a potential support nightmare that Mandrake cannot risk and
that we should do the minimal: Only do this for a small number of select
packages where one can prove there is a significant benefit.

If somebody recompiles OOo for athlon-xp and can show that the startup time
after a reboot can be cut by 50% when compared to the i586 version, I am
all for it. Otherwise, I don't think it's worth the risk.


[ end quote ]

So you see, 4.4 times faster by enabling MMX, and 4.7 times faster by also
enabling -ffast-math and the athlon-xp flags. Thus it looks like your
average app will see very little increase in performance by recompiling for
XP or P4 or whatever, but as it stands now, our video apps are almost
unusably slow.
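
For anyone who wants to check the ratios, they follow directly from the three best times listed above. A quick awk sketch (the function name speedups is just something made up for this example):

```shell
# Recompute the speedup factors from the best times quoted above.
speedups() {
    awk 'BEGIN {
        athlon = 12.16; i586 = 13.24; nommx = 57.65
        printf "i586 with MMX vs. i586 no MMX: %.2fx\n", nommx / i586
        printf "athlon-xp vs. i586 no MMX:     %.2fx\n", nommx / athlon
        printf "athlon-xp vs. i586 with MMX:   %.2fx\n", i586 / athlon
    }'
}

speedups
```

This prints 4.35x, 4.74x and 1.09x, so the athlon-xp specific flags account for only about 9% on top of what MMX already gives.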


I get about 3.5 times faster on a PIII. Honestly, I have never yet found a
package whose performance improves by even 10% just from changing the
compiler flags. Generally the gain from the current %optflags to the best
ones, even forcing -ffast-math and/or -mfpmath=sse -march=..., is
negligible, from 2-3% to 5%. Indeed, in the mjpegtools package the
performance increase is large, due to MMX usage; you can try it by
compiling it like this:



rpm -ba --with mmx mjpegtools.spec

Note how the performance difference is almost ZERO when removing -ffast-math from CFLAGS, and/or

--target i686-mandrake-linux

as an rpm build option.

Maybe it's worthwhile to provide a separate .i686 package for that. I wonder if there
are other packages like this that gain more than 50% in performance just by changing
some compiler option or flag.


