RE: [maemo-developers] Speed test with vfp (floating point) on N800

2007-02-16 Thread Simon Pickering
 

>> The whetstone results were a little surprising, in that the vfp code
>> wasn't orders of magnitude faster than the softvfp code as expected,
>> however this is probably (as a guess) caused by libm not being compiled
>> for vfp.

[snip]

> Duh, I was looking at the wrong results from
> http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt,
> my numbers match those in there even with vfp libm...

I compiled glibc last night and statically linked the whetstone vfp benchmarks
with libm.a and got the following results:

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O0.out -c 2000

Loops: 2000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 66.7 MIPS

Loops: 2000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 50.0 MIPS


Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O1.out -c 5000

Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS


Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O2.out -c 5000

Loops: 5000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 166.7 MIPS

Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS


Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O3.out -c 1

Loops: 1, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 250.0 MIPS


Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.Os.out -c 5000

Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS

Loops: 5000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 166.7 MIPS


So it looks like my compilation worked okay, and increased the MIPS count quite
significantly when compared with the previous non-vfp-libm results in
http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt.
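
For anyone checking the figures: most of them are consistent with the usual arithmetic of the C-converted Whetstone source, where one loop counts as 100,000 Whetstone instructions and the elapsed time is taken in whole seconds. The sketch below reproduces the reported numbers under that assumption. The helper name whetstone_mips is made up for illustration and is not part of the benchmark, and the -O3 line with its loop count of 1 does not fit this arithmetic as printed.

```python
def whetstone_mips(loops, duration_s, iterations=1):
    """MIPS figure, assuming one loop = 100,000 Whetstone instructions."""
    kips = 100.0 * loops * iterations / duration_s
    return kips / 1000.0

# (loops, whole-second duration) pairs taken from the runs above
runs = [(2000, 3), (2000, 4), (5000, 3), (5000, 4)]
for loops, secs in runs:
    print("%5d loops in %d s -> %.1f MIPS" % (loops, secs, whetstone_mips(loops, secs)))
```

With a timer that only resolves whole seconds, a run that lands on 3 s instead of 4 s moves the result from 125.0 to 166.7 MIPS, which would account for much of the spread between otherwise identical runs.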

The static binaries are here: http://people.bath.ac.uk/enpsgp/benchmarks

I note that the build process was a fiddle (mainly due to my not knowing the
Debian way), and it took a couple of goes before my CFLAGS were actually used
for the build. I also note that when I tried the flags -mcpu=arm1136j-s
-mfpu=vfp -mfloat-abi=softfp, the build failed (I don't know why, I didn't look
too hard; I don't have cpu transparency set up though). Removing
-mcpu=arm1136j-s worked, and that's what I've been using.

I've also uploaded the vfp libm (libm.so.6 and libm.a) to the same location so
you can benefit without re-compiling anything unless you really want. Try
setting LD_LIBRARY_PATH before running a binary like so:

$ LD_LIBRARY_PATH=/path/containing/vfp-libm/:$LD_LIBRARY_PATH /path/to/binary

E.g.

$ LD_LIBRARY_PATH=/home/user/:$LD_LIBRARY_PATH /home/user/benchmarks/whetstone.vfp.O0.out

It's easy to produce a wrapper script that you can call with a binary name so it
can handle all of the LD_LIBRARY_PATH business and save you from typing it every
time.
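
A minimal sketch of such a wrapper follows, written in Python purely for illustration (a few lines of shell would do the same job). It assumes a Python interpreter is available on the device and that the vfp libm was copied to /home/user; VFP_LIBM_DIR and env_with_libdir are hypothetical names, not anything shipped with the uploaded files.

```python
import os
import sys

# Assumed location of the vfp libm.so.6; adjust to wherever you copied it.
VFP_LIBM_DIR = "/home/user"

def env_with_libdir(libdir, env):
    """Return a copy of env with libdir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    old = new_env.get("LD_LIBRARY_PATH", "")
    new_env["LD_LIBRARY_PATH"] = libdir + (":" + old if old else "")
    return new_env

def main(argv):
    # argv[1] is the path to the binary, the rest are passed through to it.
    # execve replaces this process, so nothing after this line runs on success.
    os.execve(argv[1], argv[1:], env_with_libdir(VFP_LIBM_DIR, os.environ))

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved as, say, vfprun.py, it would be called as "python vfprun.py /home/user/benchmarks/whetstone.vfp.O0.out". Unlike the one-liner above, it does not leave a trailing colon in LD_LIBRARY_PATH when the variable was previously unset.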

I'm not sure whether mplayer or any of its codecs require functions from libm
(I understand that the wmv codec does). I wonder how much of an improvement this
might make there, and to other apps as well?

Regards,


Simon


___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers


Re: [maemo-developers] Speed test with vfp (floating point) on N800

2007-02-16 Thread Kalle Vahlman

2007/2/16, Simon Pickering [EMAIL PROTECTED]:
[libc/m with VFP]

> I note that the build process was a fiddle (mainly due to my not knowing the
> debian way), and it took a couple of goes before my CFLAGS were used for the
> build. I also note that when I tried using the following -mcpu=arm1136j-s
> -mfpu=vfp -mfloat-abi=softfp, the build failed (I don't know why - didn't look
> too hard. I don't have cpu transparency setup though), but removing
> -mcpu=arm1136j-s worked, and that's what I've been using.


Yeah, that's due to licensing issues. Qemu cannot emulate ARMv6
instructions, thus the build fails if it tries to run anything
compiled with the correct arch.

Tuomas (who did the builds for me) noticed that there was a configure
flag given for the libc build that most likely prevented our version
from actually using the vfp (the fact that it still gave a performance
boost for us most likely comes from the missing thumb mode rather than
from using hw floats...).

So we definitely need to rerun the tests we had once a better build is done...

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi


RE: [maemo-developers] Speed test with vfp (floating point) on N800

2007-02-12 Thread Simon Pickering
 
> I was very interested in the floating point coprocessor in the
> N800 cpu and did some tests with a program which calculates
> the Mandelbrot set and outputs it via SDL. I put this program
> online on bomberman.garage.maemo.org. No installer, you have
> to run the benchmarks on xterm or via ssh (so did I).

For reference, I've also got some benchmark results. I compiled whetstone, flops
and dhrystone (and paranoia) for the 770, N800 and my desktop PC (for
comparison).

Results here: 
http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt
http://people.bath.ac.uk/enpsgp/benchmarks/770-fp-tests.txt
http://people.bath.ac.uk/enpsgp/benchmarks/PC-fp-tests.txt

Binaries here:
http://people.bath.ac.uk/enpsgp/benchmarks/

My apologies for the layout of the txt files, I'm in the process of tidying them
up and putting up a results page.

Curiously the 770 performs better for dhrystone with optimisation than the N800,
but these results were variable, and the tests should probably be rerun with a
very large number of iterations to reduce the variability.

As expected, flops was far faster for the vfp code run on the N800 and slightly
faster for the softvfp code run on the N800 compared with the 770 (due to
processor speed difference).

The whetstone results were a little surprising, in that the vfp code wasn't
orders of magnitude faster than the softvfp code as expected, however this is
probably (as a guess) caused by libm not being compiled for vfp.

The whetstone tests are also rather variable, again due to the relatively small
number of iterations.

Paranoia states that both the softvfp and vfp systems produce correct IEEE 754
floating point results, though the results are different in some cases - see the
N800 results file and look at the end for the results from the hacked machar.c
(called a.vfp.out and a.softvfp.out).

Regards,


Simon



[maemo-developers] Speed test with vfp (floating point) on N800

2007-02-11 Thread Klaus Rotter

Hi folks,

I was very interested in the floating point coprocessor in the N800 cpu and
did some tests with a program which calculates the Mandelbrot set and
outputs it via SDL. I put this program online on
bomberman.garage.maemo.org. No installer, you have to run the benchmarks
on xterm or via ssh (so did I).
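
For readers who have not seen the test program: the inner loop of a benchmark like this is usually the escape-time iteration sketched below. This is not the code from bomberman.garage.maemo.org. The function names and the small grid are assumptions for illustration, and only the 100-iteration limit and the coordinate ranges are taken from the results header that follows.

```python
def escape_count(cr, ci, max_iter=100):
    """Iterate z = z*z + c from z = 0; return the steps taken before |z| > 2."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:          # |z|^2 > 4 means the point has escaped
            return n
        zr, zi = zr2 - zi2 + cr, 2.0 * zr * zi + ci
    return max_iter

def render(width, height, re_min, re_max, im_min, im_max, max_iter=100):
    """Return one list of escape counts per scan line."""
    rows = []
    for y in range(height):
        ci = im_min + (im_max - im_min) * y / (height - 1)
        rows.append([escape_count(re_min + (re_max - re_min) * x / (width - 1),
                                  ci, max_iter)
                     for x in range(width)])
    return rows

# Tiny grid so it finishes instantly; the benchmark itself used 800x480.
grid = render(40, 24, -1.0, 2.0, -1.3, 1.2)
```

Each step is a handful of floating-point multiplies and adds, so the run time of such a loop depends almost entirely on how fast those operations execute.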


// Results of Mandelbrot set on N800 800x480 pixels, 100 iterations
// real -1 to 2
// imag -1.3 to 1.2

// dbl - uses doubles
// flt - uses floats
// vfp - compiled with -mfpu=vfp -mfloat-abi=softfp

// All binaries are compiled with -O2

// Results with full PixelDraw and SDL_Update every pixel
./mandel_armel_dbl.bin - Time: 178.284 seconds
./mandel_armel_dbl_vfp.bin - Time: 151.816 seconds
./mandel_armel_flt.bin - Time: 169.486 seconds
./mandel_armel_flt_vfp.bin - Time: 152.148 seconds

// Results with full PixelDraw and _NO_ SDL_Update.
./mandel_armel_dbl.bin - Time: 26.377 seconds
./mandel_armel_dbl_vfp.bin - Time: 1.808 seconds
./mandel_armel_flt.bin - Time: 19.813 seconds
./mandel_armel_flt_vfp.bin - Time: 1.709 seconds

// Results without any Drawing
./mandel_armel_dbl.bin - Time: 26.525 seconds
./mandel_armel_dbl_vfp.bin - Time: 1.672 seconds
./mandel_armel_flt.bin - Time: 19.647 seconds
./mandel_armel_flt_vfp.bin - Time: 1.601 seconds

// Results with full PixelDraw and SDL_Update only every column
./mandel_armel_dbl.bin - Time: 27.512 seconds
./mandel_armel_dbl_vfp.bin - Time: 2.447 seconds
./mandel_armel_flt.bin - Time: 20.689 seconds
./mandel_armel_flt_vfp.bin - Time: 2.451 seconds

What could that mean? First, SDL_Update is _very_ expensive. In the
first run I call SDL_Update for every pixel
(SDL_Update(screen,x,y,1,1)), and it slows down execution about six times,
or about 80 times for the vfp builds. So I think there must be some kind of
sync inside the SDL_Update function. If I update only every column, the speed
loss is only about 80%.


Vfp can give a speed increase of about a factor of 10 or more. And with vfp
there is not much difference between float and double.
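
The factors quoted above can be checked directly against the tables. The sketch below only redoes that arithmetic on the posted timings; the dictionary and function names are made up for illustration.

```python
# Timings in seconds, copied from the tables above.
every_pixel = {"dbl": 178.284, "dbl_vfp": 151.816, "flt": 169.486, "flt_vfp": 152.148}
no_update = {"dbl": 26.377, "dbl_vfp": 1.808, "flt": 19.813, "flt_vfp": 1.709}

PIXELS = 800 * 480  # one SDL_Update call per pixel in the first run

def ratio(slow, fast):
    return slow / fast

for name in ("dbl", "dbl_vfp", "flt", "flt_vfp"):
    extra = every_pixel[name] - no_update[name]
    print("%-8s slowdown %5.1fx, extra %.1f s (%.2f ms per update)"
          % (name, ratio(every_pixel[name], no_update[name]),
             extra, 1000.0 * extra / PIXELS))

for kind in ("dbl", "flt"):
    print("%s: vfp speed-up %.1fx"
          % (kind, ratio(no_update[kind], no_update[kind + "_vfp"])))
```

For the double builds this gives a slowdown of roughly 6.8 times without vfp and roughly 84 times with it, and a vfp speed-up of about 14.6 times for double and 11.6 times for float. The absolute cost added by the per-pixel updates is close to 150 seconds in all four cases, so the much larger ratio for the vfp builds may simply reflect their shorter compute time rather than anything vfp-specific.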


-Klaus

--
Klaus Rotter * klaus at rotters dot de * www.rotters.de


Re: [maemo-developers] Speed test with vfp (floating point) on N800

2007-02-11 Thread Visti Andresen
On Sun, 11 Feb 2007 21:55:52 +0100
Klaus Rotter [EMAIL PROTECTED] wrote:

> Hi folks,
>
> I was very interested in the floating point coprocessor in the N800 cpu and
> did some tests with a program which calculates the Mandelbrot set and
> outputs it via SDL. I put this program online on
> bomberman.garage.maemo.org. No installer, you have to run the benchmarks
> on xterm or via ssh (so did I).

It might interest you to know that I have an integer implementation of the
Mandelbrot set.
It's not quite finished yet, so it's only available through Subversion from
the N770Demos project.

 
> What could that mean? First, SDL_Update is _very_ expensive. In the
> first run I call SDL_Update for every pixel
> (SDL_Update(screen,x,y,1,1)), and it slows down execution about six times,
> or about 80 times for the vfp builds. So I think there must be some kind of
> sync inside the SDL_Update function. If I update only every column, the speed
> loss is only about 80%.

When you do an update or flip, the surface data is transferred to video memory
(a slow and painful operation).
This is true even if SDL tells you that you got a HW surface.
Unfortunately there is no sync to the vertical blank...
At least so I have been told.

 
> Vfp float can give a speed increase about factor 10 and more. And there
> is not much difference between vfp float and double.

Hmm, so the N800 even has a floating point unit... and it's quite fast.
I would say roughly as fast as a 770 using a long for the math.
That was only timed by counting aloud, and with a different setup: -1 to 1 on
both axes, with 256 iterations.
(The current Subversion code uses long long math for that extra resolution, and
it costs quite some speed :)
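
An integer version of the Mandelbrot iteration is typically done in fixed point, along the lines of the sketch below. This is not the N770Demos code. The 16 fractional bits and all names are assumptions chosen for illustration, and Python integers stand in for the long and long long types mentioned above.

```python
FRAC_BITS = 16            # assumed Q16 format: real value = integer / 2**16
ONE = 1 << FRAC_BITS

def to_fixed(x):
    return int(round(x * ONE))

def fx_mul(a, b):
    # The raw product carries 2*FRAC_BITS fractional bits, so shift back down.
    # In C this intermediate is where a wider type would typically be needed.
    return (a * b) >> FRAC_BITS

def escape_count_fixed(cr, ci, max_iter=256):
    """Escape-time count using only integer arithmetic on fixed-point values."""
    zr = zi = 0
    four = 4 * ONE
    for n in range(max_iter):
        zr2, zi2 = fx_mul(zr, zr), fx_mul(zi, zi)
        if zr2 + zi2 > four:
            return n
        zr, zi = zr2 - zi2 + cr, 2 * fx_mul(zr, zi) + ci
    return max_iter

print(escape_count_fixed(to_fixed(0.0), to_fixed(0.0)))   # point inside the set
print(escape_count_fixed(to_fixed(1.0), to_fixed(1.0)))   # point that escapes quickly
```

More fractional bits give finer resolution when zooming in, but they also widen the intermediate product, which fits with the remark that the extra resolution costs speed.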

 