RE: [maemo-developers] Speed test with vfp (floating point) on N800
> The whetstone results were a little surprising, in that the vfp code wasn't orders of magnitude faster than the softvfp code as expected, however this is probably (as a guess) caused by libm not being compiled for vfp.

[snip]

Duh, I was looking at the wrong results from http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt; my numbers match those in there even with vfp libm...

I compiled glibc last night and statically linked the whetstone vfp benchmarks with libm.a and got the following results:

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O0.out -c 2000
Loops: 2000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 66.7 MIPS
Loops: 2000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 50.0 MIPS

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O1.out -c 5000
Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O2.out -c 5000
Loops: 5000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 166.7 MIPS
Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.O3.out -c 1
Loops: 1, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 250.0 MIPS

Nokia-N800-51:/home/user/benchmark# ./whetstone.vfp.static_libm.Os.out -c 5000
Loops: 5000, Iterations: 1, Duration: 4 sec.
C Converted Double Precision Whetstones: 125.0 MIPS
Loops: 5000, Iterations: 1, Duration: 3 sec.
C Converted Double Precision Whetstones: 166.7 MIPS

So it looks like my compilation worked okay, and increased the MIPS count quite significantly when compared with the previous non-vfp-libm results in http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt.
The static binaries are here: http://people.bath.ac.uk/enpsgp/benchmarks

I note that the build process was a fiddle (mainly due to my not knowing the Debian way), and it took a couple of goes before my CFLAGS were used for the build. I also note that when I tried using -mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp, the build failed (I don't know why, I didn't look too hard; I don't have CPU transparency set up though), but removing -mcpu=arm1136j-s worked, and that's what I've been using.

I've also uploaded the vfp libm (libm.so.6 and libm.a) to the same location so you can benefit without re-compiling anything unless you really want to. Try setting LD_LIBRARY_PATH before running a binary like so:

$ LD_LIBRARY_PATH=/path/containing/vfp-libm/:$LD_LIBRARY_PATH /path/to/binary

E.g.

$ LD_LIBRARY_PATH=/home/user/:$LD_LIBRARY_PATH /home/user/benchmarks/whetstone.vfp.O0.out

It's easy to produce a wrapper script that takes a binary name and handles all of the LD_LIBRARY_PATH business, which saves you from typing it every time.

I'm not sure whether mplayer or any mplayer codecs require functions from libm (I understand that the wmv codec does); I wonder how much of an improvement this might make (and to other apps as well)?

Regards,
Simon
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] Speed test with vfp (floating point) on N800
2007/2/16, Simon Pickering [EMAIL PROTECTED]:

[libc/m with VFP]

> I note that the build process was a fiddle (mainly due to my not knowing the debian way), and it took a couple of goes before my CFLAGS were used for the build. I also note that when I tried using the following -mcpu=arm1136j-s -mfpu=vfp -mfloat-abi=softfp, the build failed (I don't know why - didn't look too hard. I don't have cpu transparency setup though), but removing -mcpu=arm1136j-s worked, and that's what I've been using.

Yeah, that's due to licensing issues. Qemu cannot emulate ARMv6 instructions, thus the build fails if it tries to run anything compiled with the correct arch.

Tuomas (who did the builds for me) noticed that there was a configure flag given for the libc build that most likely prevented our version from actually using the vfp (the fact that it still gave a performance boost for us most likely comes from the missing thumb mode instead of using hw floats...). So we definitely need to rerun the tests we had once a better build is done...

--
Kalle Vahlman, [EMAIL PROTECTED]
Powered by http://movial.fi
Interesting stuff at http://syslog.movial.fi
RE: [maemo-developers] Speed test with vfp (floating point) on N800
> I was very interested in the floating point copro in the N800 cpu and did some tests with a program which calculates the Mandelbrot set and outputs it via SDL. I put this program online on bomberman.garage.maemo.org. No installer, you have to run the benchmarks on xterm or via ssh (so did I).

For reference, I've also got some benchmark results. I compiled whetstone, flops and dhrystone (and paranoia) for the 770, N800 and my desktop PC (for comparison).

Results here:
http://people.bath.ac.uk/enpsgp/benchmarks/N800-fp-tests.txt
http://people.bath.ac.uk/enpsgp/benchmarks/770-fp-tests.txt
http://people.bath.ac.uk/enpsgp/benchmarks/PC-fp-tests.txt

Binaries here:
http://people.bath.ac.uk/enpsgp/benchmarks/

My apologies for the layout of the txt files, I'm in the process of tidying them up and putting up a results page.

Curiously the 770 performs better for dhrystone with optimisation than the N800, but these results were variable and the tests should probably be re-run with a very large number of iterations to reduce the variability.

As expected, flops was far faster for the vfp code run on the N800, and slightly faster for the softvfp code run on the N800 compared with the 770 (due to the processor speed difference).

The whetstone results were a little surprising, in that the vfp code wasn't orders of magnitude faster than the softvfp code as expected; however this is probably (as a guess) caused by libm not being compiled for vfp. The whetstone tests are also rather variable, again due to the relatively small number of iterations.

Paranoia states that both the softvfp and vfp systems produce correct IEEE 754 floating point results, though the results are different in some cases - see the N800 results file and look at the end for the results from the hacked machar.c (called a.vfp.out and a.softvfp.out).

Regards,
Simon
[maemo-developers] Speed test with vfp (floating point) on N800
Hi folks,

I was very interested in the floating point copro in the N800 CPU and did some tests with a program which calculates the Mandelbrot set and outputs it via SDL. I put this program online on bomberman.garage.maemo.org. No installer; you have to run the benchmarks in xterm or via ssh (so did I).

// Results of Mandelbrot set on N800 800x480 pixels, 100 iterations
// real -1 to 2
// imag -1.3 to 1.2
// dbl - uses doubles
// flt - uses floats
// vfp - compiled with -mfpu=vfp -mfloat-abi=softfp
// All binaries are compiled with -O2

// Results with full PixelDraw and SDL_Update every pixel
./mandel_armel_dbl.bin - Time: 178.284 seconds
./mandel_armel_dbl_vfp.bin - Time: 151.816 seconds
./mandel_armel_flt.bin - Time: 169.486 seconds
./mandel_armel_flt_vfp.bin - Time: 152.148 seconds

// Results with full PixelDraw and _NO_ SDL_Update.
./mandel_armel_dbl.bin - Time: 26.377 seconds
./mandel_armel_dbl_vfp.bin - Time: 1.808 seconds
./mandel_armel_flt.bin - Time: 19.813 seconds
./mandel_armel_flt_vfp.bin - Time: 1.709 seconds

// Results without any Drawing
./mandel_armel_dbl.bin - Time: 26.525 seconds
./mandel_armel_dbl_vfp.bin - Time: 1.672 seconds
./mandel_armel_flt.bin - Time: 19.647 seconds
./mandel_armel_flt_vfp.bin - Time: 1.601 seconds

// Results with full PixelDraw and SDL_Update only every column
./mandel_armel_dbl.bin - Time: 27.512 seconds
./mandel_armel_dbl_vfp.bin - Time: 2.447 seconds
./mandel_armel_flt.bin - Time: 20.689 seconds
./mandel_armel_flt_vfp.bin - Time: 2.451 seconds

What could that mean? First, SDL_Update is _very_ expensive. In the first try, I call SDL_Update for every pixel (SDL_Update(screen,x,y,1,1)) and it slows down execution about six times, the vfp build about 80 times. So I think there must be a kind of sync inside the SDL_Update function. If I update only every column, the speed loss is only about 80%.

Vfp float can give a speed increase of about a factor of 10 or more. And there is not much difference between vfp float and double.

-Klaus

--
Klaus Rotter * klaus at rotters dot de * www.rotters.de
Re: [maemo-developers] Speed test with vfp (floating point) on N800
On Sun, 11 Feb 2007 21:55:52 +0100 Klaus Rotter [EMAIL PROTECTED] wrote:

> Hi folks, I was very interested in the floating point copro in the N800 cpu and did some tests with a program which calculates the Mandelbrot set and outputs it via SDL. I put this program online on bomberman.garage.maemo.org. No installer, you have to run the benchmarks on xterm or via ssh (so did I).

It might interest you to know that I have an integer implementation of the Mandelbrot set. It's not quite finished yet, so it's only available through subversion from the N770Demos project.

> // Results of Mandelbrot set on N800 800x480 pixels, 100 iterations
> // real -1 to 2
> // imag -1.3 to 1.2
> // dbl - uses doubles
> // flt - uses floats
> // vfp - compiled with -mfpu=vfp -mfloat-abi=softfp
> // All binaries are compiled with -O2
>
> // Results with full PixelDraw and SDL_Update every pixel
> ./mandel_armel_dbl.bin - Time: 178.284 seconds
> ./mandel_armel_dbl_vfp.bin - Time: 151.816 seconds
> ./mandel_armel_flt.bin - Time: 169.486 seconds
> ./mandel_armel_flt_vfp.bin - Time: 152.148 seconds
>
> // Results with full PixelDraw and _NO_ SDL_Update.
> ./mandel_armel_dbl.bin - Time: 26.377 seconds
> ./mandel_armel_dbl_vfp.bin - Time: 1.808 seconds
> ./mandel_armel_flt.bin - Time: 19.813 seconds
> ./mandel_armel_flt_vfp.bin - Time: 1.709 seconds
>
> // Results without any Drawing
> ./mandel_armel_dbl.bin - Time: 26.525 seconds
> ./mandel_armel_dbl_vfp.bin - Time: 1.672 seconds
> ./mandel_armel_flt.bin - Time: 19.647 seconds
> ./mandel_armel_flt_vfp.bin - Time: 1.601 seconds
>
> // Results with full PixelDraw and SDL_Update only every column
> ./mandel_armel_dbl.bin - Time: 27.512 seconds
> ./mandel_armel_dbl_vfp.bin - Time: 2.447 seconds
> ./mandel_armel_flt.bin - Time: 20.689 seconds
> ./mandel_armel_flt_vfp.bin - Time: 2.451 seconds
>
> What could that mean? First, SDL_Update is _very_ expensive. In the first try, I call SDL_Update just for every Pixel (SDL_Update(screen,x,y,1,1)) and it slows down execution about six times, the vfp about 80 times. So I think there must be a kind of sync inside the SDL_Update function. If I update only every column, the speed loss is only about 80%.

When you do an update or flip, the surface data is transferred to video memory (a slow and painful operation). This is also true even if SDL tells you that you got a HW surface. Unfortunately there is no sync to the vertical blank... At least so I have been told.

> Vfp float can give a speed increase about factor 10 and more. And there is not much difference between vfp float and double.

Hmm, the N800 even has a floating point unit... and it's quite fast. I would say roughly as fast as a 770 using a long for the math. Only timed by counting aloud, and with a different setup: -1, 1 and -1, 1 with 256 iterations. (The current subversion code uses long long math for that extra resolution, and it costs quite some speed :)