More benchmark results on a Core i5 2410M (2 cores, 4 threads), 8GB RAM:

gcc -Wall -fopenmp -lgomp -O3 main.c -o neighbor
time ./neighbor 5000 5000 23

export OMP_NUM_THREADS=1
real 0m27.052s
user 0m26.882s
sys 0m0.128s

export OMP_NUM_THREADS=2
real 0m15.579s
user 0m30.466s
sys 0m0.124s

export OMP_NUM_THREADS=4
real 0m10.454s
user 0m40.711s
sys 0m0.120s

gcc -Wall -fopenmp -lgomp -Ofast -march=core-avx-i main.c -o neighbor
time ./neighbor 5000 5000 23

export OMP_NUM_THREADS=1
real 0m17.090s
user 0m16.953s
sys 0m0.108s

export OMP_NUM_THREADS=2
real 0m9.957s
user 0m19.437s
sys 0m0.136s

export OMP_NUM_THREADS=4
real 0m7.476s
user 0m28.698s
sys 0m0.124s

opencc -Wall -mp -Ofast -march=auto main.c -o neighbor
time ./neighbor 5000 5000 23

export OMP_NUM_THREADS=1
real 0m19.095s
user 0m18.909s
sys 0m0.152s

export OMP_NUM_THREADS=2
real 0m11.203s
user 0m22.097s
sys 0m0.136s

export OMP_NUM_THREADS=4
real 0m8.648s
user 0m33.670s
sys 0m0.160s

Best regards
Soeren

2013/6/29 Sören Gebbert <soerengebb...@googlemail.com>

> Hi,
> I have implemented a "real" average neighborhood algorithm that runs in
> parallel using OpenMP. The source code and the benchmark shell script are
> attached.
>
> The neighbor program computes the average moving window of arbitrary size.
> The size of the map rows x cols and the size of the moving window (odd
> number cols==rows) can be specified.
>
> ./neighbor rows cols mw_size
>
> IMHO the new program is better for compiler comparison and neighborhood
> operation performance.
>
> This is the benchmark on my 5 year old AMD Phenom 4 core computer using 1,
> 2 and 4 threads:
>
> gcc -Wall -fopenmp -lgomp -Ofast main.c -o neighbor
> export OMP_NUM_THREADS=1
> time ./neighbor 5000 5000 23
> real 0m37.211s
> user 0m36.998s
> sys 0m0.196s
>
> export OMP_NUM_THREADS=2
> time ./neighbor 5000 5000 23
> real 0m19.907s
> user 0m38.890s
> sys 0m0.248s
>
> export OMP_NUM_THREADS=4
> time ./neighbor 5000 5000 23
> real 0m10.170s
> user 0m38.466s
> sys 0m0.192s
>
> Happy hacking, compiling and testing.
> :)
>
> Best regards
> Soeren
>
>
> 2013/6/29 Markus Metz <markus.metz.gisw...@gmail.com>
>
>> On Sat, Jun 29, 2013 at 1:26 PM, Hamish <hamis...@yahoo.com> wrote:
>> > Markus Metz wrote:
>> >
>> >> Some more results with Sören's test program on an Intel(R) Core(TM) i5
>> >> CPU M450 @ 2.40GHz (2 real cores, 4 fake cores) with gcc 4.7.2 and
>> >> clang 3.3
>> >>
>> >> gcc -O3
>> >> v is 2.09131e+13
>> >>
>> >> real 2m0.393s
>> >> user 1m57.610s
>> >> sys 0m0.003s
>> >>
>> >> gcc -Ofast
>> >> v is 2.09131e+13
>> >>
>> >> real 0m7.218s
>> >> user 0m7.018s
>> >> sys 0m0.017s
>> >
>> >
>> > Nice. One thing we need to remember, though, is that it's not entirely
>> > free: one thing -Ofast turns on is -ffast-math,
>> > """
>> > This option is not turned on by any -O option besides -Ofast since it can
>> > result in incorrect output for programs that depend on an exact
>> > implementation of IEEE or ISO rules/specifications for math functions. It
>> > may, however, yield faster code for programs that do not require the
>> > guarantees of these specifications.
>> > """
>> >
>> > which may not be fit for our purposes.
>> >
>> >
>> > With the ifort compiler there is '-fp-model precise' which allows only
>> > optimizations which don't harm the results. Maybe gcc has something
>> > similar.
>>
>> In gcc, you can turn off -ffoo with -fno-foo, so maybe this way you can
>> use -Ofast -fno-fast-math to preserve IEEE specifications.
>> >
>> > Glad to see -floop-parallelize-all in gcc 4.7, it will help us identify
>> > places to focus OpenMP work on.
>> >
>> >
>> > Hamish
>> >
>> >
>
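
For readers without the attachment: the quoted message describes the neighbor program as computing a moving-window average over a rows x cols map, in parallel with OpenMP. A minimal sketch of that kind of computation could look like the following. This is not the attached main.c. The function name average_neighborhood, the row-major array layout, and the border handling (window cells outside the map are skipped) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Moving-window average over a rows x cols map stored row-major.
 * mw_size is the odd edge length of the square window.
 * Hypothetical helper for illustration, not the attached main.c.
 * Assumption: window cells that fall outside the map are skipped,
 * so border cells are averaged over fewer values.
 */
void average_neighborhood(const double *in, double *out,
                          int rows, int cols, int mw_size)
{
    /* The window must have a center cell, so its size must be odd. */
    assert(mw_size > 0 && mw_size % 2 == 1);

    int half = mw_size / 2;

    /* Each output row depends only on the input map, so rows can be
     * distributed among threads without any synchronization. */
#pragma omp parallel for schedule(static)
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            double sum = 0.0;
            int count = 0;

            for (int i = r - half; i <= r + half; i++) {
                if (i < 0 || i >= rows)
                    continue;   /* window row outside the map */
                for (int j = c - half; j <= c + half; j++) {
                    if (j < 0 || j >= cols)
                        continue;   /* window column outside the map */
                    sum += in[(size_t)i * cols + j];
                    count++;
                }
            }
            /* count >= 1 because the center cell is always inside. */
            out[(size_t)r * cols + c] = sum / count;
        }
    }
}
```

Built with -fopenmp as in the gcc command lines above, the outer loop is split across OMP_NUM_THREADS threads. Without that flag the pragma is ignored and the same code runs serially. The inner summation is a floating-point reduction, which is the kind of loop whose result ordering -ffast-math is allowed to change.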
benchmark.sh
Description: Bourne shell script
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user