Re: [GRASS-user] r.neighbors velocity

Sören Gebbert Sat, 29 Jun 2013 14:13:03 -0700

More benchmark results on a Core i5-2410M (2 cores, 4 threads, 8 GB RAM):

gcc -Wall -fopenmp -lgomp -O3 main.c -o neighbor
time ./neighbor 5000 5000 23


export OMP_NUM_THREADS=1
real 0m27.052s
user 0m26.882s
sys 0m0.128s

export OMP_NUM_THREADS=2
real 0m15.579s
user 0m30.466s
sys 0m0.124s

export OMP_NUM_THREADS=4
real 0m10.454s
user 0m40.711s
sys 0m0.120s

gcc -Wall -fopenmp -lgomp -Ofast -march=core-avx-i main.c -o neighbor
time ./neighbor 5000 5000 23

export OMP_NUM_THREADS=1
real 0m17.090s
user 0m16.953s
sys 0m0.108s

export OMP_NUM_THREADS=2
real 0m9.957s
user 0m19.437s
sys 0m0.136s

export OMP_NUM_THREADS=4
real 0m7.476s
user 0m28.698s
sys 0m0.124s

opencc -Wall -mp -Ofast -march=auto main.c -o neighbor
time ./neighbor 5000 5000 23

export OMP_NUM_THREADS=1
real 0m19.095s
user 0m18.909s
sys 0m0.152s

export OMP_NUM_THREADS=2
real 0m11.203s
user 0m22.097s
sys 0m0.136s

export OMP_NUM_THREADS=4
real 0m8.648s
user 0m33.670s
sys 0m0.160s

Best regards
Soeren



2013/6/29 Sören Gebbert <soerengebb...@googlemail.com>

> Hi,
> I have implemented a "real" average neighborhood algorithm that runs in
> parallel using OpenMP. The source code and the benchmark shell script are
> attached.
>
> The neighbor program computes the moving-window average for a window of
> arbitrary size. The map size (rows x cols) and the size of the moving
> window (an odd number; the window is square, cols == rows) can be specified:
>
> ./neighbor rows cols mw_size
>
> IMHO the new program is better suited for comparing compilers and for
> measuring neighborhood operation performance.
>
> This is the benchmark on my 5-year-old AMD Phenom quad-core computer using
> 1, 2 and 4 threads:
>
> gcc -Wall -fopenmp -lgomp -Ofast main.c -o neighbor
> export OMP_NUM_THREADS=1
> time ./neighbor 5000 5000 23
> real 0m37.211s
> user 0m36.998s
> sys 0m0.196s
>
> export OMP_NUM_THREADS=2
> time ./neighbor 5000 5000 23
> real 0m19.907s
> user 0m38.890s
> sys 0m0.248s
>
> export OMP_NUM_THREADS=4
> time ./neighbor 5000 5000 23
> real 0m10.170s
> user 0m38.466s
> sys 0m0.192s
>
> Happy hacking, compiling and testing. :)
>
> Best regards
> Soeren
>
>
>
>
> 2013/6/29 Markus Metz <markus.metz.gisw...@gmail.com>
>
>> On Sat, Jun 29, 2013 at 1:26 PM, Hamish <hamis...@yahoo.com> wrote:
>> > Markus Metz wrote:
>> >
>> >> Some more results with Sören's test program on an Intel(R) Core(TM) i5
>> >> CPU M450 @ 2.40GHz (2 real cores, 4 fake cores) with gcc 4.7.2 and
>> >> clang 3.3
>> >>
>> >> gcc -O3
>> >> v is 2.09131e+13
>> >>
>> >> real    2m0.393s
>> >> user    1m57.610s
>> >> sys    0m0.003s
>> >>
>> >> gcc -Ofast
>> >> v is 2.09131e+13
>> >>
>> >> real    0m7.218s
>> >> user    0m7.018s
>> >> sys    0m0.017s
>> >
>> >
>> > Nice. One thing we need to remember, though, is that it's not entirely
>> > free: one thing -Ofast turns on is -ffast-math:
>> > """
>> >  This option is not turned on by any -O option besides -Ofast since it
>> >  can result in incorrect output for programs that depend on an exact
>> >  implementation of IEEE or ISO rules/specifications for math functions.
>> >  It may, however, yield faster code for programs that do not require the
>> >  guarantees of these specifications.
>> > """
>> >
>> > which may not be fit for our purposes.
>> >
>> >
>> > With the ifort compiler there is '-fp-model precise', which allows only
>> > optimizations that don't harm the results. Maybe gcc has something
>> > similar.
>>
>> In gcc, you can turn off -ffoo with -fno-foo; maybe this way you can
>> use -Ofast -fno-fast-math to preserve IEEE specifications.
>> >
>> > Glad to see -floop-parallelize-all in gcc 4.7; it will help us identify
>> > places to focus OpenMP work on.
>> >
>> >
>> > Hamish
>> >
>>
>
>

benchmark.sh
Description: Bourne shell script

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user
