Re: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460

Vincent Diepeveen Wed, 17 Sep 2008 17:44:59 -0700

How does all this change when you use a PGO-optimized (profile-guided optimization) executable on both sides?


Vincent


On Sep 18, 2008, at 2:34 AM, Eric Thibodeau wrote:

Vincent Diepeveen wrote:
Nah,
I guess he's referring to the fact that it sometimes uses single-precision floating point instead of double precision to get something done, and that it tends to keep
intermediate values in registers.
That isn't necessarily a problem, but if I remember well, the floating-point state
could get wiped out when switching to SSE2.
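To make the single- vs double-precision point concrete, here is a minimal C sketch (the function names are hypothetical, not taken from vqOpenMP.c) that adds the same step repeatedly in float and in double. A float carries a 24-bit significand, so once the accumulator reaches 2^24 = 16,777,216, adding 1.0f rounds back to the same value and the count stalls, while the double keeps counting. This assumes float arithmetic is really evaluated in float (FLT_EVAL_METHOD == 0, as with SSE on x86-64); if an x87 extended-precision register holds the accumulator, the result can differ, which is the register effect described above.

```c
/* Add `step` to a float accumulator `n` times. Past 2^24, adding 1.0f
   no longer changes the value (assumes float is evaluated in float). */
float count_float(long n, float step)
{
    float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc += step;
    return acc;
}

/* Same loop with a double accumulator (53-bit significand). */
double count_double(long n, double step)
{
    double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc += step;
    return acc;
}
```

Calling both with n = 20,000,000 and step 1 gives 16777216 for the float version and 20000000 for the double version, when compiled without value-unsafe FP options.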

Sometimes you lose your FPU register set in that case.
The main problem is that so many dangerous optimizations are possible to speed up test sets, because floating point is inherently expensive to do in hardware.
Yet in general, the last generations of Intel compilers have improved a lot in this respect.
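One of those "dangerous" optimizations is regrouping floating-point additions (for example, to vectorize a reduction), which a value-unsafe FP mode permits even though IEEE addition is not associative. This tiny C sketch, with hypothetical function names, shows how much the grouping alone can matter:

```c
/* The same three-term sum, grouped two different ways. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0, because 1.0 is far below the spacing between adjacent doubles near 1e20 (16384), so b + c rounds back to -1e20. A reduction over millions of terms is exposed to the same effect at every step.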
Well, running the same code, here is the result discrepancy I got:
FLOPS:
my code has to do 7,975,847,125,000 floating-point operations (~8 x 10^12)... it takes 15 minutes on 8 x 2-core Opterons with 32 Gigs-o-RAM (thank you OpenMP ;)
The running times (ran it a _few_ times... but not the statistical minimum of 30):
   ICC -> runtime == 689.249  ; summed error == 1651.78
   GCC -> runtime == 1134.404 ; summed error == 0.883501
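As a rough cross-check of these figures, the sustained rates and the speedup work out as follows, assuming the runtimes are in seconds and the operation count above is exact:

```latex
\[
R_{\mathrm{icc}} = \frac{7.976\times10^{12}\ \mathrm{FLOP}}{689.249\ \mathrm{s}}
  \approx 1.16\times10^{10}\ \mathrm{FLOP/s} \approx 11.6\ \mathrm{GFLOP/s}
\]
\[
R_{\mathrm{gcc}} = \frac{7.976\times10^{12}\ \mathrm{FLOP}}{1134.404\ \mathrm{s}}
  \approx 7.03\times10^{9}\ \mathrm{FLOP/s} \approx 7.0\ \mathrm{GFLOP/s}
\]
\[
\frac{t_{\mathrm{gcc}}}{t_{\mathrm{icc}}} = \frac{1134.404}{689.249} \approx 1.65,
\qquad
1 - \frac{t_{\mathrm{icc}}}{t_{\mathrm{gcc}}} \approx 0.39
\]
```

So ICC took roughly 39% less time, while its summed error was more than three orders of magnitude larger (1651.78 vs. 0.883501).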

Compiler Flags:
   icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
   gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC
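If a large accumulated error sum is sensitive to rounding, one common mitigation (a swapped-in technique, not something described in this thread) is compensated, or Kahan, summation. The sketch below uses hypothetical function names. Note that gcc at -O3 -march=native, as in the flags above, does not enable -ffast-math; a compiler that is allowed to reassociate FP arithmetic (gcc with -ffast-math, and possibly icc in its default fp model) may legally simplify the compensation term to zero and turn this back into a naive sum.

```c
#include <stddef.h>

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Compensated (Kahan) summation: `c` carries the low-order bits lost in
   each addition and feeds them back into the next term. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;   /* apply correction from the previous step */
        double t = sum + y;    /* low-order bits of y may be lost here */
        c = (t - sum) - y;     /* algebraically zero; numerically the lost part */
        sum = t;
    }
    return sum;
}
```

For the input {1.0, followed by ten copies of 1e-16}, naive_sum returns exactly 1.0 (each 1e-16 is below half an ulp of 1.0 and is rounded away), while kahan_sum returns a value slightly above 1.0, close to the true sum 1 + 1e-15.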
No trickery, no smoke and mirrors ;) Just a _huge_ kick-ASS k-means parallelized with OpenMP (thank gawd, otherwise it takes hours to run) and a rather big database of 1.4 Gigs.
... So this is what I meant by floating-point errors. Yes, ICC cut the runtime by roughly 40% (and this is on an *Opteron*-based system, a Tyan VX50). But the running time wasn't what I was actually looking at; it was the precision skew, and that's where I fell off my chair.
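For readers who haven't parallelized k-means with OpenMP, here is a hypothetical sketch of the assignment step; it is not the actual vqOpenMP.c, whose source wasn't posted. Each point is assigned to its nearest centroid independently, and the summed distortion is combined with an OpenMP reduction. The reduction merges per-thread partial sums in an unspecified order, so the last bits of the total can vary with the thread count even under one compiler. Without -fopenmp the pragma is ignored and the loop runs serially.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical k-means assignment step.
   points:    n x d, row-major
   centroids: k x d, row-major
   label:     output, index of the nearest centroid for each point
   Returns the summed squared distance to the nearest centroids. */
double assign_points(const double *points, size_t n, size_t d,
                     const double *centroids, size_t k, int *label)
{
    double total = 0.0;
    /* Points are independent; per-thread partial sums are combined
       by the reduction in an unspecified order. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        double best = DBL_MAX;
        int best_j = 0;
        for (size_t j = 0; j < k; j++) {
            double dist = 0.0;
            for (size_t t = 0; t < d; t++) {
                double diff = points[i * d + t] - centroids[j * d + t];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                best_j = (int)j;
            }
        }
        label[i] = best_j;
        total += best;
    }
    return total;
}
```

Keeping the per-point distance loop in double and only reducing the final per-point minimum keeps the number of rounding-sensitive additions in the shared sum down to one per point.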
For the ones itching for a little more specs:

[EMAIL PROTECTED] ~ $ icc -V
Intel(R) C Compiler for applications running on Intel(R) 64, Version 10.1 Build 20080602
Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
FOR NON-COMMERCIAL USE ONLY

[EMAIL PROTECTED] ~ $ gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-cld --disable-libgcj --enable-languages=c,c++,treelang,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.1-r1 p1.1'
Thread model: posix
gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
Vincent

On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
Also, note that I've had issues with icc
generating really fast but inaccurate code (the fp model is not IEEE *by default*; I am sure _everyone_ knows this and I am stating the obvious
here).
All modern, high-performance compilers default that way. It's certainly the case that sometimes it goes more horribly wrong than necessary, but I wouldn't ding icc for this default. Compare results with IEEE mode.
-- greg
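A minimal sketch of that comparison, assuming the program has been built twice (once in the compiler's default fp mode and once in a value-safe mode; for icc, `-fp-model precise` is one such option) and both output arrays have been saved. The helper name `max_rel_diff` is hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: largest relative difference between results from
   the default (fast) fp mode and a reference built in IEEE mode. Where the
   reference is zero or subnormal, the absolute difference is used instead. */
double max_rel_diff(const double *fast, const double *ref, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double scale = abs_d(ref[i]);
        double diff  = abs_d(fast[i] - ref[i]);
        double rel   = scale >= DBL_MIN ? diff / scale : diff;
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

A relative rather than absolute measure is used so that large and small outputs (for example, a summed error of 1651.78 versus 0.88) are judged on the same scale.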


_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
