Hi:
I am glad to announce the release of a new version of Glucas,
version 2.8b.
This version brings large performance improvements for some
targets. Most of the improvement comes from prefetch hints
inserted in the code, part comes from better tuning of the Glucas
parameters, and a small part comes from other changes in the code.
The biggest improvement is for Alpha ev6 and ev67 (almost 30%). The
ChangeLog for this version is at the end of this mail.
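For readers who have not used prefetch hints, here is a minimal sketch of the idea in C. It is not the Glucas code: it assumes the GCC/Clang builtin __builtin_prefetch instead of the assembler hints used with Compaq C, and the names PREFETCH_RW, PREFETCH_AHEAD and butterfly_pass, as well as the prefetch distance, are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Prefetch for read+write with high temporal locality where the
   compiler supports it; otherwise the hint compiles to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_RW(addr) __builtin_prefetch((addr), 1, 3)
#else
#define PREFETCH_RW(addr) ((void)0)
#endif

/* Hypothetical prefetch distance, in elements. Real code would tune it. */
#define PREFETCH_AHEAD 16

/* One in-place pass of simple butterflies over two halves of x:
   (a, b) -> (a + b, a - b), with a = x[i] and b = x[half + i]. */
void butterfly_pass(double *x, size_t half)
{
    size_t i;
    double a, b;

    assert(x != NULL);
    for (i = 0; i < half; i++) {
        /* Ask for the data needed a few iterations from now. */
        if (i + PREFETCH_AHEAD < half) {
            PREFETCH_RW(&x[i + PREFETCH_AHEAD]);
            PREFETCH_RW(&x[half + i + PREFETCH_AHEAD]);
        }
        a = x[i];
        b = x[half + i];
        x[i] = a + b;
        x[half + i] = a - b;
    }
}
```

The hint never changes the result, only when the data arrives in cache, so whether it helps depends on the target and on the chosen distance.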
As a sample of the new Glucas performance, here are some timings, in
seconds per iteration, with the roundoff check ON/OFF:

                        FFT run length
Machine   256K          512K          1792K         4096K
(1)       0.038/0.036   0.078/0.072   0.336/0.318   0.806/0.763
(2)       0.041/0.039   0.084/0.079   0.355/0.328   0.867/0.835
(3)       0.045/0.042   0.106/0.101   0.459/0.439   1.126/1.167
(4)       0.069/0.067   0.174/0.171   0.809/0.783   2.168/2.161
(5)       0.255/0.246   0.547/0.524   2.261/2.183   5.275/5.094
(6)       0.170/0.162   0.370/0.354   1.554/1.485   3.588/3.496
(7)       0.199/0.192   0.497/0.483   2.214/2.152   5.637/5.519

(1) Alpha ev67, 667 MHz, 64 KB L1, 4 MB L2, Linux 2.4.
(2) AMD Athlon, 1200 MHz, 266 FSB, Linux 2.4.
(3) IA-64 Itanium, 800 MHz.
(4) Sparc UltraIIi, 450 MHz, 16 KB L1, 4 MB L2, Solaris 5.8.
(5) PowerMac G3, 300 MHz, 512 KB L2, Mac OS X.
(6) PowerMac G4, 400 MHz, 1 MB L2, Mac OS X.
(7) RS/6000, ppc604e 375 MHz, 1 MB L2, Linux 2.2.
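
To read the ON/OFF pairs as a cost of the roundoff check, one can compute the relative difference between the two timings. The small C helper below is only a sketch of that arithmetic; the function name roundoff_overhead_pct is hypothetical and is not part of Glucas.

```c
#include <assert.h>

/* Extra time spent with the roundoff check ON, as a percentage of the
   time with the check OFF. Both arguments are seconds per iteration. */
double roundoff_overhead_pct(double secs_on, double secs_off)
{
    assert(secs_off > 0.0);
    return 100.0 * (secs_on - secs_off) / secs_off;
}
```

For example, the Alpha ev67 pair at 256K, roundoff_overhead_pct(0.038, 0.036), gives roughly 5.6%. A negative value, as for the Itanium pair at 4096K, simply means the timing with the check ON was the smaller of the two.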
Thanks to everyone who has helped me improve Glucas. Special thanks to
Klaus Kastens, B.J. Beesley and Tom Cage for their work.
You can download the files at
https://sourceforge.net/project/showfiles.php?group_id=24518&release_id=47225
The Glucas home page is
http://glucas.sourceforge.net.
And this is the ChangeLog:
------------------------------------------------------------------------
v.2.8b 07/Aug/2001
-Great progress on prefetch has been made for Alphas. B.J. Beesley
found a way to insert assembler prefetch hints in Compaq C code. The
improvement is about 30% or more for ev6 and ev67.
-Other prefetch hints have been coded for other platforms. At the moment
there are no significant gains on platforms other than x86 and PowerPC.
-Good news for Mac users, on both classic Mac OS and Mac OS X:
a big performance improvement for the PowerPC family (10%: about 3% from
adding prefetch hints and 7% from tuning the parameters). Klaus Kastens and
Tom Cage did the job.
-Binaries for Itanium IA-64 have improved a lot, but this time the credit
goes to the GNU gcc team. With gcc 3.0, Glucas is now almost twice as fast.
-Long macros have been coded for radices 4 and 8. They can take more
advantage of prefetch and help less clever compilers. They are
activated with the -DY_LONG_MACROS compiler flag. The gain ranges from 5%
down to -1% (a slight loss in some cases).
-Some routines have been recoded to hide some dependency stalls and to
make them easier to vectorize with instructions such as AltiVec (G4+) or
SSE2. This is activated with -DY_VECTORIZE; -DY_KILL_BRANCHES must also be
set for it to have any effect. About 1% gain in most cases.
-Radix 4 can be recoded to use other vectorized macros that do two
radix-4 transforms in a single loop pass. This is sometimes useful;
it is equivalent to unrolling those loops. It is activated with
-DY_VECTORIZE2.
-When the prefetch hints are expensive, we can try the flag
-DY_VECTORIZE_EXPENSIVE. Radices 4 and 8 will be unrolled to save a
lot of prefetch calls. It is still an experimental feature.
-Selftest now prints all the active Glucas flags. This is useful
for tuning and development tasks.
-If option Alternative_output_flag == 2, the output is now sent
both to stdout (console) and to the file set with option Output_file.
-Compile time has been reduced a lot in most cases. The routines
are trivial when we do not need them. This will make the developer's
work easier.
-------------------------------------------------------------
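As an illustration of the -DY_VECTORIZE2 entry above, this is a rough C sketch of what "two radix-4 transforms in a single loop pass" means. It is not the Glucas macros: the cplx type and the functions radix4 and radix4_pairs are hypothetical, twiddle factors are left out, and the data layout (groups of four consecutive values) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx;

/* One forward radix-4 butterfly (a 4-point DFT) on p[0..3], in place. */
static void radix4(cplx *p)
{
    cplx t0, t1, t2, t3;

    t0.re = p[0].re + p[2].re;  t0.im = p[0].im + p[2].im;
    t1.re = p[0].re - p[2].re;  t1.im = p[0].im - p[2].im;
    t2.re = p[1].re + p[3].re;  t2.im = p[1].im + p[3].im;
    t3.re = p[1].re - p[3].re;  t3.im = p[1].im - p[3].im;

    p[0].re = t0.re + t2.re;  p[0].im = t0.im + t2.im;
    p[2].re = t0.re - t2.re;  p[2].im = t0.im - t2.im;
    /* y1 = t1 - i*t3 and y3 = t1 + i*t3 */
    p[1].re = t1.re + t3.im;  p[1].im = t1.im - t3.re;
    p[3].re = t1.re - t3.im;  p[3].im = t1.im + t3.re;
}

/* Transform n4 groups of four values, two groups per loop pass.
   This is the same work as the one-group loop unrolled by two. */
void radix4_pairs(cplx *x, size_t n4)
{
    size_t g = 0;

    assert(x != NULL);
    for (; g + 1 < n4; g += 2) {
        radix4(x + 4 * g);        /* first transform of the pair  */
        radix4(x + 4 * (g + 1));  /* second, independent transform */
    }
    if (g < n4)                   /* odd count: one group left over */
        radix4(x + 4 * g);
}
```

The two transforms in one pass do not depend on each other, which gives the compiler or a vector unit more independent work to schedule. That is presumably why it helps only sometimes.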
Regards.
Guillermo.
--
Guillermo Ballester Valor
Granada (Spain)