John Gilmore writes:

>Anyone have an executable that runs under SGI Irix 6.5 I can use for
>double-checking and acquire via FTP or e-mail attachment?

The most recent version of my LL code, V2.6, has (Fortran-90) source code
and binaries for Alpha Unix and SGI Irix (the latter optimized for MIPS R10K).

Readme:       ftp://nigel.mae.cwru.edu/pub/mayer/README
Source:       ftp://nigel.mae.cwru.edu/pub/mayer/
Alpha binary: ftp://nigel.mae.cwru.edu/pub/mayer/bin/ALPHA_OSF/Mlucas_2.6X.exe.gz
              (and ftp://nigel.mae.cwru.edu/pub/mayer/bin/ALPHA_OSF/libshpf.so.gz
              if you lack an F90 compiler - this is the run-time library you need.)
SGI binary:   ftp://nigel.mae.cwru.edu/pub/mayer/bin/SGI/Mlucas_2.6X.exe.gz

There are two major changes from V2.5:

1) Non-power-of-2 runlengths are here. The code supports FFT runlengths of
   the form {1,3,5,7}*2^n, i.e. the same lengths as George Woltman's Prime95.

2) More efficient FFT: the code now does a decimation-in-frequency forward
   FFT and a decimation-in-time inverse FFT, thus avoiding any bit-reversal
   data reorderings.

The code allows exponents up to 20M, so it can be used for double-checking
or for current assignments.

NOTE: people upgrading from V2.5 will have to finish their current exponent
before switching to V2.6.

Here are some per-iteration timings for two slightly different MIPS R10K
setups, for exponents spanning the current double-checking and new testing
ranges:

                  FFT length / max. exponent (in millions)
Platform          96K    112K   128K   160K   192K   224K   256K   320K   384K
                  1.99M  2.30M  2.62M  3.27M  3.91M  4.56M  5.20M  6.46M  7.71M

195 MHz R10K      .087s  .104s  .120s  .159s  .200s  .244s  .287s  .399s  .511s
250 MHz R10K      .108s  .129s  .145s  .192s  .233s  .277s  .311s  .398s  .481s

195 MHz R10K: 32 KB D-cache, 4 MB L2 cache
              (one processor of a dual-processor Origin, run using runon 0)
250 MHz R10K: 32 KB D-cache, 1 MB L2 cache
              (a single-processor Octane)

Note the salutary effect of having a nice large L2 cache - the 195 MHz CPU
timings are better than the 250 MHz ones at every FFT length below 320K,
where the two are essentially tied (.399s vs. .398s).
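As a rough illustration of how to read the table above, here is a small Python sketch (not part of Mlucas). It checks whether a length has the form {1,3,5,7}*2^n, picks the smallest tabulated FFT length whose maximum exponent covers a given p, and estimates the wall-clock time of a full LL test as (p - 2) iterations times the per-iteration timing. The function names and the TABLE layout are made up for this example, "K" is assumed to mean 1024, and the maximum exponents are the rounded values from the table, so results near a boundary are only approximate.

```python
# Timing table copied from the post.
# Key: FFT length in K (assumed K = 1024).
# Value: (max exponent, sec/iter on 195 MHz R10K, sec/iter on 250 MHz R10K).
TABLE = {
    96:  (1_990_000, 0.087, 0.108),
    112: (2_300_000, 0.104, 0.129),
    128: (2_620_000, 0.120, 0.145),
    160: (3_270_000, 0.159, 0.192),
    192: (3_910_000, 0.200, 0.233),
    224: (4_560_000, 0.244, 0.277),
    256: (5_200_000, 0.287, 0.311),
    320: (6_460_000, 0.399, 0.398),
    384: (7_710_000, 0.511, 0.481),
}

# Hypothetical platform labels, mapped to the column index in TABLE values.
PLATFORM_COLUMN = {"195MHz": 1, "250MHz": 2}


def is_supported_length(n):
    """Return True if n == k * 2**m for some k in {1, 3, 5, 7} and m >= 0."""
    if n <= 0:
        return False
    while n % 2 == 0:  # strip all factors of two
        n //= 2
    return n in (1, 3, 5, 7)


def smallest_fft_length_k(p):
    """Smallest tabulated FFT length (in K) whose max exponent is >= p."""
    for fft_k in sorted(TABLE):
        if p <= TABLE[fft_k][0]:
            return fft_k
    raise ValueError("exponent is beyond the range covered by the table")


def estimated_days(p, platform):
    """Estimated days for a full LL test of 2**p - 1 (p - 2 iterations)."""
    fft_k = smallest_fft_length_k(p)
    seconds_per_iteration = TABLE[fft_k][PLATFORM_COLUMN[platform]]
    return (p - 2) * seconds_per_iteration / 86400.0


if __name__ == "__main__":
    p = 5_000_000
    print("FFT length: %dK" % smallest_fft_length_k(p))
    print("195 MHz R10K: %.1f days" % estimated_days(p, "195MHz"))
    print("250 MHz R10K: %.1f days" % estimated_days(p, "250MHz"))
```

For an exponent of 5,000,000, for example, this selects the 256K length and gives roughly 16.6 days on the 195 MHz machine, going by the table's numbers alone.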
NOTE TO SPARC USERS: I finally know why my code sucks on SPARC - a crappy
F90 compiler. Jason Papadopoulos was kind enough to look at the executable
produced by the SPARC F90 compiler. Here is his review:

"Your program is slow on the ultra because Sun's F90 compiler does a
miserable job. Even when you tell it to use the Sparc V9 instruction set,
to use 64-bit loads and stores, and to target the ultra explicitly it still
insists on using 32-bit loads and stores almost exclusively. It also
alternates loads and stores a lot, which on the Ultra causes nasty
bus-switching stalls. Finally, it has no idea about loading values in
advance; all your real*8 values are loaded (one real*4 at a time) and then
arithmetic is immediately performed on them. At least it mixes integer and
floating point nicely."

Sorry, SPARCers, you'll have to wait for the C version.

Happy hunting,
Ernst

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
