Re: Mersenne: Hyper-threading

2002-03-08 Thread Laurent . DESNOGUES


There are some performance results for the Intel Fortran OpenMP compiler
in the latest Intel Technology Journal:

 http://developer.intel.com/technology/itj/

http://developer.intel.com/technology/itj/2002/volume06issue01/art04_fortrancompiler/p08_perf_eval.htm

Regards,

   Laurent


_
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: TMS XP-15 DSP card

2002-02-14 Thread Laurent . DESNOGUES


> This XP-15 ( http://www.superdsp.com/ ) bad boy DSP card looks pretty
> impressive, how useful would it be in searching for Mersenne primes? At
> $14,000 a pop, how would it compare to a farm of P4's? According to the
> Presentation pdf, it is 42 times as fast as a 1.4GHz P4 at a 1024K FFT!

   The killer is that the vector processing unit only handles
32-bit IEEE FP numbers...
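
   To illustrate why single precision is a problem: FFT-based
multiplication, as used for Lucas-Lehmer testing, has to round
floating-point results back to exact integers, so the width of the
significand limits how many bits each FFT element can carry.  Below is
a minimal Python sketch of how little room a 24-bit significand leaves
compared with the 53 bits of a double.  The helper name to_float32 is
made up for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


# 2**24 + 1 fits exactly in a double (53-bit significand) ...
exact = float(2**24 + 1)
# ... but not in a single (24-bit significand), so it gets rounded.
rounded = to_float32(exact)

print(exact)    # 16777217.0
print(rounded)  # 16777216.0
```

Any integer result above 2^24 may already be off by one in single
precision, which leaves very few usable bits per FFT element.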


   Laurent





Re: Mersenne: Re: Mlucas 2.7x on SPARC

1999-09-27 Thread Laurent Desnogues

Laurent Desnogues wrote:
> 
[wrong list...]
>
> Please also note the SYSVABI_1.3:  I guess it means the executable
> can only be run on a SPARC station running Solaris 7!

   I hate cut & paste :(

% ldd Mlucas2.7x.exe
libfui.so.1 =>   (file not found)
libfai.so.1 =>   (file not found)
libfai2.so.1 =>  (file not found)
libfsumai.so.1 =>(file not found)
libfprodai.so.1 =>   (file not found)
libfminlai.so.1 =>   (file not found)
libfmaxlai.so.1 =>   (file not found)
libfminvai.so.1 =>   (file not found)
libfmaxvai.so.1 =>   (file not found)
libfsu.so.1 =>   (file not found)
libsunmath.so.1 =>   /opt/SUNWspro/lib/libsunmath.so.1
libm.so.1 => /usr/lib/libm.so.1
libc.so.1 => /usr/lib/libc.so.1
libc.so.1 (SYSVABI_1.3) =>   (version not found)
libdl.so.1 =>/usr/lib/libdl.so.1


Laurent
_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: Mlucas 2.7x on SPARC

1999-09-27 Thread Laurent Desnogues

[EMAIL PROTECTED] wrote:
> 
> The SPARC binary Alex Kruppa sent me of Mlucas 2.7x is at my ftp site:
> 
> ftp://209.133.33.182/pub/mayer/README
> ftp://209.133.33.182/pub/mayer/bin/SPARC/Mlucas_2.7x.exe.gz
[...] 
> I don't even know if the above runs on a machine that doesn't have an f90
> compiler installed (i.e. whether the code needs any f90-specific RTL files)-
> I don't think it does, but anyone with an f90-less SPARC can easily find out.

   No, it does not run there; the f90 runtime libraries are missing:

% ldd Mlucas2.7x.exe
libfui.so.1 =>   (file not found)
libfai.so.1 =>   (file not found)
libfai2.so.1 =>  (file not found)
libfsumai.so.1 =>(file not found)
libfprodai.so.1 =>   (file not found)
libfminlai.so.1 =>   (file not found)
libfmaxlai.so.1 =>   (file not found)
libfminvai.so.1 =>   (file not found)
libfmaxvai.so.1 =>   (file not found)
libfsu.so.1 =>   (file not found)
libsunmath.so.1 =>   /opt/SUNWspro/lib/libsunmath.so.1
libm.so.1 => /usr/lib/libm.so.1
libc.so.1 => /usr/lib/libc.so.1
libc.so.1  =>   (version not found)
libdl.so.1 =>/usr/lib/libdl.so.1

Please also note the SYSVABI_1.3:  I guess it means the executable
can only be run on a SPARC station running Solaris 7!


Laurent



Re: Mersenne: Mlucas 2.7x on SPARC

1999-09-27 Thread Laurent Desnogues

[EMAIL PROTECTED] wrote:
> 
> << I tried all sorts of compiler flags - unfortunately, the optimization
> flags are not linear, especially -O5 tends to produce much slower code
> than -O4 when combined with other flags. >>
> 
> I see similar weird slowdowns using the -O5 compile option on some (not
> all) Alpha CPUs (generally the older ones.) I wonder if both compilers
> are doing similar "optimizations" at -O5.

   The Sun C compiler -O5 flag should only be used together with
a profile that directs subsequent compilations...  The way to use
it is to compile with -xprofile=collect, run the program, then
recompile with -xprofile=use...

   Something similar might apply on Alpha.

   However some optimizations done at higher levels of
optimization might produce slower code.  An example is too much
loop unrolling producing code that does not fit well in L1
I-Cache.

> << I'm using -fast -libmil -xlibmopt -fnsyes now, which seems to give
> close to optimal performance. >>
[...]
> << I dont know whether this is also optimal on other types of UltraSparc, I
> only have Ultra60s for testing. >>

   This won't be optimal if you run under Solaris < 7!  Under
such OSes the -fast flag must be followed by a -xarch=v8plus in
order to use all 32 double FP regs of an UltraSPARC chip.  This
is not the case for a Solaris 7 system where -fast will use
-xarch=v9.

   Flags to also test are:

-xdepend
-xinline=all
-xsafe=mem (to be used with -xO5)

Good luck ;)


Laurent



Re: Mersenne: SPARC times

1999-09-15 Thread Laurent Desnogues

Bill Rea wrote:
> 
> This is using MacLucasUNIX compiled with the Sun workshop compilers.

   Which version and which flags did you use?  I guess you ran
your tests under Solaris 7, right?

> (1) If I used the -xarch=v9
> option on those systems that support this option, the resulting
> binary runs slower on the tests than using -xarch=v8plusa.

   That's very strange!  I have benchmarked some code using both
flags (with -fast prepended) under Solaris 7 (which is required
to run code compiled with -xarch=v9) and v9 helped;  however the
code was purely 64-bit integer.

   There's also a very interesting flag to test that's not
documented in the Sun cc doc:  -xinline=all.  I used it by mistake
but it did a great job with the code I was working on.

> (2) Differences in speeds on the tests supplied with the software
> don't translate into differences in speed when working on Mersenne
> numbers of the size above. Even with speed differences around 10%
> on the tests showed no discernable differences in practice.

   I don't know the tests supplied, but the difference might come
from the way time is measured.  I think the best way to check the
speed of a piece of code is to call getrusage for the process only
(RUSAGE_SELF) and to count only the user time.  This way I get
very consistent timings for the aforementioned code (BTW, the code
is ecdl by Robert Harley, used to crack ECC).
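
   A minimal sketch of that timing approach, written in Python, whose
resource module wraps getrusage.  The function names and the workload
are made up for this illustration; only getrusage, RUSAGE_SELF and the
user-time field come from the text above.

```python
import resource


def user_cpu_seconds() -> float:
    """User-mode CPU time consumed so far by this process only."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_utime


def time_user(func, *args):
    """Run func(*args); return (result, user CPU seconds it took)."""
    start = user_cpu_seconds()
    result = func(*args)
    return result, user_cpu_seconds() - start


def workload(n: int) -> int:
    # Hypothetical stand-in for the code being benchmarked.
    return sum(i * i for i in range(n))


result, elapsed = time_user(workload, 200_000)
print(f"user time: {elapsed:.3f} s")
```

Because system time and time spent in other processes are excluded,
the figure should vary less from run to run than wall-clock time on a
loaded machine.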


Laurent



Mersenne: Regarding: multiply/add

1999-08-25 Thread Laurent Desnogues

Jason Stratos Papadopoulos wrote:
> 
> There was a paper in (I believe) Appl. Math and Comp. from about two
> years ago that reformulated radix-2,3,4 and 5 FFT butterflies to use
> multiply-adds wherever possible. The savings can be impressive: a radix-2
> FFT butterfly has 4 multiplies and six adds, but this boils down to
> six multiply-adds. Can't the RS6000 do two multiply-adds per clock?

   I have found a paper titled "Implementation of Efficient FFT Algorithms
on Fused Multiply-Add Architectures" by E. Linzer and E. Feig, IEEE Trans.
on Signal Processing, vol. 41, no 1, Jan 93.  They call their method
scaled FFT.
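
   As an illustration of the counting in the quoted message (4
multiplies and 6 adds versus 6 multiply-adds for a radix-2 butterfly),
here is a minimal Python sketch.  It factors cos(theta) out of the
twiddle factor, which is one well-known way to reach six multiply-adds;
this is an assumption made for illustration and is not taken from the
paper.  The function names are made up, and Python itself does not fuse
the operations: each marked line only has the shape of one multiply-add.

```python
import cmath
import math


def butterfly_standard(a: complex, b: complex, w: complex):
    """Radix-2 butterfly (a + w*b, a - w*b): 4 multiplies, 6 adds."""
    tr = w.real * b.real - w.imag * b.imag   # 2 mul, 1 add
    ti = w.real * b.imag + w.imag * b.real   # 2 mul, 1 add
    return (complex(a.real + tr, a.imag + ti),   # 2 adds
            complex(a.real - tr, a.imag - ti))   # 2 adds


def butterfly_fma(a: complex, b: complex, c: float, t: float):
    """Same butterfly for w = c*(1 + i*t), c = cos(theta), t = tan(theta).

    Each arithmetic expression has the form x + y*z, i.e. one
    multiply-add, six in total.  Requires c != 0.
    """
    ur = b.real - t * b.imag          # multiply-add 1
    ui = b.imag + t * b.real          # multiply-add 2
    return (complex(a.real + c * ur,  # multiply-add 3
                    a.imag + c * ui), # multiply-add 4
            complex(a.real - c * ur,  # multiply-add 5
                    a.imag - c * ui)) # multiply-add 6


theta = 0.3
w = cmath.exp(1j * theta)             # twiddle factor cos + i*sin
a, b = complex(1.0, 2.0), complex(-0.5, 0.25)

std = butterfly_standard(a, b, w)
fma = butterfly_fma(a, b, math.cos(theta), math.tan(theta))
print(max(abs(x - y) for x, y in zip(std, fma)))  # tiny rounding difference
```

The tangent is precomputed along with the cosine, so the per-butterfly
cost is the six multiply-add lines only.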

   For radix 8 and size n = 8, the Cooley-Tukey FFT has 56 m/a ops and
their method has 52 m/a ops.  The number of m/a ops for radix 8 is:

- Cooley-Tukey:   7/2 n log_2(n) - 57/14 n + 32/7
- scaled FFT:    11/4 n log_2(n) - 57/28 n + 16/7.
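
   The two formulas above can be checked against the quoted counts with
a short Python sketch.  The function names are made up for this
illustration, and exact fractions are used so that no rounding creeps
in.

```python
from fractions import Fraction


def log2_int(n: int) -> int:
    """Exact log_2 of a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def cooley_tukey_ops(n: int) -> Fraction:
    """m/a ops, radix-8 Cooley-Tukey: 7/2 n log_2(n) - 57/14 n + 32/7."""
    return (Fraction(7, 2) * n * log2_int(n)
            - Fraction(57, 14) * n + Fraction(32, 7))


def scaled_fft_ops(n: int) -> Fraction:
    """m/a ops, radix-8 scaled FFT: 11/4 n log_2(n) - 57/28 n + 16/7."""
    return (Fraction(11, 4) * n * log2_int(n)
            - Fraction(57, 28) * n + Fraction(16, 7))


print(cooley_tukey_ops(8))  # 56
print(scaled_fft_ops(8))    # 52
```

For n = 8 this reproduces the 56 and 52 m/a ops quoted above.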

   Regarding the errors (root-mean-square error per point):
                  DFT              inverse DFT
- Cooley-Tukey:   5.3500 10^-15    3.3999 10^-15
- scaled FFT:     5.3441 10^-15    3.2386 10^-15

They say the improvement in error is probably due to the fact that an m/a
op is more precise than a mult followed by an add.

   Their experiments were done on an RS/6000 model 530.

   I can't comment on that article, since I'm not mathematically-gifted
enough to just read it without a lot of time with a pencil!


Laurent



Re: Mersenne: AMD K7 will

1998-10-27 Thread Laurent Desnogues

Yves & Lucile Gallot wrote:
> All modern processors use a dynamic execution architecture that blends
> out-of-order and speculative execution with hardware register renaming and
> branch prediction. These processors feature an in-order issue pipeline,
> which breaks processor macroinstructions into simple, micro-operations, and
> an out-of-order, superscalar processor core, which executes the micro-ops.
> The out-of-order core of the processor contains several pipelines to which
> integer, branch, floating-point and memory execution units are attached.
> Many instructions contain few micro-operations, so the processor is able to
> execute more than 1 instruction per cycle. Some instructions are complex and
> need several cycles to be executed (div, sqrt, cos, ...) but they are not
> often used.
> Is that processor a RISC or a CISC ? Neither a RISC nor a CISC! And it is
> really faster than a RISC or a CISC.

   Here are a few rules that I use to tell RISCs
from CISCs:

- orthogonality of the ISA
- no simultaneous mem access and computation
- regular instruction encoding
- single use of MMU.

I don't think that, for instance, the way instructions
are executed or their complexity matters much,
though these rules do give you some architectural
information.

   With these rules (we can add a few others), one can
say which class a processor falls into...  But this
thread does not belong on the mailing list!


Laurent