Mersenne Digest       Friday, September 17 1999       Volume 01 : Number 627




----------------------------------------------------------------------

Date: Thu, 16 Sep 1999 11:47:48 +1200 (NZST)
From: Bill Rea <[EMAIL PROTECTED]>
Subject: Re: Mersenne: SPARC times

Laurent,

> From [EMAIL PROTECTED] Wed Sep 15 20:37:35 1999
> 
> Bill Rea wrote:
> > 
> > This is using MacLucasUNIX compiled with the Sun workshop compilers.
> 
>    Which version and which flags did you use?  I guess you ran
> your tests under Solaris 7, right?

MacLucasUNIX was v6.20. The Ultra 5 and 10 were running Solaris 7; both
E450s were on Solaris 2.6. I tried both the Workshop 4 and Workshop 5
compilers. The options in the makefile which gave the fastest
times on the tests were:

OPT=-fast -xO4 -xtarget=ultra -xarch=v8plusa -xsafe=mem -xdepend \
-xparallel -xchip=ultra

>    That's very strange!  I have benchmarked some code using both
> flags (with -fast prepended) under Solaris 7 (which is required
> to run code compiled with -xarch=v9) and v9 helped;  however the
> code was purely 64-bit integer.
>
>    There's also a very interesting flag to test that's not
> documented in Sun cc doc:  -xinline=all.  I used it by error but
> it did a great job with the code I was working on.

I'll try this option and also put -xarch=v9 to the test on a 
new exponent when the current one finishes. The implication in the
WS4 documentation is that -xinline=all is automatically enabled 
at -xO4, but it's worth trying anyway.

>    I don't know the tests supplied, but the difference might result
> from the way time is counted.  I think the best way to check the
> speed of a piece of code is to use getrusage for the process only
> (RUSAGE_SELF) and to take only the user time into account.  This
> way I get very consistent timings for the aforementioned code
> (BTW, the code is ecdl by Robert Harley, used to crack ECC).

The time command reports clock time, user time, and system time; I was
comparing user time. I talked very briefly to a Sun engineer, and he
said it probably had more to do with keeping the processor fed.
Sometimes optimizations in the code don't translate into faster speeds
because the limiting factor is memory access time.  The CPUs with
bigger caches perform better than you would expect given their clock
speeds. However, he was just as surprised as I was that the v9 option
didn't produce faster code.
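Laurent's getrusage suggestion is easy to try. Here is a minimal sketch in Python, whose `resource` module wraps getrusage on Unix systems; the squaring loop is just a stand-in workload, not part of any real benchmark:

```python
import resource

def user_time():
    # RUSAGE_SELF restricts getrusage to this process only;
    # ru_utime is user-mode CPU time, which excludes system time
    # and time consumed by other processes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_utime

before = user_time()
total = sum(i * i for i in range(200_000))  # stand-in workload
elapsed = user_time() - before
print(f"user time for workload: {elapsed:.4f} s")
```

Because only this process's user time is counted, the reading stays consistent even when the machine is busy with other jobs.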

Bill Rea, Information Technology Services, University of Canterbury  \_ 
E-Mail b dot rea at its dot canterbury dot ac dot nz                 </   New 
Phone   64-3-364-2331, Fax     64-3-364-2332                        /)  Zealand 
Unix Systems Administrator                                         (/' 
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 15 Sep 1999 20:13:02 -0500
From: Ken Kriesel <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Iters between screen outputs

At 10:29 PM 1999/09/15 +0200, "Shot" <[EMAIL PROTECTED]> wrote:
>Hi all,
>
>I was wondering why the "Iterations between screen outputs" setting 
>was set to 100 by default, and I thought it was because each screen 
>output takes precious CPU time.
>
>But when I changed it from 100 i/o (usually around 0.430 sec/iter) to 
>10 i/o, nothing really slowed down - the time was still between 0.430 
>and 0.431 s/i.
>
>So I boldly went where no man has gone before ;) and changed it to 1 
>i/o... and the times still stayed between 0.429 and 0.431 s/i.
>
>The question is, how much of the CPU's power is consumed by screen 
>outputs?

Hardly anything these days, with fast Pentiums and smart video cards.  
But on 486s and 386s it was significant.  I have systems where it's
set to 5 iterations, and systems where it's set to 1000 iterations.
It all depends on system speed, exponent size, and desired update frequency.

Ken

Ken Kriesel, PE <[EMAIL PROTECTED]>

------------------------------

Date: Wed, 15 Sep 1999 22:23:58 -0400
From: Jeff Woods <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Iters between screen outputs

The actual ITERATION takes the same time, no matter how long the screen output 
takes.  The report on the screen is NOT time for the iteration PLUS time to 
display the report... it's the time for the iteration WITHOUT the screen display.

At 10:29 PM 9/15/99 +0200, you wrote:
>Hi all,
>
>I was wondering why the "Iterations between screen outputs" setting
>was set to 100 by default, and I thought it was because each screen
>output takes precious CPU time.
>
>But when I changed it from 100 i/o (usually around 0.430 sec/iter) to
>10 i/o, nothing really slowed down - the time was still between 0.430
>and 0.431 s/i.
>
>So I boldly went where no man has gone before ;) and changed it to 1
>i/o... and the times still stayed between 0.429 and 0.431 s/i.
>
>The question is, how much of the CPU's power is consumed by screen
>outputs?
>
>If it is really around 0.001 s/i (in 0.430 s/i neighbourhood), I'll
>leave mine at 10 i/o, so I can see it's alive. ;)
>
>Thanks for your time,
>-- Shot
>   __
>  c"? Shot - [EMAIL PROTECTED]  hobbies: Star Wars, Pterry, GIMPS, ASCII
>  `-' [EMAIL PROTECTED]  join the GIMPS @ http://www.mersenne.org
>  Science Explained (by Kids): Clouds just keep circling the earth
>  around and around. And around. There is not much else to do.

------------------------------

Date: Thu, 16 Sep 1999 04:38:00 +0200 (CEST)
From: Henrik Olsen <[EMAIL PROTECTED]>
Subject: Re: Mersenne: complaint

On Wed, 15 Sep 1999, Chris Jefferson wrote:
> > On 14 Sep 99, at 7:22, Henrik Olsen wrote:
> > 
> > > Personally I found installing and running it to be an extremely smooth
> > > operation, to the point where it takes less than 2 minutes to install on a
> > > new machine, including configuration, after which I just forget about it.
> >  
> > Yes - I found the same thing - applies to both un*x & windoze 
> > clients.
> 
> Although I don't mean to insult anyone, this is exactly what I mean. I can
> install the program on a computer in 3 or 4 minutes. The problem is those
> people who can't... Also, just one question: why is it I sometimes see
> people write un*x instead of unix, which I assume is what they mean?
Un*x, or the alternative *nix, is used because Unix is a registered
trademark, originally held by AT&T I believe, for a very specific OS.

Most people talking about un*x mean a group of OSes
characterised by having (more or less) the same libraries, tools, security
model and directory structure, which means the look and feel is the same
even though the innards can be completely different; only one of them
(modulo licensing agreements and other weirdness) can be called Unix.

I hope this clears it up a bit.
-- 
Henrik Olsen,  Dawn Solutions I/S       URL=http://www.iaeste.dk/~henrik/
Thomas Covenant: I am the savior of The Land.   Linden Avery: Can I help?
Thomas Covenant: Over my dead body.(dies)  (Linden Avery saves The Land.)
          The Second Chronicles of Thomas Covenant, Book-A-Minute version



------------------------------

Date: Thu, 16 Sep 1999 01:14:03 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: Apple? Ew.

<<I go to my mailbox and what do I find?>>

S. "There are lies, damned lies, and benchmarks" L.

------------------------------

Date: Thu, 16 Sep 1999 13:06:04 +0200
From: Harald Tveit Alvestrand <[EMAIL PROTECTED]>
Subject: Re: Mersenne: complaint

At 04:38 16.09.99 +0200, Henrik Olsen wrote:

>Un*x or the alternative *nix are used because Unix is a registered
>trademark, originally held by AT&T I believe, for a very specific OS.
>
>Most people when talking about un*x are talking about a group of OS's
>characterised by having (more or less) the same libraries, tools, security
>model and directory structure, which means the look and feel is the same,
>but the innards can be completely different, and it's only one of them
>(modulo licensing agreements and other weirdness) that can be called Unix.

Currently I believe the UNIX trademark is administered by The Open Group,
which will allow anyone to call their system "Unix" if it passes their 
(quite expensive) Unix certification test, which tests conformance to their 
published specifications. (Yes, the specs are freely downloadable - surprise!)

One version of Linux has paid the bill and passed the test, so at least one 
version of Linux is Unix.

Go figure....

                         Harald

--
Harald Tveit Alvestrand, Maxware, Norway
[EMAIL PROTECTED]


------------------------------

Date: Thu, 16 Sep 1999 11:28:38 -0400
From: "St. Dee" <[EMAIL PROTECTED]>
Subject: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

Does anyone else notice that their Celeron-based machines seem to take a
relatively bigger performance hit when moving from testing exponents at the
384K FFT size to the 448K FFT size (under V18.1, at least)?

I have a couple of non-overclocked Celeron 400 machines and, at the 384K
FFT size, they report timings nearly identical to those George sent out in
his email message of a couple of days ago, timings generated by his
PII-400.  However, when the machines test exponents with the 448K FFT size,
they are more than 20% slower!  Is the Celeron's relatively small L2 cache
finally causing it to lose ground to the PII/PIIIs as the FFT sizes get
larger and larger?

Kel

------------------------------

Date: Thu, 16 Sep 1999 11:46:38 -0400
From: "St. Dee" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

At 11:28 AM 9/16/1999 -0400, St. Dee wrote:
>Does anyone else notice that their Celeron based machines seem to take a
>relatively bigger performance hit when moving from testing exponents in the
>384K FFT size to the 448K FFT size (under V18.1, at least)?
>
>I have a couple of non-overclocked Celeron 400 machines and, at the 384K
>FFT size, they report timings nearly identical to those George sent out in
>his email message of a couple of days ago, timings generated by his
>PII-400.  However, when the machines test exponents with the 448K FFT size,
>they are more than 20% slower!  Is the Celeron's relatively small L2 cache
>finally causing it to lose ground to the PII/PIIIs as the FFT sizes get
>larger and larger?

Accccckkkkkk!  That was rather inexact wording. Let me try again... my
Celeron 400-based systems crunch exponents in the 384K FFT range at about
the same speed as George's PII-400 machine. However, at the 448K FFT size,
George's machine appears to be 20% or more faster than my Celeron 400s.
Could the 128K L2 cache of the Celeron chips (vs. the 512K L2 cache of the
PIIs) be the culprit?

Thanks,
Kel

------------------------------

Date: Thu, 16 Sep 1999 17:12:29 +0100
From: Nick Craig-Wood <[EMAIL PROTECTED]>
Subject: Re: Mersenne: complaint

On Wed, Sep 15, 1999 at 07:42:53PM +0100, Brian J. Beesley wrote:
> I'm going to be _very_ interested in how many people choose to run 
> $100,000 prize candidates using v19. There is an obvious balance 
> between the time it takes to complete a test & the enthusiasm of 
> people to participate, even if there is a substantial cash prize 
> riding on it.

If one had the data, it would be reasonably easy to calculate the
expectation (in $ per year) for a given exponent size & CPU.  You'd
need to know the iteration time for any given exponent on that CPU
and the probable distribution of Mersenne primes at that exponent
size.  Unfortunately I don't know these, otherwise I would have worked
it out ;-)

(Digression: though expectation is a useful statistical measure, in
real life, where very small probabilities are
multiplied by very large amounts, the results are not
practically useful.  E.g. the UK lottery: for your 1 pound stake your
expectation is 50p.  Better to go to the bookies, where for your pound
you get an expectation of 95p...)
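For what it's worth, Nick's expectation is a one-line formula once numbers are plugged in. Everything below (prize, odds per candidate, iteration time, exponent) is an illustrative assumption, not measured data:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def dollars_per_year(prize, p_prime_per_test, sec_per_iter, exponent):
    # A Lucas-Lehmer test takes roughly `exponent` iterations,
    # so one test occupies exponent * sec_per_iter seconds of CPU.
    tests_per_year = SECONDS_PER_YEAR / (exponent * sec_per_iter)
    # Expected winnings = prize * P(win per test) * tests per year.
    return prize * p_prime_per_test * tests_per_year

# Hypothetical numbers: $100,000 prize, a 1-in-250,000 chance that a
# given candidate is prime, 0.5 s/iter on an exponent in the
# 10-million-digit range (p around 33.2M). All assumptions.
print(f"${dollars_per_year(100_000, 1 / 250_000, 0.5, 33_219_281):.2f}/year")
```

With real iteration times and a real prime-density estimate in place of the guesses, the same function gives the per-CPU figure Nick describes.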

-- 
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/

------------------------------

Date: Thu, 16 Sep 1999 19:02:08 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Mersenne: un*x

"un*x", like "windoze", is a joke. It's pronounced "eunuchs" and 
derives, so far as I know, from a saying that "real men use Multics".

If you're under 35, you're not likely to recognise the reference to a 
long-obsolete (but advanced, for its day) operating system, used on 
Honeywell mainframes, which provided the conceptual model for the 
development of Unix - a "cut down" operating system originally 
designed to run on early-'70s minicomputers, which were (at best) 
about the same power as the _original_ IBM PC.

The point of the joke is lost if "*nix" is used instead.

Regards
Brian Beesley

------------------------------

Date: Thu, 16 Sep 1999 23:44:49 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: complaint

On Thu, Sep 16, 1999 at 01:06:04PM +0200, Harald Tveit Alvestrand wrote:
>One version of Linux has paid the bill and passed the test, so at least one 
>version of Linux is Unix.

If you wanted to be picky, you could always say that a version of _GNU_/Linux
has passed the test... The tests aren't for the kernel only, are they?

OK, I'm picky, I'm picky...

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Thu, 16 Sep 1999 18:23:34 -0400
From: Pierre Abbat <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: complaint

On Thu, 16 Sep 1999, Steinar H. Gunderson wrote:
>On Thu, Sep 16, 1999 at 01:06:04PM +0200, Harald Tveit Alvestrand wrote:
>>One version of Linux has paid the bill and passed the test, so at least one 
>>version of Linux is Unix.
>
>If you wanted to be picky, you could always say that a version of _GNU_/Linux
>has passed the test... The tests aren't for the kernel only, are they?

But "GNU" stands for "GNU's Not Unix!", so how can it be Unix if it isn't?

phma

------------------------------

Date: Thu, 16 Sep 1999 18:35:32 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

> Accccckkkkkk!  That was rather inexact wording.  Let me try again...my
> Celeron 400 based systems crunch exponents in the 384K FFT range at about
> the same speed as George's PII-400 machine.  However, at the 448K FFT size,
> George's machine appears to be 20% or more faster than my Celeron 400s.
> Could the 128K L2 cache of the Celeron chips (vs. the 512K L2 cache of the
> PIIs) be the culprit?

I think so.  Yves Gallot mentioned something similar on the Primes-L list
a while ago.
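A back-of-envelope calculation supports the cache theory. Assuming each FFT point is one 8-byte double (an assumption about the client's data layout, not a statement of its actual memory use), the working sets compare to the L2 caches like this:

```python
def working_set_kb(fft_size_k):
    # An "NK" FFT holds N * 1024 points; at 8 bytes per point the
    # size in KB is simply N * 8 (the two factors of 1024 cancel).
    return fft_size_k * 8

for k in (384, 448, 512):
    print(f"{k}K FFT: {working_set_kb(k)} KB "
          f"(vs. 128 KB Celeron L2, 512 KB PII L2)")
```

Either way the data dwarf both caches, so the question becomes how much of each pass's working set the cache can hold between passes - and the PII's L2 is four times larger.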

This brings us to an interesting point.  Should the PrimeNet server
start assigning Celerons sub-384K-FFT Mersenne exponents by default,
and save the larger ones for PIIs/PIIIs?

-Lucas


------------------------------

Date: Thu, 16 Sep 1999 22:20:30 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: complaint

> >>One version of Linux has paid the bill and passed the test, so at least one 
> >>version of Linux is Unix.
> >
> >If you wanted to be picky, you could always say that a version of _GNU_/Linux
> >has passed the test... The tests aren't for the kernel only, are they?
> 
> But "GNU" stands for "GNU's Not Unix!", so how can it be Unix if it isn't?

Wow!  Could this be used for some kind of proof of the logical inconsistency
of the GNU system?  What if M$ has been right all along?

Scary, mind-blowing stuff.

-Lucas

------------------------------

Date: Thu, 16 Sep 1999 21:01:20 -0700
From: Spike Jones <[EMAIL PROTECTED]>
Subject: Re: Mersenne: complaint

Nick Craig-Wood wrote: "If one had the data then it would be reasonably
easy to calculate the expectation (in $ per year) for a given exponent
size & CPU."

Nick, I worked this out a few months ago when the prizes were
announced.  Using a Pentium II/400, the mathematical expectation
for those hunting the 10-million-digit prize comes out to about 18
cents per year, as I recall.  Don't quit your day job...  {8-]  spike

------------------------------

Date: Fri, 17 Sep 1999 11:36:18 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

On 16 Sep 99, at 18:35, Lucas Wiman wrote:
> 
> This brings us to an interesting point.  Should the PrimeNet server
> start assigning Celerons sub-384K-FFT Mersenne exponents by default,
> and save the larger ones for PIIs/PIIIs?

No. Whatever the problem was (I _did_ manage to duplicate it on my 
Celeron 366 laptop using v18 - a 448K FFT was actually _slower_ than 
a 512K FFT on the same system!) v19 does not suffer from it. And the 
PrimeNet server doesn't recognise a Celeron unless v19 is running - 
in v18, all P6 architecture chips are identified as Pentium Pros.

Note, when you upgrade a v18 client to v19, you should check the CPU 
type in Options/CPU - I don't think it checks automatically, except 
on initial installation.


Regards
Brian Beesley

------------------------------

Date: Fri, 17 Sep 1999 17:53:39 +0300
From: Jukka Santala <[EMAIL PROTECTED]>
Subject: Mersenne: P-1

Well, see the subject line. In other words, is it possible to get a quick "P-1
factoring for dummies" rundown? At which point is P-1 factoring worth
the effort? Does it have any overlap with ECM? How should the bounds
be chosen for any given exponent, and will higher bounds always find any
factors smaller bounds would have, or is there an advantage to running
lower bounds? As I understand it, every run with the _same_ bounds would find
the same factors?

 -Donwulff


------------------------------

Date: Fri, 17 Sep 1999 10:58:20 -0400 (EDT)
From: "St. Dee" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Celerons vs. Pentium II/III at large FFT lengths?

On Fri, 17 Sep 1999, Brian J. Beesley wrote:
> On 16 Sep 99, at 18:35, Lucas Wiman wrote:
> > 
> > This brings us to an interesting point.  Should the PrimeNet server
> > start assigning Celerons sub-384K-FFT Mersenne exponents by default,
> > and save the larger ones for PIIs/PIIIs?
> 
> No. Whatever the problem was (I _did_ manage to duplicate it on my 
> Celeron 366 laptop using v18 - a 448K FFT was actually _slower_ than 
> a 512K FFT on the same system!) v19 does not suffer from it. And the 
> PrimeNet server doesn't recognise a Celeron unless v19 is running - 
> in v18, all P6 architecture chips are identified as Pentium Pros.

So I guess I should have my Celery's test larger exponents, at least until
I upgrade to V19...;-)

Kel


------------------------------

Date: Fri, 17 Sep 1999 19:57:06 +0300
From: Jukka Santala <[EMAIL PROTECTED]>
Subject: Mersenne: v19 DNS(?) crash...

v19 is giving me trouble now. When I try to start it up, it says
"Contacting PrimeNet Server", hangs there for a while and then
crashes with Application Error "The exception unknown software exception
(0x000006ba) occurred in the application at location 0x77e1fc45". I don't
need to go through the debugger to guess this is because the university
network I'm using seems to be cut off from the USA at the moment, i.e.
DNS queries, among other things, fail.

 -Donwulff


------------------------------

Date: Fri, 17 Sep 1999 23:38:10 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Timing(?) errors

All,

I'm running Prime95 v18 on a Dell XPS P60 (no, I don't want to hear `switch
to factoring' or anything -- it's running double-checks). When activity
happens (in this case a Word document being opened and looked at), sometimes
the log shows stuff like:

[lots of 0.814 sec iteration times]
Iteration: 3077000 / 3644xxx. Clocks: [48.8 million] = 0.814 sec.
Iteration: 3078000 / 3644xxx. Clocks: [56.1 million] = 0.936 sec. <-- Word
Iteration: 3079000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.
Iteration: 3080000 / 3644xxx. Clocks: [47.0 million] = 0.784 sec.
Iteration: 3081000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.

Anybody have an idea why it is actually faster now?
My best guess would be some kind of timing mistake, but it still doesn't
sound right...

(The computer has been left totally untouched after the Word usage. The Word
usage shows itself in the 0.936 timing. The machine should have enough mem --
40 MB. Running Win95, no evil CPU-hogging programs.)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Fri, 17 Sep 1999 17:37:38 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: G4: real or hype?

Jonathan Zylstra wrote about the new Apple G4 processor (see quotes below).
The G4 is really a hybrid combination of the basic PowerPC CPU with a 128-bit
vector unit called AltiVec.  Richard Crandall (who consults for Apple)
and Jason Klivington (of Apple) used a beta version of the G4 to do some of
the all-integer verification of our (Crandall, Mayer, Papadopoulos) F24
(24th Fermat number) project. Note that we didn't get a chance to test the
floating-point capabilities of the G4 - both main "wavefront" runs of the
Pépin test of F24 were on other hardware (a 250MHz MIPS R10000 and a
167MHz SPARC Ultra-1); both achieved a length-1-million FFT-based squaring
time of under 1 second, whereas the all-integer version needed about 5 times
as long on a 400 MHz G4 - one can't conclude anything from that, since it's
like comparing apples and oranges.

Based on the technical specs and the relative all-integer timings on the
G4 vs. the Pentium (with similar amounts of code optimization, the G4 looks
to be somewhat faster than a Pentium, but by less than a factor of 2), it
looks like a good processor, but all the ballyhoo needs to be put into
perspective.

Before I continue, a disclaimer: I neither work nor consult for any computer
manufacturer. My taste in processors is simple: the faster, the better.

<<<"sustained performance of 1 gigaflop." (which makes the G4 a 
'supercomputer')
"theoretical performance of 4 gigaflops"
"It is a 128 bit processor, and can perform 4 (sometimes 8) 32-bit
floating pt. calculations per cycle."
"It is 3 times faster then the PIII 600Mhz">>>

While all of this may be technically true, it's also very misleading:

- The 1 Gflop figure comes from the fact that the G4 can in theory dispatch 2
floating-point operations (1 mul, 1 add) per cycle, so at 500 MHz that equals
1 Gflop. I defy you to get that kind of performance out of, say, code for a
large FFT - with very careful coding, you may get half that.

The above FP capabilities are not qualitatively different from those of most
other current high-end processors (Pentium, SPARC, MIPS, Alpha) and are
slightly less than the AMD K7's, which can dispatch up to 3 FP ops per cycle
(2 adds, 1 mul). Thus, by the same reasoning, AMD could legitimately claim
2 Gflops for a 667 MHz K7. Never mind that one will only see that kind of
performance for perfectly balanced, perfectly pipelined code whose data
never leave the FP registers.
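The peak-rate arithmetic above is just flops-per-cycle times clock rate; a two-line sketch makes both vendors' claims explicit:

```python
def peak_gflops(mhz, flops_per_cycle):
    # Theoretical peak assumes every single cycle dispatches the
    # maximum number of FP ops -- something real code rarely sustains.
    return mhz * flops_per_cycle / 1000.0

print(peak_gflops(500, 2))  # G4 at 500 MHz, 1 mul + 1 add per cycle
print(peak_gflops(667, 3))  # K7 at 667 MHz, 2 adds + 1 mul per cycle
```

The same formula yields the marketing numbers for any chip, which is exactly why peak figures say so little about delivered performance.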

- 128-bit: also misleading. The AltiVec vector unit can do some fairly
nice operations in which a 128-bit integer operand is treated as a vector
of 4, 8, or 16 operands of 32, 16, and 8 bits, respectively, and can do
a nice variety of 4x32-bit vector FP operations, but in my opinion that
is far from constituting a true 128-bit CPU. The above enhancements are
qualitatively similar to the Pentium MMX enhancements - useful for things
like multimedia, but nearly useless when one is doing serious math with
64-bit operands. I say "nearly" since one can, e.g., build various 64-bit
integer operations out of ones on shorter operands, but it's a pain, say,
if one wants a 64x64 ==> 128-bit integer multiply, which is potentially
more useful for LL testing than parallel 16x16 ==> 32-bit multiplies. The
AltiVec 4x32-bit floats, like the similar MMX instructions, are not very
useful for FFT-based large-integer arithmetic - not enough precision.

- "It is 3 times faster then the PIII 600Mhz." Based on what? Give us some
SPECint or SPECfp figures that support this before making such claims. The
only datum provided (below) indicates a speedup of 1.45x, far less than 3x.

<<<On the comparison table between the PIII and G4, they show this:

Test:         PIII Clock Cycles   G4 Clock Cycles   G4 Performance           (Adjusted for MHz)
256 Pt. FFT   6.94                4                 1.74x better than PIII   1.45x faster than PIII>>>

In what way is a tiny 256-point FFT a good indicator of overall system
performance?
Was this single precision (I suspect so) or double? Was it specially coded
to use the 4x32-bit FP ops supported by the G4? (I suspect so.) If so, was
similar coding attempted to use the PIII MMX instructions? (I suspect not.)
What numbers emerge when the figures are adjusted not just for MHz, but also
for price?

I've also seen some of Apple's ads in the San Jose Mercury News - the phrase
"The fastest desktop computer on earth" was used. Apparently this depends on
one's definition of both "fastest" and of "desktop computer" (perhaps even
of "Earth." :) I can buy a desktop Alpha 21264 which probably blows the G4
out of the water, performance-wise (and there we HAVE some SPEC numbers to
guide us), on 95% of generic compiled code, say in Numerical Recipes or the
SPEC benchmark suites. So again, folks at Apple, please back your
claims up with performance data based on real-world code, the kind most
programmers really write, that tests more of the instruction set (not just
the special goodies you included) as well as the entire memory system (not
just the registers and L1 cache).

Now, if the G4 demonstrates a SPEC FP of over 50 at 500MHz, and a SPEC INT
of over 30, then the "fastest on earth" claim will be more believable.

None of which is to say it's not a darn good processor - but let's keep the
descriptive language within the realm of the reasonable, shall we?

Some related URLs (thanks to Jason Klivington of Apple for these):

A technical discussion of the instruction set:
http://www.motorola.com/SPS/PowerPC/teksupport/teklibrary/altivec_pem.pdf

A description of the C implementation of the AltiVec instructions:
http://developer.apple.com/hardware/altivec/pdf/altivec_support.pdf

Motorola's general AltiVec site:
http://www.mot.com/SPS/PowerPC/AltiVec

Have fun,
-Ernst


------------------------------

Date: Fri, 17 Sep 1999 19:59:13 -0400
From: "Chris Nash" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P-1

Hi there

> Well, see the topic. In other words, is it possible to get a quick "P-1
> factoring for dummies" rundown?

Sure, let's give it a try. Suppose we do some calculations modulo some
number N. In effect we're actually doing all the calculations modulo all the
prime factors of N, but of course we don't know what those prime factors
are. But if we could find a calculation for which one of the prime factors
would 'disappear', it would help us find that factor.

Fermat's little theorem tells us that a^(p-1)=1 mod p for all primes p
(not dividing a). In fact, for each a there is a smallest number x>0 such
that a^x=1 mod p (the order of a modulo p). All we usually know about x is
that it divides p-1. The idea behind P-1 factoring is this: if we compute

R=a^(some big very composite number)-1 mod N

if our 'big composite number' is divisible by x, what is left will be
divisible by some unknown factor p of N, and we could find p by examining
gcd(R,N).

Usually the big composite number is a product of powers of many small
primes, so it has many factors and there is a good chance the unknown x
(a divisor of p-1) divides it.
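Chris's recipe translates almost line for line into code. Below is a simplified stage-1 sketch (an illustration only, not how any real client implements it); the small Mersenne number 2^37-1 and the bound B1=40 are chosen just so the run is instant, and base 3 is used because, as noted below, base 2 is useless for Mersennes:

```python
from math import gcd

def small_primes(limit):
    """Primes <= limit by trial division (fine for tiny bounds)."""
    primes = []
    for n in range(2, limit + 1):
        if all(n % p for p in primes):
            primes.append(n)
    return primes

def p_minus_1_stage1(N, B1, a=3):
    """Return gcd(a^E - 1, N), where E is the product of all prime
    powers q^k <= B1 -- the 'big very composite number' above."""
    for q in small_primes(B1):
        qk = q
        while qk * q <= B1:   # raise q to the largest power still <= B1
            qk *= q
        a = pow(a, qk, N)     # accumulate a^E mod N incrementally
    return gcd(a - 1, N)

# 2^37 - 1 = 223 * 616318177, and 223 - 1 = 2 * 3 * 37 is 37-smooth,
# so B1 = 40 should expose the factor 223.
N = 2**37 - 1
g = p_minus_1_stage1(N, 40)
print(g)  # 223, unless the second factor's order happens to be caught too
```

Note also that raising B1 only adds more prime powers to E, so any factor a lower bound catches, a higher bound catches as well - which answers one of Jukka's questions about bounds.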

> At which point is P-1 factoring worth the effort?

Probably as soon as your factoring attempts have exceeded what the machine
is comfortable with. If you've reached 2^32, try P-1 for a little while.
Trial-factoring will be slower if you carried on trying factors, also your
probability of success with trial-factoring large numbers is extremely low.
(p divides random N with probability 1/p).

Of course P-1 may fail. You may have to go a very long way before the
unknown x divides your big composite number - what if x itself has a large
prime factor? P-1 would not find it until your P-1 loop reached that prime
(in other words, your limit has to be bigger than the biggest prime factor
of the unknown x). However there are factors that P-1 finds 'easily', and
even a failed P-1 run gives you a little more information about the
number, which might help if you try some more trial factoring. (You know,
for instance, that for any remaining factor p, p-1 must have some prime,
or prime-power, factor above the limit.)
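A concrete (made-up) illustration of that failure mode: take the factor
p = 59 with base a = 3. Since 59 - 1 = 58 = 2 * 29, the order of 3 mod 59
can only be 1, 2, 29 or 58, and it turns out to be the large prime 29
itself, so no P-1 bound below 29 can ever remove this factor:

```python
def multiplicative_order(a, p):
    """Smallest x > 0 with a^x = 1 (mod p), found by brute force."""
    x, k = a % p, 1
    while x != 1:
        x = x * a % p
        k += 1
    return k

# The order of 3 mod 59 is the large prime factor of 58 itself:
print(multiplicative_order(3, 59))   # 29
```

So a P-1 run on any N divisible by 59, with this base, needs its bound to
reach 29 before the gcd step can pick up the factor.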

> Does it have any overlap with ECM?

The theory's very similar. Both algorithms attempt to find a point where one
of the unknown prime factors 'disappears' from the calculation. However ECM
has better odds. In P-1 attempts, the unknown x is fixed. You can't do
anything about it, and even if you try using a different base a, you're very
likely going to have a similar x with the same problems (a large factor). In
ECM, choosing a different curve could give an equivalent 'x' value that
varies a lot. One of those 'x' values may disappear quite quickly from the
calculations. (but again, with large values it could be a long time before
that happens).

Of course the steps involved in ECM factoring are a little more complex than
P-1...

> How should the bounds
> be chosen for any given exponent, and will higher bounds always find any
> factors smaller bounds would have, or is there an advantage to running
> lower bounds? As I understand, every run with _same_ bounds would find
> the same factors?

In theory you can change the base and the bounds. Changing the base often
has little or no effect, unless you are incredibly lucky (of course,
'obvious' bases related to the number are likely to be of little use - don't
use base a=2 for Mersennes!). Changing the bounds though can make the
difference between finding a factor, and not finding one. We may fail when
we test

a^(some small composite number)-1

but we may succeed when we test

a^((some small composite number)*(some larger composite number))-1

By writing it like this you can also see that the larger bound succeeds
whenever the smaller bound does. (The top line always divides the bottom
line, so any factor found from the top also appears in the bottom.) Of
course larger bounds take longer to calculate, and there is also the
possibility that larger bounds find "more than one" factor in one run, in
which case the gcd is N itself and tells you nothing. Ideally you check
the gcd periodically through the calculation to see if you have already
met a factor, but that checking takes time too.
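Both points can be checked numerically. The modulus N = 1829 = 31 * 59,
base 3, and exponent 60 below are a toy example of my own, not anything
from GIMPS:

```python
from math import gcd

a, N = 3, 1829       # N = 31 * 59, a hypothetical toy modulus
E_small = 60         # the exponent a small bound might build

# The smaller exponent's power divides the larger one's, so any
# factor the small bound exposes, a larger bound exposes too:
assert (pow(a, E_small * 7) - 1) % (pow(a, E_small) - 1) == 0

# One factor of N disappears at the small exponent...
print(gcd(pow(a, E_small, N) - 1, N))        # 31

# ...but push the exponent far enough and *both* factors disappear
# at once: the gcd collapses to N itself and reveals nothing new.
print(gcd(pow(a, E_small * 29, N) - 1, N))   # 1829 == N
```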

The overriding "decision factor" is based purely on the time you're willing
to spend. Factoring being considerably harder than primality testing, you
might be happy with, say, spending 10 times as long searching for a factor
as you would on a proof that the number was composite. In that case, a
'big composite number' around 10 times the bit length of N is a good
ballpark for the P-1 bounds setting to hit that sort of running time.

Of course if you were willing to spend 100, 1000 times as long, you could
set the bounds higher... but in that case, bear in mind that the P-1 test is
often unsuccessful. If you have that much time to spend, you might prefer to
dedicate it to a more sophisticated algorithm. Just like trial-factoring,
you have to increase the bounds *A LOT* before your chances of success
improve significantly - ultimately they are related very closely, because
success depends on the factors of the unknown magic number x.

Hope this helps

Chris Nash
Lexington KY
UNITED STATES



------------------------------

End of Mersenne Digest V1 #627
******************************
