Mersenne Digest V1 #581

Anonymous Wed, 16 Jun 1999 17:14:09 -0700

Mersenne Digest        Wednesday, June 16 1999        Volume 01 : Number 581




----------------------------------------------------------------------

Date: Wed, 16 Jun 1999 07:50:28 -0700 (PDT)
From: Ashton Vaz <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #579

From: "Brian J Beesley" <[EMAIL PROTECTED]>

> Could I respectfully suggest that, in future, list members flame 
> each other in private (at any rate, after they've posted their views 
> once). I don't much like the smell of toasting flesh.

    Apologies to everyone on this list that had to read my flames and
ranting!  I should've thought twice before responding so "strongly".

Ashton
P.S. - I solemnly promise to not do it again. ( Repeated M38 times! ;-) )

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Wed, 16 Jun 1999 10:24:22 -0500
From: "Willmore, David" <[EMAIL PROTECTED]>
Subject: RE: OT: Mersenne: ARM Licenses

        [... clipped ARM9E core license press release--by Lucent ...]

> That sounds like it would make a nice disk for my laptop. I wonder if I
> could get it to do something in it's idle time? :-)
> 
The ARM are interesting processors.  They're great for embedded
applications--which is where they have been specializing.  I haven't looked
at them in detail since the ARM7 core days, so I might have missed
something, but I don't remember them being all that fast per MHz.  I
remember their goals being small code space, low power usage, and small
die.  Still, it would be cute to have your HD do LL testing....

> There does seem to be some prejudice against our elderly less able
> processors.
> So what if it takes months for one task if there are enough of them. What
> a lovely way for those 486s to spend their retirement years running
> Prime95.
> I am guessing that the minima of the Price/Performance curve has not yet
> touched Pentia. Maybe the best P/P is the Z80,  I am certain that it can
> square and subtract 2 (still). This is not completely in jest. I am
> thinking
> (dreaming) about a dedicated box of cheap processors (Gate Arrays even?)
> that are nicely pipelined (and the other dimensions of parallelism) like
> the
> EFFs DeepCrack.
> Any thoughts?
> 
On the i486 front, I'd say, in the US at least, that just powering up the
box costs more than the processing you get from it is worth.  Old Socket7
MBs can be had for under $10 and the same goes for processors.  Throw in two
or four old 72 pin simms and you have a good low end Prime95 system--throw
in a NIC and net boot it into Linux--no HD, just a NIC with a boot PROM.

The topic of using FPGAs and such to do LL testing has been tossed around
on this mailing list as long as I can remember--what, since the early to mid
'90s?  I think that there can be some merit to it, but the break even
point--the point at which it's no longer practical to just buy off the shelf
processors to do the job--is way up there.  Maybe EFF can afford to do
it--or some university with free design labor in the form of students--but
most of the people on this list are not in that group.  Too many
mathematicians and not enough of us computer engineers. :)  (kidding, just
kidding, I love math, too...)

> Apologies if this is slightly off topic (tho' this can add depth and
> interest)
> as this wouldn't be very distributed or internet - although I would
> reserve
> the exponents for the machine via PrimeNet - or maybe I'll pick up one
> of those small long overdue.... [only joking!]
> 
I marked it OT, now, so they're warned.

What we need is some good integer factoring code with hand tuned assembly
for 6x86 (all flavors), IDT WinChip, and AMD K6 (all flavors).  I don't know
x86 assembly--I do RISC assembly, but I can probably figure it out given
some example code.  I'm willing to work with some mathematician to get some
of the code written.  Anyone interested?  I'll do it myself if I get no
replies. :)

Cheers,
David

------------------------------

Date: Wed, 16 Jun 1999 18:05:18 +0200
From: Alex Kruppa <[EMAIL PROTECTED]>
Subject: Mersenne: Another bug question..

Hello all,

sorry for bringing this up again, but there's one more question about the v17 bug on
my mind and I don't remember it being asked/answered before (I may have missed it).
Would a doublecheck with v17 on an exponent >4.2M result in the same (wrong)
residue as the first (wrong) test?

Ciao,
  Alex.


------------------------------

Date: Wed, 16 Jun 1999 14:05:48 -0300 (EST)
From: Marc Thibeault <[EMAIL PROTECTED]>
Subject: Mersenne: assigment and manual check in

        Hi everybody! The large amount of mail about assignments and
overdue exponents moved me to check how things are going for me. I joined
GIMPS some time ago by sending email to George, who gave me some ranges. I
sent him the results, then switched to the PrimeNet manual check-in form
when I learned about it. Later I connected some PCs to PrimeNet, but I
still have one machine at home with no internet connection. One PC running
Linux completed the whole initial range of 5 exponents that George gave me,
and the other, at home, completed 4 out of 10 exponents, all in the
5,000,000 range. Now, when I check the PrimeNet Individual Account page, it
gives me no credit for that work. Were these exponents given to someone
else, or is there another place to check? And I do not understand what
happened in the case of the five exponents that were handled by the Linux
machine, which computes 24 hours a day and works just fine (or so it seems
to me). The v17 bug cannot be the culprit, since that machine was running
an older version.
        I do not want to bother George with these small details (small, but
important to me!), so I decided to send this to the list. But I would
certainly like to know whether my PC at home is computing for nothing. For
the exponents assigned to the machine at home, I check in results every 4-5
months (the machine only runs during the day). I never received a message
saying that an exponent had already been checked in. I take great pride
and pleasure in being part of this adventure. I just want things
straightened out, and to know whether the work on the 10 or so exponents
that I checked in manually before is lost ;-(. Thank you very much to the
patient folks who have endured my bad English all this long!

                                Marc Thibeault
                                e-mail: [EMAIL PROTECTED]
                                ICQ:    31348153


------------------------------

Date: Tue, 15 Jun 1999 22:31:24 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: 35 exponents left on  range 3310-3960

Hi,

At 09:56 PM 6/15/99 -0700, Rudy Ruiz wrote:
>however I must say that the speed at which the last dredges of exponents
>in a range (3310K-3960K) are reclaimed is  as slow as it can be.

Most of these have been given to a reliable version 14 user who is
not operating under primenet control.  So, you can expect little movement
for a while and then a big drop as he sends in his monthly batch of a
dozen results.

On a philosophical note, I am much less concerned than I used to be with
the milestone of completing first-time tests on all exponents below some
value.  There are roughly 20,000 exponents below 3,960,000 that haven't been
double-checked yet.  Given our present error rate, that means 200 of
these really haven't been tested yet.  Thus completing these last 35
first-time tests does not imply there are no more Mersenne primes below
3,960,000.  Don't get me wrong, methodical progress is important,
it's just that we can't really claim anything definitive until the 
double-checks are done.

Best regards,
George


------------------------------

Date: Wed, 16 Jun 1999 14:54:55 -0400 (EDT)
From: lrwiman <[EMAIL PROTECTED]>
Subject: Mersenne:  interesting Mathematicians (was poaching)

>> P.S. - Nice to see that GIMPSers aren't cold calculating
>> mathematicians only!

>  Mathematicians don't have to be cold or uninteresting.

Boy, That's for sure!  Galois died in a gun fight at the age of 20.
Newton was one of the biggest assholes of all time.  Leibniz was an
alcoholic and a womanizer.  And Euler was a quiet family man, who had
7 of his 13 children die.  Norbert Wiener was one of the most inept
people of all time (I'm sure you've all heard the story of when he 
went to the wrong house...)  

Well, Ok, maybe Euler wasn't all that interesting, but mathematicians
can be as interesting as the next guy...
And they certainly work on something more important... ;-)

- -Lucas Wiman

------------------------------

Date: Wed, 16 Jun 1999 16:09:35 -0400 (EDT)
From: lrwiman <[EMAIL PROTECTED]>
Subject: Re: Mersenne:  interesting Mathematicians

>> Norbert Weiner was one of the most inept
>> people of all time (I'm sure you've all heard the story of when he
>> went to the wrong house...)
>  I haven't
Taken verbatim from the Linux fortune file:
Norbert Weiner was the subject of many dotty professor stories.  Weiner was, in
fact, very absent minded.  The following story is told about him: when they
moved from Cambridge to Newton his wife, knowing that he would be absolutely
useless on the move, packed him off to MIT while she directed the move.  Since
she was certain that he would forget that they had moved and where they had
moved to, she wrote down the new address on a piece of paper, and gave it to
him.  Naturally, in the course of the day, an insight occurred to him.  He
reached in his pocket, found a piece of paper on which he furiously scribbled
some notes, thought it over, decided there was a fallacy in his idea, and
threw the piece of paper away.  At the end of the day he went home (to the
old address in Cambridge, of course).  When he got there he realized that they
had moved, that he had no idea where they had moved to, and that the piece of
paper with the address was long gone.  Fortunately inspiration struck.  There
was a young girl on the street and he conceived the idea of asking her where
he had moved to, saying, "Excuse me, perhaps you know me.  I'm Norbert Weiner
and we've just moved.  Would you know where we've moved to?"  To which the
young girl replied, "Yes, Daddy, Mommy thought you would forget."
        The capper to the story is that I asked his daughter (the girl in the
story) about the truth of the story, many years later.  She said that it wasn't
quite true -- that he never forgot who his children were!  The rest of it,
however, was pretty close to what actually happened...
                -- Richard Harter

>  apparently he was involved in military targeting systems, I suppose
>  the chinese are cursing his ghost.
Well, considering the recent security at Los Alamos, maybe they are blessing
it :')

- -Lucas Wiman

------------------------------

Date: Wed, 16 Jun 1999 21:12:20 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: RE: OT: Mersenne: ARM Licenses

On 16 Jun 99, at 10:24, Willmore, David wrote:
> > 
> The ARM are interesting processors.  They're great for embedded
> applications--
> which is where they have been specializing.  I haven't looked at them in
> detail
> since the ARM7 core days, so I might have missed something, but I don't
> remember them being all that fast per/MHz.  I remember their goals being
> small code space, low power usage, and small die.  Still would be cute to
> have your HD do LL testing....
> 
I believe some of the current PDAs use StrongARM at speeds 
approaching 200 MHz. They have no floating point. There is at least 
one LL test client written for StrongARM.

> > I am guessing that the minima of the Price/Performance curve has not yet
> > touched Pentia. Maybe the best P/P is the Z80,  I am certain that it can
> > square and subtract 2 (still). This is not completely in jest.

Eh?

Almost twenty years ago I was running a TRS-80 Model 1 (1.77 MHz 
Z80). If you could wring one single-precision floating-point 
operation per millisecond from it, you were doing quite well. There 
wasn't even an 8-bit integer multiply opcode, you had to multiply 
even short integers in software.
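
A minimal sketch of what multiplying "in software" looks like on a CPU with no multiply opcode: the classic shift-and-add loop, written here in Python rather than Z80 assembly for readability. The name `mul8_shift_add` is made up for this illustration, and real Z80 routines differed in register use and unrolling.

```python
def mul8_shift_add(a, b):
    """Multiply two unsigned 8-bit integers using only shifts, adds and bit tests.

    Hypothetical helper for illustration; it mirrors the kind of loop a CPU
    without a multiply instruction has to run for every product.
    """
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must fit in 8 bits")
    result = 0
    for _ in range(8):          # one pass per bit of the multiplier
        if b & 1:               # low bit set: add the shifted multiplicand
            result += a
        a <<= 1                 # multiplicand moves one bit left
        b >>= 1                 # next multiplier bit moves into position
    return result               # always fits in 16 bits
```

Eight loop passes for a single 8-bit product gives a feel for why multi-million-bit squarings were out of reach for such a machine.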

I know a lot of Z80s were manufactured, and I guess you might even be 
able to find the odd one still in use somewhere (NASA's immensely 
successful Voyager spacecraft use an even more primitive 
microprocessor), but I reckon that, for LL tests, the combined power 
of all the Z80s ever manufactured is less than that of a couple of 
today's standard desktop PCs.

Based on the fact that Intel are apparently holding back the 100MHz 
FSB versions of the Celeron until Feb 2000 (according to August's 
"Personal Computer World" which arrived with me today) - the theory 
is that Intel don't want to undermine the PII/PIII market - I would 
suggest that the PP curve for current processors minimises somewhere 
around the Celeron 400. For complete systems the story may be 
different. I suggest that, when people make these sorts of 
comparisons, they also bear in mind the price per iteration in terms 
of energy consumption - a slow system built for next to nothing from 
scrap-heap parts will cost just as much to "feed" for a year as a 
fast system bought or built from new parts.

Did anyone else see the news story about the PlayStation II? 
Apparently the US government has classified it as "strategic 
ordnance" because its theoretical processing power falls into the 
"supercomputer" range!!!

Regards
Brian Beesley

------------------------------

Date: Wed, 16 Jun 1999 21:12:20 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: Mersenne Digest V1 #579

On 16 Jun 99, at 0:25, Steinar H. Gunderson wrote:

> >This is why George no longer supports
> >it in the CPU check boxes.  I wonder how long it will be before he drops
> >486's.
> 
> Hopefully there will be a while to -- my 486s are all performing excellent
> factoring.
> 
Well, you don't _have_ to update your software. And, if PrimeNet 
decides to "disown" 486's, you can still use the manual pages.
> 
> And if `most sense' is applied? Will a PII only get LL assignments, or
> nothing? (Currently, this isn't a big problem, though.)

Yes - so long as the _effective_ cpu speed (MHz * hrs per day/24 * 
RollingAverage/1000) exceeds whatever the cutover point between LL 
and double checking is. That is somewhere around 170 MHz now, I think; 
the program code increases it linearly from 120 to 200 over the year 
following its release.

If you run a PII-266 12 hours a day, you'll end up getting double 
checking assignments (at least once the RollingAverage has 
stabilised).
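
A rough sketch of the rule described above, using only the formula and the 120-to-200 ramp given in the message. The function names, the 365-day ramp length, and the clamping outside the ramp are assumptions made for illustration; this is not the actual client or server code.

```python
def effective_mhz(cpu_mhz, hours_per_day, rolling_average):
    """Effective speed: MHz * hrs per day/24 * RollingAverage/1000."""
    return cpu_mhz * (hours_per_day / 24) * (rolling_average / 1000)


def cutover_mhz(days_since_release, start=120.0, end=200.0, ramp_days=365):
    """Cutover point rising linearly from `start` to `end` over `ramp_days`.

    The 365-day ramp and the clamping before/after it are assumptions.
    """
    frac = min(max(days_since_release / ramp_days, 0.0), 1.0)
    return start + (end - start) * frac


def assignment_type(cpu_mhz, hours_per_day, rolling_average, days_since_release):
    """Return "LL" above the cutover, "double-check" otherwise (hypothetical)."""
    eff = effective_mhz(cpu_mhz, hours_per_day, rolling_average)
    if eff > cutover_mhz(days_since_release):
        return "LL"
    return "double-check"
```

With a RollingAverage of 1000, a PII-266 running 12 hours a day comes out at 133 effective MHz, which is below a cutover of about 170 and so lands on double-checking, matching the example in the message.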
> 
> >transaction object.  This is where we set rules like, 'give all v17 clients
> >double-checking work'
> 
> Hmmmm, I thought they were bugged?
> 
v17 is fine for exponents < 2^22 (4194304). However, for larger 
exponents, the LL testing code (which is the same as the double 
checking code) goes wrong straight away. Importing a save file for 
_any_ exponent written by any version _except_ v17 and finishing it 
with v17 is also OK.

So long as double-checking assignments have exponents less than 2^22, 
the server-imposed rule is OK. But we're starting to close in on 2^22 
- - Scott, this rule may need to be looked at in a month or two!

Having said all that, v18 seems solid, I see no reason why anyone 
still running v17 (or earlier) shouldn't upgrade.

> (From what I've read on this list, there are two different series of LL
> numbers. Perhaps I'm just way off here.)

Well, actually, there are _lots_ of them ... this is because, for 
many integers y in [0 , 2^p-1] there is more than one integer x such 
that x^2-2 (mod 2^p-1) = y.

The starting value 4 is convenient because it works with _all_ 
exponents. So does, e.g., 10, but although both sequences starting 
with 4 and 10 end with residual 0 at iteration p-2 if 2^p-1 happens 
to be prime, the final residuals for the two sequences may not be 
equal if 2^p-1 happens to be composite. (In fact, they're not likely 
to be equal). Using a fixed starting value is helpful for 
cross-checking results, and helps keep the code relatively simple.
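
The point about starting values can be checked on small exponents with a plain-integer Lucas-Lehmer loop. This is only a sketch with Python big integers, not the FFT-based code the clients use; `ll_residue` is a name invented here.

```python
def ll_residue(p, start=4):
    """Final Lucas-Lehmer residue for 2^p - 1 after p - 2 iterations.

    Illustrative only: uses ordinary big-integer arithmetic and expects
    an odd prime exponent p >= 3.
    """
    if p < 3:
        raise ValueError("this sketch expects an odd prime exponent p >= 3")
    m = (1 << p) - 1            # the Mersenne number 2^p - 1
    s = start % m
    for _ in range(p - 2):
        s = (s * s - 2) % m     # square and subtract 2
    return s
```

For prime 2^p-1 (p = 3, 5, 7, 13, ...) both `ll_residue(p, 4)` and `ll_residue(p, 10)` come out as 0. For p = 11, where 2^11-1 = 2047 = 23 * 89, the two starting values give different nonzero residues, which is why results can only be cross-checked when everyone uses the same start.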

Regards
Brian Beesley

------------------------------

Date: Wed, 16 Jun 1999 23:07:04 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #580

On Wed, Jun 16, 1999 at 06:21:54AM -0700, Mersenne Digest wrote:
>> Also, if a is co-prime to n, a^T=1 mod n
>> 2 is obviously co-prime to n, so 2^T=1 mod n

Excuse me if I'm very stupid here, but isn't 1 mod n = 1 for _any_ n? We
are talking about the remainder of a division here, right?

>       If q (= 2kp + 1 in Chris's notation) is an odd prime

Again, primes happen to be odd, don't they?
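
On both questions, a short sketch may help. In "a^T = 1 mod n" the "mod n" qualifies the whole equation: it says n divides a^T - 1, not that the remainder operator is applied to 1. And "odd prime" only rules out 2, the one even prime. The helper names below are invented for illustration, and reading the quoted T as Euler's totient is an assumption the excerpt does not confirm.

```python
from math import gcd


def congruent(a, b, n):
    """True when a and b are congruent mod n, i.e. n divides a - b."""
    return (a - b) % n == 0


def totient(n):
    """Euler's totient by brute force (fine for tiny n only)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


# Congruence, not remainder-of-1: 2^8 = 256 = 17 * 15 + 1, so 2^8 = 1 mod 15.
n = 15
assert gcd(2, n) == 1
assert congruent(2 ** totient(n), 1, n)

# A factor of the form q = 2kp + 1: with p = 11, k = 1 we get q = 23,
# and 23 divides 2^11 - 1 = 2047.
p, k = 11, 1
q = 2 * k * p + 1
assert pow(2, p, q) == 1        # three-argument pow does modular exponentiation
```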

- ---snip---

>- ---snip---

Oh, BTW, an era is over; this is my last digest. I'll be replying to
single messages from now on... (As if you were interested.)

- ---snip---

>Or, was the problem just that David S. won't be able to verify this number
>quickly like he usually does?  

George said something about `not being any free time on a Cray', so there's
a Cray double-checking it at idle priority right now. Wasn't the estimate
4 weeks?

- ---snip---

>- -----BEGIN PGP SIGNED MESSAGE-----

Whee! :-) Your key first, perhaps? And some signatures? Generally, I'd love
to see people use PGP more often...

>Comment: 92G5S="!!;F]T:&5R(%!E<FP@2&%C:V5R"@``

Interesting...

- ---snip---

>Actually this is NOT true, if we're talking about tracking iterations 
>up to 2^n then I can find the cycle with a storage space of 8n bytes, 
>provided you let me have a "false alarm" rate of 1 in 2^64.

You don't even need a false alarm probability. I got a mail with a
very interesting way of doing it: _If_ you hit an alarm, save the
whole array (LL residue) to memory. Then compute until the (64-bit)
value repeats itself again (it's bound to happen again, remember),
and now checking the _full_ LL residues against each other should
be trivial.

Of course, letting the cycle go twice (_without_ storing the full
FFT array) should give us even smaller error rates. If I had not
been so tired, I would, of course, have remembered what (2^64)^2
was. Hmmm, (2^64)*(2^64) = 2^(64+64) = 2^128, or something.
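
One way to read the scheme above, as a sketch only: keep a 64-bit fingerprint of the residue at iterations 1, 2, 4, 8, ... (so n fingerprints, 8n bytes, cover 2^n iterations), and when a fingerprint matches, confirm with the full residue. The checkpoint spacing, the function name `find_cycle`, and the confirm step of running the candidate distance once more are assumptions of this illustration, not details taken from the messages.

```python
def find_cycle(f, x0, max_iters, fingerprint_bits=64):
    """Look for a repeat in x0, f(x0), f(f(x0)), ... using small fingerprints.

    Returns (i, d) such that the value after i iterations recurs d
    iterations later, or None if nothing is confirmed within max_iters.
    """
    mask = (1 << fingerprint_bits) - 1
    checkpoints = {}            # fingerprint -> iteration where it was saved
    next_save = 1               # checkpoints at iterations 1, 2, 4, 8, ...
    x = x0
    i = 0
    while i < max_iters:
        x = f(x)
        i += 1
        fp = x & mask           # low bits stand in for the full residue
        if fp in checkpoints:
            # Alarm: keep the full value and run the candidate distance again.
            d = i - checkpoints[fp]
            y = x
            for _ in range(d):
                y = f(y)
            if y == x:          # full residues agree, so this is a real repeat
                return i, d
            # Otherwise it was a false alarm; carry on from x.
        if i == next_save:
            checkpoints[fp] = i
            next_save *= 2
    return None
```

For example, `find_cycle(lambda x: (x * x - 2) % 2047, 4, 10_000)` finds a confirmed repeat for the M11 sequence. Shrinking `fingerprint_bits` to 8 raises the number of false alarms, but the full-value check still rejects them.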

- ---snip---

>I think all of us are at least a bit impatient ...

Who knows, perhaps we will find another one even before the first one is
finished? Now, _that_ would be media coverage. (`GIMPS is spitting out
record primes quicker than the supercomputers can double-check them!')

>I think that the verification job has also been sent to at least one 
>other user who can definitely finish the job in 4 weeks.

Not `definitely', we're talking about idle time here. But I guess they
know how much those Crays will be used :-)

>If I was George, I'd wait until the verification comes in, then get 
>in touch with the EFF people and ask for permission to make a public 
>announcement, pending publication of the paper. But I _would_ wait 
>until the verification comes in.

Agreed.

- ---snip---

>I agree that this should be put in the FAQ, so that we don't have to 
>go through this every 1-2 months...

Yes, please. In the short time I've attended this list, I've seen it
pop up at least twice, and I've thought of it once myself. (Seems
like most everybody else on this list have, too... :-) )

>Looks like it.  To toss a date out for fun, sometime during October.  Recent
>non-linear positive effects on GIMPS participation like SETI@home

SETI@Home brought positive effects to GIMPS? Wow, this thing must be bigger
than we thought, considering the people who left us.

>radio ads

Radio ads? Is somebody paying to get out `join GIMPS' on everybody's radio?

>and the last newsletter have made an accurate guess tough.

Do you mean the `v17 bug' newsletter, or has there been one (with M?38) that
I didn't get? I _have_ signed up for the newsletter on the web page.

>How about this: if the FBI quote is right, GIMPS/PrimeNet is at today's rate of
>738 GFLOP/s worth between $182,000 and $486,000 per day in CPU time.  Of course,
>'past performance is no guarantee of future results'!

Looks like we should not go for decamillion digits, but million dollars a
day :-) Oh no, I'm wasting too much money on this project, I guess I must
stop all my computers from GIMPS work. (What FBI quote, BTW?)

- ---snip---

>At this pace it might take until
>the start of October to clear this range.

Be patient, please. If it was 2010, I would have reacted, but October is
in fact close!

>Notwithstanding this, I believe that those  35 souls that are still
>owing exponents, should be looked upon. Perhaps some have completely
>stalled. The computer might not be connected to the internet anymore or
>some funny mishap might be preventing them from reporting the results.
>There are ONLY 35 now and perhaps some focussing on them might be of
>use.

Well, that's what the `poaching' debate (which is now officially over,
I believe) has been all about.

- ---snip---

>Apologies for the flippancy. Let's put Peter's well-crafted reasoning in the
>FAQ.

The day somebody asks that question and I can answer `look in the FAQ' or
even `RTFM' will be a nice day :-) Now, only for _constructing_ a FAQ, I'm
not sure if this is within scope of any existing FAQ that we have.

Now, as the last `---snip---' has been made, school's over and summer is
outside, could I please ask godoni a question: Why are _all_ my beautiful
`---snip---'s (and other people's `-----BEGIN PGP etc.' prepended with `- '???
Let us have our initial dashes! Leave them alone!

/* Steinar */

------------------------------

Date: Wed, 16 Jun 1999 23:13:10 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: NTPrime and proth

On Tue, Jun 15, 1999 at 10:01:04AM -0700, Mersenne Digest wrote:
>I wonder if there is any way to tell winNT to give NTprime six times as much
>cpu-time as proth and still have them both to be idle-priority so they dont
>disturb my other activities??

I'm not sure why you want to run two different projects. I'm afraid you'll
have to choose -- running them both at the same time will make _both_ slower
(due to increased OS overhead).

>Ex.  33% of cpu-power regardless if the load on the system is otherwise
>0% or 100%

The only way I could think of was (re)writing the program so it would only
use 1/3rd of the CPU time it was `awarded' (possibly via Sleep() or any
other evil Win32 API call), and then set it to `real-time' priority. Then
NT would try to give it 100% CPU time, and it would only use 33% of it.

Still, I think running one program, at idle priority, is the best thing to
do. (What if you had 70% idle time? Your program would still only be using
33%, unless you really ran two different programs.)
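
The "only use a third of what you're given" idea can be sketched as a duty-cycle loop: do a chunk of work, time it, then sleep long enough that busy time is the chosen fraction of wall-clock time. Python's `time.sleep` stands in for the Win32 `Sleep()` call mentioned above; the function name, the chunking, and the injectable `clock`/`sleep` parameters are assumptions for illustration, and the sketch does not touch process priority at all.

```python
import time


def run_throttled(work, chunks, duty=1 / 3,
                  clock=time.perf_counter, sleep=time.sleep):
    """Call work() `chunks` times, sleeping after each call so that busy
    time is roughly `duty` of total wall-clock time.

    Returns (busy_seconds, idle_seconds). Hypothetical helper.
    """
    if not 0 < duty <= 1:
        raise ValueError("duty must be in (0, 1]")
    busy = idle = 0.0
    for _ in range(chunks):
        start = clock()
        work()                                  # one slice of real computation
        elapsed = clock() - start
        pause = elapsed * (1.0 / duty - 1.0)    # idle time that restores the ratio
        if pause > 0:
            sleep(pause)
        busy += elapsed
        idle += pause
    return busy, idle
```

As the message points out, a fixed duty cycle wastes whatever idle time the machine has beyond that fraction, which is the argument for a single program at idle priority instead.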

/* Steinar */

------------------------------

Date: Wed, 16 Jun 1999 22:28:11 +0100
From: Nick Craig-Wood <[EMAIL PROTECTED]>
Subject: Re: OT: Mersenne: ARM Licenses

On Wed, Jun 16, 1999 at 09:12:20PM +0100, Brian J. Beesley wrote:
> There is at least one LL test client written for StrongARM.

See http://www.axis.demon.co.uk/armprime if you are interested.

This will work very well on the ARM9 (give or take a bit of tuning for
cache architecture) I'm sure...

- -- 
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/

------------------------------

Date: Wed, 16 Jun 1999 16:30:11 -0500
From: "Willmore, David" <[EMAIL PROTECTED]>
Subject: RE: OT: Mersenne: ARM Licenses

> I believe some of the current PDAs use StrongARM at speeds 
> approaching 200 MHz. They have no floating point. There is at least 
> one LL test client written for StrongARM.
> 
I've heard of prototypes and proof of concept devices, but no actual
products.  The only products I'm aware of that use them and are in
production are some set top boxes and the empeg (car mp3 player, way cool).

I would think it would be very cool to have my car stereo discover the next
mersenne prime. *laugh*

> Based on the fact that Intel are apparently holding back the 100MHz 
> FSB versions of the Celeron until Feb 2000 (according to August's 
> "Personal Computer World" which arrived with me today) - the theory 
> is that Intel don't want to undermine the PII/PIII market - I would 
> suggest that the PP curve for current processors minimises somewhere 
> around the Celeron 400. For complete systems the story may be 
> different. I suggest that, when people make these sorts of 
> comparisons, they also bear in mind the price per iteration in terms 
> of energy consumption - a slow system built for next to nothing from 
> scrap-heap parts will cost just as much to "feed" for a year as a 
> fast system bought or built from new parts.
> 
Quite true.  I appreciate the intent of the people who say 'my 486 has been
doing LL testing for years and it'll keep doing it', but I don't understand
their logic.  Well, outside of the US, the situation is different.
Apologies to those who don't have computer shows every month where one can
purchase parts at wholesale.

> Did anyone else see the news story about the PlayStation II? 
> Apparently the US government has classified it as "strategic 
> ordnance" because its theoretical processing power falls into the 
> "supercomputer" range!!!
> 
Ha, ha.  Maybe they should look at some of the telecommunications equipment.
I know of one card made by Motorola that has 15 100MHz 563xx DSPs on it.
Now, before you say, "well, gee, I can get a couple of 550 MHz PIII..." keep
in mind that these chips can do one multiply-accumulate, two data moves, two
address calculations, and a logical operation (plus all data moves give
shifts for 'free') *and* maintain internal and external DMA at full speed.
Consider that the chips add up to a 4.5GB/s *sustained* DSP<>memory
bandwidth, too.  You can stuff a cabinet full of 16 of these and hook four
cabinets together. Hmmmm, add that one up. :)
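
Adding that one up, as a back-of-the-envelope sketch. Two readings are assumed here and are not stated outright in the message: that each DSP completes one multiply-accumulate per clock cycle, and that the 4.5GB/s figure is per 15-DSP card. The constant and function names are made up for the illustration.

```python
DSPS_PER_CARD = 15
CLOCK_HZ = 100_000_000              # 100 MHz per DSP
MACS_PER_CYCLE = 1                  # assumption: one multiply-accumulate per cycle
CARD_BANDWIDTH = 4_500_000_000      # bytes/s; assumption: figure is per card
CARDS_PER_CABINET = 16
CABINETS = 4


def installation_totals():
    """Totals for four linked cabinets of 16 cards each (illustrative)."""
    cards = CARDS_PER_CABINET * CABINETS
    dsps = cards * DSPS_PER_CARD
    return {
        "cards": cards,
        "dsps": dsps,
        "macs_per_second": dsps * CLOCK_HZ * MACS_PER_CYCLE,
        "bandwidth_bytes_per_second": cards * CARD_BANDWIDTH,
    }
```

Under those assumptions the four cabinets hold 64 cards and 960 DSPs, for 96 billion multiply-accumulates per second and 288 GB/s of aggregate DSP<>memory bandwidth.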

Yeah, the US gov is a little out of date on these things.  Distributed
processing essentially made their job of regulating computing power
impossible.

Cheers,
David

------------------------------

Date: Wed, 16 Jun 1999 15:52:51 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: Mersenne: Thoughts on Merced / IA-64

I was just perusing the IA-64 docs that came out last month...I came up with
a few thoughts on how it would be a GREAT mersenne prime CPU:

- - 128 FPU registers (126 usable)
 96 of them are rotating (not stacked) which I imagine could be used to the
code's advantage quite well, holding more data in registers during the FFT

- - 82bit FPU (??)
 One document mentioned 82 bits for the FPU and registers.  I imagine this
would help with round-off problems vs. the 80 bit FPU core.  The IA-32
processors had 80 bits, right?  The 82 bits are: 64 bit significand, 17 bit
exponent, 1 bit sign.  The IEEE double extended only specifies 80, but there
we are with 82.

- - Memory "speculation"
 Preload code and/or data...while the FPU is churning away, preload more
data into L2/L1 cache so it's in the high-speed memory by the time it's
needed (data prefetch/lfetch).  That will REALLY help on these large FFT
datasets!

- - Faster FPU
 On top of all this, the FPU core is supposedly redesigned to do more per
clock cycle.  Some of the "enhancements" I spotted were: having 4 FP
multiplier accumulators (single precision), the fused multiply-add
instruction enhancements, load-pair instruction to load 2 FPU registers
simultaneously, etc.

- - 64 bit integer ops
 Integer unit with 64 bits...need I say more?

- - 128 64 bit general purpose registers

- - 64 one bit predicate registers
 Separate registers to control the conditionals branching/execution

- - 8 64 bit branch registers
 Finally some more registers to hold branch address locations

- - 128 "application registers"
 Don't know about these...some are earmarked "for future use".  Hrmm...

- - Bunch of fun parallel arithmetic instructions
 Probably useful for large numbers...


Anyway, that's just skimming the surface.

I figure with 126 usable 82 bit FP registers, you can have A LOT of stuff
done in the registers alone, speeding up stuff greatly and really trimming
down on worrying about rounding errors once it comes out of the register.
Prefetching data into the cache from main memory will also help quite a bit.
The FPU instruction set has a few new goodies that I foresee could help out
with FFT algorithms.

Not being really on top of how the FFT code really works, I'll leave it to
others to figure out how best this would all help George's code.  And
George...I hope you'll work on a nice IA-64 native program to use all this
cool new stuff once it's available.  Using all the EPIC "hints" in your
assembly code might be tricky at first, but I think the payoff would be
significant.

Aaron


------------------------------

Date: Wed, 16 Jun 1999 16:28:41 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: OT: Mersenne: ARM Licenses

> > Did anyone else see the news story about the PlayStation II?
> > Apparently the US government has classified it as "strategic
> > ordnance" because its theoretical processing power falls into the
> > "supercomputer" range!!!

> Yeah, the US gov is a little out of date on these things.  Distributed
> processing essentially made their job of regulating computing power
> impossible.

If I read the article I saw on it right (somewhere on www.latimes.com), it
mentioned the limit as involving the new (to me) MTOPS, millions of
theoretical operations per second.  I guess the Playstation II does 'em too
fast.  The PSXII is around 1.2 MTOPS or some such...over the limit anyway.
I read somewhere that a lot of companies have actually been fighting this OLD
limit for years, trying to raise it to 12 MTOPS or thereabouts.

Primenet certainly goes WAY over the limit...is the Primenet server
"enabling" foreign countries to tie together into one massively parallel
effort in violation of export laws?  Oh man...I shouldn't have said that!!

BTW - Read http://www.cnn.com/TECH/computing/9906/15/supercomp.idg/

Does this sound on the level?  I'm skeptical of remarkable claims like
this...


------------------------------

Date: Wed, 16 Jun 1999 16:45:40 -0600
From: "Blosser, Jeremy" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Thoughts on Merced / IA-64

> -----Original Message-----
> From: Aaron Blosser [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, June 16, 1999 4:53 PM
> To: Mersenne@Base. Com
> Subject: Mersenne: Thoughts on Merced / IA-64
> 
> 
> I was just perusing the IA-64 docs that came out last month...I came up with
> a few thoughts on how it would be a GREAT mersenne prime CPU:
> 
> - 128 FPU registers (126 usable)
>  96 of them are rotating (not stacked) which I imagine could be used to the
> code's advantage quite well, holding more data in registers during the FFT
> 

Eh, it would only really help if you wanted to unroll quite a few loops... I
think that as can be seen from the RISC processors out there, it really
doesn't help a _whole_ lot as far as the FFT goes. I suppose you could move
to a radix-8, but that's about the extent of it. Would going past radix-8
help a whole lot?

> - 82bit FPU (??)
>  One document mentioned 82 bits for the FPU and registers.  I imagine this
> would help with round-off problems vs. the 80 bit FPU core.  The IA-32
> processors had 80 bits, right?  The 82 bits are: 64 bit significand, 17 bit
> exponent, 1 bit sign.  The IEEE double extended only specifies 80, but there
> we are with 82.
> 
> - Memory "speculation"
>  Preload code and/or data...while the FPU is churning away, preload more
> data into L2/L1 cache so it's in the high-speed memory by the time it's
> needed (data prefetch/lfetch).  That will REALLY help on these large FFT
> datasets!
> 

Is this limited to MMX/SIMD data only? The 3DNow (and KNI?) instruction sets
have prefetch for their SIMD opcodes, but those of course are single
precision and really kinda useless. :P

> - Faster FPU
>  On top of all this, the FPU core is supposedly redesigned to do more per
> clock cycle.  Some of the "enhancements" I spotted were: having 4 FP
> multiplier accumulators (single precision), the fused multiply-add
> instruction enhancements, load-pair instruction to load 2 FPU registers
> simultaneously, etc.
> 
> - 64 bit integer ops
>  Integer unit with 64 bits...need I say more?
> 

Doesn't really help a whole lot. Honest. :) Mainly cuz integer and single
precision operands are hardly ever used.

Well, maybe the fused multiply-add instructions. But I haven't looked to see
exactly what they are...

> - 128 64 bit general purpose registers
> 
> - 64 one bit predicate registers
>  Separate registers to control the conditionals branching/execution
> 
> - 8 64 bit branch registers
>  Finally some more registers to hold branch address locations
> 
> - 128 "application registers"
>  Don't know about these...some are earmarked "for future use".  Hrmm...
> 
> - Bunch of fun parallel arithmetic instructions
>  Probably useful for large numbers...
> 

Whatever that means...

> 
> Anyway, that's just skimming the surface.
> 
> I figure with 126 usable 82 bit FP registers, you can have A LOT of stuff
> done in the registers alone, speeding up stuff greatly and really trimming
> down on worrying about rounding errors once it comes out of the register.
> Prefetching data into the cache from main memory will also help quite a bit.
> The FPU instruction set has a few new goodies that I foresee could help out
> with FFT algorithms.
> 
> Not being really on top of how the FFT code really works, I'll leave it to
> others to figure out how best this would all help George's code.  And
> George...I hope you'll work on a nice IA-64 native program to use all this
> cool new stuff once it's available.  Using all the EPIC "hints" in your
> assembly code might be tricky at first, but I think the payoff would be
> significant.
> 

The EPIC hints are probably the biggest benefit. It's hard to convince the
CPU to take the right branches and O-O-O execution can really mess up the
pipeline. (It's hard to tell what the heck the P6 is doing dangit!)

> Aaron
> 

I'm a bit curious about the K7. Just from the minimal specs I've looked at.
Might be able to squeak a few % more out of it than a similarly clocked PIII.
What has me wondering is the 3DNow instructions they added for DSP
instructions. I'm sure they're single precision. But it seems kinda wacky they
added 'em in the first place.

Now, if Intel decided to put in some extra silicon and support double precision
FP ops in the SIMD instruction set (the registers support it, the silicon
doesn't), then you'd be able to get double the throughput in the FFT code,
plus I think the latency goes down (from 2 cycles to 1?) for multiplies.

Has me thinking a bit about an NTT algorithm for doing the FFTs with integers
instead of doubles and using MMX instructions to speed it up...
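
(For anyone who hasn't run into the idea: a number-theoretic transform is
an FFT done modulo a prime, so everything stays in exact integers and there
is no round-off to worry about. Below is a minimal Python sketch of a
multiply done that way. The modulus 998244353, the primitive root 3, and
the function names are just illustrative picks, not anything taken from
George's code, and a real MMX version would look nothing like this.)

```python
# Illustrative NTT multiply. P and G are assumed example parameters:
# P = 119 * 2**23 + 1 is prime and G = 3 is a primitive root modulo P,
# so transform lengths up to 2**23 are supported.
P = 998244353
G = 3

def ntt(a, invert=False):
    """Iterative radix-2 transform modulo P; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Root of unity of order `length` (its inverse for the inverse pass).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a

def to_digits(value, base):
    """Little-endian digits of a non-negative integer."""
    digits = []
    while value:
        value, r = divmod(value, base)
        digits.append(r)
    return digits or [0]

def ntt_multiply(x, y, base=10):
    """Multiply two non-negative integers by convolving their digits."""
    dx, dy = to_digits(x, base), to_digits(y, base)
    n = 1
    while n < len(dx) + len(dy):
        n <<= 1
    fa = ntt(dx + [0] * (n - len(dx)))
    fb = ntt(dy + [0] * (n - len(dy)))
    conv = ntt([p * q % P for p, q in zip(fa, fb)], invert=True)
    # Evaluating the product polynomial at `base` takes care of the carries.
    return sum(c * base ** i for i, c in enumerate(conv))
```

The catch is that every convolution coefficient has to stay below P, which
limits how big the digits and the transform length can get for one modulus.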

But then again, I'm working on a totally different algorithm right now
anyway that _should_ be fast. But then again, I'm probably forgetting
something, so until I work out some of the details on paper, I'll leave that
one in hiding. ;)

-Jeremy

------------------------------

Date: Wed, 16 Jun 1999 17:58:05 -0500
From: Gary Diehl <[EMAIL PROTECTED]>
Subject: Mersenne: $1000 supercomputer

I thought this was interesting...

http://www.cnn.com/TECH/computing/9906/15/supercomp.idg/index.html

If you don't have time to read it, here are some quotes:

"Within 18 months, you may be able to put the equivalent of today's
supercomputer on your desktop--for about $1000"

"The new computer will be able to process 100 billion instructions per
second, according to Kent Gilson, chief technical officer of Star Bridge
Systems."

"HAL-300GrW1, a "hypercomputer" that is said to be 60,000 times as fast
as a 350-MHz Pentium, and many times as fast as IBM's supercomputer
Pacific Blue."

...

Goodness, if we could get one of these for GIMPS...

Gary Diehl

------------------------------

Date: Wed, 16 Jun 1999 18:14:03 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Thoughts on Merced / IA-64

> > - 128 FPU registers (126 usable)
> >  96 of them are rotating (not stacked) which I imagine could be used to the
> > code's advantage quite well, holding more data in registers during the FFT
>
> Eh, it would only really help if you wanted to unroll quite a few loops... I
> think that as can be seen from the RISC processors out there, it really
> doesn't help a _whole_ lot as far as the FFT goes. I suppose you could move
> to a radix-8, but that's about the extent of it. Would going past radix-8
> help a whole lot?

Like I said, I'm not sure how the FFT code really works, so I couldn't say
how much this would help.

> > - Memory "speculation"
> >  Preload code and/or data...while the FPU is churning away, preload more
> > data into L2/L1 cache so it's in the high-speed memory by the time it's
> > needed (data prefetch/lfetch).  That will REALLY help on these large FFT
> > datasets!
>
> Is this limited to MMX/SIMD data only? The 3DNow (and KNI?) instruction sets
> have prefetch for their SIMD opcodes, but those of course are single
> precision and really kinda useless. :P

The IA-64 lets you preload code/data for ANYTHING, or at least for the FPU,
not just KNI/SIMD stuff.  It's with the "lfetch" (not sure if that's the
"official" opcode) command.  The instruction for loading 2 FPU registers at
once (as long as the data is next to each other in memory) would also be
pretty handy for mem->register loads, even if you didn't "prefetch" it.

> > - 64 bit integer ops
> >  Integer unit with 64 bits...need I say more?
>
> Doesn't really help a whole lot. Honest. :) Mainly cuz integer and single
> precision operands are hardly ever used.

Well, I was thinking for the pre-LL trial factoring.  I wonder though if it
wouldn't be helpful at all in doing an integer FFT?  Probably not, if the
FPU is all it's cracked up to be.
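
(To show the kind of integer work I mean, here is a minimal Python sketch of
trial factoring a Mersenne number. It leans on the standard facts that any
factor q of 2^p - 1, for an odd prime p, has the form 2kp + 1 and is
congruent to 1 or 7 mod 8, so most candidates never need the modular
exponentiation at all. The function name and the max_k cutoff are made up
for illustration; this is not how Prime95 actually does it.)

```python
def trial_factor(p, max_k=100000):
    """Look for a divisor q = 2*k*p + 1 of 2**p - 1, for an odd prime p.

    Returns the first divisor found, or None if no candidate up to
    max_k (or up to the square root of 2**p - 1) divides it.
    """
    mersenne = (1 << p) - 1
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q * q > mersenne:
            break                      # no nontrivial factor can be this large
        if q % 8 not in (1, 7):
            continue                   # factors must be 1 or 7 mod 8
        if pow(2, p, q) == 1:          # q divides 2**p - 1 exactly when this holds
            return q
    return None
```

For example, trial_factor(11) finds 23, while trial_factor(13) comes back
empty because 2^13 - 1 = 8191 is prime. The candidates stay small compared
to the Mersenne number itself, which is why wide integer registers looked
attractive for this stage.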

> > - Bunch of fun parallel arithmetic instructions
> >  Probably useful for large numbers...
>
> Whatever that means...

Adding numbers to multiple registers for instance, or doing multiple
multiplies.  Maybe that sort of stuff could come in handy for the modulo??
Just guessing.

> The EPIC hints are probably the biggest benefit. It's hard to convince the
> CPU to take the right branches and O-O-O execution can really mess up the
> pipeline. (It's hard to tell what the heck the P6 is doing dangit!)

For NT, if you really want to know what the P6 is doing, you can get the
www.sysinternals.com program to view Pentium performance counters of various
sorts (CPUMon 2.0).  Not sure if that's what you meant.

At the very least, you could pipeline the instructions in such a way that,
for example, while the FPU is chugging away on something, you prefetch
data into L1 cache for the next bit, do a couple other tidy little jobs,
etc.  Again, since the LL test is pretty serialized, it's more a matter of
keeping it as efficient as possible to eliminate any lags, so the FPU will
*always* have something to do right away.  Beyond that, finding more
efficient ways of using the FPU would be nice.  Keeping more of the data in
the registers will help since you can eliminate some rounding errors by
doing that.
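
(In case it helps to see why I call the LL test serialized, here is a
minimal Python sketch of the Lucas-Lehmer iteration. The function name is
just for illustration, and it uses plain big-integer arithmetic; the real
client does the squaring with an FFT, but the dependency between steps is
the same.)

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    # Each pass needs the previous value of s, so the p - 2 squarings
    # cannot be run side by side; only the work inside one squaring can.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0
```

Every iteration has to wait on the one before it, so the only place left to
find parallelism is inside each big squaring.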

> I'm a bit curious about the K7. Just from the minimal specs I've looked at.
> Might be able to squeak a few % more out of it than a similarly clocked PIII.
> What has me wondering is the 3DNow instructions they added for DSP
> instructions. I'm sure they're single precision. But it seems kinda wacky they
> added 'em in the first place.

From what I've seen, the K7 supposedly does even as much as 40% faster in
FPU benchmarks than a similarly clocked PIII.  Interesting, if true, but
again, those probably are single-precision numbers...

> Now, if Intel decided to put in some extra silicon and support double precision
> FP ops in the SIMD instruction set (the registers support it, the silicon
> doesn't), then you'd be able to get double the throughput in the FFT code,
> plus I think the latency goes down (from 2 cycles to 1?) for multiplies.

The registers can hold it (since it uses the same FPU registers IIRC), but
the microcode would have to be significantly tweaked to handle the
double/extended double data.  Since they are "multimedia" instructions, it
makes sense that they only made them single precision capable, but it still
would have been nice...  Of course, you could always write the FFT to only
use single-precision! :-)

> Has me thinking a bit about an NTT algorithm for doing the FFTs with integers
> instead of doubles and using MMX instructions to speed it up...

Thus my idea about using the 128 64 bit integer registers (and the cool
integer math ops) of the IA-64.

> But then again, I'm working on a totally different algorithm right now
> anyway that _should_ be fast. But then again, I'm probably forgetting
> something, so until I work out some of the details on paper, I'll leave that
> one in hiding. ;)

You can tell me, I'm your brother.

By the way all, Jeremy probably wouldn't have said so, but he's getting
married this June 26th.  Gifts can be made payable to me and I'll make sure
he gets 'em! :-)

Aaron


------------------------------

End of Mersenne Digest V1 #581
******************************