Mersenne Digest         Tuesday, June 19 2001         Volume 01 : Number 862




----------------------------------------------------------------------

Date: Sun, 17 Jun 2001 16:20:03 -0400 (EDT)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Mersenne: finding large primes for integer FFTs (long)

Hey everybody. I've realized after finishing an integer convolution
library for the Alpha (using number-theoretic FFTs) that even a nice
64-bit processor like the Alpha couldn't give me the speed I wanted for
integer-only Mersenne-mod squaring. I've also wanted for a while to build
integer-only convolution on x86 machines that was at least somewhat
comparable to Prime95 in speed.

The big problem with the code now is that integer multiplies still take
too long. A floating point FFT can not only rearrange its structure to
minimize the amount of arithmetic but can also take advantage of floating
point hardware that can finish a result (or even two) every clock
cycle. In contrast, 62-bit integer modular multiplication takes a minimum
of 6 clocks on an Alpha ev6 (using highly vectorized assembly code) and
with extreme contortions takes 40 clocks on a Pentium and 30 clocks on an
Athlon. There are also no simplifications you can make to a
number-theoretic FFT, because typically a 4th- or 8th- root of unity has
no special form that reduces the amount of arithmetic when you're working
modulo a 62-bit prime (I use 62 bits because carries are easier to handle
than with 64-bit primes).

Choosing different kinds of primes won't help either. There are maybe 8
primes of 32-bit size that could be used for a large integer FFT, but none
of them allow a large root of 2 like a Mersenne-mod DWT requires
(never mind that you'd need two of them and CRT reconstruction). For
primes of the form i*2^32 + 1, with i between 2^30 and 2^31, there are 14
primes that allow a large root of 2 and hence allow a big DWT; however, I
can't find any primitive roots for these primes that allow "simple" roots
of unity, and so FFT speed is limited to the cycle counts mentioned
previously. As several people on this list are aware, 2^64-2^32+1 is
prime, allows for a big DWT, and has primitive roots that allow for simple
64th roots of unity. I've written a lot of x86 assembly code for small
FFTs in a finite field modulo this prime, and it seems that an FFT
butterfly takes 20-25 clocks on a Pentium (probably somewhat fewer on an
Athlon). While this is wonderful compared to general 64-bit arithmetic,
it's a far cry from the 10 clocks or so that a floating-point FFT
butterfly may need. Dealing with carries and borrows out of 64 bits also
slows it down on the Alpha, to the point where (on paper, at least) it's
just as fast to use one of those 14 other primes that do the same job.
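
To make the "simple roots of unity" remark concrete for 2^64-2^32+1, here
is a small Python check. It is only a sketch, not the x86 assembly code
described above. It relies on the identity 2^96 = -1 (mod p), which makes
8 = 2^3 a 64th root of unity, so multiplying by any power of it is a shift
followed by a reduction. The names P, W and mul_by_root_power are made up
for illustration, and the generic "% P" stands in for the special-form
reduction a real implementation would use.

```python
# Sketch: roots of unity modulo p = 2^64 - 2^32 + 1 are powers of two.
# P, W and mul_by_root_power are illustrative names, not from any library.

P = 2**64 - 2**32 + 1

# 2^64 = 2^32 - 1 (mod P), hence 2^96 = 2^64 - 2^32 = -1 (mod P).
assert pow(2, 96, P) == P - 1

# So 2 has order 192 and W = 2^3 = 8 is a primitive 64th root of unity.
W = 8
assert pow(W, 64, P) == 1
assert pow(W, 32, P) == P - 1   # W^32 = -1, so the order is exactly 64


def mul_by_root_power(x, k):
    """Multiply x by W**k mod P using only a shift and a reduction."""
    shift = (3 * k) % 192          # W**k = 2**(3k), and 2 has order 192
    return (x << shift) % P


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    for k in (1, 5, 17, 63):
        # The shift-based product must agree with a general modular multiply.
        assert mul_by_root_power(x, k) == (x * pow(W, k, P)) % P
    print("twiddle multiplies reduce to shifts for all tested k")
```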

Nussbaumer convolution and Schönhage-Strassen squaring are other
candidates, and have very little arithmetic (Nussbaumer in particular can
be optimized to be really fast), but their memory efficiency is
horrible. Both require arithmetic on huge numbers in their initial stages,
and there's no way to load a block of data into cache and spin on it for
several FFT passes like you can do with a floating point FFT. There's also
much more movement of data with these algorithms, although with Nussbaumer
you can minimize that by making the movement "virtual". Prime95 gets most
of its speed from very careful data placement that minimizes memory
traffic, and these algorithms are not good substitutes.

Finally, there are fast Galois transforms that use complex arithmetic but
with integers. A large-radix FFT is possible here, and using Mersenne
primes makes that arithmetic easier; but each complex multiply has four of
those awful integer multiplies in it, and even with a split-radix
formulation the work only seems to be about 10%-15% less than a
conventional integer FFT, for much more complexity. That's on the Alpha,
which can handle complex integer multiplies enormously faster than x86
machines can. Performance comparable to that of prime95 would require at
least a factor of two better than that.

I first discovered GIMPS around the middle of 1996, and I guess I'm
spouting off like this because years of sweating on all these methods with
no breakthrough has made me bitter. I want to contribute more than just
one dinky computer to this project, because computational number theory
and code optimization both fascinate me.

There may be a way. Nobody seems to have considered using very large
primes for an integer FFT (by large I mean 512 bits or more). Using huge
primes like this means less memory consumption (the runlengths are
smaller), more flexible FFT sizes (since the number of bits you pack into
an array element can vary over a wide range, you can use power-of-2
runlengths and still get the effect of a non-power-of-2 size FFT), and
arithmetic whose cost is amortized over many machine words.

As with small primes, the chief obstacle is to find a way to do
relatively big multiprecision multiplies at very high speed. A fancy
multiply algorithm wouldn't give the speedup I'm looking for, so instead
I'm looking for large primes that have simple roots of unity. In
particular, I'm looking for numbers of the form i*2^512+1, for i the size
of one machine word, that 

1) have i > 2^24 (allows 512 bits per convolution element)
2) have a large root of two (a 2^13-th root or higher)
3) have a root of unity of power-of-two order that is of simple form.

For example, when i = 0x1326301 there are many primitive roots that allow
a 4th root of unity which only has 4 bits in it. This allows a radix-4
integer FFT which is more efficient than the radix-2 variety one is
ordinarily stuck with. Unfortunately, this prime doesn't have a large root
of 2.
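
The two checks that dominate such a search can be sketched in Python. This
is an illustration only, not the GMP-based search program. It assumes
Proth's theorem for the primality proof: N = k*2^n + 1 with k < 2^n is
prime if some a satisfies a^((N-1)/2) = -1 (mod N). It also uses the fact
that, for a prime p with 2^m dividing p-1, 2 has a 2^m-th root mod p
exactly when 2^((p-1)/2^m) = 1 (mod p). The function names and the list of
trial bases are assumptions made for the example.

```python
# Sketch of candidate screening for primes of the form i*2^512 + 1.
# Function names and the list of trial bases are illustrative assumptions.

TRIAL_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def proth_is_prime(k, n, bases=TRIAL_BASES):
    """Proth test for N = k*2^n + 1, where k < 2^n.

    True means N is proven prime.  False means no trial base produced a
    witness: N is either composite or (very rarely) an unlucky prime.
    """
    assert n >= 1 and 0 < k < (1 << n)
    N = (k << n) + 1
    half = (N - 1) // 2
    return any(pow(a, half, N) == N - 1 for a in bases)


def two_has_root_of_order(p, m):
    """True if x^(2^m) = 2 (mod p) is solvable, for prime p with 2^m | p-1."""
    assert (p - 1) % (1 << m) == 0
    return pow(2, (p - 1) >> m, p) == 1


def screen(i, n=512, m=13):
    """Return True if i*2^n + 1 passes both checks."""
    if not proth_is_prime(i, n):
        return False
    return two_has_root_of_order((i << n) + 1, m)


if __name__ == "__main__":
    # Small sanity checks: 13 = 3*2^2 + 1 is prime, 25 = 3*2^3 + 1 is not.
    assert proth_is_prime(3, 2)
    assert not proth_is_prime(3, 3)
    # 2 has a square root mod 17 (6*6 = 36 = 2) but no 4th root.
    assert two_has_root_of_order(17, 1)
    assert not two_has_root_of_order(17, 2)
    # Example scan over a tiny range of multipliers; a real search would
    # cover far more values of i and would sieve out small factors first.
    hits = [i for i in range(1 << 24, (1 << 24) + 200) if screen(i)]
    print(hits)
```

Criterion 3, the "simple form" of the roots of unity, is not covered by
this sketch.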

I want to find more of these numbers, but it's pretty slow going. The
Alpha here will take weeks to search through all 32-bit values of i,
and if a suitable one isn't found (likely) I'll have to try i*2^32. Is
there anyone here who can lend computer power to the search? I have a
program that uses GMP and is optimized to the hilt; I have also ported
a subset of GMP 3.0 to Windows and can provide a DOS executable if anyone
is interested, as well as source. 

Any help with the theory behind all this or offers of computational
firepower would be greatly appreciated.

jasonp

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 17 Jun 2001 18:27:46 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Top Producers

Hi,

At 07:50 PM 6/17/2001 +0200, Ignacio Larrosa Cañestro wrote:
>I sent in five results during the past week (10-16 June), as my
>personal account report indicates:
>
>But in the Top Producers List (http://www.mersenne.org/top.htm), the
>number of exponents tested increased only from 811 to 815.
>
>Where did the missing exponent go?

One of your exponents dropped out of my database.  Namely
M9318461 was later found to have a factor: 571468890364939711103

Although this isn't a particularly fair way to keep track of the top producers,
it has always been done this way.  Everyone has an equal chance of
losing an LL test.....

Regards,
George



------------------------------

Date: Sun, 17 Jun 2001 21:26:55 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: Good news for Pentium 3 and Celeron 2 owners

Hi all,

        My Celeron-633 arrived two weeks ago and, good news, it
easily overclocked to 950.

        The Celeron 2 core is the same as the Pentium 3 core.  These
both support memory prefetch hints.  I've added these hints to prime95
and am getting about a 20% performance boost.

        Since many mailing list readers want access to these improvements
I'm making another pre-release of v21 available.  This pre-release has not gone
through Ken Kriesel's rigorous QA suite (it's been through a very quick
check).  While I'm fairly confident this version will produce good results,
you have been forewarned.  You can download it at
ftp://mersenne.org/gimps/p95v21a.zip
This is the executable only; replace the prime95.exe from the v20 release
with this new executable.

        There is still no timetable for the official v21 release.  I may be able
to squeeze out a few more percent, and there are new features to add, QA, etc.

        Now the bad news.  The Athlon also supports these memory prefetch
instructions, but the one user who tried it reported no gain.  Also, the new
executable may be slightly slower on the P-II and Celeron 1 as it must
ignore these prefetch instructions.

        Are there any Athlon owners that would like to take a crack at
optimizing the v21 source so that Athlon owners can enjoy a speed boost too?
Alternatively, are there any Athlon owners in the Orlando area that might
be willing to let me use their machine sometime in the next month?

Have fun,
George


------------------------------

Date: Sun, 17 Jun 2001 22:14:38 -0700
From: "Terry S. Arnold" <[EMAIL PROTECTED]>
Subject: Mersenne: Prime95 v21.1a

Folks

I downloaded and installed v21.1a. On a single machine (a P III 600 @
867) it runs about 23% faster. Also, from a single data point on a C I
(300 @ 464), it runs 2% slower. This matches up with what George said
about this version.

Terry

Terry S. Arnold 2975 B Street San Diego, CA 92102 USA
[EMAIL PROTECTED] (619) 235-8181 (voice) (619) 235-0016 (fax)


------------------------------

Date: Mon, 18 Jun 2001 09:25:53 +0200
From: mohk <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Good news for Pentium 3 and Celeron 2 owners

At 03:26 18.06.2001, you wrote:
>Hi all,
>
>         Now the bad news.  The Athlon also supports these memory prefetch
>instructions but the one user who tried it reported no gain.  Also, the new
>executable may also be slightly slower on the P-II and Celeron 1 as it must
>ignore these prefetch instructions.

This is very bad news. I hope we'll have speedups in further versions.


>         Are there any Athlon owners that would like to take a crack at
>optimizing the v21 source so that Athlon owners can enjoy a speed boost too?
>Alternatively, are there any Athlon owners in the Orlando area that might
>be willing to let me use their machine sometime in the next month?

I would let you use my machine, but Germany is far away. :)


>Have fun,
>George

Ditto,

Christian


------------------------------

Date: Mon, 18 Jun 2001 01:34:42 -0700
From: Eric Hahn <[EMAIL PROTECTED]>
Subject: Mersenne: Prime95 - V21.1.1  aka v21a

Hi All,

  I downloaded and ran the new v21a and did some timings on
several different machines and compared them to timings done
on v19 and v20...

  I ran the timings on each version for 100 screen outputs at
100 iterations per screen output...  for a total of 10000
iterations...  and then averaged the iteration times...  I
used the same exponent on all versions and machines...  (an
exponent I am primality testing that is just shy of 11,400,000)

  What I came up with was these results...

                       V19    V20    V21
                      -----  -----  -----
PII at 266Mhz         0.579  0.578  0.586
Celeron 1 at 466Mhz   0.605  0.604  0.615
Celeron 2 at 550Mhz   0.289  0.288  0.223
P3 at 733Mhz          0.239  0.238  0.188
Athlon at 1333Mhz     0.100  0.098  0.077

  Admittedly, the 466MHz Celeron 1 shows almost double the
iteration time I expected... (for some as yet unknown
reason -- Prime95 is getting 98.9% of the CPU time)...  but the %
increase/decrease is still evident....

  Note also that the Athlon *did* have a performance increase
on par with the Celeron 2 and P3 machines....
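
For reference, the relative changes in the table can be worked out with a
few lines of Python. This is just arithmetic on the figures above; the
dictionary layout and the pct_change helper are illustrative, and the
times are assumed to be in the same unit across versions.

```python
# Relative change in iteration time from v20 to v21 for the figures above.
# The data layout and helper name are illustrative only.

timings = {
    # machine: (v19, v20, v21) iteration times as reported
    "PII at 266MHz":       (0.579, 0.578, 0.586),
    "Celeron 1 at 466MHz": (0.605, 0.604, 0.615),
    "Celeron 2 at 550MHz": (0.289, 0.288, 0.223),
    "P3 at 733MHz":        (0.239, 0.238, 0.188),
    "Athlon at 1333MHz":   (0.100, 0.098, 0.077),
}


def pct_change(old, new):
    """Percent reduction in iteration time (negative means slower)."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    for machine, (_v19, v20, v21) in timings.items():
        print(f"{machine:<20} {pct_change(v20, v21):+6.1f}%")
```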

Eric



------------------------------

Date: Mon, 18 Jun 2001 05:20:19 -0400
From: "Rick Pali" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Good news for Pentium 3 and Celeron 2 owners

From: George Woltman

> I've added these hints to prime95 and
> am getting about a 20% performance boost.

My turn to report! :-)

          v20    v21
        --------------
p3/850   0.301  0.215
p3/515   0.398  0.309

The 515 is a slightly overclocked 500Mhz processor. Both machines are
running LL tests against exponents in the 12.8 million range.

Very nice work George! Thanks for all your efforts.

Rick.
-+---
[EMAIL PROTECTED]
http://www.alienshore.com/seeking/


------------------------------

Date: Mon, 18 Jun 2001 12:05:39 +0100
From: "Michael Bell" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Prime95 - V21.1.1  aka v21a

>   Note also that the Athlon *did* have a performance increase
> on par with the Celeron 2 and P3 machines....
>
> Eric

Is it possible that the Athlon that didn't see the increase was an original
Athlon (rather than a T'bird) and so didn't have the prefetch instructions?

Michael.



------------------------------

Date: Mon, 18 Jun 2001 15:52:05 -0600
From: "Matt Goodrich" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Prime95 - V21.1.1  aka v21a

Hmmmm,
I am not getting any performance increase on my 2 Athlons here, and I am
99% sure they are Thunderbirds.
I just tried this new executable on 7 machines. Here are the numbers.

PentiumIII 450MHz Testing M12441xxx
Iteration Times
V.20            V.21
0.330           0.255

PentiumII 233MHz Double Checking M6144xxx
Iteration Times
V.20            V.21
0.298           0.292

PentiumII 350MHz Testing M12316xxx
Iteration Times
V.20            V.21
0.416           0.423

Athlon 850MHz Testing M12328xxx
Iteration Times
V.20            V.21
0.174           0.174

Athlon 1200MHz Testing M12899xxx
Iteration Times
V.20            V.21
0.151           0.151

According to Windows 2000, these 2 Athlons are "Family 6, Model 4,
Stepping 2".

Pentium 166MHz Double Checking M6333xxx
Iteration Times
V.20            V.21
0.601           0.607

Pentium 133MHz Double Checking M6333xxx
Iteration Times
V.20            V.21
0.697           0.692

What I find interesting is that I got an increase, albeit a very minor one,
instead of a decrease in performance on the PentiumII 233 and the P133, yet
my PentiumII 350 and P166 did decrease in performance, like everyone else is
reporting.

I wonder if the operating system has anything to do with this.
The PII 233 and the P133 are running Windows 2000 Professional and the PII
350 and the P166 are running Windows 98 and 95 respectively.

Matt




------------------------------

Date: Tue, 19 Jun 2001 01:43:16 +0100
From: "Michael Bell" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Prime95 - V21.1.1  aka v21a

> Hmmmm,
> I am not getting any performance increase on my 2 Athlon's here, and I am
> 99% sure they are Thunderbirds.

It appears I was wrong, the prefetch is available on all Athlons not just
Thunderbirds, so it must be something else.

Michael.



------------------------------

Date: Mon, 18 Jun 2001 19:16:31 -0500
From: [EMAIL PROTECTED] (Mikus Grinbergs)
Subject: Re: Mersenne: Prime95 - V21.1.1 aka v21a

One more possibility to keep in mind for Athlons:  BIOS level
(and perhaps also which motherboard).

Don't have windows, so can't try the optimized prime95, but
have noticed on "timing runs" with the V20 mprime that my new
(ASUS A7M266) Athlon is now 2% slower than when I first got it.
The only explanation I can come up with is that hardware setup
by the BIOS is responsible for this difference.  (Nothing in my
system has changed since I started, except that I upgraded to the
latest BIOS -- and I typed in the exact same BIOS parameters as
I did originally.)  Perhaps the newer BIOS software has changed
in some manner that would explain the timing difference I saw.

mikus


------------------------------

Date: Mon, 18 Jun 2001 21:16:18 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Prime95 - V21.1.1  aka v21a

Hi,

At 01:43 AM 6/19/2001 +0100, Michael Bell wrote:
> > I am not getting any performance increase on my 2 Athlon's here, and I am
> > 99% sure they are Thunderbirds.

Some Athlons are seeing a speed increase others are not.   The two
that I know are not enjoying a speed increase are running under Win2K.
Maybe there is a bug in the way v21.1 determines if prefetch is supported.

For those Athlon owners that are not seeing a speed boost, try setting
         CpuSupportsPrefetch=1
in local.ini.

Good luck,
George


------------------------------

Date: Mon, 18 Jun 2001 19:55:44 -0600
From: "Matt Goodrich" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Prime95 - V21.1.1  aka v21a

> Some Athlons are seeing a speed increase others are not.   The two
> that I know are not enjoying a speed increase are running under Win2K.
> Maybe there is a bug in the way v21.1 determines if prefetch
> is supported.
>
> For those Athlon owners that are not seeing a speed boost, try setting
>          CpuSupportsPrefetch=1
> in local.ini.

This sure worked for me.

Athlon 850MHz Testing M12328xxx
Iteration Times
V.20            V.21            V.21 with CpuSupportsPrefetch=1 added to local.ini
0.174           0.174           0.141

Athlon 1200MHz Testing M12899xxx
Iteration Times
V.20            V.21            V.21 with CpuSupportsPrefetch=1 added to local.ini
0.151           0.151           0.123

Both computers running Windows 2000 Server. Note that stopping Prime95
(leaving Prime in systray) then editing local.ini, then starting Prime
wasn't good enough. I had to actually close the program, then restart
Prime95.

Matt


------------------------------

Date: Mon, 18 Jun 2001 22:57:44 -0400
From: Nick Glover <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Prime95 - V21.1.1  aka v21a

My 1 GHz Thunderbird in Win98 SE showed speed improvements without putting 
anything in the .ini file.  I used the Advanced->Time option and noted the 
following two anomalies:

1) At 512 KB and only 512 KB, the program returned the following along with 
my timings:

timer 0: 1100364
timer 1: 4152504
timer 2: 3305631
...
and so on for timer #'s 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 20, 21, 22, 23, 30

When I rerun the test, the numbers appearing after "timer #:" are
different, but the exact same timer #'s are output.


2)

FFT size   v20     v21a
192 KB     0.037   0.030
224 KB     0.044   0.038
256 KB     0.048   0.039

224 KB seems to have a disproportionately small amount of improvement.  I 
timed around 200 iterations on 224 KB to verify that 0.038 is the fastest 
time it could get.

Nick Glover: [EMAIL PROTECTED]
Computer Science, Clemson University
Homepage: http://hubcap.clemson.edu/~nglover/

"It's good to be open-minded, but not so open that your brains fall out." - 
Jacob Needleman



------------------------------

Date: Tue, 19 Jun 2001 11:04:10 +-200
From: Denis Cazor <[EMAIL PROTECTED]>
Subject: Mersenne: Prime95 - V21.1.1

Hello, 

At first I was not interested in v21 (I only use Athlons),
but now I am. So where can I find v21.1.1?

Best regards.

Denis Cazor

> Some Athlons are seeing a speed increase others are not.   The two
> that I know are not enjoying a speed increase are running under Win2K.
> Maybe there is a bug in the way v21.1 determines if prefetch is supported.
>
> For those Athlon owners that are not seeing a speed boost, try setting
>          CpuSupportsPrefetch=1
> in local.ini.
>
> Good luck,
> George


------------------------------

Date: Wed, 19 Jun 2002 05:57:45 -0400
From: vincent mooney <[EMAIL PROTECTED]>
Subject: Mersenne: Would you....

George, would you consider adding

Mersenne Number Tests  -  Version 21.1.1

either after the 

"Waiting xx seconds for boot to complete"  (e.g., 13 or 22 or 30 or whatever)

or after

"Resuming primality test of nnn..."

so that the user knows what version of software is running upon boot up. 


------------------------------

Date: Tue, 19 Jun 2001 14:11:09 +0200
From: Philip Heede <[EMAIL PROTECTED]>
Subject: Mersenne: v21 speed increase

For those interested:

The prefetch instructions are also applicable to AMD Duron processors.
It is appropriately autodetected without having to use the
CpuSupportsPrefetch=1 option explicitly.

For my current exponent (6280517) iteration times go from 0.082 to
0.061 on a Duron 850MHz.

--
Sincerely,
Philip Heede


------------------------------

Date: Tue, 19 Jun 2001 18:57:37 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Prime95 - V21.1.1  aka v21a

On 18 Jun 2001, at 21:16, George Woltman wrote:

> Some Athlons are seeing a speed increase others are not.   The two
> that I know are not enjoying a speed increase are running under Win2K.
> Maybe there is a bug in the way v21.1 determines if prefetch is
> supported.

Possibly. But there is another possibility, which may also help 
explain why someone reported that the speedup was not consistent 
across a range of different run lengths.

(Another possibility, if you like conspiracy theories, is that the 
Intel/Microsoft alliance may have done something at Win2K to 
prejudice the performance of AMD processors!)

I believe that, in order for prefetch to work optimally on Athlon 
systems, the data to be prefetched should be aligned on 64 byte 
boundaries. This also applies to the stack segment if there is 
significant use of the stack for local storage. This is because the 
Athlon has 64 byte cache lines, not 32 byte like the PIII. The 
problem here is that, if a structure is aligned on an "odd" 32 byte 
boundary, prefetches will span cache lines and therefore be 
inefficient. On some systems where "odd" 32-byte alignments happen 
(by accident of the way the program is loaded into memory), the 
inefficiency may wipe out the gains which prefetch should achieve.
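
The alignment argument can be illustrated with a little address arithmetic
in Python. This is only a model of which cache lines a block of data
touches, with made-up addresses and a hypothetical helper name; it says
nothing about how Prime95 actually lays out its data.

```python
# Model: how many cache lines does a block of data straddle?
# lines_touched and the example addresses are illustrative assumptions.

def lines_touched(addr, size, line_size):
    """Number of cache lines covered by `size` bytes starting at `addr`."""
    first = addr // line_size
    last = (addr + size - 1) // line_size
    return last - first + 1


if __name__ == "__main__":
    block = 64  # bytes of data we would like to fetch together
    # 0x1000 is 64-byte aligned; 0x1020 sits on an "odd" 32-byte boundary.
    for addr in (0x1000, 0x1020):
        print(hex(addr),
              "32-byte lines:", lines_touched(addr, block, 32),
              "64-byte lines:", lines_touched(addr, block, 64))
```

With 64-byte lines, the block at the "odd" 32-byte boundary covers two
lines instead of one, which is the case the paragraph above warns about.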

I believe that, again because the cache line length is twice as long, 
doubling the prefetch distance for Athlons should also be beneficial; 
in fact, this _quadruples_ the effective prefetch buffer, since 
prefetching each cache line twice (the second request will be 
ignored) will use up the six entry prefetch queue with only three 
actual prefetches.

There is some evidence that Athlons may benefit more from running 
_correctly tuned_ prefetch code than PIIIs.

Unfortunately I can't contribute timings to this argument as none of 
my three Athlon systems can run Prime95 (they're all linux systems).
Must dig out wine & see if Prime95 will run over wine/X/linux, though
that's a mighty roundabout way of running what's essentially a
console program!!!

It would also be interesting to find out if the PIII prefetch code 
benefits the AMD K6 processor. The K6-2 and K6-3 do have an Athlon-
compatible prefetch instruction, though it's using 32-byte cache 
lines and there is no prefetch queue i.e. only one prefetch can be 
active at any time.

As someone else said, all the current K7 family have the same 
instruction set and underlying architecture, though the size and 
relative speed of the L2 cache varies between original Slot A Athlon, 
Socket A "Thunderbird" Athlon and Socket A Duron devices.

BTW what happens to the prefetch instructions on those processors
(like the PII) that don't support prefetch? Replacing the prefetch opcode
with a jump to the next instruction is probably as good a way as any
of patching the code to accommodate these processors. With luck the
jump will be predicted, and the pipeline will have enough capacity to
keep the execution units busy. The usual "NOP" opcodes actually do
something like MOV EAX,EAX which consumes temporary registers and may
even cause a register stall.


Regards
Brian Beesley

------------------------------

Date: Tue, 19 Jun 2001 19:42:22 -0400
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Would you....

On Wed, 19 Jun 2002 05:57:45 -0400, vincent mooney
<[EMAIL PROTECTED]> wrote:

>George, would you consider adding
>
>Mersenne Number Tests  -  Version 21.1.1
>
>either after the 
>
>"Waiting xx seconds for boot to complete"  (e.g., 13 or 22 or 30 or whatever)
>
>or after
>
>"Resuming primality test of nnn..."
>
>so that the user knows what version of software is running upon boot up. 

At least in Prime95, that information is available by selecting 
HELP -> ABOUT from the menu; this method of determining the author and
version of a program is AFAIK somewhat standard.  

Nathan

------------------------------

End of Mersenne Digest V1 #862
******************************
