On 26 Apr 2001, at 21:23, George Woltman wrote:
>
> I seriously doubt GIMPS benchmarks will have an impact on P4 sales!
I don't know ... as I understand the situation, Intel have been
forced to drop their P4 prices way down because the units just aren't
shifting ... magazines keep printing reviews in which 1.4 GHz P4 systems
post benchmark figures similar to those of 1 GHz Athlon systems,
which are about half the price, and even the average PC buyer is
starting to realize that raw clock speed is not necessarily a good
guide to system performance.
The other factor is that most users simply do not (yet) need the raw
power available in GHz+ PCs. Those that do are often at least aware
of projects like GIMPS which can make good use of as much CPU power
as they can get. You have increased the P4 performance of Prime95 by
a factor of almost 3; at this performance level, a P4 system makes
economic sense even before the latest round of price cuts, whereas
without that performance improvement, quite frankly a power user
would have been crazy to buy a P4 system even at less than the
price of an alternative system with a slower-clocked Athlon processor.
It certainly makes a difference to what I will consider next time
I build myself a new system, and to how I will evaluate competing
systems next time my employer is purchasing - and, being a largish
university, that will be by the truckload.
>
> >How many data passes per iteration? I think you may be getting very
> >close to saturating PC600 memory throughput!
>
> Not even close. I use two memory passes. A 512K FFT is 4MB. Two
> reads, two writes, plus say 4MB of sin/cos data is 20MB. PC600 memory
> can deliver 2.0GB/sec. Thus, 20MB / (2.0GB/sec) is 0.01 seconds.
Ah, I thought 512K FFT might take 5 or 6 passes.
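George's back-of-envelope figure can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: K, MB and GB are taken as powers of two, each pass reads and writes the whole FFT array (8 bytes per element) once, the sin/cos tables are read once per iteration, and latency and cache effects are ignored. `fft_memory_time` is a hypothetical helper, and the 6-pass case is included only to compare with the guess of 5 or 6 passes.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def fft_memory_time(fft_len, passes, trig_bytes, bandwidth, word_bytes=8):
    """Return (bytes moved, seconds) for one iteration's memory traffic.

    Assumes every pass reads and writes the whole FFT array once and the
    sin/cos tables are read once; latency and cache hits are ignored.
    """
    array_bytes = fft_len * word_bytes                # 512K doubles -> 4 MB
    traffic = passes * 2 * array_bytes + trig_bytes   # reads + writes + tables
    return traffic, traffic / bandwidth


for passes in (2, 6):
    traffic, seconds = fft_memory_time(512 * KB, passes, 4 * MB, 2 * GB)
    print(f"{passes} passes: {traffic // MB} MB, {seconds * 1000:.1f} ms")
# 2 passes: 20 MB, 9.8 ms
# 6 passes: 52 MB, 25.4 ms
```

The pass count multiplies the array traffic directly, so under these assumptions going from two passes to six raises the bytes moved per iteration from 20 MB to 52 MB.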
>
> Some have asked about Athlon optimizations. I'm not an expert on the
> Athlon CPU. The only change I see to make is a different memory
> layout to take advantage of its different cache layout. I suspect a
> best case improvement of 10%. That's a lot of work (for me) for a
> modest gain.
True enough. But what about implementing prefetch? The model here
would be similar to implementing prefetch for the PIII, though the
details would differ. So there _might_ be a significant benefit for the
large group of Athlon users.
We're certainly not looking at a threefold speedup here, but 10 or 20
percent is significant and valuable - _provided_ the development cost
is not too great. I certainly accept that George has the ultimate
right to direct his development effort towards whatever _he_ finds
"fun".
There is another downside to this - while implementing prefetch on
Athlon and PIII may be similar from the coding point of view, the
opcodes are different - so we would need yet more specific versions
of the code, which could cause a problem with distribution, and would
also need _very_ careful detection of processor type to avoid
unwanted execution of "illegal" opcodes, with the potential to hang
or crash systems. And processor detection is hard to implement
properly because new processor types are released continually.
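One common way to make that detection less fragile is to test the CPUID feature bits for the instructions a code path actually uses, rather than matching vendor and model names. The Python sketch below is an illustration under stated assumptions: reading CPUID itself needs assembly or a compiler intrinsic, so the register values are simply passed in; the `CpuidRegs` type, the path names such as `"3dnow_prefetch"`, and the preference order are hypothetical, not anything Prime95 is known to use. The bit positions (leaf 1 EDX bits 25 and 26 for SSE and SSE2, extended leaf 0x80000001 EDX bit 31 for 3DNow!) are the documented ones.

```python
from dataclasses import dataclass

# Documented CPUID feature bits.
SSE_BIT = 1 << 25        # leaf 0x00000001, EDX: SSE (PREFETCHT0/T1/T2/NTA)
SSE2_BIT = 1 << 26       # leaf 0x00000001, EDX: SSE2
AMD_3DNOW_BIT = 1 << 31  # leaf 0x80000001, EDX: 3DNow! (PREFETCH/PREFETCHW)


@dataclass(frozen=True)
class CpuidRegs:
    """Raw register values, obtained elsewhere (assembly or an intrinsic)."""
    std_edx: int       # EDX returned by leaf 0x00000001
    max_ext_leaf: int  # EAX returned by leaf 0x80000000
    ext_edx: int = 0   # EDX returned by leaf 0x80000001, if that leaf exists


def has_3dnow(regs: CpuidRegs) -> bool:
    # Only trust the extended leaf if the CPU says it implements it.
    return regs.max_ext_leaf >= 0x80000001 and bool(regs.ext_edx & AMD_3DNOW_BIT)


def choose_code_path(regs: CpuidRegs) -> str:
    """Pick a (hypothetical) code path from feature bits, never from model names."""
    if regs.std_edx & SSE2_BIT:
        return "sse2"
    if regs.std_edx & SSE_BIT:
        return "sse_prefetch"
    if has_3dnow(regs):
        return "3dnow_prefetch"
    return "generic_x87"  # safe fallback: no optional opcodes executed


# Synthetic register values with only the relevant bits set.
p4_like = CpuidRegs(std_edx=SSE_BIT | SSE2_BIT, max_ext_leaf=0x80000004)
athlon_like = CpuidRegs(std_edx=0, max_ext_leaf=0x80000008, ext_edx=AMD_3DNOW_BIT)
print(choose_code_path(p4_like))      # sse2
print(choose_code_path(athlon_like))  # 3dnow_prefetch
```

Because a new processor that reports the same feature bits picks up an existing path automatically, the list of known models does not need updating with every release; it does not, of course, tell you which path is *fastest* on a new core.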
> AMD has committed to implementing SSE2 in a future chip.
> Then AMD users will also benefit from this new code.
Assuming the details of the implementation are reasonably compatible.
Intel might have something to say about this (like extortionate fees
for licensing the idea, even if AMD home-grow their own silicon
implementation of the SSE2 instruction set). Remember, SSE and 3DNow!
were mutually incompatible implementations of very similar extensions
to the basic Pentium instruction set.
Regards
Brian Beesley
1775*2^332181+1 is prime! (100000 digits) Discovered 22-Apr-2001