Re: Mersenne: OT: P4 latencies

2001-06-23 Thread Brian J. Beesley

At 11:45 PM 6/22/2001 +0100, Michael Bell wrote:

> >But they clearly don't care about it on the P4:
> >
> >Command    Ticks on P2/P3   Ticks on P4
> >MOV        1                1
> >ADD/SUB    1                1
> >ADC/SBB    2                8
> >MUL        4                14-18
> >SHR/SHL    1                4

There are other things in the P4 architecture which offset this to 
some extent. Nevertheless, as I said earlier, the P4 designers had 
performance criteria weighted differently. If they could have 
squeezed a couple of million more transistors onto the die, no 
doubt they could have got the integer instructions to operate faster.
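To see why the ADC/SBB figure matters for multi-precision integer code, here is a rough sketch (not from the original thread) of the dependent chain in an n-limb addition. The first limb uses ADD, and every later limb uses ADC, which must wait for the previous carry. The sketch takes the quoted latencies at face value (2 ticks on the P2/P3, 8 on the P4) and ignores loads, stores and everything else the core can overlap. Treat the result as the length of the carry chain, not as a benchmark. `carry_chain_ticks` is a hypothetical helper.

```python
# Latencies in clock ticks, as quoted in the thread (P2/P3 vs P4).
LATENCY = {
    "P2/P3": {"ADD": 1, "ADC": 2},
    "P4": {"ADD": 1, "ADC": 8},
}


def carry_chain_ticks(limbs: int, cpu: str) -> int:
    """Length of the dependent ADD/ADC chain for an n-limb add.

    Simplified model: the first limb is a plain ADD; each remaining limb
    is an ADC that cannot start until the previous carry flag is ready.
    """
    if limbs < 1:
        raise ValueError("need at least one limb")
    lat = LATENCY[cpu]
    return lat["ADD"] + (limbs - 1) * lat["ADC"]


if __name__ == "__main__":
    # A 1024-bit add with 32-bit registers needs 32 limbs.
    for cpu in LATENCY:
        print(cpu, carry_chain_ticks(32, cpu))
```

With 32 limbs the chain is 1 + 31 × 2 = 63 ticks on the P2/P3 figures and 1 + 31 × 8 = 249 ticks on the P4 figures.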

Note that the P4 was designed as a workstation processor. Processors 
designed for servers are likely to have a different performance 
weighting. I'd expect the Itanium to fly through integer work - the 
64-bit register width doubles throughput instantly, of course. 
Actually my 3-year-old Alpha 21164-533 is still a most impressive 
system for pure integer work, even though it has less memory bus 
bandwidth than many current 32-bit systems and is running at what is 
no longer an impressive clock speed.
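The factor of two for 64-bit registers can be illustrated with multi-precision addition. This minimal sketch assumes a schoolbook limb-by-limb add. An N-bit add needs ceil(N/w) add-with-carry steps on a machine with w-bit registers, so moving from 32-bit to 64-bit limbs halves the step count. Python integers stand in for hardware registers here. `add_limbs` is a hypothetical helper, not code from any Mersenne program.

```python
def add_limbs(a: int, b: int, bits: int, width: int) -> tuple[int, int]:
    """Add two non-negative `bits`-bit integers limb by limb.

    Returns (sum, steps), where steps is the number of add-with-carry
    operations a machine with `width`-bit registers would perform.
    """
    mask = (1 << width) - 1
    limbs = -(-bits // width)              # ceil(bits / width)
    carry = 0
    result = 0
    for i in range(limbs):
        shift = i * width
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift      # keep the low `width` bits
        carry = s >> width                 # carry into the next limb: 0 or 1
    result |= carry << (limbs * width)     # final carry-out
    return result, limbs


if __name__ == "__main__":
    a = (1 << 1024) - 1
    b = 12345
    for w in (32, 64):
        s, steps = add_limbs(a, b, 1024, w)
        assert s == a + b
        print(w, "bit limbs:", steps, "steps")
```

For a 1024-bit operand this reports 32 steps with 32-bit limbs and 16 with 64-bit limbs.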

On 22 Jun 2001, at 19:32, George Woltman wrote:

> That said, the Athlon is a better performer for most applications
> today. The P4 was designed for higher clock speeds and memory
> bandwidth. Time will tell if Intel can ramp up the CPU speed to offset
> the Athlon's advantages.

Both the 1.33 GHz Athlon and the 1.7 GHz P4 have maximum power 
consumptions in excess of 60 W and are known to be hard to cool 
effectively. In fact the main reason the P4 can run faster is that 
the die is larger, giving more surface area to conduct heat away. 
Significantly faster parts are going to depend on reducing the power 
consumption; both AMD and Intel are planning to shift from 0.18 
micron to 0.13 micron masks, which will help a lot in this respect.
Intel are months ahead of AMD in making the change to 0.13 
micron technology; in fact I believe Intel may already be shipping 
0.13 micron PIII "Tualatin" processors, though I don't know where to 
find motherboards which support the lower operating voltages 
required.
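The textbook first-order model for CMOS switching power explains why the shrink and the lower voltages help. This model is an addition here, not part of the original post. Dynamic power scales with the activity factor α, the switched capacitance C, the square of the supply voltage V, and the clock frequency f. The 15% voltage cut below is a purely hypothetical figure chosen for illustration, not a Tualatin specification. The calculation also assumes α is unchanged.

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\Rightarrow\qquad
\frac{P'}{P} = \frac{C'}{C}\left(\frac{V'}{V}\right)^{2}\frac{f'}{f},
\qquad
\text{e.g. } V' = 0.85\,V,\; C' = C,\; f' = f:\quad \frac{P'}{P} = 0.85^{2} \approx 0.72
```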


Regards
Brian Beesley
_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Mersenne: OT: P4 latencies

2001-06-22 Thread George Woltman

Hi,

At 11:45 PM 6/22/2001 +0100, Michael Bell wrote:
>But they clearly don't care about it on the P4:
>
>Command    Ticks on P2/P3   Ticks on P4
>MOV        1                1
>ADD/SUB    1                1
>ADC/SBB    2                8
>MUL        4                14-18
>SHR/SHL    1                4

Evaluating an architecture based on just latencies can be very misleading.
For example the P4 has longish latencies on the SSE2 instructions that
prime95 uses (6 for a load, 6 for a mul, 4 for an add).  However, there is
enough parallelism available that these latencies are almost completely
hidden.
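Here is a toy model of the latency hiding described above. It assumes a single fully pipelined unit that can start one instruction per cycle, with a result latency of 6 cycles (the figure given for an SSE2 multiply). With one dependent chain the unit sits idle most of the time. With six independent chains interleaved, it starts a new instruction almost every cycle. Real out-of-order scheduling on the P4 is far more involved, so this only shows the arithmetic of the idea. `finish_cycle` is a hypothetical helper.

```python
def finish_cycle(num_chains: int, ops_per_chain: int, latency: int) -> int:
    """Cycle at which the last result is available.

    Toy model: one pipelined unit issues at most one op per cycle; each op
    depends on the previous op in its own chain, and its result is ready
    `latency` cycles after issue. Chains are independent of each other.
    """
    ready = [0] * num_chains                 # cycle each chain's last result is ready
    remaining = [ops_per_chain] * num_chains
    cycle = finish = 0
    while any(remaining):
        for c in range(num_chains):          # issue from the first chain whose input is ready
            if remaining[c] and ready[c] <= cycle:
                ready[c] = cycle + latency
                remaining[c] -= 1
                finish = max(finish, ready[c])
                break
        cycle += 1
    return finish


if __name__ == "__main__":
    ops, lat = 600, 6
    one = finish_cycle(1, ops, lat)          # one long dependent chain
    six = finish_cycle(6, ops // 6, lat)     # same work split into 6 independent chains
    print(one, six)                          # prints: 3600 605
```

Six chains deliver 600 results in 605 cycles, close to one per cycle. A single chain needs 3600 cycles. The latency only costs little when the code exposes enough independent work.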

Other architecture features that may well be more important:
- Cache structure & memory bandwidth
- Number of registers (to help the programmer expose parallelism)
- Branch prediction and miss penalties
And many, many more.

That said, the Athlon is a better performer for most applications today.
The P4 was designed for higher clock speeds and memory bandwidth.
Time will tell if Intel can ramp up the CPU speed to offset the Athlon's
advantages.

Regards,
George
