(Sorry, I'm not subscribed to this list and have no clue if this reply appears as I think it should....)
Joe Temple made some excellent points about CPU cycles, and I would just like to add another thought which I think is extremely important, especially in large SMP systems: the speed of electricity (or light). Electricity travels at less than 300,000 km/sec, i.e. 300,000,000 m/sec or 300,000,000,000 mm/sec. If you start stripping off zeros for kHz, MHz and GHz, you find that an electric signal travels 30 centimeters in the time a 1 GHz processor takes for one processing cycle (15 centimeters on a 2 GHz processor). So packaging these processors, the memory and the other vital components as close together as possible becomes much more important than the speed of CPU cycles. The IBM zSeries does an excellent job here, with 20 CPUs (16 for the OS, 3 for the I/O subsystem and 1 as a spare) on a 13 by 13 cm substrate, all sharing the same L2 cache. Compare this to high-end Unix systems with their CPU boards METERS apart. So in my opinion, packaging is becoming more important than GHz, because the speed of electricity (or light) seems to be becoming the biggest enemy of performance improvements. This is certainly not true for single CPU-bound tasks, but it is definitely true for most commercial workloads. Kind regards, Peter Stammbach
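The arithmetic above can be sketched in a few lines; this is a back-of-the-envelope check assuming signals propagate at the vacuum speed of light (real on-chip and board-level signals are slower, so these figures are upper bounds):

```python
# How far a signal can travel in one clock period, assuming
# propagation at the vacuum speed of light (an upper bound).
C_M_PER_SEC = 300_000_000  # ~speed of light in metres per second

def distance_per_cycle_cm(clock_hz: float) -> float:
    """Distance a signal covers during one clock cycle, in centimetres."""
    return C_M_PER_SEC / clock_hz * 100

for ghz in (1, 2):
    cm = distance_per_cycle_cm(ghz * 1e9)
    print(f"{ghz} GHz: {cm:.0f} cm per cycle")
# → 1 GHz: 30 cm per cycle
# → 2 GHz: 15 cm per cycle
```

At 2 GHz the signal budget per cycle is already close to the 13 cm substrate dimension quoted above, which is why component placement matters so much.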
Mark Darvodelsky wrote: "But the question still does not appear to be answered - why does the mainframe have to run at such a low clock speed?"
The answer to your question has to do with how chip real estate is used. In a zSeries microprocessor the primary use of area is for large L1 caches and error detection/recovery hardware. Basically, increases in cache size result in decreases in clock rate, because there is more load on the critical signals. Secondly, to date the zSeries microprocessor pipeline does not do "super scalar" processing; that is, it finishes at most 1 instruction per cycle. This is because it takes considerably more work and hardware to do mainframe-style error recovery functions when more than 1 instruction can complete in a cycle. While super scalar execution does not help with "clock speed", it does help with CPU-intense measurements like SPECint.
However, since the cache is larger, the zSeries will wait for memory less often than other machines. Metrics like SPECint and MHz ignore cache misses. So the question becomes: how much are the caches missing? The more they miss, the better the zSeries looks. This is very workload dependent. One driver of cache misses is context switches; another is I/O. If you attempt to make an Intel server very busy, the cache miss rate will climb, causing throughput to saturate, unless the work is very CPU intense and the cache working set per transaction or per user is very small.
The reason Robert Nix's print server debacle occurred is that IBM made the mistake of treating Samba file/print as a single type of workload. We didn't understand at the time that a print server can behave like a network-to-network protocol server. These servers actually move very little data through the CPU. Such a machine has very little context switching, and the I/O is network to network, which drives very little data through the caches. The combination makes the workload CPU intense and, if busy, a bad candidate for Linux/z. By contrast, a Samba file server can be doing enough disk-to-network I/O, changing blocks to packets, to push much more data through the caches. This can cause distributed servers to become I/O and cache bound. Samba can be either CPU or I/O intense, and the single context makes the CPU-intense workloads unattractive for z, particularly if the machines are busy.
So the answer to your question is that we could build a zSeries microprocessor which is "as fast as" any other processor, but to do so would cause us to lose the fundamental strengths in context switching, data caching and I/O. There is always a trade-off between speed and capacity. zSeries favors capacity; Intel favors speed. How much L1 cache should be given up to increase the clock rate? How much RAS and recovery function should be given up to improve SPECint? We have seen this situation improve over time, and IBM will continue to improve its microprocessor design, but zSeries cannot simply abandon its strength in large-working-set workloads to crank up the clock speed and/or instruction rate for workloads with small working sets. This is particularly true when the virtualization and workload management which drive consolidation and mixed workloads depend on the very hardware capabilities that would have to be given up.

Joe Temple [EMAIL PROTECTED] 845-435-6301 295/6301 cell 914-706-5211 home 845-338-8794

Mark Darvodelsky <Mark_Darvodelsky@royalsun.com.au> wrote on 02/16/2003 08:32 PM to the Linux on 390 Port, Subject: Re: URGENT! really low performance. A related question...

But the question still does not appear to be answered - why does the mainframe have to run at such a low clock speed? Perhaps someone with some hardware knowledge could explain it? Why can't the clock be cranked up to be the same speed as the latest Pentium? Most of us mainframe guys understand its inherent advantages, but as someone has already commented, it often just doesn't wash with management if a cheap Pentium outperforms a million-dollar mainframe.

Regards. Mark Darvodelsky, Data Centre - Mainframe & Facilities, Royal SunAlliance Australia, Phone: +61-2-99789081, Email: [EMAIL PROTECTED]