(Sorry, I'm not subscribed to this list and have no clue if this reply
appears as I think it should....)

Joe Temple made some excellent points about CPU cycles, and I would just
like to add another thought which I think is extremely important
(especially in large SMP systems): the speed of electricity (or light).
Electricity travels at less than 300,000 km/sec, or 300,000,000 meters/sec,
or 300,000,000,000 mm/sec. So if you start stripping off zeros for
kHz, MHz, and GHz, you find that an electrical signal travels 30 centimeters
in the time a 1 GHz processor needs for one processing cycle (15 centimeters
on a 2 GHz processor). So packaging these processors, their memory, and
other vital components as close together as possible becomes much more
important than the speed of CPU cycles. The IBM zSeries does an excellent
job here, with 20 CPUs (16 for the OS, 3 for the I/O subsystem, and 1 as
a spare) on a 13 cm by 13 cm substrate, all sharing the same L2 cache.
Compare this to high-end Unix systems, which have their CPU boards METERS
apart. So in my opinion, packaging is becoming more important than GHz,
because the speed of electricity (or light) seems to be becoming the
biggest enemy of performance improvements. This is certainly not true for
single-CPU-bound tasks, but it is definitely true for most commercial workloads.
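A quick back-of-envelope sketch of the distances above (my own check, not from IBM; real on-chip signals propagate at well below the speed of light, which only strengthens the argument):

```python
# Distance an electrical signal at roughly the speed of light covers
# in one processor clock cycle.
C_MM_PER_SEC = 300_000_000_000  # ~300,000 km/sec expressed in mm/sec

def mm_per_cycle(clock_hz):
    """Millimeters traveled in one clock period at ~c."""
    return C_MM_PER_SEC / clock_hz

print(mm_per_cycle(1e9))  # 1 GHz -> 300.0 mm, i.e. 30 centimeters
print(mm_per_cycle(2e9))  # 2 GHz -> 150.0 mm, i.e. 15 centimeters
```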

Kind regards, Peter Stammbach



Mark Darvodelsky wrote: "But the question still does not appear to be
answered - why does the
mainframe have to run at such a low clock speed?"

The answer to your question has to do with how chip real estate is used.
In a zSeries microprocessor the primary use of area is for large L1
caches and error detection/recovery hardware.  Basically, increases in
cache size result in decreases in clock rate, because there is more load
on the critical signals.  Secondly, to date the zSeries microprocessor
pipeline does not do "superscalar" processing; that is, it finishes at
best one instruction per cycle.  This is because it takes considerably
more work and hardware to do mainframe-style error recovery functions
when more than one instruction can complete in a cycle.  While
superscalar execution does not help with "clock speed," it does help with
CPU-intense measurements like SPECint.
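A rough illustration of that last point (the 3-wide figure is my own illustrative number, not a measured one): superscalar issue multiplies best-case instruction throughput without touching the clock.

```python
def instr_per_sec(clock_hz, completed_per_cycle):
    """Best-case instruction throughput: clock rate times
    instructions retired per cycle."""
    return clock_hz * completed_per_cycle

single_issue = instr_per_sec(1.0e9, 1)  # zSeries-style: 1/cycle at best
superscalar = instr_per_sec(1.0e9, 3)   # hypothetical 3-wide core, same clock

print(superscalar / single_issue)  # -> 3.0x on CPU-bound code like SPECint
```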

However, since the cache is larger, the zSeries will wait for memory less
often than other machines.  Metrics like SPECint and MHz ignore cache
misses, so the question becomes: how much are the caches missing?  The more
they miss, the better the zSeries looks.  This is very workload dependent.
One driver of cache misses is context switches; another is I/O.  If you
attempt to make an Intel server very busy, the cache miss rate will climb,
causing throughput to saturate, unless the work is very CPU intense and
the cache working set per transaction or per user is very small.
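A toy model of why MHz alone misleads (the miss rates and penalty below are invented for illustration, not measurements of any real machine): cache misses add stall cycles to every instruction, so a slower clock that misses less can deliver more work.

```python
def effective_cpi(base_cpi, misses_per_instr, miss_penalty_cycles):
    """Effective cycles per instruction once memory stalls are counted:
    base CPI plus misses per instruction times the penalty per miss."""
    return base_cpi + misses_per_instr * miss_penalty_cycles

# 2 GHz machine with a small cache (5% miss rate) vs. a 1 GHz machine
# with a big cache (1% miss rate), both paying 200 cycles per miss:
fast_clock = 2.0e9 / effective_cpi(1.0, 0.05, 200)  # instructions/sec
big_cache = 1.0e9 / effective_cpi(1.0, 0.01, 200)   # instructions/sec

print(fast_clock < big_cache)  # the slower clock wins on this workload
```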

The reason Robert Nix's print server debacle occurred is that IBM made
the mistake of treating Samba file/print as a single type of workload. We
didn't understand at the time that a print server can behave like a
network-to-network protocol server.  These servers actually move very
little data through the CPU.  Such a machine has very little context
switching, and the I/O is network to network, which drives very little
data through the caches.  The combination makes the workload CPU intense
and, if busy, a bad candidate for Linux/z.  By contrast, a Samba file
server can be doing enough disk-to-network I/O, changing blocks to
packets, that it pushes much more data through the caches.  This can
cause distributed servers to get I/O and cache bound.  Samba can be
either CPU or I/O intense, and the single context makes the CPU-intense
workloads unattractive for z, particularly if the machines are busy.

So the answer to your question is that we could build a zSeries
microprocessor which is "as fast as" any other processor, but to do so
would cause us to lose the fundamental strengths in context switching,
data caching, and I/O.  There is always a trade-off between speed and
capacity.  zSeries favors capacity; Intel favors speed.  How much L1
cache should be given up to increase the clock rate?  How much RAS and
recovery function should be given up to improve SPECint?  We have seen
this situation improve over time, and IBM will continue to improve its
microprocessor design, but zSeries cannot simply abandon its strength in
large-working-set workloads to crank up the clock speed and/or
instruction rate for workloads with small working sets.  This is
particularly true when the virtualization and workload management which
drive consolidation and mixed workloads are dependent on the very
hardware capabilities that would have to be given up.


Joe Temple
[EMAIL PROTECTED]
845-435-6301  295/6301   cell 914-706-5211 home 845-338-8794



Mark Darvodelsky <Mark_Darvodelsky@royalsun.com.au>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: URGENT! really low performance. A related question...
02/16/2003 08:32 PM
Please respond to Linux on 390 Port

But the question still does not appear to be answered - why does the
mainframe have to run at such a low clock speed?

Perhaps someone with some hardware knowledge could explain it? Why can't
the clock be cranked up to be the same speed as the latest Pentium?


Most of us mainframe guys understand its inherent advantages, but as
someone has already commented, it often just doesn't wash with management
if a cheap Pentium outperforms a million-dollar mainframe.
Regards.
Mark Darvodelsky
Data Centre - Mainframe & Facilities
Royal SunAlliance Australia
Phone: +61-2-99789081
Email: [EMAIL PROTECTED]
