Charles Mills wrote:
> Are you sure? That's totally contrary to my impression.
> 
> There are three states for the above machine:
> 
> - both tasks waiting for I/O
> - one task waiting for I/O and the other task computing
> - either both tasks computing, or if a single CPU, one computing and the
> other ready-to-run and waiting for the CPU
> 
> Clearly processor speed is irrelevant to the first state. For the second
> state, a single, faster processor is clearly an advantage, because the
> single running task will run faster (and could not take any advantage of two
> CPUs). For the final state, you either have one task running at "200 MIPS"
> or two tasks running at "100 MIPS" - roughly equivalent situations from a
> thruput point of view. So clearly, the two 100-MIPS CPUs are no faster in
> the first state, slower in the second state, and no faster in the third
> state - and therefore almost certainly slower, not faster, overall. (Even
> before we consider the multi-processor overhead that you alluded to in your
> full post.)

for a two-processor SMP ... an SMP kernel can add possibly 20-30 percent
overhead (your mileage may vary) compared to a uniprocessor kernel running
on a single-processor machine.

370s had extremely strong memory consistency, and for cache operations
... a two-processor 370 SMP would run the processor hardware at 90
percent of a uniprocessor ... to allow for handling cross-cache
consistency chatter ... so the bare two-processor hardware started out
at 1.8 times that of a uniprocessor. add in typical smp kernel
overhead and a two-processor smp got something like 1.5 times the
thruput of a uniprocessor.
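that arithmetic can be sketched as a toy model (my reading of the numbers
above, not any official formula -- in particular, treating kernel overhead
as inflating the total work by 20 percent is an assumption):

```python
def smp_effective_throughput(n_cpus, hw_slowdown=0.9, kernel_overhead=0.20):
    # illustrative model: each processor is clocked down to hw_slowdown
    # of uniprocessor speed to allow for cross-cache chatter, and the
    # smp kernel inflates the total work by kernel_overhead
    raw_hardware = n_cpus * hw_slowdown           # 2 * 0.9 = 1.8
    return raw_hardware / (1.0 + kernel_overhead) # 1.8 / 1.2 ~= 1.5

print(smp_effective_throughput(2))  # ~1.5 times one uniprocessor
```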

there were games I played with highly optimized and extremely efficient
smp kernel processing, along with some games related to cache affinity
... and sometimes you could come out with thruput greater than two times
a uniprocessor (having twice the aggregate cache size, plus the cache
affinity games improving cache hit ratios, more than compensating for the
base two-processor machine hardware running at only 1.8 times a single
processor).

when the 3081 came out, it was only going to be available in
multiprocessor versions (so the uniprocessor vis-a-vis multiprocessor
slow-down wasn't going to be evident). however, ACP/TPF didn't have
multiprocessor support, and that represented a significant customer base.
frequently you found ACP/TPF running under VM on a 3081 (solely using VM
to manage two-processor operation). eventually they were forced into
coming out with the single-processor 3083 ... whose processor ran nearly
15 percent faster than a 3081 processor (because of the elimination of
the slow-down provisions for cross-cache chatter).

running the processors (in multiprocessor mode) at only .9 of
uniprocessor speed (to allow for cross-cache chatter) was only the start.
any actual cross-cache chatter could result in even further hardware
thruput degradation. going to the four-processor 3084 ... the
cross-cache chatter effects got worse (in the two-processor case, a
cache was getting signals from one other cache; in the four-processor
case, a cache was getting hit with signals from three other caches).
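the scaling point can be sketched with another toy model (the 0.9 base is
from the text; the per-peer penalty is a made-up illustrative number):

```python
def per_processor_factor(n_cpus, base=0.9, per_peer_penalty=0.02):
    # each cache now fields chatter from (n_cpus - 1) other caches,
    # so the additional degradation grows with the number of peers;
    # per_peer_penalty is purely illustrative
    return base - per_peer_penalty * (n_cpus - 1)

# two-way: each cache hears one peer; four-way: three peers
print(per_processor_factor(2), per_processor_factor(4))
```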

in that time-frame you saw both the VM and MVS kernels restructured so
that internal kernel structures and storage management were carefully
laid out on cache-line boundaries and done in multiples of cache-lines
... to reduce the impact of things like cache-line thrashing. that
restructuring supposedly got something like a five percent overall
system thruput increase.
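the storage-management part of that restructuring amounts to rounding
every allocation up to whole cache lines (sketch only; the 64-byte line
size is a hypothetical stand-in, not the actual 370-era value):

```python
CACHE_LINE = 64  # hypothetical line size; substitute the machine's actual value

def aligned_size(nbytes, line=CACHE_LINE):
    # round an allocation up to a whole number of cache lines, so a
    # control block never shares its last line with an unrelated block
    # (one way to cut down on cache-line thrashing between processors)
    return ((nbytes + line - 1) // line) * line

print(aligned_size(100))  # -> 128
```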

there was some joke that, to compensate for the smp cache effects, the
3090 caches ran on a machine cycle ten times faster than the 3090
processor machine cycle.

there can be secondary considerations. in the 158/168 time-frame ...
the 370/158-3, at around 1 MIPS, was about at the knee of the
price/technology curve. the 370/168-3, at around 3 MIPS, was way past
the knee of the price/technology curve ... and cost significantly more
to build/manufacture.

at one point we had a project called logical machines to build a 16-way
smp using 158-3 engines ... that would still cost less to manufacture
(parts plus build) than a single-processor 168. we were going great guns
... until some upper executive realized that MVS was never going to be
able to ship 16-way SMP support within the lifetime of the project and
killed the effort (in part because it wouldn't look good if there was a
flagship 16-way smp hardware product and no MVS capable of running on
it). we had also relaxed some cache consistency requirements ... which
made it much less painful getting to 16-way smp operation.

something similar to your description of machine processing states has
also been used in the past to describe the overhead of virtual machine
operation. if the guest is in wait state ... the virtual machine
overhead is zero. if the guest is executing only problem-state
instructions, the virtual machine overhead is zero. it isn't until the
guest starts executing various supervisor-state instructions that you
start to see various kinds of virtual machine overhead.
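those three cases can be sketched as a toy cost model (the 10x simulation
penalty is a made-up illustrative factor, not a measured number):

```python
def virtualized_cost(mix, sim_penalty=10.0):
    # mix: counts of guest activity by kind. problem-state instructions
    # run natively (cost 1 each), supervisor-state instructions must be
    # simulated by the hypervisor (sim_penalty is purely illustrative),
    # and guest wait state costs the hypervisor nothing at all
    return (mix.get("problem", 0) * 1.0
            + mix.get("supervisor", 0) * sim_penalty)

# all problem-state: same cost as the bare machine, i.e. zero overhead
print(virtualized_cost({"problem": 1000}))  # -> 1000.0
```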

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
