Charles Mills wrote:
> Are you sure? That's totally contrary to my impression.
>
> There are three states for the above machine:
>
> - both tasks waiting for I/O
> - one task waiting for I/O and the other task computing
> - either both tasks computing, or if a single CPU, one computing and the
>   other ready-to-run and waiting for the CPU
>
> Clearly processor speed is irrelevant to the first state. For the second
> state, a single, faster processor is clearly an advantage, because the
> single running task will run faster (and could not take any advantage of
> two CPUs). For the final state, you either have one task running at "200
> MIPS" or two tasks running at "100 MIPS" - roughly equivalent situations
> from a thruput point of view. So clearly, the two 100-MIPS CPUs are no
> faster in the first state, slower in the second state, and no faster in
> the third state - and therefore almost certainly slower, not faster,
> overall. (Even before we consider the multi-processor overhead that you
> alluded to in your full post.)
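the three-state argument above can be roughed out with a toy model (a sketch, purely illustrative ... the MIPS figures and states come from the quote; the 0.9 per-processor derate is the cross-cache allowance discussed below, and state dwell times are ignored):

```python
# toy model of the three states above: aggregate useful MIPS as a
# function of how many tasks are ready to run. `derate` models the
# per-processor slowdown for cross-cache chatter (1.0 = none; 0.9 is
# the two-processor 370 figure discussed below). illustrative only.

def useful_mips(mips_per_cpu, ncpu, runnable_tasks, derate=1.0):
    """aggregate useful MIPS; idle processors contribute nothing"""
    busy = min(ncpu, runnable_tasks)
    return mips_per_cpu * derate * busy

# one 200-MIPS uniprocessor vs. two 100-MIPS processors
for runnable in (0, 1, 2):
    uni = useful_mips(200, 1, runnable)
    smp = useful_mips(100, 2, runnable, derate=0.9)
    print(f"{runnable} task(s) runnable: uni={uni:.0f} MIPS, smp={smp:.0f} MIPS")
```

in no state does the two-processor pair come out ahead, which is the point of the argument; the derate just makes the both-tasks-computing state slightly worse for the pair.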
for a two-processor SMP ... an SMP kernel can add possibly 20-30 percent overhead (your mileage may vary) compared to a uniprocessor kernel running on a single-processor machine. 370s had extremely strong memory consistency, and to allow for handling cross-cache consistency chatter, a two-processor 370 SMP ran the processor hardware at 90 percent of a uniprocessor ... so the bare two-processor hardware started out at 1.8 times that of a uniprocessor. add in typical SMP kernel overhead, and a two-processor SMP got something like 1.5 times the thruput of a uniprocessor.

there were games I played with highly optimized and extremely efficient SMP kernel processing, along with some games related to cache affinity ... and sometimes you could come out with thruput greater than two times a uniprocessor (having twice the aggregate cache size, plus the games with cache affinity and cache hit ratios, more than compensating for the base two-processor machine hardware running at only 1.8 times a single processor).

when the 3081 came out, it was only going to be available in multiprocessor versions (so the uniprocessor vis-a-vis multiprocessor slow-down wasn't going to be evident). however, ACP/TPF didn't have multiprocessor support, and that represented a significant customer base. frequently you found ACP/TPF running under VM on a 3081 (using VM solely to manage two-processor operation). eventually they were forced into coming out with the single-processor 3083 ... which had an individual processor that ran nearly 15 percent faster than a 3081 processor (because of the elimination of the slow-down provisions for cross-cache chatter).

running the processors (in multiprocessor mode) at only .9 that of a uniprocessor (to allow for cross-cache chatter) was only the start. any actual cross-cache chatter could result in even further hardware thruput degradation. going to the four-processor 3084 ...
the cross-cache chatter effects got worse (in the two-processor case, a cache was getting signals from one other cache; in the four-processor case, a cache was getting hit with signals from three other caches). in that time-frame you saw both the VM and MVS kernels restructured so that internal kernel structures and storage management were carefully laid out on cache-line boundaries and done in multiples of cache-lines ... to reduce the impact of stuff like cache-line thrashing. that restructuring supposedly got something like a five percent overall system thruput increase. there was some joke that, to compensate for the SMP cache effects, the 3090 caches used a machine cycle ten times faster than the 3090 processor machine cycle.

there can be secondary considerations. in the 158/168 time-frame ... the 370/158-3, at around 1 MIPS, was about at the knee of the price/technology curve. the 370/168-3, at around 3 MIPS, was way past the knee of the price/technology curve ... and cost significantly more to build/manufacture. at one point we had a project called logical machines to build a 16-way SMP using 158-3 engines ... that still cost less to manufacture (parts plus build) than a single-processor 168. we were going great guns ... until some upper executive realized that MVS was never going to be able to ship 16-way SMP support within the lifetime of the project, and killed the effort (in part because it wouldn't look good if there was a flagship 16-way SMP hardware product and no MVS capable of running on it). we had also relaxed some cache consistency requirements ... which made it much less painful getting to 16-way SMP operation.

something similar to your description of machine processing states has also been used in the past to describe the overhead of virtual machine operation. if the guest is in wait state ... the amount of virtual machine overhead is zero.
if the guest is executing only problem-state instructions, the virtual machine overhead is zero. it isn't until you start executing various supervisor-state instructions that you start to see the various kinds of virtual machine overhead degradation.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html