mw...@ssfcu.org (Ward, Mike S) writes:
> This is one area where I really have a problem. It used to be back in
> the 370 days that if a machine was rated at 50 mips and you moved up
> to 100 mips you really noticed the difference in execution time. Today
> if you have a 100 mip machine (I know they're rated at msu's not mips)
> and you moved up to a dual with 160 mips you might be cutting your own
> throat. They may give you 2 processors each rated at 80 mips for a
> total of 160 mips. If your workload is such that it can't take
> advantage of dual processors then you have just dropped down to an 80
> mip machine when you used to have a 100 mip machine. I know I'm on a
> rant, but it happened to us and we were being pressured by the vendor
> to go to the dual processor and told that we would be very happy. We
> weren't. (end of rant)
On 370s, and for a few generations after, going from uniprocessor to dual processor started off by slowing the machine cycle of each processor down by 10% ... basically giving the caches a little headroom to handle cross-cache invalidations from the other cache (with store-through processor caches, every store operation also sent an invalidation signal to the other cache for that cache line). So basic two-processor hardware ran at 1.8 times a single processor. Then operating system multiprocessor overhead would increase (back when single-processor MVS "capture ratio" could be 50%) ... leaving even fewer cycles for application execution; i.e. the same exact 10 MIPS uniprocessor would start out as only a 9 MIPS processor in two-processor mode. Note that the actual handling of cross-cache invalidations was over and above the 10% cycle slowdown (in real live operation, a 10 MIPS processor running at 9 MIPS would effectively deliver less than 9 MIPS, further reduced by multiprocessor operating system overhead and the cache overhead of handling cross-cache invalidation signals).

The strategy with the 3081 was to never again offer a single processor at the high end. This ran into a couple of problems ... clone processor vendors were offering uniprocessors, and ACP/TPF didn't have multiprocessor support. All sorts of unnatural acts were done to try and make a 3081 acceptable to ACP/TPF (and head off the customer base all moving to clone processors). This is besides the issues outlined here about comparison between the 3081 and clone processors:
http://www.jfsowa.com/computer/memo125.htm

Eventually there was the 3083 (in large part for the ACP/TPF market), created by removing a processor from a 3081 (which is not as simple as you might think: processor 0 was at the top of the frame, so processor 1, in the middle of the frame, would be the one removed ... but that made the frame dangerously top-heavy).
Being only a single processor, turning off the 10% cross-cache slowdown made a 3083 processor nearly 15% faster than a processor in a 3081.

Combining two 3081s into a four-processor 3084 was a big challenge ... since it meant that each processor's cache would be getting cross-cache invalidation signals from three other caches (not just one). Kernel storage use became significant, so operating systems running on the 3084 were made cache-line sensitive: all kernel storage was changed to align on cache-line boundaries and to be allocated in multiples of cache lines. The problem was that if the end of one storage area and the start of a different storage area fell within the same cache line, the two areas could be in use by different processors simultaneously; since they occupy only a single block for cache management, the result could be cache-line "thrashing" (what would now be called false sharing). The cache-sensitivity change was claimed to improve 3084 throughput by 5-6% (by minimizing cache-line thrashing).

However, higher-end 370 processor throughput was quite sensitive to cache hit ratios ... which would be seriously degraded by a high rate of asynchronous I/O interrupts. For my "resource manager" I did some hacks: at high I/O rates, turning off enablement for I/O interrupts for periods of time and then draining all pending I/O interrupts in one pass. I could demonstrate higher aggregate throughput (even I/O throughput), since batching the I/O interrupts gave much higher processor throughput (because of better cache hit ratio), offsetting any delay in taking the interrupts (note that part of 370/XA was an attempt to address the same issue with various kinds of I/O queuing in the hardware).

When I first did two-processor 370 support, I was able to deploy, in a production environment, two processors running at more than twice the MIP rate of a single processor ... even with each processor cycle only running at .9 that of a single processor.
Some games with cache affinity allowed an improved cache hit ratio ... which more than offset the 10% slowdown in processor cycle.

--
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN