Brian Inglis <[EMAIL PROTECTED]> writes:
> Might be relevant if Lynn Wheeler could expand on the unreleased VAMPS
> microcode to speed up 370 SMP, and also provided logical processors
> with similarities to those on current zSeries LPARs, although that may
> just have dropped parts of 370 sequential code down into microcode.

so presumably this recent post vis-a-vis vamps and the later i432
http://www.garlic.com/~lynn/2006n.html#42 Why is zSeries so CPU poor?

misc. collected past vamps postings
http://www.garlic.com/~lynn/subtopic.html#bounce

early microcode effort was "VMA", originally for 370/158, that helped virtual machine performance. for a subset of "supervisor" state instructions, microcode was added to execute the instruction using "virtual machine" rules (to avoid interrupting into the virtual machine hypervisor where the instruction was simulated).

concurrent with VAMPS effort was "ECPS" for 370 138&148. ECPS did some more stuff like VMA on the 158 (direct supervisor state instruction execution) ... but it also identified parts of the hypervisor kernel and moved that kernel code into microcode. the issue on 138&148 machines was that there was an avg. of 10:1 microcode instructions executed for every 370 instruction. Much of the kernel code moved to microcode on a straight 1:1 basis, resulting in a ten times performance speed up for that code.

old posting identifying specific kernel code segments for migrating into microcode.
http://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist

the VMA-related efforts eventually evolved into SIE ... where nearly all supervisor state instructions had microcode enhancement for directly executing with regard to virtual machine rules (avoiding a lot of interruption into virtual machine hypervisor to simulate supervisor state instructions). SIE was a state change instruction that gathered up all the fields needed by various supervisor state instructions to execute according to "virtual machine" rules.

post of old SIE discussion about implementation issue differences between 3081 and "trout" (3090)
http://www.garlic.com/~lynn/2006j.html#27 virtual memory

there were still things like page faults for the virtual machine that resulted in interruptions into the hypervisor kernel for handling.
a special case was defined involving things like dedicated real storage for a virtual machine ... eliminating the need to interrupt into the hypervisor kernel. This resulted in being able to operate a virtual machine subset directly supported by hardware ... w/o the need for a virtual machine kernel. This was called "PR/SM" ... and PR/SM capability eventually evolved into the current LPARs (logical partitions). a reference discussing some current LPAR and PR/SM
http://researchweb.watson.ibm.com/journal/rd/483/siegel.html

current machines can have a configurable limited number of LPARs ... and it is possible to run a virtual machine hypervisor in an LPAR, which in turn supports a much larger number of virtual machines. There has been an evolution of the SIE support. Initially, SIE was not virtualized, but LPARs make use of SIE for support. That meant that a virtual machine hypervisor running in an LPAR wouldn't have the performance assist of SIE for running its virtual machines (all virtual machine supervisor instructions would interrupt into the hypervisor for simulation). Enhancements were required to virtualize SIE for at least one level (so it could be used both by the LPAR function and also by a hypervisor running in an LPAR).

Since I was doing both VAMPS and ECPS ... I borrowed a lot of stuff done for ECPS for doing VAMPS. However, for VAMPS, I wanted it extended in a much more architected way ... rather than simply doing a 1-for-1 movement of existing kernel 370 code into microcode. VAMPS was to have up to five processors ... and I defined a microcode hardware queued work interface where the hypervisor put units of work on the queued work interface (and the microcode took the queued work and executed it on whatever processors were available). The hardware microcode also placed queued work for the hypervisor to handle ... like things that were i/o interrupts in traditional 370 or page fault interrupts (from executing virtual machines), etc.
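
As a rough illustration of the queued work interface described above, here is a small Python model. It is only a sketch under stated assumptions, not the VAMPS microcode: the class `QueuedWorkInterface`, its method names, and the representation of a unit of work as a list of step names are all hypothetical. It shows the two directions of queueing: the hypervisor hands units of work to the "microcode", which runs them on whatever processors are free, and the "microcode" hands events such as page faults and i/o interrupts back to the hypervisor as queued work.

```python
from collections import deque


class QueuedWorkInterface:
    """Hypothetical model of a two-way work queue between hypervisor and microcode."""

    # Events the "microcode" cannot finish itself and queues for the hypervisor.
    HYPERVISOR_EVENTS = ("page_fault", "io_interrupt")

    def __init__(self, n_processors=5):
        self.n_processors = n_processors
        self.to_microcode = deque()   # units of work queued by the hypervisor
        self.to_hypervisor = deque()  # events queued back for the hypervisor

    def hypervisor_enqueue(self, vm_name, steps):
        """Hypervisor side: queue a unit of work (a virtual machine to run)."""
        self.to_microcode.append((vm_name, list(steps)))

    def microcode_dispatch(self):
        """Microcode side: give queued work to free processors, one unit each.

        Returns the (processor, vm_name) assignments made in this pass.
        """
        assignments = []
        for cpu in range(self.n_processors):
            if not self.to_microcode:
                break  # no more queued work; remaining processors stay idle
            vm_name, steps = self.to_microcode.popleft()
            assignments.append((cpu, vm_name))
            for step in steps:
                if step in self.HYPERVISOR_EVENTS:
                    # Stop running this unit and queue the event for the hypervisor.
                    self.to_hypervisor.append((vm_name, step))
                    break
        return assignments
```

With two processors and three queued virtual machines, one dispatch pass assigns the first two and leaves the third queued for the next pass. The hypervisor never chooses a processor itself; it only adds work to one queue and takes events from the other.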

The VAMPS abstraction of queued work for a multiprocessor environment was somewhat akin to the later definition found in i432. Some of the VAMPS abstraction for i/o work queueing was somewhat akin to what showed up later for 370-xa i/o operations.

After VAMPS was killed, I adapted the multiprocessing microcode queued processing to a software implementation. A lot of the SMP kernel implementations used a single, global kernel SPIN lock to serialize all kernel execution. This drastically minimized the amount of code changes needed to adapt a single-processor operating system to support multiprocessor operation.

In adapting the VAMPS multiprocessing microcode support to software, I took the equivalent kernel software functions (that had been moved to microcode in VAMPS) and made them multiprocessing parallelized with fine-grain locking. This amounted to the majority of the software kernel execution time ... but a relatively small amount of the total kernel instructions.

The majority of the kernel instructions relied on a somewhat traditional global kernel lock. However, whenever the "parallelized" kernel code needed to transition into the "sequential" kernel code ... rather than "spinning" on the global kernel lock ... it "bounced". If it obtained the global kernel lock, then it proceeded as normal. If it couldn't obtain the global kernel lock, it would queue a super lightweight work request ... and go off and look for other "parallelized" work.

This approach obtained almost all the thruput benefit of having a kernel fine-grain locking implementation, avoided the degradation of a single kernel spin-lock implementation ... but the kernel code changes were not significantly more than required for a single kernel spin-lock implementation. This implementation shipped in VM370 release four.
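
The "bounce" idea can be sketched in a few lines of Python. This is an illustration only, not the VM370 code: the class `BounceLock` and its method names are hypothetical, and the sketch assumes that whoever holds the global lock drains the queued requests before letting go, a detail the description above does not spell out. A caller that finds the lock busy queues its request and returns at once instead of spinning.

```python
import threading
from collections import deque


class BounceLock:
    """Hypothetical sketch: a global lock that callers bounce off instead of spinning on."""

    def __init__(self):
        self._global = threading.Lock()         # stands in for the global kernel lock
        self._pending = deque()                 # queued lightweight work requests
        self._pending_guard = threading.Lock()  # protects the request queue only

    def submit(self, work):
        """Queue `work` for the sequential code and try once to take the lock.

        Returns True if this caller held the global lock (its work has run),
        False if it bounced and left the request for the current holder.
        """
        with self._pending_guard:
            self._pending.append(work)
        held = False
        while True:
            if not self._global.acquire(blocking=False):
                return held  # bounce: do not spin, go look for other work
            held = True
            try:
                self._drain()
            finally:
                self._global.release()
            # Re-check after release so a request queued in the gap is not stranded.
            with self._pending_guard:
                if not self._pending:
                    return held

    def _drain(self):
        """Run queued requests one at a time; caller must hold the global lock."""
        while True:
            with self._pending_guard:
                if not self._pending:
                    return
                work = self._pending.popleft()
            work()
```

The re-check after release matters: without it, a request queued just after the holder saw an empty queue, by a caller that then failed to get the lock, would sit until some later caller happened to take the lock. A caller that gets `False` back is free to pick up other "parallelized" work straight away.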