IBM Mainframe Discussion List <IBM-MAIN@bama.ua.edu> wrote on 13/06/2011 14:51:39:
> You can think about it this way. Suppose you are designing a processor.
> In order to process instructions, there are several things that have to be
> done. Something like this:

<snip good explanation of CPU design>

> That is a pipeline. You probably realized when reading this that there
> are situations where the pipeline has to wait for something. For example,
> if the second instruction uses the results of the first instruction, the
> operands for the second instruction can not be fetched as described
> above. That causes the pipeline to stall.
>
> Pipelines on modern processors are considerably more complex than what
> I have described here, but I hope you get the idea.

As clock speeds go up, pipelines need to get longer in terms of the number
of stages (6 on a z9, 14 on a z10). A side effect is that any stall in the
pipeline becomes relatively more expensive as you go up the machine range,
because a deeper pipeline has more work in flight that is held up each time
it stalls. Adding extra cache to the hardware (3 levels on a z10, 4 levels
on a z196) gives you a better chance of having the data available without
needing to access the relatively slow main storage. There's a very in-depth
paper available with details of the z10 processor design and some pretty
diagrams [1].

> Actually, it is not so much the pipeline that makes the execution times
> for an instruction unpredictable, but the variability in the time required
> to access memory. Modern processors have small, high speed memory
> called cache that the processor can access quickly. If the data is not
> there, it has to try a second or third level cache, each of which takes
> more time to access. If the data can not be found there, it has to be
> fetched from main memory, which is _much_ slower. The processor must
> wait for it, and that wait is part of the execution time for the instruction
> being processed.
> Virtual memory translation also takes time, because it
> involves even more memory accesses, but Translation Lookaside Buffers
> reduce the need to perform page table lookups.

The z10 introduced the Relative Nest Intensity (RNI) metric [2], which tries
to capture how far into the storage hierarchy (the "nest" of shared caches
and memory) your application's data accesses have to go, i.e. how well it
fits within the processor's own caches. A workload with a high RNI will
(probably) suffer more pipeline stalls than one with a low RNI.

Ian.

[1] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05388586
[2] https://www.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprwork?OpenDocument

--
Ian Burnett :: CICS TS for z/OS Performance :: ian.burn...@uk.ibm.com
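The point about cache levels and main storage can be made concrete with a back-of-the-envelope model of average memory access time. This sketch is not from the thread and is not how IBM computes RNI. It assumes a simple serial lookup, where each level is checked only after a miss in the level above. All latencies and hit rates are made-up, illustrative values, not published z10 or z196 figures. The `Level` class and `average_access_time` function are hypothetical names introduced here.

```python
# Back-of-the-envelope average memory access time (AMAT) for a cache
# hierarchy. All latencies and hit rates are made-up, illustrative values;
# they are NOT published z10 or z196 figures, and this is NOT the RNI formula.
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    latency_cycles: float  # cost of looking in this level
    hit_rate: float        # fraction of lookups at this level that hit


def average_access_time(levels: list[Level], memory_latency: float) -> float:
    """Expected cycles per access, assuming serial lookup.

    An access pays the latency of every level it looks in; a miss passes it
    to the next level, and a miss in the last cache goes to main memory.
    """
    total = 0.0
    reach = 1.0  # probability that an access gets as far as this level
    for level in levels:
        total += reach * level.latency_cycles
        reach *= 1.0 - level.hit_rate
    return total + reach * memory_latency


if __name__ == "__main__":
    MEMORY = 300  # hypothetical main-storage latency in cycles

    good_locality = [
        Level("L1", 1, 0.95),
        Level("L2", 10, 0.80),
        Level("L3", 40, 0.50),
    ]
    poor_locality = [
        Level("L1", 1, 0.90),
        Level("L2", 10, 0.60),
        Level("L3", 40, 0.50),
    ]

    # → 3.40
    print(f"3 levels, good locality: {average_access_time(good_locality, MEMORY):.2f}")
    # Same hit rates, but an L2 miss goes straight to memory. → 4.50
    print(f"2 levels, good locality: {average_access_time(good_locality[:2], MEMORY):.2f}")
    # → 9.60
    print(f"3 levels, poor locality: {average_access_time(poor_locality, MEMORY):.2f}")
```

With these illustrative numbers, removing the third cache level raises the average from about 3.4 to 4.5 cycles per access. A workload with poorer locality, one that goes deeper into the hierarchy more often, reaches 9.6 cycles even with all three levels. The rare trips to main storage dominate the average, and that is time the pipeline may spend waiting.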