IBM Mainframe Discussion List <IBM-MAIN@bama.ua.edu> wrote on 13/06/2011 14:51:39:

> You can think about it this way.  Suppose you are designing a processor.
> In order to process instructions, there are several things that have to
> be done.  Something like this:

<snip good explanation of CPU design>

> That is a pipeline.  You probably realized when reading this that there
> are situations where the pipeline has to wait for something.  For
> example, if the second instruction uses the results of the first
> instruction, the operands for the second instruction can not be fetched
> as described above.  That causes the pipeline to stall.
> 
> Pipelines on modern processors are considerably more complex than what
> I have described here, but I hope you get the idea.
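
To make the stall concrete, here's a small C sketch (hypothetical, nothing z-specific about it). Both loops do the same number of floating-point adds, but in the second loop every add depends on the previous result, so a deeply pipelined processor can't overlap them:

#include <stdio.h>
#include <time.h>

/* Hypothetical sketch: same number of adds in each loop, but the second
 * loop is one long dependency chain, so the pipeline must wait for each
 * result before starting the next add. */
int main(void)
{
    volatile double x = 1.000001;
    double a0 = 0, a1 = 0, a2 = 0, a3 = 0, dep = 0;
    clock_t t0, t1, t2;

    t0 = clock();
    /* Four independent accumulators: adds can flow through the pipeline
     * back-to-back. */
    for (long i = 0; i < 100000000L; i++) {
        a0 += x; a1 += x; a2 += x; a3 += x;
    }

    t1 = clock();
    /* One dependency chain: each add waits for the previous result. */
    for (long i = 0; i < 400000000L; i++) {
        dep += x;
    }
    t2 = clock();

    printf("independent: %.2f s   dependent: %.2f s   (%f %f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC,
           a0 + a1 + a2 + a3, dep);
    return 0;
}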

As clock speeds go up, pipelines need more stages (6 on a z9, 14 on a 
z10). This has the side-effect that any stall in the pipeline becomes 
relatively more expensive as you go up the machine range. Adding extra 
levels of cache to the hardware (3 on a z10, 4 on a z196) gives you a 
better chance of having the data available without needing to access the 
relatively slow main storage.
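
A rough way to see those cache levels from software is a pointer-chasing sketch like the one below (again hypothetical, not z-specific): as the working set outgrows each cache level, the time per dependent load steps up towards the main-storage latency.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical sketch: chase a single cycle of pointers through buffers of
 * increasing size.  Each load depends on the previous one, so the time per
 * access is dominated by wherever the data has to come from. */
static double chase(size_t n, long iters)
{
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;

    /* Sattolo's algorithm: a random permutation that forms one single
     * cycle, so the hardware prefetcher cannot guess the next address. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    clock_t t0 = clock();
    size_t p = 0;
    for (long k = 0; k < iters; k++)
        p = next[p];               /* dependent load: cannot be overlapped */
    clock_t t1 = clock();

    volatile size_t sink = p;      /* keep the loop from being optimised away */
    (void)sink;
    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    for (size_t n = 1 << 10; n <= (size_t)1 << 24; n <<= 2)
        printf("%10zu elements: %.3f s for 50M dependent loads\n",
               n, chase(n, 50000000L));
    return 0;
}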

There's a very in-depth paper available with details of the z10 processor 
design and some pretty diagrams [1].

> Actually, it is not so much the pipeline that makes the execution times
> for an instruction unpredictable, but the variability in the time
> required to access memory.  Modern processors have small, high speed
> memory called cache that the processor can access quickly.  If the data
> is not there, it has to try a second or third level cache, each of which
> takes more time to access.  If the data can not be found there, it has
> to be fetched from main memory, which is _much_ slower.  The processor
> must wait for it, and that wait is part of the execution time for the
> instruction being processed.  Virtual memory translation also takes
> time, because it involves even more memory accesses, but Translation
> Lookaside Buffers reduce the need to perform page table lookups.
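
The translation cost is visible with a similar toy sketch (hypothetical): touch one byte per page across a large range of pages, and almost every access needs its own virtual-to-real translation even though hardly any data moves.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE 4096   /* assumed 4 KB pages; larger page sizes also exist */

/* Hypothetical sketch: read one byte from each of many distinct pages.
 * Once the number of pages exceeds what the TLB can hold, the processor
 * spends its time walking translation tables rather than computing. */
int main(void)
{
    size_t pages = 1 << 16;                  /* 64K pages = 256 MB of address space */
    unsigned char *buf = malloc(pages * PAGE);
    if (!buf) return 1;
    memset(buf, 1, pages * PAGE);            /* make sure the pages really exist */

    unsigned long sum = 0;
    clock_t t0 = clock();
    for (int rep = 0; rep < 100; rep++)
        for (size_t p = 0; p < pages; p++)
            sum += buf[p * PAGE];            /* one byte per page: translation-bound */
    clock_t t1 = clock();

    printf("%zu pages x 100 passes: %.3f s (sum=%lu)\n",
           pages, (double)(t1 - t0) / CLOCKS_PER_SEC, sum);
    free(buf);
    return 0;
}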

The z10 introduced the Relative Nest Intensity (RNI) metric [2], which 
tries to encapsulate how far your application has to reach into the 
memory hierarchy (the "nest") to find its data. A workload with a low RNI 
will (probably) suffer fewer pipeline stalls than one with a high RNI.
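
I won't reproduce the exact formula here, but the general shape of an RNI-style figure is a weighted sum of where L1 misses get satisfied from. The weights and field names in this sketch are purely illustrative; the real coefficients are in the LSPR document [2]:

#include <stdio.h>

/* Purely illustrative: farther sources of data get larger weights because
 * they cost more cycles.  These weights and field names are invented for
 * illustration; the actual formulas are in the LSPR document [2]. */
struct miss_profile {
    double pct_from_local_l2;    /* % of L1 misses sourced from the local L2 */
    double pct_from_remote_l2;   /* % sourced from a cache on another book   */
    double pct_from_memory;      /* % sourced from main storage              */
};

static double nest_intensity(struct miss_profile m)
{
    return (1.0 * m.pct_from_local_l2 +
            2.5 * m.pct_from_remote_l2 +
            7.0 * m.pct_from_memory) / 100.0;
}

int main(void)
{
    struct miss_profile cache_friendly = { 85.0, 10.0,  5.0 };
    struct miss_profile cache_hostile  = { 40.0, 25.0, 35.0 };

    printf("cache-friendly workload RNI-like figure: %.2f\n",
           nest_intensity(cache_friendly));
    printf("cache-hostile  workload RNI-like figure: %.2f\n",
           nest_intensity(cache_hostile));
    return 0;
}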

Ian.

[1] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05388586

[2] https://www.ibm.com/servers/resourcelink/lib03060.nsf/pages/lsprwork?OpenDocument

-- 
Ian Burnett :: CICS TS for z/OS Performance :: ian.burn...@uk.ibm.com




