What if you measured the total CPU time consumed by code such as the following,
which executes a truly huge number of XR instructions and nothing else, and then
divided by the number of XR instructions executed?  I would think that this would
give the smallest possible time for one XR, i.e., the time with maximum possible
pipelining and zero stalls.
         L     R0,=F'1000000'      loop count = 1,000,000
         LA    R1,LOOP1            branch target for the BCTR
* Force alignment here to a 256-byte boundary, i.e., the length of a
* cache line.
LOOP1    XR    R2,R2               first of 127 such XR instructions
         XR    R2,R2               second of 127 such XR instructions
         ...
         XR    R2,R2               127th and last of 127 XR instructions
         BCTR  R0,R1               execute the 127 XRs one million times
* At this point, we have filled one cache line with 127 consecutive XR
* instructions followed by the BCTR, and all 128 of these instructions
* fit exactly within one cache line.
         ...                       end of loop
When the loop finishes, we will have executed 127,000,000 XR instructions and
1,000,000 BCTR instructions.  Ignore the time used by the BCTR instructions and
divide the total CPU time delta by 127,000,000 to get the approximate minimum
time possible for one XR instruction.
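As a worked formula, here is a sketch that assumes the per-instruction times simply add up, which the reply below cautions is only approximately true on a pipelined machine. The symbols are introduced here for illustration: Delta T_CPU is the measured CPU-time delta, and t_BCTR is the unknown cost of one BCTR.

```latex
\Delta T_{\mathrm{CPU}} \approx 127 \times 10^{6}\, t_{\mathrm{XR}} + 10^{6}\, t_{\mathrm{BCTR}}
\quad\Longrightarrow\quad
\frac{\Delta T_{\mathrm{CPU}}}{127 \times 10^{6}} \approx t_{\mathrm{XR}} + \frac{t_{\mathrm{BCTR}}}{127}
```

Under that assumption, ignoring the BCTRs overstates the per-XR time by about t_BCTR/127. This error is small as long as one BCTR costs roughly as much as one XR.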
  
Then do the same thing for an SR, an SLR, and an LR that loads a register from
another register that has been previously zeroed.  This technique could also be
done with 63 consecutive LA Rx,0 instructions.
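The instruction counts follow from the instruction lengths. XR and BCTR are 2-byte RR-format instructions, and LA is a 4-byte RX-format instruction. The byte budget for one 256-byte cache line then works out as follows:

```latex
\begin{aligned}
127 \times 2\ \text{bytes (XR)} + 2\ \text{bytes (BCTR)} &= 256\ \text{bytes} \\
63 \times 4\ \text{bytes (LA)} + 2\ \text{bytes (BCTR)} &= 254\ \text{bytes} \le 256
\end{aligned}
```

A 64th LA would bring the total to 258 bytes. That is why 63 is the most that fits in one line alongside the BCTR.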
  
Bill Fairchild 
Nolensville, TN 
  

----- Original Message -----

From: "Christopher Y. Blaicher" <[email protected]> 
To: [email protected] 
Sent: Tuesday, June 3, 2014 9:50:34 AM 
Subject: Re: Out of Order and Superscalar - small experiment 

IBM stopped publishing instruction timings quite a while ago.  With the advent
of multistage (pipelined) processors and the interdependencies between
instructions, timings for single instructions became meaningless.  Even for an
XR or SLR, a lot depends on when the register was last used and when it will
next be used.

Put another way, individual instructions don't matter as much as they used to.  
The sequence of instructions is much more important. 
