I'm in the process of hand-tuning a small, performance-critical algorithm on a
z13, and I'm hampered by the lack of detailed information on the machine's
instruction-level performance. IBM used to publish a "Functional
Characteristics" manual for each CPU model that provided this information;
those seem to have been discontinued. I'm looking for things like:

* Is register renaming used for GPRs and/or FPRs? (This affects whether loop
unrolling is needed.)
* Is there any way to bypass the L1 cache for moves of less than a page, when
the data is simply being moved and never examined?
* Does LMG/STMG outperform an equivalent sequence of LG/STG instructions? If
so, under what circumstances?

At the very least, compiler maintainers need this information to select the 
right instruction sequences for each model. Does anyone know of a source for 
this information, other than writing test kernels and trying things out? I 
already have the SHARE presentations by David Bond and Bob Rogers, which 
contain some of this information, but it's not enough.

I was really excited by the addition of vector registers on the z13 (yippee, an
additional 512 bytes of high-speed scratch storage!), but their load/store
performance hasn't turned out to be what I had hoped for. I may well be using
the facility "the wrong way"; having detailed implementation information would
sure help.


-- Jerry Callen
   Rocket Software

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
