On 12/23/2015 7:46 AM, Jerry Callen wrote:
I'm hand-tuning a small, performance-critical algorithm on a z13, and 
I'm hampered by the lack of detailed information on the instruction-level performance of 
the machine. Back in the day, IBM used to publish a "Functional 
Characteristics" manual for each CPU model that provided this information; those 
seem to have been discontinued. I'm looking for things like:

* Is register renaming used for GPRs and/or FPRs? (affects the need for loop 
unrolling)
* Are there any ways to bypass the L1 cache on moves of less than a page, when 
simply moving data without looking at it?
* Does LMG/STMG outperform a linear sequence of LG/STG? Under what 
circumstances?

At the very least, compiler maintainers need this information to select the 
right instruction sequences for each model. Does anyone know of a source for 
this information, other than writing test kernels and trying things out? I 
already have the SHARE presentations by David Bond and Bob Rogers, which 
contain some of this information, but it's not enough.

I was really excited by the addition of vector registers on the z13 (yippee, an 
additional 512 bytes of high speed scratch storage!), but the load/store performance 
hasn't turned out to be what I had hoped for. I may well be using the facility "the 
wrong way"; having detailed implementation information would sure help.

There is an instruction-level performance benchmark/report function in recent zHISR releases. I'm not sure, but you might need a later release than what's currently installed at your location to get that feature.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
