On 12/23/2015 7:46 AM, Jerry Callen wrote:
I'm in the process of hand-tuning a small, performance-critical algorithm on a z13, and
I'm hampered by the lack of detailed information on the instruction-level performance of
the machine. Back in the day, IBM used to publish a "Functional
Characteristics" manual for each CPU model that provided this information; those
seem to have been discontinued. I'm looking for things like:
* Is register renaming used for GPRs and/or FPRs? (affects the need for loop
unrolling)
* Are there any ways to bypass the L1 cache on moves of less than a page, when
simply moving data without looking at it?
* Does LMG/STMG outperform a linear sequence of LG/STG? Under what
circumstances?
At the very least, compiler maintainers need this information to select the
right instruction sequences for each model. Does anyone know of a source for
this information, other than writing test kernels and trying things out? I
already have the SHARE presentations by David Bond and Bob Rogers, which
contain some of this information, but it's not enough.
I was really excited by the addition of vector registers on the z13 (yippee, an
additional 512 bytes of high-speed scratch storage!), but the load/store performance
hasn't turned out to be what I had hoped for. I may well be using the facility "the
wrong way"; having detailed implementation information would sure help.
There is an instruction-level performance benchmark/report function in
recent zHISR releases. I'm not sure, but you might need a later release
than what's currently installed at your location to get that feature.
--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN